This reference architecture demonstrates how to monitor Amazon Elastic Compute Cloud (Amazon EC2) state changes for high performance computing (HPC) clusters across multiple AWS accounts. It includes a dashboard that shows the status of each cluster and of each individual Amazon EC2 instance.
The following two use cases might use this architecture:
A customer organization has many divisions, each using its own AWS account to deploy elastic Amazon EC2-based HPC clusters. A central IT admin group wants to monitor these resources in real time from a single centralized source to better manage workflows and to stay aware of current resource use.
A third-party partner manages HPC deployments in a customer account and wants to help the customer meter usage, create budgets, send notifications, and improve overall visibility into their HPC use. The customer does not want to share all logs and activity in the account with the partner, only those for the relevant HPC resources.
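The cross-account monitoring idea described above can be illustrated with a minimal sketch. It assumes, since this overview does not say so, that state changes arrive in the central account as Amazon EventBridge "EC2 Instance State-change Notification" events. The names `EC2_STATE_CHANGE_PATTERN`, `matches_pattern`, and `summarize` are hypothetical and used only for illustration. The sketch filters incoming events and reduces them to a per-account count of instances in each state, which is roughly the data a status dashboard would need.

```python
from collections import defaultdict

# Assumption: events follow the EventBridge envelope for EC2 state changes.
# The pattern below uses the standard "source" and "detail-type" values.
EC2_STATE_CHANGE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
}


def matches_pattern(event, pattern=EC2_STATE_CHANGE_PATTERN):
    """Return True if every pattern key is present with an allowed value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def summarize(events):
    """Count instances per state for each account, using the latest state only.

    Events are ordered by their ISO 8601 "time" field, so a later event for
    the same instance overwrites an earlier one.
    """
    latest = {}
    for event in sorted(events, key=lambda e: e["time"]):
        if not matches_pattern(event):
            continue  # ignore anything that is not an EC2 state change
        key = (event["account"], event["detail"]["instance-id"])
        latest[key] = event["detail"]["state"]

    summary = defaultdict(lambda: defaultdict(int))
    for (account, _instance_id), state in latest.items():
        summary[account][state] += 1
    return {account: dict(states) for account, states in summary.items()}
```

In a real deployment the events would be delivered by whatever forwarding mechanism the architecture uses; here they are plain dictionaries so the aggregation logic can be read and tested on its own.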