This pattern provides a security control that ensures that Amazon EMR clusters are tagged when they are created.
Amazon EMR is an Amazon Web Services (AWS) service for processing and analyzing vast amounts of data. Amazon EMR offers an expandable, low-configuration service as an easier alternative to running in-house cluster computing. You can use tagging to categorize AWS resources in different ways, such as by purpose, owner, or environment. For example, you can tag your Amazon EMR clusters by assigning custom metadata to each cluster. A tag consists of a key and a value that you define. We recommend that you create a consistent set of tags to meet your organization's requirements. When you add a tag to an Amazon EMR cluster, the tag is also propagated to each active Amazon Elastic Compute Cloud (Amazon EC2) instance that is associated with the cluster. Similarly, when you remove a tag from an Amazon EMR cluster, that tag is also removed from each associated, active EC2 instance.
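As an illustration of the key-value tag structure described above, the following is a minimal sketch that tags an existing cluster by using the AWS SDK for Python (Boto3). The helper names to_emr_tags and tag_cluster, the tag keys, and the cluster ID are hypothetical and are not part of this pattern. The sketch assumes that the caller has permission to call the Amazon EMR AddTags API.

```python
def to_emr_tags(tags):
    """Convert a {"key": "value"} dict into the list of Key/Value pairs
    that the Amazon EMR AddTags API expects (sorted by key for stable output)."""
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def tag_cluster(cluster_id, tags, emr_client=None):
    """Attach tags to an Amazon EMR cluster.

    emr_client is optional so that a stub can be passed in for testing;
    by default a real Boto3 client is created (assumes credentials are configured).
    """
    if emr_client is None:
        import boto3  # imported lazily so the helper above works without the SDK
        emr_client = boto3.client("emr")
    emr_client.add_tags(ResourceId=cluster_id, Tags=to_emr_tags(tags))


# Example call (hypothetical cluster ID and tag values):
# tag_cluster("j-EXAMPLE12345", {"Owner": "data-team", "Environment": "dev"})
```

Amazon EMR then propagates these tags to the cluster's active EC2 instances, as described above.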
The detective control monitors API calls and initiates an Amazon CloudWatch Events event when the RunJobFlow, AddTags, RemoveTags, or CreateTags API is called. The event invokes an AWS Lambda function, which runs a Python script. The Python function gets the Amazon EMR cluster ID from the event's JSON input and performs the following checks:
Checks whether the Amazon EMR cluster is configured with the tag names that you specify.
If any specified tag name is missing, sends an Amazon Simple Notification Service (Amazon SNS) notification to the user with the relevant information: the Amazon EMR cluster name, violation details, AWS Region, AWS account, and the Amazon Resource Name (ARN) of the Lambda function that sent the notification.
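The checks above can be sketched as a Lambda handler along the following lines. This is an illustrative sketch, not the script that ships with this pattern. The environment variable names REQUIRED_TAG_KEYS and SNS_TOPIC_ARN, the helper function names, and the message layout are hypothetical. The sketch also assumes that the cluster ID appears in responseElements.jobFlowId for RunJobFlow events and in requestParameters.resourceId for AddTags and RemoveTags events; verify those field names against a sample event. CreateTags events are not handled here.

```python
import os

# Hypothetical configuration, read from Lambda environment variables.
REQUIRED_TAG_KEYS = [
    key.strip()
    for key in os.environ.get("REQUIRED_TAG_KEYS", "Owner,Environment").split(",")
    if key.strip()
]
SNS_TOPIC_ARN = os.environ.get("SNS_TOPIC_ARN", "")


def extract_cluster_id(event):
    """Return the EMR cluster ID from the event, or None if it is not found.
    The field locations are assumptions; check them against a real event."""
    detail = event.get("detail") or {}
    event_name = detail.get("eventName")
    if event_name == "RunJobFlow":
        return (detail.get("responseElements") or {}).get("jobFlowId")
    if event_name in ("AddTags", "RemoveTags"):
        return (detail.get("requestParameters") or {}).get("resourceId")
    return None


def missing_tag_keys(cluster_tags, required_keys):
    """Return the required tag keys that are absent from the cluster's tags."""
    present = {tag["Key"] for tag in cluster_tags}
    return [key for key in required_keys if key not in present]


def build_message(cluster_name, missing, region, account, function_arn):
    """Format the notification text with the details listed in the pattern."""
    return "\n".join([
        f"Amazon EMR cluster name: {cluster_name}",
        f"Violation: missing required tags: {', '.join(missing)}",
        f"AWS Region: {region}",
        f"AWS account: {account}",
        f"Source Lambda function ARN: {function_arn}",
    ])


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above can be tested offline

    cluster_id = extract_cluster_id(event)
    if cluster_id is None:
        return {"status": "ignored"}

    cluster = boto3.client("emr").describe_cluster(ClusterId=cluster_id)["Cluster"]
    missing = missing_tag_keys(cluster.get("Tags", []), REQUIRED_TAG_KEYS)
    if not missing:
        return {"status": "compliant", "clusterId": cluster_id}

    boto3.client("sns").publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject="Amazon EMR cluster is missing required tags",
        Message=build_message(
            cluster.get("Name", cluster_id),
            missing,
            event.get("region"),
            event.get("account"),
            context.invoked_function_arn,
        ),
    )
    return {"status": "violation", "clusterId": cluster_id, "missing": missing}
```

Keeping the event parsing and the tag comparison in separate helper functions makes it possible to test them without AWS credentials; only lambda_handler calls Amazon EMR and Amazon SNS.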