
Monitor Amazon EMR clusters for in-transit encryption at launch

Project Overview

Project Detail

This pattern provides a security control that monitors Amazon EMR clusters at launch and sends an alert if in-transit encryption hasn't been enabled. 

Amazon EMR is a web service that makes it easy for you to run big data frameworks, such as Apache Hadoop, to process and analyze data. Amazon EMR enables you to process vast amounts of data in a cost-effective way by running mapping and reducing steps in parallel.

Data encryption prevents unauthorized users from accessing or reading data at rest or data in transit. Data at rest refers to data that is stored in media such as a local file system on each node, Hadoop Distributed File System (HDFS), or the EMR File System (EMRFS) through Amazon Simple Storage Service (Amazon S3). Data in transit refers to data that travels across the network and is in flight between jobs. In-transit encryption supports open-source encryption features for Apache Spark, Apache Tez, Apache Hadoop, Apache HBase, and Presto.

You enable encryption by creating a security configuration from the AWS Command Line Interface (AWS CLI), the console, or AWS SDKs, and specifying the data encryption settings. You can provide the encryption artifacts for in-transit encryption in these two ways:

  • By uploading a compressed file of certificates to Amazon S3.

  • By referencing a custom Java class that provides encryption artifacts.
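As a rough illustration of the two options above, the following Python sketch builds the JSON body of a security configuration that turns on in-transit encryption only. The helper name build_in_transit_config, the bucket, and the object key are hypothetical; the field names follow the Amazon EMR security configuration format as we understand it, so check them against the current EMR documentation before relying on them.

```python
import json


def build_in_transit_config(s3_object, provider_class=None):
    """Build an EMR security configuration that enables in-transit encryption.

    s3_object: S3 URI of the zipped PEM certificates, or of the JAR that holds
               the custom provider when provider_class is given.
    provider_class: fully qualified name of a custom Java certificate provider.
    """
    tls = {"CertificateProviderType": "PEM", "S3Object": s3_object}
    if provider_class:
        tls = {
            "CertificateProviderType": "Custom",
            "S3Object": s3_object,
            "CertificateProviderClass": provider_class,
        }
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {"TLSCertificateConfiguration": tls},
        }
    }


# Hypothetical bucket and key, for illustration only.
pem_config = build_in_transit_config("s3://example-bucket/certs/my-certs.zip")
print(json.dumps(pem_config, indent=2))

# To register the configuration (requires boto3 and AWS credentials):
# import boto3
# boto3.client("emr").create_security_configuration(
#     Name="in-transit-only", SecurityConfiguration=json.dumps(pem_config))
```

Passing a second argument, for example a class name such as com.example.MyCertProvider (also hypothetical), switches the sketch to the custom Java class option.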

The security control that's included with this pattern monitors API calls and generates an Amazon CloudWatch Events event on the RunJobFlow action. The event calls an AWS Lambda function, which runs a Python script. The function gets the EMR cluster ID from the event JSON input, and performs the following checks to determine whether there's a security violation:

  • Checks if the EMR cluster has an Amazon EMR-specific security configuration.

  • If the cluster does have a security configuration, checks to see if encryption in transit is enabled.

  • If the cluster doesn't have a security configuration, or if its security configuration doesn't enable encryption in transit, sends an alert to an email address that you provide, by using Amazon Simple Notification Service (Amazon SNS). The notification specifies the EMR cluster name, violation details, AWS Region and account information, and the AWS Lambda ARN (Amazon Resource Name) that the notification is sourced from.
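The checks above can be sketched in Python roughly as follows. This is not the script that ships with the pattern. It is a minimal sketch that assumes the cluster ID arrives at detail.responseElements.jobFlowId in the event, and that the EMR and SNS clients (for example, boto3.client("emr") and boto3.client("sns")) are passed in by the caller. The function names find_violation and handle_event and the topic_arn parameter are hypothetical.

```python
import json


def find_violation(emr, cluster_id):
    """Return a violation message for the cluster, or None if it is compliant."""
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    label = f"EMR cluster {cluster.get('Name', '')} ({cluster_id})"

    # Check 1: does the cluster have a security configuration at all?
    config_name = cluster.get("SecurityConfiguration")
    if not config_name:
        return f"{label} has no security configuration."

    # Check 2: does that configuration enable encryption in transit?
    described = emr.describe_security_configuration(Name=config_name)
    settings = json.loads(described["SecurityConfiguration"])
    encryption = settings.get("EncryptionConfiguration", {})
    if not encryption.get("EnableInTransitEncryption", False):
        return f"{label} uses '{config_name}', which does not enable in-transit encryption."
    return None


def handle_event(event, emr, sns, topic_arn):
    """Inspect a RunJobFlow event and publish an SNS alert on a violation."""
    # Assumed event shape: a CloudTrail record delivered by CloudWatch Events.
    cluster_id = event["detail"]["responseElements"]["jobFlowId"]
    violation = find_violation(emr, cluster_id)
    if violation:
        sns.publish(
            TopicArn=topic_arn,
            Subject="EMR in-transit encryption violation",
            Message=violation,
        )
    return violation
```

Injecting the clients keeps the check logic testable without AWS access. In a real Lambda function, a thin handler would create the boto3 clients, read the topic ARN from configuration, and add the Region, account, and function ARN to the message.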

https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/monitor-amazon-emr-clusters-for-in-transit-encryption-at-launch.html?did=pg_card&trk=pg_card

To learn more about this project, connect with us.
