This Guidance helps customers use a Dask framework to perform input/output (I/O)-intensive workloads on high-volume data that is sparsely located across multiple AWS Regions. Instead of replicating data from its source Region to the user’s location, this Guidance uses the AWS global network to deploy a distributed computing architecture that strategically positions Dask workers as close as possible to the applicable dataset. Amazon FSx for Lustre rapidly loads and performs high I/O per second (IOPS) for scientists. To decouple the user experience from the underlying infrastructure, the architecture builds a metadata catalog through a self-managed OpenSearch domain using Amazon OpenSearch Service. This gives scientists full visibility into which datasets exist in FSx for Lustre in each of the worker Regions.