What is Azure Data Factory?

02-Jun-2021 15:08:34 pm

Azure Data Factory (ADF) is a fully managed, serverless data integration service for ingesting, preparing, and transforming all your data at scale. Organizations in any industry can use it for a variety of use cases: data engineering, migrating on-premises SSIS packages to Azure, operational data integration, analytics, ingesting data into a data warehouse, and more.

If you have SQL Server Integration Services (SSIS) packages for on-premises data integration, these packages run as-is in Azure Data Factory, including custom SSIS components. This lets any developer use Azure Data Factory for enterprise data integration needs.

Enterprise Connectors to any Data Stores

Azure Data Factory enables organizations to ingest data from a rich variety of data sources. Whether the data source is on-premises, multi-cloud, or provided by software-as-a-service (SaaS) providers, Azure Data Factory connects to them all at no additional licensing cost. Using the copy activity, you can copy data between different data stores.
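As a rough illustration of the copy activity, the sketch below builds a pipeline definition with a single Copy activity as a Python dictionary and prints it as JSON. The pipeline name, activity name, and dataset names are hypothetical placeholders, and the `BlobSource` / `SqlSink` types are just one possible source and sink pairing. Treat the overall shape as an assumption to check against the official pipeline JSON reference, not as a definitive template.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return a dict shaped like a pipeline definition with one Copy activity.

    All names passed in are illustrative; the referenced datasets would have
    to exist in the data factory for a real deployment.
    """
    copy_activity = {
        "name": "CopyFromSourceToSink",  # hypothetical activity name
        "type": "Copy",
        # The activity reads from one dataset and writes to another.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Assumed source/sink pairing: blob storage into a SQL table.
            "source": {"type": "BlobSource"},
            "sink": {"type": "SqlSink"},
        },
    }
    return {"name": name, "properties": {"activities": [copy_activity]}}


# Hypothetical names, used only for this example.
pipeline = build_copy_pipeline("CopyBlobToSql", "InputBlobDataset", "OutputSqlDataset")
print(json.dumps(pipeline, indent=2))
```

Keeping the definition as plain data makes it easy to review or store in source control; which source and sink types apply depends on the connectors you actually use.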

Azure Data Factory helps you reduce time to insight by making it easy to connect to multiple business data sources, transform the data at scale, and write the processed data to the data store of your choice. For example, you can use Azure Data Factory to connect to the Microsoft Dynamics 365 business applications: Dynamics 365 for Marketing, Sales, Customer Service, Field Service, and more. Azure Data Factory can then copy data from Microsoft Dynamics 365 and deliver it in the right shape, where it is needed most, to support critical business reporting. In addition to Microsoft Dynamics 365, Azure Data Factory supports a rich variety of advertising and marketing data sources: Salesforce, Marketo, Google AdWords, and more.

On-premises Data Access – Many organizations have enterprise data sources that are on-premises. Azure Data Factory enables organizations to connect to these on-premises data sources using a self-hosted integration runtime (we'll cover the integration runtime concept in the next section). The self-hosted integration runtime lets organizations move data between on-premises and cloud data sources without opening an inbound network port. This makes it easy for anyone to install the runtime and enable hybrid cloud data integration.

Code-free Data Flow – Azure Data Factory enables any developer to accelerate the development of data transformations with code-free data flows. Using ADF Studio, any developer can design data transformations without writing any code. To design a data flow, you first specify the data sources you want to read from, and then apply a rich set of transformations to the data before writing it to a data store. Under the hood, Azure Data Factory runs these data flows for you on a Spark cluster. Whether you are working with megabytes (MB) or terabytes (TB) of data, Azure Data Factory runs the transformations at scale on Spark, without you needing to set up or tune a Spark cluster. In many ways, data transformation just works!

Secure Data Integration – Azure Data Factory supports secure data integration, connecting to private endpoints supported by various Azure data stores. To ease the burden of managing your own virtual network, Azure Data Factory manages the virtual network under the hood. This makes it easy for you to set up a data factory and ensure that all data integration occurs securely across the virtual network.

CI/CD Support – Azure Data Factory enables any developer to use it as part of a continuous integration and delivery (CI/CD) process. CI/CD with Azure Data Factory enables a developer to move data factory assets (pipelines, data flows, linked services, and more) from one environment (development, test, production) to another. Out of the box, Azure Data Factory provides native integration with Azure DevOps and GitHub.

Data Integration and Governance

Bringing together data integration and data governance enables organizations to gain tremendous insight into lineage, policy management, and more. Azure Data Factory integrates seamlessly with Azure Purview to provide powerful insight into ETL lineage, how data moves through the organization from different data stores, and more.

For example, a data engineer might want to investigate a data issue where incorrect data has been inserted due to upstream issues. By using Azure Data Factory integration with Azure Purview, the data engineer can now identify the issue easily.

The nuts and bolts of Azure Data Factory

  • Triggers – Specify when a data pipeline runs. Azure Data Factory supports several types of triggers. A schedule trigger lets you specify that the data pipeline runs at a specific time of day (including time zones).

You can also specify that the trigger fires on a series of fixed-size, non-overlapping, contiguous time intervals using the tumbling window trigger. A tumbling window trigger may also depend on other tumbling window triggers.

Storage event triggers let you run a data pipeline when data is created in or deleted from Azure Storage. A custom event trigger extends the richness of storage events to all custom events pushed to Azure Event Grid.

  • Pipelines and Activities – A pipeline is a logical grouping of activities. An activity in Azure Data Factory can copy data, perform data transformations using data flows, or run various other compute tasks on Azure. You can also specify iteration and conditional logic in a pipeline.
  • Integration Runtimes – An integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to perform various data integration tasks (e.g., data movement, data flows, running SSIS packages, running code on different computes on Azure).
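
To make the trigger behaviour above more concrete, here is a minimal sketch of how a series of fixed-size, non-overlapping, contiguous time intervals (what ADF calls a tumbling window trigger) can be enumerated. This is plain Python that only models the windowing idea; it does not call Azure Data Factory, and the function name `tumbling_windows` and the sample dates are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def tumbling_windows(start, end, size):
    """Yield (window_start, window_end) pairs of fixed size between start and end.

    Each window begins exactly where the previous one ends, so the windows
    are contiguous and never overlap. A trailing partial window is skipped.
    """
    if size <= timedelta(0):
        raise ValueError("size must be a positive timedelta")
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


# Hypothetical example: two-hour windows over the first six hours of a day (UTC).
start = datetime(2021, 6, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2021, 6, 1, 6, 0, tzinfo=timezone.utc)
windows = list(tumbling_windows(start, end, timedelta(hours=2)))

for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

In this model each window would correspond to one pipeline run that processes only the data falling inside that interval, which is why the intervals must neither overlap nor leave gaps.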