Azure Data Factory Overview
Azure Data Factory (ADF) is a fully managed, cloud-based data integration service that allows you to create, schedule, and orchestrate data workflows.
It is commonly used for ETL / ELT pipelines, enabling data movement and transformation across different data sources at scale.
Resource Group
A Resource Group is a logical container in Azure that holds related resources for an Azure solution.
For an Azure Data Factory solution, the resource group typically contains:
- Azure Data Factory instance
- Storage accounts (e.g. Blob Storage, Data Lake)
- Azure SQL / Synapse resources
- Networking and security configurations
Using resource groups helps with lifecycle management, access control, and cost tracking.
Top-Level Concepts in Azure Data Factory
Azure Data Factory is built around several core components:
- Pipelines – Logical groups of activities that perform a task
- Activities – Individual processing steps (e.g. copy, transform)
- Linked Services – Connection information for external data stores and compute
- Datasets – Named references to the data used as activity inputs and outputs
- Data Flows – Visually designed data transformation logic
- Integration Runtimes – Compute infrastructure for data movement and transformation
Pipelines and Activities
A pipeline is a container for one or more activities that together perform a workflow.
An activity defines a specific action, such as:
- Copying data
- Executing a data flow
- Running a stored procedure
- Calling an external service
Pipelines support control flow, including:
- Conditional logic
- Loops
- Error handling
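To make the relationship between pipelines, activities, and control flow concrete, here is a minimal sketch that builds a pipeline definition as a Python dictionary. It assumes the general shape of ADF pipeline JSON (a "properties.activities" list, with "dependsOn" expressing ordering). All names such as CopyRawFiles, SourceDataset, and dbo.usp_post_load are hypothetical, and the result should be checked against the official pipeline schema before use.

```python
import json


def build_pipeline() -> dict:
    """Return a pipeline definition with two chained activities."""
    # Copy activity: moves data from a source dataset to a sink dataset.
    copy_activity = {
        "name": "CopyRawFiles",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
        "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource"},
            "sink": {"type": "SqlSink"},
        },
    }
    # Stored procedure activity: runs only if the copy succeeded.
    proc_activity = {
        "name": "RunPostLoadProc",  # hypothetical activity name
        "type": "SqlServerStoredProcedure",
        "dependsOn": [
            {"activity": "CopyRawFiles", "dependencyConditions": ["Succeeded"]}
        ],
        "linkedServiceName": {
            "referenceName": "AzureSqlLinkedService",  # hypothetical
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"storedProcedureName": "dbo.usp_post_load"},
    }
    return {
        "name": "LoadRawFilesPipeline",  # hypothetical pipeline name
        "properties": {"activities": [copy_activity, proc_activity]},
    }


pipeline = build_pipeline()
print(json.dumps(pipeline, indent=2))
```

Changing the dependency condition to "Failed" would turn the second activity into an error-handling step instead of a follow-up step.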
Linked Services and Datasets
Linked Services
A Linked Service is similar to a connection string.
It defines the connection information required for Azure Data Factory to connect to external resources such as databases, storage accounts, or SaaS services.
Examples:
- Azure Blob Storage
- Azure SQL Database
- Amazon S3
- On-premises SQL Server
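As an illustration, a linked service for Azure Blob Storage can be sketched as a small JSON document, built here in Python. The linked service name and the connection string placeholder are assumptions; real definitions usually keep secrets out of the JSON, for example by referencing Azure Key Vault.

```python
import json


def blob_linked_service(name: str, connection_string: str) -> dict:
    """Build a linked service definition for Azure Blob Storage."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            # Connection details live under typeProperties.
            "typeProperties": {"connectionString": connection_string},
        },
    }


# Placeholder values only; do not commit real credentials.
linked_service = blob_linked_service(
    "BlobStoreLinkedService", "<connection string placeholder>"
)
print(json.dumps(linked_service, indent=2))
```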
Datasets
A Dataset is a named view of data that points to the data you want to use in the store a linked service connects to.
Datasets identify:
- Tables
- Files
- Folders
- Documents
For example, an Azure Blob Storage dataset specifies:
- Container name
- Folder path
- File format
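The three settings above map onto a dataset definition roughly as follows. This is a sketch that assumes a delimited-text (CSV) dataset stored in Blob Storage; the dataset, linked service, container, and folder names are hypothetical.

```python
import json


def blob_csv_dataset(name: str, linked_service: str,
                     container: str, folder_path: str) -> dict:
    """Build a delimited-text dataset that points at a Blob Storage folder."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",  # file format
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,     # container name
                    "folderPath": folder_path,  # folder path
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


dataset = blob_csv_dataset(
    "RawSalesCsv", "BlobStoreLinkedService", "raw", "sales/2024"
)
print(json.dumps(dataset, indent=2))
```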
Azure Blob Storage
Azure Blob Storage is Microsoft’s object storage solution for the cloud, optimized for storing large amounts of unstructured data.
Unstructured data includes:
- Text files
- Images
- Videos
- Binary data
- Logs
Common Use Cases
Azure Blob Storage is designed for:
- Serving images or documents directly to browsers
- Storing files for distributed access
- Streaming video and audio
- Writing and storing log files
- Backup, restore, and disaster recovery
- Storing data for analytics and machine learning workloads
Variables in Azure Data Factory
Pipeline variables are values that can be:
- Defined at the pipeline level
- Modified during pipeline execution (for example, with the Set Variable or Append Variable activity)
They are commonly used for:
- Storing intermediate values
- Controlling workflow logic
- Tracking execution states
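A minimal sketch of how a variable might be declared and then updated is shown below. The variable name runStatus and the activity name MarkStarted are hypothetical. The helper undeclared_variables is not part of ADF; it is an illustrative check that every Set Variable activity targets a variable the pipeline declares. Inside expressions, the current value would be read with @variables('runStatus').

```python
def pipeline_with_variable() -> dict:
    """Return a pipeline that declares one variable and sets it once."""
    set_status = {
        "name": "MarkStarted",  # hypothetical activity name
        "type": "SetVariable",
        "typeProperties": {"variableName": "runStatus", "value": "started"},
    }
    return {
        "name": "VariableDemoPipeline",  # hypothetical pipeline name
        "properties": {
            # Variables are declared once, at the pipeline level.
            "variables": {
                "runStatus": {"type": "String", "defaultValue": "pending"}
            },
            "activities": [set_status],
        },
    }


def undeclared_variables(pipeline: dict) -> list:
    """List variables set by top-level activities but never declared."""
    props = pipeline["properties"]
    declared = set(props.get("variables", {}))
    used = [
        a["typeProperties"]["variableName"]
        for a in props.get("activities", [])
        if a.get("type") == "SetVariable"
    ]
    return [name for name in used if name not in declared]


print(undeclared_variables(pipeline_with_variable()))
```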
Parameters in Azure Data Factory
Pipeline parameters are values passed into a pipeline at runtime.
Key characteristics:
- Defined at pipeline level
- Cannot be changed during execution
- Used to make pipelines reusable
Common use cases include:
- Passing file paths
- Environment-specific values
- Dataset configuration settings
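The sketch below shows one way a parameter could be declared and passed on to a dataset. It assumes the expression form @pipeline().parameters.<name>; the parameter inputFolder, the dataset parameter folderPath, and all other names are hypothetical.

```python
import json


def parameter_expression(name: str) -> dict:
    """Wrap a pipeline parameter reference as an ADF expression object."""
    return {"value": f"@pipeline().parameters.{name}", "type": "Expression"}


def parameterized_pipeline() -> dict:
    """Return a pipeline whose copy source folder is supplied at runtime."""
    copy_activity = {
        "name": "CopyFromFolder",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{
            "referenceName": "SourceDataset",  # hypothetical dataset
            "type": "DatasetReference",
            # Forward the pipeline parameter to a dataset parameter.
            "parameters": {"folderPath": parameter_expression("inputFolder")},
        }],
        "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource"},
            "sink": {"type": "SqlSink"},
        },
    }
    return {
        "name": "ParameterDemoPipeline",  # hypothetical pipeline name
        "properties": {
            "parameters": {
                "inputFolder": {"type": "String", "defaultValue": "incoming"}
            },
            "activities": [copy_activity],
        },
    }


print(json.dumps(parameterized_pipeline(), indent=2))
```

Because the folder is a parameter, the same pipeline can be triggered for different folders or environments without editing its definition.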
JSON Structure in ADF
Behind the Azure Data Factory UI, all pipelines, datasets, and linked services are stored as JSON definitions.
These JSON files describe:
- Activity logic
- Dependencies
- Expressions
- Parameters and variables
This enables:
- Source control integration (Git)
- CI/CD pipelines
- Automated deployments
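Because the definitions are plain JSON, simple checks can run in a CI job before deployment. The following sketch reports datasets that a pipeline references but that are not in a known set. The function names and sample data are hypothetical, and the check only inspects top-level activities, not activities nested inside loops or conditionals.

```python
import json


def referenced_datasets(pipeline: dict) -> set:
    """Collect dataset names used as inputs or outputs of top-level activities."""
    names = set()
    for activity in pipeline.get("properties", {}).get("activities", []):
        refs = activity.get("inputs", []) + activity.get("outputs", [])
        for ref in refs:
            if ref.get("type") == "DatasetReference":
                names.add(ref["referenceName"])
    return names


def missing_datasets(pipeline_json: str, known_datasets: set) -> set:
    """Return referenced dataset names that are not in known_datasets."""
    return referenced_datasets(json.loads(pipeline_json)) - known_datasets


# Sample pipeline text, as it might be read from a Git repository.
sample = json.dumps({
    "name": "SamplePipeline",
    "properties": {
        "activities": [{
            "name": "CopyStep",
            "type": "Copy",
            "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
            "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
        }]
    },
})

print(missing_datasets(sample, {"SourceDataset"}))
```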
Integration Runtime (IR)
The Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory to provide data integration capabilities.
It is responsible for:
- Data movement between data stores
- Dispatching transformation activities to external compute services
- Executing data flows
Types of Integration Runtime
- Azure IR – Fully managed, runs in Azure
- Self-hosted IR – Used for on-premises or private networks
- Azure-SSIS IR – For running SSIS packages in Azure
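A linked service can be routed through a specific integration runtime with a connectVia reference. The sketch below assumes that shape; the helper attach_integration_runtime, the runtime name SelfHostedIR, and the linked service name are hypothetical.

```python
import copy
import json


def attach_integration_runtime(linked_service: dict, ir_name: str) -> dict:
    """Return a copy of the linked service that connects via the named IR."""
    updated = copy.deepcopy(linked_service)  # leave the input untouched
    updated["properties"]["connectVia"] = {
        "referenceName": ir_name,
        "type": "IntegrationRuntimeReference",
    }
    return updated


# An on-premises SQL Server typically needs a self-hosted IR to be reachable.
on_prem_sql = {
    "name": "OnPremSqlLinkedService",  # hypothetical name
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<connection string placeholder>"},
    },
}

routed = attach_integration_runtime(on_prem_sql, "SelfHostedIR")
print(json.dumps(routed, indent=2))
```

When no connectVia reference is given, the linked service uses the default Azure IR.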







