Azure Data Factory
Azure Data Factory is best thought of as an orchestration tool: you pick the operations you need (called activities) and arrange them in a specific order. The resulting sequence is called a pipeline.
For example, a pipeline can contain a copy activity, in which you specify a source and a destination (for example, a SQL database) that the data is copied to.
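As a rough illustration, a pipeline with one copy activity can be written as a JSON document. The sketch below builds one as a Python dictionary. The pipeline, activity, and dataset names are hypothetical, and the source and sink types (a delimited text file copied into Azure SQL) are an assumption chosen for the example.

```python
import json


def copy_pipeline(name, source_dataset, sink_dataset):
    """Build a pipeline definition containing a single Copy activity."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopySourceToSink",  # hypothetical activity name
                    "type": "Copy",
                    # Datasets describe where the data lives; names are hypothetical.
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        # Assumed example: CSV files copied into an Azure SQL table.
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "AzureSqlSink"},
                    },
                }
            ]
        },
    }


pipeline = copy_pipeline("CopyCsvToSql", "SourceCsvFiles", "TargetSqlTable")
print(json.dumps(pipeline, indent=2))
```

The dictionary only describes the pipeline. Deploying and running it would still go through the Data Factory portal, an SDK, or a template.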
In other cases you can create a link to a data source and connect it to Azure Databricks, which then transforms the data.
If you are just copying data from point A to point B, the work is done by a Data Factory component called the integration runtime.
Integration runtimes come in two main types: Azure-managed and self-hosted. (A third type, Azure-SSIS, exists for running SSIS packages.)
The difference between the two primarily comes down to where the compute lives: an Azure-managed runtime is spun up by Azure as soon as a pipeline is activated, while a self-hosted runtime is installed on a server or VM that you manage, and is therefore limited to the resources that server or VM has.
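To make the self-hosted case more concrete, here is a minimal sketch of how a self-hosted integration runtime and a connection that uses it might be described in JSON, again built as Python dictionaries. The runtime name, linked service name, and connection string are hypothetical placeholders, and the on-premises SQL Server connection is an assumption chosen for the example.

```python
import json


def self_hosted_ir(name, description=""):
    """Describe a self-hosted integration runtime (installed on your own server or VM)."""
    return {
        "name": name,
        "properties": {"type": "SelfHosted", "description": description},
    }


def linked_service_via_ir(name, ir_name, connection_string):
    """Describe a SQL Server connection that is reached through a named runtime."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {"connectionString": connection_string},
            # connectVia selects which integration runtime performs the work.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# All names and the connection string below are placeholders.
runtime = self_hosted_ir("OnPremRuntime", "Runs on a VM we manage")
linked_service = linked_service_via_ir(
    "OnPremSqlServer", runtime["name"], "<connection string placeholder>"
)
print(json.dumps(linked_service, indent=2))
```

If no runtime is named on a connection, Data Factory falls back to its default Azure-managed runtime, so the explicit reference is mainly needed for the self-hosted case.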
When you need data transformation, that is usually handled by a different service such as Databricks. The transformation code (for example, a Python script created in Databricks) is written in that application and then triggered by Azure Data Factory. In that case the integration runtime still copies the data from the source and inserts it into the target, but the transformation itself is done by Databricks.
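The division of labor described above could look roughly like the following pipeline sketch: a copy activity moves the data, and a Databricks notebook activity runs only after the copy has succeeded. The activity names, dataset names, linked service name, and notebook path are all hypothetical, and placing the copy before the transformation is an assumption made for the example.

```python
import json

copy_step = {
    "name": "CopyRawData",  # hypothetical
    "type": "Copy",
    "inputs": [{"referenceName": "SourceData", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "StagingData", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "AzureSqlSink"},
    },
}

transform_step = {
    "name": "TransformInDatabricks",  # hypothetical
    "type": "DatabricksNotebook",
    # Run only after the copy activity has finished successfully.
    "dependsOn": [
        {"activity": copy_step["name"], "dependencyConditions": ["Succeeded"]}
    ],
    # Points at the Databricks workspace; the name is hypothetical.
    "linkedServiceName": {
        "referenceName": "DatabricksWorkspace",
        "type": "LinkedServiceReference",
    },
    # The transformation code itself lives in this notebook, not in Data Factory.
    "typeProperties": {"notebookPath": "/Shared/transform_data"},
}

transform_pipeline = {
    "name": "CopyThenTransform",
    "properties": {"activities": [copy_step, transform_step]},
}
print(json.dumps(transform_pipeline, indent=2))
```

Note that Data Factory only holds the reference to the notebook. The Python code inside it is written and maintained in Databricks.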
This can make estimating the cost difficult, because you also need to take the cost of Databricks into account.
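A back-of-the-envelope estimate therefore has to add up at least three parts: orchestration (activity runs), data movement on the integration runtime, and Databricks compute. The sketch below is a simplified model and not the official pricing formula. All rates are passed in by the caller, the numbers in the usage example are made up, and the virtual machine cost that Databricks clusters also incur is left out.

```python
def estimate_run_cost(
    activity_runs,
    diu_hours,
    dbu_hours,
    *,
    price_per_1000_runs,
    price_per_diu_hour,
    price_per_dbu_hour,
):
    """Rough cost breakdown; every rate must be supplied by the caller."""
    orchestration = activity_runs / 1000 * price_per_1000_runs
    data_movement = diu_hours * price_per_diu_hour
    databricks = dbu_hours * price_per_dbu_hour  # excludes the cluster's VM cost
    return {
        "orchestration": orchestration,
        "data_movement": data_movement,
        "databricks": databricks,
        "total": orchestration + data_movement + databricks,
    }


# Made-up quantities and rates, for illustration only (not real prices).
estimate = estimate_run_cost(
    activity_runs=2000,
    diu_hours=10,
    dbu_hours=5,
    price_per_1000_runs=1.0,
    price_per_diu_hour=0.25,
    price_per_dbu_hour=0.5,
)
print(estimate)
```

Real rates vary by region and tier, so the current pricing pages for both services should be the source for the inputs.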
- Lectures 27
- Quizzes 0
- Duration 50 hours
- Skill level All levels
- Language English
- Students 0
- Certificate No
- Assessments Yes