# Apache Airflow

> Apache Airflow - How to use the DataRobot Provider for Apache Airflow to implement a basic DAG
> orchestrating an end-to-end DataRobot AI pipeline.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.251470+00:00` (UTC).

## Primary page

- [Apache Airflow](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html): Full documentation for this topic (HTML).

## Sections on this page

- [Install the prerequisites](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html#install-the-prerequisites): In-page section heading.
- [Initialize a local Airflow project](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html#initialize-a-local-airflow-project): In-page section heading.
- [Load example DAGs into Airflow](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html#load-example-dags-into-airflow): In-page section heading.
- [Create a connection from Airflow to DataRobot](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html#create-a-connection-from-airflow-to-datarobot): In-page section heading.
- [Configure the DataRobot pipeline DAG](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html#configure-the-datarobot-pipeline-dag): In-page section heading.
- [Run the DataRobot pipeline DAG](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html#run-the-datarobot-pipeline-dag): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [Code-first tools](https://docs.datarobot.com/en/docs/api/code-first-tools/index.html): Linked from this page.
- [DataRobot MLOps](https://docs.datarobot.com/en/docs/api/dev-learning/python/mlops/index.html): Linked from this page.
- [API keys and tools](https://docs.datarobot.com/en/docs/platform/acct-settings/api-key-mgmt.html#api-key-management): Linked from this page.

## Documentation content

# DataRobot provider for Apache Airflow

The combined capabilities of [DataRobot MLOps](https://docs.datarobot.com/en/docs/api/dev-learning/python/mlops/index.html) and [Apache Airflow](https://airflow.apache.org/docs/) provide a reliable solution for retraining and redeploying your models. For example, you can retrain and redeploy your models on a schedule, on model performance degradation, or using a sensor that triggers the pipeline in the presence of new data. This quickstart guide on the DataRobot provider for Apache Airflow illustrates the setup and configuration process by implementing a basic [Apache Airflow DAG (Directed Acyclic Graph)](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html) to orchestrate an end-to-end DataRobot AI pipeline. This pipeline includes creating a project, training models, deploying a model, scoring predictions, and returning target and feature drift data. In addition, this guide shows you how to import [example DAG files](https://github.com/datarobot/airflow-provider-datarobot/tree/main/datarobot_provider/example_dags) from the `airflow-provider-datarobot` repository so that you can quickly implement a variety of DataRobot pipelines.

The DataRobot provider for Apache Airflow is a Python package built from [source code available in a public GitHub repository](https://github.com/datarobot/airflow-provider-datarobot) and [published in PyPI (The Python Package Index)](https://pypi.org/project/airflow-provider-datarobot/). It is also [listed in the Astronomer Registry](https://registry.astronomer.io/providers/datarobot/versions/latest). For more information on using and developing provider packages, see the [Apache Airflow documentation](https://airflow.apache.org/docs/apache-airflow-providers/index.html). The integration uses [the DataRobot Python API Client](https://pypi.org/project/datarobot/), which communicates with DataRobot instances via REST API. For more information, see [the DataRobot Python package documentation](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/).

## Install the prerequisites

The DataRobot provider for Apache Airflow requires an environment with the following dependencies installed:

- Apache Airflow >= 2.6.0, < 3.0
- DataRobot Python API Client >= 3.8.0rc1

To install the DataRobot provider, you can run the following command:

```
pip install airflow-provider-datarobot
```

Before you start the tutorial, install the [Astronomer command line interface (CLI) tool](https://github.com/astronomer/astro-cli#readme) to manage your local Airflow instance:

**MacOS:**
First, install Docker Desktop for [MacOS](https://docs.docker.com/desktop/install/mac-install/).

Then, run the following command:

```
brew install astro
```

**Linux:**
First, install Docker Desktop for [Linux](https://docs.docker.com/desktop/install/linux-install/).

Then, run the following command:

```
curl -sSL https://install.astronomer.io | sudo bash
```

**Windows:**
First, install Docker Desktop for [Windows](https://docs.docker.com/desktop/install/windows-install/).

Then, see the [Astro CLI README](https://github.com/astronomer/astro-cli#windows).


Next, install [pyenv](https://github.com/pyenv/pyenv#simple-python-version-management-pyenv) or another Python version manager.

## Initialize a local Airflow project

After you complete the installation prerequisites, create a new directory and initialize a local Airflow project in it with the [Astro CLI](https://github.com/astronomer/astro-cli#get-started):

1. Create a new directory and navigate to it: `mkdir airflow-provider-datarobot && cd airflow-provider-datarobot`
2. Run the following command within the new directory, initializing a new project with the required files: `astro dev init`
3. Navigate to the `requirements.txt` file and add the following content: `airflow-provider-datarobot`
4. Run the following command to start a local Airflow instance in a Docker container: `astro dev start` (the commands for these steps are consolidated after this list)
5. Once the installation is complete and the web server starts (after approximately one minute), you should be able to access Airflow at `http://localhost:8080/`.
6. Sign in to Airflow. The Airflow **DAGs** page appears.
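
For reference, a consolidated version of the commands above (assuming the Astro CLI and Docker are already installed):

```
# Create the project directory and initialize an Astro project
mkdir airflow-provider-datarobot && cd airflow-provider-datarobot
astro dev init

# Add the DataRobot provider to the project requirements
echo "airflow-provider-datarobot" >> requirements.txt

# Build and start the local Airflow instance in Docker
astro dev start
```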

## Load example DAGs into Airflow

The example DAGs don't appear on the DAGs page by default. To make the DataRobot provider for Apache Airflow's example DAGs available:

1. Download the DAG files from the [airflow-provider-datarobot](https://github.com/datarobot/airflow-provider-datarobot) repository.
2. Copy the `datarobot_pipeline_dag.py` Airflow DAG file (or the entire `datarobot_provider/example_dags` directory) to your project.
3. Wait a minute or two and refresh the page. The example DAGs appear on the **DAGs** page, including the `datarobot_pipeline` DAG.

## Create a connection from Airflow to DataRobot

The next step is to create a connection from Airflow to DataRobot:

1. Click **Admin > Connections** to add an Airflow connection.
2. On the **List Connection** page, click **+ Add a new record**.
3. In the **Add Connection** dialog box, configure the following fields:

    | Field | Description |
    | --- | --- |
    | Connection Id | `datarobot_default` (this name is used by default in all operators) |
    | Connection Type | DataRobot |
    | API Key | A DataRobot API token (locate or create an API key in [API keys and tools](https://docs.datarobot.com/en/docs/platform/acct-settings/api-key-mgmt.html#api-key-management)) |
    | DataRobot endpoint URL | `https://app.datarobot.com/api/v2` by default |

4. Click **Test** to establish a test connection between Airflow and DataRobot.
5. When the connection test is successful, click **Save**.

## Configure the DataRobot pipeline DAG

The [datarobot_pipeline Airflow DAG](https://github.com/datarobot/airflow-provider-datarobot/blob/main/datarobot_provider/example_dags/datarobot_pipeline_dag.py) contains operators and sensors that automate the DataRobot pipeline steps. Each operator initiates a specific job, and each sensor waits for a predetermined action to complete:

| Operator | Job |
| --- | --- |
| CreateProjectOperator | Creates a DataRobot project and returns its ID |
| TrainModelsOperator | Triggers DataRobot Autopilot to train models |
| DeployModelOperator | Deploys a specified model and returns the deployment ID |
| DeployRecommendedModelOperator | Deploys a recommended model and returns the deployment ID |
| ScorePredictionsOperator | Scores predictions against the deployment and returns a batch prediction job ID |
| AutopilotCompleteSensor | Senses if Autopilot completed |
| ScoringCompleteSensor | Senses if batch scoring completed |
| GetTargetDriftOperator | Returns the target drift from a deployment |
| GetFeatureDriftOperator | Returns the feature drift from a deployment |

> [!NOTE] Note
> This example pipeline doesn't use every available operator or sensor; for more information, see the [Operators](https://github.com/datarobot/airflow-provider-datarobot/tree/main#operators) and [Sensors](https://github.com/datarobot/airflow-provider-datarobot/tree/main#sensors) documentation in the project `README`.

Each operator in the DataRobot pipeline requires specific parameters. You define these parameters in a configuration JSON file and provide the JSON when running the DAG.

```
{
    "training_data": "local-path-to-training-data-or-s3-presigned-url-",
    "project_name": "Project created from Airflow",
    "autopilot_settings": {
        "target": "readmitted",
        "mode": "quick",
        "max_wait": 3600
    },
    "deployment_label": "Deployment created from Airflow",
    "score_settings": {}
}
```

The parameters from `autopilot_settings` are passed directly into the [Project.set_target()](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.28.0/autodoc/api_reference.html#datarobot.models.Project.set_target) method; you can set any parameter available in this method through the configuration JSON file.
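
As an illustration (not part of the original example), you could extend `autopilot_settings` with another `set_target()` parameter, such as `worker_count`:

```
"autopilot_settings": {
    "target": "readmitted",
    "mode": "quick",
    "max_wait": 3600,
    "worker_count": 20
}
```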

Values in the `training_data` and `score_settings` depend on the intake/output type. The parameters from `score_settings` are passed directly into the [BatchPredictionJob.score()](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.28.0/autodoc/api_reference.html#datarobot.models.BatchPredictionJob.score) method; you can set any parameter available in this method through the configuration JSON file.

For example, see the local file intake/output and Amazon AWS S3 intake/output JSON configuration samples below:

**Local file example:**
Define `training_data`

For local file intake, you should provide the local path to the `training_data`:

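A minimal sketch of this configuration (the `include/` file name is illustrative and assumes the readmission dataset used in the earlier example):

```
{
    "training_data": "include/Diabetes10k.csv",
    "project_name": "Project created from Airflow",
    "autopilot_settings": {
        "target": "readmitted",
        "mode": "quick",
        "max_wait": 3600
    },
    "deployment_label": "Deployment created from Airflow",
    "score_settings": {}
}
```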

Define `score_settings`

For the scoring `intake_settings` and `output_settings`, define the `type` and provide the local `path` to the intake and output data locations:

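A minimal sketch, assuming local-file intake and output via the `localFile` type (the `file`/`path` keys mirror the `BatchPredictionJob.score()` local-file settings; file names are illustrative):

```
{
    "training_data": "include/Diabetes10k.csv",
    "project_name": "Project created from Airflow",
    "autopilot_settings": {
        "target": "readmitted",
        "mode": "quick",
        "max_wait": 3600
    },
    "deployment_label": "Deployment created from Airflow",
    "score_settings": {
        "intake_settings": {
            "type": "localFile",
            "file": "include/Diabetes_scoring_data.csv"
        },
        "output_settings": {
            "type": "localFile",
            "path": "include/Diabetes_predictions.csv"
        }
    }
}
```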

> [!NOTE] Note
> When using the Astro CLI tool to run Airflow, you can place local input files in the `include/` directory. This location is accessible to the Airflow application inside the Docker container.

**Amazon AWS S3 example:**
Define `training_data`

For Amazon AWS S3 intake, you can generate a pre-signed URL for the training data file on S3:

1. In the S3 bucket, click the CSV file.
2. Click **Object Actions** at the top-right corner of the screen and click **Share with a pre-signed URL**.
3. Set the expiration time interval and click **Create presigned URL**. The URL is saved to your clipboard.
4. Paste the URL in the JSON configuration file as the `training_data` value:

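A minimal sketch (the `training_data` value is a placeholder for the pre-signed URL you copied):

```
{
    "training_data": "s3-presigned-url-to-training-data",
    "project_name": "Project created from Airflow",
    "autopilot_settings": {
        "target": "readmitted",
        "mode": "quick",
        "max_wait": 3600
    },
    "deployment_label": "Deployment created from Airflow",
    "score_settings": {}
}
```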

Define `datarobot_aws_credentials` and `score_settings`

For scoring data on Amazon AWS S3, you can add your DataRobot AWS credentials to Airflow:

1. Click **Admin > Connections** to add an Airflow connection.
2. On the **List Connection** page, click **+ Add a new record**.
3. In the **Connection Type** list, click **DataRobot AWS Credentials**.
4. Define a **Connection Id** and enter your Amazon AWS S3 credentials.
5. Click **Test** to establish a test connection between Airflow and Amazon AWS S3.
6. When the connection test is successful, click **Save**. You return to the **List Connections** page, where you should copy the **Conn Id**.

You can now add the Connection Id / Conn Id value (represented by `connection-id` in this example) to the `datarobot_aws_credentials` field when you [run the DAG](https://docs.datarobot.com/en/docs/api/code-first-tools/apache-airflow.html#run-the-datarobot-pipeline-dag).

For the scoring `intake_settings` and `output_settings`, define the `type` and provide the `url` for the AWS S3 intake and output data locations:

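A minimal sketch, assuming S3 intake and output via the `s3` type with `url` values (bucket and file names are illustrative; `connection-id` stands for the Conn Id you copied above):

```
{
    "training_data": "s3-presigned-url-to-training-data",
    "project_name": "Project created from Airflow",
    "autopilot_settings": {
        "target": "readmitted",
        "mode": "quick",
        "max_wait": 3600
    },
    "deployment_label": "Deployment created from Airflow",
    "datarobot_aws_credentials": "connection-id",
    "score_settings": {
        "intake_settings": {
            "type": "s3",
            "url": "s3://bucket-name/Diabetes_scoring_data.csv"
        },
        "output_settings": {
            "type": "s3",
            "url": "s3://bucket-name/Diabetes_predictions.csv"
        }
    }
}
```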

> [!NOTE] Note
> Because this pipeline creates a deployment, the output of the deployment creation step provides the `deployment_id` required for scoring.


## Run the DataRobot pipeline DAG

After completing the setup steps above, you can run a DataRobot provider DAG in Airflow using the configuration JSON you assembled:

1. On the Airflow **DAGs** page, locate the DAG pipeline you want to run.
2. Click the run icon for that DAG and click **Trigger DAG w/ config**.
3. On the **DAG conf parameters** page, enter the JSON configuration data required by the DAG; in this example, the JSON you assembled in the previous steps.
4. Select **Unpause DAG when triggered**, and then click **Trigger**. The DAG starts running.

> [!NOTE] Note
> While running Airflow in a Docker container (e.g., using the Astro CLI tool), the predictions file is created inside the container. To make the predictions available on the host machine, specify an output location in the `include/` directory.
