The DataRobot API provides a programmatic alternative to the web interface for creating and managing DataRobot projects. The API can be used via REST or with DataRobot's Python or R clients in Windows, UNIX, and macOS environments. This guide walks you through setting up your environment and then presents a sample problem that outlines an end-to-end workflow for the API.
Note
The API quickstart guide uses methods from 3.x versions of DataRobot's Python client. If you are a Self-Managed AI Platform user, consult the API resources table to verify which versions of DataRobot's clients are supported for your version of the DataRobot application.
Before proceeding, access and install the DataRobot client package for Python or R. DataRobot also recommends browsing the API Reference documentation to familiarize yourself with the code-first resources available to you.
The following prerequisites are for 3.x versions of DataRobot's Python client:
Self-Managed AI Platform users may want to install a previous version of the client in order to match their installed version of the DataRobot application. Reference the available versions to map your installation to the correct version of the API client.
pip install datarobot
(Optional) If you would like to build custom blueprints programmatically, install two additional packages: graphviz and blueprint-workshop.
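For example, assuming the blueprint workshop is distributed on PyPI as datarobot-bp-workshop (graphviz may also require a system-level install; see the graphviz documentation for your platform):

pip install graphviz
pip install datarobot-bp-workshop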
Using DataRobot APIs, you will execute a complete modeling workflow, from uploading a dataset to making predictions on a model deployed in a production environment.
DataRobot provides several deployment options to meet your business requirements. Each deployment type has its own set of endpoints. Choose from the options below:
The AI Platform (US) offering is primarily accessed by US and Japanese users. It can be accessed at https://app.datarobot.com.
API endpoint root: https://app.datarobot.com/api/v2
The AI Platform (EU) offering is primarily accessed by EMEA users. It can be accessed at https://app.eu.datarobot.com.
API endpoint root: https://app.eu.datarobot.com/api/v2
For Self-Managed AI Platform users, the API root will be the same as your DataRobot UI root. Replace {datarobot.example.com} with your deployment endpoint.
API endpoint root: https://{datarobot.example.com}/api/v2
To authenticate with DataRobot's API, your code needs to have access to an endpoint and token from the previous steps. This can be done in three ways:
drconfig.yaml is a file that the DataRobot Python and R clients automatically look for; it is DataRobot's recommended authentication method. You can instruct the API clients to look for the file in a specific location (~/.config/datarobot/drconfig.yaml by default) or under a unique name, which lets you maintain multiple config files. The example below demonstrates the format of the .yaml:
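For example, with placeholder values (substitute your own API token and endpoint root):

token: <your-api-token>
endpoint: https://app.datarobot.com/api/v2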
For Windows:
For Windows users, open Command Prompt or PowerShell and set the environment variables shown in the example below. Once set, close and reopen the Command Prompt or PowerShell for the changes to take effect.
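A minimal sketch, assuming the AI Platform (US) endpoint; substitute your own endpoint root and API token:

set DATAROBOT_ENDPOINT=https://app.datarobot.com/api/v2
set DATAROBOT_API_TOKEN=<your-api-token>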
To configure persisting environment variables on Windows, search for "Environment Variables" in the Start menu and select Edit the system environment variables.
Then, click Environment Variables and, under System variables, click New to add the variables shown above.
For macOS and Linux:
For macOS and Linux users, open a terminal window and set the following environment variables:
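A minimal sketch, again assuming the AI Platform (US) endpoint; substitute your own values:

export DATAROBOT_ENDPOINT=https://app.datarobot.com/api/v2
export DATAROBOT_API_TOKEN=<your-api-token>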
To configure persisting environment variables on macOS or Linux, edit the shell configuration file (~/.bash_profile, ~/.bashrc, or ~/.zshrc) and add the environment variables shown above. Then, save the file and restart your terminal or run source ~/.bash_profile (substituting the file you edited).
Once the environment variables are set, authenticate to connect to DataRobot.
For Python:
import datarobot as dr
dr.Project.list()
For R:
library(datarobot)
For cURL:
curl --location -X GET "${DATAROBOT_ENDPOINT}/projects" --header "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
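Alternatively, you can pass the endpoint and token directly when instantiating the Python client. A minimal sketch with placeholder values; prefer the config file or environment variables to keep secrets out of source code:

import datarobot as dr

# Connect by passing credentials inline (placeholder values shown)
dr.Client(endpoint='https://app.datarobot.com/api/v2', token='<your-api-token>')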
Be careful never to commit your credentials to Git.
Once you have configured your API credentials, endpoints, and environment, you can use the DataRobot API to follow this example. The example uses the Python client and the REST API (using cURL), so a basic understanding of Python 3 or cURL is required. It guides you through a simple problem: predicting the miles-per-gallon fuel economy from known automobile data (vehicle weight, number of cylinders, etc.). For additional code examples, reference DataRobot's AI accelerators.
Note
Python client users should note that the following workflow uses methods introduced in version 3.0 of the client. Ensure that your client is up-to-date before executing the code included in this example.
The following sections provide sample code for Python, R, and cURL that will:
Upload a dataset.
Train a model to learn from the dataset.
Test prediction outcomes on the model with new data.
Deploy the model.
Predict outcomes on the deployed model using new data.
The first step to create a project is uploading a dataset. This example uses the dataset auto-mpg.csv, which you can download here.
import datarobot as dr

dr.Client()

# Set to the location of your auto-mpg.csv and auto-mpg-test.csv data files
# Example: dataset_file_path = '/Users/myuser/Downloads/auto-mpg.csv'
training_dataset_file_path = ''
test_dataset_file_path = ''

# Load dataset
training_dataset = dr.Dataset.create_from_file(training_dataset_file_path)

# Create a new project based on dataset
project = dr.Project.create_from_dataset(training_dataset.id, project_name='Auto MPG DR-Client')
# Set to the location of your auto-mpg.csv and auto-mpg-test.csv data files
# Example: dataset_file_path = '/Users/myuser/Downloads/auto-mpg.csv'
training_dataset_file_path <- ""
test_dataset_file_path <- ""
training_dataset <- utils::read.csv(training_dataset_file_path)
test_dataset <- utils::read.csv(test_dataset_file_path)
head(training_dataset)
project <- SetupProject(dataSource = training_dataset, projectName = "Auto MPG DR-Client", maxWait = 60 * 60)
DATAROBOT_API_TOKEN=${DATAROBOT_API_TOKEN}
DATAROBOT_ENDPOINT=${DATAROBOT_ENDPOINT}

location=$(curl -Lsi \
  -X POST \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -F 'projectName="Auto MPG"' \
  -F "file=@${DATASET_FILE_PATH}" \
  "${DATAROBOT_ENDPOINT}"/projects/ | grep -i 'Location: .*$' | \
  cut -d " " -f2 | tr -d '\r')

echo "Uploaded dataset. Checking status of project at: ${location}"

while true; do
  project_id=$(curl -Ls \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${location}" \
    | grep -Eo 'id":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${project_id}" = "" ]
  then
    echo "Setting up project..."
    sleep 10
  else
    echo "Project setup complete."
    echo "Project ID: ${project_id}"
    break
  fi
done
Now that DataRobot has data, it can use the data to train and build models with Autopilot. Autopilot is DataRobot's "survival of the fittest" modeling mode that automatically selects the best predictive models for the specified target feature and runs them at increasing sample sizes. The outcome of Autopilot is not only a selection of best-suited models, but also identification of a recommended model—the model that best understands how to predict the target feature "mpg." Choosing the best model is a balance of accuracy, metric performance, and model simplicity. You can read more about the model recommendation process in the UI documentation.
# Use training data to build models
from datarobot import AUTOPILOT_MODE

# Set the project's target and initiate Autopilot (runs in Quick mode unless a different mode is specified)
project.analyze_and_model(target='mpg', worker_count=-1, mode=AUTOPILOT_MODE.QUICK)

# Open the project's Leaderboard to monitor the progress in UI.
project.open_in_browser()

# Wait for the model creation to finish
project.wait_for_autopilot()

model = project.get_top_model()
# Set the project target and initiate Autopilot
SetTarget(project,
target = "mpg")
# Block execution until Autopilot is complete
WaitForAutopilot(project)
model <- GetRecommendedModel(project, type = RecommendedModelType$RecommendedForDeployment)
response=$(curl -Lsi \
  -X PATCH \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"target": "mpg", "mode": "quick"}' \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/aim" \
  | grep 'location: .*$' | cut -d " " -f2 | tr -d '\r')

echo "AI training initiated. Checking status of training at: ${response}"

while true; do
  initial_project_status=$(curl -Ls \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${response}" \
    | grep -Eo 'stage":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${initial_project_status}" = "" ]
  then
    echo "Setting up AI training..."
    sleep 10
  else
    echo "Training AI."
    echo "Grab a coffee or catch up on email."
    break
  fi
done

while true; do
  project_status=$(curl -Lsi \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
    "${DATAROBOT_ENDPOINT}/projects/${project_id}/status" \
    | grep -Eo 'autopilotDone":\strue')
  if [ "${project_status}" = "" ]
  then
    echo "Autopilot training in progress..."
    sleep 60
  else
    echo "Autopilot training complete. Model ready to deploy."
    break
  fi
done
After building models and identifying the top performers, you can further test a model by making predictions on new data. Typically, you would test predictions with a smaller dataset to ensure the model is behaving as expected before deploying the model to production. DataRobot offers several methods for making predictions on new data. You can read more about prediction methods in the UI documentation.
This code makes predictions on the recommended model using the test set you identified in the first step (test_dataset_file_path), when you uploaded data.
# Test predictions on new data
predict_job = model.request_predictions(test_dataset_file_path)
predictions = predict_job.get_result_when_complete()
predictions.head()
For cURL:
# Test predictions on new data
# shellcheck disable=SC2089
prediction_location=$(curl -Lsi \
  -X POST \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -F "file=@${TEST_DATASET_FILE_PATH}" \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/predictionDatasets/fileUploads/" \
  | grep -i 'location: .*$' | cut -d " " -f2 | tr -d '\r')

echo "Uploaded prediction dataset. Checking status of upload at: ${prediction_location}"

while true; do
  prediction_dataset_id=$(curl -Ls \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${prediction_location}" \
    | grep -Eo 'id":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${prediction_dataset_id}" = "" ]
  then
    echo "Uploading predictions..."
    sleep 10
  else
    echo "Predictions upload complete."
    echo "Predictions dataset ID: ${prediction_dataset_id}"
    break
  fi
done

prediction_request_data="{\
  \"modelId\": \"${recommended_model_id}\",\
  \"datasetId\": \"${prediction_dataset_id}\"\
}"

predict_job=$(curl -Lsi \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  --data "${prediction_request_data}" \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/predictions/" \
  | grep -i 'location: .*$' | cut -d " " -f2 | tr -d '\r')

while true; do
  initial_job_status=$(curl -Ls \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${predict_job}" \
    | grep -Eo 'status":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${initial_job_status}" = "inprogress" ]
  then
    echo "Generating predictions..."
    sleep 10
  else
    echo "Predictions complete."
    break
  fi
done
curl -Ls \
  -X GET \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${predict_job}"
Deployment is the method by which you integrate a machine learning model into an existing production environment to make predictions with live data and generate insights. See the machine learning model deployment overview for more information.
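The prediction examples below assume a deployment object created with the Python client. A minimal sketch, assuming the recommended model from the previous step and an available prediction server (the label and description are illustrative):

import datarobot as dr

# Retrieve an available prediction server
prediction_server = dr.PredictionServer.list()[0]

# Deploy the top model to the prediction server
deployment = dr.Deployment.create_from_learning_model(
    model_id=model.id,
    label='Auto MPG Deployment',
    description='Deployed with the DataRobot Python client',
    default_prediction_server_id=prediction_server.id,
)
deployment.id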
When you have successfully deployed a model, you can use the DataRobot Prediction API to make predictions. This allows you to access advanced model management features such as data drift, accuracy, and service health statistics.
You can also reference a Python prediction snippet from the UI. Navigate to the Deployments page, select your deployment, and go to Predictions > Prediction API to reference the snippet for making predictions.
import requests
from pprint import pprint
import json
import os

# JSON records for example autos for which to predict mpg
autos = [
    {
        "cylinders": 4,
        "displacement": 119.0,
        "horsepower": 82.00,
        "weight": 2720.0,
        "acceleration": 19.4,
        "model year": 82,
        "origin": 1,
    },
    {
        "cylinders": 8,
        "displacement": 120.0,
        "horsepower": 79.00,
        "weight": 2625.0,
        "acceleration": 18.6,
        "model year": 82,
        "origin": 1,
    },
]

# Create REST request for prediction API
prediction_server = deployment.default_prediction_server
prediction_headers = {
    "Authorization": "Bearer {}".format(os.getenv("DATAROBOT_API_TOKEN")),
    "Content-Type": "application/json",
    "datarobot-key": prediction_server['datarobot-key'],
}

predictions = requests.post(
    f"{prediction_server['url']}/predApi/v1.0/deployments"
    f"/{deployment.id}/predictions",
    headers=prediction_headers,
    data=json.dumps(autos),
)

pprint(predictions.json())
# Prepare to connect to the prediction server
URL <- paste0(deployment$defaultPredictionServer$url,
"/predApi/v1.0/deployments/",
deployment$id,
"/predictions")
USERNAME <- "deployment$owners$preview$email" # This should be your DR email account
API_TOKEN <- Sys.getenv("DATAROBOT_API_TOKEN") # This is configured implicitly when you first run `library(datarobot)`
# Invoke Predictions API with the test_dataset
response <- httr::POST(URL,
body = jsonlite::toJSON(test_dataset),
httr::add_headers("datarobot-key" = deployment$defaultPredictionServer$dataRobotKey),
httr::content_type_json(),
httr::authenticate(USERNAME, API_TOKEN, "basic"))
# Parse the results from the prediction server
predictionResults <- jsonlite::fromJSON(httr::content(response, as = "text"),
simplifyDataFrame = TRUE,
flatten = TRUE)$data
print(predictionResults)
After getting started with DataRobot's APIs, navigate to the user guide for overviews, Jupyter notebooks, and task-based tutorials that help you find complete examples of common data science and machine learning workflows. You can also read the reference documentation available for DataRobot's programmatic tools.
Note
Log in to GitHub before accessing these GitHub resources.