Use the API: Predicting fuel economy

Once you have configured your API credentials, endpoints, and environment, you can use the DataRobot API to follow this example. The example uses the Python client and the REST API (via cURL), so a basic understanding of Python 3 or cURL is required. It guides you through a simple problem: predicting miles-per-gallon fuel economy from known automobile data (e.g., vehicle weight and number of cylinders).
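If the Python client is not yet configured, you can connect it by passing credentials directly. The sketch below is illustrative: the endpoint shown applies to app2.datarobot.com users, so substitute your own endpoint and API token.

import datarobot as dr

# Connect explicitly; calling dr.Client() with no arguments instead reads
# credentials from a configuration file or environment variables.
dr.Client(
    endpoint='https://app2.datarobot.com/api/v2',
    token='<YOUR_API_TOKEN>',  # placeholder: replace with your API key
)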

The following sections provide sample code, for Python and cURL, that will:

  1. Upload a dataset.
  2. Train a model to learn from the dataset.
  3. Test prediction outcomes on the model with new data.
  4. Deploy the model.
  5. Predict outcomes on the deployed model using new data.

Upload a dataset

The first step to create a project is uploading a dataset. For this example, upload the file auto-mpg.csv that you downloaded from GitHub.

import datarobot as dr
dr.Client()

# Set to the location of your auto-mpg.csv and auto-mpg-test.csv data files
# Example: dataset_file_path = '/Users/myuser/Downloads/auto-mpg.csv'
training_dataset_file_path = ''
test_dataset_file_path = ''

# Load dataset
training_dataset = dr.Dataset.create_from_file(training_dataset_file_path)

# Create a new project based on dataset
project = dr.Project.create_from_dataset(training_dataset.id, project_name='Auto MPG DR-Client')
# The same upload via cURL. Assumes DATAROBOT_API_TOKEN, DATAROBOT_ENDPOINT,
# and DATASET_FILE_PATH are set in your environment.
DATAROBOT_API_TOKEN=${DATAROBOT_API_TOKEN}
DATAROBOT_ENDPOINT=${DATAROBOT_ENDPOINT}
# The response's Location header (matched case-insensitively below)
# points at the new project.
location=$(curl -Lsi \
  -X POST \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -F 'projectName="Auto MPG"' \
  -F "file=@${DATASET_FILE_PATH}" \
  "${DATAROBOT_ENDPOINT}/projects/" | grep -i 'location: .*$' | \
  cut -d " " -f2 | tr -d '\r')
echo "Uploaded dataset. Checking status of project at: ${location}"
while true; do
  project_id=$(curl -Ls \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${location}" \
    | grep -Eo 'id":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${project_id}" = "" ]
  then
    echo "Setting up project..."
    sleep 10
  else
    echo "Project setup complete."
    echo "Project ID: ${project_id}"
    break
  fi
done

Train models

Now that DataRobot has data, it can use the data to train and build models with Autopilot. Autopilot is DataRobot's "survival of the fittest" modeling mode that automatically selects the best predictive models for the specified target feature and runs them at increasing sample sizes. The outcome of Autopilot is not only a selection of best-suited models, but also the identification of a recommended model: the model that best understands how to predict the target feature "mpg." Choosing the best model is a balance of accuracy, metric performance, and model simplicity. You can read more about the model recommendation process in the platform documentation.

# Set the prediction target and start Autopilot in quick mode, then
# use the training data to build models.
project.set_target(target='mpg', mode=dr.AUTOPILOT_MODE.QUICK)
project.wait_for_autopilot()
model = dr.ModelRecommendation.get(project.id).get_model()
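The recommendation is a convenient default, but you can also inspect the leaderboard yourself. A minimal sketch, assuming the client's standard behavior that Project.get_models() returns models in leaderboard order:

# Print the top five leaderboard models with their validation scores.
for leaderboard_model in project.get_models()[:5]:
    print(leaderboard_model.model_type,
          leaderboard_model.metrics[project.metric]['validation'])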
# The same training flow via cURL:
response=$(curl -Lsi \
  -X PATCH \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"target": "mpg", "mode": "quick"}' \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/aim" | grep -i 'location: .*$' \
  | cut -d " " -f2 | tr -d '\r')
echo "AI training initiated. Checking status of training at: ${response}"
while true; do
  initial_project_status=$(curl -Ls \
  -X GET \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${response}" \
  | grep -Eo 'stage":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${initial_project_status}" = "" ]
  then
    echo "Setting up AI training..."
    sleep 10
  else
    echo "Training AI."
    echo "Grab a coffee or catch up on email."
    break
  fi
done

while true; do
  project_status=$(curl -Lsi \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
    "${DATAROBOT_ENDPOINT}/projects/${project_id}/status" \
    | grep -Eo 'autopilotDone":\strue'
    )
  if [ "${project_status}" = "" ]
  then
    echo "Autopilot training in progress..."
    sleep 60
  else
    echo "Autopilot training complete. Model ready to deploy."
    break
  fi
done

Make predictions against the model

After building models and identifying the top performers, you can further test a model by making predictions on new data. Typically, you would test predictions with a smaller dataset to ensure the model is behaving as expected before deploying the model to production. DataRobot offers several methods for making predictions on new data. You can read more about prediction methods in the platform documentation.

This code makes predictions with the recommended model, using the test dataset whose path you set in the first step (test_dataset_file_path).

# Test predictions on new data
prediction_data = project.upload_dataset(test_dataset_file_path)
predict_job = model.request_predictions(prediction_data.id)
predictions = predict_job.get_result_when_complete()
predictions.head()
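The result returned by get_result_when_complete() is a pandas DataFrame, so you can keep it around for later review; a small sketch (the output filename is arbitrary):

# Persist the prediction results; 'mpg_predictions.csv' is an example path.
predictions.to_csv('mpg_predictions.csv', index=False)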

The following cURL commands make the same predictions with the recommended model; they assume TEST_DATASET_FILE_PATH points to the test dataset.

# Test predictions on new data
# shellcheck disable=SC2089
prediction_location=$(curl -Lsi \
  -X POST \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -F "file=@${TEST_DATASET_FILE_PATH}" \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/predictionDatasets/fileUploads/" \
  | grep -i 'location: .*$' | cut -d " " -f2 | tr -d '\r')
echo "Uploaded prediction dataset. Checking status of upload at: ${prediction_location}"
while true; do
  prediction_dataset_id=$(curl -Ls \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${prediction_location}" \
    | grep -Eo 'id":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${prediction_dataset_id}" = "" ]
  then
    echo "Uploading predictions..."
    sleep 10
  else
    echo "Predictions upload complete."
    echo "Predictions dataset ID: ${prediction_dataset_id}"
    break
  fi
done
# Retrieve the ID of the recommended model (also used later when deploying).
recommended_model_id=$(curl -s \
  -X GET \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/recommendedModels/recommendedModel/" \
  | grep -Eo 'modelId":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
prediction_request_data="{\
    \"modelId\":\"${recommended_model_id}\",\
    \"datasetId\":\"${prediction_dataset_id}\"\
}"
predict_job=$(curl -Lsi \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  --data "${prediction_request_data}" \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/predictions/" \
  | grep -i 'location: .*$' | cut -d " " -f2 | tr -d '\r')
while true; do
  initial_job_status=$(curl -Ls \
    -X GET \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${predict_job}" \
    | grep -Eo 'status":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
  if [ "${initial_job_status}" = "inprogress" ]
  then
    echo "Generating predictions..."
    sleep 10
  else
    echo "Predictions complete."
    break
  fi
done

# Fetch the completed predictions.
curl -Ls \
  -X GET \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${predict_job}"

Deploy the model

Deployment is the method by which you integrate a machine learning model into an existing production environment to make predictions with live data and generate insights. See the machine learning model deployment overview for more information.

# Deploy the model to a prediction server
prediction_server = dr.PredictionServer.list()[0]
deployment = dr.Deployment.create_from_learning_model(
  model_id=model.id, label="MPG Prediction Server",
  description="Deployed with DataRobot client",
  default_prediction_server_id=prediction_server.id
)

# View deployment stats
service_stats = deployment.get_service_stats()
print(service_stats.metrics)
# The equivalent deployment via cURL. First, get the recommended model's ID:
recommended_model_id=$(curl -s \
  -X GET \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  "${DATAROBOT_ENDPOINT}/projects/${project_id}/recommendedModels/recommendedModel/" \
  | grep -Eo 'modelId":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
server_data=$(curl -s -X GET \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  "${DATAROBOT_ENDPOINT}/predictionServers/")
default_server_id=$(echo "$server_data" \
  | grep -Eo 'id":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
server_url=$(echo "$server_data" | grep -Eo 'url":\s"[^"]*' \
  | cut -d '"' -f3 | tr -d '\r')
server_key=$(echo "$server_data" | grep -Eo 'datarobot-key":\s"[^"]*' \
  | cut -d '"' -f3 | tr -d '\r')
request_data="{\
    \"defaultPredictionServerId\":\"${default_server_id}\",\
    \"modelId\":\"${recommended_model_id}\",\
    \"description\":\"Deployed with cURL\",\
    \"label\":\"MPG Prediction Server\"\
}"
deployment_response=$(curl -Lsi -X POST \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
--data "${request_data}" \
"${DATAROBOT_ENDPOINT}/deployments/fromLearningModel/")
deploy_response_code_202=$(echo $deployment_response | grep -Eo 'HTTP/2 202')
if [ "${deploy_response_code_202}" = "" ]
then
  deployment_id=$(echo "$deployment_response" | grep -Eo 'id":\s"\w+' \
  | cut -d '"' -f3 | tr -d '\r')
  echo "Prediction server ready."
else
  deployment_status=$(echo "$deployment_response" | grep -iEo 'location: .*$' \
  | cut -d " " -f2 | tr -d '\r')
  while true; do
    deployment_ready=$(curl -Ls \
      -X GET \
      -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "${deployment_status}" \
      | grep -Eo 'id":\s"\w+' | cut -d '"' -f3 | tr -d '\r')
    if [ "${deployment_ready}" = "" ]
    then
      echo "Waiting for deployment..."
      sleep 10
    else
      deployment_id=$deployment_ready
      echo "Prediction server ready."
      break
    fi
  done
fi

Make predictions against the deployed model

When you have successfully deployed a model, you can use the DataRobot Prediction API to make predictions. This allows you to access advanced model management features such as data drift, accuracy, and service health statistics.
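Because requests are routed through the deployment, DataRobot updates the deployment's monitoring statistics as predictions are served. After making the predictions below, you can re-check the service stats from the previous step to see this; a minimal sketch, assuming totalPredictions is among the reported metric keys:

# Service health metrics update as prediction requests are served.
# 'totalPredictions' is assumed here to be one of the reported keys.
service_stats = deployment.get_service_stats()
print(service_stats.metrics.get('totalPredictions'))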

import requests
from pprint import pprint
import json
import os
# JSON records for example autos for which to predict mpg
autos = [
    {
        "cylinders": 4,
        "displacement": 119.0,
        "horsepower": 82.00,
        "weight": 2720.0,
        "acceleration": 19.4,
        "model year": 82,
        "origin": 1,
    },
    {
        "cylinders": 8,
        "displacement": 120.0,
        "horsepower": 79.00,
        "weight": 2625.0,
        "acceleration": 18.6,
        "model year": 82,
        "origin": 1,
    },
]
# Create REST request for prediction API
prediction_server = deployment.default_prediction_server
prediction_headers = {
    "Authorization": "Bearer {}".format(os.getenv("DATAROBOT_API_TOKEN")),
    "Content-Type": "application/json",
    "datarobot-key": prediction_server['datarobot-key']
}

predictions = requests.post(
    f"{prediction_server['url']}/predApi/v1.0/deployments"
    f"/{deployment.id}/predictions",
    headers=prediction_headers,
    data=json.dumps(autos),
)
pprint(predictions.json())
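The response body is JSON. Assuming the v1.0 Prediction API response shape, with a top-level data list holding one entry per scored row, you could pull out just the predicted mpg values (a sketch, not part of the official example):

# Extract one predicted mpg value per input row from the response JSON.
predicted_mpg = [row['prediction'] for row in predictions.json()['data']]
print(predicted_mpg)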
# The same request via cURL:
autos='[{
  "cylinders": 4,
  "displacement": 119.0,
  "horsepower": 82.00,
  "weight": 2720.0,
  "acceleration": 19.4,
  "model year": 82,
  "origin":1
},{
  "cylinders": 8,
  "displacement": 120.0,
  "horsepower": 79.00,
  "weight": 2625.0,
  "acceleration": 18.6,
  "model year": 82,
  "origin":1
}]'
curl -X POST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "datarobot-key: ${server_key}" \
  --data "${autos}" \
  "${server_url}/predApi/v1.0/deployments/${deployment_id}/predictions"


Updated October 1, 2021