Run Batch Prediction jobs from Azure Blob Storage
The DataRobot Batch Prediction API allows users to take in large datasets and score them against deployed models running on a Prediction Server. The API also provides flexible options for file intake and output.
This page shows how you can set up a Batch Prediction job—using the DataRobot Python Client package to call the Batch Prediction API—that will score files from Azure Blob storage and write the results back to Azure Blob storage. This method also works for Azure Data Lake Storage Gen2 accounts because the underlying storage is the same.
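The snippets on this page assume the DataRobot Python client is installed and connected. A minimal connection sketch is shown below; the API token is a placeholder and the endpoint shown is the managed cloud default, so substitute your own values as needed:

import datarobot as dr

# Connect the Python client to DataRobot.
# Replace the token with your own API key; the endpoint below is the
# managed-cloud default and may differ for self-managed installations.
dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="YOUR API TOKEN",
)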
The Batch Prediction job requires credentials to read and write to Azure Blob storage, including the name of the Azure storage account and an access key.
To obtain these credentials:
In the Azure portal for the Azure Blob Storage account, click Access keys.
Click Show keys to reveal the values of your access keys. You can use either of the keys shown (key1 or key2).
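As an optional sanity check before storing the key in DataRobot, you can confirm that the account name and access key work together, for example with the azure-storage-blob package (an assumption here, not something the Batch Prediction API requires), by listing the containers in the account:

from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_ACCOUNT = "YOUR AZURE STORAGE ACCOUNT NAME"
AZURE_STORAGE_ACCESS_KEY = "AZURE STORAGE ACCOUNT ACCESS KEY"

# List the containers in the storage account to confirm the key is valid.
service = BlobServiceClient(
    account_url=f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net",
    credential=AZURE_STORAGE_ACCESS_KEY,
)
for container in service.list_containers():
    print(container.name)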
Use the following code to create a new credential object within DataRobot; the Batch Prediction job uses this credential to connect to your Azure storage account.
AZURE_STORAGE_ACCOUNT="YOUR AZURE STORAGE ACCOUNT NAME"AZURE_STORAGE_ACCESS_KEY="AZURE STORAGE ACCOUNT ACCESS KEY"DR_CREDENTIAL_NAME=f"Azure_{AZURE_STORAGE_ACCOUNT}"# Create an Azure-specific Credential# The connection string is also found below the access key in Azure if you want to copy that directly.credential=dr.Credential.create_azure(name=DR_CREDENTIAL_NAME,azure_connection_string=f"DefaultEndpointsProtocol=https;AccountName={AZURE_STORAGE_ACCOUNT};AccountKey={AZURE_STORAGE_ACCESS_KEY};")# Use this code to look up the ID of the credential object created.credential_id=Noneforcredindr.Credential.list():ifcred.name==DR_CREDENTIAL_NAME:credential_id=cred.credential_idbreakprint(credential_id)
After creating a credential object, you can set up the Batch Prediction job.
Set the intake settings and output settings to the azure type. In both, provide the URL of the file in Blob storage that you want to read from or write to (the output file does not need to exist already), along with the ID of the credential object that you previously set up.
The code below creates and runs the Batch Prediction job and, when it finishes, provides the status of the job.
This code also demonstrates how to configure the job to return both Prediction Explanations and passthrough columns for the scoring data.
DEPLOYMENT_ID = 'YOUR DEPLOYMENT ID'
AZURE_STORAGE_ACCOUNT = "YOUR AZURE STORAGE ACCOUNT NAME"
AZURE_STORAGE_CONTAINER = "YOUR AZURE STORAGE ACCOUNT CONTAINER"
AZURE_INPUT_SCORING_FILE = "YOUR INPUT SCORING FILE NAME"
AZURE_OUTPUT_RESULTS_FILE = "YOUR OUTPUT RESULTS FILE NAME"

# Set up your Batch Prediction job.
# Input: Azure Blob Storage
# Output: Azure Blob Storage
job = dr.BatchPredictionJob.score(
    deployment=DEPLOYMENT_ID,
    intake_settings={
        'type': 'azure',
        'url': f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net/{AZURE_STORAGE_CONTAINER}/{AZURE_INPUT_SCORING_FILE}",
        'credential_id': credential_id,
    },
    output_settings={
        'type': 'azure',
        'url': f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net/{AZURE_STORAGE_CONTAINER}/{AZURE_OUTPUT_RESULTS_FILE}",
        'credential_id': credential_id,
    },
    # Remove this argument if Prediction Explanations are not required.
    max_explanations=5,
    # Remove this argument if passthrough columns are not required.
    passthrough_columns=['column1', 'column2'],
)

job.wait_for_completion()
job.get_status()
When the job is complete, the output file appears in your Blob storage container. You now have a Batch Prediction job that reads from and writes to Azure Blob Storage via the DataRobot Python client package and the Batch Prediction API.
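If you want to inspect the scored results without leaving Python, one option is to read the output file back from Blob storage, for example with the azure-storage-blob and pandas packages (both assumptions here, not part of the DataRobot client). This sketch reuses the variable names from the snippets above and assumes the results were written as CSV, the default output format for batch predictions:

import io

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Download the scored results from Blob storage and load them into a DataFrame.
service = BlobServiceClient(
    account_url=f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net",
    credential=AZURE_STORAGE_ACCESS_KEY,
)
blob = service.get_blob_client(
    container=AZURE_STORAGE_CONTAINER,
    blob=AZURE_OUTPUT_RESULTS_FILE,
)
results = pd.read_csv(io.BytesIO(blob.download_blob().readall()))
print(results.head())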