BigQuery adapter for Batch Predictions

Now available as a public beta feature, DataRobot supports BigQuery for batch predictions. JDBC is not a good option for BigQuery because it does not scale well for prediction write-back. Instead, you can use the BigQuery REST API to export data from a table into Google Cloud Storage (GCS) as an asynchronous job, score the data with the GCS adapter, and bulk update the BigQuery table with a batch loading job.

Intake options

| Parameter | Example | Description |
|-----------|---------|-------------|
| type | bigquery | Use the BigQuery API to unload data to Google Cloud Storage and use it as intake. |
| dataset | my_dataset | The BigQuery dataset to use. |
| table | my_table | The BigQuery table from the dataset to use as intake. |
| bucket | my-bucket-in-gcs | The bucket to which data should be exported. |
| credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this bucket are needed (otherwise optional). Refer to storing credentials securely. |

Output options

| Parameter | Example | Description |
|-----------|---------|-------------|
| type | bigquery | Use Google Cloud Storage for output and a batch loading job to ingest the data from GCS into a BigQuery table. |
| dataset | my_dataset | The BigQuery dataset to use. |
| table | my_table | The BigQuery table from the dataset to use for output. |
| bucket | my-bucket-in-gcs | The bucket from which data should be loaded. |
| credentialId | 5e4bc5555e6e763beb488dba | Required if explicit access credentials for this bucket are necessary (otherwise optional). Refer to storing credentials securely. |
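
The credentialId in both tables refers to GCP credentials stored in DataRobot that grant access to the GCS bucket. If you have not stored credentials yet, the sketch below shows one way to create them with the Python client; it assumes the Credential.create_gcp helper available in recent client versions, and the service account key file name is a placeholder.

import json

import datarobot as dr

# Load a GCP service account key that can read and write the GCS bucket
# (the file name is a placeholder).
with open("service_account_key.json") as f:
    gcp_key = json.load(f)

credential = dr.Credential.create_gcp(
    name="my_gcs_credential",
    gcp_key=gcp_key,
)

# Pass this value as credentialId (REST API) or credential_id (Python client).
print(credential.credential_id)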

Examples

The following example scores data from a BigQuery table and writes the results back to a BigQuery table.

import datarobot as dr

dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)

deployment_id = "..."
gcs_credential_id = "..."

# Export the intake table to the GCS bucket and score from there.
intake_settings = {
    'type': 'bigquery',
    'dataset': 'my_dataset',
    'table': 'intake_table',
    'bucket': 'my-bucket',
    'credential_id': gcs_credential_id,
}

# Load the scored data from the GCS bucket back into the output table.
output_settings = {
    'type': 'bigquery',
    'dataset': 'my_dataset',
    'table': 'output_table',
    'bucket': 'my-bucket',
    'credential_id': gcs_credential_id,
}

job = dr.BatchPredictionJob.score(
    deployment=deployment_id,
    intake_settings=intake_settings,
    output_settings=output_settings,
    include_prediction_status=True,  # add a prediction status column to the output
    passthrough_columns=["some_col_name"],  # copy this intake column into the output
)

print("started scoring...", job)
job.wait_for_completion()
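
After the job finishes, you can check how it ended before relying on the data written back to BigQuery. A minimal follow-up sketch, assuming the get_status helper exposed by BatchPredictionJob in recent versions of the Python client:

status = job.get_status()
print(status["status"])  # for example, COMPLETED or ABORTED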

Updated September 10, 2021