
Batch prediction use cases¶

The following examples show end-to-end scoring with API code, covering both CSV files and external services.

End-to-end scoring of CSV files from local files
End-to-end scoring of CSV files on S3
AI Catalog to CSV file scoring
End-to-end scoring from a JDBC PostgreSQL database
End-to-end scoring with Snowflake
End-to-end scoring with Synapse
End-to-end scoring with BigQuery

Note

These use cases require the DataRobot API client to be installed.
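Every example on this page passes a literal endpoint and token to dr.Client. As a minimal sketch of keeping the token out of source files, the helper below builds the same two arguments from environment variables. The helper name client_settings and the variable names DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN are assumptions made for this illustration; adjust them to your own setup.

```python
import os

# Default endpoint taken from the examples on this page.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def client_settings(env=None):
    """Build keyword arguments for dr.Client from environment variables.

    The variable names used here are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN")
    if not token:
        raise RuntimeError("Set DATAROBOT_API_TOKEN before running the examples")
    return {
        "endpoint": env.get("DATAROBOT_ENDPOINT", DEFAULT_ENDPOINT),
        "token": token,
    }


# Usage (requires the datarobot package):
#   import datarobot as dr
#   dr.Client(**client_settings())
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.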

End-to-end scoring of CSV files from local files¶

The following example scores a local CSV file, waits for processing to start, and then initializes the download.


import datarobot as dr

dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)

deployment_id = "..."

input_file = "to_predict.csv"
output_file = "predicted.csv"

job = dr.BatchPredictionJob.score_to_file(
    deployment_id,
    input_file,
    output_file,
    passthrough_columns_set="all"
)

print("started scoring...", job)
job.wait_for_completion()
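Once the job above has completed, it can be useful to glance at the output file before handing it to downstream code. The following is a minimal sketch using only the Python standard library; the helper name preview_predictions is hypothetical, and the sketch assumes the output is UTF-8 with a header row. The columns you see depend on your deployment and on the passthrough settings.

```python
import csv
from itertools import islice


def preview_predictions(path, limit=5):
    """Return the column names and the first `limit` rows of a scored CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, limit))
        return reader.fieldnames, rows


# Usage, after job.wait_for_completion() has returned:
#   columns, rows = preview_predictions("predicted.csv")
#   print(columns)
#   for row in rows:
#       print(row)
```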

Prediction Explanations¶

You can include Prediction Explanations by adding the desired Prediction Explanation parameters to the job configuration:


job = dr.BatchPredictionJob.score_to_file(
    deployment_id,
    input_file,
    output_file,
    max_explanations=10,
    threshold_high=0.5,
    threshold_low=0.15,
)
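To reason about how the two thresholds in the call above interact, the sketch below models which rows would receive explanations. It assumes that explanations are computed only for predictions above threshold_high or below threshold_low, and that every row is eligible when neither is set; the function name is_explained is hypothetical and the exact boundary behavior is not stated on this page, so treat this as an illustration rather than the service's implementation.

```python
def is_explained(prediction, threshold_high=None, threshold_low=None):
    """Assumed rule: explain rows above the high or below the low threshold."""
    if threshold_high is None and threshold_low is None:
        return True  # no thresholds set: every row is eligible
    above = threshold_high is not None and prediction > threshold_high
    below = threshold_low is not None and prediction < threshold_low
    return above or below


# With the values from the example above, mid-range predictions are skipped.
sample = [0.05, 0.30, 0.90]
explained = [p for p in sample if is_explained(p, threshold_high=0.5, threshold_low=0.15)]
print(explained)
```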

Custom CSV format¶

If your CSV file does not use the default CSV format, you can change the expected format by setting csv_settings (csvSettings in the REST API):


job = dr.BatchPredictionJob.score_to_file(
    deployment_id,
    input_file,
    output_file,
    csv_settings={
        'delimiter': ';',
        'quotechar': '\'',
        'encoding': 'ms_kanji',
    },
)
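To make the settings above concrete, the following sketch writes a small file in exactly that dialect using the Python standard library, so you can see what an input file matching these csv_settings looks like. The helper name write_custom_csv and the sample columns are hypothetical.

```python
import csv

csv_settings = {
    "delimiter": ";",
    "quotechar": "'",
    "encoding": "ms_kanji",  # Python alias for the cp932 codec
}


def write_custom_csv(path, header, rows, settings=csv_settings):
    """Write rows with the delimiter, quote character, and encoding from settings."""
    with open(path, "w", newline="", encoding=settings["encoding"]) as handle:
        writer = csv.writer(
            handle,
            delimiter=settings["delimiter"],
            quotechar=settings["quotechar"],
        )
        writer.writerow(header)
        writer.writerows(rows)


# Usage: a field containing the delimiter is wrapped in the quote character.
#   write_custom_csv("to_predict.csv", ["id", "comment"], [[1, "東京;大阪"]])
```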

End-to-end scoring of CSV files on S3¶


import datarobot as dr

dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)

deployment_id = "616d01a8ddbd17fc2c75caf4"
credential_id = "..."

s3_csv_input_file = 's3://my-bucket/data/to_predict.csv'
s3_csv_output_file = 's3://my-bucket/data/predicted.csv'

job = dr.BatchPredictionJob.score_s3(
    deployment_id,
    source_url=s3_csv_input_file,
    destination_url=s3_csv_output_file,
    credential=credential_id
)

print("started scoring...", job)
job.wait_for_completion()

The same functionality is available for score_azure and score_gcp. Instead of the credential ID, you can also pass the credential object itself:


credentials = dr.Credential.get(credential_id)

job = dr.BatchPredictionJob.score_s3(
    deployment_id,
    source_url=s3_csv_input_file,
    destination_url=s3_csv_output_file,
    credential=credentials,
)

Prediction Explanations¶

You can include Prediction Explanations by adding the desired Prediction Explanation parameters to the job configuration:


job = dr.BatchPredictionJob.score_s3(
    deployment_id,
    source_url=s3_csv_input_file,
    destination_url=s3_csv_output_file,
    credential=credential_id,
    max_explanations=10,
    threshold_high=0.5,
    threshold_low=0.15,
)
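A mistyped source or destination URL in the S3 examples above would typically only surface after the job has been submitted. As a small local pre-check, the sketch below splits an s3:// URL into bucket and key with the standard library and rejects anything that does not have that shape. The helper name split_s3_url is hypothetical and is not part of the DataRobot client.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid s3://bucket/key URL: {url!r}")
    return parsed.netloc, key


# Usage: validate both URLs before calling score_s3.
#   split_s3_url('s3://my-bucket/data/to_predict.csv')
#   split_s3_url('s3://my-bucket/data/predicted.csv')
```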

AI Catalog to CSV file scoring¶

When using the AI Catalog for intake, you need the dataset_id of an already created dataset.


import datarobot as dr

dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)

deployment_id = "616d01a8ddbd17fc2c75caf4"
credential_id = "..."
dataset_id = "..."

dataset = dr.Dataset.get(dataset_id)

job = dr.BatchPredictionJob.score(
    deployment_id,
    intake_settings={
        'type': 'dataset',
        'dataset': dataset,  # pass the Dataset object fetched above
    },
    output_settings={
        'type': 'localFile',
    },
)

job.wait_for_completion()

End-to-end scoring from a JDBC PostgreSQL database¶

The following example scores data from the table public.scoring_data and saves the scored data to public.scored_data (assuming that table already exists).


import datarobot as dr

dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)

deployment_id = "616d01a8ddbd17fc2c75caf4"
credential_id = "..."
datastore_id = "..."

intake_settings = {
    'type': 'jdbc',
    'table': 'scoring_data',
    'schema': 'public',
    'data_store_id': datastore_id,
    'credential_id': credential_id,
}

output_settings = {
    'type': 'jdbc',
    'table': 'scored_data',
    'schema': 'public',
    'data_store_id': datastore_id,
    'credential_id': credential_id,
    'statement_type': 'insert'
}

job = dr.BatchPredictionJob.score(
    deployment_id,
    passthrough_columns_set='all',
    intake_settings=intake_settings,
    output_settings=output_settings,
)

print("started scoring...", job)
job.wait_for_completion()

For more details about JDBC scoring, see the JDBC scoring documentation.
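The intake and output dictionaries in the JDBC example differ only in the table name and the statement type. As a sketch of one way to avoid repeating the shared keys, the hypothetical helper below builds both from the same arguments. It uses only the keys shown in the example above and is not part of the DataRobot client.

```python
def jdbc_settings(table, schema, data_store_id, credential_id, statement_type=None):
    """Build a JDBC settings dictionary; statement_type is only added for output."""
    settings = {
        "type": "jdbc",
        "table": table,
        "schema": schema,
        "data_store_id": data_store_id,
        "credential_id": credential_id,
    }
    if statement_type is not None:
        settings["statement_type"] = statement_type
    return settings


datastore_id = "..."
credential_id = "..."

intake_settings = jdbc_settings("scoring_data", "public", datastore_id, credential_id)
output_settings = jdbc_settings(
    "scored_data", "public", datastore_id, credential_id, statement_type="insert"
)
```

The resulting dictionaries can be passed to dr.BatchPredictionJob.score exactly as in the example above.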

End-to-end scoring with Snowflake¶

The following example scores data from the table public.SCORING_DATA and saves the scored data to public.SCORED_DATA (assuming that table already exists).


import datarobot as dr
dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)
deployment_id = "616d01a8ddbd17fc2c75caf4"
credential_id = "..."
cloud_storage_credential_id = "..."
datastore_id = "..."
intake_settings = {
    'type': 'snowflake',
    'table': 'SCORING_DATA',
    'schema': 'PUBLIC',
    'external_stage': 'my_s3_stage_in_snowflake',
    'data_store_id': datastore_id,
    'credential_id': credential_id,
    'cloud_storage_type': 's3',
    'cloud_storage_credential_id': cloud_storage_credential_id
}
output_settings = {
    'type': 'snowflake',
    'table': 'SCORED_DATA',
    'schema': 'PUBLIC',
    'statement_type': 'insert',
    'external_stage': 'my_s3_stage_in_snowflake',
    'data_store_id': datastore_id,
    'credential_id': credential_id,
    'cloud_storage_type': 's3',
    'cloud_storage_credential_id': cloud_storage_credential_id
}
job = dr.BatchPredictionJob.score(
    deployment_id,
    passthrough_columns_set='all',
    intake_settings=intake_settings,
    output_settings=output_settings,
)
print("started scoring...", job)
job.wait_for_completion()

For more details about Snowflake scoring, see the intake and output documentation.

End-to-end scoring with Synapse¶

The following example scores data from the table public.scoring_data and saves the scored data to public.scored_data (assuming that table already exists).


import datarobot as dr
dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)
deployment_id = "616d01a8ddbd17fc2c75caf4"
credential_id = "..."
cloud_storage_credential_id = "..."
datastore_id = "..."
intake_settings = {
    'type': 'synapse',
    'table': 'SCORING_DATA',
    'schema': 'PUBLIC',
    'external_data_source': 'some_datastore',
    'data_store_id': datastore_id,
    'credential_id': credential_id,
    'cloud_storage_credential_id': cloud_storage_credential_id
}
output_settings = {
    'type': 'synapse',
    'table': 'SCORED_DATA',
    'schema': 'PUBLIC',
    'statement_type': 'insert',
    'external_data_source': 'some_datastore',
    'data_store_id': datastore_id,
    'credential_id': credential_id,
    'cloud_storage_credential_id': cloud_storage_credential_id
}
job = dr.BatchPredictionJob.score(
    deployment_id,
    passthrough_columns_set='all',
    intake_settings=intake_settings,
    output_settings=output_settings,
)
print("started scoring...", job)
job.wait_for_completion()

For more details about Synapse scoring, see the intake and output documentation.

End-to-end scoring with BigQuery¶

The following example scores data from a BigQuery table and sends the results to a BigQuery table.


import datarobot as dr

dr.Client(
    endpoint="https://app.datarobot.com/api/v2",
    token="...",
)

deployment_id = "616d01a8ddbd17fc2c75caf4"
gcs_credential_id = "6166c01ee91fb6641ecd28bd"

intake_settings = {
    'type': 'bigquery',
    'dataset': 'my-dataset',
    'table': 'intake-table',
    'bucket': 'my-bucket',
    'credential_id': gcs_credential_id,
}

output_settings = {
    'type': 'bigquery',
    'dataset': 'my-dataset',
    'table': 'output-table',
    'bucket': 'my-bucket',
    'credential_id': gcs_credential_id,
}

job = dr.BatchPredictionJob.score(
    deployment=deployment_id,
    intake_settings=intake_settings,
    output_settings=output_settings,
    include_prediction_status=True,
    passthrough_columns=["some_col_name"],
)

print("started scoring...", job)
job.wait_for_completion()

For more details about BigQuery scoring, see the intake and output documentation.

Updated March 12, 2025
