
Generative AI with NeMo Guardrails on NVIDIA GPUs

Availability information

The NVIDIA and NeMo Guardrails integrations are premium features. Contact your DataRobot representative or administrator for information on enabling them.

Use NVIDIA with DataRobot to quickly build end-to-end generative AI (GenAI) capability, accelerating performance and taking advantage of best-in-class open-source models and guardrails. The DataRobot integration with NVIDIA creates an inference software stack that delivers full, end-to-end GenAI functionality, ensuring performance, governance, and safety with significant out-of-the-box functionality.

Create a GenAI model on NVIDIA resources

In the Registry, you can access the version history of your DataRobot, custom, and external models, including guard models for prompt injection monitoring, sentiment and toxicity classification, PII detection, and more.

In the example above, you can see a Llama 2 model in the Model directory, with a GPU tag. Opening this registered model reveals four versions of the Llama 2 model, assembled and tested for deployment on an NVIDIA resource bundle.

From the registered model version, review the model Details, where the Resource bundle field surfaces information about the NVIDIA resources used by the model. Then, open the Related Items panel and click View to open the Llama 2 custom model in the Model workshop.

From the Model workshop, open and review the Llama 2 custom model's Assemble tab to visualize how the model was constructed. For this model, you can see the versions DataRobot has built and tested. In the Environment section, you can see that the model runs on an [NVIDIA] Triton Inference Server base environment. DataRobot natively builds in the NVIDIA Triton Inference Server to provide extra acceleration for all of your GPU-based models as you build and deploy them onto NVIDIA devices.
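
As a quick standalone illustration (outside the DataRobot workflow), the sketch below uses NVIDIA's `tritonclient` package to check that a Triton Inference Server is live and that a model is loaded; the server URL and model name are placeholder assumptions.

```python
# Standalone sketch: query a Triton Inference Server's health with NVIDIA's
# tritonclient package (pip install "tritonclient[http]"). The URL and model
# name below are placeholder assumptions, not DataRobot-managed values.
import tritonclient.http as httpclient

TRITON_URL = "localhost:8000"  # assumption: Triton's default HTTP port
MODEL_NAME = "llama2"          # hypothetical name in the model repository

client = httpclient.InferenceServerClient(url=TRITON_URL)

# Confirm the server is up and able to serve inference requests.
print("Server live: ", client.is_server_live())
print("Server ready:", client.is_server_ready())

# Confirm the specific model is loaded and ready.
print("Model ready: ", client.is_model_ready(MODEL_NAME))

# List every model in the repository along with its load state.
for entry in client.get_model_repository_index():
    print(entry["name"], entry.get("state", "UNKNOWN"))
```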

In the Files section, you can view, modify, or add model files. In the Runtime Parameters section, you can provide values to pass dynamically to the model at runtime and to the build process. Modifications to the custom model on this tab create a new minor version.
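
For illustration, the sketch below reads a runtime parameter inside a custom model's `custom.py`, assuming the `datarobot-drum` package's `RuntimeParameters` helper; the parameter name `HF_TOKEN` is a hypothetical example that would be declared in the model's metadata.

```python
# Sketch of consuming a runtime parameter in a custom model's custom.py,
# assuming the datarobot-drum package's RuntimeParameters helper. The
# parameter name "HF_TOKEN" is hypothetical.
from datarobot_drum import RuntimeParameters


def load_model(code_dir: str) -> dict:
    # Values passed dynamically at runtime stay out of the model files,
    # so rotating a credential doesn't require a new model version.
    hf_token = RuntimeParameters.get("HF_TOKEN")
    return {"hf_token": hf_token}
```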

Navigate to the custom model's Settings section and review the Resources settings, which surface information about the resources provided to the model. In this example, you can see that the Llama 2 model is built to be tested and deployed on an NVIDIA A10 device.

Click Edit to open the Update resource settings dialog box and, in the Resource bundle settings, review the range of NVIDIA devices available as build environments in DataRobot.

DataRobot can deploy models onto any of these NVIDIA resource bundles (a sizing sketch follows the table):

| Bundle | GPU | VRAM | CPU | RAM |
|---|---|---|---|---|
| GPU - S | 1 x NVIDIA T4 | 16GB | 4 | 16GB |
| GPU - M | 1 x NVIDIA T4 | 16GB | 8 | 32GB |
| GPU - L | 1 x NVIDIA A10 | 24GB | 8 | 32GB |
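
As a rough sizing aid (our own heuristic, not a DataRobot API), the sketch below encodes the table and picks the smallest bundle whose VRAM covers a model's estimated memory footprint.

```python
# Illustrative sizing heuristic (an assumption, not a DataRobot API): pick
# the smallest bundle from the table above whose VRAM fits the model.
BUNDLES = [  # ordered smallest to largest
    {"name": "GPU - S", "gpu": "1 x NVIDIA T4",  "vram_gb": 16, "cpu": 4, "ram_gb": 16},
    {"name": "GPU - M", "gpu": "1 x NVIDIA T4",  "vram_gb": 16, "cpu": 8, "ram_gb": 32},
    {"name": "GPU - L", "gpu": "1 x NVIDIA A10", "vram_gb": 24, "cpu": 8, "ram_gb": 32},
]


def smallest_fitting_bundle(model_vram_gb: float) -> dict:
    for bundle in BUNDLES:
        if bundle["vram_gb"] >= model_vram_gb:
            return bundle
    raise ValueError(f"No bundle offers {model_vram_gb}GB of VRAM")


# A 7B-parameter model in fp16 needs roughly 14GB for weights alone,
# which fits the T4 bundles; longer contexts may call for the A10.
print(smallest_fitting_bundle(14)["name"])  # -> GPU - S
```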

After assembling a model for the NVIDIA Triton Inference Server, you can open the Test tab. From there, DataRobot can verify that the container passes the Startup and Prediction error tests. DataRobot offers a wide range of custom model testing capabilities.
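
A hedged sketch of starting those tests from the DataRobot Python client follows; the method and attribute names reflect our reading of the client's `CustomModelTest` interface, and all IDs are placeholders.

```python
# Hedged sketch: run custom model testing via the DataRobot Python client
# (pip install datarobot); names reflect our reading of CustomModelTest,
# and all IDs are placeholders.
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

test = dr.CustomModelTest.create(
    custom_model_id="65a0...cm",           # placeholder custom model ID
    custom_model_version_id="65a0...cmv",  # placeholder version ID
    dataset_id="65a0...ds",                # placeholder test dataset ID
)

# Overall result plus the individual checks (startup, prediction error, ...).
print(test.overall_status)
for check, detail in test.detailed_status.items():
    print(check, detail["status"])
```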

In addition, on the Runtime Parameters tab, you can pass parameters to the model at runtime during testing, just as you can in production, to make sure that the new model container built on the NVIDIA Triton Inference Server is configured correctly and ready for production.

Deploy the model to production

Now that the Llama 2 model is assembled and tested on an NVIDIA resource bundle, you can register the model. In this example, the model is already registered, and the custom model is linked to the registered model version. DataRobot provides a direct link to the registry's Model directory tab so you can review the model there.

Back in the registry, you can see multiple builds, allowing you to share and socialize this model within your organization to seek approval for deployment. From the registered model, you can clearly see the resource bundle that the model was tested on and should be deployed to.

In addition to the versioning provided by the model directory, the registry also provides a clear model lineage through the Related Items panel, allowing you to review the Custom Model, Custom Model Version, and Training Dataset used to create the registered model. The training data provides a baseline for drift monitoring when the model is deployed. Your organization can also add custom metadata to categorize and identify models in the registry based on specific business needs and the specific controls required for each category.

Now that the Llama 2 model is assembled, registered, and reviewed, deploy it to the appropriate NVIDIA A10 device. Once a registered model is deployed—as the Llama 2 model in this example is—you can view and access the deployment from the registry, on a model version's Deployments tab:
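
A hedged sketch of creating such a deployment with the DataRobot Python client is shown below; the method name reflects our reading of the client, and the ID and label are placeholders.

```python
# Hedged sketch: deploy a tested custom model version with the DataRobot
# Python client; method and argument names reflect our reading of the
# client, and the ID and label are placeholders.
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

deployment = dr.Deployment.create_from_custom_model_version(
    custom_model_version_id="65a0...cmv",  # placeholder version ID
    label="Llama 2 on NVIDIA A10",
    max_wait=3600,  # building and launching a GPU image can take a while
)
print(deployment.id, deployment.label)
```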

Opening the deployment, from the Overview tab, you can see that DataRobot has governed the approval process for this model's deployment. In the Model history section, the Governance tab provides a persistent record indicating that the deployment was approved, and the Logs tab shows whether the model was ever replaced and, if so, the reason for the replacement. In this example, the model hasn't been replaced yet, as it was only recently deployed; over time, however, DataRobot records the history of this deployed model and of any additional models deployed behind this endpoint. In addition, you can see the Resource bundle the Llama 2 model is deployed to, alongside information about the model's lineage, with links to the model artifacts in the registry.

Monitor the deployed model

Once a model is deployed, DataRobot provides a wide range of monitoring tools, starting with IT metrics. Informational statistics (tile values) are based on your current settings for model and time frame, selected on the slider. For example, if the slider's interval is set to a week, the tile metrics displayed correspond to that week. Clicking a metric tile updates the chart below the tiles.

Monitoring > Service health reports the following metrics (a programmatic sketch follows the table):

| Statistic | Reports, for the selected time frame... |
|---|---|
| Total predictions | The number of predictions the deployment has made. |
| Total requests | The number of prediction requests the deployment has received (a single request can contain multiple prediction requests). |
| Requests over x ms | The number of requests with a response time longer than the specified number of milliseconds. The default is 2000 ms; click in the box to enter a time between 10 and 100,000 ms, or adjust the value with the controls. |
| Response time | The time (in milliseconds) DataRobot took to receive a prediction request, compute the request, and respond to the user. The report does not include time for network latency. Select the median, or the 90th, 95th, or 99th percentile of prediction request times. A dash (-) is displayed for deployments with no requests and for external deployments. |
| Execution time | The time (in milliseconds) DataRobot took to compute a prediction request. Select the median, or the 90th, 95th, or 99th percentile of prediction request times. |
| Median / peak load (requests/min) | The median and maximum number of requests per minute. |
| Data error rate | The percentage of requests that resulted in a 4xx error (problems with the prediction request submission). |
| System error rate | The percentage of well-formed requests that resulted in a 5xx error (problems with the DataRobot prediction server). |
| Number of consumers | The number of distinct users (identified by API key) who made prediction requests against this deployment. |
| Cache hit rate | The percentage of requests that used a cached model (a model recently used by other predictions). If the model is not cached, a model lookup occurs, which can add latency. By default, the prediction server cache holds 16 models; when the limit is reached, the least recently used model is discarded. |
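
These metrics can also be pulled programmatically. A hedged sketch using the DataRobot Python client follows; the metric keys reflect our reading of the API (for example, `serverErrorRate` for the system error rate) and the deployment ID is a placeholder.

```python
# Hedged sketch: fetch service health metrics with the DataRobot Python
# client. Metric key names reflect our reading of the API and may vary by
# version; the deployment ID is a placeholder.
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

deployment = dr.Deployment.get(deployment_id="65a0...dep")  # placeholder
stats = deployment.get_service_stats()

for key in ("totalPredictions", "totalRequests", "serverErrorRate", "cacheHitRatio"):
    print(key, stats.metrics.get(key))
```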

In addition to service health, DataRobot tracks data drift independently for prompts and completions. In the prompt and response word clouds, you can identify which tokens are contributing most to drift when compared to the baseline established by the training dataset uploaded during model assembly (and viewable in the registered model version).
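
A hedged sketch of reviewing drift programmatically follows; the method and attribute names reflect our reading of the DataRobot Python client, and the deployment ID is a placeholder.

```python
# Hedged sketch: review target and feature drift with the DataRobot Python
# client; for a text-generation deployment, the tracked features include
# the prompt. Names reflect our reading of the client.
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

deployment = dr.Deployment.get(deployment_id="65a0...dep")  # placeholder

# Drift of the completion/target relative to the training baseline.
target_drift = deployment.get_target_drift()
print(target_drift.target_name, target_drift.drift_score)

# Per-feature drift, which covers the prompt for GenAI deployments.
for feature in deployment.get_feature_drift():
    print(feature.name, feature.drift_score)
```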

Implement NeMo Guardrails

In addition to the metrics available out of the box, DataRobot also provides a powerful interface to create custom metrics: from scratch, from a template, or through an integration with NeMo Guardrails. The integration with NeMo provides powerful rails to ensure your model stays on topic, using interventions to block prompts and completions if they violate the "on topic" principles provided by NeMo.
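
To make the idea concrete, here is a minimal standalone NeMo Guardrails sketch using the open-source `nemoguardrails` package, not the DataRobot-managed configuration; the Colang flow and model settings are illustrative assumptions.

```python
# Minimal standalone "stay on topic" rail with the open-source
# nemoguardrails package (pip install nemoguardrails). The Colang flow and
# the model settings are illustrative assumptions.
from nemoguardrails import LLMRails, RailsConfig

yaml_content = """
models:
  - type: main
    engine: openai            # assumption: any supported engine works here
    model: gpt-3.5-turbo-instruct
"""

colang_content = """
define user ask off topic
  "What stocks should I buy?"
  "Tell me a joke."

define bot refuse off topic
  "I can only help with questions about this product's documentation."

define flow
  user ask off topic
  bot refuse off topic
"""

config = RailsConfig.from_content(
    colang_content=colang_content, yaml_content=yaml_content
)
rails = LLMRails(config)

response = rails.generate(
    messages=[{"role": "user", "content": "What stocks should I buy?"}]
)
print(response["content"])  # the refusal defined by the rail
```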

Alongside NeMo Guardrails, you can integrate the other guard models provided by DataRobot and track those metrics over time. For example, DataRobot provides a template for a personally identifiable information (PII) detection guard model, allowing you to scan a prompt for PII and sanitize that input before saving it to a prompt database. With custom metrics, DataRobot can:

  • Facilitate the human feedback loop by annotating each row in a deployment monitoring system with user feedback, if provided.

  • Monitor for prompt injection so you can create interventions to prevent them from reaching the model.

  • Monitor for sentiment and toxicity in prompts and responses.

  • Calculate operational metrics, like cost, by monitoring token use, as sketched below.
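
As a sketch of that last point, the snippet below reports a token count to a deployment's custom metric over the REST API; the endpoint path, payload shape, and auth header reflect our reading of the DataRobot API, and every ID is a placeholder.

```python
# Hedged sketch: report a value (tokens consumed by one request) to a
# deployment custom metric over the DataRobot REST API. The endpoint path,
# payload shape, and auth header reflect our reading of the API; all IDs
# are placeholders.
from datetime import datetime, timezone

import requests

API = "https://app.datarobot.com/api/v2"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}
DEPLOYMENT_ID = "65a0...dep"      # placeholder
CUSTOM_METRIC_ID = "65a0...cmid"  # placeholder

payload = {
    "buckets": [
        {
            "value": 512,  # e.g., tokens consumed by one request
            "sampleSize": 1,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    ]
}

resp = requests.post(
    f"{API}/deployments/{DEPLOYMENT_ID}/customMetrics/{CUSTOM_METRIC_ID}/fromJSON/",
    json=payload,
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()
print(resp.status_code)
```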


Updated April 2, 2024