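Configure management agent environment plugins

Management agent plugins deploy and manage models in a specific prediction environment. The management agent sends commands to the plugin; the plugin executes them and returns the command status to the management agent. To make this interaction possible, you supply the details of the prediction environment when you configure the plugin, so that the plugin can execute commands in that environment. For example, a Kubernetes plugin can launch a deployment (container), replace the model in a deployment, and stop the container in a Kubernetes cluster.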

The MLOps management agent includes the following sample plugins:

Filesystem plugin.
Docker plugin.
Kubernetes plugin.
Test plugin.

Note

These sample plugins are installed as part of the datarobot_bosun-*-py3-none-any.whl wheel file.

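Configure sample plugins

The following sample plugins require additional configuration before you can use them with the management agent: filesystem, Docker, Kubernetes, and test.

To enable communication between the management agent and a deployment, the filesystem plugin creates one directory per deployment on the local filesystem and downloads each deployment's model package and YAML configuration file (named by deploymentInfoFile) into that directory. You can then use these artifacts to serve predictions from a Portable Prediction Server (PPS) container.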

plugin.filesystem.conf.yaml

# The top-level directory that will be used to store each deployment directory
baseDir: "."

# Each deployment directory will be prefixed with the following string
deploymentDirPrefix: "deployment_"

# The name of the deployment config file to create inside the deployment directory.
# Note: If working with the PPS, DO NOT change this name; the PPS expects this filename.
deploymentInfoFile: "config.yml"

# If defined, this string will be prefixed to the predictions URL for this deployment,
# and the URL will be returned, with the deployment id suffixed to the end with the
# /predict endpoint.
deploymentPredictionBaseUrl: "http://localhost:8080"

# If defined, create a yaml file with the kv of the deployment.
# If the name of the file is the same as the deploymentInfoFile,
# the key values are added to the same file as the other config.
# deploymentKVFile: "kv.yaml"

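The Docker plugin can deploy native DataRobot models and custom models to a Docker server. In addition, the plugin automatically runs the monitoring agent to monitor deployed models and uses the traefik reverse proxy to provide one prediction endpoint per deployment.

The management agent's Docker plugin supports the Portable Prediction Server, so a single Docker container can serve multiple models. Once the PPS is configured to point to the location of each deployment's model, you can start, stop, and manage deployments.

With the Docker plugin, you can:

Retrieve the model package for a deployment from DataRobot.
Launch a DataRobot model in a Docker container.
Shut down and clean up the Docker container.
Report status via events.
Monitor predictions using the monitoring agent.

To configure the Docker plugin, follow these steps:

Set up the environment that the Docker plugin requires: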

docker pull rabbitmq:3-management
docker pull traefik:2.3.3
docker network create bosun

Build the monitoring agent container image:
```
cd datarobot_mlops_package-*/
cd tools/agent_docker
make build 
```
Download the Portable Prediction Server from the DataRobot UI. If you plan to use a custom model image, make sure the image is built and accessible to the Docker service.

Edit the Docker plugin configuration file:

plugin.docker.conf.yaml

# Docker network on which to run all containers.
# This network must be created prior to running
# the agent (i.e., `docker network create <NAME>`)
dockerNetwork: "bosun"

# Traefik image to use
traefikImage: "traefik:2.3.3"

# Address that will be reported to DataRobot
outfacingPredictionURLPrefix: "http://10.10.12.22:81"

# MLOps Agent image to use for monitoring
agentImage: "datarobot/mlops-tracking-agent:latest"

# RabbitMQ image to use for building a channel
rabbitmqImage: "rabbitmq:3-management"

# PPS base image
ppsBaseImage: "datarobot/datarobot-portable-prediction-api:latest"

# Prefix for generated images
generatedImagePrefix: "mlops_"

# Prefix for running containers
containerNamePrefix: "mlops_"

# Mapping of traefik proxy ports (not mandatory)
traefikPortMapping:
    80: 81
    8080: 8081

# Mapping of RabbitMQ (not mandatory)
rabbitmqPortMapping:
    15672: 15673
    5672: 5673

DataRobot provides a plugin for deploying and managing models in a Kubernetes cluster without writing any additional code. For configuration information, see the README file in the tools/charts/datarobot-management-agent folder of the tarball.

plugin.k8s.conf.yaml

## The following settings are related to connecting to your Kubernetes cluster
#
# The name of the kube-config context to use (similar to --context argument of kubectl). There is a special
# `IN_CLUSTER` string to be used if you are running the plugin inside a cluster. The default is "IN_CLUSTER"
# kubeConfigContext: IN_CLUSTER

# The namespace that you want to create and manage external deployments (similar to --namespace argument of kubectl). You
# can leave this as `null` to use the "default" namespace, the namespace defined in your context, or (if running `IN_CLUSTER`)
# manage resources in the same namespace the plugin is executing in.
# kubeNamespace:

## The following settings are related to whether or not MLOps monitoring is enabled
#
# We need to know the location of the dockerized agent image that can be launched into your Kubernetes cluster.
# You can build the image by running `make build` in the tools/agent_docker/ directory and retagging the image
# and pushing it to your registry.
# agentImage: "<FILL-IN-DOCKER-REGISTRY>/mlops-tracking-agent:latest"

## The following settings are all related to accessing the model from outside the Kubernetes cluster
#
# The URL prefix used to access the deployed model, i.e., https://example.com/deployments/
# The model will be accessible via <outfacingPredictionURLPrefix>/<model_id>/predict
outfacingPredictionURLPrefix: "<FILL-CORRECT-URL-FOR-K8S-INGRESS>"

# We are still using the beta Ingress resource API, so a class must be provided. If your cluster
# doesn't have a default ingress class, please provide one.
# ingressClass:

## The following settings are all related to building the finalized model image (base image + mlpkg)
#
# The location of the Portable Prediction Server base image. You can download it from DataRobot's developer
# tools section, retag it, and push it to your registry.
ppsBaseImage: "<FILL-IN-DOCKER-REGISTRY>/datarobot-portable-prediction-api:latest"

# The Docker repo to which this plugin can push finalized models. The built images will be tagged
# as follows: <generatedImageRepo>:m-<model_pkg_id>
generatedImageRepo: "<FILL-IN-DOCKER-REGISTRY>/mlops-model"

# We use Kaniko to build our finalized image. See https://github.com/GoogleContainerTools/kaniko#readme.
# The default is to use the image below.
# kanikoImage: "gcr.io/kaniko-project/executor:v1.5.2"

# The name of the Kaniko ConfigMap to use. This provides the settings Kaniko will need to be able to push to
# your registry type. See https://github.com/GoogleContainerTools/kaniko#pushing-to-different-registries.
# The default is to not use any additional configuration.
# kanikoConfigmapName: "docker-config"

# The name of the Kaniko Secret to use. This provides the settings Kaniko will need to be able to push to
# your registry type. See https://github.com/GoogleContainerTools/kaniko#pushing-to-different-registries.
# The default is to not use any additional secrets. The secret must be of the type: kubernetes.io/dockerconfigjson
# kanikoSecretName: "registry-credentials"

# The name of a service account to use for running Kaniko if you want to run it in a more secure fashion.
# See https://github.com/GoogleContainerTools/kaniko#security.
# The default is to use the "default" service account in the namespace in which the pod runs.
# kanikoServiceAccount: default

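To configure the test plugin, use the --plugin test option and set the temporary directory and the sleep time (in seconds) for each action the test plugin runs. For example, with launch_time_sec set to 1 as in the test plugin configuration below, launching a deployment creates a temporary file for the deployment and sleeps for 1 second before returning.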

plugin.test.conf.yaml

tmp_dir: "/tmp"
launch_time_sec: 1
stop_time_sec: 1
replace_model_time_sec: 1
pe_status_time_sec: 1
deployment_status_time_sec: 1
deployment_list_time_sec: 1
plugin_start_time: 1
plugin_stop_time: 1

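Create a custom plugin

The management agent's plugin framework is flexible enough to accommodate custom plugins. This is useful when you deploy models to a custom prediction environment (for example, one that differs from a standard Docker or Kubernetes environment). To implement a plugin for such an environment, either modify an existing plugin or implement one from scratch. You can use the filesystem plugin as a reference when creating a custom Python plugin.

Note

Custom Java plugins are not currently supported.

If you are creating a custom plugin, the following sections describe the interface definitions provided for writing Python plugins.

Implement the plugin interface

The management agent Python package defines the abstract base class BosunPluginBase. Every management agent plugin must inherit from this base class and implement the interface it defines.

To start implementing a custom plugin (SamplePlugin below), inherit from the BosunPluginBase base class. In this example, the plugin is implemented in the file sample_plugin.py under the sample_plugin directory.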

class SamplePlugin(BosunPluginBase):
    def __init__(self, plugin_config, private_config_file=None, pe_info=None, dry_run=False):

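Python plugin arguments

The constructor is called with the following arguments:

Argument	Definition
`plugin_config`	A dictionary containing general information about the plugin. Details are described in a later section.
`private_config_file`	The path to the plugin's private configuration file, passed with the `--private-config` flag when calling the `bosun-plugin-runner` script. This file is optional, and its content depends entirely on the custom plugin.
`pe_info`	An instance of `PEInfo` containing information about the prediction environment. This parameter is not set for certain actions.
`dry_run`	Whether the invocation is a dry run (for development) or an actual run.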

Python plugin methods

Implement the following methods in this class.

Note

The return type of each of the following functions is ActionStatusInfo.

def plugin_start(self):

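This method initializes the plugin; for example, it can check whether the plugin can connect to the prediction environment (Docker, Kubernetes, and so on). In the filesystem plugin, this method checks whether baseDir exists on the filesystem. The management agent typically calls this method only once, during startup. It is guaranteed to be called before any deployment-specific action is invoked.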

def plugin_stop(self):

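This method implements the teardown process, such as closing client connections to the prediction environment. The management agent typically calls this method only once, during shutdown. It is guaranteed to be called after all deployment-specific actions have completed.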

def deployment_list(self):

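This method returns the list of deployments already running in the given prediction environment. The management agent typically calls this method at startup to determine which deployments are already running in the prediction environment. The list is returned in the data field of ActionStatusInfo (described below) as a map of deployment_id to deployment information.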

def deployment_start(self, deployment_info):

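This method implements the deployment launch process. The management agent calls this method when a deployment is created or activated in DataRobot. For example, this method can launch a container in Kubernetes or a Docker service. In the filesystem plugin, this method creates a directory named deployment_<deployment_id> and then places the deployment's model and a YAML configuration file in the new directory. The plugin must be able to uniquely identify the deployment in the prediction environment by the deployment ID or, ideally, by the combination of deployment ID and model ID. For example, the built-in Docker plugin launches a container named deployment_<deployment_id>_<model-id>.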

def deployment_stop(self, deployment_info):

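This method implements the deployment stop process. The management agent calls this method when a deployment is deactivated or deleted in DataRobot. For example, this method can stop a container in Kubernetes or a Docker service. The deployment ID and model ID in deployment_info uniquely identify the container to stop. In the filesystem plugin, this method removes the directory that the deployment_start method created for that deployment.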
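The filesystem behavior described for deployment_start and deployment_stop can be sketched in a few lines. This is an illustrative sketch only, not the code of the bundled filesystem plugin: the helper names (deployment_dir, start_deployment, stop_deployment), the use of a plain dict for deployment_info, and the placeholder content written to config.yml are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def deployment_dir(base_dir: Path, deployment_id: str) -> Path:
    # Mirrors the deployment_<deployment_id> naming described in the text.
    return base_dir / f"deployment_{deployment_id}"


def start_deployment(base_dir: Path, deployment_info: dict) -> Path:
    """Create the deployment directory and place the model and a config file in it."""
    target = deployment_dir(base_dir, deployment_info["id"])
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy(deployment_info["modelArtifact"], target)
    # Placeholder content; the real config file format is not shown here.
    (target / "config.yml").write_text(f"modelId: {deployment_info['modelId']}\n")
    return target


def stop_deployment(base_dir: Path, deployment_info: dict) -> bool:
    """Remove the deployment directory; return False if it did not exist."""
    target = deployment_dir(base_dir, deployment_info["id"])
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        model = base / "model-A.txt"
        model.write_text("fake model")
        info = {"id": "deployment-1", "modelId": "model-A", "modelArtifact": str(model)}
        created = start_deployment(base, info)
        print(created.name, sorted(p.name for p in created.iterdir()))
        print(stop_deployment(base, info), created.exists())
```

In a real plugin, each of these helpers would be called from the corresponding method and its outcome wrapped in an ActionStatusInfo object.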

def deployment_replace_model(self, deployment_info):

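This method implements the model replacement process for a deployment. The management agent calls this method when a model is replaced in a DataRobot deployment. modelArtifact contains the path to the new model, and newModelId contains the ID of the replacement model. In a Docker or Kubernetes plugin, one possible implementation is to stop the container for the old model ID and then start a new container with the new model. In the filesystem plugin, this method removes the old deployment directory and creates a new one with the new model.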

def pe_status(self):

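This method queries the status of the prediction environment; for example, whether Kubernetes or the Docker service is reachable. The management agent calls this method periodically to confirm that the prediction environment is healthy. To improve the experience, a plugin can also support querying the status of the deployments running in the prediction environment, in addition to the status of the environment itself. In that case, the deployment IDs are included in the deployments field of the peInfo structure (described below), and the status of each deployment is returned in the data field of the ActionStatusInfo object (described below) as a map of deployment_id to deployment information.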

def deployment_status(self):

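This method queries the status of a deployment in the prediction environment; for example, whether the container corresponding to the deployment is still up and running. The management agent calls this method periodically to confirm that the deployment is healthy.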

def deployment_relaunch(self, deployment_info):

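This method implements the process of relaunching (stopping and then starting) a deployment. The management agent Python package already provides a default implementation that calls deployment_stop followed by deployment_start. However, if your environment offers a better way to relaunch a deployment, the plugin can implement its own relaunch mechanism.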

Python plugin return value

The return value of all these operations is an ActionStatusInfo object that provides the status of the action.

class ActionStatusInfo:
    def __init__(self, status, msg=None, state=None, duration=None, data=None):

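This object contains the following fields:

Field	Definition
`status`	Indicates the status of the action. Values: `ActionStatus.OK`, `ActionStatus.WARN`, `ActionStatus.ERROR`, `ActionStatus.UNKNOWN`.
`msg`	A `string` message that the plugin can forward to the management agent, which in turn forwards it to the MLOps service (DataRobot).
`state`	Indicates the state of the deployment after the action runs. Values: `ready`, `stopped`, `errored`.
`duration`	Indicates how long the action took to run.
`data`	Returns information that the plugin can forward to the management agent. Currently, the `deployment_list` method uses this field to list deployments as a dictionary of `deployment_id` to deployment information. The `pe_status` method can also use this field to report the status of the deployments running in the prediction environment, in addition to the status of the environment itself.

Note

The base class automatically adds a timestamp to the object to keep track of the different action status values.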
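To make the data field more concrete, here is a hedged sketch of how a deployment_list-style result might be assembled. The ActionStatus and ActionStatusInfo classes below are local stand-ins written only so the example runs on its own; the enum values, the stored attributes, the list_deployments helper, and the shape of the per-deployment information are assumptions, not the package's actual definitions.

```python
import time
from enum import Enum


class ActionStatus(Enum):
    # Stand-in for the package's ActionStatus; the member values are assumed.
    OK = "ok"
    WARN = "warn"
    ERROR = "error"
    UNKNOWN = "unknown"


class ActionStatusInfo:
    # Stand-in that only mirrors the constructor signature shown above.
    def __init__(self, status, msg=None, state=None, duration=None, data=None):
        self.status = status
        self.msg = msg
        self.state = state
        self.duration = duration
        self.data = data


def list_deployments(running: dict) -> ActionStatusInfo:
    """Build a status object whose data maps deployment_id -> deployment info."""
    started = time.time()
    # Hypothetical per-deployment info: just a state string per deployment.
    data = {
        deployment_id: {"state": "ready" if is_up else "stopped"}
        for deployment_id, is_up in running.items()
    }
    return ActionStatusInfo(
        ActionStatus.OK,
        msg=f"{len(data)} deployment(s) found",
        duration=time.time() - started,
        data=data,
    )


if __name__ == "__main__":
    result = list_deployments({"deployment-1": True, "deployment-2": False})
    print(result.status, result.msg, result.data)
```

A pe_status implementation could return the same kind of map in data when the deployments field of peInfo lists deployment IDs.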

Use bosun-plugin-runner

The management agent Python package provides the bosun-plugin-runner CLI tool, which invokes a custom plugin class to run a specific action. With this tool, you can run the plugin in standalone mode while developing and debugging it.

Example:

bosun-plugin-runner \
    --plugin sample_plugin/sample_plugin \
    --action pe_status \
    --config sample_configs/action_config_pe_status_only.yaml \
    --private-config sample_configs/sample_plugin_config.yaml \
    --status-file /tmp/status.yaml \
    --show-status

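bosun-plugin-runner accepts the following arguments:

Argument	Definition
`--plugin`	Specifies the module containing the plugin class. In this example, the plugin class is in the sample_plugin.py file inside the sample_plugin directory, so the value is sample_plugin/sample_plugin.
`--action`	Specifies the action to run. This example uses the `pe_status` action. Other supported actions are listed below.
`--config`	Provides the configuration file to use for the specified action; it is described in more detail in the next section. When the plugin runs as part of the management agent service, this file is generated for you, but to test a specific action manually via `bosun-plugin-runner`, you must create the configuration file yourself.
`--private-config`	Provides a plugin-specific configuration file used only by the plugin.
`--status-file`	Provides the path where the plugin status resulting from the action is saved.
`--show-status`	Displays the contents of the `--status-file` on standard output.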

To list the actions that bosun-plugin-runner supports, use the --list-actions option:

bosun-plugin-runner --list-actions
# plugin_start
# plugin_stop
# deployment_start
# deployment_stop
# deployment_replace_model
# deployment_status
# pe_status
# deployment_list

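Create the action configuration file

Use the --config flag to pass a YAML configuration file to the plugin. This is the same configuration structure that the management agent prepares when it invokes plugin actions. During plugin development, however, you may need to write this configuration file yourself.

The typical contents of such a configuration file are shown below: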

pluginConfig:
  name: "ExternalCommand-1"
  type: "ExternalCommand"
  platform: "os"
  commandPrefix: "python3 sample_plugin.py"
  mlopsUrl: "https://app.datarobot.com"

peInfo:
   id: "0x2345"
   name: "Sample-PE"
   description: "some description"
   createdOn: "iso formatted date"
   createdBy: "some username"
   deployments: ["deployment-1", "deployment-2"]
   keyValueConfig:
    max_models: 5

deploymentInfo:
  id: "deployment-1"
  name: "deployment-1"
  description: "Deployment 1 for testing"
  modelId: "model-A"
  modelArtifact: "/tmp/model-A.txt"
  modelExecutionType: "dedicated"
  keyValueConfig:
    key1: "some-value-for-key-1"

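The action configuration file contains three sections: pluginConfig, peInfo, and deploymentInfo.

The pluginConfig section contains general information about the plugin, such as the ID of the prediction environment, its type, and its platform. It can also include mlopsUrl, the address of the MLOps service (DataRobot), in case the plugin needs to connect to it. This section is converted to a dictionary and passed to the constructor as the plugin_config argument.

The peInfo section contains information about the prediction environment that this action refers to. Typically, this information is used by the pe_status action. If the deployments key contains valid deployment IDs, the plugin is expected to return not only the status of the prediction environment but also the status of each deployment listed under deployments.

The deploymentInfo section contains information about the deployment in the prediction environment that this action refers to. All deployment-related actions use this section to identify the deployment and model to work on. Because this section is particularly important, some of its key fields are described below:

id, name, and description: Provide information about the deployment as configured in DataRobot.
modelId and modelArtifact: Indicate the ID of the model and the path where the model resides. The management agent places the correct model at this path before calling deployment_start or deployment_replace_model.
keyValueConfig: Lists additional configuration for the deployment, which you can set on the deployment in DataRobot. For example, you could use it to specify the amount of memory that the container for this deployment should use.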
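As an illustration of how a plugin might consume these fields, the sketch below derives a unique container name and reads an optional setting from keyValueConfig. It assumes the deploymentInfo section has already been loaded into a plain dict; the helper names and the memory_mb key are hypothetical and are not defined by the management agent.

```python
def container_name(deployment_info: dict) -> str:
    """Combine deployment ID and model ID so the name is unique per deployment."""
    return f"deployment_{deployment_info['id']}_{deployment_info['modelId']}"


def memory_limit_mb(deployment_info: dict, default_mb: int = 512) -> int:
    """Read a hypothetical 'memory_mb' key from keyValueConfig, with a fallback."""
    key_values = deployment_info.get("keyValueConfig") or {}
    return int(key_values.get("memory_mb", default_mb))


if __name__ == "__main__":
    info = {
        "id": "deployment-1",
        "modelId": "model-A",
        "modelArtifact": "/tmp/model-A.txt",
        "keyValueConfig": {"memory_mb": "1024"},
    }
    print(container_name(info), memory_limit_mb(info))
```

Falling back to a default keeps the plugin usable for deployments that define no additional settings.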

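Run actions with bosun-plugin-runner

As described above, you can use bosun-plugin-runner to invoke actions during plugin development. For example, the command below invokes the deployment_start action. It uses the same configuration described in the previous section, saved to the file sample_configs/action_config_deployment_1_model_A.yaml.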

bosun-plugin-runner \
    --plugin sample_plugin/sample_plugin \
    --config sample_configs/action_config_deployment_1_model_A.yaml \
    --private-config sample_configs/sample_plugin_config.yaml \
    --action deployment_start \
    --status-file /tmp/status.yaml \
    --show-status

The status of this deployment_start action is captured in the file /tmp/status.yaml.
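When you exercise several actions in a row during development, it can help to assemble the command line programmatically. The following sketch only builds the argument list from the flags documented above and prints it; the runner_command helper is hypothetical, and treating --config as required for every action is an assumption.

```python
import shlex
from typing import List, Optional


def runner_command(
    plugin: str,
    action: str,
    config: str,
    private_config: Optional[str] = None,
    status_file: Optional[str] = None,
    show_status: bool = False,
) -> List[str]:
    """Build a bosun-plugin-runner argument list from the documented flags."""
    command = ["bosun-plugin-runner", "--plugin", plugin, "--action", action, "--config", config]
    if private_config:
        command += ["--private-config", private_config]
    if status_file:
        command += ["--status-file", status_file]
    if show_status:
        command.append("--show-status")
    return command


if __name__ == "__main__":
    command = runner_command(
        plugin="sample_plugin/sample_plugin",
        action="deployment_start",
        config="sample_configs/action_config_deployment_1_model_A.yaml",
        private_config="sample_configs/sample_plugin_config.yaml",
        status_file="/tmp/status.yaml",
        show_status=True,
    )
    # Print a shell-safe version; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Returning a list rather than a single string avoids quoting problems if the command is later executed with subprocess.run.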

Configure the command prefix

Now that your plugin is ready to work with the management agent, set the command prefix in the management agent configuration file as follows:

    command: "<BOSUN_VENV_PATH>/bin/bosun-plugin-runner --plugin sample_plugin --private-config <CONF_PATH>/plugin.sample_plugin_.conf.yaml"

The sample plugin must be installed in the same virtual environment as the management agent Python package. Make sure that the path to the plugin's private configuration file is set correctly.