Azure with Application Insights/Azure Monitor

This section shows how to configure the chart and provision the infrastructure to observe DataRobot on Azure managed services.

Requirements

The OIDC provider and the managed identity assigned to the resource group with the VMs running the workloads must be configured first. Refer to the Azure - Kubernetes Service (AKS) documentation in the installation guide.

Once the managed identity is in place, create an Application Insights resource and assign the Monitoring Metrics Publisher role to the previously created managed identity, scoped to that Application Insights resource.

When it comes to metrics, it's advisable to use Prometheus (with an Azure Monitor Workspace) and Grafana for storage and visualization, respectively, considering the limitations of Azure's Application Insights.

Note: the Azure CLI commands below assume that AZURE_LOCATION has been set.

export AZURE_LOCATION="<AZURE_LOCATION>" 
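Because every later command depends on shell variables such as AZURE_LOCATION, it can help to fail early when one of them is missing. The following is a minimal sketch; `require_vars` is a hypothetical helper, not part of the Azure CLI.

```shell
# Hypothetical helper (not part of the Azure CLI): report every variable
# in the argument list that is unset or empty, and return non-zero if any is.
require_vars() {
    missing=0
    for name in "$@"; do
        # POSIX-compatible indirect lookup of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running `require_vars AZURE_LOCATION || exit 1` at the top of a script stops it before any resource is created with an empty location.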

Application Insights

Create a dedicated resource group first where all observability-related resources are grouped together:

RESOURCE_GROUP_NAME="<RESOURCE_GROUP_NAME>"
az group create \
    --name "${RESOURCE_GROUP_NAME}" \
    --location "${AZURE_LOCATION}"

Then, the Application Insights resource itself needs to be created:

APP_INSIGHTS_NAME="<APP_INSIGHTS_NAME>"
az monitor app-insights component create \
    --app "${APP_INSIGHTS_NAME}" \
    --location "${AZURE_LOCATION}" \
    --resource-group "${RESOURCE_GROUP_NAME}"

APP_INSIGHTS_ID=$(az monitor app-insights component show \
    --app "${APP_INSIGHTS_NAME}" \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --query id \
    --output tsv) 

Assign the Monitoring Metrics Publisher role to the principal ID of the managed identity, as described at the beginning of this section:

MONITORING_METRICS_PUBLISHER_ROLE="Monitoring Metrics Publisher"
AKS_AGENTPOOL_PRINCIPAL_ID="<PRINCIPAL_ID>"
az role assignment create \
    --role "${MONITORING_METRICS_PUBLISHER_ROLE}" \
    --scope "${APP_INSIGHTS_ID}" \
    --assignee "${AKS_AGENTPOOL_PRINCIPAL_ID}" 

Monitor Workspace and Grafana

The previous section provisions the resources for publishing logs, metrics, and traces to an Application Insights resource. As noted earlier, a Monitor Workspace (for Prometheus) and Grafana are recommended for metrics because of the limitations of Application Insights. Create the Monitor Workspace first:

MONITOR_WORKSPACE_NAME="<MONITOR_WORKSPACE_NAME>"
az monitor account create \
    --name "$MONITOR_WORKSPACE_NAME" \
    --resource-group "$RESOURCE_GROUP_NAME"

MONITOR_WORKSPACE_ID=$(az monitor account show \
    --name "$MONITOR_WORKSPACE_NAME" \
    --resource-group "$RESOURCE_GROUP_NAME" \
    --query id \
    --output tsv) 

Next, create the Grafana instance (version 11 is the currently available one):

GRAFANA_NAME="<GRAFANA_NAME>"
az grafana create \
    --name "${GRAFANA_NAME}" \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --sku "Standard" \
    --grafana-major-version 11

GRAFANA_ID=$(az grafana show \
    --name "${GRAFANA_NAME}" \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --query id \
    --output tsv)

GRAFANA_PRINCIPAL_ID=$(az grafana show \
    --name "${GRAFANA_NAME}" \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --query identity.principalId \
    --output tsv) 

The identity attached to the Grafana instance requires the Monitoring Data Reader role:

az role assignment create \
    --role "Monitoring Data Reader" \
    --scope "${MONITOR_WORKSPACE_ID}" \
    --assignee "${GRAFANA_PRINCIPAL_ID}" 

To write to the workspace, the managed identity also needs the Monitoring Metrics Publisher role, this time scoped to the Azure-managed Data Collection Rule (DCR) associated with the workspace:

DCR_RESOURCE_GROUP_NAME="MA_${MONITOR_WORKSPACE_NAME}_${AZURE_LOCATION}_managed"
DCR_NAME="${MONITOR_WORKSPACE_NAME}"

DCR_ID=$(
    az monitor data-collection rule show \
    --name "${DCR_NAME}" \
    --resource-group "${DCR_RESOURCE_GROUP_NAME}" \
    --query id \
    --output tsv
)

az role assignment create \
    --role "${MONITORING_METRICS_PUBLISHER_ROLE}" \
    --scope "${DCR_ID}" \
    --assignee "${AKS_AGENTPOOL_PRINCIPAL_ID}"
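The managed resource group name used above follows the pattern MA_<workspace>_<location>_managed. As a small sketch, the name can be derived by a helper that refuses empty inputs, so that an unset location does not silently produce a wrong name. `dcr_managed_resource_group` is a hypothetical name, and the pattern itself is taken from the commands in this section rather than verified against every Azure region.

```shell
# Hypothetical helper: build the name of the Azure-managed resource group
# that holds the workspace's Data Collection Rule, following the
# MA_<workspace>_<location>_managed pattern used in this section.
dcr_managed_resource_group() {
    # $1 = Monitor Workspace name, $2 = Azure location
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "workspace name and location are both required" >&2
        return 1
    fi
    printf 'MA_%s_%s_managed\n' "$1" "$2"
}
```

Usage would look like `DCR_RESOURCE_GROUP_NAME=$(dcr_managed_resource_group "$MONITOR_WORKSPACE_NAME" "$AZURE_LOCATION")`.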

Retrieving configuration values

Once everything is in place, obtain the input values for the chart and for configuring the Prometheus datasource for Grafana.

Managed identity client ID

Note that the resource group here refers to the one where the managed identity lives, not the previously created observability resources.

MANAGED_IDENTITY_RESOURCE_GROUP="<MANAGED_IDENTITY_RESOURCE_GROUP>"
MANAGED_IDENTITY_NAME="<MANAGED_IDENTITY_NAME>"

CLIENT_ID=$(az identity show \
  --resource-group "${MANAGED_IDENTITY_RESOURCE_GROUP}" \
  --name "${MANAGED_IDENTITY_NAME}" \
  --query "clientId" \
  -o tsv) 

Application Insights connection string

AZURE_MONITOR_CONNECTION_STRING=$(az monitor app-insights component show \
    --app "$APP_INSIGHTS_NAME" \
    --resource-group "$RESOURCE_GROUP_NAME" \
    --query 'connectionString' \
    --output tsv) 
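Before storing the connection string in a secret, a quick sanity check can catch an empty or truncated value. The sketch below assumes the usual semicolon-separated Key=Value layout with InstrumentationKey and IngestionEndpoint fields; both function names are hypothetical.

```shell
# Hypothetical helper: print the value of one "Key=Value" field from a
# semicolon-separated connection string (prints nothing if absent).
connection_string_field() {
    # $1 = connection string, $2 = field name
    printf '%s\n' "$1" | tr ';' '\n' | sed -n "s/^$2=//p"
}

# Hypothetical helper: succeed only if the fields this sketch assumes are
# always present (InstrumentationKey, IngestionEndpoint) are non-empty.
validate_connection_string() {
    [ -n "$(connection_string_field "$1" InstrumentationKey)" ] &&
        [ -n "$(connection_string_field "$1" IngestionEndpoint)" ]
}
```

For example, `validate_connection_string "$AZURE_MONITOR_CONNECTION_STRING" || echo "unexpected connection string" >&2` flags a value that is missing either field.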

Prometheus read endpoint

az monitor account show \
    --resource-group "${RESOURCE_GROUP_NAME}" \
    --name "${MONITOR_WORKSPACE_NAME}" \
    --query 'metrics.prometheusQueryEndpoint' \
    --output tsv 

Prometheus write endpoint

The write URL requires a detour: it is not a property of the Monitor Workspace, so it has to be assembled from the Data Collection Endpoint (DCE) and the Data Collection Rule (DCR) associated with the workspace:

DCR_RESOURCE_ID=$(az monitor account show \
    --resource-group "$RESOURCE_GROUP_NAME" \
    --name "$MONITOR_WORKSPACE_NAME" \
    --query 'defaultIngestionSettings.dataCollectionRuleResourceId' \
    --output tsv
)
DCE_RESOURCE_ID=$(az monitor account show \
    --resource-group "$RESOURCE_GROUP_NAME" \
    --name "$MONITOR_WORKSPACE_NAME" \
    --query 'defaultIngestionSettings.dataCollectionEndpointResourceId' \
    --output tsv
)

DCE_INGESTION_URL=$(az monitor data-collection endpoint show \
    --ids "$DCE_RESOURCE_ID" \
    --query 'metricsIngestion.endpoint' \
    --output tsv)

DCR_IMMUTABLE_ID=$(az monitor data-collection rule show \
    --ids "$DCR_RESOURCE_ID" \
    --query 'immutableId' \
    --output tsv)

PROMETHEUS_REMOTE_WRITE_ENDPOINT="${DCE_INGESTION_URL}/dataCollectionRules/${DCR_IMMUTABLE_ID}/streams/Microsoft-PrometheusMetrics/api/v1/write?api-version=2023-04-24"

echo "${PROMETHEUS_REMOTE_WRITE_ENDPOINT}"
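If either query above returns an empty string, the concatenation still produces a URL that looks plausible but does not work. As a sketch, the assembly can be wrapped in a function that rejects empty parts and drops a trailing slash from the ingestion URL. `build_remote_write_endpoint` is a hypothetical name; the path and api-version are copied from the command above.

```shell
# Hypothetical helper: assemble the Prometheus remote write URL from the
# DCE metrics ingestion URL and the DCR immutable ID.
build_remote_write_endpoint() {
    # $1 = DCE metrics ingestion URL, $2 = DCR immutable ID
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "ingestion URL and DCR immutable ID are both required" >&2
        return 1
    fi
    base="${1%/}"   # drop a single trailing slash, if present
    printf '%s/dataCollectionRules/%s/streams/Microsoft-PrometheusMetrics/api/v1/write?api-version=2023-04-24\n' \
        "$base" "$2"
}
```

Usage would look like `PROMETHEUS_REMOTE_WRITE_ENDPOINT=$(build_remote_write_endpoint "$DCE_INGESTION_URL" "$DCR_IMMUTABLE_ID")`.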

Configuring the Grafana datasource

Note: Grafana administrator role is required.

  1. Open the Connections menu on the left
  2. Select + Add new datasources
  3. Select Prometheus
  4. Optionally, select the default toggle to make it the default datasource
  5. For Prometheus server URL, enter the URL from Prometheus read endpoint
  6. Under Authentication:
     a. Select Azure auth
     b. For Authentication, select Managed Identity
  7. Click Save & Test

Full chart configuration

The following configuration is added to the datarobot-prime chart values. Replace the placeholder values with the actual values obtained in the previous sections.

The Azure Monitor connection string is sensitive and must be stored in a Kubernetes secret. The secret name and key are configurable; they only need to match what is referenced in the secrets section of the chart configuration below. For example:

kubectl create secret -n <DR_CORE_NAMESPACE> generic datarobot-azure \
  --from-literal=connection-string="<AZURE_MONITOR_CONNECTION_STRING>" 

For additional exporter configuration options, refer to the upstream OpenTelemetry documentation for the azuremonitor and prometheusremotewrite exporters.

With Azure Monitor for logs/traces and Prometheus for metrics

global:
  observability:
    auth:
      azure:
        enabled: true
        clientId: <CLIENT_ID>

    secrets:
      - envVar: AZURE_MONITOR_CONNECTION_STRING
        secretName: datarobot-azure
        secretKey: connection-string

    exporters:
      azuremonitor:
        connection_string: ${env:AZURE_MONITOR_CONNECTION_STRING}
      prometheusremotewrite:
        endpoint: <PROMETHEUS_REMOTE_WRITE_ENDPOINT>
        auth:
          authenticator: azureauth

    signals:
      logs:
        exporters: [azuremonitor]
      metrics:
        exporters: [prometheusremotewrite]
      traces:
        exporters: [azuremonitor] 
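To avoid pasting values by hand, the configuration above can be rendered from the shell variables collected earlier. This is a sketch only: `render_observability_values` and the output file name are hypothetical, and the secret name and key assume the defaults shown in this section. The `\$` escape keeps `${env:...}` literal so the shell does not try to expand it.

```shell
# Hypothetical helper: print the chart values fragment shown above, with
# the client ID and remote write endpoint filled in from the arguments.
render_observability_values() {
    # $1 = managed identity client ID, $2 = Prometheus remote write endpoint
    cat <<EOF
global:
  observability:
    auth:
      azure:
        enabled: true
        clientId: $1

    secrets:
      - envVar: AZURE_MONITOR_CONNECTION_STRING
        secretName: datarobot-azure
        secretKey: connection-string

    exporters:
      azuremonitor:
        connection_string: \${env:AZURE_MONITOR_CONNECTION_STRING}
      prometheusremotewrite:
        endpoint: $2
        auth:
          authenticator: azureauth

    signals:
      logs:
        exporters: [azuremonitor]
      metrics:
        exporters: [prometheusremotewrite]
      traces:
        exporters: [azuremonitor]
EOF
}
```

For example, `render_observability_values "$CLIENT_ID" "$PROMETHEUS_REMOTE_WRITE_ENDPOINT" > observability-values.yaml` writes a file that can be merged into the chart values.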

With Azure Monitor only

If you prefer to use Azure Monitor for all signals (without a separate Prometheus workspace for metrics):

global:
  observability:
    auth:
      azure:
        enabled: true
        clientId: <CLIENT_ID>

    secrets:
      - envVar: AZURE_MONITOR_CONNECTION_STRING
        secretName: datarobot-azure
        secretKey: connection-string

    exporters:
      azuremonitor:
        connection_string: ${env:AZURE_MONITOR_CONNECTION_STRING}

    signals:
      logs:
        exporters: [azuremonitor]
      metrics:
        exporters: [azuremonitor]
      traces:
        exporters: [azuremonitor] 

The parameters are described below.

Setting auth.azure.enabled: true automatically:

  • Adds the azure.workload.identity/client-id annotation with the provided clientId to all collector serviceAccounts
  • Injects the azureauth extension for authenticating with Azure services