Azure with Application Insights/Azure Monitor¶
This section shows how to configure the chart and provision the infrastructure to observe DataRobot on Azure managed services.
Requirements¶
An OIDC provider and the managed identity assigned to the resource group containing the VMs that run the workloads must be configured first. Refer to the Azure - Kubernetes Service (AKS) documentation in the installation guide.
Once the managed identity is in place, create an Application Insights resource
and assign the Monitoring Metrics Publisher role to that managed identity,
scoped to the Application Insights resource.
For metrics, Prometheus (backed by an Azure Monitor workspace) and Grafana are recommended for storage and visualization, respectively, because of the limitations of Application Insights.
Note: the Azure CLI commands below assume that AZURE_LOCATION has been set.
export AZURE_LOCATION="<AZURE_LOCATION>"
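Because several commands in this section depend on shell variables being set, a small guard can catch a missing value before any resource is created. The following is a minimal sketch; `require_vars` is a hypothetical helper and not part of the Azure CLI or the chart.

```shell
# Hypothetical helper: returns non-zero if any named variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, running `require_vars AZURE_LOCATION || exit 1` at the top of a provisioning script stops it early when the location has not been exported.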
Application Insights¶
First, create a dedicated resource group for all observability-related resources:
RESOURCE_GROUP_NAME="<RESOURCE_GROUP_NAME>"
az group create \
--name "${RESOURCE_GROUP_NAME}" \
--location "${AZURE_LOCATION}"
Then create the Application Insights resource itself:
APP_INSIGHTS_NAME="<APP_INSIGHTS_NAME>"
az monitor app-insights component create \
--app "${APP_INSIGHTS_NAME}" \
--location "${AZURE_LOCATION}" \
--resource-group "${RESOURCE_GROUP_NAME}"
APP_INSIGHTS_ID=$(az monitor app-insights component show \
--app "${APP_INSIGHTS_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--query id \
--output tsv)
As mentioned at the beginning of this section, assign the Monitoring Metrics Publisher role to the principal ID of the managed identity:
MONITORING_METRICS_PUBLISHER_ROLE="Monitoring Metrics Publisher"
AKS_AGENTPOOL_PRINCIPAL_ID="<PRINCIPAL_ID>"
az role assignment create \
--role "${MONITORING_METRICS_PUBLISHER_ROLE}" \
--scope "${APP_INSIGHTS_ID}" \
--assignee "${AKS_AGENTPOOL_PRINCIPAL_ID}"
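To confirm that the assignment took effect, the role assignments for the principal can be listed at the same scope. The following is a minimal sketch; `has_role_assignment` is a hypothetical helper name, and it assumes the Azure CLI is logged in to the right subscription. New role assignments can take a short while to become visible.

```shell
# Hypothetical helper: succeeds when the role is assigned to the principal
# at the given scope.
# Arguments: <role name> <scope resource ID> <assignee principal ID>
has_role_assignment() {
  role="$1"
  scope="$2"
  assignee="$3"
  # length(@) is a JMESPath expression returning the number of matches.
  count=$(az role assignment list \
    --role "${role}" \
    --scope "${scope}" \
    --assignee "${assignee}" \
    --query "length(@)" \
    --output tsv)
  [ "${count:-0}" -gt 0 ]
}
```

It can be called with the variables defined above, for instance `has_role_assignment "${MONITORING_METRICS_PUBLISHER_ROLE}" "${APP_INSIGHTS_ID}" "${AKS_AGENTPOOL_PRINCIPAL_ID}"`.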
Monitor Workspace and Grafana¶
The previous section provisions everything needed to publish logs, metrics, and traces to an Application Insights resource. As noted earlier, however, an Azure Monitor workspace (for Prometheus) and Grafana are recommended for metrics because of the limitations of Application Insights. First, create the Azure Monitor workspace:
MONITOR_WORKSPACE_NAME="<MONITOR_WORKSPACE_NAME>"
az monitor account create \
--name "$MONITOR_WORKSPACE_NAME" \
--resource-group "$RESOURCE_GROUP_NAME"
MONITOR_WORKSPACE_ID=$(az monitor account show \
--name "$MONITOR_WORKSPACE_NAME" \
--resource-group "$RESOURCE_GROUP_NAME" \
--query id \
--output tsv)
Next, create the Grafana instance (11 is the currently available major version):
GRAFANA_NAME="<GRAFANA_NAME>"
az grafana create \
--name "${GRAFANA_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--sku "Standard" \
--grafana-major-version 11
GRAFANA_ID=$(az grafana show \
--name "${GRAFANA_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--query id \
--output tsv)
GRAFANA_PRINCIPAL_ID=$(az grafana show \
--name "${GRAFANA_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--query identity.principalId \
--output tsv)
The identity attached to the Grafana instance requires the Monitoring Data
Reader role on the workspace:
az role assignment create \
--role "Monitoring Data Reader" \
--scope "${MONITOR_WORKSPACE_ID}" \
--assignee "${GRAFANA_PRINCIPAL_ID}"
To write to the workspace, the managed identity also needs the Monitoring
Metrics Publisher role, this time scoped to the Azure-managed Data Collection
Rule (DCR) associated with the workspace:
DCR_RESOURCE_GROUP_NAME="MA_${MONITOR_WORKSPACE_NAME}_${AZURE_LOCATION}_managed"
DCR_NAME="${MONITOR_WORKSPACE_NAME}"
DCR_ID=$(
az monitor data-collection rule show \
--name "${DCR_NAME}" \
--resource-group "${DCR_RESOURCE_GROUP_NAME}" \
--query id \
--output tsv
)
az role assignment create \
--role "${MONITORING_METRICS_PUBLISHER_ROLE}" \
--scope "${DCR_ID}" \
--assignee "${AKS_AGENTPOOL_PRINCIPAL_ID}"
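The DCR lookup above depends on the naming pattern of the Azure-managed resource group. As a sketch of an alternative, the DCR resource ID can also be read directly from the workspace through its `defaultIngestionSettings` property, which is the same property queried in the Prometheus write endpoint section. The function name `workspace_dcr_id` is hypothetical.

```shell
# Hypothetical helper: prints the resource ID of the default Data Collection
# Rule attached to an Azure Monitor workspace.
# Arguments: <workspace name> <resource group name>
workspace_dcr_id() {
  az monitor account show \
    --name "$1" \
    --resource-group "$2" \
    --query 'defaultIngestionSettings.dataCollectionRuleResourceId' \
    --output tsv
}
```

The result can then be used as the role assignment scope, for example `DCR_ID=$(workspace_dcr_id "${MONITOR_WORKSPACE_NAME}" "${RESOURCE_GROUP_NAME}")`.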
Retrieving configuration values¶
Once everything is in place, obtain the input values for the chart and for configuring the Prometheus datasource for Grafana.
Managed identity client ID¶
Note that the resource group here is the one where the managed identity lives, not the one created earlier for the observability resources.
MANAGED_IDENTITY_RESOURCE_GROUP="<MANAGED_IDENTITY_RESOURCE_GROUP>"
MANAGED_IDENTITY_NAME="<MANAGED_IDENTITY_NAME>"
CLIENT_ID=$(az identity show \
--resource-group "${MANAGED_IDENTITY_RESOURCE_GROUP}" \
--name "${MANAGED_IDENTITY_NAME}" \
--query "clientId" \
-o tsv)
Application Insights connection string¶
AZURE_MONITOR_CONNECTION_STRING=$(az monitor app-insights component show \
--app "$APP_INSIGHTS_NAME" \
--resource-group "$RESOURCE_GROUP_NAME" \
--query 'connectionString' \
--output tsv)
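If the query above fails or the resource name is wrong, the variable may end up empty. The following is a minimal sanity check, assuming only that an Application Insights connection string contains an `InstrumentationKey=` segment; `is_connection_string` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the argument looks like an
# Application Insights connection string.
is_connection_string() {
  case "$1" in
    *InstrumentationKey=*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_connection_string "${AZURE_MONITOR_CONNECTION_STRING}" || echo "unexpected connection string" >&2` flags an empty or malformed value before it is stored in a secret.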
Prometheus read endpoint¶
az monitor account show \
--resource-group "${RESOURCE_GROUP_NAME}" \
--name "${MONITOR_WORKSPACE_NAME}" \
--query 'metrics.prometheusQueryEndpoint' \
--output tsv
Prometheus write endpoint¶
The write URL takes a small detour: it is not a property of the workspace itself and must be assembled from the Data Collection Endpoint (DCE) and the Data Collection Rule (DCR):
DCR_RESOURCE_ID=$(az monitor account show \
--resource-group "$RESOURCE_GROUP_NAME" \
--name "$MONITOR_WORKSPACE_NAME" \
--query 'defaultIngestionSettings.dataCollectionRuleResourceId' \
--output tsv
)
DCE_RESOURCE_ID=$(az monitor account show \
--resource-group "$RESOURCE_GROUP_NAME" \
--name "$MONITOR_WORKSPACE_NAME" \
--query 'defaultIngestionSettings.dataCollectionEndpointResourceId' \
--output tsv
)
DCE_INGESTION_URL=$(az monitor data-collection endpoint show \
--ids "$DCE_RESOURCE_ID" \
--query 'metricsIngestion.endpoint' \
--output tsv)
DCR_IMMUTABLE_ID=$(az monitor data-collection rule show \
--ids "$DCR_RESOURCE_ID" \
--query 'immutableId' \
--output tsv)
PROMETHEUS_REMOTE_WRITE_ENDPOINT="${DCE_INGESTION_URL}/dataCollectionRules/${DCR_IMMUTABLE_ID}/streams/Microsoft-PrometheusMetrics/api/v1/write?api-version=2023-04-24"
echo "${PROMETHEUS_REMOTE_WRITE_ENDPOINT}"
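The URL assembly above can be wrapped in a small function so that a trailing slash on the ingestion endpoint does not produce a double slash. This is a sketch only; `build_remote_write_url` is a hypothetical helper, and the default API version is simply the one used in the command above.

```shell
# Hypothetical helper: prints the Prometheus remote-write URL.
# Arguments: <DCE metrics ingestion URL> <DCR immutable ID> [api version]
build_remote_write_url() {
  ingestion_url="${1%/}"            # drop one trailing slash, if present
  immutable_id="$2"
  api_version="${3:-2023-04-24}"    # same version as in the command above
  printf '%s/dataCollectionRules/%s/streams/Microsoft-PrometheusMetrics/api/v1/write?api-version=%s\n' \
    "${ingestion_url}" "${immutable_id}" "${api_version}"
}
```

With the variables from this section it would be called as `build_remote_write_url "${DCE_INGESTION_URL}" "${DCR_IMMUTABLE_ID}"`.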
Configuring the Grafana datasource¶
Note: the Grafana administrator role is required.
- Open the Connections setting on the left menu
- Select + Add new data source
- Select Prometheus
- Optionally, select the default toggle to make it the default datasource
- For Prometheus server URL, enter the URL from Prometheus read endpoint
- Under authentication:
  a. Select Azure auth
  b. For Authentication, select Managed Identity
- Click Save & Test
Full chart configuration¶
Add the following configuration to the datarobot-prime chart values, replacing
the placeholders with the values obtained in the previous sections.
The Azure Monitor connection string is sensitive and must be stored in a
Kubernetes secret. The secret name and key are configurable - they just need
to match what is referenced in the secrets section of the chart configuration
below. For example:
kubectl create secret -n <DR_CORE_NAMESPACE> generic datarobot-azure \
--from-literal=connection-string="<AZURE_MONITOR_CONNECTION_STRING>"
For additional exporter configuration options, refer to the upstream OpenTelemetry documentation for the azuremonitor and prometheusremotewrite exporters.
With Azure Monitor for logs/traces and Prometheus for metrics¶
global:
  observability:
    auth:
      azure:
        enabled: true
        clientId: <CLIENT_ID>
    secrets:
      - envVar: AZURE_MONITOR_CONNECTION_STRING
        secretName: datarobot-azure
        secretKey: connection-string
    exporters:
      azuremonitor:
        connection_string: ${env:AZURE_MONITOR_CONNECTION_STRING}
      prometheusremotewrite:
        endpoint: <PROMETHEUS_REMOTE_WRITE_ENDPOINT>
        auth:
          authenticator: azureauth
    signals:
      logs:
        exporters: [azuremonitor]
      metrics:
        exporters: [prometheusremotewrite]
      traces:
        exporters: [azuremonitor]
With Azure Monitor only¶
If you prefer to use Azure Monitor for all signals (without a separate Prometheus workspace for metrics):
global:
  observability:
    auth:
      azure:
        enabled: true
        clientId: <CLIENT_ID>
    secrets:
      - envVar: AZURE_MONITOR_CONNECTION_STRING
        secretName: datarobot-azure
        secretKey: connection-string
    exporters:
      azuremonitor:
        connection_string: ${env:AZURE_MONITOR_CONNECTION_STRING}
    signals:
      logs:
        exporters: [azuremonitor]
      metrics:
        exporters: [azuremonitor]
      traces:
        exporters: [azuremonitor]
Where:
- <CLIENT_ID>: see Managed identity client ID
- <PROMETHEUS_REMOTE_WRITE_ENDPOINT>: see Prometheus write endpoint
Setting auth.azure.enabled: true automatically:
- Adds the azure.workload.identity/client-id annotation with the provided clientId to all collector serviceAccounts
- Injects the azureauth extension for authenticating with Azure services
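After the chart is deployed, the annotation can be checked from the cluster. The following is a minimal sketch using `kubectl` with a JSONPath template; `list_client_id_annotations` is a hypothetical helper name, and the namespace argument is whichever namespace the collectors run in.

```shell
# Hypothetical helper: prints "<service account>\t<client-id annotation>" for
# every service account in the namespace. The annotation column is empty for
# accounts that do not carry the workload identity annotation.
# Arguments: <namespace>
list_client_id_annotations() {
  kubectl get serviceaccounts \
    --namespace "$1" \
    --output jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.azure\.workload\.identity/client-id}{"\n"}{end}'
}
```

The output covers all service accounts in the namespace, not only the collector ones; the collector service accounts are expected to show the configured client ID.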