Azure with Application Insights/Azure Monitor
This section shows how to configure the chart and provision the infrastructure to observe DataRobot on Azure managed services.
Requirements
The OIDC provider and the managed identity assigned to the resource group that contains the VMs running the workloads must be configured first. Refer to the Azure - Kubernetes Service (AKS) documentation in the installation guide.
Once the managed identity is in place, create an Application Insights resource
and assign the Monitoring Metrics Publisher role to the previously created
managed identity, scoped to that Application Insights resource.
When it comes to metrics, it’s advisable to use Prometheus (with an Azure Monitor Workspace) and Grafana for storage and visualization, respectively, considering the limitations of Azure’s Application Insights.
Note: the Azure CLI commands below assume that AZURE_LOCATION has been set.
export AZURE_LOCATION="<AZURE_LOCATION>"
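Several later commands depend on shell variables set in earlier steps, so a small guard can catch an unset variable before any resource is created. The following is a minimal sketch; the `require_var` helper is hypothetical and is not part of the Azure CLI or the DataRobot tooling.

```shell
# Hypothetical helper: fail early when a required shell variable is unset or empty.
require_var() {
  # $1 is the variable NAME; eval performs the indirect lookup in POSIX sh.
  eval "_require_var_value=\${$1:-}"
  if [ -z "${_require_var_value}" ]; then
    echo "error: variable $1 is not set" >&2
    return 1
  fi
}

# Example usage (after AZURE_LOCATION has been exported):
# require_var AZURE_LOCATION || exit 1
```

The same helper can be reused for the other variables in this section, such as RESOURCE_GROUP_NAME or MONITOR_WORKSPACE_NAME.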
Application Insights
A dedicated resource group can be created first to group all the observability-related resources together:
RESOURCE_GROUP_NAME="<RESOURCE_GROUP_NAME>"
az group create \
--name "${RESOURCE_GROUP_NAME}" \
--location "${AZURE_LOCATION}"
Then, the Application Insights resource itself needs to be created:
APP_INSIGHTS_NAME="<APP_INSIGHTS_NAME>"
az monitor app-insights component create \
--app "${APP_INSIGHTS_NAME}" \
--location "${AZURE_LOCATION}" \
--resource-group "${RESOURCE_GROUP_NAME}"
APP_INSIGHTS_ID=$(az monitor app-insights component show \
--app "${APP_INSIGHTS_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--query id \
--output tsv)
As mentioned at the beginning of this section, the Monitoring Metrics Publisher role must be assigned to the principal ID of the managed identity:
MONITORING_METRICS_PUBLISHER_ROLE="Monitoring Metrics Publisher"
AKS_AGENTPOOL_PRINCIPAL_ID="<PRINCIPAL_ID>"
az role assignment create \
--role "${MONITORING_METRICS_PUBLISHER_ROLE}" \
--scope "${APP_INSIGHTS_ID}" \
--assignee "${AKS_AGENTPOOL_PRINCIPAL_ID}"
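To confirm that the assignment exists, the role assignments for the principal can be listed at the same scope. The sketch below wraps `az role assignment list` in a hypothetical helper named `has_role_assignment`; the helper name and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: succeeds when at least one matching role assignment exists.
# $1: principal (object) ID, $2: scope (resource ID), $3: role name
has_role_assignment() {
  _count=$(az role assignment list \
    --assignee "$1" \
    --scope "$2" \
    --role "$3" \
    --query "length(@)" \
    --output tsv) || return 1
  # The JMESPath expression length(@) returns the number of matching assignments.
  [ "${_count:-0}" -gt 0 ]
}

# Example usage:
# has_role_assignment "${AKS_AGENTPOOL_PRINCIPAL_ID}" "${APP_INSIGHTS_ID}" \
#   "${MONITORING_METRICS_PUBLISHER_ROLE}" && echo "role assigned"
```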
Monitor Workspace and Grafana
The previous section provisions the resources for publishing logs, metrics, and traces to Application Insights. As noted in the requirements, an Azure Monitor Workspace (for Prometheus) and Grafana are recommended for metrics because of the limitations of Application Insights. First, create the Monitor Workspace:
MONITOR_WORKSPACE_NAME="<MONITOR_WORKSPACE_NAME>"
az monitor account create \
--name "$MONITOR_WORKSPACE_NAME" \
--resource-group "$RESOURCE_GROUP_NAME"
MONITOR_WORKSPACE_ID=$(az monitor account show \
--name "$MONITOR_WORKSPACE_NAME" \
--resource-group "$RESOURCE_GROUP_NAME" \
--query id \
--output tsv)
Next, create the Grafana instance (version 11 is the one currently available):
GRAFANA_NAME="<GRAFANA_NAME>"
az grafana create \
--name "${GRAFANA_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--sku "Standard" \
--grafana-major-version 11
GRAFANA_ID=$(az grafana show \
--name "${GRAFANA_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--query id \
--output tsv)
GRAFANA_PRINCIPAL_ID=$(az grafana show \
--name "${GRAFANA_NAME}" \
--resource-group "${RESOURCE_GROUP_NAME}" \
--query identity.principalId \
--output tsv)
The identity attached to the Grafana instance requires the Monitoring Data
Reader role:
az role assignment create \
--role "Monitoring Data Reader" \
--scope "${MONITOR_WORKSPACE_ID}" \
--assignee "${GRAFANA_PRINCIPAL_ID}"
For writing to the workspace, the Monitoring Metrics Publisher role also needs
to be assigned to the managed identity, this time scoped to the (Azure-managed)
Data Collection Rule associated with the workspace:
DCR_RESOURCE_GROUP_NAME="MA_${MONITOR_WORKSPACE_NAME}_${AZURE_LOCATION}_managed"
DCR_NAME="${MONITOR_WORKSPACE_NAME}"
DCR_ID=$(
az monitor data-collection rule show \
--name "${DCR_NAME}" \
--resource-group "${DCR_RESOURCE_GROUP_NAME}" \
--query id \
--output tsv
)
az role assignment create \
--role "${MONITORING_METRICS_PUBLISHER_ROLE}" \
--scope "${DCR_ID}" \
--assignee "${AKS_AGENTPOOL_PRINCIPAL_ID}"
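The managed resource group name used above is composed from the workspace name and the location. As a minimal sketch, the composition can be isolated in a small function so that the pattern is easy to inspect; `dcr_resource_group_name` is a hypothetical helper, and the pattern itself is taken from the command above rather than verified against a live subscription.

```shell
# Hypothetical helper: compose the managed resource group name for a workspace.
# $1: Azure Monitor Workspace name, $2: Azure location (for example "westeurope")
dcr_resource_group_name() {
  printf 'MA_%s_%s_managed\n' "$1" "$2"
}

# Example usage:
# DCR_RESOURCE_GROUP_NAME=$(dcr_resource_group_name "${MONITOR_WORKSPACE_NAME}" "${AZURE_LOCATION}")
```

If the composed name does not match an existing resource group, the Data Collection Rule resource ID can instead be read from the workspace itself through the `defaultIngestionSettings.dataCollectionRuleResourceId` property, as done in the Prometheus write endpoint step.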
Retrieving configuration values
Once everything is in place, collect the input values for the chart and for configuring the Prometheus datasource in Grafana.
Managed identity client ID
Note that the resource group here is the one where the managed identity lives, not the one created earlier for the observability resources.
MANAGED_IDENTITY_RESOURCE_GROUP="<MANAGED_IDENTITY_RESOURCE_GROUP>"
MANAGED_IDENTITY_NAME="<MANAGED_IDENTITY_NAME>"
CLIENT_ID=$(az identity show \
--resource-group "${MANAGED_IDENTITY_RESOURCE_GROUP}" \
--name "${MANAGED_IDENTITY_NAME}" \
--query "clientId" \
-o tsv)
Application Insights connection string
AZURE_MONITOR_CONNECTION_STRING=$(az monitor app-insights component show \
--app "$APP_INSIGHTS_NAME" \
--resource-group "$RESOURCE_GROUP_NAME" \
--query 'connectionString' \
--output tsv)
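Because an empty or malformed value here only surfaces later as an exporter error, a quick shape check on the retrieved string can help. The sketch below assumes that a valid connection string contains an `InstrumentationKey=` segment; `is_connection_string` is a hypothetical helper and only checks the shape, not whether the key is valid.

```shell
# Hypothetical helper: basic shape check for an Application Insights connection string.
# Assumption: a valid connection string contains an "InstrumentationKey=" segment.
is_connection_string() {
  case "$1" in
    *InstrumentationKey=*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
# is_connection_string "${AZURE_MONITOR_CONNECTION_STRING}" || echo "unexpected value" >&2
```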
Prometheus read endpoint
az monitor account show \
--resource-group "${RESOURCE_GROUP_NAME}" \
--name "${MONITOR_WORKSPACE_NAME}" \
--query 'metrics.prometheusQueryEndpoint' \
--output tsv
Prometheus write endpoint
The write URL is not a property of the Monitor Workspace. It has to be assembled from the Data Collection Endpoint (DCE) and the Data Collection Rule (DCR) associated with the workspace:
DCR_RESOURCE_ID=$(az monitor account show \
--resource-group "$RESOURCE_GROUP_NAME" \
--name "$MONITOR_WORKSPACE_NAME" \
--query 'defaultIngestionSettings.dataCollectionRuleResourceId' \
--output tsv
)
DCE_RESOURCE_ID=$(az monitor account show \
--resource-group "$RESOURCE_GROUP_NAME" \
--name "$MONITOR_WORKSPACE_NAME" \
--query 'defaultIngestionSettings.dataCollectionEndpointResourceId' \
--output tsv
)
DCE_INGESTION_URL=$(az monitor data-collection endpoint show \
--ids "$DCE_RESOURCE_ID" \
--query 'metricsIngestion.endpoint' \
--output tsv)
DCR_IMMUTABLE_ID=$(az monitor data-collection rule show \
--ids "$DCR_RESOURCE_ID" \
--query 'immutableId' \
--output tsv)
PROMETHEUS_REMOTE_WRITE_ENDPOINT="${DCE_INGESTION_URL}/dataCollectionRules/${DCR_IMMUTABLE_ID}/streams/Microsoft-PrometheusMetrics/api/v1/write?api-version=2023-04-24"
echo "${PROMETHEUS_REMOTE_WRITE_ENDPOINT}"
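The URL assembly above can also be expressed as a function that refuses to build a URL from empty inputs, which is the typical symptom when one of the preceding queries returned nothing. This is a minimal sketch: `prometheus_remote_write_endpoint` is a hypothetical helper, and the path and `api-version` are copied from the command above.

```shell
# Hypothetical helper: compose the remote write URL from its two inputs.
# $1: DCE metrics ingestion URL, $2: DCR immutable ID
prometheus_remote_write_endpoint() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "error: ingestion URL and immutable ID are both required" >&2
    return 1
  fi
  # Strip one trailing slash so the joined URL has no "//" in its path.
  _base=${1%/}
  printf '%s/dataCollectionRules/%s/streams/Microsoft-PrometheusMetrics/api/v1/write?api-version=2023-04-24\n' \
    "${_base}" "$2"
}

# Example usage:
# PROMETHEUS_REMOTE_WRITE_ENDPOINT=$(prometheus_remote_write_endpoint \
#   "${DCE_INGESTION_URL}" "${DCR_IMMUTABLE_ID}") || exit 1
```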
Configuring the Grafana datasource
Note: the Grafana administrator role is required.
- Open the Connections menu on the left
- Select + Add new data source
- Select Prometheus
- Optionally, select the default toggle to make it the default datasource
- For Prometheus server URL, enter the URL from Prometheus read endpoint
- Under authentication:
  a. Select Azure auth
  b. For Authentication, select Managed Identity
- Click Save & Test
Full chart configuration
A full working example of the configuration can be found in the
datarobot-prime/charts/datarobot-observability-core/examples/aks.values.yaml
file in the DataRobot tarball.
In the minimal configuration without additional custom processors (see extending pipelines with custom processors), the values to update are the following:
* CLIENT_ID: see Managed Identity client ID
* AZURE_MONITOR_CONNECTION_STRING: see Application Insights connection string
If Prometheus via Monitor Workspace was provisioned, the additional settings are:
* PROMETHEUS_REMOTE_WRITE_ENDPOINT: see Prometheus write
endpoint
* METRICS_EXPORTER: it must be either azuremonitor or prometheusremotewrite
(default), depending on where the metrics are going to be exported
For additional exporter configuration, check the exporter definition where these values are referenced; it includes a link to the upstream exporter documentation.
Once the values are set, DataRobot can be installed/upgraded by specifying the
path to this file with the -f option to the helm command.
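As an illustration of passing the values file with -f, the sketch below wraps `helm upgrade --install` in a hypothetical function that first checks that the file exists. The release name and chart reference are placeholders; the actual names, namespace, and any additional flags come from the installation guide.

```shell
# Hypothetical wrapper: release name and chart reference are placeholders.
# $1: release name, $2: chart reference, $3: path to the values file
install_with_values() {
  if [ ! -f "$3" ]; then
    echo "error: values file not found: $3" >&2
    return 1
  fi
  # "upgrade --install" installs the release if it is absent and upgrades it otherwise.
  helm upgrade --install "$1" "$2" -f "$3"
}

# Example usage (placeholders, not the actual DataRobot release or chart names):
# install_with_values "<RELEASE_NAME>" "<CHART_REFERENCE>" ./aks.values.yaml
```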