# Service Health tab

> Service Health tab - How to use the Service Health tab, which tracks metrics for how quickly a
> deployment responds to prediction requests to find bottlenecks and assess capacity.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.578636+00:00` (UTC).

## Primary page

- [Service Health tab](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html): Full documentation for this topic (HTML).

## Sections on this page

- [Use the time range and resolution dropdowns](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html#use-the-time-range-and-resolution-dropdowns): In-page section heading.
- [Understand the metric tiles](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html#understand-the-metric-tiles): In-page section heading.
- [Understand the Service Health chart](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html#understand-the-service-health-chart): In-page section heading.
- [View MLOps Logs](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html#view-mlops-logs): In-page section heading.

## Related documentation

- [Classic UI documentation](https://docs.datarobot.com/en/docs/classic-ui/index.html): Linked from this page.
- [MLOps](https://docs.datarobot.com/en/docs/classic-ui/mlops/index.html): Linked from this page.
- [Performance monitoring](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/index.html): Linked from this page.
- [Data drift](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/data-drift.html): Linked from this page.
- [Accuracy](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/deploy-accuracy.html): Linked from this page.
- [Prediction History and Service Health](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/deploy-methods/add-deploy-info.html#prediction-history-and-service-health): Linked from this page.
- [segmented analysis](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/deploy-segment.html): Linked from this page.
- [agent-monitored deployments](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/mlops-agent/monitoring-agent/index.html): Linked from this page.
- [prediction monitoring job](https://docs.datarobot.com/en/docs/classic-ui/predictions/batch/pred-monitoring-jobs/index.html): Linked from this page.
- [Deployments](https://docs.datarobot.com/en/docs/classic-ui/mlops/manage-mlops/deploy-inventory.html): Linked from this page.

## Documentation content

# Service Health tab

The Service Health tab tracks metrics about a deployment's ability to respond to prediction requests quickly and reliably. This helps identify bottlenecks and assess capacity, which is critical to proper provisioning.

For example, if a model seems to have generally slowed in its response times, the Service Health tab for the model's deployment can help. You might notice in the tab that median latency rises with an increase in prediction requests. If latency increases after a new model is swapped in, you can consult with your team about replacing it with a model that offers better performance.

To access Service Health, select an individual deployment from the deployment inventory page and, from the resulting Overview page, choose the Service Health tab. The tab provides informational [tiles](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html#understand-the-metric-tiles) and a [chart](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html#understand-the-service-health-chart) to help assess the activity level and health of the deployment.

> [!NOTE] Time of Prediction
> The Time of Prediction value differs between the [Data drift](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/data-drift.html) and [Accuracy](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/deploy-accuracy.html) tabs and the [Service health](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/service-health.html) tab:
> 
> On the Service health tab, the "time of prediction request" is *always* the time the prediction server *received* the prediction request. This method of prediction request tracking accurately represents the prediction service's health for diagnostic purposes.
> 
> On the Data drift and Accuracy tabs, the "time of prediction request" is, *by default*, the time you *submitted* the prediction request, which you can override with the prediction timestamp in the [Prediction History and Service Health](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/deploy-methods/add-deploy-info.html#prediction-history-and-service-health) settings.
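The distinction above is essentially a bucketing choice: the same prediction event can land in different time buckets depending on which timestamp you key on. The following is an illustrative Python sketch, not DataRobot code; the `submitted_at` and `received_at` field names are hypothetical:

```python
from collections import Counter
from datetime import datetime

# Hypothetical prediction-request records. A request submitted just before
# midnight may be *received* by the server just after midnight.
events = [
    {"submitted_at": datetime(2026, 4, 23, 23, 58), "received_at": datetime(2026, 4, 24, 0, 1)},
    {"submitted_at": datetime(2026, 4, 24, 0, 5), "received_at": datetime(2026, 4, 24, 0, 6)},
]

def bucket_by_hour(events, key):
    """Count events per hour, keyed on the chosen timestamp field."""
    return Counter(e[key].replace(minute=0, second=0, microsecond=0) for e in events)

# Service Health-style bucketing: when the prediction server received the request.
by_received = bucket_by_hour(events, "received_at")
# Data Drift / Accuracy default: when the client submitted the request.
by_submitted = bucket_by_hour(events, "submitted_at")
```

Keying on `received_at` puts both events in the same hour, while keying on `submitted_at` splits them across the midnight boundary, which is why the two tab families can disagree about counts near bucket edges.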

## Use the time range and resolution dropdowns

The controls—model version and data time range selectors—work the same as those available on the [Data Drift](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/data-drift.html#use-the-time-range-and-resolution-dropdowns) tab. The Service Health tab also supports [segmented analysis](https://docs.datarobot.com/en/docs/classic-ui/mlops/monitor/deploy-segment.html), allowing you to view service health statistics for individual segment attributes and values.

## Understand the metric tiles

DataRobot displays informational statistics based on your current model and time frame settings. Tile values use the same interval as the resolution selected on the slider; for example, if the slider interval is weekly, the tile metrics report weekly values. Clicking a metric tile updates the chart below.
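As a rough sketch of how tile values track the slider resolution, the hypothetical Python below rolls per-day request counts up into the selected interval. The field names and the Monday-start week convention are assumptions for illustration, not DataRobot internals:

```python
from collections import defaultdict
from datetime import date, timedelta

# Illustrative raw data: requests counted per day.
daily_counts = {
    date(2026, 4, 20): 120,  # a Monday
    date(2026, 4, 21): 95,
    date(2026, 4, 27): 80,   # the following Monday
}

def rollup(daily_counts, resolution="weekly"):
    """Sum per-day counts into buckets at the requested resolution."""
    buckets = defaultdict(int)
    for day, n in daily_counts.items():
        if resolution == "weekly":
            # Snap each day back to the Monday that starts its week.
            key = day - timedelta(days=day.weekday())
        else:
            key = day
        buckets[key] += n
    return dict(buckets)

weekly = rollup(daily_counts, "weekly")
```

With a weekly resolution, the first two days collapse into one bucket (215 requests for the week of April 20), matching how a weekly slider setting makes each tile report week-level totals.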

The Service health tab reports the following metrics on the dashboard:

> [!NOTE] Service health information for external models and monitoring jobs
> Service health information is unavailable for external [agent-monitored deployments](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/mlops-agent/monitoring-agent/index.html) and deployments with predictions uploaded through a [prediction monitoring job](https://docs.datarobot.com/en/docs/classic-ui/predictions/batch/pred-monitoring-jobs/index.html).

| Statistic | Reports for selected time period... |
| --- | --- |
| Total Predictions | The number of predictions the deployment has made (per prediction node). |
| Total Requests | The number of prediction requests the deployment has received (a single request can contain multiple predictions). |
| Requests over... | The number of requests where the response time was longer than the specified number of milliseconds. The default is 2000 ms; click in the box to enter a time between 10 and 100,000 ms or adjust with the controls. |
| Response Time | The time (in milliseconds) DataRobot spent receiving a prediction request, computing the prediction, and returning a response to the user. The report does not include time due to network latency. Select the median prediction request time or the 90th, 95th, or 99th percentile. The display reports a dash if no requests have been made against the deployment or if it's an external deployment. |
| Execution Time | The time (in milliseconds) DataRobot spent calculating a prediction request. Select the median prediction request time or 90th, 95th, or 99th percentile. |
| Median/Peak Load | The median and maximum number of requests per minute. |
| Data Error Rate | The percentage of requests that result in a 4xx error (problems with the prediction request submission). This is a component of the value reported as the Service Health Summary in the Deployments page top banner. |
| System Error Rate | The percentage of well-formed requests that result in a 5xx error (problem with the DataRobot prediction server). This is a component of the value reported as the Service Health Summary in the Deployments page top banner. |
| Consumers | The number of distinct users (identified by API key) who have made prediction requests against this deployment. |
| Cache Hit Rate | The percentage of requests that used a cached model (the model was recently used by other predictions). If the model is not cached, DataRobot has to look it up, which can cause delays. The prediction server cache holds 16 models by default, dropping the least recently used model when the limit is reached. |
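Several of the tiles above (Data Error Rate, System Error Rate, Response Time percentiles) can be approximated from a raw request log. The sketch below is hypothetical Python, not DataRobot's implementation; in particular, the nearest-rank percentile definition is an assumption, since the exact percentile method is not documented here:

```python
import math

# Hypothetical request log: (HTTP status code, response time in ms).
requests = [
    (200, 120), (200, 340), (422, 15), (200, 95),
    (500, 2050), (200, 180), (400, 12), (200, 260),
]

def error_rates(requests):
    """Data Error Rate = 4xx share of all requests;
    System Error Rate = 5xx share of well-formed (non-4xx) requests."""
    total = len(requests)
    data_errors = sum(1 for status, _ in requests if 400 <= status < 500)
    well_formed = total - data_errors
    system_errors = sum(1 for status, _ in requests if status >= 500)
    return data_errors / total, system_errors / well_formed

def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

data_rate, system_rate = error_rates(requests)
p90_latency = percentile([ms for _, ms in requests], 90)
```

Note how the denominators differ, mirroring the table: Data Error Rate is taken over all requests, while System Error Rate is taken only over well-formed requests.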

## Understand the Service Health chart

The chart below the tiled metrics displays individual metrics over time, helping to identify patterns in the quality of service. Clicking a metric tile updates the chart to show that metric; you can also export the chart. Adjust the data range slider to zero in on a specific period.

Some charts display multiple metrics.

## View MLOps Logs

On the MLOps Logs tab, you can view important deployment events. These events can help diagnose issues with a deployment or provide a record of the actions leading to the current state of the deployment. Each event has a type and a status. You can filter the event log by event type, event status, or time of occurrence, and you can view more details for an event on the Event Details panel.

1. On a deployment's **Service Health** page, scroll to the **Recent Activity** section at the bottom of the page.
2. In the **Recent Activity** section, click **MLOps Logs**.
3. Under **MLOps Logs**, configure any of the following filters:
    - **Categories**: displays log events by deployment feature. The default filter displays all event categories.
        - **Accuracy**: events related to actuals processing.
        - **Challengers**: events related to challengers functionality.
        - **Monitoring**: events related to general deployment actions; for example, model replacements or clearing deployment stats.
        - **Predictions**: events related to predictions processing.
        - **Retraining**: events related to deployment retraining functionality.
    - **Status Type**: displays events by status (**Success**, **Warning**, **Failure**, or **Info**). The default filter displays **Any** status type.
    - **Range (UTC)**: displays events logged within the specified range (UTC). The default filter displays the last seven days up to the current date and time.

    The following errors and events are surfaced in the MLOps Logs:

    - Actuals with missing values
    - Actuals with duplicate association ID
    - Actuals with invalid payload
    - Challenger created
    - Challenger deleted
    - Challenger replay error
    - Challenger model validation error
    - Custom model deployment creation started
    - Custom model deployment creation completed
    - Custom model deployment creation failed
    - Deployment historical stats reset
    - Failed to establish training data baseline
    - Model replacement validation warning
    - Prediction processing limit reached
    - Predictions missing required association ID
    - Reason codes (prediction explanations) preview failed
    - Reason codes (prediction explanations) preview started
    - Retraining policy success
    - Retraining policy error
    - Training data baseline calculation started
4. On the left panel, the **MLOps Logs** list displays deployment events with any selected filters applied. For each event, you can view a summary that includes the event name and status icon, the timestamp, and an event message preview.
5. Click the event you want to examine and review the **Event Details** panel on the right. This panel includes the following general event details:
    - Title
    - Status Type (with a success, warning, failure, or info label)
    - Timestamp
    - Message (with text describing the event)

    You can also view the following event-specific details if applicable to the current event:

    - Model ID
    - Model Package ID / Registered Model Version ID (with a link to the package in the Model Registry if MLOps is enabled)
    - Catalog ID (with a link to the dataset in the AI Catalog)
    - Challenger ID
    - Prediction Job ID (for the related batch prediction job)
    - Affected Indexes (with a list of indexes related to the error event)
    - Start/End Date (for events covering a specified period; for example, resetting deployment stats)

    > [!TIP]
    > For ID fields without a link, you can copy the ID by clicking the copy button.
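The category, status, and time-range filters described above compose straightforwardly. The following is an illustrative Python sketch of that filtering logic; the record fields and the `filter_events` helper are hypothetical, not part of any DataRobot API:

```python
from datetime import datetime, timedelta

# Illustrative event records; field names mirror the UI filters.
events = [
    {"category": "Predictions", "status": "Failure",
     "timestamp": datetime(2026, 4, 23, 10, 0),
     "title": "Prediction processing limit reached"},
    {"category": "Retraining", "status": "Success",
     "timestamp": datetime(2026, 4, 10, 9, 0),
     "title": "Retraining policy success"},
]

def filter_events(events, categories=None, statuses=None, start=None, end=None):
    """Apply the Categories, Status Type, and Range (UTC) filters.
    A value of None means 'Any' for that filter."""
    out = []
    for e in events:
        if categories and e["category"] not in categories:
            continue
        if statuses and e["status"] not in statuses:
            continue
        if start and e["timestamp"] < start:
            continue
        if end and e["timestamp"] > end:
            continue
        out.append(e)
    return out

# Default range: the last seven days up to now (pinned here for reproducibility).
now = datetime(2026, 4, 24, 12, 0)
recent_failures = filter_events(
    events, statuses={"Failure"}, start=now - timedelta(days=7), end=now
)
```

With the default seven-day range and a **Failure** status filter, only the recent prediction-processing failure survives; the older retraining success is filtered out by both the status and the range.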
