
Service health

The Service health tab tracks metrics about a deployment's ability to respond to prediction requests quickly and reliably. These metrics help you identify bottlenecks and assess capacity, which is critical to proper provisioning. For example, if a model's response times seem to have slowed overall, the Service health tab for the model's deployment can help you investigate. You might notice that median latency rises as the number of prediction requests increases. If latency increases after a new model is switched in, you can consult with your team to determine whether to replace the new model with one that offers better performance.

To access Service health, select an individual deployment from the deployment inventory page and then, from the Overview, click Monitoring > Service health. The tab provides informational tiles and a chart to help assess the activity level and health of the deployment.

Time of Prediction

The Time of Prediction value differs between the Data drift and Accuracy tabs and the Service health tab:

On the Service health tab, the "time of prediction request" is always the time the prediction server received the prediction request. This method of prediction request tracking accurately represents the prediction service's health for diagnostic purposes.
On the Data drift and Accuracy tabs, the "time of prediction request" is, by default, the time you submitted the prediction request, which you can override with the prediction timestamp in the Prediction History and Service Health settings.
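To make the distinction concrete, the following Python sketch picks the timestamp each tab would use for a single prediction. The `PredictionRecord` fields, the tab keys, and the `time_of_prediction` function are hypothetical names chosen for illustration; they are not part of a DataRobot API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PredictionRecord:
    """Hypothetical record of one prediction request."""
    submitted_at: datetime  # when the client submitted the request
    received_at: datetime  # when the prediction server received the request
    prediction_timestamp: Optional[datetime] = None  # optional override timestamp


def time_of_prediction(record: PredictionRecord, tab: str) -> datetime:
    """Return the "time of prediction request" a given tab would use."""
    if tab == "service_health":
        # Service health always uses the server's receipt time.
        return record.received_at
    if tab in ("data_drift", "accuracy"):
        # Drift and accuracy default to submission time unless a
        # prediction timestamp overrides it.
        if record.prediction_timestamp is not None:
            return record.prediction_timestamp
        return record.submitted_at
    raise ValueError(f"unknown tab: {tab!r}")
```

Because Service health ignores any override, a backdated prediction timestamp changes where a prediction lands on the Data drift and Accuracy tabs but not on the Service health tab.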

Understand metric tiles and chart

DataRobot displays informational statistics based on your current model and time frame settings. Tile values use the same time units as the slider: if the slider interval is weekly, the tiles show weekly values. Clicking a metric tile updates the chart below.
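As an illustration of how values can follow the slider's units, this Python sketch truncates request timestamps to hourly, daily, weekly, or monthly buckets and counts requests per bucket. The function names are hypothetical, and weeks are assumed to start on Monday; this is not DataRobot's implementation.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts: datetime, resolution: str) -> datetime:
    """Truncate a timestamp to the start of its hourly/daily/weekly/monthly bucket."""
    if resolution == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if resolution == "daily":
        return day
    if resolution == "weekly":
        # Assumption: weeks start on Monday (datetime.weekday() == 0).
        return day - timedelta(days=day.weekday())
    if resolution == "monthly":
        return day.replace(day=1)
    raise ValueError(f"unknown resolution: {resolution!r}")


def requests_per_bucket(timestamps, resolution: str) -> Counter:
    """Count requests per bucket at the chosen resolution."""
    return Counter(bucket_start(ts, resolution) for ts in timestamps)
```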

The Service health tab reports the following metrics on the dashboard:

Service health information for external models and monitoring jobs

Service health information such as latency, throughput, and error rate is unavailable for external, agent-monitored deployments or when predictions are uploaded through a prediction monitoring job.

Statistic	Reports (for selected time period)
Total Predictions	The number of predictions the deployment has made (per prediction node).
Total Requests	The number of prediction requests the deployment has received (a single request can contain multiple predictions).
Requests over `x` ms	The number of requests where the response time was longer than the specified number of milliseconds. The default is 2000 ms; click in the box to enter a time between 10 and 100,000 ms or adjust with the controls.
Response Time	The time (in milliseconds) DataRobot spent receiving a prediction request, calculating the prediction, and returning a response to the user. The report does not include time due to network latency. Select the median prediction request time or the 90th, 95th, or 99th percentile. The tile displays a dash if no requests have been made against the deployment or if it is an external deployment.
Execution Time	The time (in milliseconds) DataRobot spent calculating a prediction request. Select the median prediction request time or 90th, 95th, or 99th percentile.
Median/Peak Load	The median and maximum number of requests per minute.
Data Error Rate	The percentage of requests that result in a 4xx error (problems with the prediction request submission). This is a component of the value reported as the Service Health Summary on the Deployments dashboard top banner.
System Error Rate	The percentage of well-formed requests that result in a 5xx error (problem with the DataRobot prediction server). This is a component of the value reported as the Service Health Summary on the Deployments dashboard top banner.
Consumers	The number of distinct users (identified by API key) who have made prediction requests against this deployment.
Cache Hit Rate	The percentage of requests that used a cached model (the model was recently used by other predictions). If not cached, DataRobot has to look the model up, which can cause delays. The prediction server cache holds 16 models by default, dropping the least-used model when the limit is reached.
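The following Python sketch shows how several of the statistics above could be derived from a log of prediction requests for one time period. It is an illustration under stated assumptions, not DataRobot's implementation: the `RequestLog` record and `summarize` function are hypothetical, percentiles use the nearest-rank method, and a "well-formed" request is assumed to be any request that did not return a 4xx status.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestLog:
    """Hypothetical log entry for one prediction request."""
    api_key: str  # consumer identity (Consumers counts distinct keys)
    status: int  # HTTP status code returned
    response_ms: float  # receive + compute + respond, excluding network latency
    execution_ms: float  # time spent computing the predictions only
    rows: int  # number of predictions in this request


def nearest_rank(values, pct):
    """Nearest-rank percentile (one common definition, assumed here)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(logs, threshold_ms=2000):
    """Compute tile-style statistics for one time period."""
    if not logs:
        return None  # the tiles show a dash when there are no requests
    data_errors = [r for r in logs if 400 <= r.status < 500]
    # Assumption: "well-formed" means the request did not fail with a 4xx.
    well_formed = [r for r in logs if not 400 <= r.status < 500]
    system_errors = [r for r in well_formed if 500 <= r.status < 600]
    latencies = [r.response_ms for r in logs]
    executions = [r.execution_ms for r in logs]
    return {
        "total_predictions": sum(r.rows for r in logs),
        "total_requests": len(logs),
        "requests_over_threshold": sum(1 for ms in latencies if ms > threshold_ms),
        "response_time_median": median(latencies),
        "response_time_p90": nearest_rank(latencies, 90),
        "response_time_p99": nearest_rank(latencies, 99),
        "execution_time_median": median(executions),
        "data_error_rate": 100 * len(data_errors) / len(logs),
        "system_error_rate": (100 * len(system_errors) / len(well_formed)) if well_formed else 0.0,
        "consumers": len({r.api_key for r in logs}),
    }
```

Computing the system error rate over well-formed requests only keeps client-side mistakes (4xx) from diluting the signal about server-side failures (5xx).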

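The cache behavior described for Cache Hit Rate can be sketched as a fixed-capacity cache in Python. This is a hypothetical illustration: `ModelCache` and its `loader` callable are invented names, and the sketch reads "least-used" as least recently used (LRU) eviction, which is an assumption.

```python
from collections import OrderedDict


class ModelCache:
    """Hypothetical LRU cache holding at most `capacity` models (16 by default)."""

    def __init__(self, loader, capacity=16):
        self._loader = loader  # callable that fetches a model by id (hypothetical)
        self._capacity = capacity
        self._models = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, model_id):
        if model_id in self._models:
            self._models.move_to_end(model_id)  # mark as most recently used
            self.hits += 1
            return self._models[model_id]
        self.misses += 1
        model = self._loader(model_id)  # slow path: look the model up
        self._models[model_id] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)  # evict the least recently used model
        return model

    @property
    def hit_rate(self):
        """Percentage of lookups served from the cache."""
        total = self.hits + self.misses
        return 100 * self.hits / total if total else 0.0
```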
You can configure the dashboard to focus the visualized statistics on specific segments and time frames. The following controls are available:

Control	Description
Model	Updates the dashboard displays to reflect the model you selected from the dropdown.
Range (UTC)	Sets the date range displayed for the deployment date slider. You can also drag the date slider to set the range. The range selector only allows you to select dates and times between the start date of the deployment's current version of a model and the current date.
Resolution	Sets the time granularity of the deployment date slider. The following resolution settings are available, based on the selected range: Hourly, if the range is less than 7 days; Daily, if the range is between 1 and 60 days (inclusive); Weekly, if the range is between 1 and 52 weeks (inclusive); Monthly, if the range is at least 1 month and less than 120 months.
Segment Attribute	Sets the segment to filter the dashboard by.
Segment Value	Sets a specific value within a segment to filter the dashboard by.
Refresh	Initiates an on-demand update of the dashboard with new data. Otherwise, DataRobot refreshes the dashboard every 15 minutes.
Reset	Reverts the dashboard controls to the default settings.
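The Resolution thresholds listed above can be expressed as a small Python function that returns the resolutions available for a given range. The function names are hypothetical, and counting a month as a complete calendar month is an assumption; DataRobot's exact boundary handling may differ.

```python
from datetime import datetime
from typing import List


def whole_months(start: datetime, end: datetime) -> int:
    """Number of complete calendar months between two datetimes."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months


def available_resolutions(start: datetime, end: datetime) -> List[str]:
    """Resolutions offered for a date range, per the documented thresholds."""
    days = (end - start).total_seconds() / 86400
    options = []
    if days < 7:
        options.append("hourly")
    if 1 <= days <= 60:
        options.append("daily")
    if 7 <= days <= 52 * 7:
        options.append("weekly")
    if 1 <= whole_months(start, end) < 120:
        options.append("monthly")
    return options
```

Because the thresholds overlap, a single range usually offers more than one resolution; a 30-day range, for instance, can be viewed daily or weekly.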

The chart below the metric tiles displays individual metrics over time, helping to identify patterns in the quality of service. Clicking a metric tile updates the chart to represent that information; adjusting the date range slider focuses the chart on a specific period.

Export charts

Click Export to download a .csv or .png file of the currently selected chart, or a .zip archive that contains both files along with a .json file.

The Median | Peak Load (calls/minute) chart displays two lines over time, one for peak load and one for median load.
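A single Median | Peak Load point can be approximated by counting requests per minute within one chart interval and taking the median and maximum of those counts. In this hypothetical Python sketch, only minutes that received at least one request are counted; whether DataRobot includes idle minutes in its median is an assumption not stated here.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def load_per_minute(timestamps) -> Counter:
    """Count requests in each calendar minute that saw at least one request."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def median_and_peak_load(timestamps):
    """Return (median, peak) requests per minute for one chart interval."""
    counts = list(load_per_minute(timestamps).values())
    if not counts:
        return (0, 0)
    return (median(counts), max(counts))
```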
