Monitoring
To trust a model to power mission-critical operations, you must have confidence in every aspect of its deployment. By closely tracking the performance of models in production, you can identify potential issues before they impact business operations. Monitoring ranges from confirming that the service returns predictions promptly and without errors to verifying that the predictions themselves remain reliable.
The predictive performance of a model typically starts to diminish as soon as it’s deployed. For example, a model might be making live predictions on customer data, but the customers’ behavioral patterns might have changed due to an economic crisis, market volatility, a natural disaster, or even the weather. Models trained on older data that no longer represents current reality might be not just inaccurate but irrelevant, leaving the prediction results meaningless or even harmful. Without dedicated production model monitoring, you cannot detect when this happens. If model accuracy declines undetected, the results can impact a business, expose it to risk, and destroy user trust.
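To make this concrete, below is a minimal sketch of one common way to quantify such a shift, the Population Stability Index (PSI), which compares a feature's training distribution to its recent production distribution. This is a generic illustration rather than DataRobot's internal drift calculation; the function name, bin count, and thresholds are assumptions.

```python
import numpy as np

def population_stability_index(training, production, bins=10):
    """Quantify how far a feature's production distribution has
    shifted from its training distribution.

    A common rule of thumb (illustrative, not a DataRobot default):
    PSI < 0.1 is stable, 0.1-0.25 is a moderate shift, and > 0.25
    usually warrants investigation.
    """
    # Derive bin edges from the training data so both samples are
    # scored against the same reference buckets.
    edges = np.histogram_bin_edges(training, bins=bins)
    # Clip production values into the training range so out-of-range
    # values still land in the first or last bucket.
    production = np.clip(production, edges[0], edges[-1])

    train_pct = np.histogram(training, bins=edges)[0] / len(training)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)

    # Floor the bucket shares to avoid log(0) on empty buckets.
    train_pct = np.clip(train_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - train_pct) * np.log(prod_pct / train_pct)))

# Example: customer spend before and after a market shock.
rng = np.random.default_rng(0)
baseline = rng.normal(100, 15, 10_000)
shifted = rng.normal(80, 25, 10_000)
print(population_stability_index(baseline, shifted))  # well above 0.25
```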
Topic | Describes how to |
---|---|
Service health | Track model-specific deployment latency, throughput, and error rate (see the first sketch following this table). |
Data drift | Monitor shifts in the distribution of prediction data relative to training data, an early indicator of declining accuracy (illustrated in the PSI sketch above). |
Accuracy | Analyze performance of a model over time. |
Fairness | Monitor deployments to recognize when protected features fail to meet predefined fairness criteria. |
Usage | Track prediction processing progress for use in accuracy, data drift, and predictions-over-time analyses. |
Custom metrics | Create and monitor custom business or performance metrics, or add pre-made metrics (see the second sketch following this table). |
Data exploration | Explore and export a deployment's stored prediction data, actuals, and training data to compute and monitor custom business or performance metrics. |
Monitoring jobs | Monitor deployments running and storing feature data and predictions outside of DataRobot. |
Deployment reports | Generate reports, immediately or on a schedule, to summarize the details of a deployment, such as its owner, how the model was built, the model age, and the humility monitoring status. |
Segmented analysis | Filter service health, data drift, and accuracy statistics by unique segment attributes and values to identify potential issues in your training and prediction data. |
Generative model monitoring | Use the text generation target type for DataRobot custom and external models to monitor generative Large Language Models (LLMs), allowing you to make predictions, monitor model performance statistics, explore data, and create custom metrics. |
Batch monitoring | View monitoring statistics organized into batches rather than aggregated over all predictions over time. |
LLM custom metric reference | Review detailed descriptions of DataRobot's custom metrics for LLM evaluation, including the formulas used to calculate them. |
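As a rough illustration of the statistics behind service health monitoring, the sketch below computes latency percentiles, throughput, and error rate from a hypothetical prediction request log. The `PredictionRequest` fields are assumptions for the example, not a DataRobot data structure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median, quantiles

@dataclass
class PredictionRequest:
    timestamp: datetime
    latency_ms: float
    status: int          # HTTP status of the prediction call

def service_health(log: list[PredictionRequest], window: timedelta) -> dict:
    """Latency percentiles, throughput, and error rate over the most
    recent window of a prediction request log."""
    cutoff = max(r.timestamp for r in log) - window
    recent = [r for r in log if r.timestamp >= cutoff]
    latencies = [r.latency_ms for r in recent]
    return {
        "p50_latency_ms": median(latencies),
        # quantiles(n=20) returns 19 cut points; the last one is the
        # 95th percentile.
        "p95_latency_ms": quantiles(latencies, n=20)[-1],
        "throughput_rps": len(recent) / window.total_seconds(),
        "error_rate": sum(r.status >= 400 for r in recent) / len(recent),
    }
```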
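And as a sketch of the custom metrics and data exploration workflow: once a deployment's stored predictions and actuals are exported, any business metric can be computed client-side. The column names and the revenue-weighted error metric here are hypothetical; substitute the schema of your own exported data.

```python
import pandas as pd

def revenue_weighted_error(df: pd.DataFrame) -> float:
    """Hypothetical business metric: absolute prediction error weighted
    by the revenue at stake, so misses on high-value rows count more.
    Assumes columns named 'prediction', 'actual', and 'revenue'."""
    weights = df["revenue"] / df["revenue"].sum()
    return float((weights * (df["prediction"] - df["actual"]).abs()).sum())

# Stand-in for data exported from a deployment's stored predictions
# and actuals; the schema here is an assumption.
exported = pd.DataFrame({
    "prediction": [0.9, 0.2, 0.7],
    "actual":     [1.0, 0.0, 0.0],
    "revenue":    [1200.0, 300.0, 4500.0],
})
print(revenue_weighted_error(exported))  # 0.555
```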