Recommended Monitoring Endpoints
The following URLs are recommendations only. Because DataRobot supports a variety of cluster sizes and layouts, work with DataRobot Customer Success to determine the monitoring solution that best protects your users from operational issues.
Services
We recommend monitoring the following service routes:
| Service | URL |
|---|---|
| GUI Application Health | /v1/health/?service=app&text=true |
| Upload Service Health | /v1/health/?service=appupload&text=true |
| Data Ingestion Manager Health | /v1/health/?service=datasetsserviceapi&text=true |
| Internal API Endpoint | /v1/health/?service=internalapi&text=true |
| Public API Server | /v1/health/?service=publicapi&text=true |
| Mongo Services Health | /v1/health/?service=mongo&text=true |
| Redis Service Health | /v1/health/?service=redis&text=true |
| Synchronous Prediction API | /v1/health/?service=predictionapi&text=true |
| Dedicated Prediction API Status | /v1/health/?service=dedicatedpredictionnginx&text=true |
| ModMon rsyslog Master | /v1/health/?service=modmonrsyslogmaster&text=true |
| ModMon database | /v1/health/?service=pgsql&text=true |
| Rabbit Status | /v1/health/?service=rabbit&text=true |
| Tableau Extension Status | /v1/health/?service=tableauextension&text=true |
| Worker Task Manager Status | /v1/health/?service=taskmanager&text=true |
| Elasticsearch | /v1/health/?service=elasticsearch&text=true |
Note
If you are not sure what type of prediction service your cluster supports, contact DataRobot Support.
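As a rough illustration, the service routes in the table above could be polled with a short script like the following. This is a minimal sketch, not a DataRobot-provided tool: the base URL `https://datarobot.example.com` is a placeholder, the helper names are hypothetical, and treating any response other than HTTP 200 as unhealthy is an assumption, since the response format of the health route is not described here.

```python
import urllib.request
from urllib.parse import urlencode

# Service identifiers copied from the "Services" table above.
SERVICES = [
    "app", "appupload", "datasetsserviceapi", "internalapi", "publicapi",
    "mongo", "redis", "predictionapi", "dedicatedpredictionnginx",
    "modmonrsyslogmaster", "pgsql", "rabbit", "tableauextension",
    "taskmanager", "elasticsearch",
]


def service_health_url(base_url: str, service: str) -> str:
    """Build the health-check URL for a single service."""
    query = urlencode({"service": service, "text": "true"})
    return f"{base_url.rstrip('/')}/v1/health/?{query}"


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if the route answers with HTTP 200.

    Assumption: any other status, a timeout, or a connection error is
    treated as unhealthy. Adjust this to match what your cluster returns.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


if __name__ == "__main__":
    base = "https://datarobot.example.com"  # placeholder; use your cluster URL
    for name in SERVICES:
        url = service_health_url(base, name)
        print(f"{name}: {'ok' if is_healthy(url) else 'FAILED'}")
```

Not every cluster runs every service in the list, so trim `SERVICES` to match your deployment before wiring the results into an alerting system.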
Test Jobs
We recommend that you regularly check the following routes, as they indicate whether the full end-to-end worker systems are functioning correctly:
| End-to-End Test | URL |
|---|---|
| Modeling Jobs | /v1/health/?name=Secure%20Worker%20Ping%20Job&text=true |
| Data Analysis Jobs | /v1/health/?name=EDA%20Worker%20Ping%20Job&text=true |
| Dedicated Prediction route | /v1/health/?name=Dedicated%20Prediction%20Server%20Health%20route&text=true |
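The test job names contain spaces, which appear as `%20` in the URLs above. If you generate these URLs in a script, a sketch like the following handles the encoding for you. The base URL is a placeholder and the function name is hypothetical; only the job names and the route come from the table.

```python
from urllib.parse import quote

# Test job names copied from the "Test Jobs" table above.
TEST_JOBS = {
    "Modeling Jobs": "Secure Worker Ping Job",
    "Data Analysis Jobs": "EDA Worker Ping Job",
    "Dedicated Prediction route": "Dedicated Prediction Server Health route",
}


def job_health_url(base_url: str, job_name: str) -> str:
    """Build the health-check URL for an end-to-end test job.

    quote() percent-encodes spaces as %20, matching the URLs in the table.
    """
    return f"{base_url.rstrip('/')}/v1/health/?name={quote(job_name)}&text=true"


if __name__ == "__main__":
    base = "https://datarobot.example.com"  # placeholder; use your cluster URL
    for label, job_name in TEST_JOBS.items():
        print(f"{label}: {job_health_url(base, job_name)}")
```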
Host-Based Checks
We also recommend that you include checks against specific hosts (one for each in your cluster). This will allow you to quickly determine where a failure has occurred:
/v1/health/?address=myhost.com
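To keep one check per host, the URLs can be generated from a host inventory. The sketch below assumes a placeholder base URL and hypothetical host names; substitute the hosts in your own cluster.

```python
from urllib.parse import urlencode


def host_health_urls(base_url: str, hosts: list[str]) -> dict[str, str]:
    """Map each host name to its host-based health-check URL."""
    base = base_url.rstrip("/")
    return {
        host: f"{base}/v1/health/?{urlencode({'address': host})}"
        for host in hosts
    }


if __name__ == "__main__":
    # Hypothetical inventory; list every host in your cluster here.
    cluster_hosts = ["node1.example.com", "node2.example.com"]
    urls = host_health_urls("https://datarobot.example.com", cluster_hosts)
    for host, url in urls.items():
        print(f"{host}: {url}")
```

Keeping the results keyed by host name makes it straightforward to label each monitor with the machine it covers, so a failing check points directly at the affected host.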
Fine-Grained Checks
We recommend setting up separate monitors for important, business-sensitive applications, such as your dedicated prediction nodes.
Further Support
DataRobot maintains a cloud deployment with a high number of simultaneous users and exceptional rates of simultaneous modeling and predictions. If you have questions about how to set up monitoring on your cluster or would like advice on how your particular situation could be better monitored, contact DataRobot Support.