Recommended Monitoring Endpoints
The following URLs are recommendations only. Because DataRobot supports a variety of cluster sizes and layouts, we recommend that you work with DataRobot Customer Success to determine the best monitoring solution, so that operational issues do not interrupt DataRobot's ability to do work and affect your data scientists.
Services
We recommend monitoring the following service routes:
| Service | URL |
|---|---|
| GUI Application Health | /v1/health/?service=app&text=true |
| Upload Service Health | /v1/health/?service=appupload&text=true |
| Data Ingestion Manager Health | /v1/health/?service=datasetsserviceapi&text=true |
| Internal API Endpoint | /v1/health/?service=internalapi&text=true |
| Public API Server | /v1/health/?service=publicapi&text=true |
| Mongo Services Health | /v1/health/?service=mongo&text=true |
| Redis Service Health | /v1/health/?service=redis&text=true |
| Synchronous Prediction API | /v1/health/?service=predictionapi&text=true |
| Dedicated Prediction API Status | /v1/health/?service=dedicatedpredictionnginx&text=true |
| ModMon rsyslog Master | /v1/health/?service=modmonrsyslogmaster&text=true |
| ModMon database | /v1/health/?service=pgsql&text=true |
| Rabbit Status | /v1/health/?service=rabbit&text=true |
| Tableau Extension Status | /v1/health/?service=tableauextension&text=true |
| Worker Task Manager Status | /v1/health/?service=taskmanager&text=true |
| Elasticsearch | /v1/health/?service=elasticsearch&text=true |
Note: If you are not sure which type of prediction service your cluster supports, please reach out to DataRobot Customer Success at support@datarobot.com.
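As a rough illustration of how the service routes above could be polled, the following Python sketch builds each URL and fetches it with the standard library. The base URL `https://datarobot.example.com` is a placeholder, the subset of service names is arbitrary, and treating any 2xx response as healthy is an assumption. Confirm the actual response semantics for your cluster before relying on this.

```python
from urllib.error import URLError
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Placeholder base URL (assumption); replace with your cluster's address.
BASE_URL = "https://datarobot.example.com"

# A subset of the service names from the table above; adjust to your layout.
SERVICES = ["app", "appupload", "internalapi", "publicapi", "mongo", "redis", "rabbit"]


def service_health_url(base_url, service):
    """Build the health route for one service, e.g. ?service=app&text=true."""
    query = urlencode({"service": service, "text": "true"})
    return urljoin(base_url, "/v1/health/") + "?" + query


def check_url(url, timeout=10):
    """Fetch a health URL and return (ok, detail).

    Assumption: a 2xx status means healthy. urlopen raises HTTPError
    (a URLError subclass) for 4xx/5xx responses, so those land in the
    except branch and are reported as unhealthy.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            return 200 <= response.status < 300, body
    except (URLError, OSError) as exc:
        return False, str(exc)


def run_checks(base_url, services, check=check_url):
    """Check every service and return {service: (ok, detail)}."""
    return {name: check(service_health_url(base_url, name)) for name in services}
```

Calling `run_checks(BASE_URL, SERVICES)` from a scheduler or monitoring agent returns one `(ok, detail)` pair per service. The `check` parameter is injectable so the polling logic can be exercised without network access.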
Test Jobs
We recommend that you regularly check the following routes for health, because they indicate whether the full end-to-end worker systems are functioning correctly:
| End-to-End Test | URL |
|---|---|
| Modeling Jobs | /v1/health/?name=Secure%20Worker%20Ping%20Job&text=true |
| Data Analysis Jobs | /v1/health/?name=EDA%20Worker%20Ping%20Job&text=true |
| Dedicated Prediction route | /v1/health/?name=Dedicated%20Prediction%20Server%20Health%20route&text=true |
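The test-job routes above identify a job by its `name`, with spaces percent-encoded as `%20`. The sketch below shows one way to produce those paths in Python. The helper name `job_health_path` is made up for illustration. Note that `urlencode` encodes spaces as `+` by default, so `quote_via=quote` is passed to match the `%20` form used in the table.

```python
from urllib.parse import quote, urlencode

# Job names as they appear (decoded) in the table above.
JOB_NAMES = [
    "Secure Worker Ping Job",
    "EDA Worker Ping Job",
    "Dedicated Prediction Server Health route",
]


def job_health_path(name):
    """Return the health path for an end-to-end test job.

    quote_via=quote encodes spaces as %20 rather than the default '+'.
    """
    query = urlencode({"name": name, "text": "true"}, quote_via=quote)
    return "/v1/health/?" + query


# Paths ready to be appended to your cluster's base URL.
JOB_PATHS = [job_health_path(name) for name in JOB_NAMES]
```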
Host-Based Checks
We also recommend that you include checks against specific hosts, with one check for each host in your cluster. This allows you to quickly determine where a failure has occurred:
/v1/health/?address=myhost.com
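To show how per-host checks can narrow down a failure, here is a minimal Python sketch that builds one path per host and reports which hosts fail. The function names and the `is_healthy` callable are hypothetical. In practice `is_healthy` would issue the HTTP request against your cluster and apply whatever success criterion fits your deployment.

```python
from urllib.parse import urlencode


def host_health_path(host):
    """Build the host-specific health path, e.g. /v1/health/?address=myhost.com."""
    return "/v1/health/?" + urlencode({"address": host})


def failing_hosts(hosts, is_healthy):
    """Return the hosts whose health check fails.

    is_healthy is a caller-supplied function (an assumption of this sketch)
    that takes a health path and returns True when the host is healthy.
    """
    return [host for host in hosts if not is_healthy(host_health_path(host))]
```

Keeping the request logic behind `is_healthy` lets the same loop be reused with any HTTP client, and makes it easy to test with a stub.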
Fine-Grained Checks
We recommend setting up separate monitors for important, business-sensitive applications such as your dedicated prediction nodes.
Further Support
At DataRobot, we’ve been maintaining a cloud deployment for high loads of simultaneous users and exceptional rates of simultaneous modeling and predictions. The only way we’ve learned to maintain this system is through vigilance and good monitoring. If you have questions about how to set up monitoring on your cluster, or would like advice on how your particular situation could be better monitored, please reach out to DataRobot Customer Success at support@datarobot.com. We’ll be happy to help you set up a solution that fits your needs.