Sample assets
Learn DataRobot faster using these sample datasets. Some assets come with full tutorials that let you work through the example yourself, step by step. Datasets are organized by problem type.
Generative
| Name | Description | Use case | Asset link(s) | Learn more |
|---|---|---|---|---|
| Space station research | A ZIP file of space station research papers and a CSV of evaluation prompts. | Retrieval augmented generation (RAG) | Download .zip | Video Walkthrough |
| Medical Research Abstracts | A ZIP file containing individual text files. Each text file is the abstract of a medical research paper. | RAG | Download .zip | AI Accelerator |
| Technical Documentation | A ZIP file containing the technical documentation for DataRobot as of late 2023. | RAG | Download .zip | Walkthrough |
| Kaggle "Wikipedia Movie Plots" | Several ZIP files, roughly 600 small text files each, containing movie plot summaries for some American, Japanese, and Indian movies from 1915 to 2017. | Build your own vector databases and LLM blueprints | Download dramas .zip, Download random .zip, Download romances .zip, Download comedies .zip | Video walkthrough |
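The generative datasets above are distributed as ZIP archives of individual text files. As a quick sanity check before uploading one, the following minimal sketch reads every text file in an archive using only the Python standard library. The function name `read_text_documents` and the sample member names are illustrative assumptions, not part of DataRobot, and the archive is built in memory as a stand-in for a downloaded file.

```python
import io
import zipfile


def read_text_documents(zip_source):
    """Return {member_name: text} for every .txt file in a ZIP archive.

    zip_source may be a file path or a binary file-like object.
    """
    documents = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a text file.
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            raw = archive.read(info)
            # Replace undecodable bytes instead of failing on a stray character.
            documents[info.filename] = raw.decode("utf-8", errors="replace")
    return documents


if __name__ == "__main__":
    # Hypothetical stand-in for a downloaded archive (names are made up).
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("abstracts/paper_1.txt", "First abstract.")
        archive.writestr("abstracts/paper_2.txt", "Second abstract.")
    buffer.seek(0)

    docs = read_text_documents(buffer)
    print(len(docs), "text files found")
```

To inspect a real download, pass the path of the saved ZIP file instead of the in-memory buffer.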
Time series
| Name | Description | Problem type | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Car Sales, GUI and Code | The monthly sales volume for many vehicle makes and models with additional contextual variables. | Multiseries, multivariate time series | Numeric | Short and fuller versions of data; a Python notebook | Video Walkthrough |
| Demand forecasting by SKU by store | Weekly units sold by store and SKU for 50 categorized products | SKU-level demand forecasting | Numeric, categorical | Training File, Scoring File, Calendar File | AI Accelerator |
Regression
| Name | Description | Problem type | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Fuel Efficiency | Predict the miles per gallon (MPG) based on other vehicle attributes. | Regression | Numeric | Training Data | API quickstart |
| Wine Quality | Predict the quality score for white wines based on chemical composition. | Regression | Numeric | Training Data, Scoring File | — |
| Developer Salaries | Predict developer salaries based on the Stack Overflow Developer Survey 2019. | Regression | Numeric, categorical, text | Training Data | — |
Classification
| Name | Description | Problem type | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Hospital Readmissions | Predict whether a patient will be 'readmitted' to the hospital after being discharged. | Binary classification | Numeric, categorical, text | Training Data | Walkthrough |
| Loan Default | Predict whether a loan 'is_bad' based on information provided on an application. | Binary classification | Numeric, categorical, text | Training Data, Scoring File | Walkthrough |
| Flight Delays | Predict whether an airline departure will be delayed by 30 minutes or more. | Binary classification | Numeric, categorical | Training, Scoring | AI Accelerator |
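Before modeling a binary classification dataset, it can help to check how balanced the target is. The sketch below counts target values in a CSV using only the Python standard library. The target name `is_bad` comes from the Loan Default description above; the function name `target_distribution`, the `loan_amnt` column, and the sample rows are made up for illustration and do not reflect the contents of the actual file.

```python
import csv
import io
from collections import Counter


def target_distribution(csv_file, target_column):
    """Count how often each target value appears in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or target_column not in reader.fieldnames:
        raise KeyError(f"column {target_column!r} not found")
    # Values are counted as strings, exactly as they appear in the file.
    return Counter(row[target_column] for row in reader)


if __name__ == "__main__":
    # Hypothetical rows standing in for a downloaded training file.
    sample = io.StringIO(
        "loan_amnt,is_bad\n"
        "5000,0\n"
        "12000,1\n"
        "7500,0\n"
        "3000,0\n"
    )
    counts = target_distribution(sample, "is_bad")
    total = sum(counts.values())
    for value, count in sorted(counts.items()):
        print(f"is_bad={value}: {count} rows ({count / total:.0%})")
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the resulting file object in place of the in-memory sample.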
Multiclass / multilabel classification
These projects can only be completed in DataRobot Classic.
| Name | Description | Problem type | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Plant Disease | A ZIP file with several hundred images of plant leaves, organized into folders by disease class. | Multiclass | Images | Download | — |
| Apparel Multilabel | Pictures of clothing which fit into multiple categories (for example, 'blue' and 'dress'). | Multilabel | Images | Download | — |