
Sample assets

Learn DataRobot faster using these sample datasets. Some assets come with full tutorials, so you can work through them yourself, step by step. Datasets are organized by problem type.

Generative

| Name | Description | Usage | Asset link(s) | Learn more |
|---|---|---|---|---|
| Space station research | A ZIP file of space station research papers and a CSV of evaluation prompts. | Retrieval augmented generation (RAG) | Download .zip | Video, Walkthrough |
| Medical Research Abstracts | A ZIP file containing individual text files. Each text file is the abstract of a medical research paper. | RAG | Download .zip | AI Accelerator |
| Technical Documentation | A ZIP file containing the technical documentation for DataRobot as of late 2023. | RAG | Download .zip | Walkthrough |
| Kaggle "Wikipedia Movie Plots" | Several ZIP files, roughly 600 small text files each, containing movie plot summaries for some American, Japanese, and Indian movies from 1915 to 2017. | Build your own vector databases and LLM blueprints | Download dramas .zip, Download random .zip, Download romances .zip, Download comedies .zip | Video walkthrough |
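Several of the generative assets above are ZIP files of individual text files. Before uploading one, it can help to inspect it locally. The following is a minimal sketch using only the Python standard library; the function name `read_text_files`, the member file names, and the placeholder contents are assumptions made for illustration and do not reflect the actual layout of any DataRobot sample archive.

```python
import io
import zipfile


def read_text_files(zip_source):
    """Return {member_name: text} for every .txt file in a ZIP archive.

    zip_source may be a file path or a file-like object.
    """
    documents = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a .txt file.
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            raw = archive.read(info)
            # Assumption: files are UTF-8; undecodable bytes are replaced.
            documents[info.filename] = raw.decode("utf-8", errors="replace")
    return documents


# Build a tiny in-memory archive (hypothetical contents) so the sketch
# runs without downloading anything.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("abstract_001.txt", "First placeholder abstract.")
    archive.writestr("abstract_002.txt", "Second placeholder abstract.")
    archive.writestr("notes/readme.md", "Not a text document.")
buffer.seek(0)

docs = read_text_files(buffer)
print(f"{len(docs)} text documents found")
```

To inspect a downloaded archive instead, pass its local path to `read_text_files` in place of the in-memory buffer.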

Time series

| Name | Description | Usage | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Car Sales, GUI and Code | The monthly sales volume for many vehicle makes and models with additional contextual variables. | Multiseries, multivariate time series | Numeric | Short and fuller versions of data; a Python notebook | Video, Walkthrough |
| Demand forecasting by SKU by store | Weekly units sold by store and SKU for 50 categorized products. | SKU-level demand forecasting | Numeric, categorical | Training File, Scoring File, Calendar File | AI Accelerator |

Regression

| Name | Description | Usage | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Fuel Efficiency | Predict the miles per gallon (MPG) based on other vehicle attributes. | Regression | Numeric | Training Data | API quickstart |
| Wine Quality | Predict the quality score for white wines based on chemical composition. | Regression | Numeric | Training Data, Scoring File | |
| Developer Salaries | Predict developer salaries based on the Stack Overflow Developer Survey 2019. | Regression | Numeric, categorical, text | Training Data | |

Classification

| Name | Description | Usage | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Hospital Readmissions | Predict whether a patient will be 'readmitted' to the hospital after being discharged. | Binary classification | Numeric, categorical, text | Training Data | Walkthrough |
| Loan Default | Predict whether a loan 'is_bad' based on information provided on an application. | Binary classification | Numeric, categorical, text | Training Data, Scoring File | Walkthrough |
| Flight Delays | Predict whether an airline departure will be delayed by 30 minutes or more. | Binary classification | Numeric, categorical | Training, Scoring | AI Accelerator |

Multiclass / multilabel classification

These projects can only be completed in DataRobot Classic.

| Name | Description | Usage | Features | Asset link(s) | Learn more |
|---|---|---|---|---|---|
| Plant Disease | A ZIP file with several hundred images of plant leaves, organized into folders by disease class. | Multiclass | Images | Download | |
| Apparel Multilabel | Pictures of clothing which fit into multiple categories (for example, 'blue' and 'dress'). | Multilabel | Images | Download | |
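When images are organized into folders by class, as described for the Plant Disease asset, each image's label can be read from the name of its parent folder. The sketch below shows one way to count images per class in such a ZIP using the Python standard library. The function name `labels_from_folders`, the folder names, the file names, and the list of accepted image extensions are all hypothetical and are not taken from the actual archive.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def labels_from_folders(zip_source):
    """Return (member_name, label) pairs, where label is the parent folder."""
    labeled = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            # Skip non-images and files that sit at the archive root,
            # since they have no class folder.
            if path.suffix.lower() not in IMAGE_SUFFIXES or len(path.parts) < 2:
                continue
            labeled.append((name, path.parent.name))
    return labeled


# Build a tiny in-memory archive with hypothetical class folders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("healthy/leaf_01.jpg", b"")
    archive.writestr("rust/leaf_02.jpg", b"")
    archive.writestr("rust/leaf_03.png", b"")
    archive.writestr("README.txt", b"")
buffer.seek(0)

labeled = labels_from_folders(buffer)
class_counts = Counter(label for _, label in labeled)
print(dict(class_counts))
```

A quick per-class count like this is a simple way to spot heavily imbalanced or empty classes before starting a multiclass project.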