Fundamentals of working with data
High-quality data is integral to every stage of the ML workflow: importing and cleaning data, transforming and engineering features, scoring with prediction datasets, and deploying on a prediction server.
DataRobot provides tools to help you clean and prepare your data, including Data Prep (formerly Paxata), a general-purpose data preparation tool, and time series data prep, which helps mitigate the effects of gaps in your time series data.
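To make the idea of a gap in time series data concrete, the following is a minimal sketch that fills missing days in a daily series by carrying the last known value forward. It is an illustration only and does not represent how DataRobot's time series data prep works; the function name `fill_daily_gaps` and the sample data are hypothetical.

```python
from datetime import date, timedelta


def fill_daily_gaps(series):
    """Return a gap-free daily series, forward-filling missing days.

    `series` maps datetime.date -> value and must be non-empty.
    Forward fill is one simple strategy; it is an assumption made
    for this sketch, not a description of any product behavior.
    """
    days = sorted(series)
    filled = {}
    last_value = None
    current = days[0]
    while current <= days[-1]:
        if current in series:
            last_value = series[current]
        # Days absent from the input reuse the most recent observed value.
        filled[current] = last_value
        current += timedelta(days=1)
    return filled


# Hypothetical daily sales with January 3 and 4 missing.
sales = {
    date(2024, 1, 1): 10,
    date(2024, 1, 2): 12,
    date(2024, 1, 5): 9,
}
filled_sales = fill_daily_gaps(sales)
```

Forward fill suits slowly changing values; for other data, interpolation or an explicit missing marker may be a better choice.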
You can transform data directly in DataRobot and use Feature Discovery to discover and generate new features from multiple datasets. In Feature Discovery, you define relationships among features in multiple datasets and can then view the lineage of the derived features.
DataRobot helps you assess the quality of the relationship configurations and provides tools to review and manage the derived features, such as the Feature Lineage window.
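As a rough illustration of deriving features across related datasets, the sketch below joins a secondary dataset to a primary one on a shared key, aggregates the secondary values, and records a plain-text lineage for each derived feature. It is a simplified stand-in, not Feature Discovery itself; the datasets, the `derive_features` helper, and the lineage format are all hypothetical.

```python
from collections import defaultdict

# Hypothetical primary dataset: one row per customer.
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]

# Hypothetical secondary dataset: many rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]


def derive_features(primary, secondary, key, column):
    """Aggregate `column` from `secondary` per `key` and attach it to `primary`."""
    grouped = defaultdict(list)
    for row in secondary:
        grouped[row[key]].append(row[column])

    enriched = []
    for row in primary:
        values = grouped.get(row[key], [])
        new_row = dict(row)  # copy so the primary dataset is left unchanged
        new_row[f"{column}_sum"] = sum(values)
        new_row[f"{column}_count"] = len(values)
        enriched.append(new_row)

    # Record where each derived feature came from.
    lineage = {
        f"{column}_sum": f"sum of secondary.{column} joined on {key}",
        f"{column}_count": f"count of secondary.{column} joined on {key}",
    }
    return enriched, lineage


enriched, lineage = derive_features(customers, transactions, "customer_id", "amount")
```

Keeping the lineage next to the derived columns makes it possible to trace each new feature back to its source column and join key.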
Data sources
To begin the modeling process, you import data from a source such as the AI Catalog, a configured data connection, a local file, or a URL.
You can:
- Import data into the AI Catalog, then select a dataset from there.
- Import data directly from a connected data source.
- Import a local file or specify a URL from which to pull the data.
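The difference between a local file and a URL source can be sketched outside DataRobot with the Python standard library. The helper below is hypothetical (`read_csv_rows` is not a DataRobot function) and assumes UTF-8 encoded CSV data; it reads rows from an `http`/`https` URL or from a local path.

```python
import csv
import io
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def read_csv_rows(source):
    """Read CSV rows from an http(s) URL or a local file path."""
    if urlparse(source).scheme in ("http", "https"):
        # Remote source: download the whole body, then parse it.
        with urlopen(source) as response:
            text = response.read().decode("utf-8")
    else:
        # Anything else is treated as a local file path.
        with open(source, encoding="utf-8", newline="") as handle:
            text = handle.read()
    return list(csv.DictReader(io.StringIO(text)))


# Usage with a temporary local file (no network access needed).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("id,value\n1,a\n2,b\n")

rows = read_csv_rows(tmp.name)
```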
After you upload the data, DataRobot performs exploratory data analysis (EDA), assesses the quality of the data, and displays the results in the Data Quality Assessment window.
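For a sense of what a basic data quality check can look like, here is a minimal sketch that counts missing and distinct values per column in a CSV. It is not the set of checks DataRobot runs in its Data Quality Assessment; the `quality_summary` helper and the sample data are assumptions made for illustration.

```python
import csv
import io


def quality_summary(csv_text):
    """Return per-column counts of missing and distinct values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        # Empty strings (and None, for short rows) count as missing.
        present = [v for v in values if v not in ("", None)]
        summary[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return summary


# Hypothetical sample: one missing age and one missing city.
sample = "age,city\n34,Boston\n,Boston\n51,\n"
report = quality_summary(sample)
```

Real assessments typically go further, for example by flagging outliers, disguised missing values, or inconsistent formats.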