# End-to-end Feature Discovery

> End-to-end Feature Discovery - How Feature Discovery helps you combine datasets of different
> granularities and perform automated feature engineering.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.540784+00:00` (UTC).

## Primary page

- [End-to-end Feature Discovery](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html): Full documentation for this topic (HTML).

## Sections on this page

- [Takeaways](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html#takeaways): In-page section heading.
- [Load the datasets to AI Catalog](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html#load-the-datasets-to-ai-catalog): In-page section heading.
- [Add secondary datasets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html#add-secondary-datasets): In-page section heading.
- [Define relationships](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html#define-relationships): In-page section heading.
- [Build your models](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html#build-your-models): In-page section heading.
- [Review derived features](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html#review-derived-features): In-page section heading.
- [Score models built with Feature Discovery](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/enrich-data-using-feature-discovery.html#score-models-built-with-feature-discovery): In-page section heading.

## Related documentation

- [Classic UI documentation](https://docs.datarobot.com/en/docs/classic-ui/index.html): Linked from this page.
- [Data](https://docs.datarobot.com/en/docs/classic-ui/data/index.html): Linked from this page.
- [Transform data](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/index.html): Linked from this page.
- [Feature Discovery](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/index.html): Linked from this page.
- [Time series modeling](https://docs.datarobot.com/en/docs/classic-ui/modeling/time/index.html): Linked from this page.
- [Feature Reduction tab](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-gen.html#disable-feature-reduction): Linked from this page.

## Documentation content

# End-to-end Feature Discovery

This page describes how Feature Discovery helps you combine datasets of different granularities and perform automated feature engineering.

More often than not, features are split across multiple data assets. Bringing those assets together takes work: you must join them before you can run machine learning models on top. It's even more difficult when the datasets have different granularities, because then you must aggregate the finer-grained data before the join can succeed.

Feature Discovery solves this problem by automating the procedure of joining and aggregating your datasets. After defining how the datasets need to be joined, you leave feature generation and modeling to DataRobot.
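To make the aggregate-then-join idea concrete, here is a minimal pandas sketch of the kind of work Feature Discovery automates. The tables and column names are hypothetical stand-ins, not the Instacart sample used on this page:

```python
import pandas as pd

# Toy stand-ins for a Users table and a finer-grained Orders table.
users = pd.DataFrame({"user_id": [1, 2], "bought_banana": [True, False]})
orders = pd.DataFrame({
    "user_id": [1, 1, 2],
    "order_total": [10.0, 20.0, 5.0],
})

# Orders is more granular than Users (many orders per user), so it must
# be aggregated up to the user level before the join.
per_user = orders.groupby("user_id").agg(
    order_count=("order_total", "count"),
    order_total_mean=("order_total", "mean"),
).reset_index()

# One row per user, enriched with aggregated order features.
enriched = users.merge(per_user, on="user_id", how="left")
```

Feature Discovery generates many such aggregations automatically; the sketch shows only a single hand-written pair of features.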

The examples below use data taken from Instacart, an online aggregator for grocery shopping. The business problem is to predict whether a customer is likely to purchase a banana.

## Takeaways

This page shows how to:

- Add datasets to a project
- Define relationships
- Set join conditions
- Configure time-aware settings
- Review features that are generated during Feature Discovery
- Score models built using Feature Discovery

## Load the datasets to AI Catalog

The examples on this page use these datasets:

| Table | Description |
| --- | --- |
| Users | Information on users and whether or not they bought bananas on particular order dates. |
| Orders | Historical orders made by a user. A User record is joined with multiple Order records. |
| Transactions | Specific products bought by the user in an order. An Order record is joined with multiple Transaction records. |

Each of these tables has a different unit of analysis, which defines the who or what you're predicting, as well as the level of granularity of the prediction. The sections below show how to join the tables so that you arrive at a unit of analysis that produces good results.
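The two-level relationship (Users ← Orders ← Transactions) means features must be rolled up twice to reach the Users unit of analysis. A minimal pandas sketch with hypothetical values:

```python
import pandas as pd

# Minimal stand-ins for the three tables.
users = pd.DataFrame({"user_id": [1, 2]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "user_id": [1, 1, 2]})
transactions = pd.DataFrame({
    "order_id": [10, 10, 11, 12],
    "product": ["banana", "milk", "banana", "eggs"],
})

# Roll Transactions up to the order level, then Orders up to the user
# level, so every table reaches the Users unit of analysis.
items_per_order = (transactions.groupby("order_id")
                   .size().rename("item_count").reset_index())
order_level = orders.merge(items_per_order, on="order_id", how="left")
user_level = (order_level.groupby("user_id")
              .agg(order_count=("order_id", "nunique"),
                   total_items=("item_count", "sum"))
              .reset_index())
result = users.merge(user_level, on="user_id", how="left")
```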

Start by loading the primary dataset—the dataset containing the target feature you want to predict.

1. Go to the **AI Catalog** and, for each dataset you want to upload, click **Add to catalog**. You can add the data in various ways, for example, by connecting to a data source or uploading a local file.
2. Once all of your datasets are uploaded, select the dataset you want to use as your primary dataset and click **Create project** in the upper right.

## Add secondary datasets

Once you upload your datasets to the AI Catalog, you can add the secondary datasets to the primary dataset in the project you created.

1. In the project you created, specify your target, then under **Secondary Datasets**, click **Add datasets**.
2. On the **Specify prediction point** page of the **Relationship editor**, under **Select date feature to use as prediction point**, select the feature that indexes your primary dataset by time, then click **Set up as prediction point**. In this dataset, the date feature is `time`.
3. On the **Add datasets** page of the **Relationship editor**, select **AI Catalog**.
4. In the **Add datasets** window, click **Select** next to each dataset you want to add, then click **Add**.
5. Click **Continue** to finalize your selection.

## Define relationships

Next, create relationships between your datasets by specifying the conditions for joining the datasets, for example, the columns on which they are joined. You can also configure time-aware settings if needed for your data.

1. On the **Define Relationships** page, click a secondary dataset to highlight it, then click the plus sign that appears at the bottom of the primary dataset tile.
2. Set join conditions; in this case, specify the columns for joining. DataRobot recommends the `user_id` column for the join. Click **Save and configure time-aware**.

    > [!NOTE] Build complex relationships with multiple join conditions
    > Instead of a single column, you can join on a list of features. Click **+ join condition** and select features to build more complex relationships.

3. Select the time feature from the secondary dataset and the feature derivation window, and click **Save**. See [Time series modeling](https://docs.datarobot.com/en/docs/classic-ui/modeling/time/index.html) for details on setting time-aware options.
4. Repeat these steps to add any other secondary datasets. In this example, the three datasets are joined with these relationships.
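Joining on more than one column, as the note on complex relationships describes, corresponds to a multi-key merge. A minimal pandas sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical tables joined on a composite key (user_id, region)
# rather than a single column.
primary = pd.DataFrame({"user_id": [1, 2], "region": ["east", "west"],
                        "target": [1, 0]})
secondary = pd.DataFrame({"user_id": [1, 2], "region": ["east", "east"],
                          "score": [0.9, 0.4]})

# Both key columns must match for rows to join; user 2 differs on
# region, so its score stays missing after the left join.
joined = primary.merge(secondary, on=["user_id", "region"], how="left")
```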

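The feature derivation window selected in step 3 limits which secondary-dataset rows may contribute to derived features. A minimal pandas sketch of the idea, using a hypothetical prediction point and a 30-day window:

```python
import pandas as pd

# Hypothetical orders for one user.
orders = pd.DataFrame({
    "user_id": [1, 1, 1],
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-20", "2024-03-01"]),
})
prediction_point = pd.Timestamp("2024-03-02")
window = pd.Timedelta(days=30)

# Only rows inside [prediction_point - window, prediction_point) may
# contribute to derived features; later rows would leak the future.
in_window = orders[(orders.order_date >= prediction_point - window)
                   & (orders.order_date < prediction_point)]
orders_last_30d = in_window.groupby("user_id").size()
```

This is only an illustration of the windowing concept; DataRobot handles the window bookkeeping for every derived feature automatically.
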
## Build your models

Now that the secondary datasets are in place and DataRobot knows how to join them, you can go back to the project and begin modeling.

1. Click **Continue to project** in the top right. Back on the main **Data** page, under **Secondary Datasets**, you can see that two relationships have been defined for the **Orders** secondary dataset and one for the **Transactions** secondary dataset.
2. Click **Start** to begin modeling. DataRobot loads the secondary datasets and discovers features. The next section explains how to analyze them.

## Review derived features

DataRobot automatically generates hundreds of features and removes features that might be redundant or have a low impact on model accuracy.

> [!NOTE] Note
> To prevent DataRobot from removing less informative features, turn off supervised feature reduction on the [Feature Reduction tab](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-gen.html#disable-feature-reduction) of the Feature Discovery Settings page.

You can begin reviewing the derived features once EDA2 completes.

1. On the **Data** tab, click a derived feature and view the **Histogram** tab. Derived feature names include the dataset alias and the type of transformation. In this example, the transformation is the unique count of orders by the day of the month.
2. Click the **Feature Lineage** tab to see how this feature was created.
3. Scroll to the top of the **Data** page and open the **Feature Discovery** tab. Click the menu icon and use the actions described below to learn more about how DataRobot performed Feature Discovery:

**Download SQL:**
To understand how the derived features are constructed, click **Download SQL**.

**Download dataset:**
To download the new dataset with the derived features, click **Download dataset**.

**Feature Derivation log:**
To understand the process DataRobot used to derive and prune the features, click **Feature Derivation log**.

The Feature Derivation log shows the features processed, generated, and removed, along with the reasons why features were removed. To save the log, click **Download**.
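The derived feature discussed above (the unique count of orders by the day of the month) can be sketched in pandas. The data and the output name are hypothetical; DataRobot's actual derived feature names and generated SQL will differ:

```python
import pandas as pd

# Hypothetical orders table.
orders = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "order_date": pd.to_datetime(
        ["2024-01-03", "2024-02-03", "2024-01-15", "2024-01-07"]),
})

# Per user, count the distinct calendar days-of-month on which orders
# were placed (user 1 ordered on the 3rd twice and the 15th once -> 2).
unique_days = (orders.assign(day=orders.order_date.dt.day)
               .groupby("user_id")["day"].nunique()
               .rename("orders_day_of_month_nunique"))
```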


## Score models built with Feature Discovery

When scoring models built with Feature Discovery, you need to ensure the secondary datasets are up-to-date and that feature derivation will complete without problems.

To make predictions on models built with Feature Discovery:

1. On the **Models** page, click the **Leaderboard** tab and click the model you selected for deployment.
2. Click **Predict**, then under **Prediction Datasets**, click **Import data from** and import the scoring dataset. The dataset must have the same schema as the dataset used to create the project. The target column is optional, and you don't need to upload secondary datasets at this point.
3. After the dataset is uploaded, click **Compute Predictions**.
4. To change the default configuration for the secondary datasets, under **Secondary datasets configuration**, click **Change**. Updating the secondary dataset configuration is necessary if the scoring data covers a different time period and cannot be joined with the secondary datasets used during training.
5. To add a new configuration, click **create new**.
6. To replace a secondary dataset, in the **Secondary Datasets Configuration** window, locate the dataset and click **Replace**.

> [!NOTE] Note
> If you need to replace a secondary dataset, do so before uploading your scoring dataset to DataRobot. Otherwise, DataRobot uses the default settings to compute the joins and perform feature derivation.
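The schema requirement in step 2 can be checked before upload. A minimal sketch, assuming hypothetical column names and a known target column:

```python
import pandas as pd

# Hypothetical training and scoring frames; the point is only that the
# scoring data must contain every training column except the (optional)
# target.
train = pd.DataFrame({"user_id": [1], "time": ["2024-01-01"],
                      "bought_banana": [True]})
scoring = pd.DataFrame({"user_id": [2], "time": ["2024-04-01"]})

target = "bought_banana"
required = set(train.columns) - {target}
missing = required - set(scoring.columns)
schema_ok = not missing  # True when no required column is absent
```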
