# Set up Feature Discovery projects

> Set up Feature Discovery projects - How to create a project from multiple datasets. You define the
> relationships. Feature Discovery aggregates the secondary datasets to enrich the primary dataset.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.541602+00:00` (UTC).

## Primary page

- [Set up Feature Discovery projects](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html): Full documentation for this topic (HTML).

## Sections on this page

- [Get started with Feature Discovery](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#get-started-with-feature-discovery): In-page section heading.
- [Sample use case](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#sample-use-case): In-page section heading.
- [Add datasets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#add-datasets): In-page section heading.
- [View dataset details](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#view-dataset-details): In-page section heading.
- [Manually define relationships](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#manually-define-relationships): In-page section heading.
- [Set join conditions](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#set-join-conditions): In-page section heading.
- [Set feature derivation windows](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#set-feature-derivation-windows): In-page section heading.
- [Automatically generate relationships](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#automatically-generate-relationships): In-page section heading.
- [Work with datasets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#work-with-datasets): In-page section heading.
- [Primary datasets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#primary-datasets): In-page section heading.
- [Secondary datasets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#secondary-datasets): In-page section heading.
- [Test relationship quality](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#test-relationship-quality): In-page section heading.
- [Start the project](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#start-the-project): In-page section heading.
- [Share assets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#share-assets): In-page section heading.

## Related documentation

- [Classic UI documentation](https://docs.datarobot.com/en/docs/classic-ui/index.html): Linked from this page.
- [Data](https://docs.datarobot.com/en/docs/classic-ui/data/index.html): Linked from this page.
- [Transform data](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/index.html): Linked from this page.
- [Feature Discovery](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/index.html): Linked from this page.
- [file requirements](https://docs.datarobot.com/en/docs/reference/data-ref/file-types.html#feature-discovery-file-import-sizes): Linked from this page.
- [Feature Discovery integration with Snowflake](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-snowflake.html): Linked from this page.
- [Feature engineering controls and feature reduction](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-adv-opt.html): Linked from this page.
- [Time-aware feature engineering](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-time.html): Linked from this page.
- [Derived features](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-gen.html): Linked from this page.
- [Making predictions on models that have derived features](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-predict.html): Linked from this page.
- [Info](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog-asset.html#work-with-metadata): Linked from this page.
- [roles](https://docs.datarobot.com/en/docs/reference/misc-ref/roles-permissions.html): Linked from this page.

## Documentation content

# Set up Feature Discovery projects

Feature Discovery is based on relationships—between datasets and the features within those datasets. DataRobot provides an intuitive relationship editor that allows you to build and visualize these relationships. The end product is a multitude of additional features that result from these linkages. These derived features can then train more accurate models and generate better predictions. DataRobot’s Feature Discovery engine analyzes the graphs and the included datasets to determine a feature engineering “recipe,” and from that recipe generates secondary features for training and predictions.

> [!NOTE] Note
> See the Feature Discovery [file requirements](https://docs.datarobot.com/en/docs/reference/data-ref/file-types.html#feature-discovery-file-import-sizes) for dataset size information.

Review the next section to [get started with Feature Discovery](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#get-started-with-feature-discovery) or skip to the step-by-step instructions that describe how to:

1. Add datasets to a project.
2. Create relationships.
3. Set join conditions.
4. Assess the quality of relationship configurations.
5. Start the project.

You can also take a deeper dive into:

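- [Feature Discovery integration with Snowflake](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-snowflake.html).
- [Feature engineering controls and feature reduction](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-adv-opt.html).
- [Time-aware feature engineering](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-time.html).
- [Derived features](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-gen.html).
- [Making predictions on models that have derived features](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-predict.html).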

## Get started with Feature Discovery

In most cases, all you need to start a Feature Discovery project is a simple primary dataset that includes:

- The target (column that you want to predict).
- An identifier (for example, `customer_id` or `transaction_id`) to link the dataset to additional related datasets. This key serves as the basis of dataset joins.
- An optional time index (a date feature in the primary dataset) to support time-aware Feature Discovery. This date feature is used as the prediction point for generating new features.

Each record of the primary dataset represents the desired unit of analysis. From this primary dataset, DataRobot guides you through creating relationships to additional datasets, called secondary datasets.

Secondary datasets have features that can potentially enrich the primary dataset. While it may be the case that both primary and secondary datasets have one-to-one relationships when they are added, it is not required. In most cases, DataRobot aggregates and then summarizes features in the secondary datasets, and, from there, enriches the primary dataset.
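The aggregate-then-enrich idea can be sketched in plain Python. This is only an illustration under stated assumptions, not DataRobot's implementation: the sample rows, the `target` and `amount` columns, and the `enrich` helper are hypothetical, and `customer_id` is the example identifier mentioned above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical primary dataset: one row per unit of analysis.
primary = [
    {"customer_id": 1, "target": 0},
    {"customer_id": 2, "target": 1},
    {"customer_id": 3, "target": 0},
]

# Hypothetical secondary dataset: zero or more rows per customer.
secondary = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 300.0},
    {"customer_id": 2, "amount": 50.0},
]


def enrich(primary_rows, secondary_rows, key, value):
    """Aggregate `value` per `key` in the secondary rows, then join onto the primary rows."""
    grouped = defaultdict(list)
    for row in secondary_rows:
        grouped[row[key]].append(row[value])

    enriched_rows = []
    for row in primary_rows:
        values = grouped.get(row[key], [])
        enriched_rows.append({
            **row,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            # No matching secondary rows means there is nothing to average.
            f"{value}_mean": mean(values) if values else None,
        })
    return enriched_rows


enriched = enrich(primary, secondary, key="customer_id", value="amount")
```

The primary dataset keeps one row per customer, and the one-to-many secondary rows are summarized into new columns rather than duplicated.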

### Sample use case

The following sections use an example to illustrate how DataRobot automatically discovers new features from multiple datasets to predict whether a loan will default. In the primary dataset, CreditRisk - Loan Applications, the is-bad column is the project target. The datasets are related to one another through the CustID column.

Two additional relational datasets, CreditRisk - Credit Inquiries and CreditRisk - Tradeline Accounts, are the secondary datasets used for Feature Discovery.

Once model building begins, DataRobot runs through EDA2, adding newly created features to the [Data page](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-gen.html). The Data page provides a variety of information about all the resulting project data, both new and old.

## Add datasets

From the AI Catalog, select the primary dataset and click Create project. Then, enter the target feature.

> [!NOTE] Note
> This procedure shows how to load datasets using the AI Catalog, so to begin, make sure all the assets are in the catalog. Alternatively, you can use the drag-and-drop method to upload datasets. If you do so, all datasets that you upload are automatically registered to the AI Catalog.

A valid Feature Discovery project requires at least one secondary dataset. The following tabs describe how to load additional datasets into the project from both the Start page and the relationship editor:

**From the Start page:**
1. On the Start page, click Add datasets to add one or more additional datasets to the project.
2. On the Specify prediction point page of the relationship editor, optionally select a date feature to use as a prediction point. This date/time feature from the primary dataset serves as a reference date for feature derivation windows.

   > [!NOTE] Note
   > The step to specify a prediction point does not display if you have already specified a prediction point for the project.

   For an in-app explanation of prediction points, expand Show Example.
3. Click Set up as prediction point for a time-aware Feature Discovery project or Continue without prediction point for a non time-aware project.

   > [!NOTE] Note
   > Although you can select the same date feature used for the out-of-time validation (OTV) partition as the prediction point, clicking Continue without prediction point automatically uses the OTV partition feature when generating new features. If you add or edit the prediction point, DataRobot accounts for that change when generating new features.
4. In the Add datasets page of the relationship editor, select a data import method under Add Data From. This example shows how to add a dataset from the AI Catalog.
5. From the AI Catalog, select the datasets you want to include by clicking Select. Use the search functionality to easily locate datasets for selection. When finished, click Add.
6. Click Continue to finalize your selection. The secondary datasets you select on this page are immediately added to the configuration, so if you reload the page without clicking Continue, the data is not lost.

The Define Relationships page displays the datasets.

As a best practice, continue within this editor to define relationships. You can, however, click Continue to project to return to the Start screen.

[https://docs.datarobot.com/en/docs/images/safer-secondary-datasets-on-start-page.png](https://docs.datarobot.com/en/docs/images/safer-secondary-datasets-on-start-page.png)

The datasets display and you can see the number of relationships that have been defined.

At any time, you can click Define relationships to return to the Define Relationships page.

**From the relationship editor:**
If your project has more than one secondary dataset, you can add more datasets after saving. From the Define Relationships page:

1. Click Add datasets and select a data import method. This example shows how to add a dataset from the AI Catalog.
2. From the AI Catalog, select the datasets you want to include by clicking Select. Use the search functionality to easily locate datasets for selection. When finished, click Add.

The Define Relationships page displays the datasets.


Each dataset displayed on the canvas has a menu with shortcuts to dataset-related tasks. See details of working with [primary datasets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#primary-datasets) and [secondary datasets](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#secondary-datasets).

After adding secondary datasets to your project, [define the relationships](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#define-relationships) between the datasets.

### View dataset details

You can access dataset details directly from the relationship editor using one of the following methods:

**Brief description:**
On the dataset tile, hover over the line beneath the dataset name to display metadata for the dataset.

[https://docs.datarobot.com/en/docs/images/safer-dataset-brief-details.png](https://docs.datarobot.com/en/docs/images/safer-dataset-brief-details.png)

**Detailed description:**
Click the menu icon on the top right of the dataset tile and select Details to open the [Info](https://docs.datarobot.com/en/docs/classic-ui/data/ai-catalog/catalog-asset.html#work-with-metadata) page in the AI Catalog. From here you can access the profile, feature lists, relationships, version history, and comments associated with the dataset.

[https://docs.datarobot.com/en/docs/images/safer-dataset-details-menu-item.png](https://docs.datarobot.com/en/docs/images/safer-dataset-details-menu-item.png)

You can also delete the dataset from this menu.

[https://docs.datarobot.com/en/docs/images/safer-define-relationships-menu.png](https://docs.datarobot.com/en/docs/images/safer-define-relationships-menu.png)


## Manually define relationships

Once all datasets are loaded, the next step is to define relationships on the Define Relationships page. The primary dataset is on the canvas while any secondary sets are listed in the left pane. After establishing a relationship between two datasets, you can define the relationship by setting [join conditions](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#set-join-conditions) and [feature derivation windows (FDW)](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#set-feature-derivation-windows) for time-aware feature engineering.

To define relationships:

1. Click a secondary dataset to highlight it; notice the addition of a plus sign on the primary set.
2. Click the plus sign. DataRobot adds the selected secondary dataset to the canvas and opens the configuration editor. The following table describes the elements of the Create new relationship page:

   | | Element | Description |
   | --- | --- | --- |
   | 1 | Secondary dataset for join | Sets the secondary dataset used in the join. Change via the dropdown to any added dataset. Changes are reflected in the canvas below. |
   | 2 | Primary dataset for join | Sets the primary dataset used in the join. |
   | 3 | Suggested join condition | Sets the join condition (feature) for the corresponding dataset (listed above the condition). DataRobot suggests up to five conditions, each of which is editable. Use the dropdown to select a new feature; use the trash icon to delete the join. |
   | 4 | Add join condition | Provides a manual join configuration option. |
   | 5 | Save or Save and configure time-aware | Saves the relationship configuration. Save is the option if there is no date feature or you did not set a prediction point. If you did set a prediction point from the primary dataset, the Save and configure time-aware button displays. |
   | 6 | Canvas display controls | Zooms in or out, or resets the default display size. |
   | 7 | Dataset menu options | Provides access to a variety of actions that can be enacted on a primary or secondary dataset. |
   | 8 | Join edit launch | Opens the relationship editor, allowing you to define or modify the relationship between the datasets joined by the line you clicked. |
   | 9 | Primary icon | Indicates, with a bullseye icon, that this is the primary dataset. |
   | 10 | Tour launch | Opens a short tour that provides an overview of configuring Feature Discovery. |
   | 11 | Continue to project | Returns to the Start screen where you can revise your time-aware settings, set advanced options, set a modeling mode, and start the modeling process. |

### Set join conditions

If tables in your datasets are well-formed, DataRobot automatically detects compatible features and creates up to five "suggested" joins. You can modify the suggested join using the dropdowns associated with each join key.

You can also manually create join keys by clicking Add join condition. In the resulting dialog, select a join feature from each dataset from the feature dropdown.

Once you've added all of your secondary datasets and selected your relationship configuration settings, click Save and configure time-aware or Save for a non time-aware project.

- If the project is not time-aware, the Start page displays.
- If the project is time-aware, the Time-aware feature engineering page displays, where you can configure FDWs.
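To make the notion of "compatible features" concrete, here is a deliberately naive sketch that proposes join keys for columns sharing the same name and type in both datasets, capped at five suggestions. DataRobot's actual detection logic is not described on this page, so the heuristic, the `suggest_join_keys` helper, and the example schemas (apart from the `CustID` column from the sample use case) are assumptions made for illustration only.

```python
def suggest_join_keys(primary_schema, secondary_schema, limit=5):
    """Return up to `limit` (primary_column, secondary_column) pairs.

    Naive heuristic (an assumption, not DataRobot's algorithm): a column is
    a candidate join key when both datasets have a column with the same
    name and the same declared type.
    """
    suggestions = [
        (name, name)
        for name, dtype in primary_schema.items()
        if secondary_schema.get(name) == dtype
    ]
    return suggestions[:limit]


# Hypothetical schemas expressed as {column_name: type_name}.
primary_schema = {"CustID": "int", "target": "int", "application_date": "date"}
secondary_schema = {"CustID": "int", "inquiry_date": "date", "amount": "float"}

suggestions = suggest_join_keys(primary_schema, secondary_schema)
```

A suggestion produced this way is only a starting point, which mirrors the workflow above: review each suggested join and edit or delete it as needed.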

### Set feature derivation windows

After adding secondary datasets to a time-aware project, you can define the FDWs—a rolling window of past values used to generate features before the prediction point. The FDW constrains the time history—in the example below, no further back than 30 days, no more recent than 2 days.

1. Click Select time feature to choose a time index feature for the secondary dataset.
2. Configure the FDWs. You can configure up to three FDWs for each dataset, but each window must be unique. To add an FDW, click Add window. Once set, the FDW is reflected in the dataset's tile on the canvas. These time-aware settings ensure that the generated features are based only on records that occur before the prediction point. For more details, see [Time-aware feature engineering](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-time.html).
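The effect of an FDW of "no further back than 30 days, no more recent than 2 days" can be sketched as a simple date filter relative to the prediction point. This is an illustrative sketch, not DataRobot's implementation: the helper names are hypothetical, and treating both window boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def in_derivation_window(record_time, prediction_point, start_days=30, end_days=2):
    """Return True if the record falls inside the feature derivation window.

    The window spans from `start_days` before the prediction point up to
    `end_days` before it. Inclusive boundaries are an assumption.
    """
    window_start = prediction_point - timedelta(days=start_days)
    window_end = prediction_point - timedelta(days=end_days)
    return window_start <= record_time <= window_end


def validate_windows(windows):
    """Check the constraints stated above: at most three windows, all unique."""
    if len(windows) > 3:
        raise ValueError("A dataset can have at most three FDWs.")
    if len(set(windows)) != len(windows):
        raise ValueError("Each FDW must be unique.")


prediction_point = datetime(2024, 3, 31)
records = [
    datetime(2024, 2, 15),  # 45 days before: too old
    datetime(2024, 3, 15),  # 16 days before: inside the window
    datetime(2024, 3, 30),  # 1 day before: too recent
]
usable = [r for r in records if in_derivation_window(r, prediction_point)]

validate_windows([(30, 2), (90, 2)])  # two distinct (start, end) windows in days
```

Only the record from 16 days before the prediction point survives the filter, which is how the window keeps both stale and too-recent records out of the derived features.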

## Automatically generate relationships

To automatically generate relationships in a Feature Discovery project, make sure all [secondary datasets are added](https://docs.datarobot.com/en/docs/classic-ui/data/transform-data/feature-discovery/fd-overview.html#add-datasets), and then click Generate Relationships at the top of the Define Relationships page.

Once automated relationship detection (ARD) is complete, relationships are automatically added to the primary dataset.

> [!NOTE] Note
> If you click Generate Relationships without adding any secondary datasets to the project, the button displays "Generating relationships" indefinitely.

## Work with datasets

Once a dataset is added to the canvas, you can modify and refine its configuration. Primary datasets appear on the canvas by default, but all secondary datasets must be added.

### Primary datasets

> [!NOTE] Note
> Be sure to save your configuration before using the menu options. Unsaved changes are lost when you leave a page.

Working from the canvas, you can select the menu option on the dataset tile. The primary dataset allows you to add a relationship or edit the prediction point:

| Option | Description |
| --- | --- |
| Add relation | When no relationships have been configured yet, choose Add relation to open the Create new relationship page. This is the equivalent of selecting the dataset from the list on the left and clicking the plus sign on the primary's canvas tile. Once the page opens, select a secondary dataset from the dropdown and it is added to the canvas. |
| Edit prediction point | Select Edit prediction point to choose a different date feature to use as your prediction point. |

### Secondary datasets

When a secondary dataset has been selected and moved to the canvas, a menu option becomes available on its tile. The table below describes the options available from the menu:

| Option | Description |
| --- | --- |
| Add relation | Opens the relationship editor and allows you to select a dataset (from any available in the left pane) to join with. |
| Edit alias | Allows you to set an alias for the dataset. The string displays on the canvas as the secondary dataset name. The alias does not change the display in the left-pane dataset list or the relationship editor pages. |
| Configure dataset | Opens the dataset configuration editor, where you can set dataset details. |
| Configure time-awareness | Opens the time-aware feature engineering configuration dialog, where you can select a time index for the secondary dataset or confirm that the correct date/time feature is selected. |
| Details | Click to open the Info window for the dataset in the AI Catalog. |
| Delete | Deletes the dataset, and all its relationships, from the current relationship configuration. The dataset is still available to the configuration and listed in the left panel. |

Selecting Configure dataset from a secondary dataset menu opens the Dataset Editor.

From here you can:

- Change the dataset alias. If not manually set, DataRobot auto-generates an alias based on the file name. Click in the box to modify the alias; the alias for the primary dataset cannot be modified.
- Choose a snapshot policy, either Latest, Fixed, or Dynamic, to use for this project. By default, the selected snapshot policy will apply at prediction time.
- Choose a feature list to apply against the corresponding dataset. Use this option to limit the size of the table by selecting relevant features. You can create new feature lists from the AI Catalog.

## Test relationship quality

After configuring at least one secondary dataset, you can test the quality of those relationship configurations to identify and resolve potential problems early in the creation process. The Relationship Quality Assessment tool verifies join keys, dataset selection, and time-aware settings before EDA2 begins.

Click the Review configuration button to trigger the Relationship Quality Assessment.

A progress indicator (loading spinner) displays on each dataset and on the Review Configuration button, which is disabled, to indicate that an assessment is currently running.

Once the assessment is complete, DataRobot marks all tested datasets. Those with identified issues display a yellow warning icon and those with no identified issues display a green tick.

Select the warning icon to view a summary of the issues with suggested potential fixes. A summary of the issues identified during the assessment is displayed at the top of the window.

> [!NOTE] Sampling percentage
> To improve run times, DataRobot subsamples approximately 10% of the primary dataset, speeding up the computation without impacting the enrichment rate estimation accuracy or the results of the assessment. The sampling percentage is included at the top of the report.
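The idea of estimating an enrichment rate from a sample can be sketched as follows. This page does not define how DataRobot computes the rate, so the sketch assumes it means the share of sampled primary rows whose join key has at least one match in the secondary dataset. The `estimate_enrichment_rate` helper, the fixed random seed, and the generated keys are all hypothetical.

```python
import random


def estimate_enrichment_rate(primary_keys, secondary_keys, sample_fraction=0.10, seed=0):
    """Estimate the share of primary rows with at least one secondary match.

    Only about `sample_fraction` of the primary keys are inspected, which
    trades a small amount of precision for a shorter run time.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample_size = max(1, round(len(primary_keys) * sample_fraction))
    sample = rng.sample(primary_keys, sample_size)

    secondary = set(secondary_keys)  # set membership is O(1) per lookup
    matched = sum(1 for key in sample if key in secondary)
    return matched / sample_size


# Hypothetical join keys: half of the primary rows have a secondary match.
primary_keys = list(range(1, 1001))
secondary_keys = [key for key in primary_keys if key % 2 == 0]

rate = estimate_enrichment_rate(primary_keys, secondary_keys)
```

A low estimated rate would suggest that the join key or dataset selection deserves a second look before starting the project.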

To open the detailed report, click the orange arrow on the right. DataRobot breaks down the assessment by category, providing additional information to diagnose the issue. If a secondary dataset has multiple FDWs, a detailed report is created for each one.

To resolve warnings, click the orange link displayed below each warning (Review dataset, Review relationship, or Review window settings). A pane appears at the top of the relationship editor where you can modify the relationship configuration.

After EDA2 completes and model building begins, you can view the most recent Relationship Quality Assessment in the Data > Feature Discovery tab.

## Start the project

1. Once you are happy with the definition of the relationship(s), click Continue to project to return to the Start screen. The Secondary Datasets section provides visual cues that give details about the secondary datasets.

   | | Visual cue | Indicates |
   | --- | --- | --- |
   | 1 | Datasets with blue text | The dataset is in use and part of the project. |
   | 2 | Datasets with white text | The dataset is loaded but not part of the relationship definition. |
   | 3 | Linked datasets | The number of datasets linked with this dataset. |
   | 4 | Number of datasets and relationships | The number of secondary datasets and how many have relationships defined. |

2. Click Start. DataRobot conducts feature engineering as part of EDA2 and begins generating model blueprints.

## Share assets

As with any DataRobot project, you can share Feature Discovery projects (depending on your permissions). The assignable [roles](https://docs.datarobot.com/en/docs/reference/misc-ref/roles-permissions.html) provide different levels of permission for the recipient. Unique to Feature Discovery projects, however, is the ability to share engineering graphs and datasets as well.

To share a project, click the share icon [https://docs.datarobot.com/en/docs/images/icon-share.png](https://docs.datarobot.com/en/docs/images/icon-share.png). For the recipient to interact with the project, they must have access to the additional assets. By default, assets are not shared. Select the checkbox to share relationships and datasets; otherwise, DataRobot displays a warning.

Note that in addition to the assigned role, the listing of project users also indicates whether project assets have been shared.
