View column lineage

Enable column lineage mode in the Steps pane to identify the project steps that resulted in the selected column.

To view the lineage of a column:

  1. Hover over the column icon.

  2. Click the Show Lineage Mode link that appears.

    Data Prep outlines the step-level transformations that contributed to the selected column's state.

    Use the outlines to identify the steps that affected the column or changed its data. Steps in the Editor that did not affect the column are grayed out and collapsed, with a label showing the number of collapsed steps.

Lineage mode options

The following options are available when working in lineage mode.

  • Click any grayed-out step to expand the associated collapsed steps.

  • Click Show all Steps in the orange lineage mode header to expand all collapsed steps in your script.

  • Click X in the lineage mode header to close lineage mode.

Note

Lineage mode closes automatically when you mute a step in the Steps Editor pane or begin making new transformations in the project.

Example

A project has the following steps:

  1. Import a base dataset for customer contact information with a column for int’l cell numbers. In that column, all numbers follow this format: +44-2071838750.

  2. Perform a split operation on the dash in the int’l cell numbers column to create two new columns: country code and cell number.

  3. Rename the first newly created column: country code.

  4. Perform a Find + Replace operation on the country code column to remove the leading + character.

  5. Rename the second newly created column: cell number.

  6. Use the column tool to hide the original int’l cell numbers column.

When you enable column lineage mode for the cell number column, the second and fifth steps above are highlighted in the Steps Editor pane because they directly affect the data in that column: the second step is the origin of the data, and the fifth step gives the column its new name. All other steps are grayed out and collapsed because they do not affect the column.
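The reasoning in this example can be sketched in a few lines of Python. This is only an illustrative model, not Data Prep's implementation or API: the Step class, the lineage function, and the placeholder names "split 1" and "split 2" for the columns produced by the split are all assumptions made for the sketch. It walks the steps backward from the selected column, following renames, and stops at the step that created the column.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical model of one project step (not a Data Prep type)."""
    number: int
    kind: str
    creates: Tuple[str, ...] = ()             # columns the step brings into existence
    modifies: Tuple[str, ...] = ()            # existing columns whose values it changes
    rename: Optional[Tuple[str, str]] = None  # (old_name, new_name) for rename steps


def lineage(steps, column):
    """Return the numbers of the steps that created, renamed, or changed `column`."""
    name = column
    hits = []
    for step in reversed(steps):
        if step.rename and step.rename[1] == name:
            hits.append(step.number)
            name = step.rename[0]  # keep following the column under its old name
        elif name in step.modifies:
            hits.append(step.number)
        elif name in step.creates:
            hits.append(step.number)
            break                  # origin of the column reached
    return sorted(hits)


# The six steps from the example. "split 1" and "split 2" are assumed
# placeholder names for the two columns the split operation creates.
STEPS = [
    Step(1, "import", creates=("int'l cell numbers",)),
    Step(2, "split", creates=("split 1", "split 2")),
    Step(3, "rename", rename=("split 1", "country code")),
    Step(4, "find + replace", modifies=("country code",)),
    Step(5, "rename", rename=("split 2", "cell number")),
    Step(6, "hide column"),
]

highlighted = lineage(STEPS, "cell number")
collapsed = [s.number for s in STEPS if s.number not in highlighted]
print(highlighted)  # [2, 5]
print(collapsed)    # [1, 3, 4, 6]
```

Under the same assumptions, calling lineage(STEPS, "country code") returns steps 2, 3, and 4: the split that created the column, its rename, and the Find + Replace that changed its values.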

Note

In addition to lineage mode, a column’s header color offers a quick reference to the original data source for the column’s data. Each column takes the color of the input step for the data source it originates from. If a column has no input data source (for example, it was created by a compute column operation), it is color-coded with the project’s color.


Updated October 28, 2021