Find and replace in columns¶
The Data Prep find and replace operations allows you to locate and replace text within specified columns. This example shows a find and replace operation on the "Medical Specialty" column where
? values are replaced by blanks.
You can also find and replace across multiple or all columns. See Multiple find and replace operations for details.
Find and replace¶
To find and replace text in a single column:
Locate the column where you want to find and replace values.
Hover over the column menu icon and click find + replace.
In the Find field, specify a value to find.
Alternatively, double-click a cell that has the value you want to find; the fields for find and replace are then automatically populated with the value from that cell.
In the using field, select how to match the text specified in the Find field:
- Contains: The match can be anywhere in the cell value.
- Starts With: The match must be at the beginning of the cell value.
- Equals: The match must be exact.
- Ends With: The match must be at the end of the cell value.
In the and replace field, select what portion of the match to replace:
- the entire cell: Replaces the entire contents of the cell.
- the match: Replaces only the matched portion of the cell.
Data Prep generates a copy of the original column that reflects the changes, for example:
Click Save at the top to accept the changes.
Example 1: original cell value = "123456" Find: "123" and replace with Match "321" and the result is: "321456"
Example 2: original cell value = "123456" Find: "123" and replace Entire Cell with "321" and the result is: "321"
If the Steps tool is turned off, the Save button is displayed in the Find + Replace pane. If you enable the Steps tool, the Save button is located at the top of the Steps pane.
Multiple find and replace operations¶
If you want to continue making replacements and save all of the find and replacement transformations to the current step, click the plus (+) to continue.
Each additional find and replace operation is iterative—a subsequent find operates on the results of the previous find and replace operation and all of the transformations are applied in order, from top to bottom as they are displayed in the Find + Replace pane.
Examples of multiple find and replace operations¶
In the following example, the first operation finds all instances of "memorial" and then replaces the entire cell—not just the matched text—with a null value:
find "Detroit" and replace with "San Francisco" find "San Francisco" and replace with "San Jose" The result is that "Detroit" is transformed into "San Jose"
Note that you can rearrange the sequence of your transformations at any time by dragging and dropping a find and replace operation to another position in the panel.
To save all find and replace transformations for this step, click Save. Or to remove any find and replace operation for this step, click the minus button for that operation.
Important notes for find and replace¶
By default, capitalization is ignored in the search. For example, the text cat matches the text CaT. If capitalization is important, deselect the Ignore case checkbox.
By default, you are limited to 250 find and replace operations in a single project step. If you exceed this number, an error message is displayed in the Step tool and you cannot continue find and replace transformations on the column. Contact your system administrator if you need to increase the limit.
The highlight feature on the grid is only enabled for a single Find + Replace transformation; it is disabled if you add another Find + Replace transformation.
Find and replace across multiple columns¶
If you need to perform a search and replace across your entire dataset or a specific set of columns, you use the advanced Find + Replace pane. Here are some examples where the advanced function is useful:
- The dataset has both "incorporated" and "Inc" everywhere. You want to standardize the entire dataset to have only the "Inc" value.
- The dataset has "incorporated" everywhere, and for the most part that's accurate. But you need to change the value to "Inc" for some specific columns in the dataset.
- You've pulled two datasets into your project—one has "NA" and the other uses blanks to represent non-applicable values. You want to change all of the "NA" values into blanks.
To find and replace across multiple columns:
Hover over any column's menu icon and click find + replace.
Click the column name that appears in the Find + Replace pane.
In the advanced Find + Replace pane that displays, click the check box next to each column that you want to include in the find and replace operation.
The rest of the steps for find and replace across multiple columns are the same as the steps for find and replace for a single column. See Find and replace.
Finding and replacing by Name or Criteria¶
In the advanced Find + Replace pane, you can select multiple columns by either Name or Criteria.
Find and replace by Name¶
Finding and replacing by Name applies the replace operation only to the specific columns you select.
To select columns by Name:
- Click the check box adjacent to the column(s) that you want to select.
- Click the top-most check box to select all columns.
- Use the Columns and Types filters at the top of the panel to quickly filter down to the columns you want to select for the operation.
- Use the search function to locate a column by name.
Find and replace by Criteria¶
Finding and replacing by Criteria applies the replace operation to any column that meets the criteria you specify.
For example, if you have String type columns in your dataset and you specify the replace operation for String type columns, then all existing columns of this type in your dataset—and any new String type columns that are introduced to the dataset prior to this step—will be dynamically replaced.
To select columns based on criteria:
- Optionally specify the data type of the column—Boolean, DateTime, Number or String.
- Optionally specify the pattern for the column name—contains, starts with, equals or ends with.
Notice the header message updates to indicate the number of columns you have selected based on that criteria. You may later notice the number of selected columns increases or decreases if new data is brought into an earlier step that introduces or removes columns that meet your criteria.
If you switch between the Name and Criteria options before saving the replace operation, Data Prep retains your selections and provides a Restore last selection link that returns you to your initial selection method.