Change column values¶
In Data Prep, you can change data values using the Change into operation in the column menu. This example shows a change operation on the "Medical Specialty" column that changes text to uppercase.
Use the Change into operation to select one or more columns and then change the data in those columns to:
- Capital case
- Numeric values
- Unescaped HTML
- Custom values
- Trim leading and trailing spaces from cells in the column
- Collapse consecutive, multiple spaces into a single space
Change values in a single column¶
To find and replace text in a single column:
Locate the column where you want to change values.
Hover over the column menu icon , then hover over Change into and select the change you want to make.
Data Prep generates a copy of the original column that reflects the changes, for example:
Click Save at the top to accept the changes.
Change values in multiple columns¶
If you need to perform a change values across your entire dataset or a specific set of columns, you use the advanced Change into pane. Here are some examples where the advanced function is useful:
- The dataset has both "incorporated" and "Inc" everywhere. You want to standardize the entire dataset to have only the "Inc" value.
- The dataset has "incorporated" everywhere, and for the most part that's accurate. But you need to change the value to "Inc" for some specific columns in the dataset.
- You've pulled two datasets into your Project—one has "NA" and the other uses blanks to represent non-applicable values. You want to change all of the "NA" values into blanks.
To find and replace across multiple columns:
Hover over any column's menu icon and click find + replace.
Click the column name that appears in the Find + Replace pane.
In the advanced Find + Replace pane that displays, click the check box next to each column that you want to include in the find and replace operation.
The rest of the steps for find and replace across multiple columns are the same as the steps for find and replace for a single column. See Find and replace.
Changing values by Name or Criteria¶
In the advanced Find + Replace pane, you can select multiple columns by either Name or Criteria.
Change values by Name:¶
Finding and replacing by Name applies the replace operation only to the specific columns you select.
To select columns by Name:
- Click the check box adjacent to the column(s) that you want to select.
- Click the top-most check box to select all columns.
- Use the Columns and Types filters at the top of the panel to quickly filter down to the columns you want to select for the operation.
- Use the search function to locate a column by name.
Change values by Criteria:¶
Finding and replacing by Criteria applies the replace operation to any column that meets the criteria you specify.
For example, if you have String type columns in your dataset and you specify the replace operation for String type columns, then all existing columns of this type in your dataset—and any new String type columns that are introduced to the dataset prior to this Step—will be dynamically replaced.
To select columns based on criteria:
- Optionally specify the data type of the column—Boolean, DateTime, Number or String.
- Optionally specify the pattern for the column name—contains, starts with, equals or ends with.
Notice the header message updates to indicate the number of columns you have selected based on that criteria. You may later notice the number of selected columns increases or decreases if new data is brought into an earlier Step that introduces or removes columns that meet your criteria.
If you switch between the Name and Criteria options before saving the replace operation, Data Prep retains your selections and provides a Restore last selection link that returns you to your initial selection method.
Example: Change into Numeric¶
This column operation converts all numbers stored as text strings into numeric values. By doing this, mathematical operations can be performed on values in this column when, as numbers stored as text, these actions would otherwise be considered invalid.
Numbers stored as strings appear left-aligned within a cell and in black text; numbers stored as numeric values are right-aligned and appear in green.
When this operation is applied to cells that cannot be converted to numeric values, it will have no effect. In a column with both text and numbers on different rows, only those rows that can be converted will be changed.
If a value that appears suitable for conversion is not successfully converted, it is likely that there are non-number characters somewhere in the cell.
The following are examples of characters that can inhibit the transformation:
Leading or trailing spaces. These can be removed by using the column opreation for "White Space trim leading and trailing" before your apply the "Transform into numeric operation".
The "White Space trim leading and trailing" operation examines all rows for spaces at both the beginning and end of the text string. Where it finds them, they are removed—leaving only the value in the cell.
Intermediate characters (such as commas or spaces.) Operations such as Column split or a Compute columns that uses REGEX may be required, first, in order to successfully create a column of numeric values.
A single period (“.”) in a cell of numbers will be interpreted as a decimal point. These strings will be able to be converted into numeric values without requiring any other operations.