Data Quality Handling Report
The Data Quality Handling Report can be found in a model's Describe division.
The report includes the following information based on the training data:
|Column|Description|
|---|---|
|Feature Name|Displays the feature name. Every feature in the dataset is listed, including transformed features and OTV-derived features.|
|Variable Type|The feature's variable type.|
|Row Count|The number of rows in the training data in which the feature's value is missing. Click the column heading to change the sort order.|
|Percentage|The percentage of rows in the training data in which the feature's value is missing. Click the column heading to change the sort order.|
|Data Transformation Information|Lists the imputation task applied to the feature and the value it applied. If more than one imputation task applies, all tasks are listed.|
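The Row Count and Percentage columns are per-feature tallies of missing values. The sketch below shows that arithmetic under stated assumptions: the training data is a list of Python dicts, and a value counts as missing when it is `None` or the key is absent. `missing_summary` is a hypothetical helper for illustration only, not part of any DataRobot API.

```python
from typing import Any, Dict, List, Tuple


def missing_summary(rows: List[Dict[str, Any]]) -> Dict[str, Tuple[int, float]]:
    """Return {feature: (missing_row_count, missing_percentage)}.

    Assumption for this illustration: a value is missing if it is None
    or if the feature's key is absent from the row.
    """
    features = sorted({name for row in rows for name in row})
    total = len(rows)
    summary = {}
    for name in features:
        missing = sum(1 for row in rows if row.get(name) is None)
        pct = 100.0 * missing / total if total else 0.0
        summary[name] = (missing, pct)
    return summary


training_rows = [
    {"age": 34, "color": "red"},
    {"age": None, "color": "blue"},
    {"age": 51, "color": None},
    {"age": None, "color": "red"},
]

# Sort by missing row count, highest first, similar to sorting the Row Count column.
for name, (count, pct) in sorted(
    missing_summary(training_rows).items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{name}: {count} missing ({pct:.1f}%)")
```

With this sample data, `age` is missing in 2 of 4 rows (50.0%) and `color` in 1 of 4 rows (25.0%).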
Additionally, you can:
- Use Search to find a specific feature.
- Filter by column header.
The Data Quality Handling Report covers the following supported preprocessing tasks:
- Numeric values imputed
- Numeric data cleansing
- Ordinal encoding of categorical variables
- Categorical Embedding
- Category Count
- One-Hot Encoding
- VW encoding of categorical variables
The task information that can be returned in the Data Transformation Information column includes:
- The name of the task.
- The imputed value inserted in place of the missing value. Different preprocessing tasks use different strategies to choose the imputation value; in some cases, the strategy can be tuned on the Advanced Tuning tab.
- If DataRobot created a missing indicator feature, it displays "Missing indicator treated as feature". This means DataRobot created a new feature inside the blueprint that contains 1 in rows where the original feature was missing and 0 in rows where it had a value. The pattern of missing values is sometimes predictive on its own and, when input into the model, can increase accuracy.
- (Categorical features only) If DataRobot treated missing values as infrequent values, it displays "Missing values treated as infrequent". This means a row with a missing value is handled as if it contained a categorical value that occurs rarely in the feature. Different blueprints may handle infrequent values in categorical features differently.
- (Categorical features only) If DataRobot treated infrequent values as missing values, it displays "Infrequent values treated as missing". This means a row with an infrequent value is handled as if its value for that feature were missing.
- (Categorical features only) If missing values were ignored, DataRobot displays "Missing values ignored".
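The transformations described in the list above are standard preprocessing techniques. The following is a minimal plain-Python sketch of the general ideas, not DataRobot's implementation. It assumes median imputation for numeric features and a hypothetical `min_count` threshold for deciding which categorical values count as infrequent; `INFREQUENT` and all function names are likewise hypothetical. Actual strategies and thresholds vary by blueprint.

```python
from collections import Counter
from statistics import median
from typing import List, Optional, Tuple

# Hypothetical placeholder level that rare (and, optionally, missing) categories map to.
INFREQUENT = "__infrequent__"


def impute_with_indicator(
    values: List[Optional[float]],
) -> Tuple[List[float], List[int], float]:
    """Fill missing numeric values with the median and build a missing indicator.

    The indicator holds 1 where the original value was missing and 0 otherwise.
    Assumes at least one non-missing value is present.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator, fill


def missing_as_infrequent(values: List[Optional[str]], min_count: int = 2) -> List[str]:
    """Map missing values, and values seen fewer than min_count times, to one infrequent level."""
    counts = Counter(v for v in values if v is not None)
    return [INFREQUENT if v is None or counts[v] < min_count else v for v in values]


def infrequent_as_missing(
    values: List[Optional[str]], min_count: int = 2
) -> List[Optional[str]]:
    """Replace values seen fewer than min_count times with None, i.e. treat them as missing."""
    counts = Counter(v for v in values if v is not None)
    return [None if v is None or counts[v] < min_count else v for v in values]


ages = [34.0, None, 51.0, None, 40.0]
imputed, indicator, fill = impute_with_indicator(ages)
print(imputed, indicator, fill)  # [34.0, 40.0, 51.0, 40.0, 40.0] [0, 1, 0, 1, 0] 40.0

colors = ["red", None, "blue", "red", "green"]
print(missing_as_infrequent(colors))  # ['red', '__infrequent__', '__infrequent__', 'red', '__infrequent__']
print(infrequent_as_missing(colors))  # ['red', None, None, 'red', None]
```

The indicator list is the "missing indicator treated as feature" idea: it would be passed to a model alongside the imputed column so that the fact a value was missing is not lost when the gap is filled.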