Row weighting blocks¶
You can configure row weight to help improve performance for your models. The Row Weighting parameter, weight_expr, is available within the Prediction Model Parameters and can be modified as part of tuning Eureqa models.
This Eureqa model-level row weight is separate from the DataRobot project-level row weighting (as set from Advanced options). If set, the DataRobot project-level row weight affects how DataRobot calculates the validation score for the Eureqa models (e.g., performance on out-of-sample data) but has no effect on how these models are optimized.
When to use a row weight¶
The following are some common scenarios for which using a row weight may help to improve performance:
Suppose for each data point you have a confidence value that you determined while collecting the data or computed in some other program. Create a variable (i.e., a column) containing those confidence values and designate it as the row weight variable. DataRobot will weight the data accordingly, giving more weight to those values with higher confidence.
Suppose you want to give extra weight to a few important data points. You could give those points more weight by adding a new column to your data before you upload it to DataRobot. This new variable should label important rows with 10, 100, or 1000 (or some other weight) and set the remaining rows to 1.
Suppose you want to balance your data by giving more weight to rare events than to common ones. More specifically, suppose you want to model credit card fraud, and 99.99% of the data points are legitimate transactions while 0.01% are fraudulent. You could create a variable whose value is 1 in rows representing legitimate transactions and 9999 (i.e., 99.99% / 0.01%) in rows representing fraud, thereby creating equal pressure to model both legitimate and fraudulent cases.
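The balancing scenario above can be sketched in a few lines of Python, run on your data before you upload it to DataRobot. This is only an illustration under stated assumptions: the function name `balancing_weights` and the 0/1 labels are hypothetical, and the resulting list would be saved as an extra column in your dataset.

```python
from collections import Counter


def balancing_weights(labels, rare_label):
    """Return one weight per row: 1 for common rows and
    (common count / rare count) for rows carrying the rare label."""
    counts = Counter(labels)
    rare = counts[rare_label]
    common = len(labels) - rare
    if rare == 0:
        raise ValueError("no rows carry the rare label")
    ratio = common / rare
    return [ratio if label == rare_label else 1.0 for label in labels]


# Hypothetical data: 9,999 legitimate transactions (0) and 1 fraudulent one (1),
# matching the 99.99% / 0.01% split described above.
labels = [0] * 9999 + [1]
weights = balancing_weights(labels, rare_label=1)

# Total weight carried by each class; the two totals should match.
legit_total = sum(w for w, label in zip(weights, labels) if label == 0)
fraud_total = sum(w for w, label in zip(weights, labels) if label == 1)
print(weights[-1], legit_total, fraud_total)
```

Because the fraud row receives a weight of 9999, the single fraudulent transaction carries the same total weight as all legitimate transactions combined.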
Row weight variable¶
To use a row weight variable, include it as a column in your dataset before you upload the dataset to DataRobot. Then, when tuning the model, type the name of that variable as the row weight variable. This tells DataRobot to weight the rest of the data in each row in proportion to the value of the row weight variable in that row.
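To build intuition for what weighting a row "in proportion" to a value means, here is a generic weighted error metric. It is a minimal sketch of a common weighting scheme and is not taken from DataRobot or Eureqa, whose internal calculation is not described here. The function name and sample numbers are hypothetical.

```python
def weighted_mean_absolute_error(actual, predicted, weights):
    """Average absolute error in which each row counts in proportion to its weight."""
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("actual, predicted, and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_errors = sum(
        w * abs(a - p) for a, p, w in zip(actual, predicted, weights)
    )
    return weighted_errors / total_weight


# Hypothetical values: the third row has the largest prediction error.
actual = [10.0, 12.0, 50.0]
predicted = [11.0, 12.0, 40.0]

equal = weighted_mean_absolute_error(actual, predicted, [1, 1, 1])
emphasized = weighted_mean_absolute_error(actual, predicted, [1, 1, 10])
print(equal, emphasized)
```

Raising the weight of the third row makes its error dominate the metric, so a model optimized against the weighted metric is pushed harder to fit that row.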
Row weight expression¶
Some row weighting schemes can be more easily achieved with a row weight expression than with a row weight variable. When a row weight expression is defined, DataRobot evaluates it for each row using the values in that row, and then weights the row with the result.
1 / occurrences (variable_name)¶
This expression provides a quick way to balance data. To illustrate, let's imagine a toy dataset containing just three values of one variable:
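| Row | occurrences(x) | 1 / occurrences(x) |
|-----|----------------|--------------------|
| 1   | 2              | 1/2                |
| 2   | 2              | 1/2                |
| 3   | 1              | 1                  |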
The value returned by occurrences(x) is the number of times a particular value of x occurs in the dataset; in this case, it would return 2 in the first row, 2 in the second row, and 1 in the third row. Selecting 1/occurrences(x) as your row weight would therefore give the first row a weight of 1/2, the second row a weight of 1/2, and the third row a weight of 1.
Returning to the credit card fraud example (shown above), you could create variable z with a value of 0 in rows representing legitimate transactions and 1 in rows representing fraudulent ones. Selecting a row weight of 1/occurrences(z) would then automatically create equal pressure to model legitimate and fraudulent transactions. If new data is added, weights are automatically adjusted to maintain the balance.
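The behavior of this expression can be reproduced outside DataRobot with a short Python sketch. The function `inverse_occurrence_weights` is a hypothetical stand-in for the expression, not DataRobot code, and the toy values `"a"` and `"b"` are placeholders chosen only so that the first two rows share a value.

```python
import math
from collections import Counter


def inverse_occurrence_weights(values):
    """Weight each row by 1 / (number of rows sharing that row's value)."""
    counts = Counter(values)
    return [1 / counts[v] for v in values]


# Toy dataset: the first two rows share a value, the third is unique.
toy_x = ["a", "a", "b"]
toy_weights = inverse_occurrence_weights(toy_x)
print(toy_weights)  # [0.5, 0.5, 1.0]

# Fraud flag: 0 for legitimate rows, 1 for fraudulent rows (hypothetical counts).
fraud_flag = [0] * 9999 + [1]
flag_weights = inverse_occurrence_weights(fraud_flag)

# The weights within each class sum to about 1, so both classes
# exert the same total pressure regardless of how many rows each has.
legit_pressure = sum(w for w, f in zip(flag_weights, fraud_flag) if f == 0)
fraud_pressure = sum(w for w, f in zip(flag_weights, fraud_flag) if f == 1)
print(math.isclose(legit_pressure, fraud_pressure))
```

Since each weight is recomputed from the current counts, adding rows changes the weights but keeps the per-class totals balanced.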
<row>¶
The special variable <row> takes on the value of the row number. Using this as the row weight will give the first row a weight of 1, the second row a weight of 2, and so on.
Row weighting can improve results for sparse datasets in which the target behavior only happens very rarely (such as for fraud and failures). Using this row weighting expression will help isolate and highlight those sparse signals in the data.
Other row weight expressions¶
Aside from the special row weighting variable, the best option for creating a custom row weight is to derive a new variable (feature), use a custom expression to populate the column automatically with the desired row weights, and use that new derived variable as the row weight variable directly. For information on deriving a new variable, see the documentation for feature transformations.
The following example expressions assume the dataset contains variables x and y:
- abs( x ) gives row weights in proportion to the absolute value of x.
- 1 / abs( x-y ) gives row weights in inverse proportion to the absolute difference between x and y.
- 1 / <row> gives row 1 a weight of 1, row 2 a weight of 1/2, row 3 a weight of 1/3, and so on.
- 0.5 + 0.5 * ( <row> <= 100 ) gives rows 1 through 100 a weight of 1 and the remaining rows a weight of 0.5. (Note that <= returns 1 if satisfied, 0 if unsatisfied.)
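To check what an expression will produce before using it, you can mirror the examples above in Python. This is a rough sketch, not DataRobot's expression evaluator: `row_weight_examples` is a hypothetical helper, `row` stands in for <row> and is assumed to start at 1, and Python's `True`/`False` play the role of the 1/0 returned by <=.

```python
def row_weight_examples(x, y, row):
    """Evaluate the example expressions for one row (row is 1-based)."""
    return {
        "abs(x)": abs(x),
        # Raises ZeroDivisionError in Python when x == y; how DataRobot
        # treats that case is not covered here.
        "1 / abs(x - y)": 1 / abs(x - y),
        "1 / <row>": 1 / row,
        # (row <= 100) is True/False, which Python treats as 1/0 in arithmetic.
        "0.5 + 0.5 * (<row> <= 100)": 0.5 + 0.5 * (row <= 100),
    }


# Hypothetical rows: one inside the first 100 rows and one beyond them.
early = row_weight_examples(x=-3.0, y=1.0, row=2)
late = row_weight_examples(x=-3.0, y=1.0, row=101)

for name, weight in early.items():
    print(f"{name}: {weight}")
print(late["0.5 + 0.5 * (<row> <= 100)"])
```

For the row-2 example, the four weights come out as 3.0, 0.25, 0.5, and 1.0. The last expression drops to 0.5 for row 101.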