Offset/exposure with Gamma distributions
How does DataRobot treat exposure and offset in model training with the target following a Gamma distribution?
The target is total claim cost, while exposure = claim count. So, in DataRobot, one can either set exposure equal to “claim count” or set offset = ln(“claim count”). Should I reasonably expect that both scenarios are mathematically equivalent?
Yes, they are mathematically equivalent. You either multiply the prediction by the exposure or add the offset, ln(exposure), to the linear predictor before the log link is inverted.
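One way to see the equivalence is to write both setups out. This is a sketch that assumes a log link, the usual choice when exposure is used. The symbols are introduced here for illustration and do not come from the thread: \mu_i is the expected total claim cost for row i, x_i the features, \beta the coefficients, e_i the claim count, and o_i the offset.

```latex
\begin{align*}
\text{Exposure: } \quad & \mu_i = e_i \cdot \exp\!\left(x_i^\top \beta\right) \\
\text{Offset: }   \quad & \ln \mu_i = x_i^\top \beta + o_i, \qquad o_i = \ln e_i \\
                        & \mu_i = \exp\!\left(x_i^\top \beta + \ln e_i\right)
                                = e_i \cdot \exp\!\left(x_i^\top \beta\right)
\end{align*}
```

Both routes give the same \mu_i for any \beta, so the fitted coefficients and predictions should agree up to numerical tolerance.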
Thanks, that was my impression as well. However, I did an experiment, setting up projects using the two approaches with the same feature list. One project seems to overpredict the target, while the other underpredicts. If they are mathematically equal, what might have caused the discrepancy?
Odd. Are you using the same error metric in both cases?
Yes, both projects used the recommended metric—Gamma Deviance.
Can you manually compare predictions and actuals by downloading the validation or holdout set predictions?
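A minimal sketch of that manual check, assuming the downloaded predictions and actuals have already been loaded into arrays. The function names and the idea of comparing totals are illustrative choices, not a DataRobot API.

```python
import numpy as np

def prediction_bias(actual, predicted):
    """Ratio of total predicted to total actual.

    A value above 1 suggests overprediction, below 1 underprediction.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return predicted.sum() / actual.sum()

def max_abs_difference(pred_a, pred_b):
    """Largest row-level gap between two projects' predictions."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    return np.max(np.abs(pred_a - pred_b))
```

Running `prediction_bias` once per project shows which one over- or underpredicts, and `max_abs_difference` on the two prediction columns shows whether the projects really disagree row by row.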
Upon further checking, I see I used the wrong feature name (for the exposure feature) in the project with the exposure setting. After fixing that, predictions from both projects match (by downloading from the Predict tab).
I did notice, however, that the Lift Charts are different.
That is likely a difference in how we calculate offset vs. exposure for Lift. I would encourage making your own Lift Charts in a notebook. Then you could use any method you want for handling weights, offset, and exposure in the Lift Chart.
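As a rough starting point for a notebook Lift Chart, here is a minimal sketch using NumPy. The function name `lift_table`, the choice to rank rows by predicted cost per unit of exposure, and the equal-row-count binning are all assumptions made for illustration; this is not how DataRobot computes its own charts.

```python
import numpy as np

def lift_table(actual, predicted, exposure=None, n_bins=10):
    """Exposure-weighted mean actual and predicted rate per bin.

    Rows are sorted by predicted rate (prediction / exposure) and split
    into n_bins groups of near-equal row count. Returns an array of
    shape (n_bins, 2): column 0 is actual, column 1 is predicted.
    Assumes there are at least n_bins rows.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if exposure is None:
        exposure = np.ones_like(actual)  # unweighted case
    else:
        exposure = np.asarray(exposure, dtype=float)

    # Rank rows from lowest to highest predicted rate
    order = np.argsort(predicted / exposure, kind="stable")

    rows = []
    for idx in np.array_split(order, n_bins):
        total_exposure = exposure[idx].sum()
        rows.append((actual[idx].sum() / total_exposure,
                     predicted[idx].sum() / total_exposure))
    return np.array(rows)
```

Plotting the two columns against the bin index gives the chart. Changing the sort key or the weighting is a one-line edit, which is the flexibility a custom chart is meant to provide.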
We do have a great AI Accelerator for customizing Lift Charts.
Amazing. Thank you!