Insights¶
class datarobot.insights.ShapMatrix¶
Class for SHAP Matrix calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
property matrix : Any¶
SHAP matrix values.
property base_value : float¶
SHAP base value for the matrix values.
property columns : List[str]¶
List of columns associated with the SHAP matrix.
property link_function : str¶
Link function used to generate the SHAP matrix.
classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)¶
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns: Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)¶
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
  - max_wait (int) – The number of seconds to wait for the result.
- Returns: Entity of the newly or already computed insight.
- Return type:
Self
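Examples
A minimal sketch of computing a SHAP matrix synchronously and reading its values; the model ID below is a hypothetical placeholder.
from datarobot.insights import ShapMatrix

model_id = "64f1a0a1b2c3d4e5f6a7b8c9"  # hypothetical model ID
# create() waits for the computation to finish and returns the insight
shap_matrix = ShapMatrix.create(model_id)
print(shap_matrix.columns)      # feature names for the matrix columns
print(shap_matrix.base_value)   # SHAP base value for the matrix values
values = shap_matrix.matrix     # the SHAP matrix values themselves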
classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters:
  data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
classmethod from_server_data(data, keep_attrs=None)¶
Overrides from_server_data to handle paginated responses.
- Return type:
Self
classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)¶
Return the first matching insight based on the entity id and kwargs.
- Parameters:
  - entity_id (str) – The ID of the entity for which to retrieve generated insights.
  - source (str) – The source type to use when retrieving the insight.
  - quick_compute (Optional[bool]) – Sets whether to retrieve an insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
- Returns: Previously computed insight.
- Return type:
Self
classmethod get_as_csv(entity_id, **kwargs)¶
Retrieve a specific insight represented in CSV format.
- Parameters:
  - entity_id (str) – The ID of the entity for which to retrieve the insight.
  - **kwargs (Any) – Additional keyword arguments to pass to the retrieve function.
- Returns: The retrieved insight.
- Return type:
str
classmethod get_as_dataframe(entity_id, **kwargs)¶
Retrieve a specific insight represented as a pandas DataFrame.
- Parameters:
  - entity_id (str) – The ID of the entity for which to retrieve the insight.
  - **kwargs (Any) – Additional keyword arguments to pass to the retrieve function.
- Returns: The retrieved insight.
- Return type:
DataFrame
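Examples
A short sketch of loading a previously computed SHAP matrix into pandas; the model ID below is a hypothetical placeholder.
from datarobot.insights import ShapMatrix

df = ShapMatrix.get_as_dataframe("64f1a0a1b2c3d4e5f6a7b8c9")
print(df.head())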
get_uri()¶
Returns the URI used for browser-based interactions with this entity.
- Return type:
str
classmethod list(entity_id)¶
List all generated insights.
- Parameters:
  entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns: List of newly or previously computed insights.
- Return type:
List[Self]
open_in_browser()¶
Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type:
None
sort(key_name)¶
Sorts insights data by the given key name.
- Return type:
None
class datarobot.insights.ShapPreview¶
Class for SHAP Preview calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
property previews : List[Dict[str, Any]]¶
SHAP preview values.
- Returns: previews – A list of the ShapPreview values for each row.
- Return type:
List[Dict[str, Any]]
property previews_count : int¶
The number of SHAP preview rows.
- Return type:
int
classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, prediction_filter_row_count=None, prediction_filter_percentiles=None, prediction_filter_operand_first=None, prediction_filter_operand_second=None, prediction_filter_operator=None, feature_filter_count=None, feature_filter_name=None, **kwargs)¶
Return the first matching ShapPreview insight based on the entity id and kwargs.
- Parameters:
  - entity_id (str) – The ID of the entity for which to retrieve generated insights.
  - source (str) – The source type to use when retrieving the insight.
  - quick_compute (Optional[bool]) – Sets whether to retrieve an insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
  - prediction_filter_row_count (Optional[int]) – The maximum number of preview rows to return.
  - prediction_filter_percentiles (Optional[int]) – The number of percentile intervals to select from the total number of rows. This field supersedes prediction_filter_row_count if both are present.
  - prediction_filter_operand_first (Optional[float]) – The first operand to apply to filtered predictions.
  - prediction_filter_operand_second (Optional[float]) – The second operand to apply to filtered predictions.
  - prediction_filter_operator (Optional[str]) – The operator to apply to filtered predictions.
  - feature_filter_count (Optional[int]) – The maximum number of features to return for each preview.
  - feature_filter_name (Optional[str]) – The names of specific features to return for each preview.
- Returns: The first matching ShapPreview insight.
- Return type:
Self
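Examples
A sketch of retrieving a SHAP preview restricted to at most ten rows and five features per row; the model ID below is a hypothetical placeholder.
from datarobot.insights import ShapPreview

preview = ShapPreview.get(
    "64f1a0a1b2c3d4e5f6a7b8c9",
    prediction_filter_row_count=10,
    feature_filter_count=5,
)
print(preview.previews_count)
for row in preview.previews:
    print(row)  # one dict per preview row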
classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)¶
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns: Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)¶
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
  - max_wait (int) – The number of seconds to wait for the result.
- Returns: Entity of the newly or already computed insight.
- Return type:
Self
classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters:
  data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
classmethod from_server_data(data, keep_attrs=None)¶
Overrides from_server_data to handle paginated responses.
- Return type:
Self
get_uri()¶
Returns the URI used for browser-based interactions with this entity.
- Return type:
str
classmethod list(entity_id)¶
List all generated insights.
- Parameters:
  entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns: List of newly or previously computed insights.
- Return type:
List[Self]
open_in_browser()¶
Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type:
None
sort(key_name)¶
Sorts insights data by the given key name.
- Return type:
None
class datarobot.insights.ShapImpact¶
Class for SHAP Impact calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
sort(key_name='-impact_normalized')¶
Sorts insights data by key name.
- Parameters:
  key_name (str) – Item key name by which to sort the data. One of ‘feature_name’, ‘impact_normalized’, or ‘impact_unnormalized’. Prefixing the key name with ‘-’ reverses the sort order. Default: ‘-impact_normalized’.
- Return type:
None
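Examples
A sketch of computing SHAP impact and ordering the result so the most impactful features come first; the model ID below is a hypothetical placeholder.
from datarobot.insights import ShapImpact

shap_impact = ShapImpact.create("64f1a0a1b2c3d4e5f6a7b8c9")
shap_impact.sort("-impact_normalized")  # the default: highest normalized impact first
for row in shap_impact.shap_impacts:
    print(row)  # one entry per feature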
property shap_impacts : List[List[Any]]¶
SHAP impact values.
- Returns: A list of the SHAP impact values.
- Return type:
List[List[Any]]
property base_value : List[float]¶
A list of base prediction values.
property capping : Dict[str, Any] | None¶
Capping for the models in the blender.
property link : str | None¶
Shared link function of the models in the blender.
property row_count : int | None¶
Number of SHAP impact rows. This property is deprecated.
classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)¶
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns: Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)¶
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
  - max_wait (int) – The number of seconds to wait for the result.
- Returns: Entity of the newly or already computed insight.
- Return type:
Self
classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters:
  data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
classmethod from_server_data(data, keep_attrs=None)¶
Overrides from_server_data to handle paginated responses.
- Return type:
Self
classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)¶
Return the first matching insight based on the entity id and kwargs.
- Parameters:
  - entity_id (str) – The ID of the entity for which to retrieve generated insights.
  - source (str) – The source type to use when retrieving the insight.
  - quick_compute (Optional[bool]) – Sets whether to retrieve an insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
- Returns: Previously computed insight.
- Return type:
Self
get_uri()¶
Returns the URI used for browser-based interactions with this entity.
- Return type:
str
classmethod list(entity_id)¶
List all generated insights.
- Parameters:
  entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns: List of newly or previously computed insights.
- Return type:
List[Self]
open_in_browser()¶
Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type:
None
class datarobot.insights.ShapDistributions¶
Class for SHAP Distributions calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
property features : List[Dict[str, Any]]¶
SHAP feature values.
- Returns: features – A list of the ShapDistributions values for each row.
- Return type:
List[Dict[str, Any]]
property total_features_count : int¶
Number of SHAP distributions features.
- Return type:
int
classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)¶
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns: Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)¶
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
  - entity_id (str) – The ID of the entity for which to compute the insight.
  - source (str) – The source type to use when computing the insight.
  - data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
  - external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
  - entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
  - quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
  - max_wait (int) – The number of seconds to wait for the result.
- Returns: Entity of the newly or already computed insight.
- Return type:
Self
classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters:
  data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
classmethod from_server_data(data, keep_attrs=None)¶
Overrides from_server_data to handle paginated responses.
- Return type:
Self
classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)¶
Return the first matching insight based on the entity id and kwargs.
- Parameters:
  - entity_id (str) – The ID of the entity for which to retrieve generated insights.
  - source (str) – The source type to use when retrieving the insight.
  - quick_compute (Optional[bool]) – Sets whether to retrieve an insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
- Returns: Previously computed insight.
- Return type:
Self
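Examples
A sketch of retrieving previously computed SHAP distributions; the model ID below is a hypothetical placeholder.
from datarobot.insights import ShapDistributions

distributions = ShapDistributions.get("64f1a0a1b2c3d4e5f6a7b8c9")
print(distributions.total_features_count)
for feature in distributions.features:
    print(feature)  # one dict per feature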
get_uri()¶
Returns the URI used for browser-based interactions with this entity.
- Return type:
str
classmethod list(entity_id)¶
List all generated insights.
- Parameters:
  entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns: List of newly or previously computed insights.
- Return type:
List[Self]
open_in_browser()¶
Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type:
None
sort(key_name)¶
Sorts insights data by the given key name.
- Return type:
None
Types¶
class datarobot.models.RocCurveEstimatedMetric¶
Typed dict for an estimated metric.
class datarobot.models.AnomalyAssessmentRecordMetadata¶
Typed dict for record metadata.
class datarobot.models.AnomalyAssessmentPreviewBin¶
Typed dict for a preview bin.
class datarobot.models.ShapleyFeatureContribution¶
Typed dict for a Shapley feature contribution.
class datarobot.models.AnomalyAssessmentDataPoint¶
Typed dict for data points.
class datarobot.models.RegionExplanationsData¶
Typed dict for region explanations.
Anomaly assessment¶
class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord¶
Object which keeps metadata about the anomaly assessment insight for a particular subset, backtest, and series, along with the links used to retrieve the anomaly assessment data.
Added in version v2.25.
- Variables:
  - record_id (str) – The ID of the record.
  - project_id (str) – The ID of the project the record belongs to.
  - model_id (str) – The ID of the model the record belongs to.
  - backtest (int or "holdout") – The backtest of the record.
  - source ("training" or "validation") – The source of the record.
  - series_id (str or None) – The series ID of the record for multiseries projects. Defined only for multiseries projects.
  - status (str) – The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus.
  - status_details (str) – The explanation of the status.
  - start_date (str or None) – The ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  - end_date (str or None) – The ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  - prediction_threshold (float or None) – The threshold; all rows with anomaly scores greater than or equal to it have SHAP explanations computed.
  - preview_location (str or None) – The URL to retrieve the predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  - latest_explanations_location (str or None) – The URL to retrieve the latest predictions with SHAP explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  - delete_location (str) – The URL to delete the anomaly assessment record and relevant insight data.
classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)¶
Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.
- Parameters:
  - project_id (str) – The ID of the project the records belong to.
  - model_id (str) – The ID of the model the records belong to.
  - backtest (int or "holdout") – The backtest to filter records by.
  - source ("training" or "validation") – The source to filter records by.
  - series_id (Optional[str]) – The series ID to filter records by. Can be specified for multiseries projects.
  - limit (Optional[int]) – 100 by default. At most this many results are returned.
  - offset (Optional[int]) – This many results will be skipped.
  - with_data_only (bool, False by default) – Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or that are not supported will be omitted.
- Returns: The list of anomaly assessment records.
- Return type:
List[AnomalyAssessmentRecord]
classmethod compute(project_id, model_id, backtest, source, series_id=None)¶
Request anomaly assessment insight computation on the specified subset.
- Parameters:
  - project_id (str) – The ID of the project to compute the insight for.
  - model_id (str) – The ID of the model to compute the insight for.
  - backtest (int or "holdout") – The backtest to compute the insight for.
  - source ("training" or "validation") – The source to compute the insight for.
  - series_id (Optional[str]) – The series ID to compute the insight for. Required for multiseries projects.
- Returns: The anomaly assessment record.
- Return type:
AnomalyAssessmentRecord
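Examples
A sketch of requesting anomaly assessment computation for backtest 0 on the validation subset; the project and model IDs below are hypothetical placeholders.
from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

record = AnomalyAssessmentRecord.compute(
    "646d0ea0cd8eb2355a68b0e5",  # hypothetical project ID
    "64f1a0a1b2c3d4e5f6a7b8c9",  # hypothetical model ID
    backtest=0,
    source="validation",
)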
delete()¶
Delete anomaly assessment record with preview and explanations.
- Return type:
None
get_predictions_preview()¶
Retrieve aggregated prediction statistics for the anomaly assessment record.
- Return type:
AnomalyAssessmentPredictionsPreview
get_latest_explanations()¶
Retrieve the latest predictions along with SHAP explanations for the most anomalous records.
- Return type:
AnomalyAssessmentExplanations
get_explanations(start_date=None, end_date=None, points_count=None)¶
Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters start_date, end_date, and points_count must be specified.
- Parameters:
  - start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
  - end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
  - points_count (Optional[int]) – The number of rows to return.
- Return type:
AnomalyAssessmentExplanations
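Examples
A sketch of fetching explanations for up to 100 rows starting from a given date, assuming record is an AnomalyAssessmentRecord obtained from compute or list.
explanations = record.get_explanations(
    start_date="2020-01-01T00:00:00.000000Z",
    points_count=100,
)
print(explanations.shap_base_value)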
get_explanations_data_in_regions(regions, prediction_threshold=0.0)¶
Get predictions along with explanations for the specified regions, sorted by predictions in descending order.
- Parameters:
  - regions (list of AnomalyAssessmentPreviewBin) – For each region, explanations will be retrieved and merged.
  - prediction_threshold (Optional[float]) – If specified, only points with scores greater than or equal to the threshold will be returned.
- Returns: A dict in the form {‘explanations’: explanations, ‘shap_base_value’: shap_base_value}.
- Return type:
RegionExplanationsData
class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations¶
Object which keeps predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points.
Added in version v2.25.
- Variables:
  - record_id (str) – The ID of the record.
  - project_id (str) – The ID of the project the record belongs to.
  - model_id (str) – The ID of the model the record belongs to.
  - backtest (int or "holdout") – The backtest of the record.
  - source ("training" or "validation") – The source of the record.
  - series_id (str or None) – The series ID of the record for multiseries projects. Defined only for multiseries projects.
  - start_date (str or None) – The ISO-formatted datetime of the first row in the data. Will be None if there is no data in the specified range.
  - end_date (str or None) – The ISO-formatted datetime of the last row in the data. Will be None if there is no data in the specified range.
  - shap_base_value (float) – SHAP base value.
  - count (int) – The number of points in the data.
  - data (array of DataPoint objects or None) – The list of DataPoint objects in the specified date range.
Notes
DataPoint contains:
- shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. The value is None if the prediction is lower than prediction_threshold.
- timestamp (str) : ISO-formatted timestamp for the row.
- prediction (float) : The output of the model for this row.
ShapleyFeatureContribution contains:
- feature_value (str) : The feature value for this row. The first 50 characters are returned.
- strength (float) : The SHAP value for this feature and row.
- feature (str) : The feature name.
classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)¶
Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters start_date, end_date, and points_count must be specified.
- Parameters:
  - project_id (str) – The ID of the project.
  - record_id (str) – The ID of the anomaly assessment record.
  - start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
  - end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
  - points_count (Optional[int]) – The number of rows to return.
- Return type:
AnomalyAssessmentExplanations
class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview¶
Aggregated predictions over time for the corresponding anomaly assessment record. Intended to find the bins with the highest anomaly scores.
Added in version v2.25.
- Variables:
  - record_id (str) – The ID of the record.
  - project_id (str) – The ID of the project the record belongs to.
  - model_id (str) – The ID of the model the record belongs to.
  - backtest (int or "holdout") – The backtest of the record.
  - source ("training" or "validation") – The source of the record.
  - series_id (str or None) – The series ID of the record for multiseries projects. Defined only for multiseries projects.
  - start_date (str) – The ISO-formatted timestamp of the first prediction in the subset.
  - end_date (str) – The ISO-formatted timestamp of the last prediction in the subset.
  - preview_bins (list of preview_bin objects) – The aggregated predictions for the subset. Bin boundaries may differ from actual start/end dates because this is an aggregation.
Notes
PreviewBin contains:
- start_date (str) : The ISO-formatted datetime of the start of the bin.
- end_date (str) : The ISO-formatted datetime of the end of the bin.
- avg_predicted (float or None) : The average prediction of the model in the bin. None if there are no entries in the bin.
- max_predicted (float or None) : The maximum prediction of the model in the bin. None if there are no entries in the bin.
- frequency (int) : The number of rows in the bin.
classmethod get(project_id, record_id)¶
Retrieve aggregated predictions over time.
- Parameters:
  - project_id (str) – The ID of the project.
  - record_id (str) – The ID of the anomaly assessment record.
- Return type:
AnomalyAssessmentPredictionsPreview
find_anomalous_regions(max_prediction_threshold=0.0)¶
Sort the preview bins by max_predicted value and select those with a max predicted value greater than or equal to the max prediction threshold. The result is sorted by max predicted value in descending order.
- Parameters:
  max_prediction_threshold (Optional[float]) – Return bins with a maximum anomaly score greater than or equal to max_prediction_threshold.
- Returns: preview_bins – Filtered and sorted preview bins.
- Return type:
list of preview_bin
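Examples
A sketch that combines the predictions preview with region explanations: find the most anomalous bins, then fetch explanations for those regions. It assumes record is an AnomalyAssessmentRecord obtained from compute or list.
preview = record.get_predictions_preview()
regions = preview.find_anomalous_regions(max_prediction_threshold=0.5)
result = record.get_explanations_data_in_regions(regions, prediction_threshold=0.5)
print(result["shap_base_value"])
for explanation in result["explanations"]:
    print(explanation)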
Confusion chart¶
class datarobot.models.confusion_chart.ConfusionChart¶
Confusion Chart data for model.
Notes
ClassMetrics is a dict containing the following:
- class_name (string) : name of the class
- actual_count (int) : number of times this class is seen in the validation data
- predicted_count (int) : number of times this class has been predicted for the validation data
- f1 (float) : F1 score
- recall (float) : recall score
- precision (float) : precision score
- was_actual_percentages (list of dict) : one-vs-all actual percentages, in the format specified below.
  - other_class_name (string) : the name of the other class
  - percentage (float) : the percentage of the times this class was predicted when it was actually the other class (from 0 to 1)
- was_predicted_percentages (list of dict) : one-vs-all predicted percentages, in the format specified below.
  - other_class_name (string) : the name of the other class
  - percentage (float) : the percentage of the times the other class was predicted when it was actually this class (from 0 to 1)
- confusion_matrix_one_vs_all (list of list) : 2d list representing a 2x2 one-vs-all matrix. This represents the True/False Negative/Positive rates as integers for each class. The data structure looks like: [ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
- Variables:
  - source (str) – Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
  - raw_data (dict) – All of the raw data for the Confusion Chart.
  - confusion_matrix (list of list) – The N x N confusion matrix.
  - classes (list) – The names of each of the classes.
  - class_metrics (list of dicts) – List of dicts with the schema described as ClassMetrics above.
  - source_model_id (str) – ID of the model this Confusion Chart represents; in some cases, insights from the parent of a frozen model may be used.
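Examples
A sketch of reading per-class metrics from the class_metrics list, assuming confusion_chart is a ConfusionChart instance obtained elsewhere.
for class_metric in confusion_chart.class_metrics:
    # each entry follows the ClassMetrics schema described above
    print(
        class_metric["class_name"],
        class_metric["f1"],
        class_metric["recall"],
        class_metric["precision"],
    )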
Lift chart¶
class datarobot.models.lift_chart.LiftChart¶
Lift chart data for model.
Notes
LiftChartBin is a dict containing the following:
- actual (float) : Sum of actual target values in the bin
- predicted (float) : Sum of predicted target values in the bin
- bin_weight (float) : The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
- Variables:
  - source (str) – Lift chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
  - bins (list of dict) – List of dicts with the schema described as LiftChartBin above.
  - source_model_id (str) – ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used.
  - target_class (Optional[str]) – For multiclass lift: the target class for this lift chart data.
  - data_slice_id (str or None) – The slice to retrieve the Lift Chart for; if None, retrieve unsliced data.
classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)¶
Overrides APIObject.from_server_data to handle lift chart data retrieved from either the legacy URL or the new /insights/ URL.
- Parameters:
  - data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place.
  - use_insights_format (Optional[bool]) – Whether to repack the data from the format used in the GET /insights/liftChart/ URL to the format used in the legacy URL.
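Examples
A sketch of aggregating lift chart bins into overall totals using the LiftChartBin schema described above, assuming lift_chart is a LiftChart instance obtained elsewhere.
total_actual = sum(b["actual"] for b in lift_chart.bins)
total_predicted = sum(b["predicted"] for b in lift_chart.bins)
total_weight = sum(b["bin_weight"] for b in lift_chart.bins)
print(total_actual, total_predicted, total_weight)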
Data slices¶
class datarobot.models.data_slice.DataSlice¶
Definition of a data slice
- Variables:
  - id (str) – ID of the data slice.
  - name (str) – Name of the data slice definition.
  - filters (list[DataSliceFiltersType]) – List of DataSliceFiltersType with params:
    - operand (str) : Name of the feature to use in the filter.
    - operator (str) : Operator to use in the filter – eq, in, <, or >.
    - values (Union[str, int, float]) : Values to use from the feature.
  - project_id (str) – ID of the project that the model is part of.
classmethod list(project, offset=0, limit=100)¶
List the data slices in a project.
- Parameters:
  - project (Union[str, Project]) – ID of the project or Project object from which to list data slices.
  - offset (Optional[int]) – Number of items to skip.
  - limit (Optional[int]) – Number of items to return.
- Returns: data_slices
- Return type:
list[DataSlice]
Examples
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]
classmethod create(name, filters, project)¶
Creates a data slice in the project with the given name and filters.
- Parameters:
  - name (str) – Name of the data slice definition.
  - filters (list[DataSliceFiltersType]) – List of filters (dict) with params:
    - operand (str) : Name of the feature to use in the filter.
    - operator (str) : Operator to use: ‘eq’, ‘in’, ‘<’, or ‘>’.
    - values (Union[str, int, float]) : Values to use from the feature.
  - project (Union[str, Project]) – Project ID or Project object in which to create the data slice.
- Returns: data_slice – The data slice object created.
- Return type:
DataSlice
Examples
>>> import datarobot as dr
>>> ... # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
...     name='yes',
...     filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
...     project=project,
... )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)
delete()¶
Deletes the data slice from storage.
- Return type:
None
Examples
>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project) # project object or project_id
>>> data_slice = data_slices[0] # choose a data slice from the list
>>> data_slice.delete()
request_size(source, model=None)¶
Submits a request to validate the data slice’s filters and calculate the data slice’s number of rows on a given source.
- Parameters:
  - source (INSIGHTS_SOURCES) – Subset of data (partition or “source”) on which to apply the data slice for estimating available rows.
  - model (Optional[Union[str, Model]]) – Model object or ID of the model. It is only required when source is “training”.
- Returns: status_check_job – Object containing all the needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
Examples
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project) # project object or project_id
>>> data_slice = data_slices[0] # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")
A model is required when the source is ‘training’:
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project) # project object or project_id
>>> data_slice = data_slices[0] # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)
get_size_info(source, model=None)¶
Get information about the data slice applied to a source.
- Parameters:
  - source (INSIGHTS_SOURCES) – Source (partition or subset) to which the data slice was applied.
  - model (Optional[Union[str, Model]]) – ID for the model whose training data was sliced with this data slice. Required when the source is “training”, and not used for other sources.
- Returns: slice_size_info – Information about the data slice applied to the source.
- Return type:
DataSliceSizeInfo
Examples
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = data_slices[0] # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
data_slice_id=6493a1776ea78e6644382535,
messages=[
{
'level': 'WARNING',
'description': 'Low Observation Count',
'additional_info': 'Insufficient number of observations to compute some insights.'
}
],
model_id=None,
project_id=646d0ea0cd8eb2355a68b0e5,
slice_size=1,
source=validation,
)
>>> data_slice_size_info.to_dict()
{
'data_slice_id': '6493a1776ea78e6644382535',
'messages': [
{
'level': 'WARNING',
'description': 'Low Observation Count',
'additional_info': 'Insufficient number of observations to compute some insights.'
}
],
'model_id': None,
'project_id': '646d0ea0cd8eb2355a68b0e5',
'slice_size': 1,
'source': 'validation',
}
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")
When using source=’training’, the model param is required.
>>> import datarobot as dr
>>> ... # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)
classmethod get(data_slice_id)¶
Retrieve a specific data slice.
- Parameters:
  data_slice_id (str) – The identifier of the data slice to retrieve.
- Returns: data_slice – The requested data slice.
- Return type:
DataSlice
Examples
>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
id=648b232b9da812a6aaa0b7a9,
name=test,
project_id=644bc575572480b565ca42cd
)
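A data slice can also be passed to the insight computations documented above via their data_slice_id parameter; a sketch with hypothetical IDs:
>>> import datarobot as dr
>>> from datarobot.insights import ShapImpact
>>> data_slice = dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
>>> shap_impact = ShapImpact.create('64f1a0a1b2c3d4e5f6a7b8c9', data_slice_id=data_slice.id)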
class datarobot.models.data_slice.DataSliceSizeInfo¶
Definition of a data slice applied to a source
- Variables:
  - data_slice_id (str) – ID of the data slice.
  - project_id (str) – ID of the project.
  - source (str) – Data source used to calculate the number of rows (slice size) after applying the data slice’s filters.
  - model_id (Optional[str]) – ID of the model; required when source (subset) is ‘training’.
  - slice_size (int) – Number of rows in the data slice for a given source.
  - messages (list[DataSliceSizeMessageType]) – List of user-relevant messages related to a data slice.
Datetime trend plots¶
class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotsMetadata¶
Accuracy over Time metadata for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - forecast_distance (int or None) – The forecast distance for which the metadata was retrieved. None for OTV projects.
  - resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
  - backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of the metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
  - holdout_metadata (dict) – Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
  - backtest_statuses (list of dict) – List of backtest status dicts. The list index of the status dict is the backtest index. See backtest/holdout status info in Notes for more details.
  - holdout_statuses (dict) – Holdout status dict. See backtest/holdout status info in Notes for more details.
Notes
Backtest/holdout status is a dict containing the following:
- training: string : Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string : Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict : Start and end dates for the backtest/holdout training.
- validation: dict : Start and end dates for the backtest/holdout validation.
Each training and validation dict in the backtest/holdout metadata is structured as follows:
- start_date: datetime.datetime or None : The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None : The datetime of the end of the chart data (exclusive). None if chart data is not computed.
class datarobot.models.datetime_trend_plots.AccuracyOverTimePlot¶
Accuracy over Time plot for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION.
  - start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
  - end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
  - bins (list of dict) – List of plot bins. See bin info in Notes for more details.
  - statistics (dict) – Statistics for the plot. See statistics info in Notes for more details.
  - calendar_events (list of dict) – List of calendar events for the plot. See calendar events info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime : The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime : The datetime of the end of the bin (exclusive).
- actual: float or None : Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None : Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None : Indicates number of values averaged in bin.
Statistics is a dict containing the following:
- durbin_watson: float or None : The Durbin-Watson statistic for the chart data. Value is between 0 and 4. Durbin-Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. More info https://wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic
Calendar event is a dict containing the following:
- name: string : Name of the calendar event.
- date: datetime : Date of the calendar event.
- series_id: string or None : The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
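As a sketch of working with the documented bin structure, the mean absolute error of an AccuracyOverTimePlot (assumed to have been retrieved elsewhere as plot) can be computed from its bins:
errors = [
    abs(b["actual"] - b["predicted"])
    for b in plot.bins
    if b["actual"] is not None and b["predicted"] is not None
]
mae = sum(errors) / len(errors) if errors else None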
class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotPreview¶
Accuracy over Time plot preview for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
  - end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
  - bins (list of dict) – List of plot bins. See bin info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime : The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime : The datetime of the end of the bin (exclusive).
- actual: float or None : Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None : Average prediction of the model in the bin. None if there are no entries in the bin.
class datarobot.models.datetime_trend_plots.ForecastVsActualPlotsMetadata¶
Forecast vs Actual plots metadata for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
  - backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of the metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
  - holdout_metadata (dict) – Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
  - backtest_statuses (list of dict) – List of backtest status dicts. The list index of the status dict is the backtest index. See backtest/holdout status info in Notes for more details.
  - holdout_statuses (dict) – Holdout status dict. See backtest/holdout status info in Notes for more details.
Notes
Backtest/holdout status is a dict containing the following:
- training: dict : Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as a dict key, and the list of forecast distances for that particular status as the dict value.
- validation: dict : Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as a dict key, and the list of forecast distances for that particular status as the dict value.
Backtest/holdout metadata is a dict containing the following:
- training: dict : Start and end dates for the backtest/holdout training.
- validation: dict : Start and end dates for the backtest/holdout validation.
Each training and validation dict in the backtest/holdout metadata is structured as follows:
- start_date: datetime.datetime or None : The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None : The datetime of the end of the chart data (exclusive). None if chart data is not computed.
class datarobot.models.datetime_trend_plots.ForecastVsActualPlot¶
Forecast vs Actual plot for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - forecast_distances (list of int) – A list of forecast distances that were retrieved.
  - resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION.
  - start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
  - end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
  - bins (list of dict) – List of plot bins. See bin info in Notes for more details.
  - calendar_events (list of dict) – List of calendar events for the plot. See calendar events info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime : The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime : The datetime of the end of the bin (exclusive).
- actual: float or None : Average actual value of the target in the bin. None if there are no entries in the bin.
- forecasts: list of float : A list of average forecasts for the model for each forecast distance. Empty if there are no forecasts in the bin. Each index in the forecasts list maps to forecastDistances list index.
- error: float or None : Average absolute residual value of the bin. None if there are no entries in the bin.
- normalized_error: float or None : Normalized average absolute residual value of the bin. None if there are no entries in the bin.
- frequency: int or None : Indicates number of values averaged in bin.
Calendar event is a dict containing the following:
- name: string : Name of the calendar event.
- date: datetime : Date of the calendar event.
- series_id: string or None : The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
class datarobot.models.datetime_trend_plots.ForecastVsActualPlotPreview¶
Forecast vs Actual plot preview for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
  - end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
  - bins (list of dict) – List of plot bins. See bin info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime : The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime : The datetime of the end of the bin (exclusive).
- actual: float or None : Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None : Average prediction of the model in the bin. None if there are no entries in the bin.
class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotsMetadata¶
Anomaly over Time metadata for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
  - backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of the metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
  - holdout_metadata (dict) – Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
  - backtest_statuses (list of dict) – List of backtest status dicts. The list index of the status dict is the backtest index. See backtest/holdout status info in Notes for more details.
  - holdout_statuses (dict) – Holdout status dict. See backtest/holdout status info in Notes for more details.
Notes
Backtest/holdout status is a dict containing the following:
- training: string : Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string : Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict : Start and end dates for the backtest/holdout training.
- validation: dict : Start and end dates for the backtest/holdout validation.
Each training and validation dict in the backtest/holdout metadata is structured as follows:
- start_date: datetime.datetime or None : The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None : The datetime of the end of the chart data (exclusive). None if chart data is not computed.
class datarobot.models.datetime_trend_plots.AnomalyOverTimePlot¶
Anomaly over Time plot for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION.
  - start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
  - end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
  - bins (list of dict) – List of plot bins. See bin info in Notes for more details.
  - calendar_events (list of dict) – List of calendar events for the plot. See calendar events info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime : The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime : The datetime of the end of the bin (exclusive).
- predicted: float or None : Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None : Indicates number of values averaged in bin.
Calendar event is a dict containing the following:
- name: string : Name of the calendar event.
- date: datetime : Date of the calendar event.
- series_id: string or None : The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotPreview¶
Anomaly over Time plot preview for datetime model.
Added in version v2.25.
- Variables:
  - project_id (string) – The project ID.
  - model_id (string) – The model ID.
  - prediction_threshold (float) – Only bins with predictions exceeding this threshold are returned in the response.
  - start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
  - end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
  - bins (list of dict) – List of plot bins. See bin info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime : The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime : The datetime of the end of the bin (exclusive).
External scores and insights¶
class datarobot.ExternalScores¶
Metric scores on a prediction dataset with a target column, or with an actual value column in the unsupervised case. Contains project metrics for supervised projects and a special set of classification metrics for unsupervised projects.
Added in version v2.21.
- Variables:
  - project_id (str) – ID of the project the model belongs to.
  - model_id (str) – ID of the model.
  - dataset_id (str) – ID of the prediction dataset with a target column, or with an actual value column in the unsupervised case.
  - actual_value_column (Optional[str]) – For unsupervised projects only. The actual value column which was used to calculate the classification metrics and insights on the prediction dataset.
  - scores (list of dicts in the form {'label': metric_name, 'value': score}) – Scores on the dataset.
Examples
List all scores for a dataset
from datarobot.models.external_dataset_scores_insights.external_scores import ExternalScores
scores = ExternalScores.list(project_id, dataset_id=dataset_id)
classmethod create(project_id, model_id, dataset_id, actual_value_column=None)¶
Compute external dataset insights for the specified model.
- Parameters:
  - project_id (str) – ID of the project the model belongs to.
  - model_id (str) – ID of the model for which insights are requested.
  - dataset_id (str) – ID of the dataset for which insights are requested.
  - actual_value_column (Optional[str]) – Actual values column label, for unsupervised projects only.
- Returns: job – An instance of the created async job.
- Return type:
Job
classmethod list(project_id, model_id=None, dataset_id=None, offset=0, limit=100)¶
Fetch external scores list for the project and optionally for model and dataset.
- Parameters:
  - project_id (str) – ID of the project.
  - model_id (Optional[str]) – If specified, only scores for this model will be retrieved.
  - dataset_id (Optional[str]) – If specified, only scores for this dataset will be retrieved.
  - offset (Optional[int]) – This many results will be skipped. Default: 0.
  - limit (Optional[int]) – At most this many results are returned. Default: 100, max 1000. To return all results, specify 0.
- Returns: A list of ExternalScores objects.
- Return type:
List[ExternalScores]
classmethod get(project_id, model_id, dataset_id)¶
Retrieve external scores for the project, model and dataset.
- Parameters:
- project_id (`str`) – id of the project
- model_id (`str`) – id of the model for which scores will be retrieved
- dataset_id (`str`) – id of the dataset for which scores will be retrieved
- Return type: `ExternalScores`
- Returns: `ExternalScores` object
class datarobot.ExternalLiftChart¶
Lift chart for the model and prediction dataset with target or actual value column in unsupervised case.
Added in version v2.21.
`LiftChartBin` is a dict containing the following:
- `actual` (float) Sum of actual target values in bin
- `predicted` (float) Sum of predicted target values in bin
- `bin_weight` (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
- Variables:
- dataset_id (`str`) – id of the prediction dataset with target or actual value column for unsupervised case
- bins (`list` of `dict`) – List of dicts with schema described as `LiftChartBin` above.
classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)¶
Retrieve a list of lift charts for the model.
- Parameters:
- project_id (`str`) – id of the project
- model_id (`str`) – id of the model for which lift charts will be retrieved
- dataset_id (`Optional[str]`) – if specified, only the lift chart for this dataset will be retrieved
- offset (`Optional[int]`) – this many results will be skipped, default: 0
- limit (`Optional[int]`) – at most this many results are returned, default: 100, max 1000. To return all results, specify 0
- Return type: `List[ExternalLiftChart]`
- Returns: A list of `ExternalLiftChart` objects
classmethod get(project_id, model_id, dataset_id)¶
Retrieve lift chart for the model and prediction dataset.
- Parameters:
- project_id (`str`) – project id
- model_id (`str`) – model id
- dataset_id (`str`) – prediction dataset id with target or actual value column for unsupervised case
- Return type: `ExternalLiftChart`
- Returns: `ExternalLiftChart` object
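For example, a minimal sketch of retrieving and inspecting a lift chart (all identifiers are placeholders):
import datarobot as dr
lift_chart = dr.ExternalLiftChart.get(project_id, model_id, dataset_id)
for bin_info in lift_chart.bins:
    # each bin follows the LiftChartBin schema above
    print(bin_info['actual'], bin_info['predicted'], bin_info['bin_weight'])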
class datarobot.ExternalRocCurve¶
ROC curve data for the model and prediction dataset with target or actual value column in unsupervised case.
Added in version v2.21.
- Variables:
- dataset_id (`str`) – id of the prediction dataset with target or actual value column for unsupervised case
- roc_points (`list` of `dict`) – List of precalculated metrics associated with thresholds for ROC curve.
- negative_class_predictions (`list` of `float`) – List of predictions from example for negative class
- positive_class_predictions (`list` of `float`) – List of predictions from example for positive class
classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)¶
Retrieve a list of ROC curves for the model.
- Parameters:
- project_id (`str`) – id of the project
- model_id (`str`) – id of the model for which ROC curves will be retrieved
- dataset_id (`Optional[str]`) – if specified, only the ROC curve for this dataset will be retrieved
- offset (`Optional[int]`) – this many results will be skipped, default: 0
- limit (`Optional[int]`) – at most this many results are returned, default: 100, max 1000. To return all results, specify 0
- Return type: `List[ExternalRocCurve]`
- Returns: A list of `ExternalRocCurve` objects
classmethod get(project_id, model_id, dataset_id)¶
Retrieve ROC curve chart for the model and prediction dataset.
- Parameters:
- project_id (`str`) – project id
- model_id (`str`) – model id
- dataset_id (`str`) – prediction dataset id with target or actual value column for unsupervised case
- Return type: `ExternalRocCurve`
- Returns: `ExternalRocCurve` object
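Similarly, a sketch of retrieving the external ROC curve for a dataset whose insights have already been computed:
import datarobot as dr
roc = dr.ExternalRocCurve.get(project_id, model_id, dataset_id)
# each ROC point pairs a threshold with precalculated metrics
print(roc.roc_points[0])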
Feature association¶
class datarobot.models.FeatureAssociationMatrix¶
Feature association statistics for a project.
Notes
Projects created prior to v2.17 are not supported by this feature.
- Variables:
- project_id (`str`) – Id of the associated project.
- strengths (`list` of `dict`) – Pairwise statistics for the available features as structured below.
- features (`list` of `dict`) – Metadata for each feature and where it goes in the matrix.
Examples
import datarobot as dr
# retrieve feature association matrix
feature_association_matrix = dr.FeatureAssociationMatrix.get(project_id)
feature_association_matrix.strengths
feature_association_matrix.features
# retrieve feature association matrix for a metric, association type or a feature list
feature_association_matrix = dr.FeatureAssociationMatrix.get(
    project_id,
    metric=dr.enums.FEATURE_ASSOCIATION_METRIC.SPEARMAN,
    association_type=dr.enums.FEATURE_ASSOCIATION_TYPE.CORRELATION,
    featurelist_id=featurelist_id,
)
classmethod get(project_id, metric=None, association_type=None, featurelist_id=None)¶
Get feature association statistics.
- Parameters:
- project_id (`str`) – Id of the project that contains the requested associations.
- metric (`enums.FEATURE_ASSOCIATION_METRIC`) – The name of a metric to get pairwise data for. Since 'v2.19' this is optional and defaults to enums.FEATURE_ASSOCIATION_METRIC.MUTUAL_INFO.
- association_type (`enums.FEATURE_ASSOCIATION_TYPE`) – The type of dependence for the data. Since 'v2.19' this is optional and defaults to enums.FEATURE_ASSOCIATION_TYPE.ASSOCIATION.
- featurelist_id (`str` or `None`) – Optional, the feature list to lookup FAM data for. By default, depending on the type of the project, the "Informative Features" or "Timeseries Informative Features" list will be used. (New in version v2.19)
- Returns: Feature association pairwise metric strength data, feature clustering data, and ordering data for Feature Association Matrix visualization.
- Return type: `FeatureAssociationMatrix`
classmethod create(project_id, featurelist_id)¶
Compute the Feature Association Matrix for a Feature List
- Parameters:
- project_id (`str`) – The ID of the project that the feature list belongs to.
- featurelist_id (`str`) – The ID of the feature list for which insights are requested.
- Returns: status_check_job – Object containing all the logic needed to periodically check the status of the async job.
- Return type: `StatusCheckJob`
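A sketch combining create and get, assuming a valid featurelist_id; the returned StatusCheckJob can be polled with wait_for_completion:
import datarobot as dr
job = dr.FeatureAssociationMatrix.create(project_id, featurelist_id)
job.wait_for_completion()
fam = dr.FeatureAssociationMatrix.get(project_id, featurelist_id=featurelist_id)
print(fam.strengths[:3])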
Feature association matrix details¶
class datarobot.models.FeatureAssociationMatrixDetails¶
Plotting details for a pair of passed features present in the feature association matrix.
Notes
Projects created prior to v2.17 are not supported by this feature.
- Variables:
- project_id (`str`) – Id of the project that contains the requested associations.
- chart_type (`str`) – Which type of plotting the pair of features gets in the UI, e.g. 'HORIZONTAL_BOX', 'VERTICAL_BOX', 'SCATTER' or 'CONTINGENCY'.
- values (`list`) – The data triplets for pairwise plotting, e.g. {"values": [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], ...]}. The first entry of each list is a value of feature1, the second entry of each list is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.
- features (`list`) – A list of the requested features, [feature1, feature2].
- types (`list`) – The type of feature1 and feature2. Possible values: "CATEGORICAL", "NUMERIC".
- featurelist_id (`str`) – Id of the feature list to lookup FAM details for.
classmethod get(project_id, feature1, feature2, featurelist_id=None)¶
Get a sample of the actual values used to measure the association between a pair of features
Added in version v2.17.
- Parameters:
- project_id (`str`) – Id of the project of interest.
- feature1 (`str`) – Feature name for the first feature of interest.
- feature2 (`str`) – Feature name for the second feature of interest.
- featurelist_id (`str`) – Optional, the feature list to lookup FAM data for. By default, depending on the type of the project, the "Informative Features" or "Timeseries Informative Features" list will be used.
- Returns: The feature association plotting for the provided pair of features.
- Return type: `FeatureAssociationMatrixDetails`
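For example, a minimal sketch using two hypothetical feature names:
import datarobot as dr
details = dr.FeatureAssociationMatrixDetails.get(project_id, 'feature_a', 'feature_b')
print(details.chart_type)  # e.g. 'SCATTER'
print(details.values[:5])  # [feature1 value, feature2 value, relative frequency] triplets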
Feature association featurelists¶
class datarobot.models.FeatureAssociationFeaturelists¶
Featurelists with feature association matrix availability flags for a project.
- Variables:
- project_id (`str`) – Id of the project that contains the requested associations.
- featurelists (`list` of `dict`) – The featurelists with the featurelist_id, title and the has_fam flag.
classmethod get(project_id)¶
Get featurelists with feature association status for each.
- Parameters:
project_id (`str`) – Id of the project of interest.
- Returns: Featurelists with feature association status for each.
- Return type: `FeatureAssociationFeaturelists`
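A short sketch of checking which featurelists already have a computed matrix via the has_fam flag:
import datarobot as dr
fam_featurelists = dr.FeatureAssociationFeaturelists.get(project_id)
for flist in fam_featurelists.featurelists:
    print(flist['featurelist_id'], flist['title'], flist['has_fam'])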
Feature effects¶
class datarobot.models.FeatureEffects¶
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other features at their observed values, the value of this feature affects the prediction.
- Variables:
- project_id (`string`) – The project that contains the requested model
- model_id (`string`) – The model to retrieve Feature Effects for
- source (`string`) – The source to retrieve Feature Effects for
- data_slice_id (`string` or `None`) – The slice to retrieve Feature Effects for; if None, retrieve unsliced data
- feature_effects (`list`) – Feature Effects for every feature
- backtest_index (`string`, required only for DatetimeModels) – The backtest index to retrieve Feature Effects for.
Notes
`featureEffects` is a dict containing the following:
- `feature_name` (string) Name of the feature
- `feature_type` (string) dr.enums.FEATURE_TYPE; feature type: either numeric, categorical, or datetime
- `feature_impact_score` (float) Feature impact score
- `weight_label` (string) optional, weight label if configured for the project, else null
- `partial_dependence` (List) Partial dependence results
- `predicted_vs_actual` (List) optional, predicted versus actual results; may be omitted if there are insufficient qualified samples
`partial_dependence` is a dict containing the following:
- `is_capped` (bool) Indicates whether the data for computation is capped
- `data` (List) partial dependence results in the following format
`data` is a list of dicts containing the following:
- `label` (string) Contains the label for categorical and numeric features as a string
- `dependence` (float) Value of partial dependence
`predicted_vs_actual` is a dict containing the following:
- `is_capped` (bool) Indicates whether the data for computation is capped
- `data` (List) predicted vs actual results in the following format
`data` is a list of dicts containing the following:
- `label` (string) Contains the label for categorical features; for numeric features, contains the range or numeric value.
- `bin` (List) optional, for numeric features contains labels for left and right bin limits
- `predicted` (float) Predicted value
- `actual` (float) Actual value. Actual value is null for unsupervised time series models
- `row_count` (int or float) Number of rows for the label and bin. Type is float if weight or exposure is set for the project.
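Feature Effects is typically computed and fetched through a model object. A minimal sketch, assuming Model.get_or_request_feature_effect is available in your SDK version and that project_id and model_id are placeholders:
import datarobot as dr
model = dr.Model.get(project_id, model_id)
# compute the insight if needed, then fetch it; 'validation' must be one of the
# sources reported in the model's FeatureEffectMetadata
feature_effects = model.get_or_request_feature_effect(source='validation')
for fe in feature_effects.feature_effects:
    print(fe['feature_name'], fe['feature_impact_score'])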
classmethod from_server_data(data, *args, use_insights_format=False, **kwargs)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.
- Parameters:
- data (`dict`) – The directly translated dict of JSON from the server. No casing fixes have taken place.
- use_insights_format (`Optional[bool]`) – Whether to repack the data from the format used in the GET /insights/featureEffects/ URL to the format used in the legacy URL.
class datarobot.models.FeatureEffectMetadata¶
Feature Effect Metadata for a model; contains status and available model sources.
Notes
`source` is a required parameter for retrieving Feature Effects. One of the provided sources must be used.
class datarobot.models.FeatureEffectMetadataDatetime¶
Feature Effect Metadata for a datetime model; contains a list of feature effect metadata per backtest.
Notes
Feature effect metadata per backtest contains:
- `status` : str
- `backtest_index` : str
- `sources` : List[str]
`source` is a required parameter for retrieving Feature Effects. One of the provided sources must be used.
`backtest_index` is a required parameter for submitting a compute request and retrieving Feature Effects. One of the provided backtest indexes must be used.
- Variables:
data (`list[FeatureEffectMetadataDatetimePerBacktest]`) – List of feature effect metadata per backtest
class datarobot.models.FeatureEffectMetadataDatetimePerBacktest¶
Feature Effect metadata for a single backtest, converted from a dictionary; contains backtest_index, status and sources.
Payoff matrix¶
class datarobot.models.PayoffMatrix¶
Represents a Payoff Matrix, a costs/benefit scenario used for creating a profit curve.
- Variables:
- project_id (`str`) – id of the project with which the payoff matrix is associated.
- id (`str`) – id of the payoff matrix.
- name (`str`) – User-supplied label for the payoff matrix.
- true_positive_value (`float`) – Cost or benefit of a true positive classification
- true_negative_value (`float`) – Cost or benefit of a true negative classification
- false_positive_value (`float`) – Cost or benefit of a false positive classification
- false_negative_value (`float`) – Cost or benefit of a false negative classification
Examples
import datarobot as dr
# create a payoff matrix
payoff_matrix = dr.PayoffMatrix.create(
    project_id,
    name,
    true_positive_value=100,
    true_negative_value=10,
    false_positive_value=0,
    false_negative_value=-10,
)
# list available payoff matrices
payoff_matrices = dr.PayoffMatrix.list(project_id)
payoff_matrix = payoff_matrices[0]
classmethod create(project_id, name, true_positive_value=1, true_negative_value=1, false_positive_value=-1, false_negative_value=-1)¶
Create a payoff matrix associated with a specific project.
- Parameters:
project_id (`str`) – id of the project with which the payoff matrix will be associated
- Returns: payoff_matrix – The newly created payoff matrix
- Return type: `PayoffMatrix`
classmethod list(project_id)¶
Fetch all the payoff matrices for a project.
- Parameters:
project_id (`str`) – id of the project
- Returns: A list of `PayoffMatrix` objects
- Return type: `List` of `PayoffMatrix`
- Raises:
- datarobot.errors.ClientError – if the server responded with 4xx status
- datarobot.errors.ServerError – if the server responded with 5xx status
classmethod get(project_id, id)¶
Retrieve a specified payoff matrix.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- id (`str`) – id of the payoff matrix
- Return type: `PayoffMatrix`
- Returns: `PayoffMatrix` object representing the specified payoff matrix
- Raises:
- datarobot.errors.ClientError – if the server responded with 4xx status
- datarobot.errors.ServerError – if the server responded with 5xx status
classmethod update(project_id, id, name, true_positive_value, true_negative_value, false_positive_value, false_negative_value)¶
Update (replace) a payoff matrix. Note that all data fields are required.
- Parameters:
- project_id (`str`) – id of the project to which the payoff matrix belongs
- id (`str`) – id of the payoff matrix
- name (`str`) – User-supplied label for the payoff matrix
- true_positive_value (`float`) – True positive payoff value to use for the profit curve
- true_negative_value (`float`) – True negative payoff value to use for the profit curve
- false_positive_value (`float`) – False positive payoff value to use for the profit curve
- false_negative_value (`float`) – False negative payoff value to use for the profit curve
- Returns: `PayoffMatrix` with updated values
- Return type: `PayoffMatrix`
- Raises:
- datarobot.errors.ClientError – if the server responded with 4xx status
- datarobot.errors.ServerError – if the server responded with 5xx status
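Because update replaces the entire record, every field must be supplied. A sketch, assuming payoff_matrix is an existing instance:
import datarobot as dr
updated = dr.PayoffMatrix.update(
    project_id,
    payoff_matrix.id,
    name='Adjusted payoff',
    true_positive_value=120,
    true_negative_value=10,
    false_positive_value=-5,
    false_negative_value=-20,
)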
classmethod delete(project_id, id)¶
Delete a specified payoff matrix.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- id (`str`) – id of the payoff matrix
- Returns: response – Empty response (204)
- Return type: `requests.Response`
- Raises:
- datarobot.errors.ClientError – if the server responded with 4xx status
- datarobot.errors.ServerError – if the server responded with 5xx status
classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters:
data (`dict`) – Correctly snake_cased keys and their values.
- Return type: `TypeVar`(`T`, bound= APIObject)
classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters:
- data (`dict`) – The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrs (`iterable`) – List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type: `TypeVar`(`T`, bound= APIObject)
Prediction explanations¶
class datarobot.PredictionExplanationsInitialization¶
Represents a prediction explanations initialization of a model.
- Variables:
- project_id (`str`) – id of the project the model belongs to
- model_id (`str`) – id of the model the prediction explanations initialization is for
- prediction_explanations_sample (`list` of `dict`) – a small sample of prediction explanations that could be generated for the model
classmethod get(project_id, model_id)¶
Retrieve the prediction explanations initialization for a model.
Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- model_id (`str`) – id of the model the prediction explanations initialization is for
- Returns: prediction_explanations_initialization – The queried instance.
- Return type: `PredictionExplanationsInitialization`
- Raises: ClientError – If the project or model does not exist or the initialization has not been computed.
classmethod create(project_id, model_id)¶
Create a prediction explanations initialization for the specified model.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- model_id (`str`) – id of the model for which initialization is requested
- Returns: job – an instance of created async job
- Return type: `Job`
delete()¶
Delete this prediction explanations initialization.
class datarobot.PredictionExplanations¶
Represents prediction explanations metadata and provides access to computation results.
Examples
prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow
- Variables:
- id (`str`) – id of the record and prediction explanations computation result
- project_id (`str`) – id of the project the model belongs to
- model_id (`str`) – id of the model the prediction explanations are for
- dataset_id (`str`) – id of the prediction dataset prediction explanations were computed for
- max_explanations (`int`) – maximum number of prediction explanations to supply per row of the dataset
- threshold_low (`float`) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- threshold_high (`float`) – the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- num_columns (`int`) – the number of columns prediction explanations were computed for
- finish_time (`float`) – timestamp referencing when computation for these prediction explanations finished
- prediction_explanations_location (`str`) – where to retrieve the prediction explanations
- source (`str`) – For OTV/TS in-training predictions. Holds the portion of the training dataset used to generate predictions.
classmethod get(project_id, prediction_explanations_id)¶
Retrieve metadata for specific prediction explanations.
- Parameters:
- project_id (`str`) – id of the project the explanations belong to
- prediction_explanations_id (`str`) – id of the prediction explanations
- Returns: prediction_explanations – The queried instance.
- Return type: `PredictionExplanations`
classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None)¶
Create prediction explanations for the specified dataset.
In order to create PredictionExplanations for a particular model and dataset, you must first:
- Compute feature impact for the model via `datarobot.Model.get_feature_impact()`
- Compute a PredictionExplanationsInitialization for the model via `datarobot.PredictionExplanationsInitialization.create(project_id, model_id)`
- Compute predictions for the model and dataset via `datarobot.Model.request_predictions(dataset_id)`
`threshold_high` and `threshold_low` are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered outliers if their predicted value (in the case of regression projects) or probability of being the positive class (in the case of classification projects) is less than `threshold_low` or greater than `threshold_high`. If neither is specified, prediction explanations will be computed for all rows.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- model_id (`str`) – id of the model for which prediction explanations are requested
- dataset_id (`str`) – id of the prediction dataset for which prediction explanations are requested
- threshold_low (`Optional[float]`) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither `threshold_high` nor `threshold_low` is specified, prediction explanations will be computed for all rows.
- threshold_high (`Optional[float]`) – the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither `threshold_high` nor `threshold_low` is specified, prediction explanations will be computed for all rows.
- max_explanations (`Optional[int]`) – the maximum number of prediction explanations to supply per row of the dataset, default: 3.
- mode (`PredictionExplanationsMode`, optional) – mode of calculation for multiclass models; if not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- Returns: job – an instance of created async job
- Return type: `Job`
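A sketch of the full workflow described above; it assumes a local CSV at './to_explain.csv' and uses Model.request_feature_impact to satisfy the feature impact prerequisite:
import datarobot as dr
model = dr.Model.get(project_id, model_id)
# prerequisite 1: feature impact
model.request_feature_impact().wait_for_completion()
# prerequisite 2: prediction explanations initialization
dr.PredictionExplanationsInitialization.create(project_id, model_id).wait_for_completion()
# prerequisite 3: predictions on the dataset to explain
project = dr.Project.get(project_id)
dataset = project.upload_dataset('./to_explain.csv')
model.request_predictions(dataset.id).wait_for_completion()
# now compute the explanations themselves
pe_job = dr.PredictionExplanations.create(project_id, model_id, dataset.id, max_explanations=5)
prediction_explanations = pe_job.get_result_when_complete()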
classmethod create_on_training_data(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None, datetime_prediction_partition=None)¶
Create prediction explanations for the dataset used to train the model. This can be retrieved by calling `dr.Model.get().featurelist_id`.
For OTV and time series projects, `datetime_prediction_partition` is required and limited to the first backtest ('0') or holdout ('holdout').
In order to create PredictionExplanations for a particular model and dataset, you must first:
- Compute Feature Impact for the model via `datarobot.Model.get_feature_impact()`
- Compute a PredictionExplanationsInitialization for the model via `datarobot.PredictionExplanationsInitialization.create(project_id, model_id)`
- Compute predictions for the model and dataset via `datarobot.Model.request_predictions(dataset_id)`
`threshold_high` and `threshold_low` are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered outliers if their predicted value (in the case of regression projects) or probability of being the positive class (in the case of classification projects) is less than `threshold_low` or greater than `threshold_high`. If neither is specified, prediction explanations will be computed for all rows.
- Parameters:
- project_id (`str`) – The ID of the project the model belongs to.
- model_id (`str`) – The ID of the model for which prediction explanations are requested.
- dataset_id (`str`) – The ID of the prediction dataset for which prediction explanations are requested.
- threshold_low (`Optional[float]`) – The lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither `threshold_high` nor `threshold_low` is specified, prediction explanations will be computed for all rows.
- threshold_high (`Optional[float]`) – The high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither `threshold_high` nor `threshold_low` is specified, prediction explanations will be computed for all rows.
- max_explanations (`Optional[int]`) – The maximum number of prediction explanations to supply per row of the dataset (default: 3).
- mode (`PredictionExplanationsMode`, optional) – The mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- datetime_prediction_partition (`str`) – Options: '0', 'holdout' or None. Used only by time series and OTV projects to indicate what part of the dataset will be used to generate predictions for computing prediction explanations. Current options are '0' (first backtest) and 'holdout'. Note that only the validation partition of the first backtest will be used to generate predictions.
- Returns: job – An instance of created async job.
- Return type: `Job`
classmethod list(project_id, model_id=None, limit=None, offset=None)¶
List prediction explanations metadata for a specified project.
- Parameters:
- project_id (`str`) – id of the project to list prediction explanations for
- model_id (`Optional[str]`) – if specified, only prediction explanations computed for this model will be returned
- limit (`int` or `None`) – at most this many results are returned, default: no limit
- offset (`int` or `None`) – this many results will be skipped, default: 0
- Returns: prediction_explanations
- Return type: `list[PredictionExplanations]`
get_rows(batch_size=None, exclude_adjusted_predictions=True)¶
Retrieve prediction explanations rows.
- Parameters:
- batch_size (`int` or `None`, optional) – maximum number of prediction explanations rows to retrieve per request
- exclude_adjusted_predictions (`bool`) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Yields: prediction_explanations_row (`PredictionExplanationsRow`) – Represents prediction explanations computed for a prediction row.
is_multiclass()¶
Whether these explanations are for a multiclass project or a non-multiclass project
is_unsupervised_clustering_or_multiclass()¶
Clustering and multiclass XEMP explanations always have exactly one of the num_top_classes or class_names parameters set.
get_number_of_explained_classes()¶
How many classes we attempt to explain for each row
get_all_as_dataframe(exclude_adjusted_predictions=True)¶
Retrieve all prediction explanations rows and return them as a pandas.DataFrame.
Returned dataframe has the following structure:
- row_id : row id from prediction dataset
- prediction : the output of the model for this row
- adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
- class_0_label : a class level from the target (only appears for classification projects)
- class_0_probability : the probability that the target is this class (only appears for classification projects)
- class_1_label : a class level from the target (only appears for classification projects)
- class_1_probability : the probability that the target is this class (only appears for classification projects)
- explanation_0_feature : the name of the feature contributing to the prediction for this explanation
- explanation_0_feature_value : the value the feature took on
- explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
- explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. '+++', '--', '+') for this explanation
- explanation_0_per_ngram_text_explanations : text prediction explanations data as a JSON-formatted string
- explanation_0_strength : the amount this feature’s value affected the prediction
- …
- explanation_N_feature : the name of the feature contributing to the prediction for this explanation
- explanation_N_feature_value : the value the feature took on
- explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
- explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. '+++', '--', '+') for this explanation
- explanation_N_per_ngram_text_explanations : text prediction explanations data as a JSON-formatted string
- explanation_N_strength : the amount this feature’s value affected the prediction
For classification projects, the server does not guarantee any ordering on the prediction values, however within this function we sort the values so that class_X corresponds to the same class from row to row.
- Parameters:
exclude_adjusted_predictions (`bool`) – Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.
- Returns: dataframe
- Return type: `pandas.DataFrame`
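For instance, a brief sketch of flattening explanations into pandas for inspection (column names follow the structure above):
df = prediction_explanations.get_all_as_dataframe()
# one row per prediction row; explanations are flattened into explanation_0..N columns
print(df[['row_id', 'prediction', 'explanation_0_feature', 'explanation_0_strength']].head())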
download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)¶
Save prediction explanations rows into CSV file.
- Parameters:
- filename (`str` or `file object`) – path or file object to save prediction explanations rows
- encoding (`string`, optional) – A string representing the encoding to use in the output file, defaults to 'utf-8'
- exclude_adjusted_predictions (`bool`) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)¶
Get prediction explanations.
If you don't want to use the generator interface, you can access paginated prediction explanations directly.
- Parameters:
- limit (`int` or `None`) – the number of records to return; the server will use a (possibly finite) default if not specified
- offset (`int` or `None`) – the number of records to skip, default 0
- exclude_adjusted_predictions (`bool`) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns: prediction_explanations
- Return type: `PredictionExplanationsPage`
delete()¶
Delete these prediction explanations.
class datarobot.models.prediction_explanations.PredictionExplanationsRow¶
Represents prediction explanations computed for a prediction row.
Notes
`PredictionValue` contains:
- `label` : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
- `value` : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability that the row belongs to the class identified by the label.
`PredictionExplanation` contains:
- `label` : describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
- `feature` : the name of the feature contributing to the prediction
- `feature_value` : the value the feature took on for this row
- `strength` : the amount this feature's value affected the prediction
- `qualitative_strength` : a human-readable description of how strongly the feature affected the prediction. A large positive effect is denoted '+++', medium '++', small '+', very small '<+'. A large negative effect is denoted '---', medium '--', small '-', very small '<-'.
- Variables:
- row_id (`int`) – which row this `PredictionExplanationsRow` describes
- prediction (`float`) – the output of the model for this row
- adjusted_prediction (`float` or `None`) – adjusted prediction value for projects that provide this information, None otherwise
- prediction_values (`list`) – an array of dictionaries with a schema described as `PredictionValue`
- adjusted_prediction_values (`list`) – same as prediction_values but for adjusted predictions
- prediction_explanations (`list`) – an array of dictionaries with a schema described as `PredictionExplanation`
class datarobot.models.prediction_explanations.PredictionExplanationsPage¶
Represents a batch of prediction explanations received by one request.
- Variables:
- id (`str`) – id of the prediction explanations computation result
- data (`list[dict]`) – list of raw prediction explanations; each row corresponds to a row of the prediction dataset
- count (`int`) – total number of rows computed
- previous_page (`str`) – where to retrieve the previous page of prediction explanations, None if the current page is the first
- next_page (`str`) – where to retrieve the next page of prediction explanations, None if the current page is the last
- prediction_explanations_record_location (`str`) – where to retrieve the prediction explanations metadata
- adjustment_method (`str`) – Adjustment method that was applied to predictions, or 'N/A' if no adjustments were done.
classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)¶
Retrieve prediction explanations.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- prediction_explanations_id (`str`) – id of the prediction explanations
- limit (`int` or `None`) – the number of records to return; the server will use a (possibly finite) default if not specified
- offset (`int` or `None`) – the number of records to skip, default 0
- exclude_adjusted_predictions (`bool`) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns: prediction_explanations – The queried instance.
- Return type: `PredictionExplanationsPage`
class datarobot.models.ShapMatrix¶
Represents SHAP based prediction explanations and provides access to score values.
- Variables:
- project_id (`str`) – id of the project the model belongs to
- shap_matrix_id (`str`) – id of the generated SHAP matrix
- model_id (`str`) – id of the model used to compute the SHAP matrix
- dataset_id (`str`) – id of the prediction dataset SHAP values were computed for
Examples
import datarobot as dr
# request SHAP matrix calculation
shap_matrix_job = dr.ShapMatrix.create(project_id, model_id, dataset_id)
shap_matrix = shap_matrix_job.get_result_when_complete()
# list available SHAP matrices
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrix = shap_matrices[0]
# get SHAP matrix as dataframe
shap_matrix_values = shap_matrix.get_as_dataframe()
classmethod create(cls, project_id, model_id, dataset_id)¶
Calculate SHAP based prediction explanations against a previously uploaded dataset.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- model_id (`str`) – id of the model for which prediction explanations are requested
- dataset_id (`str`) – id of the prediction dataset for which prediction explanations are requested (as uploaded from Project.upload_dataset)
- Returns: job – The job computing the SHAP based prediction explanations
- Return type: `ShapMatrixJob`
- Raises:
- ClientError – If the server responded with 4xx status. Possible reasons: the project, model or dataset doesn't exist, the user is not allowed, or the model doesn't support SHAP based prediction explanations
- ServerError – If the server responded with 5xx status
classmethod list(cls, project_id)¶
Fetch all the computed SHAP prediction explanations for a project.
- Parameters:
project_id (`str`) – id of the project
- Returns: A list of `ShapMatrix` objects
- Return type: `List` of `ShapMatrix`
- Raises:
- datarobot.errors.ClientError – if the server responded with 4xx status
- datarobot.errors.ServerError – if the server responded with 5xx status
classmethod get(cls, project_id, id)¶
Retrieve the specific SHAP matrix.
- Parameters:
- project_id (`str`) – id of the project the model belongs to
- id (`str`) – id of the SHAP matrix
- Return type: `ShapMatrix` object representing the specified record
get_as_dataframe(read_timeout=60)¶
Retrieve SHAP matrix values as dataframe.
- Parameters:
read_timeout (`Optional[int]`, default 60) – Added in version 2.29. Wait this many seconds for the server to respond.
- Returns: dataframe – A dataframe with SHAP scores
- Return type: `pandas.DataFrame`
- Raises:
- datarobot.errors.ClientError – if the server responded with 4xx status
- datarobot.errors.ServerError – if the server responded with 5xx status
class datarobot.models.ClassListMode¶
Calculate prediction explanations for the specified classes in each row.
- Variables:
class_names (`list`) – List of class names that will be explained for each dataset row.
get_api_parameters(batch_route=False)¶
Get the parameters passed in the corresponding API call.
- Parameters:
batch_route (`bool`) – Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from the others they are prefixed in the parameters.
- Return type: `dict`
class datarobot.models.TopPredictionsMode¶
Calculate prediction explanations for the number of top predicted classes in each row.
- Variables:
num_top_classes (`int`) – Number of top predicted classes [1..10] that will be explained for each dataset row.
get_api_parameters(batch_route=False)¶
Get the parameters passed in the corresponding API call.
- Parameters:
batch_route (`bool`) – Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from the others they are prefixed in the parameters.
- Return type: `dict`
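A sketch of passing a mode when creating explanations for a multiclass project (the class names are placeholders):
import datarobot as dr
from datarobot.models import ClassListMode, TopPredictionsMode
# explain the top 3 predicted classes in each row
job = dr.PredictionExplanations.create(project_id, model_id, dataset_id, mode=TopPredictionsMode(3))
# or explain two specific classes in each row
job = dr.PredictionExplanations.create(project_id, model_id, dataset_id, mode=ClassListMode(['cat', 'dog']))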
Rating table¶
class datarobot.models.RatingTable¶
Interface to modify and download rating tables.
- Variables:
- id (`str`) – The id of the rating table.
- project_id (`str`) – The id of the project this rating table belongs to.
- rating_table_name (`str`) – The name of the rating table.
- original_filename (`str`) – The name of the file used to create the rating table.
- parent_model_id (`str`) – The model id of the model the rating table was validated against.
- model_id (`str`) – The model id of the model that was created from the rating table. Can be None if a model has not been created from the rating table.
- model_job_id (`str`) – The id of the job to create a model from this rating table. Can be None if a model has not been created from the rating table.
- validation_job_id (`str`) – The id of the created job to validate the rating table. Can be None if the rating table has not been validated.
- validation_error (`str`) – Contains a description of any errors caused during validation.
classmethod from_server_data(data, should_warn=True, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters:
- data (`dict`) – The directly translated dict of JSON from the server. No casing fixes have taken place
- should_warn (`bool`) – Whether or not to issue a warning if an invalid rating table is being retrieved.
- Return type: `RatingTable`
classmethod get(project_id, rating_table_id)¶
Retrieve a single rating table
- Parameters:
- project_id (`str`) – The ID of the project the rating table is associated with.
- rating_table_id (`str`) – The ID of the rating table.
- Returns: rating_table – The queried instance
- Return type: `RatingTable`
classmethod create(project_id, parent_model_id, filename, rating_table_name='Uploaded Rating Table')¶
Uploads and validates a new rating table CSV
- Parameters:
- project_id (`str`) – id of the project the rating table belongs to
- parent_model_id (`str`) – id of the model against which this rating table should be validated
- filename (`str`) – The path of the CSV file containing the modified rating table.
- rating_table_name (`Optional[str]`) – A human-friendly name for the new rating table. The string may be truncated and a suffix may be added to maintain unique names of all rating tables.
- Returns: job – an instance of created async job
- Return type: `Job`
- Raises:
- InputNotUnderstoodError – Raised if filename isn't one of the supported types.
- ClientError – Raised if parent_model_id is invalid.
download(filepath)¶
Download a csv file containing the contents of this rating table
- Parameters:
filepath (`str`) – The path at which to save the rating table file.
- Return type: None
rename(rating_table_name)¶
Rename the rating table.
- Parameters:
rating_table_name (`str`) – The new name for the rating table.
- Return type: None
create_model()¶
Creates a new model from this rating table record. This rating table must not already be associated with a model and must be valid.
- Returns: job – an instance of created async job
- Return type: `Job`
- Raises:
- ClientError – Raised when creating a model from a RatingTable that failed validation
- JobAlreadyRequested – Raised when creating a model from a RatingTable that is already associated with a RatingTableModel
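A sketch of the end-to-end rating table workflow (the CSV path is a placeholder):
import datarobot as dr
# upload and validate a modified rating table against its parent model
job = dr.RatingTable.create(project_id, parent_model_id, './modified_table.csv')
rating_table = job.get_result_when_complete()
# build a new model from the validated table
model_job = rating_table.create_model()
rating_table_model = model_job.get_result_when_complete()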
ROC curve¶
class datarobot.models.roc_curve.RocCurve¶
ROC curve data for model.
- Variables:
- source (`str`) – ROC curve data source. Can be 'validation', 'crossValidation' or 'holdout'.
- roc_points (`list` of `dict`) – List of precalculated metrics associated with thresholds for ROC curve.
- negative_class_predictions (`list` of `float`) – List of predictions from example for negative class
- positive_class_predictions (`list` of `float`) – List of predictions from example for positive class
- source_model_id (`str`) – ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
- data_slice_id (`str`) – ID of the data slice this ROC curve represents.
classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)¶
Override APIObject.from_server_data to handle ROC curve data retrieved from either the legacy URL or the new /insights/ URL.
- Parameters:
- data (`dict`) – The directly translated dict of JSON from the server. No casing fixes have taken place.
- keep_attrs (`iterable`) – List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- use_insights_format (`Optional[bool]`) – Whether to repack the data from the format used in the GET /insights/RocCur/ URL to the format used in the legacy URL.
- Return type: `RocCurve`
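ROC curves for a model's own partitions are normally fetched from the model object. A minimal sketch, assuming Model.get_roc_curve is available and that each precalculated roc point carries 'threshold' and 'f1_score' keys:
import datarobot as dr
model = dr.Model.get(project_id, model_id)
roc = model.get_roc_curve('validation')
# pick the threshold that maximizes F1 among the precalculated points
best = max(roc.roc_points, key=lambda point: point['f1_score'])
print(best['threshold'])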
class datarobot.models.roc_curve.LabelwiseRocCurve¶
Labelwise ROC curve data for one label and one source.
- Variables:
- source (`str`) – ROC curve data source. Can be 'validation', 'crossValidation' or 'holdout'.
- roc_points (`list` of `dict`) – List of precalculated metrics associated with thresholds for ROC curve.
- negative_class_predictions (`list` of `float`) – List of predictions from example for negative class
- positive_class_predictions (`list` of `float`) – List of predictions from example for positive class
- source_model_id (`str`) – ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
- label (`str`) – Label name for
- kolmogorov_smirnov_metric (`float`) – Kolmogorov-Smirnov metric value for label
- auc (`float`) – AUC metric value for label
Word Cloud¶
class datarobot.models.word_cloud.WordCloud¶
Word cloud data for the model.
Notes
`WordCloudNgram` is a dict containing the following:
- `ngram` (str) Word or ngram value.
- `coefficient` (float) Value from the [-1.0, 1.0] range; describes the effect of this ngram on the target. A large negative value means a strong effect toward the negative class in classification and a smaller target value in regression models. A large positive value means an effect toward the positive class and a bigger target value, respectively.
- `count` (int) Number of rows in the training sample where this ngram appears.
- `frequency` (float) Value from the (0.0, 1.0] range; the relative frequency of the given ngram to the most frequent ngram.
- `is_stopword` (bool) True for ngrams that DataRobot evaluates as stopwords.
- `class` (str or None) For classification, the value of the target class for the corresponding word or ngram. For regression, None.
- Variables:
ngrams (`list` of `dict`) – List of dicts with schema described as `WordCloudNgram` above.
most_frequent(top_n=5)¶
Return most frequent ngrams in the word cloud.
- Parameters:
top_n (`int`) – Number of ngrams to return
- Returns: Up to top_n of the most frequent ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned, sorted by frequency in descending order.
- Return type: `list` of `dict`
most_important(top_n=5)¶
Return most important ngrams in the word cloud.
- Parameters:
top_n (`int`) – Number of ngrams to return
- Returns: Up to top_n of the most important ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned, sorted by absolute coefficient value in descending order.
- Return type: `list` of `dict`
ngrams_per_class()¶
Split ngrams per target class values. Useful for multiclass models.
- Returns: Dictionary in the format of (class label) -> (list of ngrams for that class)
- Return type:
dict
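A short sketch of inspecting a word cloud, assuming the model is text-based and Model.get_word_cloud is available:
import datarobot as dr
model = dr.Model.get(project_id, model_id)
word_cloud = model.get_word_cloud(exclude_stop_words=True)
for ngram in word_cloud.most_important(top_n=3):
    print(ngram['ngram'], ngram['coefficient'])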