project_id (str) – ID of the project the model belongs to.
processes (List[str]) – Processes used by the model.
featurelist_name (str) – Name of the featurelist used by the model.
featurelist_id (str) – ID of the featurelist used by the model.
sample_pct (float or None) – Percentage of the project dataset used in model training. If the project uses
datetime partitioning, the sample_pct will be None. See training_row_count,
training_duration, and training_start_date / training_end_date instead.
training_row_count (int or None) – Number of rows of the project dataset used in model training. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date is used for training_row_count.
training_duration (str or None) – For datetime partitioned projects only. If specified, defines the duration spanned by the data used to train
the model and evaluate backtest scores.
training_start_date (datetime or None) – For frozen models in datetime partitioned projects only. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – For frozen models in datetime partitioned projects only. If specified, the end
date of the data used to train the model.
model_type (str) – Type of model, for example ‘Nystroem Kernel SVM Regressor’.
model_category (str) – Category of model, for example ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and
‘model’ for other models.
is_frozen (bool) – Whether this model is a frozen model.
is_n_clusters_dynamically_determined (bool) – (New in version v2.27) Optional. Whether this model determines the number of clusters dynamically.
blueprint_id (str) – ID of the blueprint used to build this model.
metrics (dict) – Mapping from each metric to the model’s score for that metric.
monotonic_increasing_featurelist_id (str) – Optional. ID of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – Optional. ID of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
n_clusters (int) – (New in version v2.27) Optional. Number of data clusters discovered by model.
has_empty_clusters (bool) – (New in version v2.27) Optional. Whether clustering model produces empty clusters.
supports_monotonic_constraints (bool) – Optional. Whether this model supports enforcing monotonic constraints.
is_starred (bool) – Whether this model is marked as a starred model.
prediction_threshold (float) – Binary classification projects only. Threshold used for predictions.
prediction_threshold_read_only (bool) – Whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – Model number assigned to the model.
parent_model_id (str or None) – (New in version v2.20) ID of the model that tuning parameters are derived from.
supports_composable_ml (bool or None) – (New in version v2.26)
Whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
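For example, a minimal sketch of the advanced-tuning flow, assuming an existing project and model ID; parameter IDs are discovered first, and any parameter omitted from params keeps its current_value (the chosen parameter below is purely illustrative):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning_info = model.get_advanced_tuning_parameters()

# Pick one parameter record from the available tuning parameters (hypothetical choice)
param = tuning_info['tuning_parameters'][0]

# Submit the tuning request; omitted parameters keep their current_value
job = model.advanced_tune(
    params={param['parameter_id']: param['current_value']},
    description='Example advanced-tuned model',
)
new_model = job.get_result_when_complete()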
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type:dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
If present, the parameter is permitted to take on any of the listed values rather than a value
of a specific data type. Listed values may be of any string or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this
model has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then no data slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then this method will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the model
has a defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a positive whole-number integer or a float value. 0 corresponds to the
validation partition.
metric (unicode) – Optional. Name of the metric to filter the resulting cross validation scores by.
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
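A brief sketch of retrieving these scores, assuming cross validation has already been computed for the model; the partition and metric filters shown are illustrative:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Run cross validation first if it has not been computed yet:
# model.cross_validate()

# All metrics, all partitions
all_scores = model.get_cross_validation_scores()

# Hypothetical filter: only the RMSE scores for partition 1
rmse_partition_1 = model.get_cross_validation_scores(partition=1, metric='RMSE')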
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
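A hedged sketch of that retrieval flow: check the metadata for available sources, then request Feature Effects for one of them (the metadata attribute names below are assumed from the description above):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Find out which sources have Feature Effects available
fe_metadata = model.get_feature_effect_metadata()
print(fe_metadata.status, fe_metadata.sources)

# Retrieve Feature Effects for one of the available sources
feature_effects = model.get_feature_effect(source=fe_metadata.sources[0])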
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith, and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that were used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
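A minimal sketch of retrieving Feature Impact with and without metadata, assuming it has already been computed for this model:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Plain list of per-feature impact records
feature_impacts = model.get_feature_impact()
top_feature = max(feature_impacts, key=lambda fi: fi['impactNormalized'])

# Dict form that also carries shapBased, ranRedundancyDetection, rowCount and count
fi_with_meta = model.get_feature_impact(with_metadata=True)
print(fi_with_meta['ranRedundancyDetection'])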
Query the server to determine which features were used.
Note that the data returned by this method may differ from the names
of the features in the featurelist used by this model.
This method will return the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
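For example, a short hedged sketch of inspecting the raw input features required for scoring:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Raw input features that new data must supply (may differ from the featurelist,
# which can also contain derived features)
raw_features = model.get_features_used()
print(sorted(raw_features))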
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and
‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
(New in version v3.0) TypeError – If the underlying project type is multilabel
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
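A small sketch of checking capabilities before requesting dependent insights, assuming the returned dict uses the camelCase keys listed above:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
capabilities = model.get_supported_capabilities()

# Only request SHAP-based insights when the model supports them (key names assumed as listed above)
if capabilities.get('supportsShap'):
    print('Model supports Shapley-based feature importance')
if capabilities.get('supportsEarlyStopping'):
    print('Number of trained iterations can be retrieved')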
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported for AutoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, in duration format (for example P6Y0M0D)
training duration combined with the sampling rate and sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, sampling rate 78%, random sampling)
start/end date
project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
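A hedged sketch of the approximation flow described above: check eligibility, request rulesets, and compare them once the job completes (the Ruleset attribute names are assumptions):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

eligibility = model.get_prime_eligibility()
if eligibility['can_make_prime']:
    # Generate candidate rulesets approximating this model
    job = model.request_approximation()
    job.wait_for_completion()

    # Compare the generated rulesets by score and rule count
    for ruleset in model.get_rulesets():
        print(ruleset.ruleset_id, ruleset.score, ruleset.rule_count)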
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
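A brief sketch of the asynchronous Feature Impact flow; the row_count, with_metadata, and max_wait values below are illustrative:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Submit the Feature Impact computation, then block until it finishes
job = model.request_feature_impact(row_count=1000, with_metadata=True)
feature_impact = job.get_result_when_complete(max_wait=600)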
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train the model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
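A minimal sketch for a datetime partitioned project, assuming only one of the mutually exclusive training arguments is passed; the date range below is illustrative:
import datarobot as dr
from datetime import datetime

model = dr.Model.get('project-id', 'model-id')

# Freeze on an explicit date range (the only option that can train into an unlocked holdout)
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2019, 1, 1),
    training_end_date=datetime(2021, 1, 1),
)
frozen_model = model_job.get_result_when_complete()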
Train a new frozen model with parameters from this model
Notes
This method only works if the project the model belongs to is not datetime
partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
Parameters:
sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will
use the value from this model.
training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with
the model. Only one of sample_pct and training_row_count should be specified.
Returns:model_job – the modeling job training a frozen model
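A short sketch for non-datetime projects; only one of sample_pct or training_row_count should be given, and the 80% sample below is illustrative:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Retrain with the parent's tuning parameters on 80% of the data
model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()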
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (optional) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null
(no prediction explanations).
max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. If null, no limit.
In the case of ‘shap’: if the number of features is greater than the limit, the sum of
remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot
be set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non-zero positive integer value, text
explanations will be computed and that number of ngram explanations, sorted in descending
order, will be returned. By default, text explanations are not computed.
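A hedged sketch of requesting predictions against an uploaded dataset; the file path, dataset handling, and SHAP options below are illustrative:
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the scoring data, then request predictions with SHAP explanations
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(
    dataset_id=prediction_dataset.id,
    explanation_algorithm='shap',
    max_explanations=5,
)
predictions = predict_job.get_result_when_complete()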
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all – for all data available. Not valid for
models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout – for
all data except the training set. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout – for the holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests – for downloading
the predictions for all backtest validation folds. Requires the model to have
successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
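The data_subset choices above belong to the training-predictions request; a hedged sketch assuming the Model.request_training_predictions method:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute out-of-sample predictions for the holdout partition only, with SHAP explanations
job = model.request_training_predictions(
    data_subset=dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
    max_explanations=5,
)
training_predictions = job.get_result_when_complete()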
Submit a job to the queue to train a blender model.
Parameters:
sample_pct (Optional[float]) – The sample size, in percent (1 to 100), to use in training. If this parameter is used,
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks with no observed improvement that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Train the blueprint used in this model on a particular featurelist or amount of data.
This method creates a new training job for a worker and appends it to
the end of the queue for this project.
After the job has finished you can get the newly trained model by retrieving
it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to
use, but not both. If neither is specified, a default of the maximum amount of data that
can safely be used to train any blueprint without going into the validation data will be
selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms
of rows of the minority class.
Notes
For datetime partitioned projects, see train_datetime instead.
Parameters:
sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from
0 to 100.
featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the
featurelist of this model is used.
scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation
or dr.SCORING_TYPE.cross_validation). validation is available for every
partitioning type, and indicates that the default model validation should be
used for the project.
If the project uses a form of cross-validation partitioning,
crossValidation can also be used to indicate
that all of the available training/validation combinations
should be used to evaluate the model.
training_row_count (Optional[int]) – The number of rows to use to train the requested model.
monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns:model_job_id – ID of the created job; can be used as a parameter to the ModelJob.get
method or the wait_for_async_model_creation function
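A minimal sketch of that training flow, assuming the wait_for_async_model_creation import path shown below; the 60% sample is illustrative:
import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation

model = dr.Model.get('project-id', 'model-id')

# Retrain the blueprint on 60% of the data with this model's featurelist
model_job_id = model.train(sample_pct=60)

# Either poll the job...
model_job = dr.ModelJob.get(project_id='project-id', model_job_id=model_job_id)

# ...or block until the new model is ready
new_model = wait_for_async_model_creation(
    project_id='project-id',
    model_job_id=model_job_id,
)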
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
Submit a job to the queue to perform incremental training on an existing model using
additional data. The ID of the additional data to use for training is specified with the data_stage_id.
Optionally, a name for the iteration can be supplied by the user to help identify the contents of the data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘DataRobot Prime’
model_category (str) – what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
ruleset (Ruleset) – the ruleset used in the Prime model
parent_model_id (str) – the id of the model that this Prime model approximates
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type:dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
If present, the parameter is permitted to take on any of the listed values rather than a value
of a specific data type. Listed values may be of any string or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this
model has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then no data slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then this method will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurves are found.
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a whole-number positive integer or float value. 0 corresponds to the
validation partition.
metric (unicode) – Optional. The name of the metric to filter the resulting cross validation scores by.
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
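As an illustrative sketch, retrieving these scores with the Python client might look as follows (the method name get_cross_validation_scores is assumed, and the metric name is just an example):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# All cross validation scores, keyed by metric and then by partition
all_scores = model.get_cross_validation_scores()

# Only the RMSE scores, restricted to partition 1
rmse_partition_1 = model.get_cross_validation_scores(partition=1, metric='RMSE')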
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
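A minimal sketch of checking the metadata and then requesting Feature Effects for one of the available sources (the method names get_feature_effect_metadata and get_feature_effect, and the metadata attributes, are assumed from the DataRobot Python client):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Check which sources (e.g. training, validation, holdout) have Feature Effects available
fe_metadata = model.get_feature_effect_metadata()
print(fe_metadata.status, fe_metadata.sources)

# Retrieve Feature Effects for one of the listed sources
feature_effects = model.get_feature_effect(source='training')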
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Parameters:
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a list. Each item is a dict with the keys
featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that were used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
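For illustration, a hedged sketch of retrieving Feature Impact with and without metadata (the method name get_feature_impact is assumed):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Plain list of per-feature impact records
feature_impacts = model.get_feature_impact()
for item in feature_impacts:
    print(item['featureName'], item['impactNormalized'])

# Dict response that also carries shapBased, ranRedundancyDetection, rowCount and count
impact_with_meta = model.get_feature_impact(with_metadata=True)
print(impact_with_meta['ranRedundancyDetection'])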
Query the server to determine which features were used.
Note that the data returned by this method is possibly different
than the names of the features in the featurelist used by this model.
This method will return the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
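A brief sketch for a multilabel project, using the get_labelwise_roc_curves name referenced below and an assumed 'validation' source:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# One LabelwiseRocCurve instance per label for the chosen source
labelwise_curves = model.get_labelwise_roc_curves(source='validation')
print(len(labelwise_curves))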
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
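For example, a minimal sketch assuming the method is named get_lift_chart and that the 'validation' source is available:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Lift chart for the validation partition, falling back to the parent model if needed
lift_chart = model.get_lift_chart(
    source='validation',
    fallback_to_parent_insights=True,
)
print(len(lift_chart.bins))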
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list)
and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
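A short sketch, assuming the method is named get_residuals_chart; the data slice ID used here is a placeholder:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Residuals chart for the validation partition (regression projects)
residuals = model.get_residuals_chart(source='validation')

# The same chart restricted to a data slice ('data-slice-id' is a placeholder)
sliced_residuals = model.get_residuals_chart(
    source='validation',
    data_slice_filter=dr.DataSlice(id='data-slice-id'),
)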
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
(New in version v3.0) TypeError – If the underlying project type is multilabel
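For illustration, a hedged sketch assuming the method is named get_roc_curve:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# ROC curve for the validation partition of a binary classification model
roc = model.get_roc_curve(source='validation')

# For time series and OTV models, individual backtests can also be requested
backtest_roc = model.get_roc_curve(source='backtest_2')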
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
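A minimal sketch, assuming the method is named get_supported_capabilities and returns the dict described above:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

capabilities = model.get_supported_capabilities()
if capabilities.get('supportsShap'):
    print('SHAP-based feature importance is available for this model')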
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported for autoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, for example P6Y0M0D
training duration - sampling rate - sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, with a 78% sampling rate and random sampling)
Start/end date
Project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
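An illustrative sketch, assuming the method is named request_external_test and that the dataset is uploaded via Project.upload_dataset as noted above (the file path is hypothetical):
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the external test dataset, then request scores and insights on it
external_dataset = project.upload_dataset('./external_test.csv')  # path is hypothetical
job = model.request_external_test(dataset_id=external_dataset.id)
job.wait_for_completion()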
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (optional) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to
null (no prediction explanations).
max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. If null, no limit.
In the case of ‘shap’: if the number of features is greater than the limit, the sum of the
remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be
set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non zero positive integer value, text
explanations will be computed and this amount of descendingly sorted ngram explanations
will be returned. By default text explanation won’t be triggered to be computed.
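A hedged sketch of the typical flow, assuming the method is named request_predictions and that the uploaded file path is hypothetical:
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload a scoring dataset, then request predictions from this model
prediction_dataset = project.upload_dataset('./to_score.csv')  # path is hypothetical
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)

# Block until the job finishes and fetch the predictions
predictions = predict_job.get_result_when_complete()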
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
all data except the training set. Not valid for models in datetime partitioned
projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
the predictions for all backtest validation folds. Requires the model to have
successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
Submit a job to the queue to train a blender model.
Parameters:
sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – Only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
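For example, a minimal sketch assuming the method is named set_prediction_threshold; the threshold value is arbitrary:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Only allowed while prediction_threshold_read_only is False
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)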
Submit a job to the queue to perform the first incremental learning training iteration on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally a name for the iteration can be supplied by the user to help identify the contents of data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
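An illustrative sketch under stated assumptions: the method is assumed to be named train_incremental, and the data stage ID and iteration name below are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Train one more incremental iteration on data previously loaded into a data stage
job = model.train_incremental(
    data_stage_id='data-stage-id',              # placeholder ID
    training_data_name='incremental-batch-1',   # placeholder name
)
retrained_model = job.get_result_when_complete()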
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘DataRobot Prime’
model_category (str) – what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
model_ids (List[str]) – List of model ids used in blender
blender_method (str) – Method used to blend results from underlying models
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – model number assigned to a model
parent_model_id (str or None) – (New in version v2.20) the id of the model that tuning parameters are derived from
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type:dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
Rather than specifying a specific data type, if present, it indicates that the parameter
is permitted to take on any of the specified values. Listed values may be of any string
or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
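For illustration, a hedged sketch combining get_advanced_tuning_parameters (named above) with an advanced-tuning request; the method name advanced_tune and the override value 0.05 are assumptions for the example:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Inspect the tunable parameters and their constraints
tuning_info = model.get_advanced_tuning_parameters()
for param in tuning_info['tuning_parameters']:
    print(param['parameter_name'], param['current_value'], param['constraints'])

# Re-tune with a single parameter overridden; omitted parameters keep their current_value
first_param_id = tuning_info['tuning_parameters'][0]['parameter_id']
job = model.advanced_tune({first_param_id: 0.05}, description='example override')
tuned_model = job.get_result_when_complete()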
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this model
has a defined parent model. If omitted or False, or if this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then no data_slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurves are found.
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a whole-number positive integer or float value. 0 corresponds to the
validation partition.
metric (unicode) – Optional. The name of the metric to filter the resulting cross validation scores by.
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Parameters:
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a list. Each item is a dict with the keys
featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that were used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
Query the server to determine which features were used.
Note that the data returned by this method is possibly different
than the names of the features in the featurelist used by this model.
This method will return the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list)
and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
(New in version v3.0) TypeError – If the underlying project type is multilabel
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported for autoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, for example P6Y0M0D
training duration - sampling rate - sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, with a 78% sampling rate and random sampling)
Start/end date
Project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
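As a sketch, requesting Feature Impact and waiting for the result might look like this (row_count is optional; IDs are placeholders):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Submit the Feature Impact computation and block until it finishes
feature_impact_job = model.request_feature_impact(row_count=1000)
feature_impact = feature_impact_job.get_result_when_complete()
for item in feature_impact:
    print(item['featureName'], item['impactNormalized'])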
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train to model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
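A minimal sketch of training a frozen datetime model on an explicit duration (only one of the training settings may be passed; IDs are placeholders):
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Train a frozen model spanning one year of data, sampling 50% of the window
model_job = model.request_frozen_datetime_model(
    training_duration='P1Y0M0D',
    time_window_sample_pct=50,
)
frozen_model = model_job.get_result_when_complete()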
Train a new frozen model with parameters from this model
Notes
This method only works if the project the model belongs to is not datetime
partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
Parameters:
sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will
use the value from this model.
training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with
the model. Only one of sample_pct and training_row_count should be specified.
Returns:model_job – the modeling job training a frozen model
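For example, a sketch where only sample_pct is given (one of sample_pct and training_row_count should be specified; IDs are placeholders):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Train a frozen copy of this model on 80% of the project dataset
model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()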
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (Optional[str]) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanations (Optional[int]) – (New in version v2.21) Specifies the maximum number of explanation values that should be
returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number
of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to
null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non zero positive integer value, text
explanations will be computed and this amount of descendingly sorted ngram explanations
will be returned. By default text explanation won’t be triggered to be computed.
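A sketch of requesting predictions against a previously uploaded dataset; for a time series project a forecast_point could be supplied as well. The IDs and file path are placeholders.
import datarobot as dr

project = dr.Project.get('project-id')           # hypothetical ID
model = dr.Model.get('project-id', 'model-id')   # hypothetical ID

# Upload a scoring dataset, then request predictions from this model
dataset = project.upload_dataset('./to_predict.csv')
predict_job = model.request_predictions(dataset_id=dataset.id)
predictions = predict_job.get_result_when_complete()  # pandas DataFrame of predictions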
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except training set. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
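As a sketch, training predictions for the holdout subset might be requested as follows, assuming the method is request_training_predictions and that the job result exposes get_all_as_dataframe; IDs are placeholders.
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Request out-of-sample predictions on the holdout only
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()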
Submit a job to the queue to train a blender model.
Parameters:
sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
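For example (a sketch; the call fails once the threshold is read-only, and the IDs are placeholders):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

if not model.prediction_threshold_read_only:
    # Move the classification cutoff to favor precision over recall
    model.set_prediction_threshold(0.6)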
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to
the end of the queue for this project.
After the job has finished you can get the newly trained model by retrieving
it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to
use, but not both. If neither are specified, a default of the maximum amount of data that
can safely be used to train any blueprint without going into the validation data will be
selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms
of rows of the minority class.
Notes
For datetime partitioned projects, see train_datetime instead.
Parameters:
sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from
0 to 100.
featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the
featurelist of this model is used.
scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation
or dr.SCORING_TYPE.cross_validation). validation is available for every
partitioning type, and indicates that the default model validation should be
used for the project.
If the project uses a form of cross-validation partitioning,
crossValidation can also be used to indicate
that all of the available training/validation combinations
should be used to evaluate the model.
training_row_count (Optional[int]) – The number of rows to use to train the requested model.
monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns:model_job_id – id of created job, can be used as parameter to ModelJob.get
method or wait_for_async_model_creation function
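A minimal sketch tying the returned job id back to the trained model; the import path for wait_for_async_model_creation is an assumption, and the IDs are placeholders.
import datarobot as dr
# Import path assumed for the helper referenced above
from datarobot.models.modeljob import wait_for_async_model_creation

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Retrain the blueprint on 64% of the data with this model's featurelist
model_job_id = model.train(sample_pct=64)
new_model = wait_for_async_model_creation('project-id', model_job_id)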
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
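A sketch of retraining this blueprint in a datetime partitioned project on a one-year window; IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Train on one year of data, sampling 75% of the rows within that window
model_job = model.train_datetime(
    training_duration='P1Y0M0D',
    time_window_sample_pct=75,
)
new_model = model_job.get_result_when_complete()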
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally a name for the iteration can be supplied by the user to help identify the contents of data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
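A sketch under the assumption that this functionality is exposed as a train_incremental method taking the data stage id; the method name, data stage id, and iteration name are assumptions for illustration, not confirmed API.
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Method name and data stage id are assumptions for illustration only
job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='Q1 increment',
)
job.wait_for_completion()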
Note that only one of training_row_count, training_duration, and
training_start_date and training_end_date will be specified, depending on the
data_selection_method of the model. Whichever method was selected determines the amount of
data used to train on when making predictions and scoring the backtests and the holdout.
Variables:
id (str) – the id of the model
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – If specified, an int specifying the number of rows used to train the model and evaluate
backtest scores.
training_duration (str or None) – If specified, a duration string specifying the duration spanned by the data used to train
the model and evaluate backtest scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
time_window_sample_pct (int or None) – An integer between 1 and 99 indicating the percentage of sampling within the training
window. The points kept are determined by a random uniform sample. If not specified, no
sampling was done.
sampling_method (str or None) – (New in v2.23) indicates the way training data has been selected (either how rows have been
selected within backtest or how time_window_sample_pct has been applied).
model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models,
and ‘model’ for other models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric. The keys in metrics are
the different metrics used to evaluate the model, and the values are the results. The
dictionaries inside of metrics will be as described here: ‘validation’, the score
for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for
all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a
list of scores for all backtests where the score is None if that backtest does not have a
score available; and ‘holdout’, the score for the holdout or None if the holdout is locked
or the score is unavailable.
backtests (list of dict) – describes what data was used to fit each backtest, the score for the project metric, and
why the backtest score is unavailable if it is not provided.
data_selection_method (str) – which of training_row_count, training_duration, or training_start_date and training_end_date
were used to determine the data used to fit the model. One of ‘rowCount’,
‘duration’, or ‘selectedDateRange’.
training_info (dict) – describes which data was used to train on when scoring the holdout and making predictions.
training_info will have the following keys: holdout_training_start_date,
holdout_training_duration, holdout_training_row_count, holdout_training_end_date,
prediction_training_start_date, prediction_training_duration,
prediction_training_row_count, prediction_training_end_date. Start and end dates will
be datetimes, durations will be duration strings, and rows will be integers.
holdout_score (float or None) – the score against the holdout, if available and the holdout is unlocked, according to the
project metric.
holdout_status (string or None) – the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”.
Unavailable if the holdout fold was disabled in the partitioning configuration.
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
effective_feature_derivation_window_start (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the past relative to the forecast point
the user needs to provide history for at prediction time. This can differ from the
feature_derivation_window_start set on the project due to the differencing method and
period selected, or if the model is a time series native model such as ARIMA. Will be a
negative integer in time series projects and None otherwise.
effective_feature_derivation_window_end (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the past relative to the forecast point
the feature derivation window should end. Will be a non-positive integer in time series
projects and None otherwise.
forecast_window_start (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the future relative to the forecast point
the forecast window should start. Note that this field will be the same as what is shown in
the project settings. Will be a non-negative integer in time series projects and None
otherwise.
forecast_window_end (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the future relative to the forecast point
the forecast window should end. Note that this field will be the same as what is shown in
the project settings. Will be a non-negative integer in time series projects and None
otherwise.
windows_basis_unit (str or None) – (New in v2.16) For time series projects only.
Indicates which unit is the basis for the feature derivation window and the forecast window.
Note that this field will be the same as what is shown in the project settings. In time
series projects, will be either the detected time unit or “ROW”, and None otherwise.
model_number (integer) – model number assigned to a model
parent_model_id (str or None) – (New in version v2.20) the id of the model that tuning parameters are derived from
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
is_n_clusters_dynamically_determined (Optional[bool]) – (New in version 2.27) if True, indicates that model determines number of clusters
automatically.
n_clusters (Optional[int]) – (New in version 2.27) Number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests.
Retrain an existing datetime model using a new training period for the model’s training
set (with optional time window sampling) or a different feature list.
featurelist_id (Optional[str]) – The ID of the featurelist to use.
training_row_count (Optional[int]) – The number of rows to train the model on. If this parameter is used then sample_pct
cannot be specified.
time_window_sample_pct (Optional[int]) – An int between 1 and 99 indicating the percentage of
sampling within the time window. The points kept are determined by a random uniform
sample. If specified, training_row_count must not be specified and either
training_duration or training_start_date and training_end_date must be specified.
training_duration (Optional[str]) – A duration string representing the training duration for the submitted model. If
specified then training_row_count, training_start_date, and training_end_date
cannot be specified.
training_start_date (Optional[str]) – A datetime string representing the start date of
the data to use for training this model. If specified, training_end_date must also be
specified, and training_duration cannot be specified. The value must be before the
training_end_date value.
training_end_date (Optional[str]) – A datetime string representing the end date of the
data to use for training this model. If specified, training_start_date must also be
specified, and training_duration cannot be specified. The value must be after the
training_start_date value.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
n_clusters (Optional[int]) – (New in version 2.27) Number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
Returns:job – The created job that is retraining the model
Retrieve Feature Effect metadata for each backtest. Response contains status and available
sources for each backtest of the model.
Each backtest has training and validation sources available.
If holdout is configured for the project, it is reported with holdout as its backtestIndex and has
training and holdout sources available.
Start/stop models contain a single response item with startstop as the backtestIndex value.
Feature Effects for training are always available
(except for older projects, which support Feature Effects for validation only).
When a model is trained into validation or holdout without stacked predictions
(e.g. no out-of-sample predictions in validation or holdout),
Feature Effects are not available for validation or holdout.
Feature Effects for holdout are not available when no holdout is configured for the project.
source is a required parameter for retrieving Feature Effects; one of the provided sources must be used.
backtestIndex is a required parameter for submitting a compute request and retrieving Feature Effects;
one of the provided backtest indexes must be used.
Parameters:backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
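For example, computing Feature Effects for the first backtest of a datetime model might look like this sketch (IDs are placeholders):
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Submit the computation for backtest 0 and wait for the insight
job = model.request_feature_effect(backtest_index='0')
feature_effects = job.get_result_when_complete()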
Feature Effects provides partial dependence and predicted vs actual values for the top 500
features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
with all other variables held as they were, the value of this feature affects your prediction.
source (string) – The source Feature Effects are retrieved for.
One value of [FeatureEffectMetadataDatetime.sources]. To retrieve the available
sources for feature effect.
backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.
Returns:feature_effects – The feature effects data.
max_wait (Optional[int]) – The maximum time to wait for a requested feature effect job to complete before erroring
source (string) – The source Feature Effects are retrieved for.
One value of [FeatureEffectMetadataDatetime.sources]. To retrieve the available sources
for feature effect.
backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.
Returns:feature_effects – The feature effects data.
backtest_index (str) – The backtest index to use for Feature Effects calculation.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
features (list or None) – The list of features to use to calculate Feature Effects.
Returns:job – A Job representing Feature Effects computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
Retrieve Feature Effects for the multiclass datetime model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500
features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
with all other variables held as they were, the value of this feature affects your prediction.
Calculate prediction intervals for this DatetimeModel for the specified size.
Added in version v2.19.
Parameters:prediction_intervals_size (int) – The prediction interval’s size to calculate for this model. See the
prediction intervals documentation for more information.
Returns:job – a Job tracking the prediction intervals computation
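A sketch of computing 80% prediction intervals for a datetime model; IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Compute the 80th-percentile prediction intervals for later prediction requests
job = model.calculate_prediction_intervals(prediction_intervals_size=80)
job.wait_for_completion()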
Computes datetime trend plots
(Accuracy over Time, Forecast vs Actual, Anomaly over Time) for this model
Added in version v2.25.
Parameters:
backtest (int or string, optional) – Compute plots for a specific backtest (use the backtest index starting from zero).
To compute plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance_start (Optional[int]) – The start of forecast distance range (forecast window) to compute.
If not specified, the first forecast distance for this project will be used.
Only for time series supervised models
forecast_distance_end (Optional[int]) – The end of forecast distance range (forecast window) to compute.
If not specified, the last forecast distance for this project will be used.
Only for time series supervised models
Returns:job – a Job tracking the datetime trend plots computation
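For example, the plots for the holdout could be computed with a sketch like this; the SOURCE_TYPE member used is an assumption about the available enum values, and the IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Compute Accuracy over Time / Forecast vs Actual / Anomaly over Time for holdout
job = model.compute_datetime_trend_plots(
    backtest=dr.enums.DATA_SUBSET.HOLDOUT,
    source=dr.enums.SOURCE_TYPE.VALIDATION,
)
job.wait_for_completion()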
Retrieve Accuracy over Time plots metadata for this model.
Added in version v2.25.
Parameters:forecast_distance (Optional[int]) – Forecast distance to retrieve the metadata for.
If not specified, the first forecast distance for this project will be used.
Only available for time series projects.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance (Optional[int]) – Forecast distance to retrieve the plots for.
If not specified, the first forecast distance for this project will be used.
Only available for time series projects.
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
resolution (string, optional) – Specifies the resolution at which the data should be binned.
If not provided, an optimal resolution will be used to
build chart data with a number of bins <= max_bin_size.
One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies
the maximum number of bins for the retrieval. Default is 500.
start_date (datetime.datetime, optional) – The start of the date range to return.
If not specified, start date for requested plot will be used.
end_date (datetime.datetime, optional) – The end of the date range to return.
If not specified, end date for requested plot will be used.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
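A sketch of retrieving and tabulating the Accuracy over Time plot for the first backtest; the method name get_accuracy_over_time_plot and the bins attribute on the returned object are assumptions, and the IDs are placeholders.
import datarobot as dr
import pandas as pd

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Retrieve the plot for backtest 0, letting the server pick the bin resolution
plot = model.get_accuracy_over_time_plot(backtest=0)
df = pd.DataFrame.from_dict(plot.bins)
print(df.head())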
Retrieve Accuracy over Time preview plots for this model.
Added in version v2.25.
Parameters:
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance (Optional[int]) – Forecast distance to retrieve the plots for.
If not specified, the first forecast distance for this project will be used.
Only available for time series projects.
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance_start (Optional[int]) – The start of forecast distance range (forecast window) to retrieve.
If not specified, the first forecast distance for this project will be used.
forecast_distance_end (Optional[int]) – The end of forecast distance range (forecast window) to retrieve.
If not specified, the last forecast distance for this project will be used.
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
resolution (string, optional) – Specifies the resolution at which the data should be binned.
If not provided, an optimal resolution will be used to
build chart data with a number of bins <= max_bin_size.
One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies
the maximum number of bins for the retrieval. Default is 500.
start_date (datetime.datetime, optional) – The start of the date range to return.
If not specified, start date for requested plot will be used.
end_date (datetime.datetime, optional) – The end of the date range to return.
If not specified, end date for requested plot will be used.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot()
df = pd.DataFrame.from_dict(plot.bins)

# As an example, get the forecasts for the 10th point
forecast_point_index = 10
# Pad the forecasts for plotting. The forecasts length must match the df length
forecasts = [None] * forecast_point_index + df.forecasts[forecast_point_index]
forecasts = forecasts + [None] * (len(df) - len(forecasts))

plt.plot(df.start_date, df.actual, label="Actual")
plt.plot(df.start_date, forecasts, label="Forecast")
forecast_point = df.start_date[forecast_point_index]
plt.title("Forecast vs Actual (Forecast Point {})".format(forecast_point))
plt.legend()
plt.savefig("forecast_vs_actual.png")
Retrieve Forecast vs Actual preview plots for this model.
Added in version v2.25.
Parameters:
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
resolution (string, optional) – Specifies the resolution at which the data should be binned.
If not provided, an optimal resolution will be used to
build chart data with a number of bins <= max_bin_size.
One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies
the maximum number of bins for the retrieval. Default is 500.
start_date (datetime.datetime, optional) – The start of the date range to return.
If not specified, start date for requested plot will be used.
end_date (datetime.datetime, optional) – The end of the date range to return.
If not specified, end date for requested plot will be used.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
Retrieve Anomaly over Time preview plots for this model.
Added in version v2.25.
Parameters:
prediction_threshold (Optional[float]) – Only bins with predictions exceeding this threshold will be returned in the response.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
Initialize the anomaly assessment insight and calculate
Shapley explanations for the most anomalous points in the subset.
The insight is available for anomaly detection models in time series unsupervised projects
which also support calculation of Shapley values.
Parameters:
backtest (int starting with 0 or "holdout") – The backtest to compute insight for.
source ("training" or "validation") – The source to compute insight for.
series_id (string) – Required for multiseries projects. The series id to compute insight for.
For example, if there is a series column containing cities,
a series id to pass would be “Boston”.
Retrieve computed Anomaly Assessment records for this model. Model must be an anomaly
detection model in time series unsupervised project which also supports calculation of
Shapley values.
Records can be filtered by the data backtest, source and series_id.
The results can be limited.
Added in version v2.25.
Parameters:
backtest (int starting with 0 or "holdout") – The backtest of the data to filter records by.
source ("training" or "validation") – The source of the data to filter records by.
series_id (string) – The series id to filter records by.
limit (Optional[int])
offset (Optional[int])
with_data_only (Optional[bool]) – Whether to return only records with preview and explanations available.
False by default.
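A sketch of initializing the anomaly assessment insight and then listing the computed records for the same subset; the series id and model IDs are placeholder values.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Compute Shapley explanations for the most anomalous points in backtest 0
model.initialize_anomaly_assessment(backtest=0, source='validation', series_id='Boston')

# Later, list the computed records that have preview data and explanations attached
records = model.get_anomaly_assessment_records(
    backtest=0, source='validation', series_id='Boston', with_data_only=True
)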
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere, this technique is sometimes called ‘Permutation Importance’.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
backtest (int or string) – The index of the backtest unless it is holdout then it is string ‘holdout’. This is supported
only in DatetimeModels
data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id.
By default, this function will use data_slice_filter.id == None which returns an unsliced insight.
If data_slice_filter is None, this method will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith, and count.
For a dict response the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
row_count (int) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multi-class (that has a separate method) and time series
projects.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
backtest (int or string) – The index of the backtest unless it is holdout then it is string ‘holdout’. This is supported
only in DatetimeModels
data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id.
By default, this function will use data_slice_filter.id == None which returns an unsliced insight.
If data_slice_filter is None, this method will raise a ValueError.
Returns:job – A Job representing the feature impact computation. To get the completed feature impact
data, use job.get_result or job.get_result_when_complete.
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
Parameters:
max_wait (Optional[int]) – The maximum time to wait for a requested feature impact job to complete before erroring
row_count (int) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multi-class (that has a separate method) and time series
projects.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
backtest (str) – Feature Impact backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id.
By default, this function will use data_slice_filter.id == None which returns an unsliced insight.
If data_slice_filter is None, this method will raise a ValueError.
Returns:feature_impacts – The feature impact data. See
get_feature_impact for the exact
schema.
(New in version v3.4)
Request the model Lift Chart for the specified backtest data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
If backtest_index is present then this will be ignored.
backtest_index (str) – Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then request_lift_chart will raise a ValueError.
Returns:status_check_job – Object contains all needed logic for a periodical status check of an async job.
(New in version v3.4)
Retrieve the model Lift chart for the specified backtest and data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
For time series and OTV models, also accepts values backtest_2, backtest_3, …,
up to the number of backtests in the model.
If backtest_index is present then this will be ignored.
backtest_index (str) – Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
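For example, a sketch of retrieving the holdout lift chart and inspecting its bins; the bins attribute and its keys are assumptions about the LiftChart object, and the IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Retrieve the unsliced lift chart computed on the holdout backtest
lift_chart = model.get_lift_chart(backtest_index='holdout')
for bin_ in lift_chart.bins:
    print(bin_['actual'], bin_['predicted'], bin_['bin_weight'])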
(New in version v3.4)
Request the binary model Roc Curve for the specified backtest and data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
If backtest_index is present then this will be ignored.
backtest_index (str) – ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then request_roc_curve will raise a ValueError.
Returns:status_check_job – Object contains all needed logic for a periodical status check of an async job.
(New in version v3.4)
Retrieve the ROC curve for a binary model for the specified backtest and data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
For time series and OTV models, also accepts values backtest_2, backtest_3, …,
up to the number of backtests in the model.
If backtest_index is present then this will be ignored.
backtest_index (str) – ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the data slice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
TypeError – If the underlying project type is multilabel
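A sketch of retrieving the ROC curve for the holdout backtest and reading off a threshold; get_best_f1_threshold is assumed to be available on the returned RocCurve, and the IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Retrieve the unsliced ROC curve for the holdout backtest
roc = model.get_roc_curve(backtest_index='holdout')
threshold = roc.get_best_f1_threshold()
print('Best F1 threshold:', threshold)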
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
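A sketch of looking up a parameter id with get_advanced_tuning_parameters and submitting a single overridden value via advanced_tune; the parameter name searched for is purely illustrative, and the IDs are placeholders.
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Find the opaque parameter_id for an illustrative parameter name
tuning = model.get_advanced_tuning_parameters()
param = next(
    p for p in tuning['tuning_parameters'] if p['parameter_name'] == 'learning_rate'
)

# Train a new model with just that one parameter overridden
model_job = model.advanced_tune(
    params={param['parameter_id']: 0.05},
    description='Lowered learning rate',
)
tuned_model = model_job.get_result_when_complete()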
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type: dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
Rather than specifying a specific data type, if present, it indicates that the parameter
is permitted to take on any of the specified values. Listed values may be of any string
or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model and if this
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then no data_slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts. Or an empty list if no data found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurve objects are found.
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get ROC curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get ROC curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all ROC curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional. If True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the model
has a defined parent model. If omitted or False, or there is no parent model, this will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Query the server to determine which features were used.
Note that the data returned by this method may differ from
the names of the features in the featurelist used by this model.
This method returns the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
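A short illustration of the distinction described above (raw input features versus the featurelist), assuming the method name get_features_used:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Raw input features the model needs at prediction time; derived features
# from the featurelist are not included here.
raw_features = model.get_features_used()
print(raw_features)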
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
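A minimal sketch for a multilabel model, assuming get_labelwise_roc_curves and the CHART_DATA_SOURCE enum referenced above; the label attribute on each curve is an assumption:

import datarobot
from datarobot.enums import CHART_DATA_SOURCE

model = datarobot.Model.get('project-id', 'model-id')
# One LabelwiseRocCurve per label for the validation source.
labelwise_curves = model.get_labelwise_roc_curves(source=CHART_DATA_SOURCE.VALIDATION)
for curve in labelwise_curves:
    print(curve.label)  # the label this ROC curve belongs to (attribute name assumed)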
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and
‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
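A sketch of consuming the per-class structure described above, assuming the method name get_multiclass_feature_impact; only the documented dict keys are used:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
per_class_impacts = model.get_multiclass_feature_impact()
for class_entry in per_class_impacts:
    # Each entry pairs a target class with its one-vs-all feature impact list.
    print(class_entry['class'])
    for impact in class_entry['featureImpacts']:
        print(impact['featureName'], impact['impactNormalized'])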
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
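A brief sketch, assuming the method is named get_prime_eligibility and returns the dict described above:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
eligibility = model.get_prime_eligibility()
if eligibility['can_make_prime']:
    # Request Prime rulesets (documented further below).
    ruleset_job = model.request_approximation()
else:
    print(eligibility['message'])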
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
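A minimal sketch of checking capabilities before requesting other insights, assuming the method name get_supported_capabilities and the keys listed above:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
capabilities = model.get_supported_capabilities()
# Keys follow the Returns section above, e.g. supportsBlending, hasWordCloud.
if capabilities.get('supportsCodeGeneration'):
    print('This model supports code generation')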
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported.
For AutoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, as a duration string, for example P6Y0M0D
training duration with sampling rate and sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, sampling rate 78%, random sampling)
start/end date
project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train the model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
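A hedged sketch of training a frozen datetime model on an exact date range, as described above; the class, method, and keyword names mirror the parameter list and are otherwise assumptions:

from datetime import datetime
import datarobot

model = datarobot.DatetimeModel.get('project-id', 'model-id')
# Train the frozen model on a fixed window; only start/end-date models may be
# trained into the holdout once it is unlocked.
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2015, 1, 1),
    training_end_date=datetime(2018, 1, 1),
)
frozen_model = model_job.get_result_when_complete()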
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (optional) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null
(no prediction explanations).
max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. If null, no limit.
In the case of ‘shap’: if the number of features is greater than the limit, the sum of
remaining values will also be returned as shapRemainingTotal. Defaults to null.
Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all of the
ngram explanations will be returned. If set to a non-zero positive integer value, text
explanations will be computed and that number of ngram explanations, sorted in descending
order, will be returned. By default, text explanations are not computed.
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
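A one-line sketch, assuming the setter is named set_prediction_threshold:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Fails once prediction_threshold_read_only is True (i.e. a deployment exists
# or predictions were made via the dedicated prediction API).
model.set_prediction_threshold(0.6)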
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally a name for the iteration can be supplied by the user to help identify the contents of data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_encoding – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models,
and ‘model’ for other models
is_frozen (bool) – whether this model is a frozen model
parent_model_id (str) – the id of the model that tuning parameters are derived from
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – model number assigned to a model
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float or None) – the percentage of the project dataset used in training the model. If the project uses
datetime partitioning, the sample_pct will be None. See training_row_count,
training_duration, and training_start_date and training_end_date instead.
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models,
and ‘model’ for other models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
rating_table_id (str) – the id of the rating table that belongs to this model
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – model number assigned to a model
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
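A sketch tying this to get_advanced_tuning_parameters (documented next), assuming the tuning method is named advanced_tune; the parameter name used here is hypothetical:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
# Look up the opaque parameter_id for the parameter we want to override.
param_ids = {
    p['parameter_name']: p['parameter_id'] for p in tuning['tuning_parameters']
}
# Only the overridden parameter needs to be supplied; others keep current_value.
job = model.advanced_tune(
    params={param_ids['colsample_bytree']: 0.8},  # 'colsample_bytree' is a hypothetical parameter name
    description='Lower column subsampling',
)
tuned_model = job.get_result_when_complete()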
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type: dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
Rather than specifying a specific data type, if present, it indicates that the parameter
is permitted to take on any of the specified values. Listed values may be of any string
or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
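A small sketch of inspecting the constraints structure described above; it reads only the documented keys:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
for param in tuning['tuning_parameters']:
    constraints = param['constraints']
    if 'int' in constraints or 'float' in constraints:
        # Numeric parameters expose an inclusive range.
        print(param['parameter_name'], constraints)
    elif 'select' in constraints:
        # Enumerated parameters list their permitted values.
        print(param['parameter_name'], constraints['select'])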
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional. If True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then no data_slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurve objects are found.
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get ROC curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get ROC curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all ROC curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional. If True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the model
has a defined parent model. If omitted or False, or there is no parent model, this will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a whole-number positive integer or float value. 0 corresponds to the
validation partition.
metric (unicode) – optional, the name of the metric to filter the resulting cross validation scores by
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
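A minimal sketch, assuming the method name get_cross_validation_scores and that cross validation has already been run for this model:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# All metrics, all partitions.
cv_scores = model.get_cross_validation_scores()
# Narrow to a single metric and partition if desired ('LogLoss' is just an example metric).
logloss_partition_1 = model.get_cross_validation_scores(partition=1, metric='LogLoss')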
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
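A sketch of the metadata-then-retrieve flow described above, assuming the method names get_feature_effect_metadata and get_feature_effect, and that the metadata object lists its available sources:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
fe_metadata = model.get_feature_effect_metadata()
# Pick one of the sources reported as available (attribute name assumed).
source = fe_metadata.sources[0]
feature_effects = model.get_feature_effect(source=source)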
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning is issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the
keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that was used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
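A sketch of retrieving Feature Impact with and without metadata, using only the keys listed above; assuming the method name get_feature_impact:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Plain list of per-feature impact records.
impacts = model.get_feature_impact()
top_feature = max(impacts, key=lambda fi: fi['impactNormalized'])

# Dict form with metadata about how the impact was computed.
impacts_with_meta = model.get_feature_impact(with_metadata=True)
print(impacts_with_meta['ranRedundancyDetection'])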
Query the server to determine which features were used.
Note that the data returned by this method may differ from
the names of the features in the featurelist used by this model.
This method returns the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and
‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
TypeError – (New in version v3.0) If the underlying project type is multilabel
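A short sketch for a binary model, assuming get_roc_curve and the CHART_DATA_SOURCE enum referenced above; the roc_points attribute on the returned RocCurve is an assumption:

import datarobot
from datarobot.enums import CHART_DATA_SOURCE

model = datarobot.Model.get('project-id', 'model-id')
roc = model.get_roc_curve(source=CHART_DATA_SOURCE.VALIDATION)
# roc_points is a list of dicts describing the curve (attribute name assumed).
print(len(roc.roc_points))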
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported.
For AutoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, as a duration string, for example P6Y0M0D
training duration with sampling rate and sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, sampling rate 78%, random sampling)
start/end date
project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
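A sketch of the asynchronous flow described above, assuming the method name request_feature_impact and the job helpers named in the Returns line:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Kick off Feature Impact computation and wait for the result.
job = model.request_feature_impact(row_count=1000, with_metadata=True)
feature_impact = job.get_result_when_complete(max_wait=600)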
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train the model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) Defines the way training data is selected. Can be either
random or latest. In combination with training_row_count, it defines how rows
are selected from the backtest (latest by default). When training data is defined using a
time range (training_duration or use_project_settings), this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
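Example for request_frozen_datetime_model (a minimal sketch; assumes a datetime partitioned project, and the IDs and duration are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
# Train a frozen copy of this model on the most recent 90 days of data,
# keeping a 50% random sample of rows within that window.
model_job = model.request_frozen_datetime_model(
    training_duration="P90D",
    time_window_sample_pct=50,
)
frozen_model = model_job.get_result_when_complete()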
Train a new frozen model with parameters from this model
Notes
This method only works if the project the model belongs to is not datetime
partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
Parameters:
sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will
use the value from this model.
training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with
the model. Only one of sample_pct and training_row_count should be specified.
Returns:model_job – the modeling job training a frozen model
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (Optional[str]) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction
explanations).
max_explanations (Optional[int]) – (New in version v2.21) Specifies the maximum number of explanation values that should be
returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of 'shap':
if the number of features is greater than the limit, the sum of remaining values will also be returned as
shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations (Optional[Union[int, str]]) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non-zero positive integer, text
explanations will be computed and that number of ngram explanations, sorted in descending order,
will be returned. By default, text explanations are not computed.
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
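Example for requesting training predictions on the holdout with SHAP explanations (a minimal sketch using the data_subset choices and explanation options described above; IDs are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
job = model.request_training_predictions(
    data_subset=dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
    max_explanations=10,
)
training_predictions = job.get_result_when_complete()
# Retrieve the predictions (and SHAP columns) as a pandas DataFrame
df = training_predictions.get_all_as_dataframe()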
Submit a job to the queue to retrain the model.
Parameters:
sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – Only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
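Example (a minimal sketch for a binary classification project; IDs and the threshold value are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
model.set_prediction_threshold(0.6)  # allowed only while prediction_threshold_read_only is False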
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of consecutive chunks with no observed improvement that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for a worker and appends it to
the end of the queue for this project.
After the job has finished you can get the newly trained model by retrieving
it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to
use, but not both. If neither are specified, a default of the maximum amount of data that
can safely be used to train any blueprint without going into the validation data will be
selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms
of rows of the minority class.
Notes
For datetime partitioned projects, see train_datetime instead.
Parameters:
sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from
0 to 100.
featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the
featurelist of this model is used.
scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation
or dr.SCORING_TYPE.cross_validation). validation is available for every
partitioning type, and indicates that the default model validation should be
used for the project.
If the project uses a form of cross-validation partitioning,
crossValidation can also be used to indicate
that all of the available training/validation combinations
should be used to evaluate the model.
training_row_count (Optional[int]) – The number of rows to use to train the requested model.
monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns:model_job_id – id of created job, can be used as parameter to ModelJob.get
method or wait_for_async_model_creation function
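Example for train (a minimal sketch; the IDs are placeholders, and the import path for wait_for_async_model_creation is assumed from the modeljob module):
import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation
project = dr.Project.get("project-id")        # placeholder ID
model = dr.Model.get(project.id, "model-id")  # placeholder ID
# Queue a new training run of this model's blueprint on 64% of the data
model_job_id = model.train(sample_pct=64)
# Block until the new model is built, then return it
new_model = wait_for_async_model_creation(project.id, model_job_id)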
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) Defines the way training data is selected. Can be either
random or latest. In combination with training_row_count, it defines how rows
are selected from the backtest (latest by default). When training data is defined using a
time range (training_duration or use_project_settings), this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
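Example for train_datetime (a minimal sketch for a datetime partitioned project; the IDs, duration, and sampling values are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
model_job = model.train_datetime(
    training_duration="P180D",   # train on the most recent 180 days of data
    time_window_sample_pct=25,   # keep a 25% random sample within that window
    sampling_method="random",
)
new_model = model_job.get_result_when_complete()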
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally, a name for the iteration can be supplied to help identify the contents of the data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
Compute and retrieve cluster insights for the model. This method awaits completion of the
job computing cluster insights and returns results after it is finished. If the computation
takes longer than the specified max_wait, an exception will be raised.
Parameters:
project_id (str) – ID of the project the model belongs to.
model_id (str) – ID of the model to compute cluster insights for.
max_wait (int) – Maximum number of seconds to wait before giving up.
Return type:List of ClusterInsight
Raises:
ClientError – Server rejected creation due to client error.
Most likely cause is bad project_id or model_id.
AsyncFailureError – If any of the responses from the server are unexpected
Change many cluster names at once based on list of name mappings.
Parameters:cluster_name_mappings (List of tuples) –
Cluster name mappings, each consisting of a (current cluster name, new cluster name) pair.
Example:
cluster_name_mappings=[("current cluster name 1", "new cluster name 1"), ("current cluster name 2", "new cluster name 2")]
* Return type:List of Cluster
* Raises:datarobot.errors.ClientError – Server rejected update of cluster names.
Possible reasons include: incorrect format of mapping, mapping introduces duplicates.
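Example of renaming clusters (a minimal sketch; it assumes the operation is exposed on a clustering model as update_cluster_names, and the IDs and names are placeholders):
from datarobot.models import ClusteringModel
# Assumes a clustering model in an unsupervised clustering project (placeholder IDs)
model = ClusteringModel.get("project-id", "model-id")
clusters = model.update_cluster_names(
    cluster_name_mappings=[
        ("Cluster 1", "Frequent travelers"),
        ("Cluster 2", "Occasional travelers"),
    ]
)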
class datarobot.models.cluster_insight.ClusterInsight¶
Holds data on all insights related to a feature, as well as a breakdown per cluster.
Parameters:
feature_name (str) – Name of a feature from the dataset.
feature_type (str) – Type of feature.
insights (List[ClusterInsight]) – List provides information regarding the importance of a specific feature in relation
to each cluster. Results help understand how the model is grouping data and what each
cluster represents.
feature_impact (float) – Impact of a feature ranging from 0 to 1.
Starts creation of cluster insights for the model and, if successful, returns the computed
ClusterInsights. This method allows the calculation to continue for a specified time and,
if it is not complete by then, cancels the request.
Parameters:
project_id (str) – ID of the project to begin creation of cluster insights for.
model_id (str) – ID of the project model to begin creation of cluster insights for.
max_wait (int) – Maximum number of seconds to wait before canceling the request.
Return type:List[ClusterInsight]
Raises:
ClientError – Server rejected creation due to client error.
Most likely cause is bad project_id or model_id.
AsyncFailureError – If any of the responses from the server are unexpected.
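Example (a minimal sketch; it assumes the classmethod is named compute, and the IDs are placeholders):
from datarobot.models.cluster_insight import ClusterInsight
insights = ClusterInsight.compute(
    project_id="project-id",  # placeholder
    model_id="model-id",      # placeholder
    max_wait=600,             # seconds to wait before the request is canceled
)
for insight in insights:
    print(insight.feature_name, insight.feature_impact)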
The pareto front reflects the tradeoffs between error and complexity for a particular model. The
solutions reflect possible Eureqa models that are different levels of complexity. By default,
only one solution will have a corresponding model, but models can be created for each solution.
Variables:
project_id (str) – the ID of the project the model belongs to
error_metric (str) – Eureqa error-metric identifier used to compute error metrics for this search. Note that
Eureqa error metrics do NOT correspond 1:1 with DataRobot error metrics – the available
metrics are not the same, and are computed from a subset of the training data rather than
from the validation data.
hyperparameters (dict) – Hyperparameters used by this run of the Eureqa blueprint
target_type (str) – Indicates what kind of modeling is being done in this project, either ‘Regression’,
‘Binary’ (Binary classification), or ‘Multiclass’ (Multiclass classification).
solutions (list(Solution)) – Solutions that Eureqa has found to model this data.
Some solutions will have greater accuracy. Others will have slightly
less accuracy but will use simpler expressions.
A solution represents a possible Eureqa model; however, not all solutions
have models associated with them. A solution must have a model created before
it can be used to make predictions, etc.
Variables:
eureqa_solution_id (str) – ID of this Solution
complexity (int) – Complexity score for this solution. Complexity score is a function
of the mathematical operators used in the current solution.
The Complexity calculation can be tuned via model hyperparameters.
error (float or None) – Error for the current solution, as computed by Eureqa using the
‘error_metric’ error metric. It will be None if the model refit an existing solution.
expression (str) – Eureqa model equation string.
expression_annotated (str) – Eureqa model equation string with variable names tagged for easy identification.
best_model (bool) – True if the model is determined to be the best.
class datarobot.models.advanced_tuning.AdvancedTuningSession¶
A session enabling users to configure and run advanced tuning for a model.
Every model contains a set of one or more tasks. Every task contains a set of
zero or more parameters. This class allows tuning the values of each parameter
on each task of a model, before running that model.
This session is client-side only and is not persistent.
Only the final model, constructed when run is called, is persisted on the DataRobot server.
Variables:description (str) – Description for the new advanced-tuned model.
Defaults to the same description as the base model.
The caller must supply enough of the optional arguments to this function
to uniquely identify the parameter that is being set.
For example, a less-common parameter name such as
‘building_block__complementary_error_function’ might only be used once (if at all)
by a single task in a model, in which case it may be sufficient to specify only
‘parameter_name’. But a more common name such as ‘random_seed’ might be used by
several of the model’s tasks, and it may be necessary to also specify ‘task_name’
to clarify which task’s random seed is to be set.
This function only affects client-side state. It will not check that the new parameter
value(s) are valid.
Parameters:
task_name (str) – Name of the task whose parameter needs to be set
parameter_name (str) – Name of the parameter to set
parameter_id (str) – ID of the parameter to set
value (int, float, list, or str) – New value for the parameter, with legal values determined by the parameter being set
Raises:
NoParametersFoundException – if no matching parameters are found.
NonUniqueParametersException – if multiple parameters matched the specified filtering criteria
Returns the set of parameters available to this model
The returned parameters have one additional key, “value”, reflecting any new values that
have been set in this AdvancedTuningSession. When the session is run, “value” will be used,
or if it is unset, “current_value”.
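Example of a tuning session started from an existing model (a minimal sketch; the IDs, parameter name, and value are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
tune = model.start_advanced_tuning_session()
tune.description = "Tuned random seed"          # optional description for the new model
print(tune.get_parameters())                    # inspect what can be tuned
tune.set_parameter(parameter_name="random_seed", value=42)  # client-side only until run
# Persist the advanced-tuned model on the server by running the session
model_job = tune.run()
tuned_model = model_job.get_result_when_complete()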
For multiclass projects with many unique values in the target column, you can
specify parameters for aggregating rare values to improve modeling
performance and decrease the runtime and resource usage of the resulting models.
class datarobot.helpers.ClassMappingAggregationSettings¶
Class mapping aggregation settings.
For multiclass projects, allows fine control over which target values will be
preserved as classes. Classes which aren’t preserved will be either
aggregated into a single “catch everything else” class (multiclass)
or ignored (multilabel).
All attributes are optional; if not specified, server-side defaults will be used.
Variables:
max_unaggregated_class_values (Optional[int]) – Maximum number of unique values allowed before aggregation kicks in.
min_class_support (Optional[int]) – Minimum number of instances necessary for each target value in the dataset.
All values with fewer instances will be aggregated.
excluded_from_aggregation (Optional[List]) – List of target values that are guaranteed to be kept as is,
regardless of other settings.
aggregation_class_name (Optional[str]) – If some of the values are aggregated, this is the name of the aggregation class
that will replace them.
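Example (a minimal sketch; the values are illustrative, and passing the settings via the project's target-setting call and its class_mapping_aggregation_settings keyword is an assumption that may vary by client version):
import datarobot as dr
from datarobot.helpers import ClassMappingAggregationSettings
settings = ClassMappingAggregationSettings(
    max_unaggregated_class_values=50,     # keep at most 50 distinct classes
    min_class_support=20,                 # classes with fewer than 20 rows are aggregated
    excluded_from_aggregation=["fraud"],  # always keep this class as-is
    aggregation_class_name="OTHER",
)
# Assumed usage: supply the settings when configuring the multiclass target
project = dr.Project.get("project-id")  # placeholder ID
project.set_target(target="category", class_mapping_aggregation_settings=settings)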
Parameters:params (dict or None) – Query parameters to be added to request to get results.
Notes
For featureEffects, the source param is required to define the source;
otherwise the default is training.
Returns:result –
Return type depends on the job type:
- for model jobs, a Model is returned
- for predict jobs, a pandas.DataFrame (with predictions) is returned
- for featureImpact jobs, a list of dicts by default (see the with_metadata parameter of the FeatureImpactJob class and its get() method)
- for primeRulesets jobs, a list of Rulesets
- for primeModel jobs, a PrimeModel
- for primeDownloadValidation jobs, a PrimeFile
- for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
- for predictionExplanations jobs, a PredictionExplanations
- for featureEffects jobs, a FeatureEffects
* Return type:object
* Raises:
* JobNotFinished – If the job is not finished, the result is not available.
* AsyncProcessUnsuccessfulError – If the job errored or was aborted
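Example of fetching a queued job and retrieving its result, including the source query param noted above for featureEffects jobs (a minimal sketch; the IDs are placeholders):
import datarobot as dr
job = dr.Job.get("project-id", "42")    # placeholder project ID and job ID
job.wait_for_completion(max_wait=600)   # block until the job reaches a terminal state
# For featureEffects jobs, the source can be passed via query params (defaults to training)
result = job.get_result(params={"source": "validation"})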
environment_id (Optional[str]) – The environment ID to use for job runs.
The ID must be specified in order to run the job.
environment_version_id (Optional[str]) – The environment version ID to use for job runs.
If not specified, the latest version of the execution environment will be used.
folder_path (Optional[str]) – The path to a folder containing files to be uploaded.
Each file in the folder is uploaded under a path relative to the folder path.
files (Optional[Union[List[Tuple[str, str]], List[str]]]) – The files to be uploaded to the job.
The files can be defined in 2 ways:
1. List of tuples where the 1st element is the local path of the file to be uploaded and the 2nd element is the file path in the job file system.
2. List of local paths of the files to be uploaded. In this case files are added to the root of the model file system.
file_data (Optional[Dict[str, str]]) – The file contents to be uploaded to the job.
Defined as a dictionary where keys are the file paths in the job file system
and values are the file contents.
runtime_parameter_values (Optional[List[RuntimeParameterValue]]) – Additional parameters to be injected into a model at runtime. The fieldName
must match a fieldName that is listed in the runtimeParameterDefinitions section
of the model-metadata.yaml file.
entry_point (Optional[str]) – The job file item ID to use as an entry point of the job.
environment_id (Optional[str]) – The environment ID to use for job runs.
Must be specified in order to run the job.
environment_version_id (Optional[str]) – The environment version ID to use for job runs.
If not specified, the latest version of the execution environment will be used.
description (str) – The job description.
folder_path (Optional[str]) – The path to a folder containing files to be uploaded.
Each file in the folder is uploaded under a path relative to the folder path.
files (Optional[Union[List[Tuple[str, str]], List[str]]]) – The files to be uploaded to the job.
The files can be defined in 2 ways:
1. List of tuples where the 1st element is the local path of the file to be uploaded and the 2nd element is the file path in the job file system.
2. List of local paths of the files to be uploaded. In this case files are added to the root of the job file system.
file_data (Optional[Dict[str, str]]) – The file contents to be uploaded to the job.
Defined as a dictionary where keys are the file paths in the job file system
and values are the file contents.
runtime_parameter_values (Optional[List[RuntimeParameterValue]]) – Additional parameters to be injected into a model at runtime. The fieldName
must match a fieldName that is listed in the runtimeParameterDefinitions section
of the model-metadata.yaml file.
max_wait (Optional[int]) – Maximum time to wait for a terminal status (“succeeded”, “failed”, “interrupted”, “canceled”).
If set to None, the method returns without waiting.
runtime_parameter_values (Optional[List[RuntimeParameterValue]]) – Additional parameters to be injected into a model at runtime. The fieldName
must match a fieldName that is listed in the runtimeParameterDefinitions section
of the model-metadata.yaml file.
class datarobot.models.missing_report.MissingValuesReport¶
Missing values report for a model; contains a list of per-feature reports sorted by missing
count in descending order.
Notes
Report per feature contains:
feature : feature name.
type : feature type – ‘Numeric’ or ‘Categorical’.
missing_count : missing values count in training data.
missing_percentage : missing values percentage in training data.
tasks : list of information for each task that was applied to the feature.
task information contains:
id : the number of the task in the blueprint diagram.
name : task name.
descriptions : human readable aggregated information about how the task handles
missing values. The following descriptions may be present: what value is imputed for
missing values, whether the feature being missing is treated as a feature by the task,
whether missing values are treated as infrequent values,
whether infrequent values are treated as missing values,
and whether missing values are ignored.
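Example of retrieving the report and iterating the per-feature entries (a minimal sketch; the IDs are placeholders, and the attribute names are assumed to follow the per-feature report fields listed above):
from datarobot.models.missing_report import MissingValuesReport
report = MissingValuesReport.get(project_id="project-id", model_id="model-id")  # placeholder IDs
for feature_report in report:
    # Each entry describes one feature, sorted by missing count in descending order
    print(feature_report.feature, feature_report.missing_count, feature_report.missing_percentage)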
limit (Optional[int]) – Maximum number of registered models to return
offset (Optional[int]) – Number of registered models to skip before returning results
sort_key (RegisteredModelSortKey, optional) – Key to order result by
sort_direction (RegisteredModelSortDirection, optional) – Sort direction
search (Optional[str]) – A term to search for in registered model name, description, or target name
filters (RegisteredModelListFilters, optional) – An object containing all filters that you’d like to apply to the
resulting list of registered models.
Returns:registered_models – A list of registered models the user can view.
Return type:List[RegisteredModel]
Examples
from datarobot import RegisteredModel
registered_models = RegisteredModel.list()
>>> [RegisteredModel('My Registered Model'), RegisteredModel('My Other Registered Model')]
from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection
filters = RegisteredModelListFilters(target_type='Regression')
registered_models = RegisteredModel.list(
    filters=filters,
    sort_key=RegisteredModelSortKey.NAME.value,
    sort_direction=RegisteredModelSortDirection.DESC.value,
    search='other',
)
>>> [RegisteredModel('My Other Registered Model')]
from datarobot import RegisteredModel
registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
registered_model_version = registered_model.get_version('5c939e08962d741e34f609f0')
>>> RegisteredModelVersion('My Registered Model Version')
filters (Optional[RegisteredModelVersionsListFilters]) – A RegisteredModelVersionsListFilters instance used to filter the list of registered model versions returned.
search (Optional[str]) – A search string used to filter the list of registered model versions returned.
sort_key (Optional[RegisteredModelVersionSortKey]) – The key to use to sort the list of registered model versions returned.
sort_direction (Optional[RegisteredModelSortDirection]) – The direction to use to sort the list of registered model versions returned.
limit (Optional[int]) – The maximum number of registered model versions to return. Default is 100.
offset (Optional[int]) – The number of registered model versions to skip over. Default is 0.
Returns:registered_model_versions – A list of registered model version objects.
Return type:List[RegisteredModelVersion]
Examples
from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelVersionsListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection
registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
filters = RegisteredModelVersionsListFilters(tags=['tag1', 'tag2'])
registered_model_versions = registered_model.list_versions(filters=filters)
>>> [RegisteredModelVersion('My Registered Model Version')]
id (str) – The ID of the registered model version.
registered_model_id (str) – The ID of the parent registered model.
registered_model_version (int) – The version of the registered model.
name (str) – The name of the registered model version.
model_id (str) – The ID of the model.
model_execution_type (str) – Type of model package (version). dedicated (native DataRobot models) and
custom_inference_model (user-added inference models) both execute on DataRobot
prediction servers; external models do not.
is_archived (bool) – Whether the model package (version) is permanently archived (cannot be used in deployment or
replacement).
* import_meta (ImportMeta) – Information from when this Model Package (version) was first saved.
* source_meta (SourceMeta) – Meta information from where this model was generated
* model_kind (ModelKind) – Model attribute information.
* target (Target) – Target information for the registered model version.
* model_description (ModelDescription) – Model description information.
* datasets (Dataset) – Dataset information for the registered model version.
* timeseries (Timeseries) – Timeseries information for the registered model version.
* bias_and_fairness (BiasAndFairness) – Bias and fairness information for the registered model version.
* is_deprecated (bool) – Whether the model package (version) is deprecated (cannot be used in deployment or
replacement).
* permissions (List[str]) – Permissions for the registered model version.
* active_deployment_count (int or None) – Number of the active deployments associated with the registered model version.
* build_status (str or None) – Model package (version) build status. One of complete, inProgress, failed.
* user_provided_id (str or None) – User provided ID for the registered model version.
* updated_at (str or None) – The time the registered model version was last updated.
* updated_by (UserMetadata or None) – The user who last updated the registered model version.
* tags (List[TagWithId] or None) – The tags associated with the registered model version.
* mlpkg_file_contents (str or None) – The contents of the model package file.
name (str or None) – Name of the version (model package).
prediction_threshold (float or None) – Threshold used for binary classification in predictions.
distribution_prediction_model_id (str or None) – ID of the DataRobot distribution prediction model
trained on predictions from the DataRobot model.
description (str or None) – Description of the version (model package).
compute_all_ts_intervals (bool or None) – Whether to compute all time series prediction intervals (1-100 percentiles).
registered_model_name (Optional[str]) – Name of the new registered model that will be created from this model package (version).
The model package (version) will be created as version 1 of the created registered model.
If neither registeredModelName nor registeredModelId is provided,
it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
registered_model_id (Optional[str]) – Creates a model package (version) as a new version for the provided registered model ID.
Mutually exclusive with registeredModelName.
tags (Optional[List[Tag]]) – Tags for the registered model version.
registered_model_tags (Optional[List[Tag]]) – Tags for the registered model.
registered_model_description (Optional[str]) – Description for the registered model.
Returns:registered_model_version – A new registered model version object.
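Example of registering a Leaderboard model (a minimal sketch; the classmethod name create_for_leaderboard_item and its import path are assumed by analogy with the create_for_external and create_for_custom_model_version methods described below, and the IDs are placeholders):
from datarobot.models.model_registry import RegisteredModelVersion
# Assumed classmethod name; registers a Leaderboard model as version 1 of a new registered model
version = RegisteredModelVersion.create_for_leaderboard_item(
    model_id="5c939e08962d741e34f609f1",           # placeholder Leaderboard model ID
    name="My model package",
    registered_model_name="My registered model",   # mutually exclusive with registered_model_id
)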
Create a new registered model version from an external model.
Parameters:
name (str) – Name of the registered model version.
target (ExternalTarget) – Target information for the registered model version.
model_id (Optional[str]) – Model ID of the registered model version.
model_description (Optional[ModelDescription]) – Information about the model.
datasets (Optional[ExternalDatasets]) – Dataset information for the registered model version.
timeseries (Optional[Timeseries]) – Timeseries properties for the registered model version.
registered_model_name (Optional[str]) – Name of the new registered model that will be created from this model package (version).
The model package (version) will be created as version 1 of the created registered model.
If neither registeredModelName nor registeredModelId is provided,
it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
registered_model_id (Optional[str]) – Creates a model package (version) as a new version for the provided registered model ID.
Mutually exclusive with registeredModelName.
tags (Optional[List[Tag]]) – Tags for the registered model version.
registered_model_tags (Optional[List[Tag]]) – Tags for the registered model.
registered_model_description (Optional[str]) – Description for the registered model.
geospatial_monitoring (Optional[ExternalGeospatialMonitoring]) – Geospatial monitoring settings for the registered model version.
Returns:registered_model_version – A new registered model version object.
Create a new registered model version from a custom model version.
Parameters:
custom_model_version_id (str) – ID of the custom model version.
name (Optional[str]) – Name of the registered model version.
description (Optional[str]) – Description of the registered model version.
registered_model_name (Optional[str]) – Name of the new registered model that will be created from this model package (version).
The model package (version) will be created as version 1 of the created registered model.
If neither registeredModelName nor registeredModelId is provided,
it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
registered_model_id (Optional[str]) – Creates a model package (version) as a new version for the provided registered model ID.
Mutually exclusive with registeredModelName.
tags (Optional[List[Tag]]) – Tags for the registered model version.
registered_model_tags (Optional[List[Tag]]) – Tags for the registered model.
registered_model_description (Optional[str]) – Description for the registered model.
Returns:registered_model_version – A new registered model version object.
Returns:deployments – A list of deployments associated with this registered model version.
Return type:List[VersionAssociatedDeployment]
class datarobot.models.model_registry.deployment.VersionAssociatedDeployment¶
Represents a deployment associated with a registered model version.
Parameters:
id (str) – The ID of the deployment.
currently_deployed (bool) – Whether this version is currently deployed.
registered_model_version (int) – The version of the registered model associated with this deployment.
is_challenger (bool) – Whether the version associated with this deployment is a challenger.
status (str) – The status of the deployment.
label (Optional[str]) – The label of the deployment.
first_deployed_at (datetime.datetime, optional) – The time the version was first deployed.
first_deployed_by (UserMetadata, optional) – The user who first deployed the version.
created_by (UserMetadata, optional) – The user who created the deployment.
prediction_environment (DeploymentPredictionEnvironment, optional) – The prediction environment of the deployment.
class datarobot.models.model_registry.RegisteredModelVersionsListFilters¶
Filters for listing of registered model versions.
Parameters:
target_name (str or None) – Name of the target to filter by.
target_type (str or None) – Type of the target to filter by.
compatible_with_leaderboard_model_id (str or None) – If specified, limit results to versions (model packages) of the Leaderboard model with the specified ID.
compatible_with_model_package_id (str or None) – Returns versions compatible with the given model package (version) ID. If used, it will only return versions
that match target.name, target.type, target.classNames (for classification models),
modelKind.isTimeSeries and modelKind.isMultiseries for the specified model package (version).
for_challenger (bool or None) – Can be used with compatibleWithModelPackageId to request similar versions that can be used as challenger
models; for external model packages (versions), instead of returning similar external model packages (versions),
similar DataRobot and Custom model packages (versions) will be retrieved.
prediction_threshold (float or None) – Return versions with the specified prediction threshold used for binary classification models.
imported (bool or None) – If specified, return either imported (true) or non-imported (false) versions (model packages).
prediction_environment_id (str or None) – Can be used to filter versions (model packages) by what is supported by the prediction environment
model_kind (str or None) – Can be used to filter versions (model packages) by model kind.
build_status (str or None) – If specified, filter versions by the build status.
class datarobot.models.model_registry.RegisteredModelListFilters¶
Filters for listing registered models.
Parameters:
created_at_start (datetime.datetime) – Registered models created on or after this timestamp.
created_at_end (datetime.datetime) – Registered models created before this timestamp. Defaults to the current time.
modified_at_start (datetime.datetime) – Registered models modified on or after this timestamp.
modified_at_end (datetime.datetime) – Registered models modified before this timestamp. Defaults to the current time.
target_name (str) – Name of the target to filter by.
target_type (str) – Type of the target to filter by.
created_by (str) – Email of the user that created registered model to filter by.
compatible_with_leaderboard_model_id (str) – If specified, limit results to registered models containing versions (model packages)
for the leaderboard model with the specified ID.
compatible_with_model_package_id (str) – Return registered models that have versions (model packages) compatible with given model package (version) ID.
If used, will only return registered models which have versions that match target.name, target.type,
target.classNames (for classification models), modelKind.isTimeSeries, and modelKind.isMultiseries
of the specified model package (version).
for_challenger (bool) – Can be used with compatibleWithModelPackageId to request similar registered models that contain
versions (model packages) that can be used as challenger models; for external model packages (versions),
instead of returning similar external model packages (versions), similar DataRobot and Custom model packages
will be retrieved.
prediction_threshold (float) – If specified, return any registered models containing one or more versions matching the prediction
threshold used for binary classification models.
imported (bool) – If specified, return any registered models that contain either imported (true) or non-imported (false)
versions (model packages).
prediction_environment_id (str) – Can be used to filter registered models by what is supported by the prediction environment.
model_kind (str) – Return models that contain versions matching a specific format.
build_status (str) – If specified, only return models that have versions with specified build status.