project_id (str) – ID of the project the model belongs to.
processes (List[str]) – Processes used by the model.
featurelist_name (str) – Name of the featurelist used by the model.
featurelist_id (str) – ID of the featurelist used by the model.
sample_pct (float or None) – Percentage of the project dataset used in model training. If the project uses
datetime partitioning, the sample_pct will be None. See training_row_count,
training_duration, and training_start_date / training_end_date instead.
training_row_count (int or None) – Number of rows of the project dataset used in model training. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date is used for training_row_count.
training_duration (str or None) – For datetime partitioned projects only. If specified, defines the duration spanned by the data used to train
the model and evaluate backtest scores.
training_start_date (datetime or None) – For frozen models in datetime partitioned projects only. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – For frozen models in datetime partitioned projects only. If specified, the end
date of the data used to train the model.
model_type (str) – Type of model, for example ‘Nystroem Kernel SVM Regressor’.
model_category (str) – Category of model, for example ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and
‘model’ for other models.
is_frozen (bool) – Whether this model is a frozen model.
is_n_clusters_dynamically_determined (bool) – (New in version v2.27) Optional. Whether this model determines the number of clusters dynamically.
blueprint_id (str) – ID of the blueprint used to build this model.
metrics (dict) – Mapping from each metric to the model’s score for that metric.
monotonic_increasing_featurelist_id (str) – Optional. ID of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – Optional. ID of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
n_clusters (int) – (New in version v2.27) Optional. Number of data clusters discovered by model.
has_empty_clusters (bool) – (New in version v2.27) Optional. Whether clustering model produces empty clusters.
supports_monotonic_constraints (bool) – Optional. Whether this model supports enforcing monotonic constraints.
is_starred (bool) – Whether this model is marked as a starred model.
prediction_threshold (float) – Binary classification projects only. Threshold used for predictions.
prediction_threshold_read_only (bool) – Whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – Model number assigned to the model.
parent_model_id (str or None) – (New in version v2.20) ID of the model that tuning parameters are derived from.
supports_composable_ml (bool or None) – (New in version v2.26)
Whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
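For example, a minimal sketch of the advanced-tuning flow, assuming an existing project and model ID; parameter IDs are discovered first, and any parameter omitted from params keeps its current_value (the chosen parameter below is purely illustrative):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning_info = model.get_advanced_tuning_parameters()

# Pick one parameter record from the available tuning parameters (hypothetical choice)
param = tuning_info['tuning_parameters'][0]

# Submit the tuning request; omitted parameters keep their current_value
job = model.advanced_tune(
    params={param['parameter_id']: param['current_value']},
    description='Example advanced-tuned model',
)
new_model = job.get_result_when_complete()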
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type:dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
If present, the parameter is permitted to take on any of the listed values rather than a value
of a specific data type. Listed values may be of any string or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this
model has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then no data slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then this method will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the model
has a defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a positive whole-number integer or a float value. 0 corresponds to the
validation partition.
metric (unicode) – Optional. Name of the metric to filter the resulting cross validation scores by.
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
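A brief sketch of retrieving these scores, assuming cross validation has already been computed for the model; the partition and metric filters shown are illustrative:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Run cross validation first if it has not been computed yet:
# model.cross_validate()

# All metrics, all partitions
all_scores = model.get_cross_validation_scores()

# Hypothetical filter: only the RMSE scores for partition 1
rmse_partition_1 = model.get_cross_validation_scores(partition=1, metric='RMSE')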
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
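A hedged sketch of that retrieval flow: check the metadata for available sources, then request Feature Effects for one of them (the metadata attribute names below are assumed from the description above):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Find out which sources have Feature Effects available
fe_metadata = model.get_feature_effect_metadata()
print(fe_metadata.status, fe_metadata.sources)

# Retrieve Feature Effects for one of the available sources
feature_effects = model.get_feature_effect(source=fe_metadata.sources[0])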
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith, and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that were used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
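A minimal sketch of retrieving Feature Impact with and without metadata, assuming it has already been computed for this model:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Plain list of per-feature impact records
feature_impacts = model.get_feature_impact()
top_feature = max(feature_impacts, key=lambda fi: fi['impactNormalized'])

# Dict form that also carries shapBased, ranRedundancyDetection, rowCount and count
fi_with_meta = model.get_feature_impact(with_metadata=True)
print(fi_with_meta['ranRedundancyDetection'])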
Query the server to determine which features were used.
Note that the data returned by this method may differ from the names
of the features in the featurelist used by this model.
This method will return the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
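For example, a short hedged sketch of inspecting the raw input features required for scoring:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Raw input features that new data must supply (may differ from the featurelist,
# which can also contain derived features)
raw_features = model.get_features_used()
print(sorted(raw_features))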
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and
‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
(New in version v3.0) TypeError – If the underlying project type is multilabel
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
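A small sketch of checking capabilities before requesting dependent insights, assuming the returned dict uses the camelCase keys listed above:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
capabilities = model.get_supported_capabilities()

# Only request SHAP-based insights when the model supports them (key names assumed as listed above)
if capabilities.get('supportsShap'):
    print('Model supports Shapley-based feature importance')
if capabilities.get('supportsEarlyStopping'):
    print('Number of trained iterations can be retrieved')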
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported for AutoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, in duration format (for example P6Y0M0D)
training duration combined with the sampling rate and sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, sampling rate 78%, random sampling)
start/end date
project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
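A hedged sketch of the approximation flow described above: check eligibility, request rulesets, and compare them once the job completes (the Ruleset attribute names are assumptions):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

eligibility = model.get_prime_eligibility()
if eligibility['can_make_prime']:
    # Generate candidate rulesets approximating this model
    job = model.request_approximation()
    job.wait_for_completion()

    # Compare the generated rulesets by score and rule count
    for ruleset in model.get_rulesets():
        print(ruleset.ruleset_id, ruleset.score, ruleset.rule_count)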
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
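A brief sketch of the asynchronous Feature Impact flow; the row_count, with_metadata, and max_wait values below are illustrative:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Submit the Feature Impact computation, then block until it finishes
job = model.request_feature_impact(row_count=1000, with_metadata=True)
feature_impact = job.get_result_when_complete(max_wait=600)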
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train the model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
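A minimal sketch for a datetime partitioned project, assuming only one of the mutually exclusive training arguments is passed; the date range below is illustrative:
import datarobot as dr
from datetime import datetime

model = dr.Model.get('project-id', 'model-id')

# Freeze on an explicit date range (the only option that can train into an unlocked holdout)
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2019, 1, 1),
    training_end_date=datetime(2021, 1, 1),
)
frozen_model = model_job.get_result_when_complete()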
Train a new frozen model with parameters from this model
Notes
This method only works if the project the model belongs to is not datetime
partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
Parameters:
sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will
use the value from this model.
training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with
the model. Only one of sample_pct and training_row_count should be specified.
Returns:model_job – the modeling job training a frozen model
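A short sketch for non-datetime projects; only one of sample_pct or training_row_count should be given, and the 80% sample below is illustrative:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Retrain with the parent's tuning parameters on 80% of the data
model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()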
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (optional) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null
(no prediction explanations).
max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. If null, no limit.
In the case of ‘shap’: if the number of features is greater than the limit, the sum of
remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot
be set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non-zero positive integer value, text
explanations will be computed and that number of ngram explanations, sorted in descending
order, will be returned. By default, text explanations are not computed.
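A hedged sketch of requesting predictions against an uploaded dataset; the file path, dataset handling, and SHAP options below are illustrative:
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the scoring data, then request predictions with SHAP explanations
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(
    dataset_id=prediction_dataset.id,
    explanation_algorithm='shap',
    max_explanations=5,
)
predictions = predict_job.get_result_when_complete()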
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all – for all data available. Not valid for
models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout – for
all data except the training set. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout – for the holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests – for downloading
the predictions for all backtest validation folds. Requires the model to have
successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
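The data_subset choices above belong to the training-predictions request; a hedged sketch assuming the Model.request_training_predictions method:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute out-of-sample predictions for the holdout partition only, with SHAP explanations
job = model.request_training_predictions(
    data_subset=dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
    max_explanations=5,
)
training_predictions = job.get_result_when_complete()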
Submit a job to the queue to train a blender model.
Parameters:
sample_pct (Optional[float]) – The sample size, in percent (1 to 100), to use in training. If this parameter is used,
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks with no observed improvement that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Train the blueprint used in this model on a particular featurelist or amount of data.
This method creates a new training job for a worker and appends it to
the end of the queue for this project.
After the job has finished you can get the newly trained model by retrieving
it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to
use, but not both. If neither is specified, a default of the maximum amount of data that
can safely be used to train any blueprint without going into the validation data will be
selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms
of rows of the minority class.
Notes
For datetime partitioned projects, see train_datetime instead.
Parameters:
sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from
0 to 100.
featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the
featurelist of this model is used.
scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation
or dr.SCORING_TYPE.cross_validation). validation is available for every
partitioning type, and indicates that the default model validation should be
used for the project.
If the project uses a form of cross-validation partitioning,
crossValidation can also be used to indicate
that all of the available training/validation combinations
should be used to evaluate the model.
training_row_count (Optional[int]) – The number of rows to use to train the requested model.
monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns:model_job_id – ID of the created job; can be used as a parameter to the ModelJob.get
method or the wait_for_async_model_creation function
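A minimal sketch of that training flow, assuming the wait_for_async_model_creation import path shown below; the 60% sample is illustrative:
import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation

model = dr.Model.get('project-id', 'model-id')

# Retrain the blueprint on 60% of the data with this model's featurelist
model_job_id = model.train(sample_pct=60)

# Either poll the job...
model_job = dr.ModelJob.get(project_id='project-id', model_job_id=model_job_id)

# ...or block until the new model is ready
new_model = wait_for_async_model_creation(
    project_id='project-id',
    model_job_id=model_job_id,
)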
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
Submit a job to the queue to perform incremental training on an existing model using
additional data. The ID of the additional data to use for training is specified with the data_stage_id.
Optionally, a name for the iteration can be supplied by the user to help identify the contents of the data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘DataRobot Prime’
model_category (str) – what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
ruleset (Ruleset) – the ruleset used in the Prime model
parent_model_id (str) – the id of the model that this Prime model approximates
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type:dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
If present, the parameter is permitted to take on any of the listed values rather than a value
of a specific data type. Listed values may be of any string or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this
model has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then no data slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function
uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None,
then this method will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurves are found.
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a whole-number positive integer or float value. 0 corresponds to the
validation partition.
metric (unicode) – Optional. The name of the metric to filter the resulting cross validation scores by.
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
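As an illustrative sketch, retrieving these scores with the Python client might look as follows (the method name get_cross_validation_scores is assumed, and the metric name is just an example):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# All cross validation scores, keyed by metric and then by partition
all_scores = model.get_cross_validation_scores()

# Only the RMSE scores, restricted to partition 1
rmse_partition_1 = model.get_cross_validation_scores(partition=1, metric='RMSE')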
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
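A minimal sketch of checking the metadata and then requesting Feature Effects for one of the available sources (the method names get_feature_effect_metadata and get_feature_effect, and the metadata attributes, are assumed from the DataRobot Python client):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Check which sources (e.g. training, validation, holdout) have Feature Effects available
fe_metadata = model.get_feature_effect_metadata()
print(fe_metadata.status, fe_metadata.sources)

# Retrieve Feature Effects for one of the listed sources
feature_effects = model.get_feature_effect(source='training')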
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Parameters:
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a list. Each item is a dict with the keys
featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that were used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
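For illustration, a hedged sketch of retrieving Feature Impact with and without metadata (the method name get_feature_impact is assumed):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Plain list of per-feature impact records
feature_impacts = model.get_feature_impact()
for item in feature_impacts:
    print(item['featureName'], item['impactNormalized'])

# Dict response that also carries shapBased, ranRedundancyDetection, rowCount and count
impact_with_meta = model.get_feature_impact(with_metadata=True)
print(impact_with_meta['ranRedundancyDetection'])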
Query the server to determine which features were used.
Note that the data returned by this method is possibly different
than the names of the features in the featurelist used by this model.
This method will return the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
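A brief sketch for a multilabel project, using the get_labelwise_roc_curves name referenced below and an assumed 'validation' source:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# One LabelwiseRocCurve instance per label for the chosen source
labelwise_curves = model.get_labelwise_roc_curves(source='validation')
print(len(labelwise_curves))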
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
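For example, a minimal sketch assuming the method is named get_lift_chart and that the 'validation' source is available:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Lift chart for the validation partition, falling back to the parent model if needed
lift_chart = model.get_lift_chart(
    source='validation',
    fallback_to_parent_insights=True,
)
print(len(lift_chart.bins))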
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list)
and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
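A short sketch, assuming the method is named get_residuals_chart; the data slice ID used here is a placeholder:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Residuals chart for the validation partition (regression projects)
residuals = model.get_residuals_chart(source='validation')

# The same chart restricted to a data slice ('data-slice-id' is a placeholder)
sliced_residuals = model.get_residuals_chart(
    source='validation',
    data_slice_filter=dr.DataSlice(id='data-slice-id'),
)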
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
(New in version v3.0) TypeError – If the underlying project type is multilabel
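For illustration, a hedged sketch assuming the method is named get_roc_curve:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# ROC curve for the validation partition of a binary classification model
roc = model.get_roc_curve(source='validation')

# For time series and OTV models, individual backtests can also be requested
backtest_roc = model.get_roc_curve(source='backtest_2')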
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
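A minimal sketch, assuming the method is named get_supported_capabilities and returns the dict described above:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

capabilities = model.get_supported_capabilities()
if capabilities.get('supportsShap'):
    print('SHAP-based feature importance is available for this model')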
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported for autoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, for example P6Y0M0D
training duration - sampling rate - sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, with a 78% sampling rate and random sampling)
Start/end date
Project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
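An illustrative sketch, assuming the method is named request_external_test and that the dataset is uploaded via Project.upload_dataset as noted above (the file path is hypothetical):
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the external test dataset, then request scores and insights on it
external_dataset = project.upload_dataset('./external_test.csv')  # path is hypothetical
job = model.request_external_test(dataset_id=external_dataset.id)
job.wait_for_completion()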
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (optional) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to
null (no prediction explanations).
max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. If null, no limit.
In the case of ‘shap’: if the number of features is greater than the limit, the sum of the
remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be
set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non zero positive integer value, text
explanations will be computed and this amount of descendingly sorted ngram explanations
will be returned. By default text explanation won’t be triggered to be computed.
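A hedged sketch of the typical flow, assuming the method is named request_predictions and that the uploaded file path is hypothetical:
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload a scoring dataset, then request predictions from this model
prediction_dataset = project.upload_dataset('./to_score.csv')  # path is hypothetical
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)

# Block until the job finishes and fetch the predictions
predictions = predict_job.get_result_when_complete()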
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
all data except the training set. Not valid for models in datetime partitioned
projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
the predictions for all backtest validation folds. Requires the model to have
successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
Submit a job to the queue to train a blender model.
Parameters:
sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – Only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
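For example, a minimal sketch assuming the method is named set_prediction_threshold; the threshold value is arbitrary:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Only allowed while prediction_threshold_read_only is False
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)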
Submit a job to the queue to perform the first incremental learning training iteration on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally a name for the iteration can be supplied by the user to help identify the contents of data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
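An illustrative sketch under stated assumptions: the method is assumed to be named train_incremental, and the data stage ID and iteration name below are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Train one more incremental iteration on data previously loaded into a data stage
job = model.train_incremental(
    data_stage_id='data-stage-id',              # placeholder ID
    training_data_name='incremental-batch-1',   # placeholder name
)
retrained_model = job.get_result_when_complete()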
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘DataRobot Prime’
model_category (str) – what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
model_ids (List[str]) – List of model ids used in blender
blender_method (str) – Method used to blend results from underlying models
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – model number assigned to a model
parent_model_id (str or None) – (New in version v2.20) the id of the model that tuning parameters are derived from
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type:dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
Rather than specifying a specific data type, if present, it indicates that the parameter
is permitted to take on any of the specified values. Listed values may be of any string
or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
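For illustration, a hedged sketch combining get_advanced_tuning_parameters (named above) with an advanced-tuning request; the method name advanced_tune and the override value 0.05 are assumptions for the example:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Inspect the tunable parameters and their constraints
tuning_info = model.get_advanced_tuning_parameters()
for param in tuning_info['tuning_parameters']:
    print(param['parameter_name'], param['current_value'], param['constraints'])

# Re-tune with a single parameter overridden; omitted parameters keep their current_value
first_param_id = tuning_info['tuning_parameters'][0]['parameter_id']
job = model.advanced_tune({first_param_id: 0.05}, description='example override')
tuned_model = job.get_result_when_complete()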
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this model
has a defined parent model. If omitted or False, or if this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then no data_slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurves are found.
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a whole-number positive integer or float value. 0 corresponds to the
validation partition.
metric (unicode) – Optional. The name of the metric to filter the resulting cross validation scores by.
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Parameters:
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a list. Each item is a dict with the keys
featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that were used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
Query the server to determine which features were used.
Note that the data returned by this method is possibly different
than the names of the features in the featurelist used by this model.
This method will return the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list)
and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
(New in version v3.0) TypeError – If the underlying project type is multilabel
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported for autoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, for example P6Y0M0D
training duration - sampling rate - sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, with a 78% sampling rate and random sampling)
Start/end date
Project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
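As a sketch, requesting Feature Impact and waiting for the result might look like this (row_count is optional; IDs are placeholders):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Submit the Feature Impact computation and block until it finishes
feature_impact_job = model.request_feature_impact(row_count=1000)
feature_impact = feature_impact_job.get_result_when_complete()
for item in feature_impact:
    print(item['featureName'], item['impactNormalized'])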
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train to model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
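A minimal sketch of training a frozen datetime model on an explicit duration (only one of the training settings may be passed; IDs are placeholders):
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Train a frozen model spanning one year of data, sampling 50% of the window
model_job = model.request_frozen_datetime_model(
    training_duration='P1Y0M0D',
    time_window_sample_pct=50,
)
frozen_model = model_job.get_result_when_complete()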
Train a new frozen model with parameters from this model
Notes
This method only works if the project the model belongs to is not datetime
partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
Parameters:
sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will
use the value from this model.
training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with
the model. Only one of sample_pct and training_row_count should be specified.
Returns:model_job – the modeling job training a frozen model
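For example, a sketch where only sample_pct is given (one of sample_pct and training_row_count should be specified; IDs are placeholders):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Train a frozen copy of this model on 80% of the project dataset
model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()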
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (Optional[str]) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanations (Optional[int]) – (New in version v2.21) Specifies the maximum number of explanation values that should be
returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number
of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to
null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non zero positive integer value, text
explanations will be computed and this amount of descendingly sorted ngram explanations
will be returned. By default text explanation won’t be triggered to be computed.
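A sketch of requesting predictions against a previously uploaded dataset; for a time series project a forecast_point could be supplied as well. The IDs and file path are placeholders.
import datarobot as dr

project = dr.Project.get('project-id')           # hypothetical ID
model = dr.Model.get('project-id', 'model-id')   # hypothetical ID

# Upload a scoring dataset, then request predictions from this model
dataset = project.upload_dataset('./to_predict.csv')
predict_job = model.request_predictions(dataset_id=dataset.id)
predictions = predict_job.get_result_when_complete()  # pandas DataFrame of predictions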
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except training set. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
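As a sketch, training predictions for the holdout subset might be requested as follows, assuming the method is request_training_predictions and that the job result exposes get_all_as_dataframe; IDs are placeholders.
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Request out-of-sample predictions on the holdout only
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()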
Submit a job to the queue to train a blender model.
Parameters:
sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
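For example (a sketch; the call fails once the threshold is read-only, and the IDs are placeholders):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

if not model.prediction_threshold_read_only:
    # Move the classification cutoff to favor precision over recall
    model.set_prediction_threshold(0.6)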
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to
the end of the queue for this project.
After the job has finished you can get the newly trained model by retrieving
it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to
use, but not both. If neither are specified, a default of the maximum amount of data that
can safely be used to train any blueprint without going into the validation data will be
selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms
of rows of the minority class.
Notes
For datetime partitioned projects, see train_datetime instead.
Parameters:
sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from
0 to 100.
featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the
featurelist of this model is used.
scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation
or dr.SCORING_TYPE.cross_validation). validation is available for every
partitioning type, and indicates that the default model validation should be
used for the project.
If the project uses a form of cross-validation partitioning,
crossValidation can also be used to indicate
that all of the available training/validation combinations
should be used to evaluate the model.
training_row_count (Optional[int]) – The number of rows to use to train the requested model.
monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns:model_job_id – id of created job, can be used as parameter to ModelJob.get
method or wait_for_async_model_creation function
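A minimal sketch tying the returned job id back to the trained model; the import path for wait_for_async_model_creation is an assumption, and the IDs are placeholders.
import datarobot as dr
# Import path assumed for the helper referenced above
from datarobot.models.modeljob import wait_for_async_model_creation

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Retrain the blueprint on 64% of the data with this model's featurelist
model_job_id = model.train(sample_pct=64)
new_model = wait_for_async_model_creation('project-id', model_job_id)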
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise, an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
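A sketch of retraining this blueprint in a datetime partitioned project on a one-year window; IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Train on one year of data, sampling 75% of the rows within that window
model_job = model.train_datetime(
    training_duration='P1Y0M0D',
    time_window_sample_pct=75,
)
new_model = model_job.get_result_when_complete()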
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally a name for the iteration can be supplied by the user to help identify the contents of data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
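A sketch under the assumption that this functionality is exposed as a train_incremental method taking the data stage id; the method name, data stage id, and iteration name are assumptions for illustration, not confirmed API.
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Method name and data stage id are assumptions for illustration only
job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='Q1 increment',
)
job.wait_for_completion()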
Note that only one of training_row_count, training_duration, and
training_start_date and training_end_date will be specified, depending on the
data_selection_method of the model. Whichever method was selected determines the amount of
data used to train on when making predictions and scoring the backtests and the holdout.
Variables:
id (str) – the id of the model
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – If specified, an int specifying the number of rows used to train the model and evaluate
backtest scores.
training_duration (str or None) – If specified, a duration string specifying the duration spanned by the data used to train
the model and evaluate backtest scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
time_window_sample_pct (int or None) – An integer between 1 and 99 indicating the percentage of sampling within the training
window. The points kept are determined by a random uniform sample. If not specified, no
sampling was done.
sampling_method (str or None) – (New in v2.23) indicates the way training data has been selected (either how rows have been
selected within backtest or how time_window_sample_pct has been applied).
model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models,
and ‘model’ for other models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric. The keys in metrics are
the different metrics used to evaluate the model, and the values are the results. The
dictionaries inside of metrics will be as described here: ‘validation’, the score
for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for
all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a
list of scores for all backtests where the score is None if that backtest does not have a
score available; and ‘holdout’, the score for the holdout or None if the holdout is locked
or the score is unavailable.
backtests (list of dict) – describes what data was used to fit each backtest, the score for the project metric, and
why the backtest score is unavailable if it is not provided.
data_selection_method (str) – which of training_row_count, training_duration, or training_start_date and training_end_date
were used to determine the data used to fit the model. One of ‘rowCount’,
‘duration’, or ‘selectedDateRange’.
training_info (dict) – describes which data was used to train on when scoring the holdout and making predictions.
training_info will have the following keys: holdout_training_start_date,
holdout_training_duration, holdout_training_row_count, holdout_training_end_date,
prediction_training_start_date, prediction_training_duration,
prediction_training_row_count, prediction_training_end_date. Start and end dates will
be datetimes, durations will be duration strings, and rows will be integers.
holdout_score (float or None) – the score against the holdout, if available and the holdout is unlocked, according to the
project metric.
holdout_status (string or None) – the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”.
Unavailable if the holdout fold was disabled in the partitioning configuration.
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
effective_feature_derivation_window_start (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the past relative to the forecast point
the user needs to provide history for at prediction time. This can differ from the
feature_derivation_window_start set on the project due to the differencing method and
period selected, or if the model is a time series native model such as ARIMA. Will be a
negative integer in time series projects and None otherwise.
effective_feature_derivation_window_end (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the past relative to the forecast point
the feature derivation window should end. Will be a non-positive integer in time series
projects and None otherwise.
forecast_window_start (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the future relative to the forecast point
the forecast window should start. Note that this field will be the same as what is shown in
the project settings. Will be a non-negative integer in time series projects and None
otherwise.
forecast_window_end (int or None) – (New in v2.16) For time series projects only.
How many units of the windows_basis_unit into the future relative to the forecast point
the forecast window should end. Note that this field will be the same as what is shown in
the project settings. Will be a non-negative integer in time series projects and None
otherwise.
windows_basis_unit (str or None) – (New in v2.16) For time series projects only.
Indicates which unit is the basis for the feature derivation window and the forecast window.
Note that this field will be the same as what is shown in the project settings. In time
series projects, will be either the detected time unit or “ROW”, and None otherwise.
model_number (integer) – model number assigned to a model
parent_model_id (str or None) – (New in version v2.20) the id of the model that tuning parameters are derived from
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
is_n_clusters_dynamically_determined (Optional[bool]) – (New in version 2.27) if True, indicates that model determines number of clusters
automatically.
n_clusters (Optional[int]) – (New in version 2.27) Number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests.
Retrain an existing datetime model using a new training period for the model’s training
set (with optional time window sampling) or a different feature list.
featurelist_id (Optional[str]) – The ID of the featurelist to use.
training_row_count (Optional[int]) – The number of rows to train the model on. If this parameter is used then sample_pct
cannot be specified.
time_window_sample_pct (Optional[int]) – An int between 1 and 99 indicating the percentage of
sampling within the time window. The points kept are determined by a random uniform
sample. If specified, training_row_count must not be specified and either
training_duration or training_start_date and training_end_date must be specified.
training_duration (Optional[str]) – A duration string representing the training duration for the submitted model. If
specified then training_row_count, training_start_date, and training_end_date
cannot be specified.
training_start_date (Optional[str]) – A datetime string representing the start date of
the data to use for training this model. If specified, training_end_date must also be
specified, and training_duration cannot be specified. The value must be before the
training_end_date value.
training_end_date (Optional[str]) – A datetime string representing the end date of the
data to use for training this model. If specified, training_start_date must also be
specified, and training_duration cannot be specified. The value must be after the
training_start_date value.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
n_clusters (Optional[int]) – (New in version 2.27) Number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
Returns:job – The created job that is retraining the model
Retrieve Feature Effect metadata for each backtest. Response contains status and available
sources for each backtest of the model.
Each backtest has training and validation sources available.
If holdout is configured for the project, it is reported with holdout as its backtestIndex and has
training and holdout sources available.
Start/stop models contain a single response item with startstop as the backtestIndex value.
Feature Effects for training are always available
(except for older projects, which support Feature Effects for validation only).
When a model is trained into validation or holdout without stacked predictions
(e.g. no out-of-sample predictions in validation or holdout),
Feature Effects are not available for validation or holdout.
Feature Effects for holdout are not available when no holdout is configured for the project.
source is a required parameter for retrieving Feature Effects; one of the provided sources must be used.
backtestIndex is a required parameter for submitting a compute request and retrieving Feature Effects;
one of the provided backtest indexes must be used.
Parameters:backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
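For example, computing Feature Effects for the first backtest of a datetime model might look like this sketch (IDs are placeholders):
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Submit the computation for backtest 0 and wait for the insight
job = model.request_feature_effect(backtest_index='0')
feature_effects = job.get_result_when_complete()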
Feature Effects provides partial dependence and predicted vs actual values for the top 500
features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
with all other variables held as they were, the value of this feature affects your prediction.
source (string) – The source Feature Effects are retrieved for.
One value of [FeatureEffectMetadataDatetime.sources]. To retrieve the available
sources for feature effect.
backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.
Returns:feature_effects – The feature effects data.
max_wait (Optional[int]) – The maximum time to wait for a requested feature effect job to complete before erroring
source (string) – The source Feature Effects are retrieved for.
One value of [FeatureEffectMetadataDatetime.sources]. To retrieve the available sources
for feature effect.
backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.
Returns:feature_effects – The feature effects data.
backtest_index (str) – The backtest index to use for Feature Effects calculation.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
features (list or None) – The list of features to use to calculate Feature Effects.
Returns:job – A Job representing Feature Effects computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
Retrieve Feature Effects for the multiclass datetime model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500
features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
with all other variables held as they were, the value of this feature affects your prediction.
Calculate prediction intervals for this DatetimeModel for the specified size.
Added in version v2.19.
Parameters:prediction_intervals_size (int) – The prediction interval’s size to calculate for this model. See the
prediction intervals documentation for more information.
Returns:job – a Job tracking the prediction intervals computation
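A sketch of computing 80% prediction intervals for a datetime model; IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Compute the 80th-percentile prediction intervals for later prediction requests
job = model.calculate_prediction_intervals(prediction_intervals_size=80)
job.wait_for_completion()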
Computes datetime trend plots
(Accuracy over Time, Forecast vs Actual, Anomaly over Time) for this model
Added in version v2.25.
Parameters:
backtest (int or string, optional) – Compute plots for a specific backtest (use the backtest index starting from zero).
To compute plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance_start (Optional[int]) – The start of forecast distance range (forecast window) to compute.
If not specified, the first forecast distance for this project will be used.
Only for time series supervised models
forecast_distance_end (Optional[int]) – The end of forecast distance range (forecast window) to compute.
If not specified, the last forecast distance for this project will be used.
Only for time series supervised models
Returns:job – a Job tracking the datetime trend plots computation
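For example, the plots for the holdout could be computed with a sketch like this; the SOURCE_TYPE member used is an assumption about the available enum values, and the IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Compute Accuracy over Time / Forecast vs Actual / Anomaly over Time for holdout
job = model.compute_datetime_trend_plots(
    backtest=dr.enums.DATA_SUBSET.HOLDOUT,
    source=dr.enums.SOURCE_TYPE.VALIDATION,
)
job.wait_for_completion()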
Retrieve Accuracy over Time plots metadata for this model.
Added in version v2.25.
Parameters:forecast_distance (Optional[int]) – Forecast distance to retrieve the metadata for.
If not specified, the first forecast distance for this project will be used.
Only available for time series projects.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance (Optional[int]) – Forecast distance to retrieve the plots for.
If not specified, the first forecast distance for this project will be used.
Only available for time series projects.
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
resolution (string, optional) – Specifies the resolution at which the data should be binned.
If not provided, an optimal resolution will be used to
build chart data with a number of bins <= max_bin_size.
One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies
the maximum number of bins for the retrieval. Default is 500.
start_date (datetime.datetime, optional) – The start of the date range to return.
If not specified, start date for requested plot will be used.
end_date (datetime.datetime, optional) – The end of the date range to return.
If not specified, end date for requested plot will be used.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
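A sketch of retrieving and tabulating the Accuracy over Time plot for the first backtest; the method name get_accuracy_over_time_plot and the bins attribute on the returned object are assumptions, and the IDs are placeholders.
import datarobot as dr
import pandas as pd

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Retrieve the plot for backtest 0, letting the server pick the bin resolution
plot = model.get_accuracy_over_time_plot(backtest=0)
df = pd.DataFrame.from_dict(plot.bins)
print(df.head())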
Retrieve Accuracy over Time preview plots for this model.
Added in version v2.25.
Parameters:
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance (Optional[int]) – Forecast distance to retrieve the plots for.
If not specified, the first forecast distance for this project will be used.
Only available for time series projects.
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance_start (Optional[int]) – The start of forecast distance range (forecast window) to retrieve.
If not specified, the first forecast distance for this project will be used.
forecast_distance_end (Optional[int]) – The end of forecast distance range (forecast window) to retrieve.
If not specified, the last forecast distance for this project will be used.
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
resolution (string, optional) – Specifies the resolution at which the data should be binned.
If not provided, an optimal resolution will be used to
build chart data with a number of bins <= max_bin_size.
One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies
the maximum number of bins for the retrieval. Default is 500.
start_date (datetime.datetime, optional) – The start of the date range to return.
If not specified, start date for requested plot will be used.
end_date (datetime.datetime, optional) – The end of the date range to return.
If not specified, end date for requested plot will be used.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot()
df = pd.DataFrame.from_dict(plot.bins)

# As an example, get the forecasts for the 10th point
forecast_point_index = 10
# Pad the forecasts for plotting. The forecasts length must match the df length
forecasts = [None] * forecast_point_index + df.forecasts[forecast_point_index]
forecasts = forecasts + [None] * (len(df) - len(forecasts))

plt.plot(df.start_date, df.actual, label="Actual")
plt.plot(df.start_date, forecasts, label="Forecast")
forecast_point = df.start_date[forecast_point_index]
plt.title("Forecast vs Actual (Forecast Point {})".format(forecast_point))
plt.legend()
plt.savefig("forecast_vs_actual.png")
Retrieve Forecast vs Actual preview plots for this model.
Added in version v2.25.
Parameters:
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
resolution (string, optional) – Specifies the resolution at which the data should be binned.
If not provided, an optimal resolution will be used to
build chart data with a number of bins <= max_bin_size.
One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies
the maximum number of bins for the retrieval. Default is 500.
start_date (datetime.datetime, optional) – The start of the date range to return.
If not specified, start date for requested plot will be used.
end_date (datetime.datetime, optional) – The end of the date range to return.
If not specified, end date for requested plot will be used.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
Retrieve Anomaly over Time preview plots for this model.
Added in version v2.25.
Parameters:
prediction_threshold (Optional[float]) – Only bins with predictions exceeding this threshold will be returned in the response.
backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero).
To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
source (string, optional) – The source of the data for the backtest/holdout.
Attribute must be one of dr.enums.SOURCE_TYPE
series_id (string, optional) – The name of the series to retrieve for multiseries projects.
If not provided an average plot for the first 1000 series will be retrieved.
max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots.
Default is dr.enums.DEFAULT_MAX_WAIT.
If 0 or None, the plots would be retrieved without attempting the computation.
Initialize the anomaly assessment insight and calculate
Shapley explanations for the most anomalous points in the subset.
The insight is available for anomaly detection models in time series unsupervised projects
which also support calculation of Shapley values.
Parameters:
backtest (int starting with 0 or "holdout") – The backtest to compute insight for.
source ("training" or "validation") – The source to compute insight for.
series_id (string) – Required for multiseries projects. The series id to compute insight for.
For example, if there is a series column containing cities,
a series id to pass would be “Boston”.
Retrieve computed Anomaly Assessment records for this model. Model must be an anomaly
detection model in time series unsupervised project which also supports calculation of
Shapley values.
Records can be filtered by the data backtest, source and series_id.
The results can be limited.
Added in version v2.25.
Parameters:
backtest (int starting with 0 or "holdout") – The backtest of the data to filter records by.
source ("training" or "validation") – The source of the data to filter records by.
series_id (string) – The series id to filter records by.
limit (Optional[int])
offset (Optional[int])
with_data_only (Optional[bool]) – Whether to return only records with preview and explanations available.
False by default.
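A sketch of initializing the anomaly assessment insight and then listing the computed records for the same subset; the series id and model IDs are placeholder values.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Compute Shapley explanations for the most anomalous points in backtest 0
model.initialize_anomaly_assessment(backtest=0, source='validation', series_id='Boston')

# Later, list the computed records that have preview data and explanations attached
records = model.get_anomaly_assessment_records(
    backtest=0, source='validation', series_id='Boston', with_data_only=True
)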
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere, this technique is sometimes called ‘Permutation Importance’.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
backtest (int or string) – The index of the backtest unless it is holdout then it is string ‘holdout’. This is supported
only in DatetimeModels
data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id.
By default, this function will use data_slice_filter.id == None which returns an unsliced insight.
If data_slice_filter is None, this method will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith, and count.
For a dict response the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
row_count (int) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multi-class (that has a separate method) and time series
projects.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
backtest (int or string) – The index of the backtest unless it is holdout then it is string ‘holdout’. This is supported
only in DatetimeModels
data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id.
By default, this function will use data_slice_filter.id == None which returns an unsliced insight.
If data_slice_filter is None, this method will raise a ValueError.
Returns:job – A Job representing the feature impact computation. To get the completed feature impact
data, use job.get_result or job.get_result_when_complete.
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
Parameters:
max_wait (Optional[int]) – The maximum time to wait for a requested feature impact job to complete before erroring
row_count (int) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multi-class (that has a separate method) and time series
projects.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
backtest (str) – Feature Impact backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id.
By default, this function will use data_slice_filter.id == None which returns an unsliced insight.
If data_slice_filter is None, this method will raise a ValueError.
Returns:feature_impacts – The feature impact data. See
get_feature_impact for the exact
schema.
(New in version v3.4)
Request the model Lift Chart for the specified backtest data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
If backtest_index is present then this will be ignored.
backtest_index (str) – Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then request_lift_chart will raise a ValueError.
Returns:status_check_job – Object contains all needed logic for a periodical status check of an async job.
(New in version v3.4)
Retrieve the model Lift chart for the specified backtest and data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
For time series and OTV models, also accepts values backtest_2, backtest_3, …,
up to the number of backtests in the model.
If backtest_index is present then this will be ignored.
backtest_index (str) – Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
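For example, a sketch of retrieving the holdout lift chart and inspecting its bins; the bins attribute and its keys are assumptions about the LiftChart object, and the IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Retrieve the unsliced lift chart computed on the holdout backtest
lift_chart = model.get_lift_chart(backtest_index='holdout')
for bin_ in lift_chart.bins:
    print(bin_['actual'], bin_['predicted'], bin_['bin_weight'])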
(New in version v3.4)
Request the binary model Roc Curve for the specified backtest and data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
If backtest_index is present then this will be ignored.
backtest_index (str) – ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then request_roc_curve will raise a ValueError.
Returns:status_check_job – Object contains all needed logic for a periodical status check of an async job.
(New in version v3.4)
Retrieve the ROC curve for a binary model for the specified backtest and data slice.
Parameters:
source (str) – (Deprecated in version v3.4)
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
For time series and OTV models, also accepts values backtest_2, backtest_3, …,
up to the number of backtests in the model.
If backtest_index is present then this will be ignored.
backtest_index (str) – ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the data slice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
TypeError – If the underlying project type is multilabel
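A sketch of retrieving the ROC curve for the holdout backtest and reading off a threshold; get_best_f1_threshold is assumed to be available on the returned RocCurve, and the IDs are placeholders.
import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')  # hypothetical IDs

# Retrieve the unsliced ROC curve for the holdout backtest
roc = model.get_roc_curve(backtest_index='holdout')
threshold = roc.get_best_f1_threshold()
print('Best F1 threshold:', threshold)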
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
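A sketch of looking up a parameter id with get_advanced_tuning_parameters and submitting a single overridden value via advanced_tune; the parameter name searched for is purely illustrative, and the IDs are placeholders.
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Find the opaque parameter_id for an illustrative parameter name
tuning = model.get_advanced_tuning_parameters()
param = next(
    p for p in tuning['tuning_parameters'] if p['parameter_name'] == 'learning_rate'
)

# Train a new model with just that one parameter overridden
model_job = model.advanced_tune(
    params={param['parameter_id']: 0.05},
    description='Lowered learning rate',
)
tuned_model = model_job.get_result_when_complete()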
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type: dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
Rather than specifying a specific data type, if present, it indicates that the parameter
is permitted to take on any of the specified values. Listed values may be of any string
or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for
this model’s parent for any source that is not available for this model and if this
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then no data_slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts. Or an empty list if no data found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurve objects are found.
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get ROC curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get ROC curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all ROC curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional. If True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the model
has a defined parent model. If omitted or False, or there is no parent model, this will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Query the server to determine which features were used.
Note that the data returned by this method may differ from
the names of the features in the featurelist used by this model.
This method returns the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
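A short illustration of the distinction described above (raw input features versus the featurelist), assuming the method name get_features_used:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Raw input features the model needs at prediction time; derived features
# from the featurelist are not included here.
raw_features = model.get_features_used()
print(raw_features)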
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
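A minimal sketch for a multilabel model, assuming get_labelwise_roc_curves and the CHART_DATA_SOURCE enum referenced above; the label attribute on each curve is an assumption:

import datarobot
from datarobot.enums import CHART_DATA_SOURCE

model = datarobot.Model.get('project-id', 'model-id')
# One LabelwiseRocCurve per label for the validation source.
labelwise_curves = model.get_labelwise_roc_curves(source=CHART_DATA_SOURCE.VALIDATION)
for curve in labelwise_curves:
    print(curve.label)  # the label this ROC curve belongs to (attribute name assumed)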
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and
‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
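A sketch of consuming the per-class structure described above, assuming the method name get_multiclass_feature_impact; only the documented dict keys are used:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
per_class_impacts = model.get_multiclass_feature_impact()
for class_entry in per_class_impacts:
    # Each entry pairs a target class with its one-vs-all feature impact list.
    print(class_entry['class'])
    for impact in class_entry['featureImpacts']:
        print(impact['featureName'], impact['impactNormalized'])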
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
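A brief sketch, assuming the method is named get_prime_eligibility and returns the dict described above:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
eligibility = model.get_prime_eligibility()
if eligibility['can_make_prime']:
    # Request Prime rulesets (documented further below).
    ruleset_job = model.request_approximation()
else:
    print(eligibility['message'])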
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
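A minimal sketch of checking capabilities before requesting other insights, assuming the method name get_supported_capabilities and the keys listed above:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
capabilities = model.get_supported_capabilities()
# Keys follow the Returns section above, e.g. supportsBlending, hasWordCloud.
if capabilities.get('supportsCodeGeneration'):
    print('This model supports code generation')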
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported.
For AutoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, as a duration string, for example P6Y0M0D
training duration with sampling rate and sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, sampling rate 78%, random sampling)
start/end date
project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train the model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
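A hedged sketch of training a frozen datetime model on an exact date range, as described above; the class, method, and keyword names mirror the parameter list and are otherwise assumptions:

from datetime import datetime
import datarobot

model = datarobot.DatetimeModel.get('project-id', 'model-id')
# Train the frozen model on a fixed window; only start/end-date models may be
# trained into the holdout once it is unlocked.
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2015, 1, 1),
    training_end_date=datetime(2018, 1, 1),
)
frozen_model = model_job.get_result_when_complete()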
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (optional) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null
(no prediction explanations).
max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. If null, no limit.
In the case of ‘shap’: if the number of features is greater than the limit, the sum of
remaining values will also be returned as shapRemainingTotal. Defaults to null.
Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations (optional; int or str) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all of the
ngram explanations will be returned. If set to a non-zero positive integer value, text
explanations will be computed and that number of ngram explanations, sorted in descending
order, will be returned. By default, text explanations are not computed.
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
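A one-line sketch, assuming the setter is named set_prediction_threshold:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Fails once prediction_threshold_read_only is True (i.e. a deployment exists
# or predictions were made via the dedicated prediction API).
model.set_prediction_threshold(0.6)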
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either
random or latest. In combination with training_row_count defines how rows
are selected from backtest (latest by default). When training data is defined using
time range (training_duration or use_project_settings) this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally a name for the iteration can be supplied by the user to help identify the contents of data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_encoding – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float) – the percentage of the project dataset used in training the model
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models,
and ‘model’ for other models
is_frozen (bool) – whether this model is a frozen model
parent_model_id (str) – the id of the model that tuning parameters are derived from
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – model number assigned to a model
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
project_id (str) – the id of the project the model belongs to
processes (List[str]) – the processes used by the model
featurelist_name (str) – the name of the featurelist used by the model
featurelist_id (str) – the id of the featurelist used by the model
sample_pct (float or None) – the percentage of the project dataset used in training the model. If the project uses
datetime partitioning, the sample_pct will be None. See training_row_count,
training_duration, and training_start_date and training_end_date instead.
training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime
partitioned project, if specified, defines the number of rows used to train the model and
evaluate backtest scores; if unspecified, either training_duration or
training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string
specifying the duration spanned by the data used to train the model and evaluate backtest
scores.
training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start
date of the data used to train the model.
training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end
date of the data used to train the model.
model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models,
and ‘model’ for other models
is_frozen (bool) – whether this model is a frozen model
blueprint_id (str) – the id of the blueprint used in this model
metrics (dict) – a mapping from each metric to the model’s scores for that metric
rating_table_id (str) – the id of the rating table that belongs to this model
monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically increasing relationship to the target.
If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with
a monotonically decreasing relationship to the target.
If None, no such constraints are enforced.
supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints
is_starred (bool) – whether this model is marked as starred
prediction_threshold (float) – for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold
modification is forbidden once a model has had a deployment created or predictions made via
the dedicated prediction API.
model_number (integer) – model number assigned to a model
supports_composable_ml (bool or None) – (New in version v2.26)
whether this model is supported in Composable ML.
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Parameters:
params (dict) – Mapping of parameter ID to parameter value.
The list of valid parameter IDs for a model can be found by calling
get_advanced_tuning_parameters().
This endpoint does not need to include values for all parameters. If a parameter
is omitted, its current_value will be used.
description (str) – Human-readable string describing the newly advanced-tuned model
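A sketch tying this to get_advanced_tuning_parameters (documented next), assuming the tuning method is named advanced_tune; the parameter name used here is hypothetical:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
# Look up the opaque parameter_id for the parameter we want to override.
param_ids = {
    p['parameter_name']: p['parameter_id'] for p in tuning['tuning_parameters']
}
# Only the overridden parameter needs to be supplied; others keep current_value.
job = model.advanced_tune(
    params={param_ids['colsample_bytree']: 0.8},  # 'colsample_bytree' is a hypothetical parameter name
    description='Lower column subsampling',
)
tuned_model = job.get_result_when_complete()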
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and
user-created support Advanced Tuning.
Returns:
A dictionary describing the advanced-tuning parameters for the current model.
There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the
user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
* parameter_name : (str) name of the parameter (unique per task, see below)
* parameter_id : (str) opaque ID string uniquely identifying parameter
* default_value : (*) the actual value used to train the model; either
the single value of the parameter specified before training, or the best
value from the list of grid-searched values (based on current_value)
* current_value : (*) the single value or list of values of the
parameter that were grid searched. Depending on the grid search
specification, could be a single fixed value (no grid search),
a list of discrete values, or a range.
* task_name : (str) name of the task that this parameter belongs to
* constraints: (dict) see the notes below
* vertex_id: (str) ID of vertex that this parameter belongs to
Return type: dict
Notes
The type of default_value and current_value is defined by the constraints structure.
It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys.
The presence of a key indicates that the parameter may take on the specified type.
(If a key is absent, this means that the parameter may not take on the specified type.)
If a key on constraints is present, its value will be a dict containing
all of the fields described below for that key.
select:
Rather than specifying a specific data type, if present, it indicates that the parameter
is permitted to take on any of the specified values. Listed values may be of any string
or real (non-complex) numeric type.
ascii:
The parameter may be a unicode object that encodes simple ASCII characters.
(A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed
constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode:
The parameter may be any Python unicode object.
int:
The value may be an object of type int within the specified range (inclusive).
Please note that the value will be passed around using the JSON format, and
some JSON parsers have undefined behavior with integers outside of the range
[-(2**53)+1, (2**53)-1].
float:
The value may be an object of type float within the specified range (inclusive).
intList, floatList:
The value may be a list of int or float objects, respectively, following constraints
as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple
keys, the parameter may take on any value permitted by any key.
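A small sketch of inspecting the constraints structure described above; it reads only the documented keys:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
for param in tuning['tuning_parameters']:
    constraints = param['constraints']
    if 'int' in constraints or 'float' in constraints:
        # Numeric parameters expose an inclusive range.
        print(param['parameter_name'], constraints)
    elif 'select' in constraints:
        # Enumerated parameters list their permitted values.
        print(param['parameter_name'], constraints['select'])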
Retrieve a list of all confusion matrices available for the model.
Parameters:fallback_to_parent_insights (bool) – (New in version v2.14) Optional. If True, this will return confusion chart data for
this model’s parent for any source that is not available for this model, provided this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
Returns:
Data for all available confusion charts for model.
Retrieve a list of all feature impact results available for the model.
Parameters:data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then no data_slice filtering will be applied when requesting the feature impacts.
Returns:
Data for all available model feature impacts, or an empty list if no data is found.
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model lift charts, or an empty list if no data is found.
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
Retrieve a list of all Lift charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Data for all available model lift charts.
Retrieve a list of all residuals charts available for the model.
Parameters:
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent
for any source that is not available for this model and if this model has a
defined parent model. If omitted or False, or this model has no parent, this will
not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id.
If None (the default) applies no filter based on data_slice_id.
Returns:
Data for all available model residuals charts.
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
Retrieve a list of all ROC curves available for the model.
Parameters:
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent for any source that is not available for this model and if this model
has a defined parent model. If omitted or False, or this model has no parent,
this will not attempt to retrieve any data from this model’s parent.
data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on
data_slice_id.
Returns:
Data for all available model ROC curves, or an empty list if no RocCurve objects are found.
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get ROC curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get ROC curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all ROC curve insights
all_roc_curves = model.get_all_roc_curves()
Retrieve a multiclass model’s confusion matrix for the specified source.
Parameters:
source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional. If True, this will return confusion chart data for
this model’s parent if the confusion chart is not available for this model and the model
has a defined parent model. If omitted or False, or there is no parent model, this will not
attempt to return insight data from this model’s parent.
Returns:
Model ConfusionChart data
Return type:ConfusionChart
Raises:ClientError – If the insight is not available for this model
Return a dictionary, keyed by metric, showing cross validation
scores per partition.
Cross Validation should already have been performed using
cross_validate or
train.
Notes
Models that computed cross validation before this feature was added will need
to be deleted and retrained before this method can be used.
Parameters:
partition (float) – optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by;
can be a whole-number positive integer or float value. 0 corresponds to the
validation partition.
metric (unicode) – optional, the name of the metric to filter the resulting cross validation scores by
Returns:cross_validation_scores – A dictionary keyed by metric showing cross validation scores per
partition.
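A minimal sketch, assuming the method name get_cross_validation_scores and that cross validation has already been run for this model:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# All metrics, all partitions.
cv_scores = model.get_cross_validation_scores()
# Narrow to a single metric and partition if desired ('LogLoss' is just an example metric).
logloss_partition_1 = model.get_cross_validation_scores(partition=1, metric='LogLoss')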
Feature Effects provides partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older
projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions
(i.e., no out-of-sample predictions in those partitions),
Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for
the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
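A sketch of the metadata-then-retrieve flow described above, assuming the method names get_feature_effect_metadata and get_feature_effect, and that the metadata object lists its available sources:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
fe_metadata = model.get_feature_effect_metadata()
# Pick one of the sources reported as available (attribute name assumed).
source = fe_metadata.sources[0]
feature_effects = model.get_feature_effect(source=source)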
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500
features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after
accounting for the average effects of all other predictive features. It indicates how,
holding all other variables except the feature of interest as they were,
the value of this feature affects your prediction.
Retrieve the computed Feature Impact results, a measure of the relevance of each
feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly
permuted (but the others left unchanged), and seeing how the error metric score for the
predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score
is when making predictions on this modified data. The ‘impactNormalized’ is normalized so
that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t
contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the
highest correlation with this feature. Note that redundancy detection is only available for
jobs run after the addition of this feature. When retrieving data that predates this
functionality, a NoRedundancyImpactAvailable warning is issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
with_metadata (bool) – The flag indicating if the result should include the metadata as well.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_feature_impact will raise a ValueError.
Returns:
The feature impact data response depends on the with_metadata parameter. The response is
either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized,
impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the
keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that was used to
calculate Feature Impact. For Feature Impact calculated with the default
logic, without specifying rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
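A sketch of retrieving Feature Impact with and without metadata, using only the keys listed above; assuming the method name get_feature_impact:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Plain list of per-feature impact records.
impacts = model.get_feature_impact()
top_feature = max(impacts, key=lambda fi: fi['impactNormalized'])

# Dict form with metadata about how the impact was computed.
impacts_with_meta = model.get_feature_impact(with_metadata=True)
print(impacts_with_meta['ranRedundancyDetection'])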
Query the server to determine which features were used.
Note that the data returned by this method may differ from
the names of the features in the featurelist used by this model.
This method returns the raw features that must be supplied in order
for predictions to be generated on a new set of data. The featurelist,
in contrast, would also include the names of derived features.
Returns:features – The names of the features used in the model.
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels.
This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
Returns:
Labelwise ROC Curve instances for source and all labels
Retrieve the model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
Returns:
Model lift chart data
Return type:LiftChart
Raises:
ClientError – If the insight is not available for this model
Retrieve a report on missing training data that can be used to understand missing
values treatment in the model. The report consists of missing values resolutions for
numeric or categorical features that were part of building the model.
Returns:
The queried model missing report, sorted by missing count (DESCENDING order).
Return type:An iterable of MissingReportPerFeature
For multiclass it’s possible to calculate feature impact separately for each target class.
The method for calculation is exactly the same, calculated in one-vs-all style for each
target class.
Returns:feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and
‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’,
‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Retrieve model Lift chart for the specified source.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_lift_chart will raise a ValueError.
target_class (str, optional) – Lift chart target class name.
Returns:
Model lift chart data for each saved target class
Retrieve model Lift charts for the specified source.
Added in version v2.24.
Parameters:
source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this
model’s parent if the lift chart is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return insight data from this model’s parent.
Returns:
Model lift chart data for each saved target class
source (string) – The source Feature Effects are retrieved for.
max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:feature_effects – The Feature Effects data.
Check if this model can be approximated with DataRobot Prime
Returns:prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime
(key can_make_prime) and why it may be ineligible (key message)
Retrieve model residuals chart for the specified source.
Parameters:
source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible
values.
fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if
the residuals chart is not available for this model and the model has a defined parent
model. If omitted or False, or there is no parent model, will not attempt to return
residuals data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_residuals_chart will raise a ValueError.
Returns:
Model residuals chart data
Return type:ResidualsChart
Raises:
ClientError – If the insight is not available for this model
Retrieve the ROC curve for a binary model for the specified source.
This method is valid only for binary projects. For multilabel projects, use
Model.get_labelwise_roc_curves.
Parameters:
source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
(New in version v2.23) For time series and OTV models, also accepts values backtest_2,
backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this
model’s parent if the ROC curve is not available for this model and the model has a
defined parent model. If omitted or False, or there is no parent model, will not
attempt to return data from this model’s parent.
data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will
use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None
then get_roc_curve will raise a ValueError.
Returns:
Model ROC curve data
Return type:RocCurve
Raises:
ClientError – If the insight is not available for this model
TypeError – (New in version v3.0) If the underlying project type is multilabel
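A short sketch for a binary model, assuming get_roc_curve and the CHART_DATA_SOURCE enum referenced above; the roc_points attribute on the returned RocCurve is an assumption:

import datarobot
from datarobot.enums import CHART_DATA_SOURCE

model = datarobot.Model.get('project-id', 'model-id')
roc = model.get_roc_curve(source=CHART_DATA_SOURCE.VALIDATION)
# roc_points is a list of dicts describing the curve (attribute name assumed).
print(len(roc.roc_points))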
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, this will return an empty list. Note that these
are rulesets approximating this model, not rulesets used to construct this model.
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
Returns:
supportsBlending (bool) – whether the model supports blending
supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints
hasWordCloud (bool) – whether the model has word cloud data available
eligibleForPrime (bool) – (Deprecated in version v3.6)
whether the model is eligible for Prime
hasParameters (bool) – whether the model has parameters that can be retrieved
supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation
supportsShap (bool) – (New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based
feature importance
supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping
tree-based model and the number of trained iterations can be retrieved.
Retrieve paginated model records, sorted by scores, with optional filtering.
Parameters:
sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.
sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric
is the default.
with_metric (str) – For a single-metric list of results, specify that project metric.
search_term (str) – If specified, only models containing the term in their name or processes are returned.
featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.
families (List[str]) – If specified, only models belonging to selected families are returned.
blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.
labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.
characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.
training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned.
The following formats are supported.
For AutoML and datetime partitioned projects:
number of rows in the training subset
For datetime partitioned projects:
training duration, as a duration string, for example P6Y0M0D
training duration with sampling rate and sampling method, for example P6Y0M0D-78-Random
(returns models trained on 6 years of data, sampling rate 78%, random sampling)
start/end date
project settings
number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After
comparing their scores and rule counts, the code used in the approximation can be downloaded
and run locally.
Request external test to compute scores and insights on an external test dataset
Parameters:
dataset_id (string) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
Returns:job – a Job representing external dataset insights computation
row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation.
Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model,
whichever is less.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – A Job representing the feature effect computation. To get the completed feature effect
data, use job.get_result or job.get_result_when_complete.
row_count (int) – The number of rows from dataset to use for Feature Impact calculation.
top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.
features (list or None) – The list of features used to calculate Feature Effects.
Returns:job – A Job representing Feature Effect computation. To get the completed Feature Effect
data, use job.get_result or job.get_result_when_complete.
row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not
supported for unsupervised, multiclass (which has a separate method), and time series
projects.
with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata.
If true, metadata is included.
data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.
Returns:job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact
data, use job.get_result or job.get_result_when_complete.
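A sketch of the asynchronous flow described above, assuming the method name request_feature_impact and the job helpers named in the Returns line:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Kick off Feature Impact computation and wait for the result.
job = model.request_feature_impact(row_count=1000, with_metadata=True)
feature_impact = job.get_result_when_complete(max_wait=600)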
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an
error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
In addition to training_row_count and training_duration, frozen datetime models may be
trained on an exact date range. Only one of training_row_count, training_duration, or
training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can
be trained into the holdout data (once the holdout is unlocked).
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
training_duration may not be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, training_row_count may not be specified.
training_start_date (datetime.datetime, optional) – the start date of the data to train the model on. Only rows occurring at or after
this datetime will be used. If training_start_date is specified, training_end_date
must also be specified.
training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before
this datetime will be used. If training_end_date is specified, training_start_date
must also be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) Defines the way training data is selected. Can be either
random or latest. In combination with training_row_count, it defines how rows
are selected from the backtest (latest by default). When training data is defined using a
time range (training_duration or use_project_settings), this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
Returns:model_job – the modeling job training a frozen model
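Example for request_frozen_datetime_model (a minimal sketch; assumes a datetime partitioned project, and the IDs and duration are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
# Train a frozen copy of this model on the most recent 90 days of data,
# keeping a 50% random sample of rows within that window.
model_job = model.request_frozen_datetime_model(
    training_duration="P90D",
    time_window_sample_pct=50,
)
frozen_model = model_job.get_result_when_complete()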
Train a new frozen model with parameters from this model
Notes
This method only works if the project the model belongs to is not datetime
partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently
optimizing them to allow efficiently retraining models on larger amounts of the training
data.
Parameters:
sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will
use the value from this model.
training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with
the model. Only one of sample_pct and training_row_count should be specified.
Returns:model_job – the modeling job training a frozen model
Requests predictions against a previously uploaded dataset.
Parameters:
dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframe (pd.DataFrame, optional) – (New in v3.0)
The dataframe to make predictions against
file_path (Optional[str]) – (New in v3.0)
Path to file to make predictions against
file (IOBase, optional) – (New in v3.0)
File to make predictions against
include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only.
Specifies whether prediction intervals should be calculated for this request. Defaults
to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only.
Represents the percentile to use for the size of the prediction intervals. Defaults to
80 if include_prediction_intervals is True. Prediction intervals size must be
between 1 and 100 (inclusive).
forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative
to which predictions will be generated, based on the forecast window of the project. See
the time series prediction documentation for more
information.
predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk
predictions. Note that this parameter is for generating historical predictions using the
training data. This parameter should be provided in conjunction with
predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk
predictions, exclusive. Note that this parameter is for generating historical
predictions using the training data. This parameter should be provided in conjunction
with predictions_start_date. Can’t be provided with the
forecast_point parameter.
actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only.
Actual value column can be used to calculate the classification metrics and
insights on the prediction dataset. Can’t be provided with the forecast_point
parameter.
explanation_algorithm (Optional[str]) – (New in version v2.21) If set to 'shap', the response will include prediction
explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction
explanations).
max_explanations (Optional[int]) – (New in version v2.21) Specifies the maximum number of explanation values that should be
returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of 'shap':
if the number of features is greater than the limit, the sum of remaining values will also be returned as
shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations (Optional[Union[int, str]]) – (New in version v2.29) Specifies the maximum number of text explanation values that
should be returned. If set to all, text explanations will be computed and all the
ngram explanations will be returned. If set to a non-zero positive integer, text
explanations will be computed and that number of ngram explanations, sorted in descending order,
will be returned. By default, text explanations are not computed.
data set definition to build predictions on.
Choices are:
dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.
dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.
dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response
will include prediction explanations based on the SHAP explainer (SHapley Additive
exPlanations). Defaults to None (no prediction explanations).
max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should
be returned for each row, ordered by absolute value, greatest to least. In the case of
dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all
features. If the number of features is greater than the max_explanations, the sum of
remaining values will also be returned as shap_remaining_total. Max 100. Defaults to
null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100
columns. Is ignored if explanation_algorithm is not set.
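Example for requesting training predictions on the holdout with SHAP explanations (a minimal sketch using the data_subset choices and explanation options described above; IDs are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
job = model.request_training_predictions(
    data_subset=dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
    max_explanations=10,
)
training_predictions = job.get_result_when_complete()
# Retrieve the predictions (and SHAP columns) as a pandas DataFrame
df = training_predictions.get_all_as_dataframe()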
Submit a job to the queue to retrain the model.
Parameters:
sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used
then training_row_count should not be given.
featurelist_id (Optional[str]) – The featurelist id
training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct
should not be given.
n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that do not determine
the number of clusters automatically.
Returns:job – The created job that is retraining the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters:threshold (float) – Only used for binary classification projects. The threshold to use when deciding between
the positive and negative classes when making predictions. Should be between 0.0 and
1.0 (inclusive).
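Example (a minimal sketch for a binary classification project; IDs and the threshold value are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
model.set_prediction_threshold(0.6)  # allowed only while prediction_threshold_read_only is False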
Submit a job to the queue to perform the first incremental learning iteration training on an existing
sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
Parameters:
early_stopping_rounds (Optional[int]) – The number of consecutive chunks with no observed improvement that triggers the early stopping mechanism.
first_iteration_only (bool) – Specifies whether incremental learning training should be limited to the first
iteration. If set to True, the training process will be performed only for the first
iteration. If set to False, training will continue until early stopping conditions
are met or the maximum number of iterations is reached. The default value is False.
Returns:job – The created job that is retraining the model
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for a worker and appends it to
the end of the queue for this project.
After the job has finished you can get the newly trained model by retrieving
it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to
use, but not both. If neither are specified, a default of the maximum amount of data that
can safely be used to train any blueprint without going into the validation data will be
selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms
of rows of the minority class.
Notes
For datetime partitioned projects, see train_datetime instead.
Parameters:
sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from
0 to 100.
featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the
featurelist of this model is used.
scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation
or dr.SCORING_TYPE.cross_validation). validation is available for every
partitioning type, and indicates that the default model validation should be
used for the project.
If the project uses a form of cross-validation partitioning,
crossValidation can also be used to indicate
that all of the available training/validation combinations
should be used to evaluate the model.
training_row_count (Optional[int]) – The number of rows to use to train the requested model.
monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns:model_job_id – id of created job, can be used as parameter to ModelJob.get
method or wait_for_async_model_creation function
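Example for train (a minimal sketch; the IDs are placeholders, and the import path for wait_for_async_model_creation is assumed from the modeljob module):
import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation
project = dr.Project.get("project-id")        # placeholder ID
model = dr.Model.get(project.id, "model-id")  # placeholder ID
# Queue a new training run of this model's blueprint on 64% of the data
model_job_id = model.train(sample_pct=64)
# Block until the new model is built, then return it
new_model = wait_for_async_model_creation(project.id, model_job_id)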
featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this
model is used.
training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified,
neither training_duration nor use_project_settings may be specified.
training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should
span. If specified, neither training_row_count nor use_project_settings may be
specified.
use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom
backtest partitioning settings specified by the user will be used to train the model and
evaluate backtest scores. If specified, neither training_row_count nor
training_duration may be specified.
time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start
and end dates). An integer between 1 and 99 indicating the percentage to sample by
within the window. The points kept are determined by a random uniform sample.
If specified, training_duration must also be specified; otherwise an error will occur.
sampling_method (Optional[str]) – (New in version v2.23) Defines the way training data is selected. Can be either
random or latest. In combination with training_row_count, it defines how rows
are selected from the backtest (latest by default). When training data is defined using a
time range (training_duration or use_project_settings), this setting changes the
way time_window_sample_pct is applied (random by default). Applicable to OTV
projects only.
monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically increasing relationship to the target.
Passing None disables increasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines
the set of features with a monotonically decreasing relationship to the target.
Passing None disables decreasing monotonicity constraint. Default
(dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model.
This parameter is used only for unsupervised clustering models that don’t automatically
determine the number of clusters.
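Example for train_datetime (a minimal sketch for a datetime partitioned project; the IDs, duration, and sampling values are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
model_job = model.train_datetime(
    training_duration="P180D",   # train on the most recent 180 days of data
    time_window_sample_pct=25,   # keep a 25% random sample within that window
    sampling_method="random",
)
new_model = model_job.get_result_when_complete()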
Submit a job to the queue to perform incremental training on an existing model using
additional data. The id of the additional data to use for training is specified with the data_stage_id.
Optionally, a name for the iteration can be supplied to help identify the contents of the data in
the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
Parameters:
data_stage_id (str) – The id of the data stage to use for training.
training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.
data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8).
Supported formats: UTF-8, ASCII, WINDOWS1252
data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).
data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None).
Supported formats: zip
Returns:job – The created job that is retraining the model
Compute and retrieve cluster insights for the model. This method awaits completion of the
job computing cluster insights and returns results after it is finished. If the computation
takes longer than the specified max_wait, an exception will be raised.
Parameters:
project_id (str) – ID of the project the model belongs to.
model_id (str) – ID of the model to compute cluster insights for.
max_wait (int) – Maximum number of seconds to wait before giving up.
Return type:List of ClusterInsight
Raises:
ClientError – Server rejected creation due to client error.
Most likely cause is bad project_id or model_id.
AsyncFailureError – If any of the responses from the server are unexpected
Change many cluster names at once based on list of name mappings.
Parameters:cluster_name_mappings (List of tuples) –
Cluster name mappings, each consisting of a (current cluster name, new cluster name) pair.
Example:
cluster_name_mappings=[("current cluster name 1", "new cluster name 1"), ("current cluster name 2", "new cluster name 2")]
* Return type:List of Cluster
* Raises:datarobot.errors.ClientError – Server rejected update of cluster names.
Possible reasons include: incorrect format of mapping, mapping introduces duplicates.
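Example of renaming clusters (a minimal sketch; it assumes the operation is exposed on a clustering model as update_cluster_names, and the IDs and names are placeholders):
from datarobot.models import ClusteringModel
# Assumes a clustering model in an unsupervised clustering project (placeholder IDs)
model = ClusteringModel.get("project-id", "model-id")
clusters = model.update_cluster_names(
    cluster_name_mappings=[
        ("Cluster 1", "Frequent travelers"),
        ("Cluster 2", "Occasional travelers"),
    ]
)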
class datarobot.models.cluster_insight.ClusterInsight¶
Holds data on all insights related to a feature, as well as a breakdown per cluster.
Parameters:
feature_name (str) – Name of a feature from the dataset.
feature_type (str) – Type of feature.
insights (List[ClusterInsight]) – List provides information regarding the importance of a specific feature in relation
to each cluster. Results help understand how the model is grouping data and what each
cluster represents.
feature_impact (float) – Impact of a feature ranging from 0 to 1.
Starts creation of cluster insights for the model and, if successful, returns the computed
ClusterInsights. This method allows the calculation to continue for a specified time and,
if it is not complete by then, cancels the request.
Parameters:
project_id (str) – ID of the project to begin creation of cluster insights for.
model_id (str) – ID of the project model to begin creation of cluster insights for.
max_wait (int) – Maximum number of seconds to wait before canceling the request.
Return type:List[ClusterInsight]
Raises:
ClientError – Server rejected creation due to client error.
Most likely cause is bad project_id or model_id.
AsyncFailureError – If any of the responses from the server are unexpected.
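Example (a minimal sketch; it assumes the classmethod is named compute, and the IDs are placeholders):
from datarobot.models.cluster_insight import ClusterInsight
insights = ClusterInsight.compute(
    project_id="project-id",  # placeholder
    model_id="model-id",      # placeholder
    max_wait=600,             # seconds to wait before the request is canceled
)
for insight in insights:
    print(insight.feature_name, insight.feature_impact)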
The pareto front reflects the tradeoffs between error and complexity for a particular model. The
solutions reflect possible Eureqa models that are different levels of complexity. By default,
only one solution will have a corresponding model, but models can be created for each solution.
Variables:
project_id (str) – the ID of the project the model belongs to
error_metric (str) – Eureqa error-metric identifier used to compute error metrics for this search. Note that
Eureqa error metrics do NOT correspond 1:1 with DataRobot error metrics – the available
metrics are not the same, and are computed from a subset of the training data rather than
from the validation data.
hyperparameters (dict) – Hyperparameters used by this run of the Eureqa blueprint
target_type (str) – Indicates what kind of modeling is being done in this project, either ‘Regression’,
‘Binary’ (Binary classification), or ‘Multiclass’ (Multiclass classification).
solutions (list(Solution)) – Solutions that Eureqa has found to model this data.
Some solutions will have greater accuracy. Others will have slightly
less accuracy but will use simpler expressions.
A solution represents a possible Eureqa model; however, not all solutions
have models associated with them. A solution must have a model created before
it can be used to make predictions, etc.
Variables:
eureqa_solution_id (str) – ID of this Solution
complexity (int) – Complexity score for this solution. Complexity score is a function
of the mathematical operators used in the current solution.
The Complexity calculation can be tuned via model hyperparameters.
error (float or None) – Error for the current solution, as computed by Eureqa using the
‘error_metric’ error metric. It will be None if the model refit an existing solution.
expression (str) – Eureqa model equation string.
expression_annotated (str) – Eureqa model equation string with variable names tagged for easy identification.
best_model (bool) – True if the model is determined to be the best.
class datarobot.models.advanced_tuning.AdvancedTuningSession¶
A session enabling users to configure and run advanced tuning for a model.
Every model contains a set of one or more tasks. Every task contains a set of
zero or more parameters. This class allows tuning the values of each parameter
on each task of a model, before running that model.
This session is client-side only and is not persistent.
Only the final model, constructed when run is called, is persisted on the DataRobot server.
Variables:description (str) – Description for the new advanced-tuned model.
Defaults to the same description as the base model.
The caller must supply enough of the optional arguments to this function
to uniquely identify the parameter that is being set.
For example, a less-common parameter name such as
‘building_block__complementary_error_function’ might only be used once (if at all)
by a single task in a model, in which case it may be sufficient to specify only
‘parameter_name’. But a more common name such as ‘random_seed’ might be used by
several of the model’s tasks, and it may be necessary to also specify ‘task_name’
to clarify which task’s random seed is to be set.
This function only affects client-side state. It will not check that the new parameter
value(s) are valid.
Parameters:
task_name (str) – Name of the task whose parameter needs to be set
parameter_name (str) – Name of the parameter to set
parameter_id (str) – ID of the parameter to set
value (int, float, list, or str) – New value for the parameter, with legal values determined by the parameter being set
Raises:
NoParametersFoundException – if no matching parameters are found.
NonUniqueParametersException – if multiple parameters matched the specified filtering criteria
Returns the set of parameters available to this model
The returned parameters have one additional key, “value”, reflecting any new values that
have been set in this AdvancedTuningSession. When the session is run, “value” will be used,
or if it is unset, “current_value”.
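Example of a tuning session started from an existing model (a minimal sketch; the IDs, parameter name, and value are placeholders):
import datarobot as dr
model = dr.Model.get("project-id", "model-id")  # placeholder IDs
tune = model.start_advanced_tuning_session()
tune.description = "Tuned random seed"          # optional description for the new model
print(tune.get_parameters())                    # inspect what can be tuned
tune.set_parameter(parameter_name="random_seed", value=42)  # client-side only until run
# Persist the advanced-tuned model on the server by running the session
model_job = tune.run()
tuned_model = model_job.get_result_when_complete()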
For multiclass projects with many unique values in the target column, you can
specify parameters for aggregating rare values to improve modeling
performance and decrease the runtime and resource usage of the resulting models.
class datarobot.helpers.ClassMappingAggregationSettings¶
Class mapping aggregation settings.
For multiclass projects, allows fine control over which target values will be
preserved as classes. Classes which aren’t preserved will be either
aggregated into a single “catch everything else” class (multiclass)
or ignored (multilabel).
All attributes are optional; if not specified, server-side defaults will be used.
Variables:
max_unaggregated_class_values (Optional[int]) – Maximum number of unique values allowed before aggregation kicks in.
min_class_support (Optional[int]) – Minimum number of instances necessary for each target value in the dataset.
All values with fewer instances will be aggregated.
excluded_from_aggregation (Optional[List]) – List of target values that are guaranteed to be kept as is,
regardless of other settings.
aggregation_class_name (Optional[str]) – If some of the values are aggregated, this is the name of the aggregation class
that will replace them.
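Example (a minimal sketch; the values are illustrative, and passing the settings via the project's target-setting call and its class_mapping_aggregation_settings keyword is an assumption that may vary by client version):
import datarobot as dr
from datarobot.helpers import ClassMappingAggregationSettings
settings = ClassMappingAggregationSettings(
    max_unaggregated_class_values=50,     # keep at most 50 distinct classes
    min_class_support=20,                 # classes with fewer than 20 rows are aggregated
    excluded_from_aggregation=["fraud"],  # always keep this class as-is
    aggregation_class_name="OTHER",
)
# Assumed usage: supply the settings when configuring the multiclass target
project = dr.Project.get("project-id")  # placeholder ID
project.set_target(target="category", class_mapping_aggregation_settings=settings)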
Parameters:params (dict or None) – Query parameters to be added to request to get results.
Notes
For featureEffects, the source param is required to define the source;
otherwise the default is training.
Returns:result –
Return type depends on the job type:
- for model jobs, a Model is returned
- for predict jobs, a pandas.DataFrame (with predictions) is returned
- for featureImpact jobs, a list of dicts by default (see the with_metadata parameter of the FeatureImpactJob class and its get() method)
- for primeRulesets jobs, a list of Rulesets
- for primeModel jobs, a PrimeModel
- for primeDownloadValidation jobs, a PrimeFile
- for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
- for predictionExplanations jobs, a PredictionExplanations
- for featureEffects jobs, a FeatureEffects
* Return type:object
* Raises:
* JobNotFinished – If the job is not finished, the result is not available.
* AsyncProcessUnsuccessfulError – If the job errored or was aborted
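Example of fetching a queued job and retrieving its result, including the source query param noted above for featureEffects jobs (a minimal sketch; the IDs are placeholders):
import datarobot as dr
job = dr.Job.get("project-id", "42")    # placeholder project ID and job ID
job.wait_for_completion(max_wait=600)   # block until the job reaches a terminal state
# For featureEffects jobs, the source can be passed via query params (defaults to training)
result = job.get_result(params={"source": "validation"})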
environment_id (Optional[str]) – The environment ID to use for job runs.
The ID must be specified in order to run the job.
environment_version_id (Optional[str]) – The environment version ID to use for job runs.
If not specified, the latest version of the execution environment will be used.
folder_path (Optional[str]) – The path to a folder containing files to be uploaded.
Each file in the folder is uploaded under a path relative to the folder path.
files (Optional[Union[List[Tuple[str, str]], List[str]]]) – The files to be uploaded to the job.
The files can be defined in 2 ways:
1. List of tuples where the 1st element is the local path of the file to be uploaded and the 2nd element is the file path in the job file system.
2. List of local paths of the files to be uploaded. In this case files are added to the root of the model file system.
file_data (Optional[Dict[str, str]]) – The file contents to be uploaded to the job.
Defined as a dictionary where keys are the file paths in the job file system
and values are the file contents.
runtime_parameter_values (Optional[List[RuntimeParameterValue]]) – Additional parameters to be injected into a model at runtime. The fieldName
must match a fieldName that is listed in the runtimeParameterDefinitions section
of the model-metadata.yaml file.
entry_point (Optional[str]) – The job file item ID to use as an entry point of the job.
environment_id (Optional[str]) – The environment ID to use for job runs.
Must be specified in order to run the job.
environment_version_id (Optional[str]) – The environment version ID to use for job runs.
If not specified, the latest version of the execution environment will be used.
description (str) – The job description.
folder_path (Optional[str]) – The path to a folder containing files to be uploaded.
Each file in the folder is uploaded under a path relative to the folder path.
files (Optional[Union[List[Tuple[str, str]], List[str]]]) – The files to be uploaded to the job.
The files can be defined in 2 ways:
1. List of tuples where the 1st element is the local path of the file to be uploaded and the 2nd element is the file path in the job file system.
2. List of local paths of the files to be uploaded. In this case files are added to the root of the job file system.
file_data (Optional[Dict[str, str]]) – The file contents to be uploaded to the job.
Defined as a dictionary where keys are the file paths in the job file system
and values are the file contents.
runtime_parameter_values (Optional[List[RuntimeParameterValue]]) – Additional parameters to be injected into a model at runtime. The fieldName
must match a fieldName that is listed in the runtimeParameterDefinitions section
of the model-metadata.yaml file.
max_wait (Optional[int]) – Maximum time to wait for a terminal status (“succeeded”, “failed”, “interrupted”, “canceled”).
If set to None, the method returns without waiting.
runtime_parameter_values (Optional[List[RuntimeParameterValue]]) – Additional parameters to be injected into a model at runtime. The fieldName
must match a fieldName that is listed in the runtimeParameterDefinitions section
of the model-metadata.yaml file.
class datarobot.models.missing_report.MissingValuesReport¶
Missing values report for a model; contains a list of per-feature reports sorted by missing
count in descending order.
Notes
Report per feature contains:
feature : feature name.
type : feature type – ‘Numeric’ or ‘Categorical’.
missing_count : missing values count in training data.
missing_percentage : missing values percentage in training data.
tasks : list of information for each task that was applied to the feature.
task information contains:
id : the number of the task in the blueprint diagram.
name : task name.
descriptions : human readable aggregated information about how the task handles
missing values. The following descriptions may be present: what value is imputed for
missing values, whether the feature being missing is treated as a feature by the task,
whether missing values are treated as infrequent values,
whether infrequent values are treated as missing values,
and whether missing values are ignored.
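Example of retrieving the report and iterating the per-feature entries (a minimal sketch; the IDs are placeholders, and the attribute names are assumed to follow the per-feature report fields listed above):
from datarobot.models.missing_report import MissingValuesReport
report = MissingValuesReport.get(project_id="project-id", model_id="model-id")  # placeholder IDs
for feature_report in report:
    # Each entry describes one feature, sorted by missing count in descending order
    print(feature_report.feature, feature_report.missing_count, feature_report.missing_percentage)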
limit (Optional[int]) – Maximum number of registered models to return
offset (Optional[int]) – Number of registered models to skip before returning results
sort_key (RegisteredModelSortKey, optional) – Key to order result by
sort_direction (RegisteredModelSortDirection, optional) – Sort direction
search (Optional[str]) – A term to search for in registered model name, description, or target name
filters (RegisteredModelListFilters, optional) – An object containing all filters that you’d like to apply to the
resulting list of registered models.
Returns:registered_models – A list of registered models the user can view.
Return type:List[RegisteredModel]
Examples
from datarobot import RegisteredModel
registered_models = RegisteredModel.list()
>>> [RegisteredModel('My Registered Model'), RegisteredModel('My Other Registered Model')]
from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection
filters = RegisteredModelListFilters(target_type='Regression')
registered_models = RegisteredModel.list(
    filters=filters,
    sort_key=RegisteredModelSortKey.NAME.value,
    sort_direction=RegisteredModelSortDirection.DESC.value,
    search='other',
)
>>> [RegisteredModel('My Other Registered Model')]
from datarobot import RegisteredModel
registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
registered_model_version = registered_model.get_version('5c939e08962d741e34f609f0')
>>> RegisteredModelVersion('My Registered Model Version')
filters (Optional[RegisteredModelVersionsListFilters]) – A RegisteredModelVersionsListFilters instance used to filter the list of registered model versions returned.
search (Optional[str]) – A search string used to filter the list of registered model versions returned.
sort_key (Optional[RegisteredModelVersionSortKey]) – The key to use to sort the list of registered model versions returned.
sort_direction (Optional[RegisteredModelSortDirection]) – The direction to use to sort the list of registered model versions returned.
limit (Optional[int]) – The maximum number of registered model versions to return. Default is 100.
offset (Optional[int]) – The number of registered model versions to skip over. Default is 0.
Returns:registered_model_versions – A list of registered model version objects.
Return type:List[RegisteredModelVersion]
Examples
from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelVersionsListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection
registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
filters = RegisteredModelVersionsListFilters(tags=['tag1', 'tag2'])
registered_model_versions = registered_model.list_versions(filters=filters)
>>> [RegisteredModelVersion('My Registered Model Version')]
id (str) – The ID of the registered model version.
registered_model_id (str) – The ID of the parent registered model.
registered_model_version (int) – The version of the registered model.
name (str) – The name of the registered model version.
model_id (str) – The ID of the model.
model_execution_type (str) – Type of model package (version). dedicated (native DataRobot models) and
custom_inference_model (user-added inference models) both execute on DataRobot
prediction servers; external models do not.
is_archived (bool) – Whether the model package (version) is permanently archived (cannot be used in deployment or
replacement).
* import_meta (ImportMeta) – Information from when this Model Package (version) was first saved.
* source_meta (SourceMeta) – Meta information from where this model was generated
* model_kind (ModelKind) – Model attribute information.
* target (Target) – Target information for the registered model version.
* model_description (ModelDescription) – Model description information.
* datasets (Dataset) – Dataset information for the registered model version.
* timeseries (Timeseries) – Timeseries information for the registered model version.
* bias_and_fairness (BiasAndFairness) – Bias and fairness information for the registered model version.
* is_deprecated (bool) – Whether the model package (version) is deprecated (cannot be used in deployment or
replacement).
* permissions (List[str]) – Permissions for the registered model version.
* active_deployment_count (int or None) – Number of the active deployments associated with the registered model version.
* build_status (str or None) – Model package (version) build status. One of complete, inProgress, failed.
* user_provided_id (str or None) – User provided ID for the registered model version.
* updated_at (str or None) – The time the registered model version was last updated.
* updated_by (UserMetadata or None) – The user who last updated the registered model version.
* tags (List[TagWithId] or None) – The tags associated with the registered model version.
* mlpkg_file_contents (str or None) – The contents of the model package file.
name (str or None) – Name of the version (model package).
prediction_threshold (float or None) – Threshold used for binary classification in predictions.
distribution_prediction_model_id (str or None) – ID of the DataRobot distribution prediction model
trained on predictions from the DataRobot model.
description (str or None) – Description of the version (model package).
compute_all_ts_intervals (bool or None) – Whether to compute all time series prediction intervals (1-100 percentiles).
registered_model_name (Optional[str]) – Name of the new registered model that will be created from this model package (version).
The model package (version) will be created as version 1 of the created registered model.
If neither registeredModelName nor registeredModelId is provided,
it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
registered_model_id (Optional[str]) – Creates a model package (version) as a new version for the provided registered model ID.
Mutually exclusive with registeredModelName.
tags (Optional[List[Tag]]) – Tags for the registered model version.
registered_model_tags (Optional[List[Tag]]) – Tags for the registered model.
registered_model_description (Optional[str]) – Description for the registered model.
Returns:registered_model_version – A new registered model version object.
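Example of registering a Leaderboard model (a minimal sketch; the classmethod name create_for_leaderboard_item and its import path are assumed by analogy with the create_for_external and create_for_custom_model_version methods described below, and the IDs are placeholders):
from datarobot.models.model_registry import RegisteredModelVersion
# Assumed classmethod name; registers a Leaderboard model as version 1 of a new registered model
version = RegisteredModelVersion.create_for_leaderboard_item(
    model_id="5c939e08962d741e34f609f1",           # placeholder Leaderboard model ID
    name="My model package",
    registered_model_name="My registered model",   # mutually exclusive with registered_model_id
)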
Create a new registered model version from an external model.
Parameters:
name (str) – Name of the registered model version.
target (ExternalTarget) – Target information for the registered model version.
model_id (Optional[str]) – Model ID of the registered model version.
model_description (Optional[ModelDescription]) – Information about the model.
datasets (Optional[ExternalDatasets]) – Dataset information for the registered model version.
timeseries (Optional[Timeseries]) – Timeseries properties for the registered model version.
registered_model_name (Optional[str]) – Name of the new registered model that will be created from this model package (version).
The model package (version) will be created as version 1 of the created registered model.
If neither registeredModelName nor registeredModelId is provided,
it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
registered_model_id (Optional[str]) – Creates a model package (version) as a new version for the provided registered model ID.
Mutually exclusive with registeredModelName.
tags (Optional[List[Tag]]) – Tags for the registered model version.
registered_model_tags (Optional[List[Tag]]) – Tags for the registered model.
registered_model_description (Optional[str]) – Description for the registered model.
geospatial_monitoring (Optional[ExternalGeospatialMonitoring]) – Geospatial monitoring settings for the registered model version.
Returns:registered_model_version – A new registered model version object.
Create a new registered model version from a custom model version.
Parameters:
custom_model_version_id (str) – ID of the custom model version.
name (Optional[str]) – Name of the registered model version.
description (Optional[str]) – Description of the registered model version.
registered_model_name (Optional[str]) – Name of the new registered model that will be created from this model package (version).
The model package (version) will be created as version 1 of the created registered model.
If neither registeredModelName nor registeredModelId is provided,
it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
registered_model_id (Optional[str]) – Creates a model package (version) as a new version for the provided registered model ID.
Mutually exclusive with registeredModelName.
tags (Optional[List[Tag]]) – Tags for the registered model version.
registered_model_tags (Optional[List[Tag]]) – Tags for the registered model.
registered_model_description (Optional[str]) – Description for the registered model.
Returns:registered_model_version – A new registered model version object.
Returns:deployments – A list of deployments associated with this registered model version.
Return type:List[VersionAssociatedDeployment]
class datarobot.models.model_registry.deployment.VersionAssociatedDeployment¶
Represents a deployment associated with a registered model version.
Parameters:
id (str) – The ID of the deployment.
currently_deployed (bool) – Whether this version is currently deployed.
registered_model_version (int) – The version of the registered model associated with this deployment.
is_challenger (bool) – Whether the version associated with this deployment is a challenger.
status (str) – The status of the deployment.
label (Optional[str]) – The label of the deployment.
first_deployed_at (datetime.datetime, optional) – The time the version was first deployed.
first_deployed_by (UserMetadata, optional) – The user who first deployed the version.
created_by (UserMetadata, optional) – The user who created the deployment.
prediction_environment (DeploymentPredictionEnvironment, optional) – The prediction environment of the deployment.
class datarobot.models.model_registry.RegisteredModelVersionsListFilters¶
Filters for listing of registered model versions.
Parameters:
target_name (str or None) – Name of the target to filter by.
target_type (str or None) – Type of the target to filter by.
compatible_with_leaderboard_model_id (str or None) – If specified, limit results to versions (model packages) of the Leaderboard model with the specified ID.
compatible_with_model_package_id (str or None) – Returns versions compatible with the given model package (version) ID. If used, it will only return versions
that match target.name, target.type, target.classNames (for classification models),
modelKind.isTimeSeries and modelKind.isMultiseries for the specified model package (version).
for_challenger (bool or None) – Can be used with compatibleWithModelPackageId to request similar versions that can be used as challenger
models; for external model packages (versions), instead of returning similar external model packages (versions),
similar DataRobot and Custom model packages (versions) will be retrieved.
prediction_threshold (float or None) – Return versions with the specified prediction threshold used for binary classification models.
imported (bool or None) – If specified, return either imported (true) or non-imported (false) versions (model packages).
prediction_environment_id (str or None) – Can be used to filter versions (model packages) by what is supported by the prediction environment
model_kind (str or None) – Can be used to filter versions (model packages) by model kind.
build_status (str or None) – If specified, filter versions by the build status.
class datarobot.models.model_registry.RegisteredModelListFilters¶
Filters for listing registered models.
Parameters:
created_at_start (datetime.datetime) – Registered models created on or after this timestamp.
created_at_end (datetime.datetime) – Registered models created before this timestamp. Defaults to the current time.
modified_at_start (datetime.datetime) – Registered models modified on or after this timestamp.
modified_at_end (datetime.datetime) – Registered models modified before this timestamp. Defaults to the current time.
target_name (str) – Name of the target to filter by.
target_type (str) – Type of the target to filter by.
created_by (str) – Email of the user that created registered model to filter by.
compatible_with_leaderboard_model_id (str) – If specified, limit results to registered models containing versions (model packages)
for the leaderboard model with the specified ID.
compatible_with_model_package_id (str) – Return registered models that have versions (model packages) compatible with given model package (version) ID.
If used, will only return registered models which have versions that match target.name, target.type,
target.classNames (for classification models), modelKind.isTimeSeries, and modelKind.isMultiseries
of the specified model package (version).
for_challenger (bool) – Can be used with compatibleWithModelPackageId to request similar registered models that contain
versions (model packages) that can be used as challenger models; for external model packages (versions),
instead of returning similar external model packages (versions), similar DataRobot and Custom model packages
will be retrieved.
prediction_threshold (float) – If specified, return any registered models containing one or more versions matching the prediction
threshold used for binary classification models.
imported (bool) – If specified, return any registered models that contain either imported (true) or non-imported (false)
versions (model packages).
prediction_environment_id (str) – Can be used to filter registered models by what is supported by the prediction environment.
model_kind (str) – Return models that contain versions matching a specific format.
build_status (str) – If specified, only return models that have versions with specified build status.