# Create custom tasks

> Create custom tasks - Describes how to create and apply custom tasks and work with the resulting
> custom blueprints.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.602243+00:00` (UTC).

## Primary page

- [Create custom tasks](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html): Full documentation for this topic (HTML).

## Sections on this page

- [Understand custom tasks](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#understand-custom-tasks): In-page section heading.
- [Components of a custom task](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#components-of-a-custom-task): In-page section heading.
- [Task types](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#task-types): In-page section heading.
- [Use a custom task](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#use-a-custom-task): In-page section heading.
- [Understand task content](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#understand-task-content): In-page section heading.
- [custom.py/custom.R](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#custompy-customr): In-page section heading.
- [model-metadata.yaml](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#model-metadatayaml): In-page section heading.
- [requirements.txt](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#requirementstxt): In-page section heading.
- [Define task code](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#define-task-code): In-page section heading.
- [init()](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#init): In-page section heading.
- [init() example](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#init-model): In-page section heading.
- [init() input](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#init-input): In-page section heading.
- [init() output](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#init-output): In-page section heading.
- [fit()](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#fit): In-page section heading.
- [fit() examples](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#fit-examples): In-page section heading.
- [How fit() works](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#how-fit-works): In-page section heading.
- [How to use fit()](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#how-to-use-fit): In-page section heading.
- [fit() input parameters](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#fit-input-parameters): In-page section heading.
- [fit() output](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#fit-output): In-page section heading.
- [load_model()](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#load-model): In-page section heading.
- [load_model() example](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#load-model-example): In-page section heading.
- [load_model() input](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#load-model-input): In-page section heading.
- [load_model() output](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#load-model-output): In-page section heading.
- [predict()](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict): In-page section heading.
- [predict() examples](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict-examples): In-page section heading.
- [predict() input](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict-input): In-page section heading.
- [predict() output](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict-output): In-page section heading.
- [predict_proba()](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict-proba): In-page section heading.
- [predict_proba() examples](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict-proba-examples): In-page section heading.
- [predict_proba() input](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict-proba-input): In-page section heading.
- [predict_proba() output](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#predict-proba-output): In-page section heading.
- [transform()](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#transform): In-page section heading.
- [transform() example](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#transform-example): In-page section heading.
- [transform() input](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#transform-input): In-page section heading.
- [transform() output](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#transform-output): In-page section heading.
- [Define task metadata](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#define-task-metadata): In-page section heading.
- [Define the task environment](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#define-the-task-environment): In-page section heading.
- [Test the task locally](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#test-the-task-locally): In-page section heading.
- [Prerequisites](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#prerequisites): In-page section heading.
- [Test compatibility with DataRobot](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#test-compatibility-with-datarobot): In-page section heading.
- [Test task logic](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#test-task-logic): In-page section heading.
- [Upload the task](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#upload-the-task): In-page section heading.
- [Updating code](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#updating-code): In-page section heading.
- [Compose and train a blueprint](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#compose-and-train-a-blueprint): In-page section heading.
- [Single-task blueprint](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#single-task-blueprint): In-page section heading.
- [Multitask blueprint](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#multitask-blueprint): In-page section heading.
- [Get insights](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#get-insights): In-page section heading.
- [Built-in insights](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#built-in-insights): In-page section heading.
- [Custom insights](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#custom-insights): In-page section heading.
- [Deploy](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#deploy): In-page section heading.
- [Download training artifacts](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#download-training-artifacts): In-page section heading.
- [Implicit sharing](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#implicit-sharing): In-page section heading.

## Related documentation

- [Classic UI documentation](https://docs.datarobot.com/en/docs/classic-ui/index.html): Linked from this page.
- [Modeling](https://docs.datarobot.com/en/docs/classic-ui/modeling/index.html): Linked from this page.
- [Specialized workflows](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/index.html): Linked from this page.
- [Composable ML](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/index.html): Linked from this page.
- [here](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-overview.html#how-it-works): Linked from this page.
- [container environment](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-env.html): Linked from this page.
- [how these tasks work](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-blueprint-edit.html#blueprint-task-types): Linked from this page.
- [Create and train a blueprint](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-quickstart.html#apply-new-task-and-train): Linked from this page.
- [Share a task](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/custom-models/custom-model-workshop/custom-model-actions.html): Linked from this page.
- [here](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/cml-ref/cml-validation.html): Linked from this page.
- [Install DRUM](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-drum.html): Linked from this page.
- [deployed, monitored, and governed](https://docs.datarobot.com/en/docs/api/dev-learning/python/mlops/index.html): Linked from this page.

## Documentation content

# Create custom tasks

While DataRobot provides hundreds of built-in tasks, there are situations where you need preprocessing or modeling methods that are not currently supported out-of-the-box. To fill this gap, you can bring a custom task that implements a missing method, plug that task into a blueprint inside DataRobot, and then train, evaluate, and deploy that blueprint in the same way as you would for any DataRobot-generated blueprint. (You can review how the process works [here](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-overview.html#how-it-works).)

The following sections describe creating and applying custom tasks and working with the resulting custom blueprints.

## Understand custom tasks

The following sections explain, at a general level, what a task is and how to use it, followed by an overview of task content.

### Components of a custom task

To bring and use a task, you need to define two components—the task’s content and a container environment where the task’s content will run:

- The task content (described on this page) is code written in Python or R. To be correctly parsed by DataRobot, the code must follow certain criteria. (Optional) You can add files that will be uploaded and used together with the task’s code (for example, you might want to add a separate file with a dictionary if your custom task contains text preprocessing).
- The container environment is defined using a Docker file and additional files that allow DataRobot to build an image where the task will run. There are a variety of built-in environments; users only need to build their own environment when they need to install Linux packages.

At a high level, the steps to define a custom task include:

1. Define and test task content locally (i.e., on your computer).
2. (Optional) Create a container environment where the task will run.
3. Upload the task content and environment (if applicable) into DataRobot.

### Task types

When creating a task, you must choose the one most appropriate for your project. DataRobot leverages two types of tasks—estimators and transforms—similar to sklearn. See the blueprint modification page to learn [how these tasks work](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-blueprint-edit.html#blueprint-task-types).

### Use a custom task

Once a task is uploaded, you can:

- Create and train a blueprint that contains the custom task. The blueprint then appears on the project’s Leaderboard and can be used just like any other blueprint—in just a few clicks, you can compare it with other models, access model-agnostic insights, and deploy, monitor, and govern the resulting model.
- Share a task explicitly within your organization in the same way you share an environment. This can be particularly useful when you want to re-use the task in a future project. Additionally, because recipients don’t need to read and understand the task's code in order to use it, less technical colleagues can apply it. Custom tasks are also implicitly shared when a project or blueprint is shared.

### Understand task content

To define a custom task, create a local folder containing the files listed in the table below (detailed descriptions follow the table).

> [!TIP]
> You can find examples of these files in the [DataRobot task template repository](https://github.com/datarobot/datarobot-user-models/tree/master/task_templates) on GitHub.

| File | Description | Required |
| --- | --- | --- |
| custom.py or custom.R | The task code that DataRobot will run in training and predictions. | Yes |
| model-metadata.yaml | A file describing the task's metadata, including input/output data requirements. | Required for custom transform tasks that output non-numeric data; otherwise, a default schema is used. |
| requirements.txt | A list of Python or R packages to add to the base environment. | No |
| Additional files | Other files used by the task (for example, a file that defines helper functions used inside custom.py). | No |

#### custom.py/custom.R

The `custom.py` / `custom.R` file defines a custom task. It must contain the methods (functions) that enable DataRobot to correctly run the code and integrate it with other capabilities.

#### model-metadata.yaml

For a custom task, you can supply a schema that can then be used to validate the task when building and training a blueprint. A schema lets you specify whether a custom task supports or outputs:

- Certain data types
- Missing values
- Sparse data
- A certain number of columns
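
For illustration, a transform task's schema might look like the following sketch. The field names follow the task templates in the `datarobot-user-models` repository and may vary by version; treat them as assumptions and consult the validation reference for the authoritative schema:

```yaml
# Illustrative only -- verify field names against the validation reference.
name: my_missing_values_transform
type: training
targetType: transform
typeSchema:
  input_requirements:
    # Accept numeric and categorical columns...
    - field: data_types
      condition: IN
      value:
        - NUM
        - CAT
  output_requirements:
    # ...and declare that the task outputs numeric data.
    - field: data_types
      condition: EQUALS
      value: NUM
```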

#### requirements.txt

Use the `requirements.txt` file to pre-install Python or R packages that the custom task uses but that are not part of the base environment.

**Python example:**
For Python, provide a list of packages with their versions (1 package per row). For example:

```
numpy>=1.16.0, <1.19.0
pandas==1.1.0
scikit-learn==0.23.1
lightgbm==3.0.0
gensim==3.8.3
sagemaker-scikit-learn-extension==1.1.0
```

**R example:**
For R, provide a list of packages without versions (1 package per row). For example:

```
dplyr
stats
```


## Define task code

To define a custom task using DataRobot’s framework, your code must meet certain criteria:

- It must have a `custom.py` or `custom.R` file.
- The `custom.py`/`custom.R` file must have methods, such as `fit()`, `score()`, or `transform()`, that define how a task is trained and how it scores new data. These are provided as interface classes or hooks. DataRobot automatically calls each one and passes the parameters based on the project and blueprint configuration. However, you have full flexibility to define the logic that runs inside each method.

View [an example on GitHub](https://github.com/datarobot/datarobot-user-models/blob/master/task_templates/1_transforms/1_python_missing_values/custom.py) of a task implementing missing values imputation using a median.

> [!NOTE]
> Log in to GitHub before accessing these GitHub resources.

The following table lists the available methods. Most tasks only require the `fit()` method; in addition, classification tasks (binary or multiclass) must have `predict_proba()`, regression tasks require `predict()`, and transforms must have `transform()`. Other methods can be omitted.

| Method | Purpose |
| --- | --- |
| init() | Load R libraries and files (R only, can be omitted for Python). |
| fit() | Train an estimator/transform task and store it in an artifact file. |
| load_model() | Load the trained estimator/transform from the artifact file. |
| predict() or predict_proba() (for the hook, use score()) | Define the logic used by a custom estimator to generate predictions. |
| transform() | Define the logic used by a custom transform to generate transformed data. |

The schema below illustrates how methods work together in a custom task. In some cases, some methods can be omitted, although `fit()` is always required during training.

The following sections describe each function, with examples.
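
To preview how these methods fit together, here is a minimal `custom.py` sketch for a regression estimator, assuming scikit-learn is available in the environment (the artifact name and the use of `LinearRegression` are illustrative):

```python
import pickle
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LinearRegression


def fit(X: pd.DataFrame, y: pd.Series, output_dir: str, **kwargs):
    # Train the estimator and store it as a pickle artifact in output_dir.
    estimator = LinearRegression()
    estimator.fit(X, y)
    with open(Path(output_dir) / "artifact.pkl", "wb") as fp:
        pickle.dump(estimator, fp)


def score(data: pd.DataFrame, model, **kwargs) -> pd.DataFrame:
    # Regression output: a single numeric column named "Predictions".
    return pd.DataFrame({"Predictions": model.predict(data)})
```

Because the artifact is a single `.pkl` file, no custom `load_model()` is needed in this sketch.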

### init()

The `init` method allows the task to load libraries and additional files for use in other methods. It is required when using R but can typically be skipped with Python.

#### init() example

The following provides a brief code snippet using `init()`; see a more complete example [here](https://github.com/datarobot/datarobot-user-models/blob/master/task_templates/2_estimators/5_r_binary_classification/custom.R).

**R example:**
```
init <- function(code_dir) {
   library(tidyverse)
   library(caret)
   library(recipes)
   library(gbm)
   source(file.path(code_dir, 'create_pipeline.R'))
}
```


#### init() input

| Input parameter | Description |
| --- | --- |
| code_dir | The path to the folder where the code is stored. |

#### init() output

The `init()` method does not return anything.

### fit()

`fit()` must be implemented for any custom task.

#### fit() examples

The following provides a brief code snippet using `fit()`; see a more complete example [here](https://github.com/datarobot/datarobot-user-models/blob/master/task_templates/2_estimators/4_python_binary_classification/custom.py).

**Python example:**
The following is a Python example of `fit()` implementing Logistic Regression:

```
from pathlib import Path
import pickle

from sklearn.linear_model import LogisticRegression


def fit(X, y, output_dir, class_order, row_weights):
    # Train the estimator on the data DataRobot passes in.
    estimator = LogisticRegression()
    estimator.fit(X, y)

    # Save the trained estimator as a pickle artifact in output_dir.
    output_dir_path = Path(output_dir)
    if output_dir_path.exists() and output_dir_path.is_dir():
        with open("{}/artifact.pkl".format(output_dir), "wb") as fp:
            pickle.dump(estimator, fp)
```

**R example:**
The following is an example of R creating a regression model:

```
fit <- function(X, y, output_dir, class_order = NULL, row_weights = NULL) {
   model <- create_pipeline(X, y, 'regression')

   model_path <- file.path(output_dir, 'artifact.rds')
   saveRDS(model, file = model_path)
}
```


#### How fit() works

DataRobot runs `fit()` when a custom estimator/transform is being trained. It creates an artifact file (e.g., a `.pkl` file) where the trained object, such as a trained sklearn model, is stored. The trained object is loaded from the artifact and then passed as a parameter to `score()` and `transform()` when scoring data.

#### How to use fit()

To use `fit()`, train an object and save it to an artifact file (e.g., `.pkl`) inside the function. The trained object must contain the information or logic needed to score new data. Some examples of trained objects:

- A fitted sklearn estimator.
- The median of the training data, for missing-value imputation; when scoring new data, it is used to replace missing values.

DataRobot automatically uses training/validation/holdout partitions based on project settings.

#### fit() input parameters

The `fit()` task takes the following parameters:

| Input parameters | Description |
| --- | --- |
| X | A pandas DataFrame (Python) or R data.frame (R) containing the data the task receives during training. |
| y | A pandas Series (Python) or R vector/factor (R) containing the project's target data. |
| output_dir | A path to the output folder. The artifact containing the trained object must be saved to this folder. You can also save other files there; once the blueprint is trained, all files added to this folder during `fit()` can be downloaded via the UI using Artifact Download. |
| class_order | Only passed for a binary classification estimator. A list containing the names of the classes: the first entry is the class considered negative inside the DataRobot project; the second is the class considered positive. |
| row_weights | Only passed in estimator tasks. A list of weights, passed when the project uses weights or smart downsampling. |
| `**kwargs` | Not currently used but maintained for future compatibility. |

#### fit() output

Notes on `fit()` output:

- `fit()` does not return anything, but it creates an artifact containing the trained object.
- When no trained object is required (for example, a transform task implementing a log transformation), create an "artificial" artifact by storing a number or a string in an artifact file. Otherwise (if `fit()` doesn't output an artifact), you must use `load_model()`, which makes the task more complex.
- The artifact must be saved into the `output_dir` folder.
- The artifact can use any format, although some formats are natively supported. When `output_dir` contains exactly one artifact file in a natively supported format, DataRobot automatically picks that artifact when scoring/transforming data; this way, you do not need to write a custom `load_model()` method.
- Natively supported formats include `.pkl`, `.pth`, `.h5`, and `.joblib` (Python); `.mojo` (Java); and `.rds` (R).
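
As a sketch of the "artificial artifact" pattern, a stateless transform's `fit()` can simply write a placeholder (the file name and placeholder value here are illustrative):

```python
import pickle
from pathlib import Path

import pandas as pd


def fit(X: pd.DataFrame, y, output_dir: str, **kwargs):
    # A log transform needs no trained state, so store a placeholder
    # ("artificial" artifact) to satisfy the artifact requirement.
    with open(Path(output_dir) / "artifact.pkl", "wb") as fp:
        pickle.dump("no state needed", fp)
```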

### load_model()

The `load_model()` method loads one or more trained objects from the artifact(s). It is only required when a trained object is stored in an artifact that uses an unsupported format or when multiple artifacts are used. `load_model()` is not required when there is a single artifact in one of the supported formats:

- Python: `.pkl`, `.pth`, `.h5`, `.joblib`
- Java: `.mojo`
- R: `.rds`

#### load_model() example

The following provides a brief code snippet using `load_model()`; see a more complete example [here](https://github.com/datarobot/datarobot-user-models/blob/master/task_templates/3_pipelines/14_python3_keras_joblib/custom.py).

**Python example:**
In the following example, replace `deserialize_artifact` with an actual function you use to parse the artifact:

```
def load_model(code_dir: str):
    return deserialize_artifact(code_dir)
```

**R example:**
```
load_model <- function(code_dir) {
   return(deserialize_artifact(code_dir))
}
```


#### load_model() input

| Input parameter | Description |
| --- | --- |
| code_dir | The path to the folder where the artifact is stored. |

#### load_model() output

The `load_model()` method returns a trained object (of any type).

### predict()

The `predict()` method defines how DataRobot uses the trained object from `fit()` to score new data. DataRobot runs this method when the task is used for scoring inside a blueprint. This method is only usable for regression and anomaly detection tasks. Note that for R, you instead use the `score()` hook shown in the examples below.

#### predict() examples

The following provides a brief code snippet using `predict()`; see a more complete example [here](https://github.com/datarobot/datarobot-user-models/blob/master/task_templates/2_estimators/1_python_regression/custom.py#L45).

**Python examples:**
Python example for a regression or anomaly estimator:

```
def predict(self, data: pd.DataFrame, **kwargs):
    return pd.DataFrame(data=self.estimator.predict(data), columns=["Predictions"])
```

**R examples:**
R example for a regression or anomaly estimator:

```
score <- function(data, model, ...) {
  return(data.frame(Predictions = predict(model, newdata = data, type = "response")))
}
```



#### predict() input

| Input parameter | Description |
| --- | --- |
| data | A pandas DataFrame (Python) or R data.frame (R) containing the data the custom task will score. |
| `**kwargs` | Not currently used but maintained for future compatibility. (For R, use `score(data, model, ...)`.) |

#### predict() output

Notes on `predict()` output:

- Returns a pandas DataFrame (or R data.frame/tibble).
- For regression or anomaly detection projects, the output must contain a single numeric column namedPredictions.

### predict_proba()

The `predict_proba()` method defines how DataRobot uses the trained object from `fit()` to score new data. This method is only usable for binary and multiclass tasks. DataRobot runs this method when the task is used for scoring inside a blueprint. Note that for R, you instead use the `score()` hook shown in the examples below.

#### predict_proba() examples

The following provides a brief code snippet using `predict_proba()`; see a more complete example [here](https://github.com/datarobot/datarobot-user-models/blob/master/task_templates/2_estimators/4_python_binary_classification/custom.py#L40).

**Python examples:**
Python example for a binary or multiclass estimator:

```
def predict_proba(self, data: pd.DataFrame, **kwargs) -> pd.DataFrame:
    return pd.DataFrame(
        data=self.estimator.predict_proba(data), columns=self.estimator.classes_
    )
```

**R examples:**

R example for a binary estimator:

```
score <- function(data, model, ...) {
  scores <- predict(model, data, type = "response")
  scores_df <- data.frame('c1' = scores, 'c2' = 1 - scores)
  names(scores_df) <- c("class1", "class2")
  return(scores_df)
}
```


#### predict_proba() input

| Input parameter | Description |
| --- | --- |
| data | A pandas DataFrame (Python) or R data.frame (R) containing the data the custom task will score. |
| `**kwargs` | Not currently used but maintained for future compatibility. (For R, use `score(data, model, ...)`.) |

#### predict_proba() output

Notes on `predict_proba()` output:

- Returns a pandas DataFrame (or R data.frame/tibble).
- For binary or multiclass projects, output must have one column per class, with class names used as column names. Each cell must contain the probability of the respective class, and each row must sum up to 1.0.
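
To illustrate the expected shape, the following sketch builds a valid binary output by hand (class names and probabilities are hypothetical):

```python
import pandas as pd

# One column per class, class names as column names.
proba = pd.DataFrame({"class1": [0.9, 0.2], "class2": [0.1, 0.8]})

# Every row sums to 1.0 (within floating-point tolerance).
assert ((proba.sum(axis=1) - 1.0).abs() < 1e-9).all()
```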

### transform()

The `transform()` method defines the output of a custom transform and returns transformed data. Do not use this method for estimator tasks.

#### transform() example

The following provides a brief code snippet using `transform()`; see a more complete example [here](https://github.com/datarobot/datarobot-user-models/blob/master/task_templates/1_transforms/1_python_missing_values/custom.py).

**Python example:**
A Python example that applies a transform and outputs a DataFrame:

```
def transform(X: pd.DataFrame, transformer) -> pd.DataFrame:
    return transformer.transform(X)
```

**R example:**
```
transform <- function(X, transformer, ...) {
   # Replace missing values in each column with the stored training median.
   X_median <- transformer

   for (i in 1:ncol(X)) {
      X[is.na(X[, i]), i] <- X_median[i]
   }
   X
}
```


#### transform() input

| Input parameter | Description |
| --- | --- |
| X | A pandas DataFrame (Python) or R data.frame (R) containing data the custom task should transform. |
| transformer | A trained object loaded from the artifact (typically, a trained transformer). |
| `**kwargs` | Not currently used but maintained for future compatibility. |

#### transform() output

The `transform()` method returns a pandas DataFrame or R data.frame with transformed data.

## Define task metadata

To define metadata, create a `model-metadata.yaml` file and put it in the top level of the task/model directory. The file specifies additional information about a custom task and is described in detail [here](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/cml-ref/cml-validation.html).

## Define the task environment

There are multiple options for defining the environment where a custom task runs. You can:

- Choose from a variety of built-in environments.
- If a built-in environment is missing Python or R packages, add the missing packages by specifying them in the task's `requirements.txt` file. If provided, `requirements.txt` must be uploaded together with `custom.py` or `custom.R` in the task content. If the task content contains subfolders, `requirements.txt` must be placed in the top folder.
- You can build your own environment if you need to install Linux packages.

## Test the task locally

While testing the task locally before uploading it to DataRobot is not required, it is strongly recommended: validating functionality in advance can save significant time and debugging later.

A custom task must meet the following basic requirements to be successful:

- The task is compatible with DataRobot requirements and can be used to build a blueprint.
- The task works as intended (for example, a transform produces the output you need).

Use `drum fit` in the command line to quickly run and test your task. It will automatically validate that the task meets DataRobot requirements. To test that the task works as intended, combine `drum fit` with other popular debugging methods, such as printing output to a terminal or file.

### Prerequisites

To test your task:

- Put the task's content into a single folder.
- Install DRUM. Ensure that the Python environment where DRUM is installed is activated. Preferably, also install Docker Desktop.
- Create a CSV file with test data you can use when testing a task.
- Because you will use the command line to run tests, open a terminal window.

### Test compatibility with DataRobot

The following provides an example of using `drum fit` to test whether a task is compatible with DataRobot blueprints. To learn more about using `drum fit`, type `drum fit --help` in the command line.

For a custom task (estimator or transform), use the following basic command in your terminal. Replace placeholder names in `< >` brackets with actual paths and names. Note that the following options are available for TARGET_TYPE:

- For estimators: binary, multiclass, regression, anomaly
- For transforms: transform

```
drum fit --code-dir <folder_with_task_content> --input <test_data.csv> --target-type <TARGET_TYPE> --target <target_column_name> --docker <folder_with_dockerfile> --verbose
```

Note that the `target` parameter should be omitted when it is not used during training (for example, in case of anomaly detection estimators or some transform tasks). In that case, a command could look like this:

```
drum fit --code-dir <folder_with_task_content> --input <test_data.csv> --target-type anomaly --docker <folder_with_dockerfile> --verbose
```

### Test task logic

To confirm a task works as intended, combine `drum fit` with other debugging methods, such as adding print statements to the task's code:

- Add `print(msg)` to one of the methods; when you run the task using `drum fit`, DataRobot prints the message in the terminal.
- Write intermediate or final results to a local file for later inspection, which can help confirm that a custom task works as expected.
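Both techniques can be sketched inside the task's `fit()` hook. This is an illustrative skeleton, not a complete task; the `fit(X, y, output_dir, **kwargs)` signature follows the DRUM custom-task convention, and the logging logic is hypothetical:

```python
# Hypothetical custom-task fit() hook with debugging added.
import os

def fit(X, y, output_dir, **kwargs):
    # Printed output appears in the terminal when the task runs under drum fit.
    print(f"fit() received {len(X)} rows")

    # Write intermediate results to a local file for later inspection.
    with open(os.path.join(output_dir, "debug_log.txt"), "w") as f:
        f.write(f"rows: {len(X)}\n")

    # ... train the estimator or fit the transform here ...
```

After the run, inspect `debug_log.txt` in the output directory alongside the terminal output.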

## Upload the task

Once a task's content is defined, upload it into DataRobot to use it to build and train a blueprint. Uploading a custom task into DataRobot involves three steps:

1. Create a new task in the Model Registry.
2. Select a container environment where the task will run.
3. Upload the task content.

Once uploaded, the custom task appears in the list of tasks available to the blueprint editor.

### Updating code

You can always upload updated code. To avoid conflicts, DataRobot creates a new version each time code is uploaded. When creating a blueprint, you can select the specific task version to use in your blueprint.

## Compose and train a blueprint

Once a custom task is created, there are two options for composing a blueprint that uses the task:

- Compose a single-task blueprint using only the custom task you created (estimators only).
- Create a multitask blueprint using the blueprint editor.

### Single-task blueprint

If your custom estimator task contains all the necessary training code, you can build and train a single-task blueprint. To do so, navigate to Model Registry > Custom Model Workshop > Tasks, select the task, and click Train new model.

When complete, a blueprint containing the selected task appears in the project's Leaderboard.

### Multitask blueprint

To compose a blueprint containing more than one task, use the blueprint editor. Below is a summary of the steps; see [the documentation](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-blueprint-edit.html) for complete details.

1. From the project Leaderboard, Repository, or Blueprints tab in the AI Catalog, select a blueprint to use as a template for your new blueprint.
2. Navigate to the Blueprint view and start editing the selected blueprint.
3. Select an existing task or add a new one, then select a custom task from the dropdown of built-in and custom tasks.
4. Save and then train the new blueprint by clicking Train. A model containing the selected task appears in the project's Leaderboard.

## Get insights

You can use DataRobot insights to help evaluate the models that result from your custom blueprints.

### Built-in insights

Once a blueprint is trained, it appears in the project Leaderboard where you can easily compare accuracy with other models. [Metrics and model-agnostic insights](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-quickstart.html#evaluate-and-deploy) are available just as for DataRobot models.

### Custom insights

You can generate custom insights by [creating artifacts](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-tasks.html#download-training-artifacts) during training. Additionally, you can generate insights using the Predictions API, just as for any other Leaderboard model.

> [!TIP]
> Custom insights are additional views that help to understand how a model works. They may come in the form of a visualization or a CSV file. For example, if you wanted to leverage [LIME's model-agnostic insights](https://cran.r-project.org/web/packages/lime/index.html), you could import that package, run it on the trained model in the `custom.py` or other helper files, and then write out the resulting model insights.

## Deploy

Once a model containing a custom task is trained, it can be [deployed, monitored, and governed](https://docs.datarobot.com/en/docs/api/dev-learning/python/mlops/index.html) just like any other DataRobot model.

## Download training artifacts

When training a blueprint with a custom task, DataRobot creates an artifact available for download. Any file that is put into `output_dir` inside `fit()` of a custom task becomes a part of the artifact. You can use the artifact to:

- Generate custom insights during training. To do this, generate files (such as image or text files) as part of the `fit()` function and write them to `output_dir`.
- Download a trained model (for example, as a `.pkl` file) that you can then load locally to generate additional insights or to deploy outside of DataRobot.
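As a minimal sketch, a `fit()` hook could write both kinds of files to `output_dir`. The `fit(X, y, output_dir, **kwargs)` signature follows the DRUM custom-task convention; the stand-in "model" and file names are hypothetical:

```python
# Illustrative fit() hook that writes a model artifact and a custom insight
# file to output_dir so both become part of the downloadable artifact.
import csv
import os
import pickle

def fit(X, y, output_dir, **kwargs):
    # Placeholder "model"; real code would pickle a fitted estimator instead.
    model = {"mean_target": sum(y) / len(y)}

    # Trained model, downloadable later for local inspection or deployment
    # outside of DataRobot.
    with open(os.path.join(output_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)

    # Custom insight generated during training.
    with open(os.path.join(output_dir, "insights.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["metric", "value"])
        writer.writerow(["mean_target", model["mean_target"]])
```

After training, both files are included in the artifact available from Predict > Downloads.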

To download an artifact for a model, navigate to Predict > Downloads > Artifact Download.

You can also [download the code of any environment](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/cml/cml-custom-env.html#share-and-download) you have access to. To download, click on an environment, select the version, and click Download.

## Implicit sharing

A task or environment is not available in the model registry to other users unless it was explicitly shared. That does not, however, limit users' ability to use blueprints that include that task. This is known as implicit sharing.

For example, consider a project shared by User A and User B. If User A creates a new task, and then creates a blueprint using that task, User B can still interact with that blueprint (clone, modify, rerun, etc.) regardless of whether they have Read access to any custom task within that blueprint. Because every task is associated with an environment, implicit sharing applies to environments as well. User A can also [explicitly share](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/custom-models/custom-model-workshop/custom-model-actions.html) just the task or environment, as needed.

Implicit sharing is a unique permission model that grants Execute access to everyone in the custom task author’s organization. When a user has access to a blueprint (but not necessarily explicit access to a custom task in that blueprint), Execute access allows:

- Interacting with the resulting model. For example, retraining, running Feature Impact and Feature Effects, deploying, and making batch predictions.
- Cloning and editing a blueprint from the shared project, and then saving the blueprint as their own.
- Viewing and downloading Leaderboard logs.

Some capabilities that Execute access does not allow include:

- Downloading the custom task artifact.
- Viewing, modifying, or deleting the custom task from the model registry.
- Using the task in another blueprint. (Instead, you would clone the blueprint containing the task and edit the blueprint and/or task.)
