Automation

Note

Automation is a legacy feature that provided you the option to automate individual projects and datasets. Automatic Project Flows (APF), introduced in the 2019.1 release, allows you to intelligently operationalize curated data flows. The new APF feature computes the entire sequence of data prep steps across Data Prep projects, datasets and AnswerSets to produce an end-to-end, automated output Flow for your data. For customers who are currently using the 2018.2 Automation feature and are ready to upgrade their automated jobs to APF, contact your DataRobot representative for assistance.

Two types of workload automation reduce the number of repetitive tasks required to produce AnswerSets: library automation and project automation.

Library automation

When you automate a data library dataset, you configure it to pull updates from its source automatically on a schedule you define. During the automation process, the dataset is updated with new versions of the data using the import and parse options specified when the file was originally uploaded into the data library. However, when you set up a dataset for automation, you have the option to modify those parse options.

Note

You cannot automate datasets on your local system.

Project automation

When you schedule a project for automation, you set it up to automatically publish an AnswerSet to the data library based on the schedule and parameters you define. The AnswerSet can also be exported to an external data source, for example AWS S3.

Note

Project lenses are essential for project automation because they define the publishing points to use for your automated jobs. In order to automate a project, you must have a lens defined for each point in the project where you want to publish data. You must have at least one lens defined in your project, otherwise no data can be published. For more information on lenses, see the article for Project lenses.

After you configure automation schedules for data library files and/or projects, both are collectively referred to as automation “jobs.” The Automation dashboard provides you with details of all automation schedules and the status of all automation jobs.

Data library automation configuration page

To open the data library automation configuration page:

  1. Open the data library.
  2. Locate the file you want to automate.
  3. Click the More Actions button that displays, then select the Automation option. The configuration page opens.

Job name and job description

The dataset's name and description are listed in these fields. These are initial default values from when the file was originally imported into the data library. They can be changed here by entering new information into the fields.

Note

You may also notice a check box option for Set me as the owner of this automated schedule. This option only appears if you are not the person who initially set up this dataset for automation or not the person who currently owns its automated schedule. Ownership is significant because it provides a way to identify and audit users who are running automated jobs in the system. Typically, this option is used when automation responsibilities are transitioned to a new person in an organization. If you take ownership of an automation job, you must have all of the permissions that are required to perform every operation performed by the automation.

Schedules

Any upcoming schedules for the dataset are displayed here. The Add button allows you to set up new schedules. The Deactivate link in this pane allows you to indefinitely suspend all scheduled jobs for this dataset until you return and click the Reactivate button. To set up schedules, see Set up a data library dataset for automation.

Notifications

Email notifications can be sent to notify users of either a successful upload into the data library or errors that have occurred. To set up notifications, see Set up a data library dataset for automation.

Importing from

These are the connection parameters inherited from the most recent upload of the dataset. To change these connection parameters, manually upload a new version of the file to the data library with new parameters. Automation will then use the new connection parameters in its next scheduled update.

Import parsing options

For file-based datasets, the import parse options are displayed below the connection details. The import options are inherited from the most recent version of the dataset but are editable here.

Note

If you manually import another version of this dataset into the data library, the manual upload does not inherit the parse options of the automated version. It uses the parse options you select during that upload.

Set up a data library dataset for automation

You set up a dataset for automation by:

  • Setting up schedules
  • Setting up notifications
  • Saving your automation configuration settings

Set up schedules

Click Add to set up a new time for the dataset to be updated by automation. The default setting for dataset automation frequency is to repeat on the time and day you specify. The Repeat toggle button lets you switch the automation to run Once at the time you specify.

To set up recurring updates:

  1. Use the up and down arrows to adjust the time.
  2. Toggle the PM or AM button to select the correct period.
  3. Select the frequency: week, day, or month. The default is week. Click in the field to make a different selection.
  4. Depending on your frequency selection, specify the day of the week or date in the month.
  5. Click Okay to add the schedule. Your newly added schedule then appears. Click the pencil icon to edit it or the X button to delete it.

Note

  • The time you select is based on your current time zone.
  • The time, day, or date you select here must be in the future. For example, if it is currently 1 PM on Monday and you set up the automation to run at 10 AM every Monday, the automation will not run today for this file.
  • Datasets that are on your own local system cannot be automated.
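The "must be in the future" rule in the note above can be illustrated with a short sketch. This is not the product's scheduler; `next_weekly_run` is a hypothetical helper, and it uses naive datetimes that ignore time zones.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int = 0) -> datetime:
    """Return the first occurrence of (weekday, hour:minute) strictly after `now`.

    `weekday` follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if candidate <= now:
        # Today's slot has already passed, so the first run falls a week later.
        candidate += timedelta(days=7)
    return candidate


# Example from the note: it is 1 PM on a Monday, schedule is 10 AM every Monday.
now = datetime(2021, 10, 25, 13, 0)  # 25 October 2021 was a Monday
first_run = next_weekly_run(now, weekday=0, hour=10)
print(first_run)  # 2021-11-01 10:00:00
```

Because 10 AM has already passed at 1 PM, the first run lands on the following Monday rather than today.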

To schedule a single update:

  1. Click in the date field to open a calendar picker.
  2. Use the up and down arrows to adjust the time.
  3. Toggle the PM or AM button to select the correct period.
  4. Click Okay to add the schedule. Your newly added schedule appears. Click the pencil icon to edit it or the X button to delete it.

Note

  • The time you select for a schedule is based on your current time zone.
  • When configuring a file’s automation to run only once, do not set the job’s start time too near to the current time. Your local computer’s clock may not be precisely in sync with the web server that will process the job. If your local computer’s clock is running behind the web server’s clock, the time you specify for the job may have already passed on the web server. In this case, your job will not start.
  • If you want to test one automated run of this dataset, use the Add to Queue feature instead of setting it up to run Once. For details on this feature, see Save your automation configuration settings.
  • Datasets that are on your own local system cannot be automated.

Review the following considerations when setting up a dataset for automation:

  • If an automated project uses this dataset for input, you must ensure a safe buffer of time for this dataset update to finish uploading in the data library before the automated run of the project begins.
  • The time you specify here is when this job will be added to the queue for uploading and not necessarily the start time for the automated import.

Set up notifications

Email notifications can be sent to notify users of either a successful upload into the data library or errors that have occurred. An error email provides a link to the file's log file where you can determine the cause of any errors.

To set up notifications:

  1. Click the dropdown menu to select which type of email notification to send: "Errors" or "Success".
  2. Add the email address and press Enter.

Important considerations:

  • An email address can only be added once for each notification type.
  • Recipients must have the required system permissions to view the automation results.

Save your automation configuration settings

Click Save at the top of the configuration form to save all of your settings. After saving, notice the Add to Queue button that displays.

The button allows you to add this automation job to the queue of upcoming jobs that will be run the next time automation starts. This option is useful if you want to test out this automation configuration without having to wait for its scheduled run time.

Tip

The Automation pane provides details of when the next automated run is scheduled to start. You can quickly navigate to the Schedules pane by clicking the View Schedules Now link that displays in the header after you click the Add to Queue button.

Project automation configuration page

To open the project automation page, open your project and click the automation status button. The Automation configuration page opens.

Job name, job description, and project

The job’s name and description are listed in these fields. These are the name and description provided when the project was originally created. You can change them here for the automated version of your project by entering new information into the fields.

The project field provides a link that automatically opens the project you are setting up for automation. This link is particularly useful if you have multiple Versions of a project but are automating one specific Version of that project in this configuration form.

Note

You may also notice a check box option for Set me as owner of this automated schedule. This option only appears if you are not the person who initially set up this project for automation or not the person who currently owns its automated schedule. Ownership is significant because it provides a way to identify and audit users who are running automated jobs in the system. Typically, this option is used when automation responsibilities are transitioned to a new person in an organization. If you take ownership of an automation job, you must have all of the permissions that are required to perform every operation performed by the automation.

Import datasets

The datasets you have already imported into your project are listed here. If no datasets are listed, verify that you have saved the most recent set of changes to your project steps.

Your "Base" dataset is listed and any "Lookup" or "Append" datasets for the project are listed above it.

If this project uses a dataset that is set up for automation in the data library, the schedule is displayed here. When using an automated dataset, consider its automation schedule and allow a safe buffer of time for the new version to be published in the data library.

To set up a dataset for automation before you configure automation for this project, click the Set it up now? link adjacent to the dataset’s name. You are taken to the data library scheduling page where you set up the automation parameters and schedule. See Set up a data library dataset for automation.

Note

The Use Latest Version default setting refers to the version of the dataset that is used for this automation configuration. See Set up a project for automation for details on which version you should select for automating this project.

Schedules

Any upcoming schedules for the project are displayed here. The Add button allows you to set up new schedules. The Deactivate link in this pane allows you to indefinitely suspend all scheduled jobs for this project until you return and click the Reactivate button. To set up schedules, see Set up a project for automation.

Notifications

Emails can be sent to notify users when automated projects finish updating or have errors. To set up notifications, see Set up a project for automation.

Publish AnswerSets

Select a lens for publishing an AnswerSet. A lens is pinned to a step in your project and creates a publishing point that can be used by automation to publish an AnswerSet. You can save the setup for this project’s automation without selecting a lens, but an automated run of this project will not succeed until you select a lens.

Automated projects are automatically published to the data library. However, automation can also be configured to export the published output to an external data source. See Set up a project for automation.

Set up a project for automation

You set up a project for automation by:

  • Importing datasets
  • Setting up schedules
  • Setting up notifications
  • Selecting lenses and publishing destinations
  • Saving your automation configuration settings

Import datasets

For each dataset used in your project, choose to use the Latest Version or Current Version for input:

  • Latest Version uses the most up-to-date version of the dataset in the data library when the automated job is run.

    Note

    • Using Latest Version creates a new Version of your project each time this automated configuration runs. When you select Latest Version, an additional option lets you specify whether an automated run should fail when the latest version of the dataset has a different layout (schema), for example new columns, removed columns that are not used in the project's steps, different column types for existing columns, or a new column order.
    • At least one of the datasets used for automating this project must be Latest Version. Otherwise, if no changes occur in the input datasets after an automated run of your project, the platform will not re-run this job.

  • Current Version pins the dataset in its current state for all future automated runs. Using Current Version may be useful when a static dataset serves as a reference table for your project.

Note

If there have been no changes to the input datasets since the last project automation run, automation will not run again for the project until there are changes to the project. Therefore, at least one of the datasets used for automating this project must be Latest Version.
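The kinds of layout (schema) differences listed above can be made concrete with a small sketch. It is only an illustration of what such a comparison might look at, not how the platform detects changes; `schema_changes` and the sample column names and types are hypothetical.

```python
from typing import Dict, List, Tuple

# A schema here is an ordered list of (column name, column type) pairs.
Schema = List[Tuple[str, str]]


def schema_changes(previous: Schema, latest: Schema) -> Dict[str, object]:
    """Summarize how `latest` differs in layout from `previous`."""
    old, new = dict(previous), dict(latest)
    added = [name for name, _ in latest if name not in old]
    removed = [name for name, _ in previous if name not in new]
    retyped = [name for name, typ in latest if name in old and old[name] != typ]
    # Compare the relative order of the columns both versions share.
    shared_old = [name for name, _ in previous if name in new]
    shared_new = [name for name, _ in latest if name in old]
    return {
        "added": added,
        "removed": removed,
        "retyped": retyped,
        "reordered": shared_old != shared_new,
    }


previous = [("id", "int"), ("name", "string"), ("amount", "float")]
latest = [("id", "int"), ("amount", "string"), ("name", "string"), ("region", "string")]

changes = schema_changes(previous, latest)
layout_changed = any(changes.values())
print(changes)
```

A run configured to fail on layout changes would correspond to stopping whenever `layout_changed` is true.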

Set up schedules

Click Add to set up a new time for this project to run. The default setting for project automation frequency is to repeat on the time and day you specify here. The Repeat button is a toggle that you can click to switch the automation to run Once at the time you specify.

To set up recurring runs:

  1. Use the up and down arrows or enter values in the fields to adjust the time.
  2. Toggle the PM or AM button to select the correct period.
  3. Select the frequency: week, day, or month. Note that week is the default. Click in the field to make a different selection.
  4. Depending on your frequency selection, specify the day of the week or date in the month.
  5. Click Okay to add the schedule.
  6. Your newly added schedule displays. Click the pencil icon to edit it or the X button to delete it.

Note

  • The time you select here is based on your current time zone.
  • The time, day, or date you select here must be in the future. For example, if it is currently 1 PM on Monday and you set up the automation to run at 10 AM every Monday, automation will not run today for this project.

To set up a single run:

  1. Click in the date field to open a calendar picker.
  2. Use the up and down arrows to adjust the time.
  3. Toggle the PM or AM button to select the correct period.
  4. Click Okay to add the schedule.
  5. Your newly added schedule displays. Click the pencil icon to edit it or the X button to delete it.

Note

  • The time you select here is based on your current time zone.
  • When configuring a project’s automation to run only once, do not set the job’s start time too near to the current time. Your local computer’s clock may not be precisely in sync with the web server that will process the job. If your local computer’s clock is running behind the web server’s clock, the time you specify for the job may have already passed on the web server. In this case, your job will not start.
  • If you want to simply test one automated run of this project, use the Add to Queue feature instead of setting it up to run Once. For details, see Save your project automation configuration settings.

Important considerations when setting up a project for automation:

  • If a project’s automation depends on input from an automated data library file or an AnswerSet published from another automated project, ensure a safe buffer of time for all input updates to finish before the automated run of the project begins.
  • The time you specify in the automation set-up is when this project will be added to the queue for publishing an AnswerSet, and not necessarily the publishing start time.
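One way to reason about the buffer described above is to check that every input is queued well before the project. The sketch below is a planning aid under stated assumptions: `has_safe_buffer` is a hypothetical helper, and the right buffer size depends on how long your inputs take to update, which the schedule times alone do not tell you.

```python
from datetime import datetime, timedelta
from typing import Iterable


def has_safe_buffer(
    input_times: Iterable[datetime], project_time: datetime, buffer: timedelta
) -> bool:
    """True when every input is queued at least `buffer` before the project."""
    return all(t + buffer <= project_time for t in input_times)


dataset_update = datetime(2021, 10, 25, 6, 0)    # automated data library file
upstream_project = datetime(2021, 10, 25, 7, 0)  # project that publishes an input AnswerSet
project_run = datetime(2021, 10, 25, 8, 0)       # the project being scheduled

ok = has_safe_buffer([dataset_update, upstream_project], project_run, timedelta(hours=1))
too_tight = has_safe_buffer([dataset_update, upstream_project], project_run, timedelta(hours=2))
print(ok, too_tight)  # True False
```

Because scheduled times are queue times rather than start times, a generous buffer is the safer choice.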

Set up notifications

Emails can be sent to notify users when automated projects finish updating or have errors. An error email provides a link to the project's log file where you can determine the cause of any errors.

To set up notifications:

  1. Click the dropdown menu to select which type of email notification to send: "Errors" or "Success".

  2. Add the email address and press Enter.

Important considerations:

  • An email address can only be added once for each notification type.
  • Recipients must have the required system permissions to view the automation results.

Select lenses and publish destinations

To add a lens:

  1. Click the green Add button.

    A lens from the project is added to this automation configuration. By default, the lens that occurs earliest in your project steps is selected.

  2. To change the default selection, click the dropdown menu and select a different lens that currently exists in your project.

  3. To add additional lenses to be used for this automated run of your project, click the Add button and continue to select lenses.

To disable a lens:

  • Click the green On button for the lens to toggle it off.

To remove a lens:

  • Click the X button for the lens.

    If you need to add a new lens for automating this project, you will need to open the project and add the lens on the desired step. For more help on adding lenses to your project, see Use lenses for publishing.

The default location for publishing this project when automation runs is Library Only. If you want to export the published output to an external data source, in addition to publishing it to the data library, click the dropdown menu and select Library & Data Source.

  • Name: The name that will be used for automated versions of this project.
  • Data Source Name: Click the dropdown menu to select an available data source.

    Note

    Only the data sources that have been configured for export and that you have permissions to access are displayed in the dropdown menu for Data Source Name. Contact your System Administrator if you don't see the data source you want to select for export.

  • Directory Path or Database Name: Provide the path or database on the data source where the export will be written.
  • Format: Depending on the data source you select for export, the option to select a file format is also available. Any applicable parsing options are also presented.
  • Credentials: The user credentials for writing to the selected data source are presented here. You can edit the credentials here.
  • Create unique name: When enabled, automation appends an underscore and time stamp to the file or table name for each successive automated export so that any previous exports of this project are not overwritten on the data source.

    Note

    If you enable this option for a JDBC data source, ensure that your system administrator has also enabled the Automatically Create Table option in the JDBC Connector form. Otherwise, automation for this project will fail.
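The effect of Create unique name can be pictured with a short sketch. The exact timestamp format the platform appends is not stated here, so the `%Y%m%d%H%M%S` pattern, the `unique_export_name` helper, and the sample names are all assumptions for illustration.

```python
from datetime import datetime


def unique_export_name(base_name: str, run_time: datetime, extension: str = "") -> str:
    """Append an underscore and a timestamp so successive exports do not collide."""
    stamp = run_time.strftime("%Y%m%d%H%M%S")  # assumed format, not the documented one
    return f"{base_name}_{stamp}{extension}"


first = unique_export_name("sales_answerset", datetime(2021, 10, 28, 9, 30, 0), ".csv")
second = unique_export_name("sales_answerset", datetime(2021, 10, 29, 9, 30, 0), ".csv")
print(first)   # sales_answerset_20211028093000.csv
print(second)  # sales_answerset_20211029093000.csv
```

Each run writes to a new file or table name, which is why a JDBC target must allow automation to create tables.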

Save your project automation configuration settings

Click the Save button in the upper right pane to save all configurations you have made for automating this project. After saving the automation schedule, notice the Add to Queue button that displays.

This button allows you to add this automation job to the queue of upcoming jobs that will be run the next time automation starts. This option is useful if you want to test out this automated configuration without having to wait for its scheduled run time.

Note

The Automation pane provides details of when the next automated run is scheduled to start and you can quickly navigate to the Schedules pane by clicking the View Schedules Now link that displays in the header after you click the Add to Queue button.

Automation dashboard

The Automation dashboard provides details and history for all data library files and projects that are set up to be automated. This is where you:

  • View your automation usage details.
  • View and manage the schedules for automated jobs.
  • View job execution history and statuses.
  • Re-run failed jobs.

The dashboard is organized by Schedules and Job Details.

Schedules

The Schedules page displays a list of all data library files and projects that are currently configured for automation. To view your automation usage details, mouse over the meters for additional information regarding the number of automated jobs you’ve already completed and the maximum number you can run for the day, week, or month.

The Schedules page can also be filtered in a variety of ways to display:

  • Active jobs, or Inactive jobs whose automation schedules have been deactivated.
  • Types of jobs—Project only or Library only.
  • Automation jobs that you own.
  • Job states—Success, Complete with Error, Error, or Over Limit. See Definition of job states for the meaning of each state.
  • Jobs that last finished during a date range that you specify or that will be run in the next automated run based on the range you provide.

You can rerun any job by clicking the Add to Queue link. This creates an internal schedule on-the-fly for the job, which triggers automation to run the job the next time the automation service wakes up to run regularly scheduled jobs. It's important to keep in mind that system resources must be available in order to run a queued job. For example, the number of threads allocated to run automation jobs must be sufficient. Otherwise, the job will remain in a queued state until resources become available to run it.

Note

  • To determine the errors before re-running a job, go to the Job Details tab and open the Results page for that job run.
  • Using Add to Queue to rerun a job with errors does not count against your existing automation guardrail limits.

To make changes to a job's configuration settings, including deactivating it, click the job's name to open its configuration page. Then make and save the configuration changes.

Job Details

The Job Details page provides an audit trail for every executed automated run, including automated jobs that have been deleted. To view your automation usage details, mouse over the meters for additional information regarding the number of automated jobs you’ve already completed and the maximum number you can run for the day, week, or month.

You can filter the Job Details page in a variety of ways to display:

  • Types of jobs—Project only or Library only.
  • Job states—Success, Complete with Error, Error, Over Limit, Queued, Running. See Definition of job states for the meaning of each state.
  • Jobs that last started or last finished during a date range that you specify.

To display granular details for a job run, click the row for that job. The Results page for the job opens and displays a snapshot of the configuration settings used for this instance of the job run.

Note

Because this is a snapshot, the job settings may have changed since this automated run.

If this is a project job, click the View Lens link to open the project to the lens that was used to publish the AnswerSet for this run. Click the View AnswerSet link to view the AnswerSet published by this run.

If this is a library job, click the View Dataset link to open the file in the data library.

If any errors occurred during the run, they are displayed here. Download the log file for this job by clicking the Download log link.

Definition of job states

The following are possible states for an automated job.

  • Running: Job run is currently in progress.

  • Success: Job successfully finished with no errors.

  • Error: Job run failed.

  • Completed with errors: Job run finished, but errors prevented a complete run. For example, a job that successfully published to the data library but could not export to the specified data source ends in this state.

  • Queued: When a job is queued an internal schedule is created on-the-fly for it, which triggers automation to run the job the next time the automation service wakes up to run regularly scheduled jobs. However, it's important to keep in mind that system resources must be available in order to run a queued job. For example, the number of threads allocated to run automation jobs must be sufficient. Otherwise, the job will remain in a queued state until resources become available to run it.

  • Over limit: When a job run exceeds the daily, weekly or monthly guardrail limits, then the job fails with an Over limit error. Important notes:

    • Automation guardrails are enforced at the tenant level.
    • The weekly automation limit window runs from 00:00 Monday to 23:59 Sunday.
    • A job that ends in error is counted toward your limits, but a retry of the failed job (through Add to Queue) is not counted.
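The weekly counting rules above can be sketched as follows. This is a simplified model for reasoning about usage, not the platform's accounting: `JobRun`, `week_window`, and `weekly_usage` are hypothetical names, and the sketch assumes every non-retry run counts regardless of its final state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Tuple


@dataclass
class JobRun:
    started: datetime
    state: str              # for example "Success" or "Error"
    is_retry: bool = False  # True when rerun through Add to Queue after a failure


def week_window(moment: datetime) -> Tuple[datetime, datetime]:
    """Return [Monday 00:00, next Monday 00:00) for the week containing `moment`."""
    start = (moment - timedelta(days=moment.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return start, start + timedelta(days=7)


def weekly_usage(runs: Iterable[JobRun], moment: datetime) -> int:
    """Count runs in the week of `moment`; errored runs count, retries do not."""
    start, end = week_window(moment)
    return sum(1 for r in runs if start <= r.started < end and not r.is_retry)


runs = [
    JobRun(datetime(2021, 10, 25, 0, 0), "Success"),                  # Monday 00:00, counted
    JobRun(datetime(2021, 10, 27, 10, 0), "Error"),                   # failed run, counted
    JobRun(datetime(2021, 10, 27, 11, 0), "Success", is_retry=True),  # retry, not counted
    JobRun(datetime(2021, 10, 31, 23, 59), "Success"),                # Sunday 23:59, counted
    JobRun(datetime(2021, 11, 1, 0, 0), "Success"),                   # next week
]
used = weekly_usage(runs, datetime(2021, 10, 28, 12, 0))
print(used)  # 3
```

Comparing `used` with your tenant's weekly limit indicates how many more runs fit before jobs would end in the Over limit state.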

Updated October 28, 2021