DataRobot overview and administrator workflow¶
DataRobot sets up the basic deployment configuration, which defines the available system-wide features and resource allocations. The following describes the typical admin workflow for setting up users on DataRobot:
- Log in using the default administrator account and create an Admin account.
- Create user accounts, starting with your own.
- Set user permissions and user roles.
- Optional: Manage personal worker allocation, which determines the maximum number of workers a user can allocate to their projects.
The following sections explain concepts that are an important part of the on-premise setup and configuration. Later sections assume you understand these elements:
A DataRobot project is the combination of the dataset used for model training and the models built from that dataset. DataRobot builds a project through several distinct phases.
During the first phase, DataRobot imports the specified dataset, reads the raw data, and performs EDA1 (Exploratory Data Analysis) to understand the data. The next phase, EDA2, begins when the user selects a target feature and starts the model building process. Once EDA2 completes, DataRobot ranks the resulting models by score on the model Leaderboard.
What are workers?¶
Workers are the processing power behind the DataRobot platform, used for creating projects, training models, and making predictions. They represent the portion of processing power allocated to a task. DataRobot uses different types of workers for different phases of the project workflow, including DSS workers (Dataset Service workers), EDA workers, secure modeling workers, and quick workers. All workers, with the exception of modeling workers, are based on system and license settings. They are available to the installation's users on a first come, first served basis. Refer to the Installation and Configuration guide (provided with your release) for information about those worker types. This guide explains how to monitor and manage modeling workers.
During EDA2, modeling workers train models on the dataset against the target feature. Modeling worker allocation is key to building models quickly; more modeling workers means faster build times. However, because model development is time and resource intensive, the more models that train at one time, the greater the chance of resource contention.
In a Hadoop cluster, workers run within the cluster, and YARN, not DataRobot, manages all modeling worker allocation.
Modeling worker allocation¶
The admin and users each have some ability to modify modeling worker allocation. The admin sets a total allocation and the user has the ability to set per-project allocations, up to their assigned limit. Note that modeling worker allocation is independent of hardware resources in the cluster.
Each user is allocated four workers by default. This "personal worker allocation" means that, at any one time, no more than four workers (if left at the default) are processing a user's tasks. The count applies across all of the user's projects in the cluster. Multiple browser windows building models all draw from the same personal worker count, so opening more windows does not provide more workers.
The number of workers allocated when a project is created is the "project worker allocation." While this allocation stays with the project if it is shared, any user participating on the project is still restricted to their personal worker allocation.
For example, a project owner may have 12 personal workers allocated to a project and share it with a user who has a four-worker personal allocation. The person invited to the project is still limited by their personal allocation, even if the project reflects a higher worker count.
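The interaction between project and personal allocation can be sketched in a few lines of Python. This is only an illustrative model of the rule described above, not DataRobot code; the function name `usable_workers` and its parameters are hypothetical.

```python
def usable_workers(project_allocation: int,
                   personal_limit: int,
                   in_use_elsewhere: int = 0) -> int:
    """Return how many workers a user can apply to one project.

    Hypothetical model: the user is capped by their personal limit,
    counted across all of their projects, no matter how many workers
    the project itself shows.
    """
    # Workers the user still has free after their other projects.
    remaining_personal = max(personal_limit - in_use_elsewhere, 0)
    # The project allocation never raises the personal cap.
    return min(project_allocation, remaining_personal)


# Owner with a 12-worker personal limit on a 12-worker project.
owner_workers = usable_workers(project_allocation=12, personal_limit=12)

# Invited user with the default four-worker personal limit.
guest_workers = usable_workers(project_allocation=12, personal_limit=4)
```

Under these assumptions, the invited user gets four workers on the shared project even though the project reflects 12, and fewer still if some of their workers are already busy on other projects.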
Change worker allocation¶
The workers used during EDA1 (EDA workers) are set and controlled by the system; neither the admin nor the user can increase the allocation of these workers. Increasing the displayed worker count during EDA1 does not affect how quickly data is analyzed or processed during this phase.
During model development (EDA2), a user can increase the worker count as long as workers are available to that user (based on their personal worker allocation). Adjusting the worker toggle in the worker usage panel causes more workers to participate in processing. Users can read the full details about the Worker Queue for a better understanding of how it works.
If the user's personal worker allocation is changed (increased or decreased), existing projects are not affected.
Change personal worker allocation¶
Admins can set the maximum number of workers each user can allocate to their project.
- Expand the profile icon located in the upper right and click APP ADMIN > Users from the dropdown menu.
- Locate and select the user to open their profile page.
- Click Permissions and scroll down to Modeling workers limit.
- Enter the worker limit in the field and click Save changes.
What are groups?¶
You can create groups as a way to manage users, control project sharing, apply actions across user populations, and more. A user can be a member of up to ten groups, or not a member of any group. A group can be associated with a single organization, and a single organization can be the "parent organization" for multiple groups. Project owners can share their projects with a group; this makes the project accessible to all members of that group.
Essentially, groups are a container of users that you can take bulk actions on. For example, if you share a project with a group, all users in the group can see the project (and work with it depending on the permission granted when sharing). Or, you can apply bulk permissions so that all users in a group have a permission set.
See the section on creating groups for information on setting up groups in your installation.
What are organizations?¶
To ensure workers are available as needed and prevent resource contention, an admin can add users to organizations. Organizations provide a way to help administrators manage DataRobot users and groups.
You can use organizations to control worker allocation for groups of users by restricting the total number of workers available to all members of the organization.
Project owners can share their projects with an organization; this makes the project accessible to all members of that organization.
Most commonly, organizations are used to set a cap on the total number of shared modeling workers the members of an organization can use in parallel. For example, you can create an organization of five users that has an allocation of ten workers. The five users can then collectively use up to ten workers at one time, regardless of their personal worker allocations. This is not a cascading allocation: the users do not each receive ten workers; they all share the same ten.
If a user with a personal allocation of four workers belongs to an organization with an allocation of ten workers, the user can use no more than their personal allocation (four workers in this example). Organization membership is not required for DataRobot users, and a user can belong to only one organization at a time.
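The shared cap can be modeled with a short Python sketch. It is a simplified illustration of the rule above, not DataRobot's scheduler; the `Organization` class, its `grant` method, and the user names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Organization:
    """Hypothetical model of an organization's shared worker pool."""
    worker_cap: int
    in_use: Dict[str, int] = field(default_factory=dict)

    def grant(self, user: str, requested: int, personal_limit: int) -> int:
        """Grant as many workers as both limits allow and record them."""
        # Workers still free in the shared organization pool.
        org_free = self.worker_cap - sum(self.in_use.values())
        # Workers the user may still take under their personal limit.
        user_free = personal_limit - self.in_use.get(user, 0)
        granted = max(0, min(requested, org_free, user_free))
        self.in_use[user] = self.in_use.get(user, 0) + granted
        return granted


# Ten shared workers; each member has a personal limit of four.
org = Organization(worker_cap=10)
grants = [org.grant(user, requested=4, personal_limit=4)
          for user in ("ana", "ben", "cho", "dev")]
# grants is [4, 4, 2, 0]: the pool runs out before the personal limits do.
```

In this model, the first two users reach their personal limit of four, the third receives only the two workers left in the pool, and the fourth must wait until workers are released.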
The system admin manages the type of organization described above. Additionally, there is an Organization User Admin, a user who can manage all users within their own organization and create groups for that organization. This type of admin cannot view other organizations or view users or groups outside of their organization.