While building predictive models, DataRobot runs several different versions of each algorithm and tests thousands of possible combinations of data preprocessing and parameter settings. (Many of the models use DataRobot proprietary approaches to data preprocessing.) The results of this testing are available in the Blueprints tab. A blueprint represents the high-level, end-to-end procedure for fitting the model, including any preprocessing steps, algorithms, and post-processing.
What is the difference between a model and a blueprint?
A modeling algorithm fits a model to data, and that fitting step is just one component of a blueprint. A blueprint represents the high-level, end-to-end procedure for fitting the model, including any preprocessing steps, modeling, and post-processing steps.
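To make the distinction concrete, the following is a minimal sketch in plain Python, not DataRobot code. The function names (`impute_missing`, `fit_line`, `clip`, `run_blueprint`) and the specific steps chosen (mean imputation, a one-feature least-squares fit, clipping) are assumptions made for illustration only. The point is that the model fit is a single step inside a larger procedure.

```python
from statistics import mean

def impute_missing(values):
    """Preprocessing (illustrative): replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def fit_line(xs, ys):
    """Modeling (illustrative): least-squares fit for one feature.

    Returns (slope, intercept).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def clip(preds, low, high):
    """Post-processing (illustrative): keep predictions inside [low, high]."""
    return [min(max(p, low), high) for p in preds]

def run_blueprint(xs, ys, new_xs):
    """The whole procedure: preprocessing, then modeling, then post-processing."""
    clean_xs = impute_missing(xs)
    slope, intercept = fit_line(clean_xs, ys)   # the "model" is only this step
    raw = [slope * x + intercept for x in new_xs]
    return clip(raw, low=0.0, high=10.0)

predictions = run_blueprint(
    xs=[1.0, 2.0, None, 4.0],
    ys=[2.0, 4.0, 5.0, 8.0],
    new_xs=[0.0, 3.0, 10.0],
)
```

Here `fit_line` plays the role of the modeling algorithm, while `run_blueprint` plays the role of the blueprint that wraps it.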
To view a graphical representation of a blueprint, click a model in the Leaderboard.
The level of detail displayed in a blueprint depends on which features are enabled for your organization. If your organization does not currently display full, uncensored blueprints, contact your DataRobot representative for information on changing the access level.
Each blueprint has a few key sections.
| Section | Description |
|---|---|
| Data | The incoming data, separated into each type (categorical, numeric, text, image, geospatial, etc.). |
| Transformations | The tasks that perform transformations on the data. |
| Model(s) | The model(s) making predictions or possibly supplying stacked predictions to a subsequent model. |
| Post-processing | Any post-processing steps. |
| Prediction | The data being sent as the final predictions. |
Each blueprint has nodes and edges (i.e., connections). A node takes in data, performs an operation, and outputs the data in its new form. An edge represents the flow of data from one node to another.
When a single node receives two edges, it is receiving two sets of columns, which are stacked horizontally. That is, the column count of the incoming data is the sum of the two sets of columns, and the row count remains the same.
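The horizontal stacking described above can be sketched in a few lines of plain Python. This is an illustration only, not DataRobot's implementation; the helper name `stack_columns` and the sample values are hypothetical.

```python
def stack_columns(left, right):
    """Horizontally stack two row-aligned tables, each given as a list of rows."""
    if len(left) != len(right):
        raise ValueError("both inputs must have the same number of rows")
    # Each output row is the left row followed by the matching right row.
    return [l_row + r_row for l_row, r_row in zip(left, right)]

# Two sets of columns arriving at the same node (hypothetical values).
numeric = [[0.5, 1.2], [0.1, 3.4], [2.2, 0.0]]   # 3 rows x 2 columns
encoded = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # 3 rows x 3 columns

combined = stack_columns(numeric, encoded)
shape = (len(combined), len(combined[0]))        # rows unchanged, columns summed
```

The row count of `combined` matches both inputs, and its column count is the sum of the two input column counts.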
When a single node outputs two edges, two copies of its output data are sent to other nodes. Those other nodes are further data transformations or models.
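As a rough sketch of how nodes and edges fit together, the example below runs a tiny hypothetical graph in plain Python. The node names (`scale`, `square`, `model`), their operations, and the `run_graph` helper are all assumptions made for illustration and do not correspond to any DataRobot task. The `data` node fans out to two transformation nodes, and the `model` node receives both outputs stacked column-wise.

```python
def run_graph(nodes, order, data):
    """Run each node in order; edges are expressed as each node's list of inputs."""
    outputs = {"data": data}
    for name in order:
        operation, inputs = nodes[name]
        # Fan-in: stack the columns from every incoming edge, row by row.
        rows = outputs[inputs[0]]
        for other in inputs[1:]:
            rows = [a + b for a, b in zip(rows, outputs[other])]
        outputs[name] = operation(rows)
    return outputs

# Hypothetical graph: node name -> (operation, names of the nodes feeding it).
nodes = {
    "scale": (lambda rows: [[2.0 * v for v in r] for r in rows], ["data"]),
    "square": (lambda rows: [[v * v for v in r] for r in rows], ["data"]),
    # Stand-in "model": sums each row into a single prediction column.
    "model": (lambda rows: [[sum(r)] for r in rows], ["scale", "square"]),
}

outputs = run_graph(
    nodes,
    order=["scale", "square", "model"],
    data=[[1.0, 2.0], [3.0, 4.0]],
)
```

Because both `scale` and `square` list `data` as their input, each works on its own copy of the same rows; `model` then sees four columns per row, two from each incoming edge.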
Click a blueprint node to display additional information, including access to model documentation.