Tune blueprints for preprocessing and model hyperparameters
Access this AI accelerator on GitHub
In machine learning, hyperparameter tuning is the process of adjusting the "settings" (referred to as hyperparameters) of a machine learning algorithm, whether that's the learning rate for an XGBoost model or the activation function in a neural network. Many methods for doing this exist, the simplest being a brute-force search over every feasible combination. While this requires little effort, it's extremely time-consuming because each combination requires fitting the machine learning algorithm. To this end, practitioners strive for more efficient ways to search for the best combination of hyperparameters for a given prediction problem. DataRobot employs a proprietary version of pattern search to optimize not only the machine learning algorithm's hyperparameters, but also the data preprocessing needed to fit the algorithm, with the goal of quickly producing high-performance models tailored to your dataset.
While the approach used by DataRobot is sufficient in most cases, you may want to build upon the Autopilot modeling process with custom tuning methods. In this AI Accelerator, you will familiarize yourself with DataRobot's fine-tuning API calls to control DataRobot's pattern search approach, as well as implement a modified brute-force grid search over the text and categorical data pipeline and the hyperparameters of an XGBoost model. This accelerator serves as an introductory learning example that other approaches can be built from. Bayesian optimization, for example, leverages a probabilistic model to judiciously sift through the hyperparameter space and converge on an optimal solution; it is presented next in this accelerator bundle.
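As a taste of what's ahead, here is a minimal sketch of a single advanced tuning session in the DataRobot Python client. The endpoint, token, project ID, model ID, and parameter value are placeholders; the parameter names actually available depend on the blueprint, and you can list them with `get_advanced_tuning_parameters()`.

```python
import datarobot as dr

# Connect to DataRobot. Endpoint and token are placeholders for your own credentials.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Hypothetical project and model IDs; substitute IDs from your own project.
project = dr.Project.get("YOUR_PROJECT_ID")
model = dr.Model.get(project.id, "YOUR_MODEL_ID")

# Inspect the hyperparameters DataRobot exposes for this blueprint.
for param in model.get_advanced_tuning_parameters()["tuning_parameters"]:
    print(param["task_name"], param["parameter_name"], param["current_value"])

# Start a tuning session, override one hyperparameter, and retrain.
session = model.start_advanced_tuning_session()
session.description = "Manual learning_rate override"  # label shown on the tuned model
session.set_parameter(parameter_name="learning_rate", value=0.05)
job = session.run()                           # queues an asynchronous model job
tuned_model = job.get_result_when_complete()  # blocks until the retrain finishes
print(tuned_model.metrics)
```

`set_parameter` matches on parameter name, so pass `task_name` as well whenever the same parameter name appears in more than one task of the blueprint.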
As a best practice, wait until the model is in a near-finished state before searching for the best hyperparameters. Specifically, the following should already be finalized:
- Training data (e.g., data sources)
- Model validation method (e.g., group cross-validation, random cross-validation, or backtesting; how the problem is framed influences all subsequent steps, as it changes the error being minimized)
- Feature engineering (particularly, calculations driven by subject matter expertise)
- Preprocessing and data transformations (e.g., word or character tokenizers, PCA, embeddings, normalization, etc.)
- Algorithm type (e.g., GLM, tree-based, neural net)
These decisions typically have a larger impact on model performance than adjusting a machine learning algorithm's hyperparameters (especially when using DataRobot, as the automatically chosen hyperparameters are already quite competitive).
This AI Accelerator teaches you how to access, understand, and tune blueprints for both preprocessing and model hyperparameters. You'll programmatically work with DataRobot advanced tuning, which you can then adapt to your other projects.
You'll learn how to:
- Prepare for tuning a model via the DataRobot API
- Load a project and model for tuning
- Set the validation type for minimizing errors
- Extract model metadata
- Get model performance
- Review hyperparameters
- Run a single advanced tuning session
- Implement your own custom grid search for single and multiple models (a minimal sketch follows this list)
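For the custom grid search item above, one sketch under the same assumptions as the earlier snippet (it reuses the `project` and `model` objects from that snippet, and the parameter names and value grids shown are hypothetical) might look like this:

```python
import itertools

# Hypothetical grid; use names reported by model.get_advanced_tuning_parameters().
grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}

# Queue one advanced tuning session per combination. Each run retrains the
# model, so large grids consume modeling workers quickly.
jobs = []
for values in itertools.product(*grid.values()):
    combo = dict(zip(grid.keys(), values))
    session = model.start_advanced_tuning_session()
    for name, value in combo.items():
        session.set_parameter(parameter_name=name, value=value)
    jobs.append(session.run())

# Wait for the retrained models and rank them on the project's optimization
# metric. min() assumes a lower-is-better metric such as LogLoss or RMSE;
# flip to max() for metrics like AUC.
tuned_models = [job.get_result_when_complete() for job in jobs]
best = min(tuned_models, key=lambda m: m.metrics[project.metric]["validation"])
print(best.id, best.metrics[project.metric]["validation"])
```

The multi-model variant is a straightforward extension: wrap this loop in an outer loop over a list of models and compare the winners on the same validation metric.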