

Predictions for fantasy baseball

Access this AI accelerator on GitHub

In this accelerator, you will use the DataRobot API to quickly build multiple models that work together to predict common fantasy baseball metrics for each player in the upcoming season. Millions of people play fantasy baseball every year: more than 15 million in the United States and Canada in 2022, according to the Fantasy Sports and Gaming Association. It is the second-most popular fantasy sport in the US and Canada, behind American football, and, as in most fantasy sports, managers typically select players for their teams through a classic draft or an auction. Choosing players because they are your favorites, or even based on last year's performance without regard for regression to the mean, is likely to field a relatively weak team year after year. Baseball is one of the most statistically documented sports, and with the wealth of data available you can use machine learning to better estimate each player's true talent level and likely performance in the coming year. This leads to better drafting: you avoid overpaying for players coming off "career" seasons, and you identify undervalued players who can fill out a quality team in the later rounds of the draft (or for fewer auction dollars).
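Regression to the mean is easy to see with a simple shrinkage estimate. You blend a player's observed batting average with a league average and give the league average more weight when the sample is small. The sketch below is a generic illustration, not the accelerator's method. The league average of .250 and the 300 "phantom" at-bats are assumed values chosen only for this example.

```python
# Assumed constants for illustration only (not from the accelerator).
LEAGUE_AVG = 0.250  # assumed league-wide batting average
PRIOR_AB = 300      # assumed weight of the league average, in "phantom" at-bats


def regressed_avg(hits: int, at_bats: int,
                  league_avg: float = LEAGUE_AVG,
                  prior_ab: int = PRIOR_AB) -> float:
    """Shrink an observed batting average toward the league average.

    Small samples are pulled strongly toward league_avg; large samples
    stay closer to the player's own rate.
    """
    return (hits + league_avg * prior_ab) / (at_bats + prior_ab)


# The same .350 rate over 100 AB vs. 600 AB
print(round(regressed_avg(35, 100), 3))   # 0.275
print(round(regressed_avg(210, 600), 3))  # 0.317
```

The hot 100-at-bat stretch is pulled most of the way back toward the league average. The 600-at-bat season keeps more of its value because it is stronger evidence of true talent.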

When drafting players for fantasy baseball, you must base your decisions on each player's performance over their career to date, along with effects such as aging, changing positions, and changing teams. You will use DataRobot to produce better predictions of each player's performance next year, based on what they did in prior years and on patterns learned from similar players in the past.
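You can sketch the idea of predicting next season from prior seasons by hand. For each player-season, the snippet below builds features computed only from that player's earlier seasons: previous-season HR, career mean HR, and seasons played. It pairs them with that season's HR as the target. This is a minimal illustration of the kind of time-aware aggregation that Feature Discovery automates, not DataRobot's implementation. The rows and field names are made up.

```python
from collections import defaultdict

# Made-up toy player-season rows: (player, season, HR).
rows = [
    ("A", 2021, 20), ("A", 2022, 30), ("A", 2023, 25),
    ("B", 2022, 10), ("B", 2023, 18),
]


def build_training_rows(rows):
    """Attach features built only from each player's earlier seasons."""
    history = defaultdict(list)  # player -> HR totals from seasons seen so far
    out = []
    # Process seasons in chronological order per player so no future data leaks in.
    for player, season, hr in sorted(rows, key=lambda r: (r[0], r[1])):
        past = history[player]
        if past:  # a player's first season has no history to learn from
            out.append({
                "player": player,
                "season": season,
                "hr_prev_season": past[-1],            # most recent prior season
                "hr_career_mean": sum(past) / len(past),
                "seasons_played": len(past),
                "target_hr": hr,                       # value to predict
            })
        past.append(hr)
    return out


for row in build_training_rows(rows):
    print(row)
```

Each training row looks only backward in time, which mirrors the setup where you predict the upcoming season from what a player has already done.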

Learning objectives

  • How to query a rich dataset of MLB players' statistics from the Fangraphs API.
  • How to set up a project with automated time-aware feature engineering (Automated Feature Discovery).
  • How to update the player data in a Feature Discovery project (i.e., secondary data) to re-predict without building a new project.
  • How to loop over a project creation function to build many DataRobot projects automatically. In this case, you build one project (and model) for each of five common fantasy baseball stats: batting average (AVG), home runs (HR), runs (R), runs batted in (RBI), and stolen bases (SB). You could repeat the same process for pitching statistics.
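The per-stat loop in the last objective can be sketched as follows. `create_project` is a hypothetical placeholder that only records what would be built. In the notebook, that step would use the DataRobot Python client to upload the data, configure Feature Discovery, and start modeling for one target. The file name `hitters.csv` is also hypothetical.

```python
# The five hitting targets named in this accelerator.
TARGETS = ["AVG", "HR", "R", "RBI", "SB"]


def create_project(target: str, dataset_path: str) -> dict:
    """Hypothetical stand-in for the DataRobot project-creation step.

    A real version would call the DataRobot client here; this one just
    returns a description of the project it would create.
    """
    return {
        "name": f"fantasy_baseball_{target}",
        "target": target,
        "data": dataset_path,
    }


def build_all(dataset_path: str) -> dict:
    """Create one project per target and index them by target name."""
    return {target: create_project(target, dataset_path) for target in TARGETS}


projects = build_all("hitters.csv")  # hypothetical file name
for target, project in projects.items():
    print(target, project["name"])
```

Wrapping project creation in a single function means that adding pitching targets later is just a matter of extending the target list.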

Retrieve baseball data

This notebook uses the Python pybaseball package to retrieve player-season data from 2012 through 2023. In this workflow, the machine learning algorithm learns patterns from pre-COVID data as well as from the 2020 and 2021 seasons. Including these seasons helps show how well the top model copes with the shortened 2020 season, which was cut to 60 games.
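To see why the shortened 2020 season matters, you can put counting stats on a common scale. The sketch below prorates a counting stat to a 162-game pace using team games per season: 60 in 2020 and 162 otherwise. It only illustrates the scale gap. The notebook relies on the model to learn around the short season, and a player-level adjustment would more likely use games played or plate appearances. The `per_162` helper is hypothetical.

```python
FULL_SEASON_GAMES = 162
# Team games per season where it differs from the full schedule.
SEASON_GAMES = {2020: 60}  # 2020 regular season was shortened to 60 games


def per_162(stat: float, season: int) -> float:
    """Hypothetical helper: scale a counting stat to a 162-game pace."""
    games = SEASON_GAMES.get(season, FULL_SEASON_GAMES)
    return stat * FULL_SEASON_GAMES / games


print(per_162(15, 2020))  # 40.5
print(per_162(30, 2019))  # 30.0
```

Without some adjustment, 15 home runs in 2020 would look like half of a 30-homer season, even though the pace was higher.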

Fangraphs provides more than 300 features about hitters each season, ranging from basic statistics like batting average (AVG) and home runs (HR) to in-depth statistics like expected weighted on-base average (xwOBA) and barrel contact percentage (Barrel%). You will use DataRobot to sift through many of these features to find the ones that best signal future performance.

Updated September 28, 2023