# Production ML with tables

> Production ML with tables - Review an AI accelerator that uses a repeatable framework for a
> production pipeline from multiple tables.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:09.581488+00:00` (UTC).

## Primary page

- [Production ML with tables](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/model-building-tuning/ml-tables.html): Full documentation for this topic (HTML).

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [Developer learning](https://docs.datarobot.com/en/docs/api/dev-learning/index.html): Linked from this page.
- [AI accelerators](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/index.html): Linked from this page.
- [Model building and fine-tuning](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/model-building-tuning/index.html): Linked from this page.

## Documentation content

[Access this AI accelerator on GitHub](https://github.com/datarobot-community/ai-accelerators/tree/main/use_cases_and_horizontal_approaches/Automated_Feature_Discovery_template_ML_pipeline/End-to-end%20Automated%20Feature%20Discovery%20Production%20Workflow.ipynb)

We've all been there: customer transaction data is in one table, but customer membership history is in another. Or you have sub-second sensor data in one table, machine errors in another, and production demand in yet another, all at different time frequencies. Electronic Medical Records (EMRs) are another common instance of this challenge. You have a business use case you want to explore, so you build a v0 dataset using simple aggregations over past data, perhaps in a feature store. But moving past v0 is hard.

The reality is that the hypothesis space of relevant features explodes when you consider multiple data sources with multiple data types. By dynamically exploring the feature space across tables, you reduce the risk of missing signal through feature omission and lessen the need for a priori knowledge of every possible relevant feature.
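To see why the hypothesis space explodes, consider a back-of-the-envelope count: every combination of a numeric column, an aggregation function, and a look-back window is one candidate feature. The table names and column counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical inputs: three secondary tables with a given number of numeric
# columns each, a handful of aggregation functions, and several look-back windows.
tables = {"transactions": 6, "memberships": 3, "sensor_readings": 8}
aggregations = ["sum", "mean", "min", "max", "std"]
windows_days = [1, 7, 30, 90]

# Every (column, aggregation, window) triple is one candidate feature.
candidates = sum(
    n_cols * len(aggregations) * len(windows_days) for n_cols in tables.values()
)
print(candidates)  # 17 columns * 5 aggregations * 4 windows = 340 candidates
```

Even this small example yields hundreds of candidates before considering categorical columns, join paths, or interactions, which is why exhaustive manual feature design rarely scales.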

Event-based data is present in every vertical and is becoming increasingly common across industries. Building the right features can drastically improve model performance, but determining which joins and time horizons best suit your data is challenging, time-consuming, and error-prone to explore by hand.
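The core mechanic behind such features can be sketched in plain Python: aggregate a secondary event table over several look-back windows, keyed to a primary-table cutoff date, while excluding events at or after the cutoff to avoid target leakage. The table layout, column names, and window choices here are illustrative assumptions, not the accelerator's actual schema.

```python
from datetime import datetime, timedelta

# Hypothetical event table: (customer_id, timestamp, amount).
transactions = [
    ("c1", datetime(2024, 1, 1), 10.0),
    ("c1", datetime(2024, 1, 20), 25.0),
    ("c1", datetime(2024, 1, 28), 5.0),
    ("c2", datetime(2024, 1, 27), 40.0),
]

def window_sum(events, customer_id, cutoff, days):
    """Sum of amounts in the (cutoff - days, cutoff) window. Events at or
    after the cutoff are excluded so no future information leaks in."""
    start = cutoff - timedelta(days=days)
    return sum(
        amount
        for cid, ts, amount in events
        if cid == customer_id and start < ts < cutoff
    )

cutoff = datetime(2024, 2, 1)
features = {
    f"amount_sum_{d}d": window_sum(transactions, "c1", cutoff, d)
    for d in (7, 30)
}
print(features)  # {'amount_sum_7d': 5.0, 'amount_sum_30d': 30.0}
```

Automated Feature Discovery performs this kind of time-aware aggregation in the database across many columns, functions, and windows at once; the point of the sketch is only the leakage-safe window boundary.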

In this accelerator, you'll find a repeatable framework for building a production pipeline from multiple tables. The code uses Snowflake as the data source, but it can be extended to any supported database. Specifically, the accelerator provides a template to:

- Build time-aware features across multiple historical time-windows and datasets using DataRobot and multiple tables in Snowflake (or any database).
- Build and evaluate multiple feature engineering approaches and algorithms for all data types.
- Extract insights and identify the best feature engineering and modeling pipeline.
- Test predictions locally.
- Deploy the best-performing model, along with all data preprocessing and feature engineering, in a Docker container, and expose it as a REST API.
- Score from Snowflake and write predictions back to Snowflake.
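The last two steps follow a common score-and-write-back pattern: serialize feature rows into a request body, send it to the containerized model's REST endpoint, and merge the returned predictions back onto the source rows before writing them to the database. The request/response shapes, field names, and the stubbed model below are assumptions for illustration, not the accelerator's actual REST contract.

```python
import json

def build_payload(rows):
    """Serialize feature rows into a JSON request body."""
    return json.dumps({"data": rows})

def fake_predict(payload):
    """Stand-in for the deployed model's REST endpoint: returns one
    score per input row (here, simply 0.1 per field as a dummy model)."""
    rows = json.loads(payload)["data"]
    return {"predictions": [0.1 * len(row) for row in rows]}

def merge_predictions(rows, response):
    """Attach each prediction to its source row before writing back."""
    return [
        {**row, "prediction": pred}
        for row, pred in zip(rows, response["predictions"])
    ]

rows = [
    {"customer_id": "c1", "amount_sum_30d": 30.0},
    {"customer_id": "c2", "amount_sum_30d": 40.0},
]
scored = merge_predictions(rows, fake_predict(build_payload(rows)))
print(scored[0])  # original row plus a 'prediction' key
```

In the real pipeline, `fake_predict` would be an HTTP call to the Docker-hosted API and the scored rows would be written back to a Snowflake table; the merge step is what keeps each prediction aligned with its source record.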
