# Synthetic training data

> Synthetic training data - Learn how to generate synthetic datasets that mimic real-world data for
> training, validation, and testing—enabling safe data sharing and model development when access to
> real data is limited due to privacy or regulatory constraints.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:09.576878+00:00` (UTC).

## Primary page

- [Synthetic training data](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/data-enrichment-prep/synth-data.html): Full documentation for this topic (HTML).

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [Developer learning](https://docs.datarobot.com/en/docs/api/dev-learning/index.html): Linked from this page.
- [AI accelerators](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/index.html): Linked from this page.
- [Data enrichment and preparation](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/data-enrichment-prep/index.html): Linked from this page.

## Documentation content

[Access this AI accelerator on GitHub](https://github.com/datarobot-community/ai-accelerators/blob/main/advanced_ml_and_api_approaches/dr_synth_data/dr_synth_data.ipynb)

This notebook provides a code-first accelerator that helps you generate synthetic datasets in tabular format. The synthetic data mimics the structure and statistical properties of real-world datasets, offering a safe and efficient way to augment existing data or create entirely new datasets. You can upload the generated datasets directly to AI Catalog, where they can be organized, managed, and reused across machine learning projects.

This approach is particularly useful when access to real data is limited by privacy, security, or regulatory constraints. By generating synthetic datasets, you can expand your training data without exposing sensitive information. You can use the synthetic datasets for model training, validation, and testing, which supports more robust model development and better generalization to unseen data.

The notebook outlines how to create a synthetic training dataset as a CSV file with the following columns: name, address, phone number, company, account number, and credit score.
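To make the shape of such a dataset concrete, here is a minimal sketch that uses only the Python standard library. It is not the notebook's implementation. The column names, the small lookup lists, the `555` phone prefix, the 10-digit account number, and the 300 to 850 credit score range are all illustrative assumptions, and the helper names `make_record`, `generate_rows`, and `write_csv` are hypothetical.

```python
import csv
import io
import random

# Illustrative value pools (assumptions, not taken from the notebook).
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Sam", "Taylor"]
LAST_NAMES = ["Garcia", "Johnson", "Kim", "Nguyen", "Patel", "Smith"]
STREETS = ["Main St", "Oak Ave", "Pine Rd", "Maple Dr", "Cedar Ln"]
CITIES = ["Springfield", "Riverton", "Fairview", "Lakeside"]
COMPANIES = ["Acme Corp", "Globex", "Initech", "Umbrella LLC"]

# Column order for the CSV header (names are assumptions).
FIELDS = [
    "name",
    "address",
    "phone_number",
    "company",
    "account_number",
    "credit_score",
]


def make_record(rng):
    """Build one fake record from the value pools above."""
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "address": f"{rng.randint(1, 9999)} {rng.choice(STREETS)}, {rng.choice(CITIES)}",
        "phone_number": f"555-{rng.randint(0, 999):03d}-{rng.randint(0, 9999):04d}",
        "company": rng.choice(COMPANIES),
        # Zero-padded 10-digit string so leading zeros survive.
        "account_number": f"{rng.randrange(10**10):010d}",
        # Assumed score range of 300 to 850, inclusive.
        "credit_score": rng.randint(300, 850),
    }


def generate_rows(n, seed=0):
    """Return n records; the seed makes the output reproducible."""
    rng = random.Random(seed)
    return [make_record(rng) for _ in range(n)]


def write_csv(rows, fileobj):
    """Write the records to any text file object as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Preview three rows in memory instead of touching the file system.
preview = io.StringIO()
write_csv(generate_rows(3), preview)
print(preview.getvalue())
```

To produce a file rather than a preview, pass a handle opened with `open("synthetic_training_data.csv", "w", newline="")` to `write_csv`; the file name here is only an example. Values drawn independently from fixed pools like this do not reproduce the correlations found in real data, so treat the sketch as a structural placeholder rather than a statistically faithful generator.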