Synthetic data

New 2026

Synthetic datasets, experiment protocols, and evaluation code for "Governed Memory: A Production Architecture for Multi-Agent Workflows"


Generate realistic multi-agent workflow traces with LLM-enriched content, semantic validation, and PM4Py compatibility. Install with: pip install open-agent-traces


[SIGGRAPH 2026] SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking

Reverse engineering Gemini's SynthID detection

[CVPR 2026 Oral] Learning to Drive via Real-World Simulation at Scale

🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.

Generate coherent, synthetic data at scale

Uses a deep-research workflow to generate datasets for fine-tuning LLMs.

Cyber-Zero: Training Cybersecurity Agents Without Runtime

ACE-Step: A Step Towards Music Generation Foundation Model

Simulation Platform from AgiBot

A powerful tool for creating datasets for LLM fine-tuning, RAG, and evaluation

Deep learning tools for peptide substrate prediction and generation

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

AI-native platform for tabular data generation via CLI, WebUI or app.

A 15TB Collection of Physics Simulation Datasets

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

A lightweight library for generating synthetic instruction-tuning datasets from your own data without GPT.

Official implementation of "En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data", CVPR 2024; 3D Avatar Generation and Animation