Synthetic data

[CVPR 2026] Learning to Drive via Real-World Simulation at Scale
Generate coherent, synthetic data at scale
Using a deep research workflow to generate datasets for fine-tuning LLMs.
Cyber-Zero: Training Cybersecurity Agents Without Runtime
ACE-Step: A Step Towards Music Generation Foundation Model
Simulation Platform from AgiBot
JavaScript
13.7k
1.4k
A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval
Deep learning tools for peptide substrate prediction and generation
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
AI-native platform for tabular data generation via CLI, WebUI or app.
Jupyter Notebook
2.1k
206
A 15TB Collection of Physics Simulation Datasets
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Python
4.7k
349
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Official implementation of "En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data", CVPR 2024; 3D Avatar Generation and Animation
Database anonymization and synthetic data generation tool
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
SDG is a specialized framework designed to generate high-quality structured tabular data.
Generation of protein sequences and evolutionary alignments via discrete diffusion models