Instant voice cloning by MIT and MyShell. Audio foundation model.
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Just 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.
A lightweight coding agent for open models like DeepSeek, Kimi, and Qwen
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
📚 Freely available programming books
Machine Learning Engineering Open Book
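Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multilingual libraries.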
Official implementation of the paper "AnyText: Multilingual Visual Text Generation And Editing"
A collective list of free APIs
OCR, layout analysis, reading order, table recognition in 90+ languages
Question and Answer based on Anything.
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Stable Diffusion web UI
An opinionated list of Python frameworks, libraries, tools, and resources
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
The agent engineering platform.
Run Mixtral-8x7B models in Colab or on consumer desktops
Code and dataset for photorealistic Codec Avatars driven from audio
🏛️ Diagram as Code for prototyping cloud system architectures