Audio processing

New 2026

MOSS-TTS-Nano is an open-source, tiny multilingual speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for real-time speech generation and can run directly on a CPU without a GPU. Its deployment stack stays simple enough for local demos, web serving, and lightweight product integration.

New 2026

Fine-tune Gemma 4 and Gemma 3n with audio, images, and text on Apple Silicon, using PyTorch and Metal Performance Shaders (MPS).

New 2026

Turn any content into a personalized AI podcast. NotebookLM-style, except you control the script, voices, and hosts. Listen in Apple Podcasts, Spotify, or any podcast app.

New 2026

Local meeting transcription → Obsidian vault. No cloud, no API keys.

New 2026

100% private on-device voice models for speech-to-text and meeting transcription on macOS

New 2026

Transcribe microphone and computer audio to markdown.

New 2026

Agentic Hours-Long Video Editing via Music Synchronization

New 2026

Every meeting, every idea, every voice note — searchable by your AI. Open-source, privacy-first conversation memory layer.

New 2026

A personal Rust project that aims to help users understand foreign languages better. It uses VAD + Whisper to transcribe speech, then translates the transcript according to a custom dictionary.

New 2026

Fully local meeting transcription with speaker diarization, AI summaries, and PDF output

New 2026

The Infinite Crate is a DAW plugin built on JUCE, React, and the Lyria RealTime live music model

New 2026

npm library to transcribe audio and video entirely in the browser with WebGPU and WebCodecs. 100% private and offline, with WASM fallbacks.

New 2026

Real-time transcription and AI assistant for Meta Ray-Ban smart glasses. Live speech-to-text, speaker diarization, Gemini Live vision+voice, and WebRTC streaming.

New 2026

Muesli - local meeting transcription + dictation for macOS (Granola + WisprFlow alternative)

New 2026

YouTube Music client for Android

New 2026

A SOTA, industrial-grade voice activity detection (VAD) and audio event detection model supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD

New 2026

A beautiful Windows program for analyzing audio quality — detects fake lossless, clipping, MQA, and AI-generated audio; includes a spectrogram viewer and more. Built-in player with EQ and spatial audio.

New 2026

macOS desktop speech-to-text. Private, offline, and open-source. Choose your own STT (speech-to-text) providers and the LLM providers that polish your mumbles. Bind any prompt template to any shortcut.