Voice AI

Gen Audio 
 
 Stability AI 
 
 Stable Audio 
 HF: https://huggingface.co/stabilityai/stable-audio-open-1.0   
 Stability AI Launches Open-Source Model to Generate Audio (itsfoss.com) 
 
 
 FunAudioLLM - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
 
 GitHub: https://github.com/FunAudioLLM   
 
 
 
 Instant voice cloning 
 
 OpenVoice 
 
 Text to Speech (TTS) 
 
 ChatTTS 
 
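 6drf21e/ChatTTS_colab: 🚀 One-click deployment (offline bundle included)! Built on ChatTTS, with support for random voice sampling, long-form audio generation, and multi-role reading. Easy to use, no complicated installation required. (github.com) 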
 
 
 MARS 5
 
 GitHub: https://github.com/Camb-ai/MARS5-TTS   
 HF: https://huggingface.co/CAMB-AI/MARS5-TTS   
 
 
 edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from within your Python code, or from the command line via the provided edge-tts and edge-playback commands.
 
 GitHub: https://github.com/rany2/edge-tts   
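
 A minimal sketch of calling edge-tts from Python, assuming the package is installed (`pip install edge-tts`) and the machine has network access. The helper name `synthesize`, the default voice `en-US-AriaNeural`, and the output path are illustrative choices, not anything prescribed by these notes.

```python
async def synthesize(text: str, out_path: str, voice: str = "en-US-AriaNeural") -> None:
    """Send `text` to the Edge online TTS service and save the audio to `out_path`."""
    # Imported inside the function so this sketch can be loaded even when
    # edge-tts is not installed; the import only runs when the helper is awaited.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
```

 To run it, something like `asyncio.run(synthesize("Hello from edge-tts.", "hello.mp3"))` should write an MP3 file. The command-line equivalent is along the lines of `edge-tts --text "Hello from edge-tts." --write-media hello.mp3`.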
 
 
 fish-speech 
 
 GitHub: https://github.com/fishaudio/fish-speech   
 Overview - Fish Audio 
 👍 TTS upgraded! Fish Audio 1.5 local deployment 🟢 Excellent results and faster speed, come deploy it and try it out! 🟢 牛哥AI实验室 NIUGEE AI (119) - YouTube 
 
 
 Yating (雅婷智慧), Taiwan AI Labs
 
 https://github.com/TaiwanAILabs-Yating 

 
 
 
 Qwen-TTS (Tongyi Qianwen TTS)
 
 Qwen3-TTS vs ElevenLabs : Voice Cloning & Real-Time Streaming - Geeky Gadgets 
 
 
 VibeVoice (Microsoft)
 
 HF: https://huggingface.co/microsoft/VibeVoice-1.5B   
 
 
 Pocket TTS   
 Kitten TTS - Lightweight, no GPU required; does not support Chinese yet. 
 Voice cloning
 
 GPT-SoVITS-WebUI - A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI. 
 Voice Cloning - Fish Audio 
 
 
 
 ASR - Automatic Speech Recognition 
 
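 FrogBase  - OpenAI-based video transcript generation and translation 
 InstantID  - Text-to-image AI for generating personalized, stylized portraits 
 WhisperDesktop  - Generates subtitles and transcripts from video; Windows only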
 
 [Video]  Portable Whisper: usable without installation | Greatly reduced hardware requirements | Written in C++, no extra libraries to install 
 
 
 OpenAI Whisper 
 Whisper WebUI - Web-based user interface 
 WhisperX - Batched inference, up to 70x real-time transcription with whisper large-v2 
 Fast Whisper - Faster than OpenAI Whisper, with lower resource usage 
 Vosk - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node 
 Handy - A free, open source, and extensible speech-to-text application that works completely offline. 
 
 MTK Breeze 3 
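 MediaTek Research has released the new MediaTek Research Breeze 3 series (hereafter MR Breeze 3), comprising the Taiwanese speech recognition model Breeze ASR 26, the Taiwanese speech synthesis model BreezyVoice 26, and Breeze Guard 26, an AI content safety model designed specifically for Taiwan. 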
 
 MediaTek Research Breeze 3: Helping AI understand Taiwanese, speak with a Taiwanese flavor, and safeguard Taiwan 
 MediaTek-Research/Breeze-ASR-25 · Hugging Face 
 
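 VibeVoice (Microsoft) 
 Microsoft VibeVoice is an open-source family of voice AI models covering TTS (text-to-speech) and ASR (automatic speech recognition). Its core innovation is a continuous speech tokenizer running at an ultra-low 7.5 Hz frame rate, paired with a next-token diffusion framework. This lets it generate up to 90 minutes of multi-speaker conversational speech in a single pass, or transcribe 60 minutes of long-form audio. TTS supports multilingual synthesis with up to 4 speakers; ASR produces structured transcripts that capture speaker, timestamps, and content together. 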
 
 https://github.com/microsoft/VibeVoice   
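
 To put the 7.5 Hz frame rate in perspective, the short calculation below counts how many tokenizer frames cover the 90-minute and 60-minute limits quoted above. It is back-of-the-envelope arithmetic only: the 50 Hz comparison rate is an assumed figure for illustration, not taken from the VibeVoice documentation, and the function name `frames_for` is hypothetical.

```python
FRAME_RATE_HZ = 7.5        # tokenizer frame rate quoted in the description above
COMPARISON_RATE_HZ = 50.0  # assumed comparison rate, for illustration only


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


tts_frames = frames_for(90)  # 90 min * 60 s * 7.5 Hz = 40500 frames
asr_frames = frames_for(60)  # 60 min * 60 s * 7.5 Hz = 27000 frames
comparison_frames = frames_for(90, COMPARISON_RATE_HZ)  # 270000 frames at 50 Hz

print(f"90 min at 7.5 Hz: {tts_frames} frames")
print(f"60 min at 7.5 Hz: {asr_frames} frames")
print(f"90 min at 50 Hz:  {comparison_frames} frames")
```

 At 7.5 Hz, 90 minutes of audio fits in about 40,500 frames, which helps explain how such a long session can be handled as one sequence.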
 
 Meetily 
 A privacy-first AI meeting assistant and note-taking tool 
 
 Meetily (Meetly AI) - Privacy-First AI Meeting Assistant | Otter.ai & Granola Alternative 
 https://github.com/Zackriya-Solutions/meetily