Voice AI

Gen Audio

Stability AI Stable Audio
  HF: https://huggingface.co/stabilityai/stable-audio-open-1.0
  Stability AI Launches Open-Source Model to Generate Audio (itsfoss.com)
FunAudioLLM - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
  GitHub: https://github.com/FunAudioLLM

Instant voice cloning

OpenVoice

Text to Speech (TTS)

ChatTTS
  6drf21e/ChatTTS_colab: 🚀 One-click deployment (offline bundle included). Built on ChatTTS, it supports random voice sampling, long-form audio generation, and multi-role reading. Easy to use, with no complicated installation. (github.com)
MARS 5
  GitHub: https://github.com/Camb-ai/MARS5-TTS
  HF: https://huggingface.co/CAMB-AI/MARS5-TTS
edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from within your Python code, or through the provided edge-tts and edge-playback commands.
  GitHub: https://github.com/rany2/edge-tts
fish-speech
  GitHub: https://github.com/fishaudio/fish-speech
Yating (雅婷智慧), from Taiwan AI Labs (台灣人工智慧實驗室)
  https://github.com/TaiwanAILabs-Yating
Qwen-TTS (Tongyi Qianwen TTS, 通義千問 TTS)
  Qwen3-TTS vs ElevenLabs : Voice Cloning & Real-Time Streaming - Geeky Gadgets
VibeVoice (Microsoft)
  HF: https://huggingface.co/microsoft/VibeVoice-1.5B
Pocket TTS
Kitten TTS - Lightweight and needs no GPU; does not support Chinese yet.

ASR - Automatic Speech Recognition

FrogBase - OpenAI video transcript generation and translation
InstantID - Text-to-image AI for generating avatars in a personal style
WhisperDesktop - Generates subtitles and transcripts from video; Windows only
  [Video] Portable Whisper: runs without installation | greatly reduced hardware requirements | written in C++, no extra libraries to install
OpenAI Whisper
Whisper WebUI - Web-based interface
WhisperX - Reported to reach about 70x real-time transcription speed with whisper large-v2
Fast Whisper - Faster than OpenAI Whisper, with lower resource consumption
Vosk - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Handy - A free, open source, and extensible speech-to-text application that works completely offline.
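
MTK Breeze 3
  MediaTek Research (聯發創新基地) has released the new MediaTek Research Breeze 3 series (MR Breeze 3 for short). It includes the Taiwanese speech recognition model Breeze ASR 26, the Taiwanese speech synthesis model BreezyVoice 26, and Breeze Guard 26, an AI content-safety model designed specifically for Taiwan.
  MediaTek Research Breeze 3: Letting AI Understand Taiwanese, Speak with a Taiwanese Flavor, and Protect Taiwan

VibeVoice (Microsoft)
  Microsoft VibeVoice is an open-source family of speech AI models covering TTS (text to speech) and ASR (speech recognition). Its core innovation is a continuous speech tokenizer that runs at an ultra-low frame rate of 7.5 Hz, paired with a next-token diffusion framework. With this design it can generate up to 90 minutes of multi-speaker conversational speech in a single pass, or recognize up to 60 minutes of long-form audio. The TTS models support multilingual synthesis with up to 4 speakers; the ASR model produces a structured transcript that carries speaker, timestamps, and content together.
  https://github.com/microsoft/VibeVoice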
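
To get a feel for why the 7.5 Hz frame rate matters for long audio, here is a rough sketch of the arithmetic. It assumes exactly one tokenizer frame per 1/7.5 second of audio and ignores text tokens and any other overhead; the function name frames_for is made up for this note and is not part of VibeVoice.

```python
# Back-of-the-envelope frame counts for a 7.5 Hz speech tokenizer.
# Assumption: one frame per 1/7.5 s of audio, nothing else counted.
FRAME_RATE_HZ = 7.5  # frames per second, taken from the description above


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of tokenizer frames covering `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


if __name__ == "__main__":
    # The two durations quoted in the description above.
    for label, minutes in [("TTS, 90 min", 90), ("ASR, 60 min", 60)]:
        print(f"{label}: {frames_for(minutes):,} frames")
```

Under these assumptions, 90 minutes of speech is 40,500 frames and 60 minutes is 27,000 frames. For comparison, a hypothetical 50 Hz tokenizer (a rate chosen only for illustration) would need frames_for(90, frame_rate_hz=50) = 270,000 frames for the same 90 minutes.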