AI Voice
Gen Audio
- Stability AI
- FunAudioLLM - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
- GitHub: https://github.com/FunAudioLLM
Instant voice cloning
Text to Speech (TTS)
- ChatTTS
- MARS 5
- edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from within your Python code, or via the provided edge-tts and edge-playback commands.
- fish-speech
- Yating AI (雅婷智慧), Taiwan AI Labs
- Qwen-TTS (Tongyi Qianwen TTS)
- VibeVoice (Microsoft)
- Pocket TTS
- Kitten TTS - Lightweight and runs without a GPU; does not support Chinese yet.
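For the edge-tts entry above, here is a minimal sketch of how the edge-tts command could be assembled from Python. The helper name `build_edge_tts_command`, the default voice `en-US-AriaNeural`, and the output file name are illustrative assumptions; the `--voice`, `--text`, and `--write-media` flags are used as I understand them from the project's README, so check `edge-tts --help` before relying on them. The sketch only builds the command and does not call the online service.

```python
import shlex


def build_edge_tts_command(text, voice="en-US-AriaNeural", out_path="hello.mp3"):
    """Return the argument list for an edge-tts CLI call (hypothetical helper).

    The voice and output path defaults are assumptions for illustration only.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return [
        "edge-tts",
        "--voice", voice,           # which Edge voice to synthesize with
        "--text", text,             # the text to speak
        "--write-media", out_path,  # where the audio file is written
    ]


cmd = build_edge_tts_command("Hello, world!")
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

Once edge-tts is installed, the list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems when the text contains spaces or punctuation.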
Automatic Speech Recognition (ASR)
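- FrogBase - OpenAI-based video transcript generation and translation
- InstantID - Text-to-image AI for generating personalized, stylized portraits
- WhisperDesktop - Generates subtitles and transcripts from video; Windows only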
- OpenAI Whisper
- Whisper WebUI - Web-based user interface
- WhisperX - Batched inference for up to 70x realtime transcription with Whisper large-v2
- Fast Whisper - Faster than OpenAI Whisper, with lower resource usage
- Vosk - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
- Handy - A free, open source, and extensible speech-to-text application that works completely offline.
MTK Breeze 3
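MediaTek Research has released the new MediaTek Research Breeze 3 series (hereafter MR Breeze 3). It includes the Taiwanese-language speech recognition model Breeze ASR 26, the Taiwanese-language speech synthesis model BreezyVoice 26, and Breeze Guard 26, an AI content safety model designed specifically for Taiwan.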
VibeVoice (Microsoft)
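Microsoft VibeVoice is a family of open-source speech AI models covering TTS (text to speech) and ASR (speech recognition). Its core innovation is a continuous speech tokenizer that runs at an ultra-low frame rate of 7.5 Hz, paired with a next-token diffusion framework. This lets it generate up to 90 minutes of multi-speaker conversational speech in a single pass, or transcribe up to 60 minutes of long-form audio. TTS supports multilingual synthesis with up to 4 speakers; ASR produces a structured transcript that combines speaker, timestamps, and content.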
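To see why the 7.5 Hz frame rate matters for long audio, the arithmetic can be sketched as follows. The function name `frames_for` is hypothetical, and the 50 Hz comparison rate is an illustrative assumption, not a figure from VibeVoice; only the 7.5 Hz rate and the 90 and 60 minute durations come from the description above.

```python
def frames_for(minutes, frame_rate_hz=7.5):
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    seconds = minutes * 60
    return round(seconds * frame_rate_hz)


tts_frames = frames_for(90)  # longest single-pass TTS generation
asr_frames = frames_for(60)  # longest ASR input

# Assumed comparison point: a tokenizer running at 50 Hz (illustrative only).
baseline_frames = frames_for(90, frame_rate_hz=50)

print(f"90 min at 7.5 Hz: {tts_frames} frames")
print(f"60 min at 7.5 Hz: {asr_frames} frames")
print(f"90 min at 50 Hz:  {baseline_frames} frames")
```

At 7.5 Hz, 90 minutes of audio comes to 40,500 frames and 60 minutes to 27,000, so the sequence a model has to handle for long-form audio stays far shorter than it would at the assumed 50 Hz rate (270,000 frames for 90 minutes).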