Voice
Gen Audio
- Stability AI
- FunAudioLLM - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
- GitHub: https://github.com/FunAudioLLM
Instant voice cloning
Text to Speech (TTS)
- ChatTTS
- MARS 5
- edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from within your Python code or through the provided edge-tts and edge-playback commands.
- fish-speech
- 雅婷智慧 (Yating, by Taiwan AI Labs)
- Qwen-TTS (Tongyi Qianwen TTS)
- VibeVoice (Microsoft)
- Pocket TTS
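The two ways of using edge-tts mentioned above (from Python code, or through the edge-tts command) can be sketched roughly as follows. This is a minimal sketch, assuming the `edge-tts` package is installed, network access is available, and the voice `en-US-AriaNeural` exists in the service's voice list. The helper names `build_edge_tts_command`, `run_cli`, and `synthesize` are hypothetical and chosen only for illustration.

```python
import subprocess


def build_edge_tts_command(text, out_path, voice="en-US-AriaNeural"):
    """Return the argv list for the edge-tts command-line tool."""
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path]


def run_cli(text, out_path, voice="en-US-AriaNeural"):
    """Command-line route: invoke the edge-tts executable (must be on PATH)."""
    subprocess.run(build_edge_tts_command(text, out_path, voice), check=True)


async def synthesize(text, out_path, voice="en-US-AriaNeural"):
    """Python route: call the library directly (needs network access)."""
    # Third-party import kept local so the helpers above work without it.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
```

For example, `run_cli("Hello, world!", "hello.mp3")` would write an MP3 through the command-line tool, while `asyncio.run(synthesize("Hello, world!", "hello.mp3"))` would do the same through the library.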
ASR - Automatic Speech Recognition
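- FrogBase - OpenAI-based video transcript generation and translation
- InstantID - Text-to-image AI for generating personalized, stylized portraits
- WhisperDesktop - Generates subtitles and transcripts from video; Windows only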
- OpenAI Whisper
- Whisper WebUI - Web-based user interface for Whisper
- WhisperX - Batched inference for roughly 70x real-time transcription with whisper large-v2
- Faster Whisper (faster-whisper) - Faster than OpenAI Whisper, with lower resource usage
- Vosk - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
- Handy - A free, open source, and extensible speech-to-text application that works completely offline.
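Several of the tools above turn audio or video into transcripts and subtitles. As a rough illustration of that workflow, the sketch below transcribes a file with OpenAI Whisper and renders the result as SRT subtitles. It assumes the `openai-whisper` package and `ffmpeg` are installed. The helper names `format_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are hypothetical, and the model size `base` is an arbitrary choice.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render Whisper-style segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(media_path, model_name="base"):
    """Transcribe a media file with OpenAI Whisper and return SRT text."""
    # Third-party import kept local so the formatting helpers work without it.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    return segments_to_srt(result["segments"])
```

The formatting helpers are pure Python, so they can be reused with any engine that yields segments with `start`, `end`, and `text` fields; only `transcribe_to_srt` depends on Whisper itself.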
MTK Breeze 3
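MediaTek Research has released the new MediaTek Research Breeze 3 series (hereafter MR Breeze 3), which includes the Taiwanese (台語) speech recognition model Breeze ASR 26, the Taiwanese speech synthesis model BreezyVoice 26, and Breeze Guard 26, an AI content safety model designed specifically for Taiwan.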