Voice

Gen Audio

Stability AI
FunAudioLLM - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
- GitHub: https://github.com/FunAudioLLM

Instant voice cloning

OpenVoice

Text to Speech (TTS)

ChatTTS
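- 6drf21e/ChatTTS_colab: 🚀 One-click deployment (offline bundle included)! Built on ChatTTS, with support for random voice sampling, long-audio generation, and multi-role reading. Easy to use, with no complicated installation. (github.com)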
MARS 5
- GitHub: https://github.com/Camb-ai/MARS5-TTS
- HF: https://huggingface.co/CAMB-AI/MARS5-TTS
edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from within your Python code or through the provided edge-tts and edge-playback commands.
- GitHub: https://github.com/rany2/edge-tts
fish-speech
- GitHub: https://github.com/fishaudio/fish-speech
Yating (雅婷智慧), by Taiwan AI Labs (台灣人工智慧實驗室)
- GitHub: https://github.com/TaiwanAILabs-Yating
Qwen-TTS (通義千問 TTS, Tongyi Qianwen TTS)
- Qwen3-TTS vs ElevenLabs: Voice Cloning & Real-Time Streaming (Geeky Gadgets)

VibeVoice (Microsoft)
- HF: https://huggingface.co/microsoft/VibeVoice-1.5B
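
As a minimal sketch of the command-line route mentioned for edge-tts above, the helper below assembles an `edge-tts` invocation and runs it only if the tool is installed. The `--text`, `--voice`, and `--write-media` options follow the project README as far as I know; the helper names, the sample voice, and the output file name are illustrative assumptions, so check `edge-tts --help` against your installed version.

```python
import shutil
import subprocess


def build_edge_tts_command(text, out_path, voice=None):
    """Assemble an edge-tts CLI call (hypothetical helper, not part of edge-tts)."""
    cmd = ["edge-tts", "--text", text, "--write-media", out_path]
    if voice is not None:
        # Voice names such as "en-US-AriaNeural" can be listed with `edge-tts --list-voices`.
        cmd += ["--voice", voice]
    return cmd


def synthesize(text, out_path, voice=None):
    """Run edge-tts if it is on PATH; return True when the command was executed."""
    if shutil.which("edge-tts") is None:
        return False  # edge-tts is not installed (pip install edge-tts)
    subprocess.run(build_edge_tts_command(text, out_path, voice), check=True)
    return True


if __name__ == "__main__":
    # Only prints the command; call synthesize(...) to actually produce audio.
    print(build_edge_tts_command("Hello, world!", "hello.mp3", "en-US-AriaNeural"))
```

Building the argument list separately from running it keeps the sketch usable on machines without network access or without edge-tts installed.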

ASR - Automatic Speech Recognition

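FrogBase - Video transcript generation and translation using OpenAI models
InstantID - Text-to-image AI for generating avatars in a personal style
WhisperDesktop - Generates subtitles and transcripts from video; Windows only
- [Video] Portable Whisper: usable without installation | greatly reduced hardware requirements | written in C++, no extra libraries to install
OpenAI Whisper
Whisper WebUI - Web-based user interface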
WhisperX - Reported to reach about 70x real-time transcription with Whisper large-v2
Faster Whisper - Faster than OpenAI Whisper, with lower resource usage
Vosk - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Handy - A free, open source, and extensible speech-to-text application that works completely offline.
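
Several of the tools above turn speech into subtitles. As a rough sketch of that final step, the function below formats timestamped segments as SubRip (SRT) text. It assumes each segment is a dict with `start` and `end` in seconds and a `text` string, which is the shape OpenAI Whisper's `transcribe()` returns under its `"segments"` key; the function names and the sample data are hypothetical.

```python
def srt_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render a list of {'start', 'end', 'text'} dicts as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        {"start": 0.0, "end": 1.25, "text": " Hello there."},
        {"start": 1.25, "end": 3.0, "text": " This is a test."},
    ]
    print(segments_to_srt(sample))
```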
