Voice
Gen Audio
- Stability AI
- FunAudioLLM - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
- GitHub: https://github.com/FunAudioLLM
Instant voice cloning
Text to Speech (TTS)
- ChatTTS
- MARS 5
- edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from Python code or through the provided edge-tts and edge-playback commands.
- fish-speech
- Yating AI (雅婷智慧) - Taiwan AI Labs
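
As a minimal sketch of the two ways edge-tts is described as being usable (from Python code and from the command line), the snippet below wraps both. The helper names `synthesize` and `cli_command`, the sample voice `en-US-AriaNeural`, and the output file name are assumptions chosen for illustration; the library itself must be installed separately (`pip install edge-tts`) and needs network access to reach the online service.

```python
import asyncio


async def synthesize(text: str, voice: str, output_path: str) -> None:
    """Send text to the online TTS service and save the audio to a file."""
    # Imported lazily so the rest of this sketch works without the package.
    import edge_tts  # third-party: pip install edge-tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)


def cli_command(text: str, voice: str, output_path: str) -> list[str]:
    """Build the equivalent edge-tts command line as an argument list."""
    return [
        "edge-tts",
        "--voice", voice,
        "--text", text,
        "--write-media", output_path,
    ]


# Example call (voice name and file name are assumptions; requires network):
# asyncio.run(synthesize("Hello from edge-tts.", "en-US-AriaNeural", "hello.mp3"))
```

The argument list returned by `cli_command` can be passed to `subprocess.run`; running `edge-tts --list-voices` should show which voice names are available.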
ASR - Automatic Speech Recognition
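- FrogBase - OpenAI-based video transcript generation and translation
- InstantID - Text-to-image AI for generating personalized, stylized portraits
- WhisperDesktop - Generates subtitles and transcripts from video; Windows only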
- OpenAI Whisper
- Whisper WebUI - Web-based user interface
- WhisperX - Batched Whisper inference, reported at up to 70x real-time speed with the large-v2 model
- Fast Whisper - Faster than OpenAI Whisper, with lower resource usage
- Vosk - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
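
Several of the tools above turn video or audio into subtitles and transcripts. A rough sketch of that workflow with the OpenAI Whisper Python package might look like the following. The helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) and the default model size `"base"` are assumptions for illustration; the package is installed separately (`pip install openai-whisper`) and also needs ffmpeg available on the system.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Convert segments with 'start', 'end', and 'text' keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a media file with Whisper and return SRT-formatted subtitles."""
    # Imported lazily so the formatting helpers work without the package.
    import whisper  # third-party: pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("talk.mp4")` (a hypothetical file name) would download the model on first use and return the subtitle text, which can then be written to a `.srt` file.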
Translator
- OpenAI Translator - A translation browser extension based on the ChatGPT API; works in Chrome and Edge