# Voice AI

##### Gen Audio

- [Stability AI](https://stability.ai/)
    - [Stable Audio](https://www.stableaudio.com/)
    - HF: [https://huggingface.co/stabilityai/stable-audio-open-1.0](https://huggingface.co/stabilityai/stable-audio-open-1.0)
    - [Stability AI Launches Open-Source Model to Generate Audio (itsfoss.com)](https://news.itsfoss.com/stability-ai-open-audio/)
- [FunAudioLLM](https://fun-audio-llm.github.io/) - Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs 
    - GitHub: [https://github.com/FunAudioLLM](https://github.com/FunAudioLLM)

##### Instant voice cloning

- [OpenVoice](https://github.com/myshell-ai/OpenVoice)

##### Text to Speech (TTS)

- [ChatTTS](https://github.com/2noise/ChatTTS)
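    - [6drf21e/ChatTTS\_colab: 🚀 One-click deployment (offline bundle included)! Built on ChatTTS, with random voice sampling, long-audio generation, and multi-role narration. Simple to use, no complicated installation. (github.com)](https://github.com/6drf21e/ChatTTS_colab)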
- MARS 5 
    - GitHub: [https://github.com/Camb-ai/MARS5-TTS](https://github.com/Camb-ai/MARS5-TTS)
    - HF: [https://huggingface.co/CAMB-AI/MARS5-TTS](https://huggingface.co/CAMB-AI/MARS5-TTS)
- edge-tts - A Python module that lets you use Microsoft Edge's online text-to-speech service from Python code or through the provided `edge-tts` and `edge-playback` commands.
    - GitHub: [https://github.com/rany2/edge-tts](https://github.com/rany2/edge-tts)
- [fish-speech](https://speech.fish.audio/)
    - GitHub: [https://github.com/fishaudio/fish-speech](https://github.com/fishaudio/fish-speech)
- [Yating AI (雅婷智慧)](https://developer.yating.tw/zh-TW/doc/introduction-%E7%94%A2%E5%93%81%E8%88%87%E4%BD%BF%E7%94%A8%E4%BB%8B%E7%B4%B9) (Taiwan AI Labs)
    - [https://github.com/TaiwanAILabs-Yating](https://github.com/TaiwanAILabs-Yating)
- [Qwen-TTS](https://help.aliyun.com/zh/model-studio/qwen-tts) (Tongyi Qianwen TTS)
    - [Qwen3-TTS vs ElevenLabs: Voice Cloning & Real-Time Streaming - Geeky Gadgets](https://www.geeky-gadgets.com/qwen3-tts-voice-cloning/)
- [VibeVoice](https://microsoft.github.io/VibeVoice/) (Microsoft) 
    - HF: [https://huggingface.co/microsoft/VibeVoice-1.5B](https://huggingface.co/microsoft/VibeVoice-1.5B)
- [Pocket TTS](https://github.com/kyutai-labs/pocket-tts)
- [Kitten TTS](https://github.com/KittenML/KittenTTS) - Lightweight and runs without a GPU; does not support Chinese yet.
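
For the edge-tts entry above, a minimal sketch of driving the `edge-tts` command from Python. It assumes the package is installed (`pip install edge-tts`), that the machine can reach Microsoft's online service, and that the voice name `en-US-AriaNeural` is available; the helper names `edge_tts_command` and `synthesize` are hypothetical and not part of the library.

```python
import shutil
import subprocess


def edge_tts_command(text: str, out_path: str, voice: str = "en-US-AriaNeural") -> list[str]:
    """Build the argument list for the edge-tts command-line tool."""
    # The default voice is an assumption; `edge-tts --list-voices` shows what is available.
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path]


def synthesize(text: str, out_path: str, voice: str = "en-US-AriaNeural") -> None:
    """Run edge-tts if it is on PATH; this call needs network access."""
    if shutil.which("edge-tts") is None:
        raise RuntimeError("edge-tts not found; install it with `pip install edge-tts`")
    subprocess.run(edge_tts_command(text, out_path, voice), check=True)
```

Calling `synthesize("Hello, world", "hello.mp3")` should write an MP3 file next to the script. Building the argument list in a separate function keeps the command easy to inspect without touching the network.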

##### ASR - Automatic Speech Recognition

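- [FrogBase](https://frogbase.dev/) - OpenAI-based video transcript generation and translation
- [InstantID](https://instantid.github.io/) - Text-to-image AI for generating portraits in a personal style
- [WhisperDesktop](https://github.com/Const-me/Whisper) - Generates subtitles and transcripts from video; Windows only
    - \[Video\] [Portable Whisper: usable without installation | much lower hardware requirements | written in C++, no extra libraries to install](https://www.youtube.com/watch?v=jnGjP3siF6o)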
- [OpenAI Whisper](https://github.com/openai/whisper)
- [Whisper WebUI](https://gitlab.com/aadnk/whisper-webui/-/tree/main) - Web-based user interface for Whisper
- [WhisperX](https://github.com/m-bain/whisperX) - Batched inference for up to 70x real-time transcription with whisper large-v2
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper) - Faster than OpenAI Whisper, with lower resource usage
- [Vosk](https://github.com/alphacep/vosk-api) - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
- [Handy](https://github.com/cjpais/handy) - A free, open source, and extensible speech-to-text application that works completely offline.
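
Several of the tools above turn audio or video into subtitles. A minimal sketch of that flow using faster-whisper, assuming the package is installed (`pip install faster-whisper`) and that the `small` model on CPU with `int8` compute is acceptable. The helpers `format_timestamp`, `to_srt`, and `transcribe_to_srt` are hypothetical names for illustration, not part of any listed project.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the HH:MM:SS,mmm form used by SRT subtitles."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) triples as SRT cues numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)


def transcribe_to_srt(audio_path: str, model_size: str = "small") -> str:
    """Transcribe a file with faster-whisper and return SRT text."""
    # Imported lazily so the formatting helpers work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return to_srt((s.start, s.end, s.text) for s in segments)
```

The formatting helpers take plain tuples, so the same `to_srt` function could be fed segments from another recognizer after mapping its output to `(start, end, text)`.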

##### MTK Breeze 3

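MediaTek Research has released the new MediaTek Research Breeze 3 series (hereafter MR Breeze 3). It includes Breeze ASR 26, a Taiwanese speech recognition model; BreezyVoice 26, a Taiwanese speech synthesis model; and Breeze Guard 26, an AI content-safety model designed specifically for Taiwan.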

- [MediaTek Research Breeze 3: Helping AI understand Taiwanese, speak with a Taiwanese flavor, and safeguard Taiwan](https://www.mediatek.com/zh-tw/tek-talk-blogs/mediatek-research-breeze-3)

##### VibeVoice (Microsoft)

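Microsoft VibeVoice is an open-source family of voice AI models covering TTS (text to speech) and ASR (speech recognition). Its core innovation is a continuous speech tokenizer running at an ultra-low frame rate of 7.5 Hz, paired with a [next-token diffusion](https://arxiv.org/abs/2412.08635) framework. This lets it generate up to 90 minutes of multi-speaker conversational speech in a single pass, or transcribe up to 60 minutes of long-form audio. The TTS model supports multilingual synthesis with up to 4 speakers; the ASR model produces structured transcripts that capture speaker, timestamps, and content together.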

- [https://github.com/microsoft/VibeVoice](https://github.com/microsoft/VibeVoice)
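
To see why the 7.5 Hz frame rate matters for long audio, here is a rough back-of-the-envelope calculation. It assumes one latent frame per tokenizer step; the 50 Hz comparison rate is a hypothetical figure chosen for illustration and does not come from the VibeVoice documentation.

```python
FRAME_RATE_HZ = 7.5  # tokenizer frame rate quoted in the description above


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


print(frames_for(90))        # 90 minutes of generated speech -> 40500
print(frames_for(60))        # 60 minutes of audio to transcribe -> 27000
print(frames_for(90, 50.0))  # same 90 minutes at a hypothetical 50 Hz -> 270000
```

Under these assumptions, a 90-minute session is about 40,500 frames, which is far easier to fit in a language model's context window than the 270,000 frames the hypothetical 50 Hz tokenizer would need.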