STT on FOSS Engineer

Local AI Voice Tools

Sat, 06 Jun 2026 19:30:00 +0200

A chooser guide for Voicebox, KittenTTS, Chatterbox, and related local AI voice workflows, focused on what was actually validated locally.

Voicebox - Local AI Voice Studio for Speech, Dictation, and Agents

Sat, 06 Jun 2026 10:50:00 +0200

Voicebox is a local AI voice studio for cloning voices, generating speech, dictating into apps, transcribing captures, adding effects, composing stories, and giving AI agents voices through REST and MCP. It ships a Tauri desktop app, a FastAPI backend, a web UI, a Docker setup, and multiple local TTS/STT engines.

Chatterbox - Local Open-Source Text-to-Speech by Resemble AI

Fri, 05 Jun 2026 11:45:00 +0200

Chatterbox is Resemble AI’s MIT-licensed open-source text-to-speech toolkit. It ships Python APIs, Gradio demos, English and multilingual models, voice conversion, Turbo inference, paralinguistic tags, and built-in Perth watermarking. It is not a Docker-first self-hosted app; it is a local ML package where GPU access matters.

ComfyUI-Qwen-TTS - Qwen3 Voice Nodes for ComfyUI

Thu, 04 Jun 2026 13:45:00 +0100

ComfyUI-Qwen-TTS is a custom-node pack for running Qwen3-TTS workflows inside ComfyUI: text-to-speech, zero-shot voice cloning, designed voices, saved speakers, multi-role dialogue, and experimental fine-tuning.