An open, real-time voice chat built on Hugging Face's speech-to-speech stack.
Speech-to-speech is Hugging Face's open framework for real-time voice agents. Rather than relying on a single end-to-end model, it chains four open models from the Hub (speech detection, transcription, a vision-language model, and speech synthesis), so any stage can be swapped out or run locally. This demo wires that pipeline to hosted inference.
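To make the "any stage can be swapped" idea concrete, here is a minimal sketch of a four-stage chain in Python. It is an illustration under stated assumptions, not the framework's actual code: `VoicePipeline`, the stage signatures, and the `fake_*` stand-ins are all hypothetical names, and the real stages would be models rather than one-line functions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

# Hypothetical stage signatures (not taken from the speech-to-speech codebase).
DetectSpeech = Callable[[bytes], Optional[bytes]]  # audio -> speech segment, or None for silence
Transcribe = Callable[[bytes], str]                # speech segment -> text
Respond = Callable[[str], str]                     # user text -> reply text
Synthesize = Callable[[str], bytes]                # reply text -> audio


@dataclass(frozen=True)
class VoicePipeline:
    """Chains four independent stages; each one is just a callable."""
    detect: DetectSpeech
    transcribe: Transcribe
    respond: Respond
    synthesize: Synthesize

    def run(self, audio: bytes) -> Optional[bytes]:
        segment = self.detect(audio)
        if segment is None:
            return None  # nothing but silence, so no reply is produced
        text = self.transcribe(segment)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stand-in stages so the sketch runs without any models.
def fake_detect(audio: bytes) -> Optional[bytes]:
    # Treat a buffer of only zero bytes as silence.
    return audio if audio.strip(b"\x00") else None

def fake_transcribe(segment: bytes) -> str:
    return segment.decode("utf-8")

def fake_respond(text: str) -> str:
    return f"You said: {text}"

def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_detect, fake_transcribe, fake_respond, fake_synthesize)

# Swapping one stage leaves the other three untouched.
shouting = replace(pipeline, respond=lambda text: text.upper())
```

Because each stage is only a callable with a fixed input and output type, replacing a hosted model with a local one would, in this sketch, amount to passing a different function for that field.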
The pipeline
http://localhost:8080
/v1/realtime
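The two values above read as a server base URL and an endpoint path. The sketch below shows one way a client might combine them, assuming (this is not stated here) that the realtime endpoint is reached over WebSocket. The helper name `realtime_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def realtime_url(base_url: str, path: str = "/v1/realtime") -> str:
    """Join a base URL and an endpoint path, mapping http(s) to ws(s).

    Assumption: the endpoint speaks WebSocket, so the scheme is rewritten.
    """
    parts = urlsplit(base_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    # Avoid a doubled or missing slash between the base path and the endpoint.
    full_path = parts.path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((scheme, parts.netloc, full_path, "", ""))


local = realtime_url("http://localhost:8080")
```

With the default values, `local` is `ws://localhost:8080/v1/realtime`.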
Let the assistant act during the conversation. Changes apply live.