RealtimeVoiceChat
RealtimeVoiceChat is an open source project focused on real-time, natural voice conversations with AI. The user speaks into a microphone, the browser captures the audio, the system quickly transcribes it to text, a large language model (LLM) generates a reply, and the reply is converted back to speech; the whole...
OpenAI Realtime Agents
OpenAI Realtime Agents is an open source project that shows how OpenAI's Realtime API can be used to build multi-agent voice applications. It provides a high-level agent pattern (borrowed from OpenAI Swarm) that lets developers build complex multi-agent voice systems in a short time. The project ...
Bailing
Bailing is an open source voice conversation assistant designed to hold natural spoken conversations with users. The project combines automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech synthesis (TTS) to implement a voice conversation bot similar to GPT-4o...
"Always-On" Deepseek AI Assistant: Building an Intelligent Voice Interaction System Based on Deepseek-V3
Always-On AI Assistant is an AI assistant project that builds an always-online assistant system by integrating Deepseek-V3, RealtimeSTT, and Typer. The project is optimized for engineering development scenarios, providing a complete...
Xiaozhi AI Chatbot
Xiaozhi AI Chatbot is an open source project based on the ESP32 development board, designed to help users build their own AI chat companion. The project is developed by Shrimp and is intended mainly for teaching: it helps more people get started with AI hardware development and understand how to apply large language models to real hardware devices. Project ...
Fish Agent
Fish Agent, a project derived from Fish Speech, is an end-to-end AI voice cloning system built on the V0.1 3B model architecture. As a fully end-to-end system, its most important feature is a semantic-token-free architecture design, which does not need to rely on the traditional language such as Whisper .....
Ichigo (llama3-s)
Ichigo is an open source real-time voice AI project that aims to extend text-based language models with a native "listening" ability. The project uses an early-fusion technique inspired by Meta's Chameleon paper. Ichigo's goal is to become an open-data, open-weight, on-device speech...