RealtimeVoiceChat
RealtimeVoiceChat is an open source project focused on real-time, natural conversations with artificial intelligence via voice. Users use the microphone to input voice, the system captures the audio through the browser, quickly converts it to text, generates a reply from a large language model (LLM), and then converts the text to speech output, the whole...
Transkriptor
Transkriptor is an AI-driven transcription tool that focuses on converting audio and video to text quickly. It supports over 100 languages with an accuracy rate of up to 99% and is suitable for a wide range of scenarios such as meetings, interviews, classroom notes and more. Users can upload files, record directly or transcribe via links to Zoom, Go...
Conch Speech (MiniMax Audio): AI tool for generating natural speech
MiniMax Audio is an AI speech generation tool from MiniMax, with the core feature of quickly converting text into highly similar natural speech. It is based on the Speech-02 model, with a speech synthesis similarity of up to 99%, studio-grade sound quality, and support for more than 30 languages and a wide range of mouth...
TwinMind
TwinMind is a smart tool developed by ThirdEar AI, Inc. that "helps you remember everything". TwinMind is a smart tool developed by ThirdEar AI Inc. that "remembers everything for you". It can record conversations, meetings or lectures in real time and convert them to text in more than 100 languages, and it can be used offline even if you have your phone in your pocket. Users don't have to take notes themselves, TwinMind will...
OpenAI Realtime Agents
OpenAI Realtime Agents is an open source project that aims to show how OpenAI's real-time APIs can be utilized to build multi-intelligent body speech applications. It provides a high-level intelligent body model (borrowed from OpenAI Swarm) that allows developers to build complex multi-intelligent body speech systems in a short period of time. The project ...
Bailing
Bailing (Bailing) is an open-source voice conversation assistant designed to engage in natural conversations with users through speech. The project combines speech recognition (ASR), voice activity detection (VAD), large language modeling (LLM), and speech synthesis (TTS) technologies to implement a voice conversation robot similar to GPT-4o...
"Always-On" Deepseek AI Assistant: Building an Intelligent Voice Interaction System Based on Deepseek-V3
Always-On AI Assistant is an innovative AI assistant project that creates a powerful and permanently online AI assistant system by integrating advanced technologies such as Deepseek-V3, RealtimeSTT and Typer. The project is especially optimized for engineering development scenarios, providing a complete...
Xiaozhi AI Chatbot
Xiaozhi AI Chatbot is an open source project based on the ESP32 development board, designed to help users build their own AI chat companion. The project is developed by Shrimp and is mainly used for teaching purposes to help more people get started with AI hardware development and understand how to apply the big language model to real hardware devices. Project ...
Fish Agent
Fish Speech Derivative Project Fish Agent is a revolutionary end-to-end AI speech cloning system developed based on V0.1 3B model architecture. As a fully end-to-end speech cloning processing system, its most important feature is that it adopts an innovative semantic tagless architecture design, which does not need to rely on the traditional language such as Whisper .....
Voice-Pro
Voice-Pro 是一个基于 Gradio WebUI 的多功能工具,支持语音转文字、文本转语音、实时翻译、YouTube 视频下载和人声分离。它集成了 Whisper、Faster-Whisper 和 Whisper-Timestamp...
Ichigo (llama3-s)
Ichigo is an open source real-time speech AI project that aims to extend text-based language models with native "listening" capabilities. The project uses early fusion techniques inspired by Meta's Chameleon paper.Ichigo's goal is to become an open source data, open source weighted native device speech...
AI Hear
如果你在用 MacBook,试试 AI Hear:可以录音、实时本地语音转文字、并翻译、最终导出字幕。可以用它辅助你听跨国会议、英文有声书。 AI Hear是一款本地运行的软件,提供一键实时翻译和转录功能,支持多种语言。...
Fukumaru Chione
Funmaru Thousand Voices is a multilingual AI voice synthesis platform that provides realistic and natural voice generation solutions. Users can easily convert text content into professional-grade audio and support the creation of exclusive AI voices (voice clones) from zero samples to meet personalized needs. The platform also provides video translation function to help users realize...
understand through listening
Tongyi Listening and Understanding is a work-study AI assistant launched by Aliyun, focusing on transcribing and analyzing audio and video content. It relies on AliCloud's powerful AI models to transcribe audio and video content into text in real time, and provides translation, summarization, positioning and other functions. Tongyi Listening Woo supports multiple languages and scenarios to help users...
Tencent Smartfilm (developers of the QQ instant messaging platform)
腾讯智影是腾讯公司推出的在线智能视频创作平台,通过云端服务提供的强大AI工具,能支持文本配音、数字人播报、自动字幕识别等功能,它集素材搜索、视频剪辑、渲染出口和发布于一体,为用户带来便捷的视频编辑和...