Comprehensive Introduction RealtimeVoiceChat is an open source project focused on real-time, natural conversations with artificial intelligence via voice. Users use a microphone to input speech, the system captures the audio through a browser, quickly converts it to text, generates a reply from a large language model (LLM), and then converts the text to speech...
Stepsailor is a tool for developers with an AI command bar at its core. Developers can use it to make their software products understand what the user says, for example, the user says "add new task", the software will automatically execute. It is integrated into SaaS products through a simple SDK and does not require...
综合介绍 OpenAvatarChat 是由 HumanAIGC-Engineering 团队开发的一个开源项目,托管在 GitHub 上。它是一个模块化的数字人对话工具,用户可以在单台 PC 上运行完整功能。项目结合实时视频、语音识别和数...
General Introduction VideoMind is an open source multimodal AI tool focused on inference, Q&A and summary generation for long videos. It was developed by Ye Liu of the Hong Kong Polytechnic University and a team from Show Lab at the National University of Singapore. The tool mimics the way humans understand video by splitting tasks into planning,...
综合介绍 MoshiVis 是 Kyutai Labs 开发的一个开源项目,托管在 GitHub 上。它基于 Moshi 语音-文本模型(7B 参数),新增了约 2.06 亿个适配参数和冻结的 PaliGemma2 视觉编码器(400M 参...
Comprehensive Introduction Qwen2.5-Omni is an open source multimodal AI model developed by Alibaba Cloud Qwen team. It can process multiple inputs such as text, images, audio and video, and generate text or natural speech responses in real time. The model was released on March 26, 2025, and the code and model files tor...
综合介绍 xiaozhi-esp32-server 是一个为 小智AI聊天机器人(xiaozhi-esp32)提供后端服务的工具。它用 Python 编写,基于 WebSocket 协议,帮助用户快速搭建一个控制 ESP32 设备的服务器。...
Comprehensive Introduction Baichuan-Audio is an open source project developed by Baichuan Intelligence (baichuan-inc), hosted on GitHub, focusing on end-to-end voice interaction technology. The project provides a complete audio processing framework that can convert speech input into discrete audio tokens , and then through a large ...
General Introduction PowerAgents is an AI intelligences platform focused on web automation tasks, which allows users to create and deploy AI intelligences capable of clicking, entering and extracting data. The platform supports setting tasks to run automatically on an hourly, daily or weekly basis, and users can also watch the intelligences work through in real time...
Comprehensive Introduction Step-Audio is an open source intelligent voice interaction framework designed to provide out-of-the-box speech understanding and generation capabilities for production environments. The framework supports multi-language dialog (e.g., Chinese, English, Japanese), emotional speech (e.g., happy, sad), regional dialects (e.g., Cantonese, Szechuan), and can...
综合介绍 Gemini Cursor 是一个基于 Google 的 Gemini 2.0 Flash(实验性)模型的桌面智能助手。它能够通过多模态 API 实现视觉、听觉和语音交互,提供实时低延迟的用户体验。该项目由 @13point5 创...
综合介绍 DeepSeek-VL2 是一系列高级的 Mixture-of-Experts (MoE) 视觉语言模型,显著提升了其前身 DeepSeek-VL 的性能。该模型在视觉问答、光学字符识别、文档/表格/图表理解和视觉定位等任务中表现...
综合介绍 AI Web Operator 是一个开源的 AI 浏览器操作工具,旨在通过集成多种 AI 技术和 SDK,简化用户在浏览器中的操作体验。该工具基于 Browserbase 和 Vercel AI SDK 构建,支持多种大型语言模...
综合介绍 SpeechGPT 2.0-preview 是 OpenMOSS 推出的首个拟人化实时交互系统,基于百万小时级语音数据训练而成。该系统具备拟人口语化表达与百毫秒级低延迟响应,支持自然流畅的实时打断交互。SpeechGPT 2.0-...
综合介绍 OpenAI Realtime Agents是一个开源项目,旨在展示如何利用OpenAI的实时API来构建多智能体的语音应用。它提供了高级的智能体模式(借鉴 OpenAI Swarm),允许开发者在短时间内搭建出复杂的多智能体语音...
Comprehensive Introduction Bailing (Bailing) is an open source voice conversation assistant designed to engage in natural conversations with users through speech. The project combines speech recognition (ASR), voice activity detection (VAD), large language modeling (LLM) and speech synthesis (TTS) technologies to achieve a GPT-4o-like speech...
综合介绍 Weebo 是一个开源的实时语音聊天机器人,利用 Whisper Small 进行语音识别,Llama 3.2 进行自然语言生成,以及 Kokoro-82M 进行语音合成。该项目由 Amanvir Parhar 开发,旨在提供一个...
Comprehensive Introduction OmAgent is a multimodal intelligent body framework developed by Om AI Lab, aiming to provide powerful AI-powered features for smart devices. The project enables developers to create efficient, real-time interactive experiences on a wide range of smart devices by integrating state-of-the-art multimodal base models and intelligent body algorithms....
综合介绍 Always-On AI Assistant是一个创新的AI助手项目,它通过整合Deepseek-V3、RealtimeSTT和Typer等先进技术,打造了一个功能强大的永久在线AI助理系统。该项目特别针对工程开发场景进行优化,提...