Simple Subtitling is an open source subtitle generation tool that automatically generates subtitles and labels speakers for video and audio files. Developed by Jaesung Huh and hosted on GitHub, the project aims to provide a simple and efficient subtitle generation solution. The tool uses audio processing technology .....
Abogen is an open source tool that quickly converts ePub, PDF or plain text files into high-quality audio. It uses the Kokoro-82M model to generate natural, fluent speech and can also generate synchronized subtitles, which makes it suitable for audiobooks, video dubbing and learning aids. Users can choose...
Kimi-Audio is an open source audio foundation model developed by Moonshot AI for audio understanding, generation and dialogue. It supports a range of audio processing tasks such as speech recognition, audio Q&A and speech emotion recognition. The model was pre-trained on more than 13 million hours of audio data, combined with innovative...
On-Device AI is an AI app for Apple devices that runs completely offline on iOS, macOS and visionOS. It runs large language models (LLMs) locally and offers real-time speech transcription, document analysis and other features, all without an internet connection, which keeps user data private. Users can voice...
Vexa is an open source real-time meeting transcription and knowledge management platform that provides meeting recording and intelligent knowledge extraction for businesses and individuals. Its API-driven meeting bots automatically join Google Meet, Zoom and other platforms, transcribe speech to text in real time, and support 99...
realtime-transcription-fastrtc is an open source project for converting speech to text in real time. It uses FastRTC to process low-latency audio streams and a local Whisper model for efficient speech recognition. The project is maintained by the developer sofi444, tor...
Transkriptor is an AI-driven transcription tool for quickly converting audio and video to text. It supports more than 100 languages with an accuracy rate of up to 99% and suits a wide range of scenarios such as meetings, interviews and classroom notes. Users can upload files, record directly, or transcribe via links to Zoom, Go...
Otter.ai is an AI-powered meeting management and voice transcription tool whose core function is converting speech to text in real time and automatically generating meeting notes, summaries and action items. Its AI Meeting Agent automatically joins meetings on platforms such as Zoom and Google Meet, capturing...
TurboScribe is an AI-based transcription tool for quickly converting audio and video to text. It supports more than 98 languages with an accuracy rate of 99.8%, and is aimed at users who need to process spoken content efficiently. Users upload files to generate transcripts or subtitles; the process is simple and fast...
Aqua Voice is an intelligent voice-to-text tool that quickly converts a user's speech into formatted text. The company behind it was founded in 2023 by Finnian Brown and Jack McIntire, is based in San Francisco, USA, and is part of Y Combinator's W24 batch ...
Dolphin is an open source model developed by DataoceanAI in collaboration with Tsinghua University for speech recognition and language identification in Asian languages. It supports 40 languages from East Asia, South Asia, Southeast Asia and the Middle East, as well as 22 Chinese dialects. The model was trained on more than 210,000 hours of audio data...
TwinMind is a smart tool developed by ThirdEar AI, Inc. that "remembers everything for you". It records conversations, meetings or lectures in real time and converts them to text in more than 100 languages, and it works offline, even with your phone in your pocket. Users don't have to take notes themselves, TwinMind will...
Wispr Flow is a voice-driven text input tool that helps users write quickly on their computers. Promising an experience "3x faster than typing", it lets users enter text into any application, such as Word, Slack or Gmail, just by speaking naturally. Wispr Flow supports more than 100 languages....
Local-NotebookLM is an open source project that provides locally run intelligent document processing and content generation tools. Inspired by Google NotebookLM, it helps users convert PDFs and other documents into a variety of output formats, such as podcasts, interviews or lectures, while supporting local deployment ....
AssemblyAI is a platform focused on speech AI technology, providing developers and enterprises with efficient speech-to-text and audio analysis tools. Its core highlight is the Universal family of models, especially the newly released Universal-2, which is AssemblyAI's most advanced speech-to-text...
FireRedASR is a speech recognition model developed and open-sourced by the FireRed team at Xiaohongshu (Little Red Book), offering high-accuracy, multilingual automatic speech recognition (ASR). Hosted on GitHub for developers and researchers, it features an industrial-grade design and supports Mandarin, Chinese dialects,...
WhisperChain is an AI-based open source project hosted on GitHub and led by developer Chris Choy. It converts speech to text and then uses AI to polish the wording automatically, removing filler words such as "ah" and "hmm" to improve the text ....
LLPlayer is an open source media player for language learners, hosted on GitHub and created by developer umlx5h. It integrates a range of useful features, such as bilingual subtitle display, AI-generated subtitles, real-time translation and word lookup. It aims to help users improve their language skills by watching videos...
CapsWriter-Offline is a voice input and subtitle transcription tool for PC, hosted on GitHub and built by developer HaujetZhao. It runs completely offline, performing speech-to-text and audio/video-to-subtitle transcription without an Internet connection, and supports unlimited recording time, Chinese and English .....