AI Speech to Text

Simple Subtitling: an open source tool for automatic video subtitling and speaker identification
Simple Subtitling is an open source subtitle generation tool that automatically generates subtitles and labels speakers for video or audio files. The project, developed by Jaesung Huh and hosted on GitHub, aims to provide a simple and efficient subtitle generation solution. Through audio processing technology, the tool...
05-16 6140kudos
Abogen: a tool for converting multiple text formats to audiobooks
Abogen is an open source tool designed to quickly convert ePub, PDF or plain text files to high quality audio. It uses the Kokoro-82M model to generate natural and smooth speech, and also supports synchronized subtitle generation, making it suitable for audiobooks, video dubbing or learning aids. Users can choose...
05-05 5370kudos
Kimi-Audio: Open Source Foundation Model for Audio Processing and Dialogue
Kimi-Audio is an open source audio foundation model developed by Moonshot AI that focuses on audio understanding, generation, and dialogue. It supports a variety of audio processing tasks such as speech recognition, audio Q&A, and speech emotion recognition. The model has been pre-trained on over 13 million hours of audio data, combined with innovative...
05-05 5890kudos
On-Device AI: AI Voice Transcription and Chat Tool That Runs Natively on iPhone
On-Device AI is an AI app that runs completely offline, designed for Apple devices and supporting iOS, macOS, and visionOS. It runs large language models (LLMs) locally and provides real-time speech transcription, document analysis, and other features. Because it works without an internet connection, user data stays private. Users can voice...
05-04 7520kudos
Vexa: a real-time meeting transcription and intelligent knowledge extraction tool
Vexa is an open source real-time meeting transcription and knowledge management platform that provides meeting recording and intelligent knowledge extraction for enterprises and individuals. Its API-driven meeting bots automatically join Google Meet, Zoom, and other platforms, transcribe speech to text in real time, and support 99...
04-22 6210kudos
Open source tool for real-time speech to text
realtime-transcription-fastrtc is an open source project that focuses on converting speech to text in real time. It uses FastRTC to process low-latency audio streams, combined with a local Whisper model for efficient speech recognition. The project is maintained by the developer sofi444, tor...
04-13 6730kudos
Transkriptor
Transkriptor is an AI-driven transcription tool that focuses on converting audio and video to text quickly. It supports over 100 languages with an accuracy rate of up to 99% and is suitable for a wide range of scenarios such as meetings, interviews, classroom notes and more. Users can upload files, record directly or transcribe via links to Zoom, Go...
04-12 9700kudos
Otter.ai
Otter.ai is an AI-powered meeting management and voice transcription tool whose core function is to convert speech to text in real time and automatically generate meeting notes, summaries, and action items. Its AI Meeting Agent automatically joins meetings on Zoom, Google Meet, and other platforms, capturing...
04-12 8240kudos
TurboScribe: Online tool to quickly convert audio and video to text
TurboScribe is an AI-based transcription tool that focuses on quickly converting audio and video to text. It supports more than 98 languages with an accuracy rate of 99.8% and is aimed at users who need to process voice content efficiently. Users can upload files to generate transcripts or subtitles; it is easy to operate and fast...
04-12 5790kudos
Aqua Voice: Cross-Application Speech Input to Generate Accurate Text
Aqua Voice is an intelligent speech-based text generation tool focused on quickly converting user speech into formatted text. It was founded in 2023 by Finnian Brown and Jack McIntire, is based in San Francisco, USA, and is part of Y Combinator W24...
04-10 6160kudos
Dolphin: Language Identification and Speech-to-Text Model for Asian Languages
Dolphin is an open source model developed by DataoceanAI in collaboration with Tsinghua University, focusing on speech recognition and language identification for Asian languages. It supports 40 languages from East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 Chinese dialects. The model was trained on over 210,000 hours of audio data...
04-08 7400kudos
TwinMind
TwinMind is a smart tool developed by ThirdEar AI, Inc. that "helps you remember everything". It can record conversations, meetings, or lectures in real time and convert them to text in more than 100 languages, and it works offline, even with your phone in your pocket. Users don't have to take notes themselves; TwinMind will...
04-05 8680kudos
Wispr Flow: Use your voice to quickly enter text in any application
Wispr Flow is a voice-enabled text input tool that helps users write quickly on their computers. With a "3x faster than typing" experience, users can enter text into any application, such as Word, Slack, or Gmail, just by speaking naturally. Wispr Flow supports more than 100 languages...
03-14 8210kudos
Local-NotebookLM: an open source tool for generating voice podcasts from local PDFs
Local-NotebookLM is an open source project that provides locally run tools for intelligent document processing and content generation. Inspired by Google NotebookLM, it focuses on helping users convert PDFs and other documents into a variety of output formats, such as podcasts, interviews, or lectures, while supporting local deployment...
03-10 7540kudos
AssemblyAI: High-precision Speech-to-Text and Audio Intelligence Analysis Platform
AssemblyAI is a platform focused on speech AI technology, providing developers and enterprises with efficient speech-to-text and audio analysis tools. Its core highlight is the Universal family of models, especially the newly released Universal-2, which is AssemblyAI's most advanced speech-to-text...
03-06 7900kudos
FireRedASR: An Open Source Model for Multilingual High-Precision Speech Recognition
FireRedASR is a speech recognition model developed and open-sourced by the FireRed team at Xiaohongshu (Little Red Book), focusing on high-precision, multilingual automatic speech recognition (ASR). The project is hosted on GitHub for developers and researchers, is designed to industrial-grade standards, and supports Mandarin, Chinese dialects,...
03-04 7780kudos
WhisperChain: real-time speech-to-text and optimization of spoken words
WhisperChain is an AI-based open source project hosted on GitHub and led by developer Chris Choy. It converts speech into text and uses AI to polish the wording automatically, removing redundant filler words (such as "ah" and "hmm") to improve the text...
03-02 7690kudos
LLPlayer
LLPlayer is an open source media player designed for language learners, hosted on GitHub and created by developer umlx5h. It integrates a variety of useful features, such as bilingual subtitle display, AI-generated subtitles, real-time translation, and word lookup. It aims to help users improve their language by watching videos...
02-27 1.1 K0kudos
CapsWriter-Offline: Speech Input and Subtitle Transcription Tool for the PC
CapsWriter-Offline is a voice input and subtitle transcription tool for PC, hosted on GitHub and built by developer HaujetZhao. It runs completely offline, with no Internet connection required, and provides speech-to-text input and transcription of audio/video files to subtitles. It supports recording of unlimited length, Chinese and English...
02-24 9060kudos