RealtimeSTT: Real-time Speech-to-Text Tool for Low-Latency Streaming Speech Recognition Based on Whisper

2025-01-18

General Introduction

RealtimeSTT is an efficient, low-latency library for real-time speech-to-text, with advanced voice activity detection and wake word activation. Developed by Kolja Beigel, it is aimed at applications that need fast and accurate transcription, such as voice assistants or any program that must turn spoken input into text while the user is talking, and it is designed to be both fast and easy to use.

Features

  • Real-time speech-to-text: transcribes speech as it is spoken, for a wide range of applications.
  • Voice activity detection: automatically detects when the user starts and stops speaking, so only actual speech is recorded and transcribed.
  • Wake word activation: the system can stay idle until the user says a specific wake word.
  • Low latency: keeps the delay between speaking and receiving text short, which improves the user experience.
  • Cross-platform support: runs on multiple operating systems, which makes integration easy.
  • Open source: the complete source code is available for developers to extend and customize.

Usage Guide

Installation

  1. Clone the project repository:
   git clone https://github.com/KoljaB/RealtimeSTT.git
  2. Change into the project directory:
   cd RealtimeSTT
  3. Install the dependencies:
   pip install -r requirements.txt
  4. (Optional) Install GPU support:
   pip install -r requirements-gpu.txt

Alternatively, the released package can be installed directly from PyPI:
   pip install RealtimeSTT
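After installing, a quick way to confirm that Python can locate the package, without actually importing it, is a standard-library check like the one below. This is a generic sketch and not part of RealtimeSTT; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found."""
    # find_spec locates a top-level module without importing it,
    # so no package code runs during this check.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    status = "found" if is_installed("RealtimeSTT") else "not found"
    print(f"RealtimeSTT package: {status}")
```

If the check reports "not found", make sure the same Python interpreter (or virtual environment) was used for the `pip install` steps.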

Usage

Start the server

  1. Start the speech-to-text server:
   stt-server
  2. After the server starts, wait for the "speak now" prompt.

Using the client

  1. Start the client, which connects to the server:
   stt
  2. Once the client is running, start speaking; your speech is transcribed to text in real time.

How the main features work

Real-time speech-to-text

  1. Import the AudioToTextRecorder class:
   from RealtimeSTT import AudioToTextRecorder
  2. Define a function that handles the transcribed text:
   def process_text(text):
       print(text)
  3. Start recording and pass each transcription to that function:
   # The __main__ guard is needed because RealtimeSTT uses multiprocessing
   if __name__ == '__main__':
       print("Wait until it says 'speak now'")
       recorder = AudioToTextRecorder()
       while True:
           # text() listens for the next utterance and hands its transcription to process_text
           recorder.text(process_text)
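Because the real loop needs a microphone and a speech model, it can help to test a callback such as `process_text` offline. The sketch below uses a hypothetical `FakeRecorder` class that is not part of RealtimeSTT; it only mimics the `text(callback)` call shape used above and feeds scripted phrases instead of transcribed audio.

```python
from typing import Callable, Iterable, List


class FakeRecorder:
    """Hypothetical stand-in for AudioToTextRecorder, for offline callback tests.

    It mimics only the `text(callback)` call shape; it records no audio
    and performs no speech recognition.
    """

    def __init__(self, phrases: Iterable[str]) -> None:
        self._phrases = list(phrases)

    def has_more(self) -> bool:
        # True while scripted phrases remain to be delivered.
        return bool(self._phrases)

    def text(self, callback: Callable[[str], None]) -> None:
        # Deliver the next scripted "transcription" to the callback.
        # Callers should check has_more() first; popping from an empty
        # list raises IndexError.
        callback(self._phrases.pop(0))


transcript: List[str] = []


def process_text(text: str) -> None:
    # Same role as process_text above, but collects text instead of printing it.
    transcript.append(text)


if __name__ == "__main__":
    recorder = FakeRecorder(["hello world", "this is a test"])
    # A bounded loop replaces `while True` so the script terminates.
    while recorder.has_more():
        recorder.text(process_text)
    print(transcript)
```

Once the callback behaves as expected, swapping `FakeRecorder` back for `AudioToTextRecorder` leaves `process_text` unchanged.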

Voice Activity Detection

  1. The system automatically detects when the user starts and stops speaking; no additional configuration is required.
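RealtimeSTT handles this internally with dedicated VAD models, so no code is needed. To illustrate the basic idea only, here is a much simpler energy-threshold detector with a "hangover" period that keeps short pauses between words from ending an utterance. This is not the library's algorithm, and the `threshold` and `hangover` values are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(frames: Sequence[Sequence[float]],
                  threshold: float = 0.02,
                  hangover: int = 3) -> List[bool]:
    """Label each frame as speech (True) or silence (False).

    A frame counts as speech when its RMS energy exceeds `threshold`.
    After the energy drops, the speech state is held for `hangover`
    more frames before switching back to silence.
    """
    labels: List[bool] = []
    quiet_run = hangover  # start in the silent state
    for frame in frames:
        if rms(frame) > threshold:
            quiet_run = 0  # loud frame: reset the silence counter
        else:
            quiet_run += 1  # quiet frame: extend the silence run
        labels.append(quiet_run <= hangover)
    return labels
```

In RealtimeSTT, the `post_speech_silence_duration` constructor argument appears to play a role similar to `hangover`; check the project README for its exact meaning before tuning it.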

Wake word activation

  1. Configure the wake word feature so that users can activate the system by saying a specific word; see the project documentation for configuration details.
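RealtimeSTT detects wake words on the audio itself, before transcription; its README documents a `wake_words` constructor argument, but check the documentation for the supported words and backends. As a simpler, text-level alternative, a hypothetical helper like the one below can filter finished transcriptions inside `process_text`, acting only on phrases that start with a chosen word. This is a swapped-in technique, not the library's mechanism.

```python
import re
from typing import Optional


def strip_wake_word(text: str, wake_word: str) -> Optional[str]:
    """Return the command after `wake_word`, or None if the word is absent.

    Matching is case-insensitive and on whole words only; spaces and
    punctuation right after the wake word (e.g. "Jarvis, ...") are skipped.
    """
    pattern = re.compile(
        r"\b" + re.escape(wake_word) + r"\b[\s,.!?]*", re.IGNORECASE
    )
    match = pattern.search(text)
    if match is None:
        return None
    return text[match.end():].strip()
```

For example, `strip_wake_word("Jarvis, turn on the lights", "jarvis")` returns `"turn on the lights"`, while a phrase without the wake word returns `None` and can be ignored.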

Detailed examples

Typing everything that is said

  1. Import AudioToTextRecorder and pyautogui (install pyautogui with pip install pyautogui if it is not already present):
   from RealtimeSTT import AudioToTextRecorder
   import pyautogui
  2. Define a function that types the transcribed text:
   def process_text(text):
       # Type the text, plus a separating space, into the window that has keyboard focus
       pyautogui.typewrite(text + " ")
  3. Start recording and process the text:
   # The __main__ guard is needed because RealtimeSTT uses multiprocessing
   if __name__ == '__main__':
       print("Wait until it says 'speak now'")
       recorder = AudioToTextRecorder()
       while True:
           recorder.text(process_text)
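Since every transcription is typed straight into the focused window, it can be worth normalizing the text first so that empty or whitespace-only results do not produce stray spaces. The helper below, `format_for_typing`, is a hypothetical addition and not part of RealtimeSTT or pyautogui.

```python
from typing import Optional


def format_for_typing(text: str) -> Optional[str]:
    """Prepare one transcription for typing, or return None to skip it.

    Collapses runs of whitespace, drops empty results, and appends a
    single trailing space so consecutive phrases do not run together.
    """
    cleaned = " ".join(text.split())
    if not cleaned:
        return None
    return cleaned + " "
```

Inside `process_text`, call `pyautogui.typewrite(...)` only when the helper returns a string rather than `None`.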
