
Sherpa-ONNX: Offline Speech Recognition and Synthesis with ONNXRuntime

2025-01-16

General Introduction

sherpa-onnx is an open source project developed by the Next-gen Kaldi team that provides efficient offline speech recognition and speech synthesis. It runs on multiple platforms, including Android, iOS, and Raspberry Pi, and can process speech in real time without an internet connection. The project is built on ONNX Runtime and offers speech-to-text (ASR), text-to-speech (TTS), and voice activity detection (VAD) for embedded systems and mobile devices. Besides fully offline use, it also supports client/server communication through a WebSocket server and client.


Online demo: https://huggingface.co/spaces/k2-fsa/generate-subtitles-for-videos

 

Feature List

  • Offline Speech Recognition (ASR): Supports real-time speech-to-text in multiple languages, without the need for an Internet connection.
  • Offline speech synthesis (TTS): Provides high-quality text-to-speech service, again without the need for internet.
  • Voice Activity Detection (VAD): Real-time detection of voice activity, suitable for a variety of voice interaction scenarios.
  • Multi-platform support: Available for Linux, macOS, Windows, Android, iOS, and many other operating systems.
  • Multiple model architectures: Supports advanced speech models such as Zipformer and Paraformer to improve the accuracy of recognition and synthesis.
  • Low resource consumption: The optimized models can run smoothly on resource-limited devices.

 

Usage Guide

Installation

sherpa-onnx is open source: you can clone the source code from GitHub and compile it yourself, or download pre-compiled binaries.

  1. Clone the repository:
      git clone https://github.com/k2-fsa/sherpa-onnx.git
      cd sherpa-onnx

  2. Compile the source code:
    • For Linux and macOS users:
      mkdir build
      cd build
      cmake -DCMAKE_BUILD_TYPE=Release ..
      make -j4

    • For Windows users, you may need to use Visual Studio or another compiler supported by CMake.
  3. Alternatively, download pre-compiled files:
    • Visit the GitHub release page (e.g. https://github.com/k2-fsa/sherpa-onnx/releases) and download the pre-compiled version for your operating system.
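
The Linux and macOS build step uses a fixed count of four parallel jobs. As a minimal sketch, assuming a POSIX shell, the job count can instead be taken from the machine. The helper name detect_jobs is hypothetical and is not part of sherpa-onnx.

```shell
#!/bin/sh
# detect_jobs: print the number of CPU cores to use for a parallel build.
# Hypothetical helper for illustration; not part of sherpa-onnx.
detect_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                       # GNU coreutils (most Linux systems)
    elif n=$(sysctl -n hw.ncpu 2>/dev/null); then
        echo "$n"                   # macOS and BSD
    else
        echo 4                      # fall back to the fixed value
    fi
}

# Usage, from the build directory:
#   make -j"$(detect_jobs)"
```

On a memory-constrained board such as a Raspberry Pi, a lower job count than the core count may be the safer choice, since each compiler process needs its own memory.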

Usage

Speech Recognition (ASR) Example:

  • Command-line mode:
    Download a pre-trained model (e.g. sherpa-onnx-streaming-zipformer-bilingual-zh-en):

    wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en.tar.bz2
    tar xvf sherpa-onnx-streaming-zipformer-bilingual-zh-en.tar.bz2

    Then, from the repository root (the compiled binaries are in build/bin), run:

    ./build/bin/sherpa-onnx \
      --tokens=sherpa-onnx-streaming-zipformer-bilingual-zh-en/tokens.txt \
      --encoder=sherpa-onnx-streaming-zipformer-bilingual-zh-en/encoder.onnx \
      --decoder=sherpa-onnx-streaming-zipformer-bilingual-zh-en/decoder.onnx \
      your_audio.wav

    The file names here are simplified, so use the names found in the extracted directory. Transducer models such as Zipformer also include a joiner network; if the directory contains one, pass it with --joiner as well.

  • Real-time recognition:
    Real-time speech recognition using a microphone:

    ./build/bin/sherpa-onnx-microphone \
      --tokens=sherpa-onnx-streaming-zipformer-bilingual-zh-en/tokens.txt \
      --encoder=sherpa-onnx-streaming-zipformer-bilingual-zh-en/encoder.onnx \
      --decoder=sherpa-onnx-streaming-zipformer-bilingual-zh-en/decoder.onnx
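
Both recognition commands repeat the same model directory in every flag, and a mistyped file name only shows up when the decoder fails to load. The sketch below, assuming a POSIX shell, builds the flags from a directory and checks that each file exists first. The helper name model_args and its NAME=FILE argument convention are hypothetical, not part of sherpa-onnx.

```shell
#!/bin/sh
# model_args DIR NAME=FILE...
# Prints one "--NAME=DIR/FILE" flag per line. Returns 1, printing nothing
# on stdout, if any FILE is missing from DIR.
# Hypothetical helper for illustration; not part of sherpa-onnx.
model_args() {
    dir=$1
    shift
    # First pass: make sure every model file exists.
    for pair in "$@"; do
        file=${pair#*=}
        if [ ! -f "$dir/$file" ]; then
            echo "missing model file: $dir/$file" >&2
            return 1
        fi
    done
    # Second pass: emit the flags.
    for pair in "$@"; do
        name=${pair%%=*}
        file=${pair#*=}
        printf '%s\n' "--$name=$dir/$file"
    done
}

# Usage (relies on word splitting, so the paths must not contain spaces):
#   args=$(model_args sherpa-onnx-streaming-zipformer-bilingual-zh-en \
#       tokens=tokens.txt encoder=encoder.onnx decoder=decoder.onnx) &&
#       ./build/bin/sherpa-onnx $args your_audio.wav
```

Because the file list is passed in rather than hard-coded, the same helper works for models that ship additional files.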
    

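Speech Synthesis (TTS) Example:

  • Download a pre-trained TTS model (e.g. a VITS model):
    wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-tts-vits.tar.bz2
    tar xvf sherpa-onnx-tts-vits.tar.bz2

  • Run TTS. A VITS model is passed with --vits-model together with its token table (--vits-tokens). The file names below are placeholders, so use the names found in the extracted directory; some models need further files, such as a lexicon.
    ./build/bin/sherpa-onnx-offline-tts \
      --vits-model=sherpa-onnx-tts-vits/model.onnx \
      --vits-tokens=sherpa-onnx-tts-vits/tokens.txt \
      --output-filename=output.wav \
      "你好,世界"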
    

Voice Activity Detection (VAD):

  • Run the VAD:
    ./build/bin/sherpa-onnx-vad --model=path/to/vad_model.onnx your_audio.wav
    

Notes

  • Model selection: Choose the model type (e.g. streaming or non-streaming) that fits your needs. Models differ in accuracy, latency, and how well they suit real-time use.
  • Hardware requirements: Although sherpa-onnx is designed for low resource consumption, complex models may need more computing power, especially on mobile devices.
  • Language support: Each pre-trained model covers a specific set of languages, so make sure to choose a model that supports yours.
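
The streaming versus non-streaming choice also determines which program decodes the audio. As a minimal sketch, the helper below maps a mode name to a binary. It assumes the build also produced a non-streaming decoder named sherpa-onnx-offline in build/bin, which the text above does not mention; the helper name pick_asr_binary and the mode names are hypothetical.

```shell
#!/bin/sh
# pick_asr_binary MODE: print the decoder binary for a model type.
# MODE is "streaming" or "non-streaming".
# Assumption: the non-streaming decoder is build/bin/sherpa-onnx-offline.
# Hypothetical helper for illustration; not part of sherpa-onnx.
pick_asr_binary() {
    case $1 in
        streaming)     echo ./build/bin/sherpa-onnx ;;
        non-streaming) echo ./build/bin/sherpa-onnx-offline ;;
        *)
            echo "unknown mode: $1 (expected streaming or non-streaming)" >&2
            return 1
            ;;
    esac
}

# Usage:
#   bin=$(pick_asr_binary streaming) && "$bin" --help
```

A streaming model fits live microphone input, while a non-streaming model sees the whole recording at once and is the usual choice for files that are already complete.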

With these steps and tips, you can start using sherpa-onnx to build speech applications, whether a real-time dialog system or an offline speech processing pipeline.
