
SpeechGPT 2.0-preview: an end-to-end, human-like spoken dialogue large model for real-time interaction

2025-01-30

General Introduction

SpeechGPT 2.0-preview is the first human-like real-time interaction system released by OpenMOSS, trained on millions of hours of speech data. With human-like spoken expression and response latency in the hundred-millisecond range, it supports natural, fluid real-time interaction, including interruptions. SpeechGPT 2.0-preview aligns the speech and text modalities and demonstrates precise control over, and intelligent switching among, multiple emotions, styles, and timbres. It can imitate the tone and emotional state of various characters, and it has voice talents such as poetry recitation, storytelling, and speaking in dialects. SpeechGPT 2.0-preview also supports tool calls, web search, and external knowledge bases, combining expressive speech with text capabilities.

Demo address: https://sp2.open-moss.com/

 

Feature List

  • Human-like conversational speech
  • Low-latency responses in the hundred-millisecond range
  • Control over multiple emotions, styles, and timbres
  • Role-playing
  • Voice talents such as poetry recitation, storytelling, and speaking in dialects
  • Support for tool calls, web search, and external knowledge bases
  • Efficient speech data crawling system
  • Versatile, efficient speech data cleaning pipeline
  • Comprehensive, multi-granularity speech data annotation system
  • Ultra-low-bitrate streaming speech codec with joint semantic-acoustic modeling

 

Usage Guide

Installation

  1. Clone the repository:
     git clone https://github.com/OpenMOSS/SpeechGPT-2.0-preview.git
     cd SpeechGPT-2.0-preview
  2. Download the model weights (requires git-lfs to be installed):
     git lfs install
     git clone https://huggingface.co/fnlp/SpeechGPT-2.0-preview-Codec
     git clone https://huggingface.co/fnlp/SpeechGPT-2.0-preview-7B
  3. Prepare the environment:
     pip3 install -r requirements.txt
     pip3 install flash-attn==2.7.3 --no-build-isolation
  4. Launch the web demo:
     python3 demo_gradio.py --codec_ckpt_path SpeechGPT-2.0-preview-Codec/sg2_codec_ckpt.pkl --model_path SpeechGPT-2.0-preview-7B/
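
Because the weights are several large downloads, it can help to confirm that everything is in place before launching the demo. The following is a minimal sketch, not part of the SpeechGPT repository: it assumes it is run from the repository root and that the weights were cloned into the directories used in the installation steps above. The helper names (missing_paths, flash_attn_installed, build_launch_command) are hypothetical and exist only for this illustration.

```python
import importlib.util
import sys
from pathlib import Path

# Paths taken from the installation steps, relative to the repository root.
CODEC_CKPT = Path("SpeechGPT-2.0-preview-Codec/sg2_codec_ckpt.pkl")
MODEL_DIR = Path("SpeechGPT-2.0-preview-7B")
DEMO_SCRIPT = Path("demo_gradio.py")


def missing_paths(root: Path) -> list[str]:
    """Return the expected files/directories that do not exist under root."""
    expected = [DEMO_SCRIPT, CODEC_CKPT, MODEL_DIR]
    return [p.as_posix() for p in expected if not (root / p).exists()]


def flash_attn_installed() -> bool:
    """Check whether the flash_attn module (from the flash-attn package) is importable."""
    return importlib.util.find_spec("flash_attn") is not None


def build_launch_command() -> list[str]:
    """Assemble the demo launch command as an argument list (e.g. for subprocess.run)."""
    return [
        sys.executable,
        DEMO_SCRIPT.as_posix(),
        "--codec_ckpt_path", CODEC_CKPT.as_posix(),
        "--model_path", MODEL_DIR.as_posix(),
    ]


if __name__ == "__main__":
    missing = missing_paths(Path.cwd())
    if missing:
        print("Missing before launch:", ", ".join(missing))
    elif not flash_attn_installed():
        print("flash-attn does not appear to be installed.")
    else:
        print("Ready. Launch with:", " ".join(build_launch_command()))
```

The script only reports what is missing and prints the launch command; it does not start the demo itself, so it is safe to run at any point during installation.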

Feature walkthrough

  1. Human-like conversational speech: SpeechGPT 2.0-preview imitates the way people speak, giving conversations a natural, fluid feel.
  2. Low-latency response: The system responds to user input within the hundred-millisecond range, enabling real-time interaction.
  3. Multi-emotion, multi-style, multi-timbre control: Users can direct the system's emotion, style, and timbre through instructions to suit different conversational scenarios.
  4. Role-playing: The system can imitate the tone of voice and emotional state of different characters, which suits a variety of application scenarios.
  5. Voice talents: SpeechGPT 2.0-preview enriches conversations with voice talents such as poetry recitation, storytelling, and speaking in dialects.
  6. Tool calls and web search: The system can call external tools and run web searches, extending what a conversation can do and what information it can reach.
  7. External knowledge base: By connecting to an external knowledge base, the system can give more detailed and specialized answers.

Usage examples

  • Emotion control: Give the instruction "Tell a joke in a happy tone", and the system tells a joke in a happy tone.
  • Role-playing: Give the instruction "Explain quadratic functions in a teacher's tone of voice", and the system explains them the way a teacher would.
  • Voice talents: Give the instruction "Tell a story in dialect" with the dialect specified, and the system tells the story in that dialect.

With the steps and examples above, users can try out SpeechGPT 2.0-preview's features across a range of application scenarios.
