
Qwen-TTS: Speech Synthesis Tool with Chinese Dialect and Bilingual Support

2025-07-05



https://qwenlm.github.io/zh/blog/qwen-tts/

Qwen-TTS is a text-to-speech (TTS) tool developed by Alibaba Cloud's Qwen team and provided through the Qwen API. It supports Mandarin, English, and three Chinese dialects (Beijing, Shanghainese, and Sichuanese), and provides seven bilingual voices: Cherry, Ethan, Chelsie, Serena, Dylan (Beijing dialect), Jada (Shanghainese), and Sunny (Sichuanese). The tool suits scenarios that require high-quality speech synthesis, such as voice assistants and audio content generation. Support for more languages and styles is planned.


Function List

Supports Mandarin and Chinese/English bilingual speech synthesis to output natural and smooth speech.
Supports three Chinese dialects: Beijing, Shanghainese, and Sichuanese, preserving local accent characteristics.
Provides seven bilingual voices to meet the personalized needs of different scenarios.
Automatically adjusts the tone, speed, and emotion of the voice to closely match natural human expression.
Services are provided through the Qwen API for easy integration into applications by developers.
Is trained on large-scale speech datasets to ensure high-quality, realistic speech output.
Provides an audio file download function so that users can save the generated speech.

Using Help

Preparation of installation and operating environment

Qwen-TTS is currently served via the Qwen API, so no model needs to be installed locally, but an API key must be configured to call the service. The detailed steps are as follows:

Getting the API key
Users first need to register an Alibaba Cloud (Aliyun) account and enable the Qwen API service, then log in to the Alibaba Cloud Bailian platform and apply for a DASHSCOPE_API_KEY.
- Visit the official website of the Alibaba Cloud Bailian platform and click "Register" or "Login".
- Find the Qwen API service in the console and follow the instructions to enable it and get the API key.
- Save the key in an environment variable with the command:
```
export DASHSCOPE_API_KEY='your_api_key'
```
Install the necessary Python environment
The example in this guide calls the Qwen-TTS API from Python; Python 3.6 or later is recommended. Install the required dependencies:
```
pip install dashscope
pip install requests
```
Ensure that the network connection is stable to avoid API call timeouts.
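
Before making the first call, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the DashScope SDK: `check_environment` is a hypothetical helper that reports whether `DASHSCOPE_API_KEY` is set and whether the `dashscope` and `requests` packages can be found.

```python
import importlib.util
import os


def check_environment(env=None, packages=("dashscope", "requests")):
    """Return a list of human-readable setup problems (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    # The key is read from the same variable exported in the step above.
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    # find_spec returns None when a top-level package is not installed.
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks ready." if not issues else "\n".join(issues))
```

An empty result only means the key and packages are present; it does not verify that the key is valid or that the service is enabled for the account.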

Calling the Qwen-TTS API to generate speech
Qwen-TTS provides a simple Python interface for converting text to speech. The following is a basic sample code:

```python
import os

import dashscope
import requests


def get_api_key():
    """Read the API key from the DASHSCOPE_API_KEY environment variable."""
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        raise EnvironmentError("DASHSCOPE_API_KEY environment variable not set.")
    return api_key


def synthesize_speech(text, voice="Dylan", model="qwen-tts-latest"):
    """Send text to Qwen-TTS and return the URL of the generated audio."""
    api_key = get_api_key()
    try:
        response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
            model=model,
            api_key=api_key,
            text=text,
            voice=voice,
        )
        # Check each level of the response before reading the audio URL.
        if response is None:
            raise RuntimeError("API call returned None response")
        if response.output is None:
            raise RuntimeError("API call failed: response.output is None")
        if not hasattr(response.output, "audio") or response.output.audio is None:
            raise RuntimeError("API call failed: response.output.audio is None or missing")
        return response.output.audio["url"]
    except Exception as e:
        raise RuntimeError(f"Speech synthesis failed: {e}") from e


def download_audio(audio_url, save_path):
    """Download the audio at audio_url and write it to save_path."""
    try:
        resp = requests.get(audio_url, timeout=10)
        resp.raise_for_status()
        with open(save_path, "wb") as f:
            f.write(resp.content)
        print(f"Audio file saved to: {save_path}")
    except Exception as e:
        raise RuntimeError(f"Download failed: {e}") from e


def main():
    # Sample text in Beijing dialect, matching the "Dylan" voice.
    text = "哟，您猜怎么着？今儿个我看NBA，库里投篮跟闹着玩似的，张手就来，篮筐都得喊他“亲爹”了"
    save_path = "downloaded_audio.wav"
    try:
        audio_url = synthesize_speech(text, voice="Dylan")
        download_audio(audio_url, save_path)
    except Exception as e:
        print(e)


if __name__ == "__main__":
    main()
```

Code description:
- text: The text to convert; mixed Chinese and English input is supported.
- voice: The voice to use, e.g. "Dylan" for the Beijing dialect. Other available voices are Cherry, Ethan, Chelsie, Serena, Jada, and Sunny.
- model: The model name, either qwen-tts-latest or qwen-tts-2025-05-22.
- save_path: The path where the generated audio is saved as a WAV file.

Selection of voices and dialects
Qwen-TTS offers seven voices, each corresponding to a different style and dialect:
- Cherry, Ethan, Chelsie, Serena: Bilingual in Mandarin and English for generalized scenarios.
- Dylan: Beijing dialect with an authentic Beijing accent, suitable for localized content.
- Jada: Shanghainese, suitable for users in Wu-speaking areas.
- Sunny: Sichuanese, characterized by a southwestern accent.
When calling the API, set the voice parameter to the desired voice. For example, voice="Jada" generates Shanghainese speech.
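
The voice list above can be kept in code so that a typo in a voice name is caught before any request is sent. This is a small sketch: the names and dialect labels are taken from this article, while `VOICES`, `voices_for`, and `validate_voice` are hypothetical helpers, not part of the Qwen API.

```python
# Voice names and dialect labels as listed in this article.
VOICES = {
    "Cherry": "Mandarin/English",
    "Ethan": "Mandarin/English",
    "Chelsie": "Mandarin/English",
    "Serena": "Mandarin/English",
    "Dylan": "Beijing dialect",
    "Jada": "Shanghainese",
    "Sunny": "Sichuanese",
}


def voices_for(dialect):
    """Return the voice names whose dialect label matches (case-insensitive)."""
    wanted = dialect.strip().lower()
    return [name for name, label in VOICES.items() if label.lower() == wanted]


def validate_voice(voice):
    """Raise ValueError early instead of sending an unknown voice to the API."""
    if voice not in VOICES:
        raise ValueError(f"Unknown voice {voice!r}; choose one of {sorted(VOICES)}")
    return voice


print(voices_for("Shanghainese"))  # prints ['Jada']
```

The validated name can then be passed as the voice argument, for example `synthesize_speech(text, voice=validate_voice("Jada"))`.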
Adjustment of voice effects
Qwen-TTS automatically adjusts intonation, speech rate and emotion based on the input text, eliminating the need to manually configure parameters. For example, typing a sentence with an exclamation mark generates a more dynamic voice. Users can control the emotion of the voice by adjusting the text content, such as adding intonation or punctuation.
Saving and using generated audio
The API returns the generated audio as a URL. The download_audio function downloads it as a WAV file and saves it locally for playback, editing, or embedding in other applications. Make sure the target directory is writable.
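
After downloading, it may be worth confirming that the saved file really is a readable WAV. The sketch below uses Python's standard `wave` module; `describe_wav` is a hypothetical helper, and it assumes the file is uncompressed PCM WAV, which is the only variant the `wave` module reads. If the service returns a different encoding, `wave.open` raises `wave.Error`.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file, or raise wave.Error if unreadable."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

For the sample program, `describe_wav("downloaded_audio.wav")` would be called once the download has finished.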
Error handling
- If the API key is not set, the program raises EnvironmentError. Check the environment variable configuration.
- If the network connection is unstable, requests.get may time out. Check the network or increase the timeout parameter.
- If the returned audio URL is invalid, check that the text and voice parameters are correct.
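
For transient network failures, retrying with a growing delay is often more practical than only raising the timeout. The following is a generic sketch using the standard library; `call_with_retry` is a hypothetical helper, and the attempt count and delays are arbitrary choices, not values recommended by the Qwen API.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error unchanged
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Because download_audio in the sample code wraps failures in RuntimeError, a call might look like `call_with_retry(lambda: download_audio(audio_url, save_path), retry_on=(RuntimeError,))`.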

Precautions for use

Keep the input text clear and avoid overly complex sentences for the best synthesis quality.
API calls require a stable network environment, so running them on a server or a high-performance device is recommended.
Currently, Qwen-TTS is only available via API and does not support offline use.
Future versions may support more languages and sound styles, so we recommend following the official blog for updates.

Application Scenarios

Voice assistant development
Qwen-TTS can be used to develop intelligent voice assistants that support both Chinese and English languages and dialects for localized scenarios. For example, develop a Beijing dialect voice assistant to provide a friendly service experience for local users.
Audiobook and podcast production
Use Qwen-TTS to convert novels or articles into audiobooks with a variety of voice options to suit different listeners' preferences. Shanghainese or Sichuanese versions are available to appeal to specific regions.
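
Long texts such as book chapters usually have to be sent in smaller pieces. The sketch below splits text at Chinese and English sentence-ending punctuation and packs whole sentences into chunks under a character budget. `split_into_chunks` is a hypothetical helper, and `max_chars=200` is an arbitrary assumption, not a documented Qwen-TTS limit; check the API documentation for the actual input limit. Note that the simple pattern also splits at the period in decimals such as "3.5".

```python
import re

# One sentence = a run of non-terminators plus its trailing terminator(s).
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]*")


def split_into_chunks(text, max_chars=200):
    """Pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks, current = [], ""
    for sentence in _SENTENCE.findall(text):
        # Start a new chunk when adding this sentence would exceed the budget.
        if current.strip() and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = sentence
        else:
            current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each chunk can then be passed to synthesize_speech in turn, with the resulting audio files saved under numbered names.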
Educational content generation
The online education platform can utilize Qwen-TTS to create instructional audio that is bilingual and suitable for language learning or cross-cultural courses.
Advertising and promotional voice-overs
Enterprises can use Qwen-TTS to generate natural speech for advertisement videos, choosing dialect versions to enhance local characteristics and improve user-friendliness.
Game and virtual character voices
Game developers can voice characters, combining dialect and emotional expression to create more realistic avatars.

QA

What languages and dialects does Qwen-TTS support?
Supports Mandarin, English, and three Chinese dialects: Beijing, Shanghai, and Sichuan. More languages may be supported in the future.
How do I choose different sounds?
Set the voice parameter in the API call to the voice name, such as voice="Dylan" (Beijing dialect) or voice="Sunny" (Sichuanese).
Do I need to install the model locally?
No. Qwen-TTS runs in the cloud via the Qwen API; you only need to configure an API key.
Can the generated audio be saved?
Yes, the API returns the audio URL, which can be downloaded as a WAV file by the user via code and saved locally.
How can I optimize the naturalness of my voice?
Enter clear, properly punctuated and inflected text, and Qwen-TTS automatically adjusts intonation and emotion to produce a more natural sounding voice.


Chief AI Sharing Circle: "Qwen-TTS: Speech Synthesis Tool with Chinese Dialect and Bilingual Support". Posted on 2025-07-05.
