Qwen-TTS is a text-to-speech (TTS) tool developed by Alibaba Cloud's Qwen team and provided through the Qwen API. It supports Mandarin, English, and three Chinese dialects (Beijing, Shanghainese, and Sichuanese), and provides seven bilingual voices: Cherry, Ethan, Chelsie, Serena, Dylan (Beijing dialect), Jada (Shanghainese), and Sunny (Sichuanese). The tool is suitable for scenarios that require high-quality speech synthesis, such as voice assistants and audio content generation. More languages and style options are planned for the future.
Function List
- Supports Mandarin, English, and mixed Chinese/English speech synthesis with natural, fluent output.
- Supports three Chinese dialects: Beijing, Shanghainese, and Sichuanese, preserving local accent characteristics.
- Provides seven bilingual voices to meet the personalized needs of different scenarios.
- Automatically adjusts the tone, speed, and emotion of the voice to closely match natural human expression.
- Services are provided through the Qwen API for easy integration into applications by developers.
- Is trained on a large-scale speech dataset to ensure high-quality, realistic speech output.
- Provides audio file download, so users can conveniently save the generated speech.
Using Help
Preparation of installation and operating environment
Qwen-TTS is currently served via the Qwen API, which eliminates the need for a locally installed model, but does require an API key to be configured to invoke the service. Below are the detailed steps to use it:
- Getting the API key
Users first need to register for an Alibaba Cloud (Aliyun) account and enable the Qwen API service. Log in to the Alibaba Cloud Bailian platform (Model Studio) and apply for a DASHSCOPE_API_KEY:
- Visit the official website of the Alibaba Cloud Bailian platform and click "Register" or "Login".
- Find the Qwen API service in the console and follow the instructions to enable it and get the API key.
- Save the key in an environment variable with the command:
export DASHSCOPE_API_KEY='your_api_key'
- Install the necessary Python environment
The examples here call the Qwen-TTS API from Python; Python 3.6 or above is recommended. Install the required dependency libraries:
pip install dashscope
pip install requests
Ensure that the network connection is stable to avoid API call timeouts.
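Before making the first API call, it can help to confirm that the setup steps above actually took effect. The following is a minimal sketch, not part of the Qwen tooling: the `preflight` function name and its parameters are hypothetical, and it only checks the three things this guide mentions (Python version, the DASHSCOPE_API_KEY variable, and the two installed packages).

```python
import importlib.util
import os
import sys


def preflight(environ=os.environ, min_python=(3, 6), packages=("dashscope", "requests")):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: the defaults mirror the requirements stated in this guide.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or above is required" % min_python)
    if not environ.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is None:
            problems.append("package not installed: " + name)
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment looks ready.")
```

The check does not contact the API, so it cannot tell whether the key itself is valid; it only catches missing configuration early.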
- Calling the Qwen-TTS API to generate speech
Qwen-TTS provides a simple Python interface for converting text to speech. The following is a basic sample:
import os

import dashscope
import requests


def get_api_key():
    """Read the API key from the DASHSCOPE_API_KEY environment variable."""
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        raise EnvironmentError("DASHSCOPE_API_KEY environment variable not set.")
    return api_key


def synthesize_speech(text, voice="Dylan", model="qwen-tts-latest"):
    """Synthesize text and return the URL of the generated audio."""
    api_key = get_api_key()
    try:
        response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
            model=model,
            api_key=api_key,
            text=text,
            voice=voice,
        )
        # Validate the response step by step so failures are easy to diagnose.
        if response is None:
            raise RuntimeError("API call returned None response")
        if response.output is None:
            raise RuntimeError("API call failed: response.output is None")
        if not hasattr(response.output, "audio") or response.output.audio is None:
            raise RuntimeError("API call failed: response.output.audio is None or missing")
        audio_url = response.output.audio["url"]
        return audio_url
    except Exception as e:
        raise RuntimeError(f"Speech synthesis failed: {e}")


def download_audio(audio_url, save_path):
    """Download the audio at audio_url and write it to save_path."""
    try:
        resp = requests.get(audio_url, timeout=10)
        resp.raise_for_status()
        with open(save_path, "wb") as f:
            f.write(resp.content)
        print(f"Audio file saved to: {save_path}")
    except Exception as e:
        raise RuntimeError(f"Download failed: {e}")


def main():
    # Sample text in Beijing dialect, joking about how effortlessly Curry shoots in an NBA game.
    text = "哟,您猜怎么着?今儿个我看NBA,库里投篮跟闹着玩似的,张手就来,篮筐都得喊他“亲爹”了"
    save_path = "downloaded_audio.wav"
    try:
        audio_url = synthesize_speech(text, voice="Dylan")
        download_audio(audio_url, save_path)
    except Exception as e:
        print(e)


if __name__ == "__main__":
    main()
- Code description:
- text: The text to convert; mixed Chinese and English input is supported.
- voice: The voice to use, e.g. "Dylan" for the Beijing dialect. Other available voices are Cherry, Ethan, Chelsie, Serena, Jada, and Sunny.
- model: The model name, either qwen-tts-latest or qwen-tts-2025-05-22.
- save_path: The path where the generated audio file is saved, in WAV format.
- Selection of voices and dialects
Qwen-TTS offers seven voices, each corresponding to a different style and dialect:
- Cherry, Ethan, Chelsie, Serena: Bilingual in Mandarin and English, for general-purpose scenarios.
- Dylan: Beijing dialect with an authentic Beijing accent, suitable for localized content.
- Jada: Shanghainese, suitable for users in Wu-speaking areas.
- Sunny: Sichuanese, characterized by a southwestern accent.
When calling the API, use the voice parameter to select the desired voice. For example, voice="Jada" generates Shanghainese speech.
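Because the voice parameter is a plain string, a typo such as "dylan" only surfaces once the API call fails. One way to catch this earlier is a small local lookup table, sketched below. The voice names and dialects come from the list above; the `VOICES` table and the `voices_for` and `check_voice` helpers are hypothetical and not part of the dashscope SDK.

```python
# Voice names and dialects as listed in this article.
VOICES = {
    "Cherry": "Mandarin/English",
    "Ethan": "Mandarin/English",
    "Chelsie": "Mandarin/English",
    "Serena": "Mandarin/English",
    "Dylan": "Beijing dialect",
    "Jada": "Shanghainese",
    "Sunny": "Sichuanese",
}


def voices_for(dialect):
    """Return the voice names listed for a dialect label, in table order."""
    return [name for name, label in VOICES.items() if label == dialect]


def check_voice(voice):
    """Return the voice unchanged, or raise ValueError listing valid names."""
    if voice not in VOICES:
        raise ValueError(
            "Unknown voice %r; expected one of: %s" % (voice, ", ".join(VOICES))
        )
    return voice
```

For example, `check_voice("Jada")` could be called before passing the value as the voice argument, so a misspelled name fails locally with a readable message. The table would need updating by hand if new voices are released.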
- Adjustment of voice effects
Qwen-TTS automatically adjusts intonation, speech rate, and emotion based on the input text, so no manual parameter configuration is needed. For example, a sentence that ends with an exclamation mark produces a more animated voice. Users can steer the emotion of the output by adjusting the text itself, for example by adding interjections or punctuation.
- Saving and using generated audio
The API returns the generated audio as a URL. The download_audio function in the sample code downloads it as a WAV file and saves it locally for playback, editing, or embedding in other applications. Make sure the download path has write permission.
- Error handling
- If the API key is not set, the program raises an EnvironmentError. Check the environment variable configuration.
- If the network connection is unstable, requests.get may time out. Check the network or increase the timeout parameter.
- If the returned audio URL is invalid, make sure the text and voice parameters are correct.
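For transient network failures, raising the timeout is one option; retrying the call is another. The sketch below shows a generic retry wrapper with exponential backoff. It is an illustration under stated assumptions rather than anything provided by dashscope or requests: the `with_retries` name, its parameters, and the default delays are all hypothetical.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or attempts run out.

    Waits base_delay, then 2 * base_delay, and so on between tries,
    and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
```

With the sample code, a download could be wrapped as `with_retries(lambda: download_audio(audio_url, "out.wav"), retry_on=(RuntimeError,))`, since download_audio reports failures as RuntimeError. Retrying does not help with permanent errors such as a missing API key, so those are better left to fail immediately.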
Precautions for use
- Keep the text clear and avoid overly complex sentences for the best synthesis quality.
- API calls require a stable network connection; running on a server or a high-performance device is recommended.
- Currently, Qwen-TTS is only available via API and does not support offline use.
- Future versions may support more languages and sound styles, so we recommend following the official blog for updates.
Application Scenarios
- Voice assistant development
Qwen-TTS can be used to develop intelligent voice assistants that support Chinese, English, and dialects for localized scenarios. For example, a Beijing-dialect voice assistant can offer local users a friendlier service experience.
- Audiobook and podcast production
Use Qwen-TTS to convert novels or articles into audiobooks, with a variety of voice options to suit different listeners' preferences. Shanghainese or Sichuanese versions can be produced to appeal to specific regions.
- Educational content generation
Online education platforms can use Qwen-TTS to create bilingual instructional audio suitable for language learning or cross-cultural courses.
- Advertising and promotional voice-overs
Enterprises can use Qwen-TTS to generate natural speech for advertisement videos, choosing dialect versions to add local flavor and make the content feel more approachable.
- Game and virtual character voices
Game developers can use Qwen-TTS to voice characters, combining dialect and emotional expression to create more realistic virtual characters.
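For long inputs such as a novel chapter, it is usually more practical to synthesize the text in sentence-sized pieces than in a single request. The sketch below splits text at Chinese and Western sentence-ending punctuation and packs sentences into chunks. The `split_text` helper and the `max_chars` value are hypothetical; this article does not state a per-request length limit, so check the official API documentation for the real one.

```python
import re

# A run of text ending in sentence punctuation, or trailing text with none.
_SENTENCE = re.compile(r"[^。！？.!?]*[。！？.!?]+|[^。！？.!?]+")


def split_text(text, max_chars=200):
    """Split text into chunks of whole sentences, each at most max_chars long.

    max_chars is an assumed budget, not a documented API limit.
    A single sentence longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in _SENTENCE.findall(text):
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += piece
    chunks.append(current.strip())
    return [chunk for chunk in chunks if chunk]
```

Each chunk could then be passed to the synthesis call in turn and saved to a numbered file. Splitting on "." also breaks decimals and abbreviations into separate pieces; they are rejoined when they land in the same chunk, but a production splitter would need more careful rules.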
QA
- What languages and dialects does Qwen-TTS support?
It supports Mandarin, English, and three Chinese dialects: Beijing, Shanghainese, and Sichuanese. More languages may be supported in the future.
- How do I choose different voices?
Specify the voice name with the voice parameter in the API call, such as voice="Dylan" (Beijing dialect) or voice="Sunny" (Sichuanese).
- Do I need to install the model locally?
No. Qwen-TTS runs in the cloud via the Qwen API; you only need to configure the API key.
- Can the generated audio be saved?
Yes. The API returns an audio URL, which can be downloaded in code as a WAV file and saved locally.
- How can I optimize the naturalness of the voice?
Enter clear, properly punctuated text. Qwen-TTS automatically adjusts intonation and emotion to produce a more natural-sounding voice.