
Qwen-TTS is a text-to-speech (TTS) tool developed by Alibaba Cloud's Qwen team and offered through the Qwen API. It supports Mandarin, English, and three Chinese dialects (Beijing dialect, Shanghainese, and Sichuanese) and provides seven Chinese-English bilingual voices: Cherry, Ethan, Chelsie, Serena, Dylan (Beijing dialect), Jada (Shanghainese), and Sunny (Sichuanese). The tool suits scenarios that need high-quality speech synthesis, such as voice assistants and audio content generation. Qwen-TTS plans to support more languages and styles in the future.


 

Function List

  • Supports Mandarin and Chinese-English bilingual speech synthesis with natural, fluent output.
  • Supports three Chinese dialects (Beijing dialect, Shanghainese, and Sichuanese) and preserves their local accent characteristics.
  • Provides seven bilingual voices to cover different scenarios.
  • Automatically adjusts intonation, speaking rate, and emotion so the result sounds close to a real speaker.
  • Is served through the Qwen API, so developers can integrate it into applications easily.
  • Is trained on large-scale speech datasets to ensure high-quality, realistic output.
  • Returns downloadable audio files, so users can save the generated speech.

 

Usage Guide

Installation and environment setup

Qwen-TTS is currently served through the Qwen API, so no model needs to be installed locally, but an API key must be configured before the service can be called. The detailed steps are:

  1. Get an API key
    First register an Alibaba Cloud account and enable the Qwen API service, then log in to the Alibaba Cloud Bailian platform and apply for a DASHSCOPE_API_KEY.

    • Visit the official Alibaba Cloud Bailian website and click "Register" or "Login".
    • Find the Qwen API service in the console, follow the instructions to enable it, and obtain the API key.
    • Save the key in an environment variable with the command:
      export DASHSCOPE_API_KEY='your_api_key'
      
  2. Install the Python environment
    The example below calls Qwen-TTS from Python; Python 3.6 or above is recommended. Install the required libraries:

    pip install dashscope
    pip install requests
    

    Ensure that the network connection is stable to avoid API call timeouts.

  3. Call the Qwen-TTS API to generate speech
    Qwen-TTS provides a simple Python interface for converting text to speech. The following is a basic example:

    import os

    import dashscope
    import requests


    def get_api_key():
        # Read the key from the environment so it never appears in source code.
        api_key = os.getenv("DASHSCOPE_API_KEY")
        if not api_key:
            raise EnvironmentError("DASHSCOPE_API_KEY environment variable not set.")
        return api_key


    def synthesize_speech(text, voice="Dylan", model="qwen-tts-latest"):
        # Ask the service to synthesize the text and return the audio URL.
        api_key = get_api_key()
        try:
            response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
                model=model,
                api_key=api_key,
                text=text,
                voice=voice
            )
            if response is None:
                raise RuntimeError("API call returned None response")
            if response.output is None:
                raise RuntimeError("API call failed: response.output is None")
            if not hasattr(response.output, 'audio') or response.output.audio is None:
                raise RuntimeError("API call failed: response.output.audio is None or missing")
            audio_url = response.output.audio["url"]
            return audio_url
        except Exception as e:
            raise RuntimeError(f"Speech synthesis failed: {e}") from e


    def download_audio(audio_url, save_path):
        # Fetch the generated audio and write it to a local file.
        try:
            resp = requests.get(audio_url, timeout=10)
            resp.raise_for_status()
            with open(save_path, 'wb') as f:
                f.write(resp.content)
            print(f"Audio file saved to: {save_path}")
        except Exception as e:
            raise RuntimeError(f"Download failed: {e}") from e


    def main():
        # Sample text in Beijing dialect, matching the "Dylan" voice.
        text = "哟,您猜怎么着?今儿个我看NBA,库里投篮跟闹着玩似的,张手就来,篮筐都得喊他“亲爹”了"
        save_path = "downloaded_audio.wav"
        try:
            audio_url = synthesize_speech(text, voice="Dylan")
            download_audio(audio_url, save_path)
        except Exception as e:
            print(e)


    if __name__ == "__main__":
        main()
    
    • Code description:
      • text: The text to convert; mixed Chinese and English is supported.
      • voice: The voice to use, e.g. "Dylan" for Beijing dialect. Other available voices are Cherry, Ethan, Chelsie, Serena, Jada, and Sunny.
      • model: The model name, either qwen-tts-latest or qwen-tts-2025-05-22.
      • save_path: The path where the generated audio file is saved, in WAV format.
  4. Choose a voice and dialect
    Qwen-TTS offers seven voices, each corresponding to a different style and dialect:

    • Cherry, Ethan, Chelsie, Serena: Bilingual in Mandarin and English, for general-purpose scenarios.
    • Dylan: Beijing dialect with an authentic Beijing accent, suitable for localized content.
    • Jada: Shanghainese, suitable for users in Wu-speaking areas.
    • Sunny: Sichuanese, with a characteristic southwestern accent.
      When calling the API, use the voice parameter to specify the desired voice. For example, voice="Jada" generates Shanghainese speech.
  5. Adjust the voice effect
    Qwen-TTS automatically adjusts intonation, speaking rate, and emotion based on the input text, so no parameters need to be configured manually. For example, a sentence that ends with an exclamation mark produces livelier speech. Users can influence the emotion of the voice by editing the text, for instance by adding modal particles or punctuation.
  6. Save and use the generated audio
    The API returns the generated audio as a URL. The download_audio function downloads it as a WAV file and saves it locally for playback, editing, or embedding in other applications. Make sure the program has write permission for the save path.
  7. Error handling
    • If the API key is not set, the program raises EnvironmentError. Check the environment variable configuration.
    • An unstable network connection may cause requests.get to time out. Check the network or increase the timeout parameter.
    • If the returned audio URL is invalid, check that the text and voice parameters are correct.
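
One way to cope with the timeout case in the error-handling step is a small retry wrapper. The following is a minimal sketch, not part of dashscope or requests: the helper name call_with_retry and its parameters (attempts, base_delay, sleep) are hypothetical and chosen only for illustration. It retries any callable and doubles the wait between attempts.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts are used up.

    Hypothetical helper for illustration; not part of dashscope or requests.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # narrow this to the errors you expect
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... between attempts.
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Gave up after {attempts} attempts: {last_error}")
```

With the functions from the sample code, a download could then be wrapped as call_with_retry(lambda: download_audio(audio_url, "downloaded_audio.wav")). The sleep parameter exists only so the waiting can be replaced in tests.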

Precautions for use

  • Keep the text clear and avoid overly complex sentences for the best synthesis quality.
  • API calls need a stable network connection; running them on a server or a high-performance device is recommended.
  • Currently, Qwen-TTS is only available via API and does not support offline use.
  • Future versions may support more languages and sound styles, so we recommend following the official blog for updates.
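
To keep each request short and simple, as the first precaution suggests, long text can be split at sentence boundaries before synthesis. The sketch below is only an illustration under stated assumptions: chunk_text is a hypothetical helper, the punctuation set is a guess at common Chinese and English sentence endings, and max_chars=200 is an arbitrary value, not a documented API limit.

```python
import re

# Assumed sentence endings: Chinese/English terminators, or a period
# followed by whitespace or the end of the text.
_SENTENCE = re.compile(r".+?(?:[。！？!?]+|\.(?=\s|$)|$)", flags=re.S)


def chunk_text(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut.
    """
    chunks, current = [], ""
    for sentence in _SENTENCE.findall(text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    chunks.append(current)
    # Trim surrounding whitespace and drop empty chunks.
    return [c.strip() for c in chunks if c.strip()]
```

Each returned chunk could then be passed to synthesize_speech from the sample code in turn, giving one audio file per chunk.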

 

Application Scenarios

  1. Voice assistant development
    Qwen-TTS can power intelligent voice assistants that speak Chinese, English, and dialects for localized scenarios. For example, a Beijing-dialect voice assistant can give local users a friendlier service experience.
  2. Audiobook and podcast production
    Qwen-TTS can convert novels or articles into audiobooks, with a choice of voices to suit different listeners. Shanghainese or Sichuanese versions can be produced to appeal to specific regions.
  3. Educational content generation
    Online education platforms can use Qwen-TTS to create bilingual instructional audio for language learning or cross-cultural courses.
  4. Advertising and promotional voice-overs
    Enterprises can use Qwen-TTS to generate natural speech for advertising videos and choose dialect versions to add local flavor and feel closer to the audience.
  5. Game and virtual character voices
    Game developers can use Qwen-TTS to voice characters, combining dialect and emotional expression to create more realistic virtual characters.

 

FAQ

  1. What languages and dialects does Qwen-TTS support?
    It supports Mandarin, English, and three Chinese dialects: Beijing dialect, Shanghainese, and Sichuanese. More languages may be supported in the future.
  2. How do I choose a different voice?
    Specify the voice name with the voice parameter in the API call, such as voice="Dylan" (Beijing dialect) or voice="Sunny" (Sichuanese).
  3. Do I need to install the model locally?
    No. Qwen-TTS runs in the cloud through the Qwen API, so you only need to configure the API key.
  4. Can the generated audio be saved?
    Yes. The API returns an audio URL, which can be downloaded in code as a WAV file and saved locally.
  5. How can I make the voice sound more natural?
    Enter clear, properly punctuated text. Qwen-TTS automatically adjusts intonation and emotion to produce more natural-sounding speech.
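
The voice choice described in the answers above can be wrapped in a small lookup so that application code selects a voice by dialect rather than by name. This is a minimal sketch: the voice names and dialects come from this article, while the VOICES_BY_DIALECT table, the dialect keys, and the pick_voice helper are hypothetical and not part of the Qwen API.

```python
# Voice names and dialects as listed in this article.
# The dictionary keys are labels chosen for this sketch only.
VOICES_BY_DIALECT = {
    "mandarin": ["Cherry", "Ethan", "Chelsie", "Serena"],
    "beijing": ["Dylan"],
    "shanghainese": ["Jada"],
    "sichuanese": ["Sunny"],
}


def pick_voice(dialect="mandarin"):
    """Return the first listed voice name for the given dialect label."""
    key = dialect.strip().lower()
    if key not in VOICES_BY_DIALECT:
        known = ", ".join(sorted(VOICES_BY_DIALECT))
        raise ValueError(f"Unknown dialect {dialect!r}; expected one of: {known}")
    return VOICES_BY_DIALECT[key][0]
```

The result can be passed straight to the voice parameter, for example synthesize_speech(text, voice=pick_voice("sichuanese")) with the sample code from the usage steps.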