
Simple Subtitling: an open source tool for automatically generating video subtitles and identifying speakers

2025-05-16

General Introduction

Simple Subtitling is an open source audio subtitle generation tool that focuses on automatically generating subtitles and labeling speakers for video or audio files. Developed by Jaesung Huh and hosted on GitHub, the project aims to provide a simple and efficient subtitle generation solution. The tool combines audio processing techniques with machine learning models to produce timestamped, speaker-identified subtitle files for users who need to quickly add subtitles to their videos. The project is written in Python, supports mono 16 kHz audio processing, and is easy to install and use. Simple Subtitling is one of the open source projects from Dr. Jaesung Huh's research, with a focus on practicality and contribution to the open source community.


Feature List

  • Automatic Subtitle Generation: Extract text from audio files to generate timestamped subtitle files.
  • Speaker labeling: Distinguish different speakers through speech analysis and add speaker labels to subtitles.
  • SRT format support: generates standard SRT subtitle files compatible with most video players.
  • Audio pre-processing: Supports converting audio to the mono 16 kHz format required for processing.
  • Configuration file customization: Users can adjust the subtitle generation parameters through the configuration file.
  • Open source and free: The code is publicly available, and users are free to modify and extend its functionality.

 

Usage Guide

Installation process

Simple Subtitling requires a Python environment; Python 3.9 or above is recommended. Here are the detailed installation steps:

  1. Clone the Repository
    Open a terminal and run the following command to clone the project code:

    git clone https://github.com/JaesungHuh/simple-subtitling.git --recursive

    The --recursive flag ensures that all submodules are cloned as well.

  2. Create a Virtual Environment
    To avoid dependency conflicts, it is recommended that you create a Python virtual environment:

    conda create -n simple python=3.9
    conda activate simple
    

    If you do not use Conda, you can also create a virtual environment with venv:

    python -m venv simple_env
    source simple_env/bin/activate  # Linux/Mac
    simple_env\Scripts\activate  # Windows
    
  3. Install Dependencies
    Go to the project directory and install the required Python packages:

    cd simple-subtitling
    pip install -r requirements.txt
    

    Make sure your internet connection is working; the dependency packages will be downloaded automatically.

  4. Install FFmpeg
    Simple Subtitling uses FFmpeg for audio pre-processing. Please install FFmpeg according to your operating system:

    • Ubuntu/Debian:
      sudo apt update && sudo apt install ffmpeg
      
    • macOS (using Homebrew):
      brew install ffmpeg
      
    • Windows (using Chocolatey):
      choco install ffmpeg
      

    After installation, run ffmpeg -version to confirm that the installation was successful.

  5. Configure the Environment
    The project includes a config.yaml configuration file for setting audio processing and subtitle generation parameters. Open config.yaml and modify the following key fields as needed:

    • audio_path: Path to the input audio file.
    • output_path: Path where the subtitle file will be saved.
    • sample_rate: Ensure this is set to 16000 (16 kHz).
      Example Configuration:
    audio_path: "input_audio.wav"
    output_path: "output_subtitle.srt"
    sample_rate: 16000
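
    Once config.yaml is filled in, you can sanity-check it from Python before running the tool. The snippet below is only a minimal sketch, not part of the project: it assumes the field names shown in the example above and that PyYAML is installed in your environment.

    import yaml  # PyYAML; assumed to be installed (sketch only)

    # Load the configuration and check the fields described above.
    with open("config.yaml", "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    audio_path = config["audio_path"]    # input audio file
    output_path = config["output_path"]  # where the SRT file will be written
    sample_rate = config["sample_rate"]  # must be 16000 (16 kHz)

    if sample_rate != 16000:
        raise ValueError(f"sample_rate must be 16000, got {sample_rate}")

    print(f"Will transcribe {audio_path} -> {output_path} at {sample_rate} Hz")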
    

Usage

Once the installation is complete, you can run Simple Subtitling from the command line. The following are the main steps:

  1. Prepare the Audio File
    Simple Subtitling currently only supports mono, 16 kHz, PCM_16 audio. If your audio does not meet these requirements, use FFmpeg to convert it:

    ffmpeg -i input_audio.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output_audio.wav
    

    Replace input_audio.mp3 with the path to your audio file; the converted output will be output_audio.wav.

  2. Run Subtitle Generation
    In the project directory, run the main script:

    python main.py
    

    The script processes the audio according to the settings in config.yaml and generates a subtitle file in SRT format. The generated file contains timestamps, speaker identifiers, and text content, and is saved to the path specified by output_path.

  3. View Subtitle File
    Example of a generated SRT file:

    1
    00:00:01,000 --> 00:00:03,000
    Speaker_1: Hello, welcome to Simple Subtitling.

    2
    00:00:04,000 --> 00:00:06,000
    Speaker_2: This is an open source subtitling tool.
    

    Users can import SRT files into video editing software (e.g. Adobe Premiere, DaVinci Resolve) or players (e.g. VLC).
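
    For downstream processing, such as pulling out a per-speaker transcript, the generated SRT file can also be read programmatically. The function below is a minimal sketch, not part of Simple Subtitling; it assumes the "Speaker_N: text" convention shown in the example above.

    import re

    # Parse an SRT file produced by Simple Subtitling into (start, end, speaker, text) tuples.
    # Hypothetical helper; assumes each cue's text follows the "Speaker_N: text" convention above.
    def parse_srt(path):
        entries = []
        with open(path, "r", encoding="utf-8") as f:
            blocks = f.read().strip().split("\n\n")
        for block in blocks:
            lines = block.strip().splitlines()
            if len(lines) < 3:
                continue  # skip malformed cues
            start, end = (part.strip() for part in lines[1].split("-->"))
            text = " ".join(lines[2:])
            match = re.match(r"(Speaker_\d+):\s*(.*)", text)
            speaker, content = (match.group(1), match.group(2)) if match else (None, text)
            entries.append((start, end, speaker, content))
        return entries

    for start, end, speaker, content in parse_srt("output_subtitle.srt"):
        print(f"[{start} - {end}] {speaker}: {content}")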

Feature Highlights

  • Speaker identification
    Simple Subtitling uses machine learning models to analyze audio and distinguish between different speakers. The model is based on the ECAPA-TDNN architecture, with pre-trained weights provided by Jaesung Huh. If you need higher accuracy, you can download the pre-trained model from Hugging Face:

    # ECAPA_gender is provided by the voice-gender-classifier project; import it as described in that repository's README
    model = ECAPA_gender.from_pretrained("JaesungHuh/voice-gender-classifier")
    

    Enable speaker_diarization: true in config.yaml to add speaker tags to the subtitles.

  • Customizing subtitle styles
    Users can modify the subtitle_style field in config.yaml to adjust subtitle fonts, colors, and more. Currently only basic SRT-format styles are supported; ASS format support may be added in the future.
  • Batch processing
    If you need to process multiple audio files, you can write a simple script that calls main.py in a loop, or modify config.yaml to support multiple input files; a minimal example is sketched below. Community-contributed batch processing scripts can be found in the GitHub discussions.
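
    As a starting point, here is a minimal batch-processing sketch, not an official script: it updates config.yaml for each input file and then re-runs main.py, assuming the configuration fields shown earlier and that main.py reads its settings from config.yaml in the project directory.

    import pathlib
    import subprocess
    import sys
    import yaml  # PyYAML; assumed to be installed

    # Batch-subtitle every WAV file in a folder by updating config.yaml and re-running main.py.
    # Hypothetical helper; adjust folder names and fields to your own setup.
    def batch_subtitle(audio_dir, output_dir):
        output_dir = pathlib.Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        with open("config.yaml", "r", encoding="utf-8") as f:
            config = yaml.safe_load(f)
        for audio_file in sorted(pathlib.Path(audio_dir).glob("*.wav")):
            config["audio_path"] = str(audio_file)
            config["output_path"] = str(output_dir / (audio_file.stem + ".srt"))
            with open("config.yaml", "w", encoding="utf-8") as f:
                yaml.safe_dump(config, f)
            # main.py reads its settings from config.yaml, as described above.
            subprocess.run([sys.executable, "main.py"], check=True)

    batch_subtitle("audio_in", "subtitles_out")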

Caveats

  • Ensure that the input audio quality is clear; background noise may affect subtitle accuracy.
  • The model performs best with English audio; other languages may require additional training.
  • The project is in development, so we recommend following the GitHub repository for updates on the latest features and fixes.

 

Application Scenarios

  1. Video content creators
    YouTubers or short-form video producers can use Simple Subtitling to automatically generate subtitles for their videos, enhancing the viewer experience. The tool's speaker identification feature is perfect for interviews or multi-person conversations, clearly labeling each speaker's content.
  2. Educational content production
    Teachers or online course creators can generate subtitles for instructional videos to make them easier for students to follow, and SRT files are supported by multi-language players, making them suitable for internationalized courses.
  3. Podcast transcription
    Podcast producers can convert audio into subtitles with speaker labels, to publish text versions of their content or create video clips.
  4. Meeting minutes
    Business users can generate subtitles for meeting recordings and quickly compile meeting minutes. The speaker identification feature helps distinguish between different speakers.

 

FAQ

  1. What audio formats does Simple Subtitling support?
    Currently only mono, 16 kHz, PCM_16 WAV files are supported. Other formats need to be converted with FFmpeg (a Python wrapper for this conversion is sketched after this list).
  2. How can I improve the accuracy of subtitle generation?
    Ensure clear audio quality and reduce background noise. Enable speaker_diarization in config.yaml and use the pre-trained models.
  3. Can it handle non-English audio?
    The model works best with English audio; other languages may require additional training or model tuning.
  4. How to edit the generated subtitle file?
    SRT files can be opened and modified with a text editor (such as Notepad++) or subtitle editing software (such as Aegisub).
  5. Does the project support real-time caption generation?
    The current version does not support real-time processing, only offline audio files. Real-time functionality may be added in a future update.
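
If you prefer to do the format conversion from question 1 in Python rather than on the command line, the snippet below is a minimal sketch that simply wraps the FFmpeg command shown earlier with subprocess; it assumes ffmpeg is available on your PATH.

    import subprocess

    # Convert an arbitrary audio file to the mono, 16 kHz, PCM_16 WAV format
    # that Simple Subtitling expects, using the same FFmpeg options shown earlier.
    def to_mono_16k(src, dst):
        subprocess.run(
            ["ffmpeg", "-y", "-i", src,
             "-acodec", "pcm_s16le",  # 16-bit PCM
             "-ac", "1",              # mono
             "-ar", "16000",          # 16 kHz sample rate
             dst],
            check=True,
        )

    to_mono_16k("input_audio.mp3", "output_audio.wav")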
