General Introduction
Simple Subtitling is an open source audio subtitle generation tool that focuses on automatically generating subtitles and labeling speakers for video or audio files. Developed by Jaesung Huh and hosted on GitHub, the project aims to provide a simple and efficient subtitle generation solution. The tool uses audio processing techniques combined with machine learning models to generate timestamped and speaker-identified subtitle files for users who need to quickly add subtitles to their videos. The project is developed in Python, supports mono 16kHz audio processing, and is easy to install and use.Simple Subtitling is one of the open source projects from Dr. Jaesung Huh's research, with a focus on practicality and open source community contributions.
Function List
- Automatic Subtitle Generation: Extract text from audio files to generate timestamped subtitle files.
- Speaker labeling: Distinguish different speakers through speech analysis and add speaker labels to subtitles.
- SRT format support: generates standard SRT subtitle files compatible with most video players.
- Audio pre-processing: Supports conversion of audio to mono 16kHz format to ensure processing results.
- Configuration file customization: Users can adjust the subtitle generation parameters through the configuration file.
- Open source and free of charge: the code is open to the public, allowing users to freely modify and extend the function.
Using Help
Installation process
Simple Subtitling needs to run in an environment that supports Python, Python 3.9 or above is recommended. Here are the detailed installation steps:
- clone warehouse
Open a terminal and run the following command to clone the project code:git clone https://github.com/JaesungHuh/simple-subtitling.git --recursive
utilization --recursive
Make sure to clone all submodules.
- Creating a Virtual Environment
To avoid dependency conflicts, it is recommended that you create a Python virtual environment:conda create -n simple python=3.9 conda activate simple
If Conda is not used, it can also be used with the
venv
Create a virtual environment:python -m venv simple_env source simple_env/bin/activate # Linux/Mac simple_env\Scripts\activate # Windows
- Installation of dependencies
Go to the project directory and install the required Python packages:cd simple-subtitling pip install -r requirements.txt
Ensure that your internet connection is working and the dependency packages will be downloaded automatically.
- Install FFmpeg
Simple Subtitling uses FFmpeg for audio pre-processing. Please install FFmpeg according to your operating system:- Ubuntu/Debian::
sudo apt update && sudo apt install ffmpeg
- MacOS(using Homebrew):
brew install ffmpeg
- Windows (computer)(using Chocolatey):
choco install ffmpeg
After installation, run
ffmpeg -version
Confirm that the installation was successful. - Ubuntu/Debian::
- Configuration environment
The project contains aconfig.yaml
Configuration file for setting audio processing and subtitle generation parameters. Open theconfig.yaml
, modify the following key fields as required:audio_path
: Enter the audio file path.output_path
: The path where the subtitle file is saved.sample_rate
: Ensure that the setting is16000
(16kHz).
Example Configuration:
audio_path: "input_audio.wav" output_path: "output_subtitle.srt" sample_rate: 16000
Usage
Once the installation is complete, users can run Simple Subtitling from the command line.The following are the main steps:
- Preparing Audio Files
Simple Subtitling currently only supports mono, 16kHz, PCM_16 format audio. If the audio format does not meet the requirements, use FFmpeg to convert it:ffmpeg -i input_audio.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output_audio.wav
interchangeability
input_audio.mp3
is the path to your audio file, and the output isoutput_audio.wav
The - Run Subtitle Generation
In the project directory, run the main script:python main.py
The scripts will be based on the
config.yaml
The settings in the "Subtitle" dialog box process the audio and generate a subtitle file in SRT format. The generated subtitle file contains timestamps, speaker identifiers, and text content, and is saved in theoutput_path
The specified path. - View Subtitle File
Example of a generated SRT file:1 00:00:01,000 --> 00:00:03,000 Speaker_1: 你好,欢迎使用 Simple Subtitling。 2 00:00:04,000 --> 00:00:06,000 Speaker_2: 这是一个开源字幕工具。
Users can import SRT files into video editing software (e.g. Adobe Premiere, DaVinci Resolve) or players (e.g. VLC).
Featured Function Operation
- Speaker identification
Simple Subtitling uses machine learning models to analyze audio and distinguish between different speakers. The model is based on the ECAPA-TDNN architecture, with pre-training weights provided by Jaesung Huh. If you need more accuracy, you can download the pre-trained model from Hugging Face:model = ECAPA_gender.from_pretrained("JaesungHuh/voice-gender-classifier")
exist
config.yaml
enablespeaker_diarization: true
You can add speaker tags to the subtitles. - Customizing subtitle styles
Users can modify theconfig.yaml
hit the nail on the headsubtitle_style
fields to adjust subtitle fonts, colors and more. Currently supports basic SRT format styles and may be extended to ASS format in the future. - batch file
If you need to process multiple audio files, you can write a simple script to loop through the calls.main.py
or modificationconfig.yaml
Multiple file input is supported. Community-contributed batch processing scripts can be found in the GitHub discussion forum.
caveat
- Ensure that the input audio quality is clear; background noise may affect subtitle accuracy.
- The model performs best with English audio, other languages may require additional training.
- The project is in development, so we recommend following the GitHub repository for updates on the latest features and fixes.
application scenario
- Video content creators
YouTubers or short-form video producers can use Simple Subtitling to automatically generate subtitles for their videos, enhancing the viewer experience. The tool's speaker identification feature is perfect for interviews or multi-person conversations, clearly labeling each speaker's content. - Educational content production
Teachers or online course creators can generate subtitles for instructional videos to make them easier for students to read, and SRT files support a multi-language player for internationalized courses. - podcast transcription
Podcast producers can convert audio to subtitles with speaker logos for publishing text versions of content or creating video clips. - proceedings
Business users can generate subtitles for meeting recordings and quickly organize meeting minutes. Speaker identification feature helps to distinguish between different speakers.
QA
- What audio formats does Simple Subtitling support?
Currently only mono, 16kHz, PCM_16 format WAV files are supported. Other formats need to be converted by FFmpeg. - How can I improve the accuracy of subtitle generation?
Ensure clear audio quality and reduce background noise. In theconfig.yaml
enablespeaker_diarization
and use pre-trained models. - Can you handle non-English audio?
The model works best for English audio, other languages may require additional training or model tuning. - How to edit the generated subtitle file?
SRT files can be opened and modified with a text editor (such as Notepad++) or subtitle editing software (such as Aegisub). - Does the project support real-time caption generation?
The current version does not support real-time processing, only offline audio files. Real-time functionality may be added in a future update.