How can I do OCR for subtitles in languages other than Chinese and English?
Introduction
- Currently, RapidVideOCR uses the default configuration of rapidocr_onnxruntime directly, so it can only recognize subtitles in Chinese and English.
- Because rapidocr_onnxruntime provides an interface for passing in other multilingual recognition models, RapidVideOCR can be extended to other languages. This article explains how.
- This article uses the French OCR solution proposed in Discussions #40 as an example; other languages can be handled in the same way.
1. Correctly install and use RapidVideOCR
Please refer to this link
2. Use PaddleOCR Convert tool to convert French recognition model to ONNX
Refer to the tutorial link, using:
- Model path: https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar
- Dictionary path: https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/ppocr/utils/dict/french_dict.txt
For model download links for other languages, refer to the paddleocr.py file in the source of the paddleocr whl package.
For the dictionary files, see: link
Finally, a French recognition model can be obtained: french_mobile_v2.0_rec_infer.onnx
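The conversion step above can also be sketched from Python. This is a minimal sketch, not the tutorial's definitive procedure: it assumes the convert tool is importable as paddleocr_convert and exposes a PaddleOCRModelConvert class that is called with the model path, an output directory, and a txt_path argument for the dictionary. Check those names against the tutorial before relying on them. The helper names expected_onnx_name and convert_model, and the "models" output directory, are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Paths taken from this article.
MODEL_URL = (
    "https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/"
    "french_mobile_v2.0_rec_infer.tar"
)
DICT_URL = (
    "https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/"
    "ppocr/utils/dict/french_dict.txt"
)


def expected_onnx_name(model_url: str) -> str:
    """Derive the ONNX file name from the archive name (foo.tar -> foo.onnx)."""
    archive = PurePosixPath(urlparse(model_url).path).name
    return PurePosixPath(archive).stem + ".onnx"


def convert_model(model_url: str, dict_url: str, save_dir: str = "models") -> None:
    """Convert a Paddle recognition model to ONNX (hypothetical wrapper)."""
    # Imported lazily so expected_onnx_name works without the tool installed.
    # ASSUMPTION: class name and call signature; verify against the tutorial.
    from paddleocr_convert import PaddleOCRModelConvert

    converter = PaddleOCRModelConvert()
    converter(model_url, save_dir, txt_path=dict_url)
```

Calling convert_model(MODEL_URL, DICT_URL) should leave a file named expected_onnx_name(MODEL_URL) somewhere under the output directory. For another language, swap in that language's model and dictionary paths.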
3. OCR French subtitles
Requires rapid_videocr>=v2.2.8.
from rapid_videocr import RapidVideOCR
extractor = RapidVideOCR(rec_model_path="french_mobile_v2.0_rec_infer.onnx")
rgb_dir = "test_files/RGBImagesTiny"
save_dir = "outputs"
save_name = "a"
# Results are written to outputs/a.srt and outputs/a.txt
extractor(rgb_dir, save_dir, save_name=save_name)
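To confirm that the French model was actually used, it can help to read the generated subtitle file back and look at the recognized text, for example to check that accented characters appear. The following is a minimal sketch that assumes outputs/a.srt follows the usual SubRip layout (index line, timing line, text lines, blank line). The Cue type and the parse_srt and load_srt helpers are hypothetical names and are not part of RapidVideOCR.

```python
import re
from pathlib import Path
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


# Timing line such as "00:00:01,000 --> 00:00:02,500".
_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
)


def parse_srt(content: str) -> List[Cue]:
    """Split SubRip text into cues, skipping blocks that do not look valid."""
    cues: List[Cue] = []
    cleaned = content.lstrip("\ufeff").strip()
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = _TIMING.match(lines[1].strip())
        if match is None:
            continue
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues


def load_srt(path: str) -> List[Cue]:
    """Read an .srt file as UTF-8 and parse it."""
    return parse_srt(Path(path).read_text(encoding="utf-8"))
```

For the run above, load_srt("outputs/a.srt") would return one Cue per subtitle, which can be printed or compared against the video.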
Last updated 21 May 2025, 20:18 -0600.