
GOT-OCR2.0: end-to-end multimodal OCR model based on QWen2 0.5B


General Introduction

GOT-OCR2.0 is an open-source Optical Character Recognition (OCR) model jointly released by StepStar and its collaborators, which aims to drive OCR technology towards OCR-2.0 through a unified end-to-end model. The model supports a wide range of OCR tasks, including plain text recognition, formatted text recognition, fine-grained OCR, multi-crop OCR, and multi-page OCR. GOT-OCR2.0 is designed to provide a versatile and efficient solution for a wide range of complex OCR application scenarios.

Built on the Qwen2-0.5B base model, this end-to-end OCR-2.0 model has 580M parameters and achieves a BLEU score of 0.972. An online demo is available at https://huggingface.co/spaces/ucaslcl/GOT_online

 

Function List

  • Plain Text Recognition: Recognizes ordinary text content in images.
  • Formatted Text Recognition: Recognizes text while retaining its formatting information, such as tables and paragraphs.
  • Fine-grained OCR: Recognizes fine text in images and text against complex backgrounds.
  • Multi-crop OCR: Supports cropping an image into multiple regions and recognizing the text in each cropped area.
  • Multi-page OCR: Supports OCR of multi-page documents.

 

 

Usage Help

Installation process

  1. Clone the project code:
    git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git
    cd GOT-OCR2.0
    
  2. Create and activate a virtual environment:
    conda create -n got python=3.10 -y
    conda activate got
    
  3. Install project dependencies:
    pip install -e .
    
  4. Install Flash-Attention (see the note below):
    pip install ninja
    pip install flash-attn --no-build-isolation
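
Note: flash-attn compiles CUDA kernels during installation, so it requires an NVIDIA GPU and a CUDA-enabled PyTorch build. A quick sanity check before this step (assuming PyTorch was pulled in by the project dependencies in step 3):

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"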
    

Obtaining GOT model weights
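
The pretrained weights are distributed separately from the code; the project README provides the download links. As one option, assuming the weights are mirrored on Hugging Face under stepfun-ai/GOT-OCR2_0 (verify the exact repository name against the README), they can be fetched with the Hugging Face CLI and placed in the directory passed to --model-name:

    pip install -U "huggingface_hub[cli]"
    huggingface-cli download stepfun-ai/GOT-OCR2_0 --local-dir GOT_weights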

Usage Process

  1. Prepare input data: Place the image or document to be OCR'd in the specified input directory.
  2. Run the OCR model:
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr
    
  3. View the output results: The recognized text is saved to the specified output directory and can be further processed as needed. Other recognition modes are selected through command-line flags, as sketched below.
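
The --type flag selects the recognition mode. Besides plain ocr, the upstream README also describes a formatted mode and an option to render the formatted result; the exact flag values below follow that README and are worth double-checking against your checkout:

    # formatted text recognition (keeps structure such as tables and paragraphs)
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format

    # render the formatted result for viewing, per the upstream README
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format --render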

Functional operation details

  • Plain text recognition: Recognizes ordinary text content in images and outputs it as plain text, suitable for simple text-extraction tasks.
  • Formatted text recognition: Preserves formatting information such as tables and paragraphs while recognizing text, for scenarios where the original formatting of the document must be retained.
  • Fine-grained OCR: Recognizes fine text against complex backgrounds, suitable for scenarios requiring high-precision text extraction (see the command sketch after this list).
  • Multi-crop OCR: Crops the image into multiple regions and recognizes the text in each cropped region, suitable for scenarios that require recognizing several areas of an image.
  • Multi-page OCR: Supports OCR of multi-page documents, suitable for processing long documents or multi-page PDF files.
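
For fine-grained OCR, the upstream README describes restricting recognition to a region of the image, either by passing a bounding box or by pointing at a colored box drawn on the image. The flag names and values below are taken from that README and should be treated as a sketch to verify against your version:

    # fine-grained OCR on a bounding box (coordinates here are illustrative)
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr --box "[100,100,400,300]"

    # fine-grained OCR guided by a colored box (red/green/blue) drawn on the image
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr --color red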

With the above steps, users can easily install and use the GOT-OCR2.0 model for various OCR tasks. The model provides a rich set of functional modules that can meet the OCR needs in different scenarios.
