
GOT-OCR2.0: end-to-end multimodal OCR model based on QWen2 0.5B


General Introduction

GOT-OCR2.0 is an open-source Optical Character Recognition (OCR) model jointly released by StepStar and its collaborators, which aims to drive OCR technology towards OCR-2.0 through a unified end-to-end model. The model supports a wide range of OCR tasks, including plain text recognition, formatted text recognition, fine-grained OCR, multi-crop OCR, and multi-page OCR. GOT-OCR2.0 is designed to provide a versatile and efficient solution for a wide range of complex OCR application scenarios.

Built on the Qwen2-0.5B base model, this end-to-end OCR-2.0 model has 580M parameters and achieves a BLEU score of 0.972. An online demo is available at https://huggingface.co/spaces/ucaslcl/GOT_online

 

Function List

  • Plain Text Recognition: Recognizes ordinary text content in images.
  • Formatted Text Recognition: Recognizes text while retaining its formatting information, such as tables and paragraphs.
  • Fine-grained OCR: Recognizes fine text in images and text against complex backgrounds.
  • Multi-crop OCR: Supports cropping an image into multiple regions and recognizing the text in each cropped area.
  • Multi-page OCR: Supports OCR of multi-page documents.

 

 

Usage Help

Installation process

  1. Clone the project code:
    git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git
    cd GOT-OCR2.0
    
  2. Create and activate a virtual environment:
    conda create -n got python=3.10 -y
    conda activate got
    
  3. Install project dependencies:
    pip install -e .
    
  4. Install Flash-Attention (see the note below):
    pip install ninja
    pip install flash-attn --no-build-isolation
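
Note: flash-attn compiles CUDA kernels during installation, so it requires an NVIDIA GPU and a CUDA-enabled PyTorch build. A quick sanity check before this step (assuming PyTorch was pulled in by the project dependencies in step 3):

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"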
    

Obtaining GOT model weights
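
The pretrained weights are distributed separately from the code; the project README provides the download links. As one option, assuming the weights are mirrored on Hugging Face under stepfun-ai/GOT-OCR2_0 (verify the exact repository name against the README), they can be fetched with the Hugging Face CLI and placed in the directory passed to --model-name:

    pip install -U "huggingface_hub[cli]"
    huggingface-cli download stepfun-ai/GOT-OCR2_0 --local-dir GOT_weights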

Usage Process

  1. Prepare input data: Place the image or document to be OCR'd in the specified input directory.
  2. Run the OCR model:
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr
    
  3. View the output results: The recognized text is saved to the specified output directory and can be further processed as needed. Other recognition modes are selected through command-line flags, as sketched below.
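
The --type flag selects the recognition mode. Besides plain ocr, the upstream README also describes a formatted mode and an option to render the formatted result; the exact flag values below follow that README and are worth double-checking against your checkout:

    # formatted text recognition (keeps structure such as tables and paragraphs)
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format

    # render the formatted result for viewing, per the upstream README
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type format --render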

Functional operation details

  • Plain text recognition: Recognizes ordinary text content in images and outputs it as plain text, suitable for simple text-extraction tasks.
  • Formatted text recognition: Preserves formatting information such as tables and paragraphs while recognizing text, for scenarios where the original formatting of the document must be retained.
  • Fine-grained OCR: Recognizes fine text against complex backgrounds, suitable for scenarios requiring high-precision text extraction (see the command sketch after this list).
  • Multi-crop OCR: Crops the image into multiple regions and recognizes the text in each cropped region, suitable for scenarios that require recognizing several areas of an image.
  • Multi-page OCR: Supports OCR of multi-page documents, suitable for processing long documents or multi-page PDF files.
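
For fine-grained OCR, the upstream README describes restricting recognition to a region of the image, either by passing a bounding box or by pointing at a colored box drawn on the image. The flag names and values below are taken from that README and should be treated as a sketch to verify against your version:

    # fine-grained OCR on a bounding box (coordinates here are illustrative)
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr --box "[100,100,400,300]"

    # fine-grained OCR guided by a colored box (red/green/blue) drawn on the image
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr --color red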

With the above steps, users can easily install and use the GOT-OCR2.0 model for various OCR tasks. The model provides a rich set of functional modules that can meet the OCR needs in different scenarios.
