Jinhybr
Models by this creator
OCR-Donut-CORD
The OCR-Donut-CORD model is a Donut model fine-tuned on CORD, a document parsing dataset. Donut, introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al., consists of a Swin Transformer vision encoder and a BART text decoder. Given an image, the encoder first encodes it into a tensor of embeddings, which the decoder then uses to autoregressively generate text. Similar models include donut-base-finetuned-cord-v2 and donut-base-finetuned-docvqa, which are Donut models fine-tuned on other datasets, while donut-base is the pre-trained base version of Donut, meant to be fine-tuned on a downstream task.

Model inputs and outputs

Inputs
Image of a document

Outputs
Text extracted from the document image

Capabilities
The OCR-Donut-CORD model extracts text from document images without a separate optical character recognition (OCR) step. This is useful for tasks like document parsing, where the model generates structured text directly from the image rather than running OCR first and parsing the result afterward.

What can I use it for?
You can use the OCR-Donut-CORD model to parse and extract text from document images such as receipts, forms, or scientific papers. This is particularly useful when you need to process a large volume of documents, since the model automates the text extraction step.

Things to try
One interesting experiment with the OCR-Donut-CORD model is to compare its performance across different kinds of documents, such as handwritten notes, complex layouts, or low-quality scans. This can reveal the model's strengths and limitations and help you choose the best model for your specific use case.
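To make the input/output flow described above concrete, here is a minimal inference sketch using the Hugging Face transformers library. The repository id jinhybr/OCR-Donut-CORD and the task prompt <s_cord-v2> are assumptions based on the standard Donut CORD fine-tuning recipe, not values confirmed by this page; check the model card for the exact identifiers.

```python
# Minimal inference sketch for a Donut model fine-tuned on CORD.
# Assumptions (not confirmed by this page): the Hugging Face repo id is
# "jinhybr/OCR-Donut-CORD" and the task prompt is "<s_cord-v2>",
# following the standard Donut CORD recipe.
import re

import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("jinhybr/OCR-Donut-CORD")
model = VisionEncoderDecoderModel.from_pretrained("jinhybr/OCR-Donut-CORD")
model.eval()

# Load a document image (e.g. a receipt) and convert it to pixel values
# for the Swin Transformer encoder.
image = Image.open("receipt.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# The BART decoder is primed with a task-specific start token.
task_prompt = "<s_cord-v2>"  # assumed; verify against the model card
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

# Autoregressively generate the output token sequence from the image embeddings.
with torch.no_grad():
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

# Strip special tokens and the task prompt, then convert to structured JSON.
sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, ""
)
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
print(processor.token2json(sequence))
```

The final token2json step is what turns the decoder's flat token sequence into the structured fields (for CORD, receipt items, prices, and totals) that the parsing task targets, which is why no separate OCR or post-hoc parser is needed.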
Updated 5/28/2024