Naver-clova-ix
Rank:
Average Model Cost: $0.0000
Number of Runs: 70,173
Models by this creator
donut-base
Donut combines a vision encoder (Swin Transformer) and a text decoder (BART) to convert document images into text. The encoder maps an image into a sequence of embeddings, and the decoder then generates text conditioned on that encoding. Donut is designed to be fine-tuned on downstream tasks such as document image classification or document parsing; this checkpoint is base-sized and pre-trained only, so it is meant to be fine-tuned before use. A loading sketch follows this entry.
$-/run
32.4K
Huggingface
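A minimal loading sketch with the Hugging Face transformers library is shown below. The file name document.png is a hypothetical placeholder, and the printed shape is only indicative of the base checkpoint's default input resolution.

```python
# Sketch: load the pre-trained donut-base checkpoint and inspect the encoder
# output described above (assumes transformers, torch, and Pillow are installed;
# "document.png" is a hypothetical local image).
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

image = Image.open("document.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
    # The Swin encoder turns the image into a (batch_size, seq_len, hidden_size)
    # tensor of embeddings; the BART decoder is conditioned on this when the
    # model is fine-tuned and used for generation.
    encoder_outputs = model.encoder(pixel_values)

print(encoder_outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 4800, 1024])
```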
donut-base-finetuned-cord-v2
The donut-base-finetuned-cord-v2 model is an image-to-text model that converts document images into structured text. It consists of a vision encoder (Swin Transformer) and a text decoder (BART) and has been fine-tuned on CORD, a receipt parsing dataset. Its intended use is document understanding and image-to-text generation, in particular extracting structured fields from receipts. A parsing sketch follows this entry.
$-/run
17.2K
Huggingface
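A receipt-parsing sketch is shown below, assuming the transformers library and a hypothetical local image receipt.png. The <s_cord-v2> task prompt follows the checkpoint's documented usage, and token2json converts the generated token sequence into a nested dict of fields.

```python
# Sketch: parse a receipt image with donut-base-finetuned-cord-v2.
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

image = Image.open("receipt.png").convert("RGB")  # hypothetical receipt scan
pixel_values = processor(image, return_tensors="pt").pixel_values

decoder_input_ids = processor.tokenizer(
    "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task prompt token
print(processor.token2json(sequence))  # structured fields, e.g. menu items and totals
```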
donut-base-finetuned-docvqa
donut-base-finetuned-docvqa is a model trained for document-based visual question answering. It is based on donut-base, which pairs a Swin Transformer vision encoder with a BART text decoder. The model takes a document image and a question as input and generates the most probable answer, having been fine-tuned specifically for document-based question answering. A question-answering sketch follows this entry.
$-/run
16.9K
Huggingface
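A question-answering sketch is shown below, assuming the transformers library and a hypothetical invoice.png. The question is embedded in the task prompt, and the model generates the answer tokens after <s_answer>.

```python
# Sketch: document visual question answering with donut-base-finetuned-docvqa.
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

image = Image.open("invoice.png").convert("RGB")  # hypothetical document image
question = "What is the invoice number?"
task_prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"

pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task start token
print(processor.token2json(sequence))  # e.g. {"question": ..., "answer": ...}
```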
donut-base-finetuned-rvlcdip
Donut (base-sized model, fine-tuned on RVL-CDIP). This Donut model was fine-tuned on RVL-CDIP. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description: Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on that encoding.

Intended uses & limitations: this model is fine-tuned on RVL-CDIP, a document image classification dataset. We refer to the documentation, which includes code examples. A classification sketch follows this entry.
$-/run
2.3K
Huggingface
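A classification sketch is shown below, assuming the transformers library and a hypothetical scan.png. The model emits the predicted class as a token sequence that token2json turns into a small dict. The remaining fine-tuned checkpoints in this list (ZhTrainTicket and the CORD v1 variants) follow the same generate-then-parse pattern, differing only in checkpoint name and task prompt.

```python
# Sketch: document image classification with donut-base-finetuned-rvlcdip.
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")

image = Image.open("scan.png").convert("RGB")  # hypothetical scanned page
pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_input_ids = processor.tokenizer(
    "<s_rvlcdip>", add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task start token
print(processor.token2json(sequence))  # e.g. {"class": "invoice"}
```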
donut-base-finetuned-zhtrainticket
Donut (base-sized model, fine-tuned on ZhTrainTicket). This Donut model was fine-tuned on ZhTrainTicket. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description: Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on that encoding.

Intended uses & limitations: this model is fine-tuned on ZhTrainTicket, a document parsing dataset. We refer to the documentation, which includes code examples.
$-/run
488
Huggingface
donut-base-finetuned-cord-v1-2560
Donut (base-sized model, fine-tuned on CORD). This Donut model was fine-tuned on CORD. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description: Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on that encoding.

Intended uses & limitations: this model is fine-tuned on CORD, a document parsing dataset. We refer to the documentation, which includes code examples.
$-/run
403
Huggingface
donut-proto
Donut (base-sized model, pre-trained only). This Donut model is pre-trained only. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description: Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on that encoding.

Intended uses & limitations: this model is meant to be fine-tuned on a downstream task, such as document image classification or document parsing. See the model hub for fine-tuned versions on a task that interests you. We refer to the documentation, which includes code examples. A rough fine-tuning sketch follows this entry.
$-/run
247
Huggingface
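A rough fine-tuning sketch is shown below. It loads the donut-base checkpoint because its transformers loading path is well documented (donut-proto may require the original Donut codebase); the task token, field tags, target string, and hyperparameters are all illustrative assumptions, not the authors' training recipe.

```python
# Sketch: adapting a pre-trained Donut checkpoint to a hypothetical downstream
# parsing task by training the decoder to emit a task-specific token sequence.
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

# Hypothetical task vocabulary: a start token plus field tags for the new task.
processor.tokenizer.add_tokens(["<s_mytask>", "<s_total>", "</s_total>"])
model.decoder.resize_token_embeddings(len(processor.tokenizer))
model.config.decoder_start_token_id = processor.tokenizer.convert_tokens_to_ids("<s_mytask>")
model.config.pad_token_id = processor.tokenizer.pad_token_id

image = Image.open("receipt.png").convert("RGB")    # hypothetical training image
target = "<s_mytask><s_total>12.50</s_total></s>"   # hypothetical target sequence

pixel_values = processor(image, return_tensors="pt").pixel_values
labels = processor.tokenizer(target, add_special_tokens=False, return_tensors="pt").input_ids
labels[labels == processor.tokenizer.pad_token_id] = -100  # would mask padding when batching

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
outputs = model(pixel_values=pixel_values, labels=labels)  # cross-entropy over decoder tokens
outputs.loss.backward()
optimizer.step()
```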
donut-base-finetuned-cord-v1
Donut (base-sized model, fine-tuned on CORD, v1). This Donut model was fine-tuned on CORD. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description: Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on that encoding.

Intended uses & limitations: this model is fine-tuned on CORD, a document parsing dataset. We refer to the documentation, which includes code examples.
$-/run
181
Huggingface
donut-base-finetuned-kuzushiji
Platform did not provide a description for this model.
$-/run
43
Huggingface