naver-clova-ix

Average Model Cost: $0.0000

Number of Runs: 70,173

Models by this creator

donut-base

Donut combines a vision encoder (Swin Transformer) with a text decoder (BART) to convert document images into text. The encoder maps an image to a tensor of embeddings, and the decoder then generates text conditioned on that encoding. Donut is designed to be fine-tuned on downstream tasks such as document image classification or document parsing. This checkpoint is the base-sized model and has only been pre-trained, not fine-tuned.
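The encode-then-autoregressively-decode flow described above can be sketched with toy NumPy stand-ins. This is only a shape-and-control-flow illustration under stated assumptions, not the real model: the actual encoder is a Swin Transformer and the decoder is BART, and all numbers below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(image, seq_len=4, hidden_size=8):
    """Toy stand-in for the vision encoder (a Swin Transformer in Donut):
    returns a (batch_size, seq_len, hidden_size) tensor of embeddings.
    Here the 'embeddings' are just random features of the right shape."""
    batch_size = image.shape[0]
    return rng.standard_normal((batch_size, seq_len, hidden_size))

def generate(encoder_hidden, vocab_size=10, eos_id=0, max_len=5):
    """Toy stand-in for the autoregressive text decoder (BART in Donut):
    greedily emits one token at a time, conditioned on the image encoding."""
    tokens = []
    context = float(encoder_hidden.mean())  # crude summary of the encoding
    for _ in range(max_len):
        # Fake logits that depend on the encoding and on the tokens so far.
        logits = rng.standard_normal(vocab_size) + context + len(tokens)
        next_id = int(np.argmax(logits))
        if next_id == eos_id:  # stop when the end-of-sequence id wins
            break
        tokens.append(next_id)
    return tokens

image = np.zeros((1, 3, 224, 224))  # one dummy RGB image
enc = encode_image(image)
print(enc.shape)      # (1, 4, 8)
print(generate(enc))  # a short list of token ids
```

The point is the interface: one forward pass over the image produces a fixed tensor of embeddings, and text tokens are then produced one at a time, each step conditioned on that same encoding.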

Runs: 32.4K

Hugging Face

donut-base-finetuned-rvlcdip

Donut (base-sized model, fine-tuned on RVL-CDIP)

Donut model fine-tuned on RVL-CDIP. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Intended uses & limitations

This model is fine-tuned on RVL-CDIP, a document image classification dataset. We refer to the documentation, which includes code examples.

Runs: 2.3K

Hugging Face

donut-base-finetuned-zhtrainticket

Donut (base-sized model, fine-tuned on ZhTrainTicket)

Donut model fine-tuned on ZhTrainTicket. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Intended uses & limitations

This model is fine-tuned on ZhTrainTicket, a document parsing dataset. We refer to the documentation, which includes code examples.

Runs: 488

Hugging Face

donut-base-finetuned-cord-v1-2560

Donut (base-sized model, fine-tuned on CORD)

Donut model fine-tuned on CORD. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Intended uses & limitations

This model is fine-tuned on CORD, a document parsing dataset. We refer to the documentation, which includes code examples.
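Donut parsing checkpoints such as this one emit the document's contents as a flat token sequence with XML-like field tags, which the official code then converts to JSON (the token2json step). A minimal sketch of that conversion, assuming a single level of non-nested tags; the field names (`nm`, `cnt`, `price`) are illustrative examples in the style of CORD, and the real token2json also handles nesting and repeated fields:

```python
import re

def token2json_simple(seq):
    """Parse a flat Donut-style tagged sequence such as
    '<s_nm>Latte</s_nm><s_cnt>2</s_cnt>' into a dict.
    Simplified sketch: one level of non-nested fields only."""
    fields = {}
    # <s_KEY> ... </s_KEY>: the backreference \1 enforces matching tags.
    for key, value in re.findall(r"<s_([^>/]+)>(.*?)</s_\1>", seq):
        fields[key] = value.strip()
    return fields

output = "<s_nm>Latte</s_nm><s_cnt>2</s_cnt><s_price>4.50</s_price>"
print(token2json_simple(output))
# {'nm': 'Latte', 'cnt': '2', 'price': '4.50'}
```

This tag-sequence format is what lets a single text decoder handle structured parsing tasks: the structure of the output JSON is encoded directly in the generated token stream.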

Runs: 403

Hugging Face

donut-proto

Donut (base-sized model, pre-trained only)

Donut model, pre-trained only. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Intended uses & limitations

This model is meant to be fine-tuned on a downstream task, such as document image classification or document parsing. See the model hub for fine-tuned versions on a task that interests you. We refer to the documentation, which includes code examples.

Runs: 247

Hugging Face

donut-base-finetuned-cord-v1

Donut (base-sized model, fine-tuned on CORD, v1)

Donut model fine-tuned on CORD. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Intended uses & limitations

This model is fine-tuned on CORD, a document parsing dataset. We refer to the documentation, which includes code examples.

Runs: 181

Hugging Face
