Jinhybr
Rank:
Average Model Cost: $0.0000
Number of Runs: 4,458
Models by this creator
OCR-Donut-CORD
Donut (base-sized model, fine-tuned on CORD)

Donut model fine-tuned on CORD. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Intended uses & limitations
This model is fine-tuned on CORD, a document parsing dataset. We refer to the documentation, which includes code examples.

CORD Dataset
CORD: A Consolidated Receipt Dataset for Post-OCR Parsing.
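A minimal sketch of receipt parsing with this checkpoint, following the generic Donut usage pattern from the Transformers documentation. The model id jinhybr/OCR-Donut-CORD, the <s_cord-v2> task prompt, and the image path are assumptions, not taken from this page.

```python
# Sketch only: model id, task prompt, and image path are assumed, not confirmed by this page.
import re
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("jinhybr/OCR-Donut-CORD")
model = VisionEncoderDecoderModel.from_pretrained("jinhybr/OCR-Donut-CORD")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("receipt.png").convert("RGB")  # placeholder input image
pixel_values = processor(image, return_tensors="pt").pixel_values

# CORD-style Donut checkpoints are usually prompted with the <s_cord-v2> task token.
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values.to(device),
    decoder_input_ids=decoder_input_ids.to(device),
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    use_cache=True,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the leading task token
print(processor.token2json(sequence))                        # structured receipt fields as JSON
```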
$-/run
3.1K
Huggingface
OCR-LayoutLMv3
OCR-LayoutLMv3

This model is a fine-tuned version of microsoft/layoutlmv3-base on the funsd-layoutlmv3 dataset. It achieves the following results on the evaluation set:
Loss: 0.9788
Precision: 0.8989
Recall: 0.9051
F1: 0.9020
Accuracy: 0.8404

Model description
LayoutLMv3 is a pre-trained multimodal Transformer for Document AI with unified text and image masking. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. For example, LayoutLMv3 can be fine-tuned for both text-centric tasks, including form understanding, receipt understanding, and document visual question answering, and image-centric tasks such as document image classification and document layout analysis.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. Preprint, 2022.

Training hyperparameters
The following hyperparameters were used during training:
learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
training_steps: 2000

Training results

Framework versions
Transformers 4.25.0.dev0
Pytorch 1.12.1
Datasets 2.6.1
Tokenizers 0.13.1
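A minimal sketch of form-field tagging with this checkpoint. The model id jinhybr/OCR-LayoutLMv3 and the image path are assumptions; by default the LayoutLMv3 processor runs its own OCR (via pytesseract) to obtain words and bounding boxes.

```python
# Sketch only: model id and input image are assumed; pytesseract must be installed
# because the processor applies OCR to the image by default.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForTokenClassification

processor = AutoProcessor.from_pretrained("jinhybr/OCR-LayoutLMv3")
model = AutoModelForTokenClassification.from_pretrained("jinhybr/OCR-LayoutLMv3")

image = Image.open("form.png").convert("RGB")   # placeholder document image
encoding = processor(image, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**encoding).logits

predictions = logits.argmax(-1).squeeze().tolist()
labels = [model.config.id2label[p] for p in predictions]
print(labels[:20])  # BIO tags (question / answer / header / other) per token
```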
$-/run
523
Huggingface
OCR-DocVQA-Donut
Donut (base-sized model, fine-tuned on DocVQA)

Donut model fine-tuned on DocVQA. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook Kim et al. and first released in this repository. Disclaimer: the team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Intended uses & limitations
This model is fine-tuned on DocVQA, a document visual question answering dataset. We refer to the documentation, which includes code examples.
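A minimal sketch of document visual question answering with this checkpoint, again following the generic Donut pattern. The model id jinhybr/OCR-DocVQA-Donut, the DocVQA-style task prompt, and the image path are assumptions.

```python
# Sketch only: model id, task prompt, and image are assumed, not confirmed by this page.
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("jinhybr/OCR-DocVQA-Donut")
model = VisionEncoderDecoderModel.from_pretrained("jinhybr/OCR-DocVQA-Donut")

image = Image.open("invoice.png").convert("RGB")             # placeholder document image
question = "What is the total amount?"
# DocVQA-style Donut checkpoints embed the question inside the task prompt.
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"

pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()   # drop the leading task token
print(processor.token2json(sequence))                         # e.g. {'question': ..., 'answer': ...}
```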
$-/run
498
Huggingface
OCR-LayoutLMv3-Invoice
$-/run
204
Huggingface
OCR-LM-v1
OCR-LM-v1

This model is a fine-tuned version of jinhybr/layoutlm-funsd-pytorch on the funsd dataset. It achieves the following results on the evaluation set:
Loss: 1.1740
Answer: precision 0.7201, recall 0.8047, f1 0.7601 (number: 809)
Header: precision 0.4247, recall 0.5210, f1 0.4679 (number: 119)
Question: precision 0.8236, recall 0.8376, f1 0.8305 (number: 1065)
Overall Precision: 0.7525
Overall Recall: 0.8053
Overall F1: 0.7780
Overall Accuracy: 0.8146

Model description
More information needed

Intended uses & limitations
More information needed

Training and evaluation data
More information needed

Training procedure

Training hyperparameters
The following hyperparameters were used during training:
learning_rate: 3e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 50

Training results

Framework versions
Transformers 4.23.1
Pytorch 1.12.1
Datasets 2.6.1
Tokenizers 0.13.1
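A minimal sketch of token classification with this LayoutLM (v1) checkpoint. Unlike LayoutLMv3, LayoutLM v1 has no built-in OCR, so words and 0-1000-normalized bounding boxes must come from an external OCR step; the model id jinhybr/OCR-LM-v1 and the toy words/boxes below are assumptions.

```python
# Sketch only: model id and the OCR words/boxes are assumed placeholders.
import torch
from transformers import LayoutLMTokenizerFast, LayoutLMForTokenClassification

tokenizer = LayoutLMTokenizerFast.from_pretrained("jinhybr/OCR-LM-v1")
model = LayoutLMForTokenClassification.from_pretrained("jinhybr/OCR-LM-v1")

words = ["Invoice", "Number:", "12345"]                               # OCR output (placeholder)
boxes = [[60, 40, 160, 60], [170, 40, 260, 60], [270, 40, 330, 60]]   # normalized to 0-1000

# Each word's box is repeated for every sub-token produced by the tokenizer.
encoding = tokenizer(" ".join(words), return_tensors="pt")
token_boxes = []
for word, box in zip(words, boxes):
    token_boxes.extend([box] * len(tokenizer.tokenize(word)))
# Conventional boxes for the [CLS] and [SEP] special tokens.
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]
bbox = torch.tensor([token_boxes])

with torch.no_grad():
    logits = model(
        input_ids=encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
        bbox=bbox,
    ).logits

predictions = logits.argmax(-1).squeeze().tolist()
print([model.config.id2label[p] for p in predictions])                # BIO tags per token
```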
$-/run
26
Huggingface
text-summarization-t5base-xsum
t5-small-finetuned-xsum

This model is a fine-tuned version of t5-small on the xsum dataset. It achieves the following results on the evaluation set:
Loss: 2.4789
Rouge1: 28.282
Rouge2: 7.6989
Rougel: 22.2019
Rougelsum: 22.197
Gen Len: 18.8238

Model description
More information needed

Intended uses & limitations
More information needed

Training and evaluation data
More information needed

Training procedure

Training hyperparameters
The following hyperparameters were used during training:
learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1
mixed_precision_training: Native AMP

Training results

Framework versions
Transformers 4.22.2
Pytorch 1.12.1+cu113
Datasets 2.5.1
Tokenizers 0.12.1
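A minimal sketch of abstractive summarization with this checkpoint. The model id jinhybr/text-summarization-t5base-xsum and the input text are assumptions; T5-style models expect a "summarize: " prefix on the input.

```python
# Sketch only: model id and input article are assumed placeholders.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "jinhybr/text-summarization-t5base-xsum"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "The local council has approved plans for a new cycle path..."  # placeholder text
inputs = tokenizer("summarize: " + article, return_tensors="pt", truncation=True, max_length=512)

summary_ids = model.generate(**inputs, max_length=60, num_beams=4, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```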
$-/run
19
Huggingface
LiLt-funsd-en
$-/run
16
Huggingface
layoutlm-funsd-pytorch
$-/run
16
Huggingface
OCR-CDIP-DONUT
$-/run
4
Huggingface
layoutlm-funsd-tf
$-/run
4
Huggingface