
nougat-latex-base

Maintainer: Norm

Total Score

56

Last updated 5/15/2024

Model Link: View on HuggingFace
API Spec: View on HuggingFace
Github Link: No Github link provided
Paper Link: No paper link provided


Model overview

The nougat-latex-base model is a Donut-based model fine-tuned from the facebook/nougat-base model to boost its proficiency in generating LaTeX code from images. The model was developed by the maintainer Norm and aims to improve upon the original nougat-base model, which struggled to generate high-quality LaTeX code from image inputs because its input resolution is poorly suited to small equation crops.

The nougat-latex-base model addresses this issue by adjusting the input resolution and using an adaptive padding approach to ensure that equation image segments are resized to closely match the resolution of the training data. This helps to mitigate potential rescaling artifacts and improve the generation quality of LaTeX code.
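
The exact preprocessing lives in the maintainer's repository; as a rough illustration of the resize-then-pad idea, a minimal sketch might look like the following. The target dimensions and fill value below are assumptions for illustration, not the model's actual training resolution.

```python
from PIL import Image

def pad_resize(image: Image.Image, target_w: int = 560, target_h: int = 168,
               fill: int = 255) -> Image.Image:
    """Resize an equation crop to fit inside a target canvas while keeping
    its aspect ratio, then pad the remainder with white. The target size
    here is illustrative, not the model's actual training resolution."""
    scale = min(target_w / image.width, target_h / image.height)
    new_size = (max(1, round(image.width * scale)),
                max(1, round(image.height * scale)))
    resized = image.convert("L").resize(new_size, Image.BICUBIC)

    canvas = Image.new("L", (target_w, target_h), color=fill)
    # Center the crop so padding is distributed evenly on each side.
    offset = ((target_w - resized.width) // 2,
              (target_h - resized.height) // 2)
    canvas.paste(resized, offset)
    return canvas
```

Padding with whitespace instead of stretching the crop keeps glyph proportions intact, which is the point of the adaptive approach: the model sees equations at roughly the scale it was trained on.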

Similar models like the nougat model also focus on academic document understanding and generation, though with different approaches and specialized capabilities; the related Nous-Hermes-Llama2-70b model, by contrast, is a general-purpose instruction-following language model rather than a document-transcription system.

Model inputs and outputs

Inputs

  • Image inputs containing mathematical equations or scientific formulas

Outputs

  • LaTeX code generated to represent the mathematical content in the input image
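
The checkpoint is published as a standard Hugging Face VisionEncoderDecoderModel, so a minimal inference sketch looks roughly like the code below. The use of AutoImageProcessor is an assumption made to keep the sketch self-contained; the repository ships its own preprocessing utilities (with the adaptive-padding resize described above), which are preferable. Generation settings are illustrative.

```python
import torch
from PIL import Image
from transformers import (AutoImageProcessor, AutoTokenizer,
                          VisionEncoderDecoderModel)

model = VisionEncoderDecoderModel.from_pretrained("Norm/nougat-latex-base")
tokenizer = AutoTokenizer.from_pretrained("Norm/nougat-latex-base")
# Assumption: a preprocessor config is available on the hub for this repo.
processor = AutoImageProcessor.from_pretrained("Norm/nougat-latex-base")
model.eval()

image = Image.open("equation.png").convert("RGB")  # an equation crop
pixel_values = processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
    outputs = model.generate(
        pixel_values,
        max_new_tokens=256,  # illustrative budget for a single equation
        decoder_start_token_id=tokenizer.bos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```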

Capabilities

The nougat-latex-base model excels at generating high-quality LaTeX code from image inputs, particularly for equation and formula-heavy content. It has been evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and the im2latex-100k dataset, outperforming the pix2tex model on both token accuracy and normalized edit distance metrics.
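
Normalized edit distance is straightforward to reproduce for your own comparisons. The sketch below uses whitespace tokenization, which is a simplification; published evaluations typically tokenize LaTeX with the model's own tokenizer.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over token sequences, using one rolling DP row."""
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,           # delete ta
                        dp[j - 1] + 1,       # insert tb
                        prev + (ta != tb))   # substitute ta -> tb
            prev = cur
    return dp[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    # Whitespace tokenization is a simplification of real LaTeX tokenization.
    p, r = pred.split(), ref.split()
    if not p and not r:
        return 0.0
    return edit_distance(p, r) / max(len(p), len(r))

print(normalized_edit_distance(r"a + b = c", r"a + b = d"))  # 0.2
```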

What can I use it for?

The nougat-latex-base model can be a valuable tool for researchers, academics, and anyone working with scientific or mathematical content. It can be used to automate the process of converting handwritten or typeset equations into LaTeX format, which is widely used in academic and technical publications.

This model could be integrated into various applications, such as academic paper writing tools, educational platforms, or research analysis software, to streamline the process of incorporating mathematical expressions into digital documents.

Things to try

One interesting aspect of the nougat-latex-base model is its ability to handle a wide range of equation and formula types, from simple expressions to more complex mathematical notation. Users can experiment with different input images, ranging from scanned handwritten notes to typeset equations, and observe how the model performs in generating the corresponding LaTeX code.

Additionally, users can explore the model's limitations, such as its handling of edge cases or its ability to generate LaTeX code for more advanced mathematical concepts. By testing the model's capabilities and understanding its strengths and weaknesses, users can find creative ways to incorporate it into their workflows and leverage its potential to enhance their work with mathematical and scientific content.



This summary was produced with help from an AI and may contain inaccuracies; check out the links above to read the original source documents!

Related Models

nougat-base

facebook

Total Score

123

The nougat-base model is a Donut model trained by Facebook to transcribe scientific PDFs into an easy-to-use markdown format. It consists of a Swin Transformer as the vision encoder and an mBART model as the text decoder. The model is trained to autoregressively predict the markdown given only the pixels of the PDF image as input.

The nougat-base model is similar to other Donut models like the nougat-latex-base and donut-base-finetuned-cord-v2 models. The nougat-latex-base model is fine-tuned from the nougat-base model to boost its proficiency in generating LaTeX code from images, while the donut-base-finetuned-cord-v2 model is a Donut model fine-tuned on the CORD dataset for document parsing tasks.

Model inputs and outputs

Inputs

  • PDF image: The nougat-base model takes a PDF page image as input and encodes it using a Swin Transformer vision encoder.

Outputs

  • Markdown text: The model outputs the corresponding markdown text for the input PDF image, generated autoregressively by the mBART text decoder.

Capabilities

The nougat-base model is capable of accurately transcribing scientific PDF documents into a clean, structured markdown format. This can be especially useful for researchers and academics who need to process and share large volumes of PDF content.

What can I use it for?

You can use the nougat-base model to automate the process of converting PDF documents into an easily consumable markdown format. This could be beneficial for tasks such as:

  • Streamlining the publication process by automatically formatting research papers
  • Organizing and sharing meeting notes or other internal documents more effectively
  • Improving accessibility by converting PDF files into a more user-friendly format

Things to try

One interesting aspect of the nougat-base model is its ability to handle complex scientific documents, including equations and mathematical notation. You could experiment with the model's performance on PDF files containing a lot of technical content, and see how well it is able to capture and represent the information in markdown format. Additionally, you could explore fine-tuning the nougat-base model on your own datasets or domain-specific content to further improve its accuracy and usefulness for your particular use case.
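
Transformers has first-class Nougat support, so a basic page-transcription loop looks roughly like the following sketch (file name and generation settings are illustrative):

```python
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

# One rasterized PDF page (Nougat works on page images, not raw PDFs).
page = Image.open("page_1.png").convert("RGB")
pixel_values = processor(images=page, return_tensors="pt").pixel_values

outputs = model.generate(
    pixel_values,
    max_new_tokens=1024,  # illustrative budget for one page
)

markdown = processor.batch_decode(outputs, skip_special_tokens=True)[0]
# Nougat's post-processing cleans up common markdown artifacts.
markdown = processor.post_process_generation(markdown, fix_markdown=True)
print(markdown)
```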


donut-base

naver-clova-ix

Total Score

150

The donut-base model is a pre-trained Donut model. Donut is a document understanding transformer that does not require optical character recognition (OCR). It consists of a vision encoder (Swin Transformer) and a text decoder (BART). The model takes an image as input and generates corresponding text, without the need for OCR preprocessing.

Similar models include the donut-base-finetuned-docvqa and donut-base-finetuned-cord-v2 models, which are fine-tuned versions of the base model on the DocVQA and CORD datasets, respectively. The nougat-base model is also based on the Donut architecture, but is trained on PDF-to-markdown transcription.

Model inputs and outputs

Inputs

  • Image: The model takes an image as input, which can be of any document or visual content.

Outputs

  • Text: The model generates text that corresponds to the input image, without the need for OCR preprocessing.

Capabilities

The donut-base model is capable of understanding document images and generating relevant text without relying on OCR. This can be useful for tasks such as document image classification, parsing, or visual question answering, where the model can directly process the visual information without the need for an additional OCR step.

What can I use it for?

You can use the donut-base model as a starting point for fine-tuning on a specific document understanding task, such as those mentioned above. The model hub provides some examples of fine-tuned versions of the model, which you can explore to find one that suits your needs.

Things to try

One interesting thing to try with the donut-base model is to experiment with different input image sizes and resolutions. Since the model uses a Swin Transformer as the vision encoder, it may be able to handle a wide range of image sizes and still generate accurate text. You could also try using the model on a variety of document types, such as forms, invoices, or scientific papers, to see how it performs in different contexts.
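
Because donut-base is meant as a starting point for fine-tuning, a typical first step is to register task-prompt tokens for your own task. The token names below are hypothetical placeholders, not part of the released checkpoint:

```python
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

# Hypothetical task-prompt tokens for your own downstream task.
task_tokens = ["<s_mytask>", "</s_mytask>"]
processor.tokenizer.add_special_tokens(
    {"additional_special_tokens": task_tokens})
# Grow the decoder's embedding table to cover the new tokens.
model.decoder.resize_token_embeddings(len(processor.tokenizer))

# At inference time, generation starts from the task prompt.
model.config.decoder_start_token_id = (
    processor.tokenizer.convert_tokens_to_ids("<s_mytask>"))
```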


donut-base-finetuned-docvqa

naver-clova-ix

Total Score

163

The donut-base-finetuned-docvqa model is a Donut model that has been fine-tuned on the DocVQA dataset. Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART), allowing it to generate text conditioned on an input image. This fine-tuned version is specialized for document visual question answering tasks.

The Donut model was originally introduced in the paper OCR-free Document Understanding Transformer by researchers from Naver's Clova AI division. This particular fine-tuned version was released by the same team, though they did not provide an official model card.

Model inputs and outputs

Inputs

  • Image: The Donut model takes an image as input, which it encodes using the Swin Transformer vision encoder.

Outputs

  • Text: The model generates text autoregressively using the BART decoder, producing answers or summaries conditioned on the input image.

Capabilities

The donut-base-finetuned-docvqa model is capable of understanding and answering questions about document images, without the need for optical character recognition (OCR). This can be useful for tasks like extracting information from invoices, forms, or other complex document layouts.

What can I use it for?

You can use the donut-base-finetuned-docvqa model for document visual question answering tasks, where the goal is to answer questions about the content and layout of document images. This could be helpful for automating information extraction from business documents, scientific papers, or other types of structured text. To see other fine-tuned versions of the Donut model, you can check the Hugging Face model hub.

Things to try

One key aspect of the Donut model is its ability to understand document layouts and visually complex inputs without relying on OCR. This could be useful for tasks where the document structure or formatting is an important part of the information to be extracted. You could try using the donut-base-finetuned-docvqa model to answer questions about the overall structure and content of documents, rather than just extracting specific pieces of text.
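
Following the usage pattern published for Donut's DocVQA fine-tune, a question-answering sketch looks roughly like this (file name, question, and generation length are illustrative):

```python
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-docvqa")

image = Image.open("invoice.png").convert("RGB")
question = "What is the invoice number?"

# DocVQA fine-tunes are prompted with the question wrapped in task tokens.
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt").input_ids
pixel_values = processor(image, return_tensors="pt").pixel_values

outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids,
                         max_length=512)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop task token
print(processor.token2json(sequence))  # {'question': ..., 'answer': ...}
```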


donut-base-finetuned-cord-v2

naver-clova-ix

Total Score

65

The donut-base-finetuned-cord-v2 model is a fine-tuned version of the Donut model, introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. The model consists of a Swin Transformer vision encoder and a BART text decoder, allowing it to perform document understanding tasks without requiring optical character recognition (OCR). This particular model has been fine-tuned on the CORD dataset, a document parsing dataset.

It builds upon the capabilities of the base Donut model, which was pre-trained on a large corpus of document images and their corresponding text. Similar models include the nougat-latex-base model, which is fine-tuned for improved LaTeX code generation from images, and the TrOCR model, which is optimized for optical character recognition on printed text.

Model inputs and outputs

Inputs

  • Image: The model takes an image as input, which it then encodes using the Swin Transformer vision encoder.

Outputs

  • Text: The model generates text output, conditioned on the encoded image representation. This text output can be used for document understanding tasks such as information extraction, text summarization, or table recognition.

Capabilities

The donut-base-finetuned-cord-v2 model performs OCR-free document understanding. It can extract and generate text from document images without requiring a separate OCR step, which is particularly useful when the document layout or formatting makes traditional OCR techniques difficult to apply.

What can I use it for?

You can use the donut-base-finetuned-cord-v2 model for a variety of document understanding tasks, such as:

  • Information extraction: Extracting key information (e.g., names, addresses, dates) from documents like invoices, contracts, or forms.
  • Text summarization: Generating concise summaries of longer documents, such as research papers or legal documents.
  • Table recognition: Identifying and extracting structured data from tables within document images.

The model's ability to perform these tasks without relying on OCR can make it particularly useful in scenarios where the document layout or formatting is complex or varied.

Things to try

One interesting aspect of the donut-base-finetuned-cord-v2 model is its potential to generalize beyond the CORD dataset it was fine-tuned on. You could experiment with using the model on other types of document images, such as financial reports, scientific papers, or government forms, to see how it performs. Additionally, you could explore fine-tuning the model further on your own dataset to tailor its performance to your specific use case.
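
A parsing sketch in the style of the published Donut examples is shown below; it decodes the generated tag structure into a nested dictionary with token2json (file name and generation length are illustrative):

```python
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-cord-v2")
model = VisionEncoderDecoderModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-cord-v2")

image = Image.open("receipt.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# CORD fine-tunes start generation from the <s_cord-v2> task prompt.
decoder_input_ids = processor.tokenizer(
    "<s_cord-v2>", add_special_tokens=False, return_tensors="pt").input_ids

outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids,
                         max_length=768)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop task token
# token2json converts the generated tag structure into a nested dict.
print(processor.token2json(sequence))
```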
