[](#longformer-base-4096)longformer-base-4096
=============================================

[Longformer](https://arxiv.org/abs/2004.05150) is a transformer model for long documents.

`longformer-base-4096` is a BERT-like model started from the RoBERTa checkpoint and pretrained for MLM on long documents. It supports sequences of length up to 4,096.

Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations. Please refer to the examples in `modeling_longformer.py` and the paper for more details on how to set global attention.

### [](#citing)Citing

If you use `Longformer` in your research, please cite [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150).

    @article{Beltagy2020Longformer,
      title={Longformer: The Long-Document Transformer},
      author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
      journal={arXiv:2004.05150},
      year={2020},
    }
    

`Longformer` is an open-source project developed by [the Allen Institute for Artificial Intelligence (AI2)](http://www.allenai.org). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.

## Model overview

The `longformer-base-4096` is a transformer model developed by [the Allen Institute for Artificial Intelligence (AI2)](https://aimodels.fyi/creators/huggingFace/allenai), a non-profit institute focused on high-impact AI research and engineering. It is a BERT-like model that has been pre-trained on long documents using masked language modeling. The key innovation of this model is its use of a combination of sliding window (local) attention and global attention, which allows it to handle sequences of up to 4,096 tokens.

The `longformer-base-4096` model is similar to other long-context transformer models like [LongLLaMA](https://aimodels.fyi/models/huggingFace/longllama3b-syzymon) and [BTLM-3B-8k-base](https://aimodels.fyi/models/huggingFace/btlm-3b-8k-base-cerebras), which have also been designed to handle longer input sequences than standard transformer models.

## Model inputs and outputs

### Inputs
- **Text sequence**: The `longformer-base-4096` model can process text sequences of up to 4,096 tokens.

### Outputs
- **Masked language modeling logits**: The primary output of the model is a set of logits representing the probability distribution over the vocabulary for each masked token in the input sequence.

## Capabilities

The `longformer-base-4096` model is designed to excel at tasks that involve processing long documents, such as summarization, question answering, and document classification. Its ability to handle longer input sequences makes it particularly useful for applications where the context is spread across multiple paragraphs or pages.

## What can I use it for?

The `longformer-base-4096` model can be fine-tuned on a variety of downstream tasks, such as text summarization, question answering, and document classification. It could be particularly useful for applications that involve processing long-form content, such as research papers, legal documents, or technical manuals.

## Things to try

One interesting aspect of the `longformer-base-4096` model is its use of global attention, which allows the model to learn task-specific representations. Experimenting with different configurations of global attention could be a fruitful area of exploration, as it may help the model perform better on specific tasks.

Additionally, the model's ability to handle longer input sequences could be leveraged for tasks that require a more holistic understanding of a document, such as long-form question answering or document-level sentiment analysis.