**Clinical-Longformer** is a clinical knowledge enriched version of Longformer that was further pre-trained on MIMIC-III clinical notes. It accepts up to 4,096 tokens as model input. Clinical-Longformer consistently outperforms ClinicalBERT by at least 2 percent across 10 baseline datasets. These downstream experiments broadly cover named entity recognition (NER), question answering (QA), natural language inference (NLI) and text classification tasks. For more details, please refer to [our paper](https://arxiv.org/pdf/2201.11838.pdf). We also provide a sister model, [Clinical-BigBird](https://huggingface.co/yikuan8/Clinical-BigBird).

### Pre-training

We initialized Clinical-Longformer from the pre-trained weights of the base version of Longformer. Pre-training was distributed in parallel across six 32GB Tesla V100 GPUs, with FP16 precision enabled to accelerate training. We pre-trained Clinical-Longformer for 200,000 steps with a batch size of 6×3. The learning rate was 3e-5 for both models (Clinical-Longformer and its sister model Clinical-BigBird). The entire pre-training process took more than 2 weeks.

### Usage

Load the model directly from Transformers:

    from transformers import AutoTokenizer, AutoModelForMaskedLM
    tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")
    model = AutoModelForMaskedLM.from_pretrained("yikuan8/Clinical-Longformer")
    

### Citing

If you find our model helpful, please consider citing our paper :)

    @article{li2023comparative,
      title={A comparative study of pretrained language models for long clinical text},
      author={Li, Yikuan and Wehbe, Ramsey M and Ahmad, Faraz S and Wang, Hanyin and Luo, Yuan},
      journal={Journal of the American Medical Informatics Association},
      volume={30},
      number={2},
      pages={340--347},
      year={2023},
      publisher={Oxford University Press}
    }
    

### Questions

Please email [yikuanli2018@u.northwestern.edu](mailto:yikuanli2018@u.northwestern.edu)

## Model overview

`Clinical-Longformer` is a variant of the Longformer model that has been further pre-trained on clinical notes from the MIMIC-III dataset. Like Longformer, it accepts input sequences of up to 4,096 tokens, and the additional clinical pre-training improves performance on a variety of clinical NLP tasks compared to ClinicalBERT. The model was initialized from the pre-trained weights of the base Longformer and then trained for an additional 200,000 steps on the MIMIC-III corpus.

The maintainer, [yikuan8](https://aimodels.fyi/creators/huggingFace/yikuan8), also provides a sister model called [Clinical-BigBird](https://huggingface.co/yikuan8/Clinical-BigBird) that is likewise aimed at long clinical text. Where `Clinical-Longformer` uses Longformer's sparse attention, `Clinical-BigBird` uses the BigBird sparse attention mechanism; both are designed to process long sequences efficiently.

## Model inputs and outputs

### Inputs
- Clinical text data, such as electronic health records or medical notes, with a maximum sequence length of 4,096 tokens.
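
Notes longer than the 4,096-token limit have to be truncated or split before they reach the model. The following is a minimal sketch of one common workaround, splitting a list of token IDs into overlapping windows. The helper name `chunk_token_ids` and the default `stride` of 256 are illustrative assumptions, not part of the model or the Transformers API.

```python
from typing import List

MAX_TOKENS = 4096  # maximum input length stated in this model card


def chunk_token_ids(
    token_ids: List[int], max_len: int = MAX_TOKENS, stride: int = 256
) -> List[List[int]]:
    """Split token_ids into windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens, so text near a window
    boundary also appears with some context in the next window.
    (Hypothetical helper for illustration only.)
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")

    chunks: List[List[int]] = []
    step = max_len - stride
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window already reaches the end of the input
        start += step
    return chunks
```

If the IDs do not yet include special tokens, passing a slightly smaller `max_len` (for example `MAX_TOKENS - 2`) leaves room for them. How predictions from the individual windows are combined is task-specific and is not covered by this sketch.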

### Outputs
- Task-dependent predictions once the model has been fine-tuned for a downstream task, including:
  - Named entity recognition (NER)
  - Question answering (QA)
  - Natural language inference (NLI)
  - Text classification
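
As a rough sketch, the task families listed above correspond to the standard `Auto*` classes in Transformers. The mapping and the helper `load_for_task` below are illustrative assumptions rather than anything shipped with the checkpoint. Note that a task head loaded this way is newly initialized, so it needs fine-tuning before its predictions are meaningful.

```python
# Illustrative mapping from task family to the Transformers Auto class
# that adds a matching head on top of the pre-trained encoder.
TASK_TO_AUTO_CLASS = {
    "ner": "AutoModelForTokenClassification",
    "qa": "AutoModelForQuestionAnswering",
    "nli": "AutoModelForSequenceClassification",
    "text-classification": "AutoModelForSequenceClassification",
}

MODEL_ID = "yikuan8/Clinical-Longformer"


def load_for_task(task: str, **kwargs):
    """Load the checkpoint with a head for `task` (hypothetical helper).

    Extra keyword arguments, e.g. num_labels=3, are forwarded to
    from_pretrained. The head weights are randomly initialized.
    """
    if task not in TASK_TO_AUTO_CLASS:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_TO_AUTO_CLASS)}"
        )
    import transformers  # imported lazily; requires the transformers package

    auto_class = getattr(transformers, TASK_TO_AUTO_CLASS[task])
    return auto_class.from_pretrained(MODEL_ID, **kwargs)
```

For example, `load_for_task("nli", num_labels=3)` would download the checkpoint and attach a three-way classification head ready for fine-tuning.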

## Capabilities

The `Clinical-Longformer` model consistently outperformed the ClinicalBERT model by at least 2% on 10 different benchmark datasets covering a range of clinical NLP tasks. This demonstrates the value of further pre-training on domain-specific clinical data to improve performance on healthcare-related applications.

## What can I use it for?

The `Clinical-Longformer` model can be useful for a variety of healthcare-related NLP tasks, such as extracting medical entities from clinical notes, answering questions about patient histories, or classifying the sentiment or tone of physician communications. Organizations in the medical and pharmaceutical industries could leverage this model to automate or assist with clinical documentation, patient data analysis, and medication management.

## Things to try

One interesting aspect of the `Clinical-Longformer` model is its ability to handle longer input sequences compared to previous clinical language models. Researchers or developers could experiment with using the model for tasks that require processing of full medical records or lengthy treatment notes, rather than just focused snippets of text. Additionally, the model could be fine-tuned on specific healthcare datasets or tasks to further improve performance on domain-specific applications.
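
When experimenting with full-length records, it may also be worth trying Longformer's global attention. Longformer models in Transformers accept an optional `global_attention_mask` in which 1 marks tokens that attend to the whole sequence and 0 marks tokens that use local attention only. The sketch below builds such a mask as a plain Python list; the helper name and the default of marking only the first token are assumptions for illustration.

```python
from typing import Iterable, List


def build_global_attention_mask(
    seq_len: int, global_positions: Iterable[int] = (0,)
) -> List[int]:
    """Return a 0/1 list of length seq_len.

    1 = token receives global attention, 0 = local attention only.
    By default only position 0 (the first token) is marked global.
    (Hypothetical helper for illustration only.)
    """
    mask = [0] * seq_len
    for pos in global_positions:
        if not 0 <= pos < seq_len:
            raise IndexError(f"position {pos} is outside a sequence of length {seq_len}")
        mask[pos] = 1
    return mask
```

The resulting list can be given a batch dimension (for example with `torch.tensor([mask])`) and passed as `global_attention_mask` alongside `input_ids` and `attention_mask`. Which positions benefit from global attention depends on the task, for instance the question tokens in QA.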