OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
==============================================================================================================================================================================================

_Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari_

We introduce **OpenELM**, a family of **Open**-source **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to allocate parameters efficiently within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction-tuned models with 270M, 450M, 1.1B, and 3B parameters.

Our pretraining dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check the license agreements and terms of these datasets before using them.

See the list below for the details of each model:

*   [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M)
*   [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M)
*   [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B)
*   [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B)
*   [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct)
*   [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct)
*   [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct)
*   [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct)

    
    from transformers import AutoModelForCausalLM
    
    openelm_270m = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
    openelm_450m = AutoModelForCausalLM.from_pretrained("apple/OpenELM-450M", trust_remote_code=True)
    openelm_1b = AutoModelForCausalLM.from_pretrained("apple/OpenELM-1_1B", trust_remote_code=True)
    openelm_3b = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)
    
    openelm_270m_instruct = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M-Instruct", trust_remote_code=True)
    openelm_450m_instruct = AutoModelForCausalLM.from_pretrained("apple/OpenELM-450M-Instruct", trust_remote_code=True)
    openelm_1b_instruct = AutoModelForCausalLM.from_pretrained("apple/OpenELM-1_1B-Instruct", trust_remote_code=True)
    openelm_3b_instruct = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B-Instruct", trust_remote_code=True)
    

Usage
---------------

We have provided an example function to generate output from OpenELM models loaded via [HuggingFace Hub](https://huggingface.co/docs/hub/) in `generate_openelm.py`.

You can try the model by running the following command:

    python generate_openelm.py --model [MODEL_NAME] --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2
    

Please refer to [this link](https://huggingface.co/docs/hub/security-tokens) to obtain your Hugging Face access token.

Additional arguments to the Hugging Face `generate` function can be passed via `generate_kwargs`. As an example, to speed up inference, you can try [lookup token speculative generation](https://huggingface.co/docs/transformers/generation_strategies) by passing the `prompt_lookup_num_tokens` argument as follows:

    python generate_openelm.py --model [MODEL_NAME] --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
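
The idea behind lookup-based speculation can be illustrated without a model. The sketch below is a simplified illustration, not the `transformers` implementation: the function name `prompt_lookup_candidates` and the `ngram_size` parameter are hypothetical, and `num_tokens` only plays the role of `prompt_lookup_num_tokens`. It searches the existing token sequence for an earlier occurrence of the trailing n-gram and proposes the tokens that followed it as draft candidates, which the model would then verify.

```python
from typing import List


def prompt_lookup_candidates(
    tokens: List[int], ngram_size: int = 2, num_tokens: int = 10
) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in the sequence.

    Hypothetical helper for illustration only.
    """
    if len(tokens) <= ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan earlier start positions from most recent to oldest,
    # skipping the position of the trailing n-gram itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = start + ngram_size
            # The tokens that followed the earlier match become the draft.
            return tokens[follow:follow + num_tokens]
    return []


if __name__ == "__main__":
    # The sequence ends with (1, 2), which also appears at the start.
    print(prompt_lookup_candidates([1, 2, 3, 4, 5, 1, 2], ngram_size=2, num_tokens=3))
```

Drafts of this kind cost no extra model calls, which is why the approach tends to help most when the output repeats spans of the prompt.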
    

Alternatively, try model-wise speculative generation with an [assistant model](https://huggingface.co/blog/assisted-generation) by passing a smaller model through the `assistant_model` argument, for example:

    python generate_openelm.py --model [MODEL_NAME] --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL_NAME]
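
As a rough illustration of how an assistant model speeds up greedy decoding, the sketch below uses toy "models" that are plain Python functions mapping a token list to the next token. All names here (`assisted_greedy_decode`, `num_draft_tokens`, the toy models) are hypothetical, and this is not the `transformers` implementation: a real target model checks all drafted tokens in one forward pass, whereas this sketch calls the target once per token for clarity.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def assisted_greedy_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_draft_tokens: int = 4,
) -> List[int]:
    """Greedy decoding in which a cheap draft model proposes tokens for the target to verify."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(num_draft_tokens):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed token in order.
        for guess in proposal:
            expected = target(tokens)
            tokens.append(expected)  # the target's choice is always the one kept
            if expected != guess or len(tokens) >= limit:
                break  # discard the rest of the draft after the first mismatch
    return tokens


def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    # Agrees with the target except right after token 4.
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(assisted_greedy_decode(toy_target, toy_draft, [0], max_new_tokens=6))
```

Because only the target's tokens are kept, the output matches plain greedy decoding with the target model; the draft affects speed, not the result.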
    

Main Results
-----------------------------

### Zero-Shot

| **Model** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** |
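
The Average column appears to be the unweighted mean of the per-task scores in each row. The short check below, a sketch using values copied from the zero-shot results above (the helper name `average` is hypothetical), reproduces two of the reported averages.

```python
# Per-task zero-shot scores in column order:
# ARC-c, ARC-e, BoolQ, HellaSwag, PIQA, SciQ, WinoGrande
zero_shot_scores = {
    "OpenELM-270M": [26.45, 45.08, 53.98, 46.71, 69.75, 84.70, 53.91],
    "OpenELM-3B-Instruct": [39.42, 61.74, 68.17, 76.36, 79.00, 92.50, 66.85],
}


def average(scores):
    """Unweighted mean rounded to two decimals, matching the table precision."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for name, scores in zero_shot_scores.items():
        print(name, average(scores))
```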

### LLM360

| **Model** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** |

### OpenLLM Leaderboard

| **Model** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | 72.33 | **76.87** | 24.80 | **79.00** | 38.47 | **38.76** | **67.96** | **55.73** |

See the technical report for more results and comparison.

Evaluation
-------------------------

### Setup

Install the following dependencies:

    
    # install public lm-eval-harness
    
    harness_repo="public-lm-eval-harness"
    git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
    cd ${harness_repo}
    # use main branch on 03-15-2024, SHA is dc90fec
    git checkout dc90fec
    pip install -e .
    cd ..
    
    # 66d6242 is the main branch on 2024-04-01 
    pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
    # quote the version specifiers so the shell does not treat ">=" as a redirect
    pip install 'tokenizers>=0.15.2' 'transformers>=4.38.2' 'sentencepiece>=0.2.0'
    

### Evaluate OpenELM

    
    # OpenELM-270M
    hf_model=apple/OpenELM-270M
    
    # this flag is needed because lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True
    tokenizer=meta-llama/Llama-2-7b-hf
    add_bos_token=True
    batch_size=1
    
    mkdir -p lm_eval_output
    
    shot=0
    task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=5
    task=mmlu,winogrande
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=25
    task=arc_challenge,crows_pairs_english
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=10
    task=hellaswag
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
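
The four runs above differ only in the shot count and task list. As a sketch of the same commands assembled programmatically, the hypothetical helper `build_lm_eval_command` below uses only the flags shown in the script and mirrors its output-path naming (slashes in the model name and commas in the task list become underscores). The model ID `apple/OpenELM-270M` is taken from the loading example earlier in this card.

```python
from typing import Dict, List

# Shot count -> tasks, as used in the evaluation script above.
SETTINGS: Dict[int, List[str]] = {
    0: ["arc_challenge", "arc_easy", "boolq", "hellaswag", "piqa",
        "race", "winogrande", "sciq", "truthfulqa_mc2"],
    5: ["mmlu", "winogrande"],
    25: ["arc_challenge", "crows_pairs_english"],
    10: ["hellaswag"],
}


def build_lm_eval_command(
    hf_model: str,
    tasks: List[str],
    num_fewshot: int,
    tokenizer: str = "meta-llama/Llama-2-7b-hf",
    add_bos_token: bool = True,
    batch_size: int = 1,
    device: str = "cuda:0",
    output_dir: str = "./lm_eval_output",
) -> List[str]:
    """Return the lm_eval argument list for one few-shot setting (hypothetical helper)."""
    model_args = (
        f"pretrained={hf_model},trust_remote_code=True,"
        f"add_bos_token={add_bos_token},tokenizer={tokenizer}"
    )
    safe_model = hf_model.replace("/", "_")   # same effect as ${hf_model//\//_}
    safe_tasks = "_".join(tasks)              # same effect as ${task//,/_}
    run_name = f"{safe_model}_{safe_tasks}-{num_fewshot}shot"
    return [
        "lm_eval", "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--device", device,
        "--num_fewshot", str(num_fewshot),
        "--output_path", f"{output_dir}/{run_name}",
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    for shots, tasks in SETTINGS.items():
        print(" ".join(build_lm_eval_command("apple/OpenELM-270M", tasks, shots)))
```

The printed commands can be passed to `subprocess.run` or pasted into a shell; logging with `tee` is left out of this sketch.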
    

Bias, Risks, and Limitations
-----------------------------------------------------------

The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.

Citation
---------------------

If you find our work useful, please cite:

    @article{mehtaOpenELMEfficientLanguage2024,
        title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open}-source {Training} and {Inference} {Framework}},
        shorttitle = {{OpenELM}},
        url = {https://arxiv.org/abs/2404.14619v1},
        language = {en},
        urldate = {2024-04-24},
        journal = {arXiv.org},
        author = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
        month = apr,
        year = {2024},
    }
    
    @inproceedings{mehta2022cvnets, 
         author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
         title = {CVNets: High Performance Library for Computer Vision}, 
         year = {2022}, 
         booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
         series = {MM '22} 
    }

## Model overview

`OpenELM` is an open-source family of efficient language models developed by Apple. The models use a layer-wise scaling strategy to allocate parameters efficiently within each transformer layer, leading to enhanced accuracy. The `OpenELM` models range from 270M to 3B parameters and are available in both base and instruction-tuned versions.

The models were pretrained on approximately 1.8 trillion tokens drawn from RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6. Related open models include [openchat_3.5](https://aimodels.fyi/models/huggingFace/openchat35-openchat).

## Model inputs and outputs

### Inputs
- **Text**: The `OpenELM` models accept text as input and can be used for a variety of natural language processing tasks.

### Outputs
- **Text**: The models generate human-readable text as output, making them suitable for tasks like language generation, question answering, and dialogue.

## Capabilities

`OpenELM` is evaluated on ARC, BoolQ, HellaSwag, PIQA, SciQ, WinoGrande, MMLU, TruthfulQA, RACE, and CrowS-Pairs. In all three result tables above, the instruction-tuned version of each model size has a higher average score than its base counterpart, for example 69.15 versus 67.39 for the 3B models in the zero-shot setting.

## What can I use it for?

The `OpenELM` models can be used as a foundation for building various natural language applications, such as:

- **Language generation**: Use the models to generate human-like text for creative writing, content creation, or chatbots.
- **Question answering**: Fine-tune the models to answer questions on a wide range of topics.
- **Dialogue systems**: Leverage the instruction-tuned versions to build conversational AI assistants.

## Things to try

One interesting aspect of `OpenELM` is its use of layer-wise scaling to optimize parameter allocation. This approach could lead to insights about efficient model design and potentially inspire new architectures or training techniques.
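
To make the idea concrete, here is a minimal sketch of layer-wise scaling under stated assumptions: the number of attention heads and the feed-forward width are interpolated linearly from the first layer to the last. The function name `layerwise_scaling`, the `alpha` and `beta` ranges, and the dimensions used below are illustrative choices, not the released OpenELM configurations.

```python
from typing import Dict, List, Tuple


def layerwise_scaling(
    num_layers: int,
    d_model: int,
    head_dim: int,
    alpha: Tuple[float, float] = (0.5, 1.0),  # assumed attention scaling range
    beta: Tuple[float, float] = (0.5, 4.0),   # assumed FFN multiplier range
) -> List[Dict[str, int]]:
    """Return a per-layer head count and FFN width that grow with depth."""
    configs = []
    for i in range(num_layers):
        # Position of this layer in [0, 1]; a single-layer model uses the minimum.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        configs.append({
            "layer": i,
            "num_heads": max(1, round(a * d_model / head_dim)),
            "ffn_dim": round(b * d_model),
        })
    return configs


if __name__ == "__main__":
    for cfg in layerwise_scaling(num_layers=4, d_model=1024, head_dim=64):
        print(cfg)
```

In contrast to a uniform transformer, where every layer has the same shape, this allocates fewer parameters to early layers and more to later ones while keeping the model dimension fixed.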

Additionally, the open-source nature of the models presents opportunities for the community to further fine-tune and adapt them for specialized use cases, contributing to the broader progress of language models.

![DCLM Logo](https://cdn-uploads.huggingface.co/production/uploads/63118add64939fabc0108b28/BB42g4V8HTxb5dR4tcy8A.png)

Model Card for DCLM-Baseline-7B
===================================================================

DCLM-Baseline-7B is a 7 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.

Model Details
-------------------------------

| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
| --- | --- | --- | --- | --- | --- |
| 7B | 2.5T | 32 | 4096 | 32 | 2048 |

### Model Description

*   **Developed by:** DataComp for Language Models (DCLM) Team
*   **Model type:** Decoder-only Transformer language model
*   **Language(s):** English (primarily)
*   **License:** Apple Sample Code License
*   **Contact:** [contact@datacomp.ai](mailto:contact@datacomp.ai)
*   **Date:** June 2024

### Model Sources

*   **Repository:** [https://github.com/mlfoundations/dclm](https://github.com/mlfoundations/dclm)
*   **Dataset:** [https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0)
*   **Paper:** [DataComp-LM: In search of the next generation of training sets for language models](https://arxiv.org/abs/2406.11794)

Using Model
---------------------------

First, install `open_lm`:

    pip install git+https://github.com/mlfoundations/open_lm.git
    

Then:

    from open_lm.hf import *
    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained("apple/DCLM-Baseline-7B")
    model = AutoModelForCausalLM.from_pretrained("apple/DCLM-Baseline-7B")
    
    inputs = tokenizer(["Machine learning is"], return_tensors="pt")
    gen_kwargs = {"max_new_tokens": 50, "top_p": 0.8, "temperature": 0.8, "do_sample": True, "repetition_penalty": 1.1}
    output = model.generate(inputs['input_ids'], **gen_kwargs)
    output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
    print(output)
    

### Training Details

The model was trained using the following setup:

*   **Architecture:** Decoder-only Transformer
*   **Framework:** PyTorch with OpenLM
*   **Optimizer:** AdamW
*   **Learning Rate:** 2e-3 (peak)
*   **Weight Decay:** 0.05
*   **Batch Size:** 2048 sequences
*   **Sequence Length:** 2048 tokens
*   **Total Training Tokens:** 2.5T
*   **Hardware:** Trained on H100 GPUs

For more detailed training information, please refer to Section 3.4 and Appendix F of the DCLM paper. To ensure our trained model is broadly useful, including for math and coding tasks, we combine our 3.8T [DCLM-BASELINE](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) with the [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) and [ProofPile2](https://huggingface.co/datasets/EleutherAI/proof-pile-2) data to arrive at a 4.1T token dataset.
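
The listed batch size, sequence length, and token budget imply a rough number of optimizer steps. The calculation below is a back-of-the-envelope sketch that assumes a constant batch size and full-length sequences throughout training, which the card does not state; the variable names are illustrative.

```python
# Values taken from the training setup listed above.
batch_size_sequences = 2048
sequence_length_tokens = 2048
total_training_tokens = 2_500_000_000_000  # 2.5T

# Tokens consumed per optimizer step, assuming every sequence is full length.
tokens_per_step = batch_size_sequences * sequence_length_tokens

# Approximate number of steps, assuming the batch size never changes.
approx_steps = total_training_tokens / tokens_per_step

if __name__ == "__main__":
    print(f"tokens per step: {tokens_per_step:,}")
    print(f"approximate steps: {approx_steps:,.0f}")
```

Under these assumptions each step processes about 4.2 million tokens, and the 2.5T-token budget corresponds to roughly 596 thousand steps.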

Evaluation
-------------------------

Here are the evaluation results for DCLM-Baseline-7B on various tasks (using the [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite):

| Task | Score |
| --- | --- |
| MMLU (zero-shot) | 0.5766 |
| MMLU (few-shot) | 0.6372 |
| HellaSwag (zero-shot) | 0.7987 |
| HellaSwag | 0.8043 |
| Jeopardy | 0.4745 |
| TriviaQA | 0.5270 |
| GSM8K (CoT) | 0.0250 |
| AGI Eval SAT Math (CoT) | 0.0136 |
| AQuA (CoT) | 0.0490 |
| SVAMP (CoT) | 0.4900 |
| BigBench QA Wikidata | 0.7120 |
| ARC Easy | 0.8220 |
| ARC Challenge | 0.5990 |
| BigBench Misconceptions | 0.6986 |
| COPA | 0.8500 |
| SIQA | 0.8291 |
| CommonsenseQA | 0.8018 |
| PIQA | 0.8128 |
| OpenBookQA | 0.4540 |
| BigBench Novel Concepts | 0.7188 |
| BigBench Strange Stories | 0.7586 |
| BigBench Strategy QA | 0.6173 |
| LAMBADA | 0.8220 |
| Winograd | 0.8828 |
| Winogrande | 0.7269 |
| BigBench Conlang Translation | 0.0244 |
| BigBench Language Identification | 0.5219 |
| BigBench Conceptual Combinations | 0.6990 |
| BigBench Elementary Math QA | 0.3431 |
| BigBench Dyck Languages | 0.4930 |
| AGI Eval LSAT AR | 0.2435 |
| BigBench CS Algorithms | 0.6121 |
| BigBench Logical Deduction | 0.3620 |
| BigBench Operators | 0.4857 |
| BigBench Repeat Copy Logic | 0.4063 |
| Simple Arithmetic (no spaces) | 0.2940 |
| Simple Arithmetic (with spaces) | 0.3110 |
| MathQA | 0.3098 |
| LogiQA | 0.4132 |
| PubMedQA | 0.7060 |
| SQuAD | 0.5856 |
| AGI Eval LSAT RC | 0.6716 |
| AGI Eval LSAT LR | 0.5392 |
| CoQA | 0.4074 |
| BigBench Understanding Fables | 0.6825 |
| BoolQ | 0.8343 |
| AGI Eval SAT EN | 0.7670 |
| Winogender MC (Female) | 0.6000 |
| Winogender MC (Male) | 0.5500 |
| Enterprise PII Classification | 0.7676 |
| BBQ | 0.6912 |
| GPQA Main | 0.2612 |
| GPQA Diamond | 0.2475 |

Note: All scores are presented as decimal values between 0 and 1, representing the proportion of correct answers or the model's performance on each task.
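
Because these scores are fractions while the Comparison section reports percentages, a small conversion helps when reading the two side by side. The sketch below (the helper name `to_percent` is hypothetical) uses values from the list above; the few-shot MMLU score of 0.6372 corresponds to the 63.7 reported for DCLM-7B in the MMLU column of the Comparison section.

```python
# A few scores copied from the evaluation results above.
scores = {
    "MMLU (few-shot)": 0.6372,
    "HellaSwag": 0.8043,
    "GSM8K (CoT)": 0.0250,
}


def to_percent(score: float) -> float:
    """Convert a 0-1 score to a percentage with one decimal place."""
    return round(score * 100, 1)


if __name__ == "__main__":
    for task, score in scores.items():
        print(f"{task}: {to_percent(score)}")
```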

Comparison
-------------------------

Below are comparisons of this model with other models in the 7B regime.

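| Model | Params | Tokens | Open dataset? | CORE | MMLU | EXTENDED |
| --- | --- | --- | --- | --- | --- | --- |
| **Open weights, closed datasets** | | | | | | |
| Llama2 | 7B | 2T | No | 49.2 | 45.8 | 34.1 |
| DeepSeek | 7B | 2T | No | 50.7 | 48.5 | 35.3 |
| Mistral-0.3 | 7B | ? | No | 57.0 | 62.7 | 45.1 |
| QWEN-2 | 7B | ? | No | 57.5 | **71.9** | 50.5 |
| Llama3 | 8B | 15T | No | 57.6 | 66.2 | 46.3 |
| Gemma | 8B | 6T | No | 57.8 | 64.3 | 44.6 |
| Phi-3 | 7B | ? | No | **61.0** | 69.9 | **57.9** |
| **Open weights, open datasets** | | | | | | |
| Falcon | 7B | 1T | Yes | 44.1 | 27.4 | 25.1 |
| OLMo-1.7 | 7B | 2.1T | Yes | 47.0 | 54.0 | 34.2 |
| MAP-Neo | 7B | 4.5T | Yes | **50.2** | **57.1** | **40.4** |
| **DCLM-7B** | 7B | 2.5T | Yes | **56.1** | **63.7** | **43.6** |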

Limitations and Biases
-------------------------------------------------

While DCLM-Baseline-7B demonstrates strong performance across a range of tasks, it's important to note:

1.  The model may exhibit biases present in its training data, which is derived from web crawl data.
2.  It has not undergone specific alignment or safety fine-tuning, so outputs should be used with caution.
3.  Performance on tasks not included in the evaluation suite may vary.
4.  The model's knowledge is limited to its training data cutoff date.

Ethical Considerations
-------------------------------------------------

Users should be aware that this model, like all large language models, can potentially generate harmful or biased content. It should not be used for making decisions about individuals or in sensitive applications without appropriate safeguards and human oversight.

Citation
---------------------

If you use this model in your research, please cite:

    @article{Li2024DataCompLM,
      title={DataComp-LM: In search of the next generation of training sets for language models},
      author={Jeffrey Li and Alex Fang and Georgios Smyrnis and Maor Ivgi and Matt Jordan and Samir Gadre and Hritik Bansal and Etash Guha and Sedrick Keh and Kushal Arora and [... full author list]},
      journal={arXiv preprint arXiv:2406.11794},
      year={2024}
    }

## Model overview

`DCLM-7B` is a 7 billion parameter language model trained by the DataComp for Language Models (DCLM) team. It is a decoder-only Transformer model designed to showcase the effectiveness of systematic data curation techniques for improving language model performance. The model was trained for approximately 2.5 trillion tokens on data drawn from the DCLM-Baseline dataset.

Similar models include [DeciLM-7B](https://aimodels.fyi/models/huggingFace/decilm-7b-deci), a 7 billion parameter decoder-only text generation model released by Deci, and [OpenELM-3B](https://aimodels.fyi/models/huggingFace/openelm-3b-apple), part of a family of open-source efficient language models developed by researchers at Apple.

## Model inputs and outputs

### Inputs
- Text prompts of varying lengths

### Outputs
- Continuation of the input text, generated in an autoregressive manner
- The model can be used for a variety of text generation tasks, such as creative writing, summarization, and question answering

## Capabilities

In the evaluation above, the `DCLM-7B` model scores 0.6372 on few-shot MMLU, 0.8043 on HellaSwag, and 0.8220 on ARC Easy. Its scores on chain-of-thought math benchmarks are much lower (0.0250 on GSM8K), so it is stronger at language understanding and commonsense reasoning than at multi-step arithmetic.

## What can I use it for?

The `DCLM-7B` model can be used for a variety of natural language processing tasks, such as content generation, question answering, and dialogue systems. Because it is a base model that has not undergone alignment or safety fine-tuning, it is best treated as a starting point for research and for fine-tuning toward specific applications.

Users and developers should keep in mind that, like all large language models, the `DCLM-7B` may produce outputs that are inaccurate, biased, or objectionable. Thorough safety testing and filtering mechanisms are recommended before deploying the model in production environments.

## Things to try

Experiment with the model's zero-shot and few-shot capabilities on various language understanding and generation tasks. Try fine-tuning the model on domain-specific datasets to see how it adapts to specialized applications. Explore long-form text generation and multi-turn dialogue within the model's 2048-token context length.

OpenELM
===================

_Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari_

We introduce **OpenELM**, a family of **Open**-source **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to allocate parameters efficiently within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction-tuned models with 270M, 450M, 1.1B, and 3B parameters.

Our pretraining dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check the license agreements and terms of these datasets before using them.

Usage
---------------

We have provided an example function to generate output from OpenELM models loaded via [HuggingFace Hub](https://huggingface.co/docs/hub/) in `generate_openelm.py`.

You can try the model by running the following command:

    python generate_openelm.py --model apple/OpenELM-3B-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2
    

Please refer to [this link](https://huggingface.co/docs/hub/security-tokens) to obtain your Hugging Face access token.

Additional arguments to the Hugging Face `generate` function can be passed via `generate_kwargs`. As an example, to speed up inference, you can try [lookup token speculative generation](https://huggingface.co/docs/transformers/generation_strategies) by passing the `prompt_lookup_num_tokens` argument as follows:

    python generate_openelm.py --model apple/OpenELM-3B-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
    

Alternatively, try model-wise speculative generation with an [assistant model](https://huggingface.co/blog/assisted-generation) by passing a smaller model through the `assistant_model` argument, for example:

    python generate_openelm.py --model apple/OpenELM-3B-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL]
    

Main Results
-----------------------------

### Zero-Shot

| **Model** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** |

### LLM360

| **Model** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** |

### OpenLLM Leaderboard

| **Model** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | 72.33 | **76.87** | 24.80 | **79.00** | 38.47 | **38.76** | **67.96** | **55.73** |

See the technical report for more results and comparison.

Evaluation
-------------------------

### Setup

Install the following dependencies:

    
    # install public lm-eval-harness
    
    harness_repo="public-lm-eval-harness"
    git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
    cd ${harness_repo}
    # use main branch on 03-15-2024, SHA is dc90fec
    git checkout dc90fec
    pip install -e .
    cd ..
    
    # 66d6242 is the main branch on 2024-04-01 
    pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
    # quote the version specifiers so the shell does not treat ">=" as a redirect
    pip install 'tokenizers>=0.15.2' 'transformers>=4.38.2' 'sentencepiece>=0.2.0'
    

### Evaluate OpenELM

    
    # OpenELM-3B-Instruct
    hf_model=apple/OpenELM-3B-Instruct
    
    # this flag is needed because lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True
    tokenizer=meta-llama/Llama-2-7b-hf
    add_bos_token=True
    batch_size=1
    
    mkdir -p lm_eval_output
    
    shot=0
    task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=5
    task=mmlu,winogrande
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=25
    task=arc_challenge,crows_pairs_english
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=10
    task=hellaswag
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
-----------------------------------------------------------

The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.

[](#citation)Citation
---------------------

If you find our work useful, please cite:

    @article{mehtaOpenELMEfficientLanguage2024,
        title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open}-source {Training} and {Inference} {Framework}},
        shorttitle = {{OpenELM}},
        url = {https://arxiv.org/abs/2404.14619v1},
        language = {en},
        urldate = {2024-04-24},
        journal = {arXiv.org},
        author = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
        month = apr,
        year = {2024},
    }
    
    @inproceedings{mehta2022cvnets, 
         author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
         title = {CVNets: High Performance Library for Computer Vision}, 
         year = {2022}, 
         booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
         series = {MM '22} 
    }

## Model Overview

`OpenELM-3B-Instruct` is a 3 billion parameter language model developed by [Apple](https://aimodels.fyi/creators/huggingFace/apple). It is part of the OpenELM family of efficient language models that use a layer-wise scaling strategy to enhance accuracy. The model was pretrained with the [CoreNet](https://github.com/apple/corenet) library on a large corpus of data, including RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens.

Similar models in the OpenELM family include [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M), [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M), [OpenELM-1_1B](https://huggingface.co/apple/OpenELM-1_1B), and their respective instruction-tuned versions. These models aim to provide efficient and high-performing language models for a variety of natural language processing tasks.

## Model Inputs and Outputs

### Inputs
- Text input for the model to generate output

### Outputs
- Generated text output

## Capabilities

The OpenELM-3B-Instruct model is evaluated on a range of benchmark tasks, including ARC-c, ARC-e, BoolQ, HellaSwag, PIQA, SciQ, and WinoGrande. In the reported zero-shot results, it averages 69.15 across these seven tasks, compared with 67.39 for the base OpenELM-3B model.

## What Can I Use It For?

The OpenELM-3B-Instruct model can be used for a variety of natural language generation tasks, such as text summarization, language translation, and content creation. Its strong performance on benchmarks suggests it could be a valuable tool for researchers and developers working on advanced language models and natural language processing applications.

## Things to Try

One interesting aspect of the OpenELM-3B-Instruct model is its support for lookup token speculative generation, which can help speed up inference. Developers could experiment with this feature to optimize the model's performance for their specific use cases. Additionally, the model can be paired with a smaller assistant model for model-wise speculative generation, which is another way to reduce generation latency.

[](#openelm)OpenELM
===================

_Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari_

We introduce **OpenELM**, a family of **Open**\-source **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.

Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them.

[](#usage)Usage
---------------

We have provided an example function to generate output from OpenELM models loaded via [HuggingFace Hub](https://huggingface.co/docs/hub/) in `generate_openelm.py`.

You can try the model by running the following command:

    python generate_openelm.py --model apple/OpenELM-3B --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2
    

Please refer to [this link](https://huggingface.co/docs/hub/security-tokens) to obtain your Hugging Face access token.

Additional arguments to the Hugging Face `generate` function can be passed via `generate_kwargs`. As an example, to speed up inference, you can try [lookup token speculative generation](https://huggingface.co/docs/transformers/generation_strategies) by passing the `prompt_lookup_num_tokens` argument as follows:

    python generate_openelm.py --model apple/OpenELM-3B --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
    

Alternatively, try model-wise speculative generation with an [assistant model](https://huggingface.co/blog/assisted-generation) by passing a smaller model through the `assistant_model` argument, for example:

    python generate_openelm.py --model apple/OpenELM-3B --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL]
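
The commands above pass extra generation options as `key=value` strings. The following is a minimal sketch of how such strings could be converted into keyword arguments and forwarded to the Hugging Face `generate` method. It is not the code in `generate_openelm.py`: the helper names `parse_generate_kwargs` and `generate_text`, the default `max_new_tokens=64`, and the choice of the `meta-llama/Llama-2-7b-hf` tokenizer (taken from the evaluation section of the model card) are assumptions made for illustration.

```python
def parse_generate_kwargs(pairs):
    """Turn ["key=value", ...] strings into a dict, converting numeric values.

    Hypothetical helper, shown only to illustrate the key=value convention.
    """
    kwargs = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        if not key or not raw:
            raise ValueError(f"expected key=value, got {pair!r}")
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw  # keep non-numeric values as strings
        kwargs[key] = value
    return kwargs


def generate_text(model_id, prompt, hf_access_token, generate_kwargs, max_new_tokens=64):
    """Load a model from the Hub and generate a continuation of the prompt.

    Hypothetical helper; the tokenizer id below is an assumption.
    """
    # Imported lazily so the parsing helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "meta-llama/Llama-2-7b-hf", token=hf_access_token
    )
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Options such as repetition_penalty or prompt_lookup_num_tokens are
    # forwarded unchanged to generate().
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, **generate_kwargs
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With this sketch, `parse_generate_kwargs(["repetition_penalty=1.2", "prompt_lookup_num_tokens=10"])` yields a dictionary that can be passed as `generate_kwargs` to `generate_text`.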
    

[](#main-results)Main Results
-----------------------------

### [](#zero-shot)Zero-Shot

| **Model Size** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** |

### [](#llm360)LLM360

| **Model Size** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** |

### [](#openllm-leaderboard)OpenLLM Leaderboard

| **Model Size** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | 72.33 | **76.87** | 24.80 | **79.00** | 38.47 | **38.76** | **67.96** | **55.73** |

See the technical report for more results and comparison.
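
The Average column in the zero-shot results is consistent with an unweighted mean of the seven per-task scores. The short check below recomputes it for the two 3B rows, using only the numbers reported in that table. The names `ZERO_SHOT` and `mean_score` are illustrative. Recomputing from the rounded per-task values may differ in the last digit for the other tables, presumably because the published averages were taken from unrounded scores.

```python
# Per-task zero-shot scores (ARC-c, ARC-e, BoolQ, HellaSwag, PIQA, SciQ,
# WinoGrande) and the reported Average, copied from the zero-shot results.
ZERO_SHOT = {
    "OpenELM-3B": ([35.58, 59.89, 67.40, 72.44, 78.24, 92.70, 65.51], 67.39),
    "OpenELM-3B-Instruct": ([39.42, 61.74, 68.17, 76.36, 79.00, 92.50, 66.85], 69.15),
}


def mean_score(scores):
    """Unweighted mean of the per-task scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


recomputed = {name: mean_score(scores) for name, (scores, _) in ZERO_SHOT.items()}

for name, (_, reported) in ZERO_SHOT.items():
    print(f"{name}: recomputed={recomputed[name]:.2f} reported={reported:.2f}")
```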

[](#evaluation)Evaluation
-------------------------

### [](#setup)Setup

Install the following dependencies:

    
    # install public lm-eval-harness
    
    harness_repo="public-lm-eval-harness"
    git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
    cd ${harness_repo}
    # use main branch on 03-15-2024, SHA is dc90fec
    git checkout dc90fec
    pip install -e .
    cd ..
    
    # 66d6242 is the main branch on 2024-04-01 
    pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
    pip install 'tokenizers>=0.15.2' 'transformers>=4.38.2' 'sentencepiece>=0.2.0'
    

### [](#evaluate-openelm)Evaluate OpenELM

    
    # OpenELM-3B
    hf_model=apple/OpenELM-3B
    
    # this flag is needed because lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True
    tokenizer=meta-llama/Llama-2-7b-hf
    add_bos_token=True
    batch_size=1
    
    mkdir lm_eval_output
    
    shot=0
    task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=5
    task=mmlu,winogrande
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=25
    task=arc_challenge,crows_pairs_english
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=10
    task=hellaswag
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
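
The four `lm_eval` invocations above differ only in the task list and the number of few-shot examples. As a rough sketch, the same argument lists and output paths can be built in Python. The function name `build_lm_eval_command`, the `RUNS` table, and the Hub id `apple/OpenELM-3B` are assumptions for illustration; the flags themselves are copied from the shell script.

```python
def build_lm_eval_command(hf_model, tasks, shot,
                          tokenizer="meta-llama/Llama-2-7b-hf",
                          add_bos_token=True, batch_size=1,
                          device="cuda:0", output_dir="./lm_eval_output"):
    """Return the lm_eval argument list for one (tasks, shot) setting."""
    task_arg = ",".join(tasks)
    # Mirrors ${hf_model//\//_} and ${task//,/_}: slashes and commas become
    # underscores so the run name is a single path component.
    run_name = f"{hf_model.replace('/', '_')}_{task_arg.replace(',', '_')}-{shot}shot"
    model_args = (
        f"pretrained={hf_model},trust_remote_code=True,"
        f"add_bos_token={add_bos_token},tokenizer={tokenizer}"
    )
    return [
        "lm_eval", "--model", "hf",
        "--model_args", model_args,
        "--tasks", task_arg,
        "--device", device,
        "--num_fewshot", str(shot),
        "--output_path", f"{output_dir}/{run_name}",
        "--batch_size", str(batch_size),
    ]


# (number of shots, tasks) for each of the four runs in the script.
RUNS = [
    (0, ["arc_challenge", "arc_easy", "boolq", "hellaswag", "piqa",
         "race", "winogrande", "sciq", "truthfulqa_mc2"]),
    (5, ["mmlu", "winogrande"]),
    (25, ["arc_challenge", "crows_pairs_english"]),
    (10, ["hellaswag"]),
]

commands = [build_lm_eval_command("apple/OpenELM-3B", tasks, shot)
            for shot, tasks in RUNS]
```

Each entry of `commands` could then be handed to `subprocess.run`, with logging handled separately; this sketch does not reproduce the `tee` redirection.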
    

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
-----------------------------------------------------------

The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.

[](#citation)Citation
---------------------

If you find our work useful, please cite:

    @article{mehtaOpenELMEfficientLanguage2024,
        title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open}-source {Training} and {Inference} {Framework}},
        shorttitle = {{OpenELM}},
        url = {https://arxiv.org/abs/2404.14619v1},
        language = {en},
        urldate = {2024-04-24},
        journal = {arXiv.org},
        author = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
        month = apr,
        year = {2024},
    }
    
    @inproceedings{mehta2022cvnets, 
         author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
         title = {CVNets: High Performance Library for Computer Vision}, 
         year = {2022}, 
         booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
         series = {MM '22} 
    }

## Model Overview

[`OpenELM`](https://aimodels.fyi/models/huggingFace/openelm-apple) is a family of open-source efficient language models created by Apple. The models use a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. OpenELM models are available in a range of sizes from 270M to 3B parameters, with both pretrained and instruction-tuned versions.
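
To make the idea of layer-wise scaling more concrete, here is a minimal sketch in which the number of attention heads and the feed-forward width grow linearly from the first transformer layer to the last, instead of being identical in every layer. The function name, the interpolation ranges `alpha` and `beta`, and the example dimensions are illustrative assumptions and are not taken from this model card; the technical report describes the actual configuration.

```python
def layer_wise_scaling(num_layers, d_model, head_dim,
                       alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a per-layer (num_heads, ffn_dim) allocation.

    alpha scales the attention width and beta scales the feed-forward width;
    both are linearly interpolated across layers. All defaults are assumed
    values chosen for illustration.
    """
    configs = []
    for i in range(num_layers):
        # Position of this layer in [0, 1]; a single layer uses the minimum.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = int(round(b * d_model))
        configs.append({"layer": i, "num_heads": num_heads, "ffn_dim": ffn_dim})
    return configs


# Example: a toy 3-layer model with d_model=1024 and 64-dimensional heads.
toy_config = layer_wise_scaling(num_layers=3, d_model=1024, head_dim=64)
for layer in toy_config:
    print(layer)
```

In this toy setting, early layers receive fewer heads and a narrower feed-forward block than later layers, so the parameter budget is spread non-uniformly across depth.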

## Model Inputs and Outputs

OpenELM models take in natural language prompts as input and generate coherent, contextual text as output. The models can be used for a variety of language tasks such as text generation, summarization, and question answering.

### Inputs
- Natural language prompts

### Outputs
- Coherent, contextual text generated in response to the input prompt

## Capabilities

OpenELM models are evaluated on a range of language tasks, including question answering, common sense reasoning, and language understanding. In the results tables above, scores generally increase with model size, and the 3B variants achieve the highest average in each of the three evaluation suites.

## What Can I Use it For?

The OpenELM models can be used for a wide variety of natural language processing applications, such as:

- **Content generation**: Generate coherent and contextual text for tasks like story writing, article summarization, and dialogue response.
- **Language understanding**: Use the models for tasks like text classification, question answering, and relation extraction.
- **Conversational AI**: Integrate the models into chatbots and virtual assistants to enable more natural and engaging interactions.

## Things to Try

One interesting aspect of the OpenELM models is the use of the layer-wise scaling strategy, which allows the models to allocate parameters more efficiently across layers. This could enable interesting explorations into model compression and efficient inference on resource-constrained devices.

Additionally, the availability of both pretrained and instruction-tuned versions of the models opens up possibilities for prompt engineering and few-shot learning experiments. Developers can explore how the models respond to different prompts and fine-tune them for specific use cases.

[](#openelm)OpenELM
===================

_Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari_

We introduce **OpenELM**, a family of **Open**\-source **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.

Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them.

[](#usage)Usage
---------------

We have provided an example function to generate output from OpenELM models loaded via [HuggingFace Hub](https://huggingface.co/docs/hub/) in `generate_openelm.py`.

You can try the model by running the following command:

    python generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2
    

Please refer to [this link](https://huggingface.co/docs/hub/security-tokens) to obtain your Hugging Face access token.

Additional arguments to the Hugging Face `generate` function can be passed via `generate_kwargs`. As an example, to speed up inference, you can try [lookup token speculative generation](https://huggingface.co/docs/transformers/generation_strategies) by passing the `prompt_lookup_num_tokens` argument as follows:

    python generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
    

Alternatively, try model-wise speculative generation with an [assistant model](https://huggingface.co/blog/assisted-generation) by passing a smaller model through the `assistant_model` argument, for example:

    python generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL]
    

[](#main-results)Main Results
-----------------------------

### [](#zero-shot)Zero-Shot

| **Model Size** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** |

### [](#llm360)LLM360

| **Model Size** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** |

### [](#openllm-leaderboard)OpenLLM Leaderboard

| **Model Size** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | 72.33 | **76.87** | 24.80 | **79.00** | 38.47 | **38.76** | **67.96** | **55.73** |

See the technical report for more results and comparison.

[](#evaluation)Evaluation
-------------------------

### [](#setup)Setup

Install the following dependencies:

    
    # install public lm-eval-harness
    
    harness_repo="public-lm-eval-harness"
    git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
    cd ${harness_repo}
    # use main branch on 03-15-2024, SHA is dc90fec
    git checkout dc90fec
    pip install -e .
    cd ..
    
    # 66d6242 is the main branch on 2024-04-01 
    pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
    pip install 'tokenizers>=0.15.2' 'transformers>=4.38.2' 'sentencepiece>=0.2.0'
    

### [](#evaluate-openelm)Evaluate OpenELM

    
    # OpenELM-270M-Instruct
    hf_model=apple/OpenELM-270M-Instruct
    
    # this flag is needed because lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True
    tokenizer=meta-llama/Llama-2-7b-hf
    add_bos_token=True
    batch_size=1
    
    mkdir lm_eval_output
    
    shot=0
    task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=5
    task=mmlu,winogrande
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=25
    task=arc_challenge,crows_pairs_english
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=10
    task=hellaswag
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
-----------------------------------------------------------

The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.

[](#citation)Citation
---------------------

If you find our work useful, please cite:

    @article{mehtaOpenELMEfficientLanguage2024,
        title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open}-source {Training} and {Inference} {Framework}},
        shorttitle = {{OpenELM}},
        url = {https://arxiv.org/abs/2404.14619v1},
        language = {en},
        urldate = {2024-04-24},
        journal = {arXiv.org},
        author = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
        month = apr,
        year = {2024},
    }
    
    @inproceedings{mehta2022cvnets, 
         author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
         title = {CVNets: High Performance Library for Computer Vision}, 
         year = {2022}, 
         booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
         series = {MM '22} 
    }

## Model overview

The `OpenELM-270M-Instruct` is a 270M parameter open-source efficient language model developed by Apple. It uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. OpenELM models are pretrained using the [CoreNet](https://github.com/apple/corenet) library and released in several sizes with both pretrained and instruction-tuned variants.

The OpenELM-270M-Instruct model is part of a family of OpenELM models that also includes the [OpenELM-3B-Instruct](https://aimodels.fyi/models/huggingFace/openelm-3b-instruct-apple), a larger 3B parameter version. These models are designed for text-to-text generation tasks and have shown strong performance on a variety of benchmarks like ARC-c, HellaSwag, and WinoGrande.

## Model inputs and outputs

### Inputs
- Text prompts to be used for text generation

### Outputs
- Conditional text generation based on the input prompts

## Capabilities

The OpenELM-270M-Instruct model generates coherent text across a range of domains and is evaluated on tasks such as question answering, common sense reasoning, and open-ended text generation. In the zero-shot results above, it averages 55.11, compared with 54.37 for the base OpenELM-270M model.

## What can I use it for?

The OpenELM-270M-Instruct model can be used for a variety of natural language processing applications, such as:

- Chatbots and conversational assistants
- Content generation (e.g. stories, articles, product descriptions)
- Question answering and knowledge retrieval
- Text summarization and simplification

As an open-source model, developers can fine-tune the OpenELM-270M-Instruct for their specific use cases or incorporate it into larger language models or applications.

## Things to try

One interesting aspect of OpenELM-270M-Instruct is its use of layer-wise scaling to allocate parameters within each layer of the transformer. This is intended to give higher accuracy for a given parameter budget, so the model stays compact while remaining competitive. Developers can experiment with different ways of exploiting this efficiency, such as deploying the model on low-resource devices or incorporating it into ensemble models.
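
To make the idea of layer-wise scaling more concrete, here is a minimal sketch. It assumes that the number of attention heads and the feed-forward width multiplier are interpolated linearly from the first to the last layer, which is how the OpenELM technical report describes the strategy. The function name and all numeric values (`d_model`, `head_dim`, the `alpha` and `beta` ranges, the layer count) are illustrative placeholders, not the released 270M configuration.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a (num_heads, ffn_dim) pair for every layer.

    alpha scales the attention width and beta scales the feed-forward
    width; both grow linearly with depth (hypothetical ranges).
    """
    configs = []
    for i in range(num_layers):
        # Position of this layer between the first (0.0) and last (1.0) layer.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


for layer, (heads, ffn) in enumerate(layerwise_scaling(4, 1024, 64)):
    print(f"layer {layer}: {heads} heads, FFN width {ffn}")
```

Early layers end up narrower and later layers wider, instead of every layer receiving the same share of the parameter budget.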

Another area to explore is the instruction tuning process used to create the OpenELM-Instruct variants. Analyzing the impact of this fine-tuning on the model's capabilities and safety could provide insights for developing more robust and versatile language models.

[](#aim-autoregressive-image-models)AIM: Autoregressive Image Models
====================================================================

_Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, and Armand Joulin_

This software project accompanies the research paper, Scalable Pre-training of Large Autoregressive Image Models.

We introduce **AIM**, a collection of vision models pre-trained with an autoregressive generative objective. We show that autoregressive pre-training of image features exhibits similar scaling properties to their textual counterpart (i.e. Large Language Models). Specifically, we highlight two findings:

1.  the model capacity can be trivially scaled to billions of parameters, and
2.  AIM effectively leverages large collections of uncurated image data.

[](#installation)Installation
-----------------------------

Please install PyTorch using the official [installation instructions](https://pytorch.org/get-started/locally/). Afterward, install the package as:

    pip install git+https://git@github.com/apple/ml-aim.git
    

We also offer [MLX](https://github.com/ml-explore/mlx) backend support for research and experimentation on Apple silicon. To enable MLX support, simply run:

    pip install mlx
    

[](#usage)Usage
---------------

Below we provide an example of usage in [PyTorch](https://pytorch.org/):

    from PIL import Image
    
    from aim.utils import load_pretrained
    from aim.torch.data import val_transforms
    
    img = Image.open(...)
    model = load_pretrained("aim-600M-2B-imgs", backend="torch")
    transform = val_transforms()
    
    inp = transform(img).unsqueeze(0)
    logits, _ = model(inp)
    

and in both [MLX](https://ml-explore.github.io/mlx/)

    from PIL import Image
    import mlx.core as mx
    
    from aim.utils import load_pretrained
    from aim.torch.data import val_transforms
    
    img = Image.open(...)
    model = load_pretrained("aim-600M-2B-imgs", backend="mlx")
    transform = val_transforms()
    
    inp = transform(img).unsqueeze(0)
    inp = mx.array(inp.numpy())
    logits, _ = model(inp)
    

and [JAX](https://jax.readthedocs.io/)

    from PIL import Image
    import jax.numpy as jnp
    
    from aim.utils import load_pretrained
    from aim.torch.data import val_transforms
    
    img = Image.open(...)
    model, params = load_pretrained("aim-600M-2B-imgs", backend="jax")
    transform = val_transforms()
    
    inp = transform(img).unsqueeze(0)
    inp = jnp.array(inp)
    (logits, _), _ = model.apply(params, inp, mutable=['batch_stats'])

[](#pre-trained-checkpoints)Pre-trained checkpoints
---------------------------------------------------

The pre-trained models can be accessed either via [Hugging Face](https://huggingface.co/collections/apple/aim-65aa3ce948c718a574f09eb7):

    # after running pip install git+https://git@github.com/apple/ml-aim.git
    from aim.torch.models import AIMForImageClassification
    
    aim_600m = AIMForImageClassification.from_pretrained("apple/aim-600M")
    aim_1b   = AIMForImageClassification.from_pretrained("apple/aim-1B")
    aim_3b   = AIMForImageClassification.from_pretrained("apple/aim-3B")
    aim_7b   = AIMForImageClassification.from_pretrained("apple/aim-7B")
    

or [PyTorch Hub](https://pytorch.org/hub/) as:

    import torch
    
    aim_600m = torch.hub.load("apple/ml-aim", "aim_600M")
    aim_1b   = torch.hub.load("apple/ml-aim", "aim_1B")
    aim_3b   = torch.hub.load("apple/ml-aim", "aim_3B")
    aim_7b   = torch.hub.load("apple/ml-aim", "aim_7B")
    

### [](#pre-trained-backbones)Pre-trained backbones

The following table contains pre-trained backbones used in our paper.

| model | #params | attn (best layer) | backbone, SHA256 |
| --- | --- | --- | --- |
| AIM-0.6B | 0.6B | 79.4% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_600m_2bimgs_attnprobe_backbone.pth), 0d6f6b8f |
| AIM-1B | 1B | 82.3% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_1b_5bimgs_attnprobe_backbone.pth), d254ecd3 |
| AIM-3B | 3B | 83.3% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_3b_5bimgs_attnprobe_backbone.pth), 8475ce4e |
| AIM-7B | 7B | 84.0% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_7b_5bimgs_attnprobe_backbone.pth), 184ed94c |

### [](#pre-trained-attention-heads)Pre-trained attention heads

The table below contains the classification results on ImageNet-1k validation set.

| model | top-1 IN-1k (last layer) | top-1 IN-1k (best layer) | attention head, SHA256 (last layer) | attention head, SHA256 (best layer) |
| --- | --- | --- | --- | --- |
| AIM-0.6B | 78.5% | 79.4% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_600m_2bimgs_attnprobe_head_last_layers.pth), 5ce5a341 | [link](https://huggingface.co/apple/AIM/resolve/main/aim_600m_2bimgs_attnprobe_head_best_layers.pth), ebd45c05 |
| AIM-1B | 80.6% | 82.3% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_1b_5bimgs_attnprobe_head_last_layers.pth), db3be2ad | [link](https://huggingface.co/apple/AIM/resolve/main/aim_1b_5bimgs_attnprobe_head_best_layers.pth), f1ed7852 |
| AIM-3B | 82.2% | 83.3% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_3b_5bimgs_attnprobe_head_last_layers.pth), 5c057b30 | [link](https://huggingface.co/apple/AIM/resolve/main/aim_3b_5bimgs_attnprobe_head_best_layers.pth), ad380e16 |
| AIM-7B | 82.4% | 84.0% | [link](https://huggingface.co/apple/AIM/resolve/main/aim_7b_5bimgs_attnprobe_head_last_layers.pth), 1e5c99ba | [link](https://huggingface.co/apple/AIM/resolve/main/aim_7b_5bimgs_attnprobe_head_best_layers.pth), 73ecd732 |
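
Each download link above is paired with a short hexadecimal value under the SHA256 heading. The sketch below shows one way to check a downloaded checkpoint against such a value. It assumes the eight characters are the leading digits of the file's SHA256 digest, which the tables do not state explicitly; the helper names and the local file path are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed_prefix):
    """True if the file's digest starts with the value from the table.

    Assumption: the table lists a prefix of the full SHA256 digest.
    """
    return sha256_of_file(path).startswith(listed_prefix.lower())


# Example (hypothetical local path):
# matches_listed_checksum("aim_600m_2bimgs_attnprobe_backbone.pth", "0d6f6b8f")
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte checkpoints.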

[](#reproducing-the-in-1k-classification-results)Reproducing the IN-1k classification results
---------------------------------------------------------------------------------------------

The commands below reproduce the [attention probe results](#pre-trained-attention-heads) on ImageNet-1k validation set. We run the evaluation using 1 node with 8 GPUs:

    torchrun --standalone --nnodes=1 --nproc-per-node=8 main_attnprobe.py \
      --model=aim-7B \
      --batch-size=64 \
      --data-path=/path/to/imagenet \
      --probe-layers=last \
      --backbone-ckpt-path=/path/to/backbone_ckpt.pth \
      --head-ckpt-path=/path/to/head_ckpt.pth
    

By default, we probe the last 6 layers. To change this, simply pass `--probe-layers=best`.

## Model Overview

`AIM` is a collection of vision models pre-trained with an autoregressive generative objective, introduced by researchers at [Apple](https://aimodels.fyi/creators/huggingFace/apple). The models demonstrate that autoregressive pre-training of image features can exhibit similar scaling properties to large language models. Key findings include the ability to scale the model capacity to billions of parameters and effectively leverage large uncurated image datasets.

Related work from Apple includes [OpenELM](https://aimodels.fyi/models/huggingFace/openelm-apple), a family of efficient open-source language models. The two projects take different approaches: OpenELM uses a layer-wise scaling strategy to allocate parameters efficiently within the transformer architecture, whereas AIM relies on autoregressive pre-training over images.

## Model Inputs and Outputs

AIM takes images as input and generates a set of logits as output, which can be used for various downstream tasks such as image classification. The model uses the `val_transforms` function from the `aim.torch.data` module to preprocess the input images.

### Inputs
- Images, preprocessed using the `val_transforms` function

### Outputs
- Logits, representing the model's predictions
- Additional output, such as intermediate representations, depending on the specific use case
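
As a rough illustration of how the logits can be turned into predictions, the sketch below applies a softmax and returns the `k` most likely class indices. It uses plain Python lists so that it runs without any framework; the function name is hypothetical and the three-element logits vector is toy data.

```python
import math


def top_k_from_logits(logits, k=5):
    """Return the k most likely (class_index, probability) pairs."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


for index, prob in top_k_from_logits([0.0, 2.0, 1.0], k=2):
    print(f"class {index}: p={prob:.3f}")
```

With a PyTorch tensor of logits, `logits.softmax(dim=-1).topk(k)` performs the same computation in one call.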

## Capabilities

AIM demonstrates the ability to effectively leverage large-scale image datasets for pre-training, resulting in strong performance across a variety of computer vision benchmarks. The model's autoregressive nature allows it to capture rich visual features that can be useful for tasks like image classification, generation, and understanding.

## What Can I Use It For?

The `AIM` models can be used for a range of computer vision applications, including image classification, generation, and understanding. Potential use cases include:

- **Image Classification**: Fine-tune the AIM model on a labeled dataset to perform image classification tasks.
- **Image Generation**: Explore the autoregressive generative objective as a starting point for image generation research. Note that the usage examples in this document cover classification only.
- **Transfer Learning**: Leverage the pre-trained visual representations of AIM as a feature extractor for other computer vision tasks.

## Things to Try

One interesting aspect of AIM is its ability to scale to very large model sizes, up to billions of parameters. Experiment with different model sizes and compare the performance on your specific task to explore the scaling properties of the model. Additionally, try combining AIM with other techniques, such as few-shot learning or adversarial training, to further enhance its capabilities.

[](#stable-diffusion-v2-model-card)Stable Diffusion v2 Model Card
=================================================================

This model was generated by Hugging Face using [Apple's repository](https://github.com/apple/ml-stable-diffusion), which is licensed under [ASCL](https://github.com/apple/ml-stable-diffusion/blob/main/LICENSE.md).

This model card focuses on the model associated with the Stable Diffusion v2 model, available [here](https://github.com/Stability-AI/stablediffusion).

The model is trained from scratch 550k steps at resolution `256x256` on a subset of [LAION-5B](https://laion.ai/blog/laion-5b/) filtered for explicit pornographic material, using the [LAION-NSFW classifier](https://github.com/LAION-AI/CLIP-based-NSFW-Detector) with `punsafe=0.1` and an [aesthetic score](https://github.com/christophschuhmann/improved-aesthetic-predictor) >= `4.5`. Then it is further trained for 850k steps at resolution `512x512` on the same dataset on images with resolution `>= 512x512`.
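
The filtering criteria described above can be sketched as a simple predicate. This is only an illustration: the field names (`punsafe`, `aesthetic`, `width`, `height`) are hypothetical, and the sketch assumes that `punsafe=0.1` means samples scoring above 0.1 are discarded.

```python
def passes_filter(sample, punsafe_max=0.1, aesthetic_min=4.5,
                  min_resolution=None):
    """Decide whether a sample would be kept under the stated thresholds."""
    if sample["punsafe"] > punsafe_max:        # assumed direction of the cut
        return False
    if sample["aesthetic"] < aesthetic_min:
        return False
    if min_resolution is not None:
        # Second training stage: only images of at least 512x512.
        if min(sample["width"], sample["height"]) < min_resolution:
            return False
    return True


sample = {"punsafe": 0.02, "aesthetic": 5.1, "width": 640, "height": 480}
print(passes_filter(sample))                      # first stage (256x256)
print(passes_filter(sample, min_resolution=512))  # second stage (512x512)
```

The toy sample passes the first-stage filter but is dropped in the second stage, because its shorter side is below 512 pixels.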

[![image](https://github.com/Stability-AI/stablediffusion/blob/main/assets/stable-samples/txt2img/merged-0003.png?raw=true)](https://github.com/Stability-AI/stablediffusion/blob/main/assets/stable-samples/txt2img/merged-0003.png?raw=true)

These weights here have been converted to Core ML for use on Apple Silicon hardware.

There are 4 variants of the Core ML weights:

    coreml-stable-diffusion-2-base
    ├── original
    │   ├── compiled              # Swift inference, "original" attention
    │   └── packages              # Python inference, "original" attention
    └── split_einsum
        ├── compiled              # Swift inference, "split_einsum" attention
        └── packages              # Python inference, "split_einsum" attention
    

Please, refer to [https://huggingface.co/blog/diffusers-coreml](https://huggingface.co/blog/diffusers-coreml) for details.

*   Use it with [`diffusers`](https://huggingface.co/stabilityai/stable-diffusion-2-base#examples)
*   Use it with the [`stablediffusion`](https://github.com/Stability-AI/stablediffusion) repository: download the `512-base-ema.ckpt` [here](https://huggingface.co/stabilityai/stable-diffusion-2-base/resolve/main/512-base-ema.ckpt).
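
Because the four Core ML variants live in separate subfolders, it is usually enough to download only the one you need. The sketch below maps a choice of attention implementation and runtime to the matching subfolder, and then uses `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. The helper names are hypothetical, and the repository id is inferred from the links in this card rather than stated in it.

```python
def variant_subfolder(attention="original", runtime="python"):
    """Map (attention, runtime) to the subfolder holding that variant."""
    folders = {"swift": "compiled", "python": "packages"}
    if attention not in ("original", "split_einsum"):
        raise ValueError(f"unknown attention variant: {attention}")
    if runtime not in folders:
        raise ValueError(f"unknown runtime: {runtime}")
    return f"{attention}/{folders[runtime]}"


def download_variant(target_dir, attention="original", runtime="python",
                     repo_id="apple/coreml-stable-diffusion-2-base"):
    """Download a single variant (needs network access and huggingface_hub)."""
    from huggingface_hub import snapshot_download  # imported lazily

    variant = variant_subfolder(attention, runtime)
    return snapshot_download(repo_id, allow_patterns=f"{variant}/*",
                             local_dir=target_dir)


print(variant_subfolder("split_einsum", "swift"))
```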

[](#model-details)Model Details
-------------------------------

*   **Developed by:** Robin Rombach, Patrick Esser
    
*   **Model type:** Diffusion-based text-to-image generation model
    
*   **Language(s):** English
    
*   **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL)
    
*   **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([OpenCLIP-ViT/H](https://github.com/mlfoundations/open_clip)).
    
*   **Resources for more information:** [GitHub Repository](https://github.com/Stability-AI/).
    
*   **Cite as:**
    
        @InProceedings{Rombach_2022_CVPR,
            author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
            title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
            booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
            month     = {June},
            year      = {2022},
            pages     = {10684-10695}
        }
        
    

[](#uses)Uses
=============

[](#direct-use)Direct Use
-------------------------

The model is intended for research purposes only. Possible research areas and tasks include

*   Safe deployment of models which have the potential to generate harmful content.
*   Probing and understanding the limitations and biases of generative models.
*   Generation of artworks and use in design and other artistic processes.
*   Applications in educational or creative tools.
*   Research on generative models.

Excluded uses are described below.

### [](#misuse-malicious-use-and-out-of-scope-use)Misuse, Malicious Use, and Out-of-Scope Use

_Note: This section is originally taken from the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini), was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2_.

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

#### [](#out-of-scope-use)Out-of-Scope Use

The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

#### [](#misuse-and-malicious-use)Misuse and Malicious Use

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

*   Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
*   Intentionally promoting or propagating discriminatory content or harmful stereotypes.
*   Impersonating individuals without their consent.
*   Sexual content without consent of the people who might see it.
*   Mis- and disinformation
*   Representations of egregious violence and gore
*   Sharing of copyrighted or licensed material in violation of its terms of use.
*   Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.

[](#limitations-and-bias)Limitations and Bias
---------------------------------------------

### [](#limitations)Limitations

*   The model does not achieve perfect photorealism
*   The model cannot render legible text
*   The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere"
*   Faces and people in general may not be generated properly.
*   The model was trained mainly with English captions and will not work as well in other languages.
*   The autoencoding part of the model is lossy
*   The model was trained on a subset of the large-scale dataset [LAION-5B](https://laion.ai/blog/laion-5b/), which contains adult, violent and sexual content. To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section).

### [](#bias)Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v2 was primarily trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.

[](#training)Training
---------------------

**Training Data** The model developers used the following dataset for training the model:

*   LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a "p\_unsafe" score of 0.1 (conservative). For more details, please refer to LAION-5B's [NeurIPS 2022](https://openreview.net/forum?id=M3Y74vmsMcY) paper and reviewer discussions on the topic.

**Training Procedure** Stable Diffusion v2 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training,

*   Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
*   Text prompts are encoded through the OpenCLIP-ViT/H text-encoder.
*   The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
*   The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called _v-objective_, see [https://arxiv.org/abs/2202.00512](https://arxiv.org/abs/2202.00512).
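
The shape mapping performed by the autoencoder in the first step can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, not part of any library; it uses the downsampling factor of 8 and the 4 latent channels stated above.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Map an H x W x 3 image shape to its H/f x W/f x 4 latent shape."""
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the factor")
    return (height // factor, width // factor, latent_channels)


h, w, c = latent_shape(512, 512)
print((h, w, c))                      # (64, 64, 4)
print((512 * 512 * 3) // (h * w * c)) # 48
```

A 512x512 RGB image therefore becomes a 64x64x4 latent holding 48 times fewer values, which is what makes running the diffusion process in latent space cheaper than in pixel space.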

We currently provide the following checkpoints:

*   `512-base-ema.ckpt`: 550k steps at resolution `256x256` on a subset of [LAION-5B](https://laion.ai/blog/laion-5b/) filtered for explicit pornographic material, using the [LAION-NSFW classifier](https://github.com/LAION-AI/CLIP-based-NSFW-Detector) with `punsafe=0.1` and an [aesthetic score](https://github.com/christophschuhmann/improved-aesthetic-predictor) >= `4.5`. 850k steps at resolution `512x512` on the same dataset with resolution `>= 512x512`.
    
*   `768-v-ema.ckpt`: Resumed from `512-base-ema.ckpt` and trained for 150k steps using a [v-objective](https://arxiv.org/abs/2202.00512) on the same dataset. Resumed for another 140k steps on a `768x768` subset of our dataset.
    
*   `512-depth-ema.ckpt`: Resumed from `512-base-ema.ckpt` and finetuned for 200k steps. Added an extra input channel to process the (relative) depth prediction produced by [MiDaS](https://github.com/isl-org/MiDaS) (`dpt_hybrid`) which is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized.
    
*   `512-inpainting-ema.ckpt`: Resumed from `512-base-ema.ckpt` and trained for another 200k steps. Follows the mask-generation strategy presented in [LAMA](https://github.com/saic-mdal/lama) which, in combination with the latent VAE representations of the masked image, are used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the [1.5-inpainting checkpoint](https://github.com/saic-mdal/lama).
    
*   `x4-upscaling-ema.ckpt`: Trained for 1.25M steps on a 10M subset of LAION containing images `>2048x2048`. The model was trained on crops of size `512x512` and is a text-guided [latent upscaling diffusion model](https://arxiv.org/abs/2112.10752). In addition to the textual input, it receives a `noise_level` as an input parameter, which can be used to add noise to the low-resolution input according to a [predefined diffusion schedule](/apple/coreml-stable-diffusion-2-base/blob/main/configs/stable-diffusion/x4-upscaling.yaml).
    
*   **Hardware:** 32 x 8 x A100 GPUs
    
*   **Optimizer:** AdamW
    
*   **Gradient Accumulations**: 1
    
*   **Batch:** 32 x 8 x 2 x 4 = 2048
    
*   **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant
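
The learning-rate schedule in the last item can be sketched as follows. The card only says "warmup to 0.0001 for 10,000 steps and then kept constant"; the linear shape of the warmup and the function name are assumptions made for illustration.

```python
def learning_rate(step, peak=1e-4, warmup_steps=10_000):
    """Warm up to `peak` over `warmup_steps`, then hold constant.

    Assumption: the warmup is linear and starts from zero.
    """
    if step < warmup_steps:
        return peak * step / warmup_steps
    return peak


for step in (0, 5_000, 10_000, 100_000):
    print(step, learning_rate(step))
```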
    

[](#evaluation-results)Evaluation Results
-----------------------------------------

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 DDIM sampling steps show the relative improvements of the checkpoints:

[![pareto](/apple/coreml-stable-diffusion-2-base/resolve/main/model-variants.jpg)](/apple/coreml-stable-diffusion-2-base/blob/main/model-variants.jpg)

Evaluated using 50 DDIM steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores.
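
For readers unfamiliar with the guidance scale, the sketch below shows the commonly used classifier-free guidance combination of an unconditional and a text-conditioned noise prediction. The card does not spell out the formula, so treat this as the standard formulation rather than a description of the exact evaluation code; the function name and the toy vectors are hypothetical.

```python
def apply_guidance(noise_uncond, noise_cond, scale):
    """Move from the unconditional prediction toward (and past) the
    conditional one: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.0, 1.0]
cond = [1.0, 3.0]
for scale in (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0):
    print(scale, apply_guidance(uncond, cond, scale))
```

A scale of 1.0 reproduces the conditional prediction, and larger scales push the result further in the direction of the text condition.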

[](#environmental-impact)Environmental Impact
---------------------------------------------

**Stable Diffusion v1** **Estimated Emissions** Based on that information, we estimate the following CO2 emissions using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.

*   **Hardware Type:** A100 PCIe 40GB
*   **Hours used:** 200000
*   **Cloud Provider:** AWS
*   **Compute Region:** US-east
*   **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 15000 kg CO2 eq.
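
The formula in the last item (power consumption x time x carbon intensity of the grid) can be rearranged to see which carbon intensity the reported figures imply. The sketch below assumes an average draw of 250 W per GPU; that value is an assumption made for illustration and is not stated in this card.

```python
def carbon_emitted_kg(power_kw, hours, kg_co2_per_kwh):
    """Power consumption x time x carbon produced per unit of energy."""
    return power_kw * hours * kg_co2_per_kwh


def implied_intensity(total_kg, power_kw, hours):
    """Solve the same formula for the grid's carbon intensity."""
    return total_kg / (power_kw * hours)


HOURS = 200_000           # from the card
REPORTED_KG = 15_000      # from the card
ASSUMED_POWER_KW = 0.25   # assumption: 250 W per GPU, not from the card

intensity = implied_intensity(REPORTED_KG, ASSUMED_POWER_KW, HOURS)
print(f"{intensity:.2f} kg CO2 eq. per kWh")
```

Under that assumption the reported numbers correspond to roughly 0.3 kg CO2 eq. per kWh; a different power draw would change the implied intensity proportionally.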

[](#citation)Citation
---------------------

    @InProceedings{Rombach_2022_CVPR,
        author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
        title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        month     = {June},
        year      = {2022},
        pages     = {10684-10695}
    }
    

_This model card was written by: Robin Rombach, Patrick Esser and David Ha and is based on the [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion/blob/main/Stable_Diffusion_v1_Model_Card.md) and [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini)._

## Model overview

The `coreml-stable-diffusion-2-base` model is a Core ML conversion of the Stable Diffusion v2 base text-to-image model, produced with Apple's `ml-stable-diffusion` repository for use on Apple Silicon hardware. The underlying model was developed by Robin Rombach and Patrick Esser. It generates images from text prompts and can be used with the [`diffusers`](https://huggingface.co/stabilityai/stable-diffusion-2-base#examples) library.

The model was trained on a filtered subset of the large-scale [LAION-5B](https://laion.ai/blog/laion-5b/) dataset, with a focus on images with high aesthetic quality and the removal of explicit pornographic content. It uses a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) architecture that combines an autoencoder with a diffusion model, along with a fixed, pretrained text encoder ([OpenCLIP-ViT/H](https://github.com/mlfoundations/open_clip)).

There are four variants of the Core ML weights available, with different attention mechanisms and compilation targets. Users can choose the version that best fits their needs, whether that's Swift-based inference or Python-based inference, and the "original" or "split_einsum" attention mechanisms.

## Model inputs and outputs

### Inputs
- **Text prompt**: A natural language description of the desired image.

### Outputs
- **Generated image**: The model outputs a high-quality image that corresponds to the input text prompt.

## Capabilities

The `coreml-stable-diffusion-2-base` model can generate a wide variety of images from text prompts, including scenes, objects, and abstract concepts, in realistic as well as stylized or imaginative styles. It does not achieve perfect photorealism, cannot render legible text, and may struggle with complex or compositional prompts.

## What can I use it for?

The `coreml-stable-diffusion-2-base` model is intended for research purposes only. Possible applications include:

- **Safe deployment of generative models**: Researching techniques to safely deploy models that have the potential to generate harmful content.
- **Understanding model biases**: Probing the limitations and biases of the model to improve future iterations.
- **Creative applications**: Generating artwork, designs, and other creative content.
- **Educational tools**: Developing interactive educational or creative applications.
- **Generative model research**: Furthering the state of the art in text-to-image generation.

The model should not be used to create content that is harmful, offensive, or in violation of copyrights.

## Things to try

One interesting aspect of the `coreml-stable-diffusion-2-base` model is the availability of different attention mechanisms and compilation targets. Users can experiment with the "original" and "split_einsum" attention variants to see how they perform on their specific use cases and hardware setups. Additionally, the model's ability to generate high-quality images at 512x512 resolution makes it a compelling tool for creative applications and research.

[](#openelm)OpenELM
===================

_Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari_

We introduce **OpenELM**, a family of **Open** **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.

Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them.

[](#usage)Usage
---------------

We have provided an example function to generate output from OpenELM models loaded via the [Hugging Face Hub](https://huggingface.co/docs/hub/) in `generate_openelm.py`.

You can try the model by running the following command:

    python generate_openelm.py --model apple/OpenELM-270M --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2
    

Please refer to [this link](https://huggingface.co/docs/hub/security-tokens) to obtain your Hugging Face access token.

Additional arguments to the Hugging Face generate function can be passed via `generate_kwargs`. As an example, to speed up inference, you can try [lookup token speculative generation](https://huggingface.co/docs/transformers/generation_strategies) by passing the `prompt_lookup_num_tokens` argument as follows:

    python generate_openelm.py --model apple/OpenELM-270M --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
    

Alternatively, try model-wise speculative generation with an [assistive model](https://huggingface.co/blog/assisted-generation) by passing a smaller model through the `assistant_model` argument, for example:

    python generate_openelm.py --model apple/OpenELM-270M --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL]
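
For readers who prefer to call `transformers` directly, the commands above correspond roughly to the sketch below. It is not the content of `generate_openelm.py`: the helper names are hypothetical, and the use of the `meta-llama/Llama-2-7b-hf` tokenizer is an assumption based on the tokenizer used elsewhere in this card for evaluation. Running `generate` needs network access and an access token that is allowed to download that tokenizer.

```python
def parse_generate_kwargs(pairs):
    """Turn ['repetition_penalty=1.2', ...] into a keyword dictionary."""
    kwargs = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw  # keep non-numeric values as strings
        kwargs[key] = value
    return kwargs


def generate(prompt, model_id="apple/OpenELM-270M",
             tokenizer_id="meta-llama/Llama-2-7b-hf",  # assumed tokenizer
             hf_access_token=None, max_new_tokens=64, **generate_kwargs):
    """Generate text; extra keyword arguments go to `model.generate`."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, token=hf_access_token)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens,
                                **generate_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


kwargs = parse_generate_kwargs(["repetition_penalty=1.2",
                                "prompt_lookup_num_tokens=10"])
print(kwargs)
# text = generate("Once upon a time there was", hf_access_token="...", **kwargs)
```

An assistant model for speculative generation can be passed the same way, as the `assistant_model` keyword argument of `model.generate`.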
    

[](#main-results)Main Results
-----------------------------

### [](#zero-shot)Zero-Shot

| **Model Size** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** |
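
The Average column is consistent with a plain arithmetic mean of the per-task scores. The short check below reproduces it for the two 270M rows of the zero-shot table; the dictionary and function names are only illustrative.

```python
ZERO_SHOT = {
    "OpenELM-270M": {
        "ARC-c": 26.45, "ARC-e": 45.08, "BoolQ": 53.98, "HellaSwag": 46.71,
        "PIQA": 69.75, "SciQ": 84.70, "WinoGrande": 53.91,
    },
    "OpenELM-270M-Instruct": {
        "ARC-c": 30.55, "ARC-e": 46.68, "BoolQ": 48.56, "HellaSwag": 52.07,
        "PIQA": 70.78, "SciQ": 84.40, "WinoGrande": 52.72,
    },
}


def average(scores):
    """Unweighted mean over all tasks, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


for name, scores in ZERO_SHOT.items():
    print(name, average(scores))
# OpenELM-270M 54.37
# OpenELM-270M-Instruct 55.11
```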

### [](#llm360)LLM360

| **Model Size** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** |

### [](#openllm-leaderboard)OpenLLM Leaderboard

| **Model Size** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | 72.33 | **76.87** | 24.80 | **79.00** | 38.47 | **38.76** | **67.96** | **55.73** |

See the technical report for more results and comparison.

[](#evaluation)Evaluation
-------------------------

### [](#setup)Setup

Install the following dependencies:

    
    # install public lm-eval-harness
    
    harness_repo="public-lm-eval-harness"
    git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
    cd ${harness_repo}
    # use main branch on 03-15-2024, SHA is dc90fec
    git checkout dc90fec
    pip install -e .
    cd ..
    
    # 66d6242 is the main branch on 2024-04-01 
    pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
    # quote the specifiers so the shell does not treat ">=" as an output redirection
    pip install "tokenizers>=0.15.2" "transformers>=4.38.2" "sentencepiece>=0.2.0"
    

### [](#evaluate-openelm)Evaluate OpenELM

    
    # OpenELM-270M
    hf_model=apple/OpenELM-270M
    
    # This flag is needed because lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True
    tokenizer=meta-llama/Llama-2-7b-hf
    add_bos_token=True
    batch_size=1
    
    mkdir -p lm_eval_output
    
    shot=0
    task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=5
    task=mmlu,winogrande
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=25
    task=arc_challenge,crows_pairs_english
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=10
    task=hellaswag
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
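
The `--output_path` arguments above rely on Bash pattern substitution: `${hf_model//\//_}` replaces every `/` in the model name with `_`, and `${task//,/_}` does the same for the commas in the task list. The sketch below mirrors that naming in Python so the resulting location is easy to predict; `eval_output_path` is a hypothetical helper written for this illustration and is not part of lm-eval-harness.

```python
def eval_output_path(hf_model, tasks, shot, root="./lm_eval_output"):
    """Mirror the shell expansion used for --output_path in the commands above."""
    model_part = hf_model.replace("/", "_")  # ${hf_model//\//_}
    task_part = tasks.replace(",", "_")      # ${task//,/_}
    return f"{root}/{model_part}_{task_part}-{shot}shot"


path = eval_output_path("apple/OpenELM-270M", "mmlu,winogrande", 5)
print(path)  # ./lm_eval_output/apple_OpenELM-270M_mmlu_winogrande-5shot
```

The log file written by `tee` uses the same components with an `eval-` prefix and a `.log` suffix.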
    

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
-----------------------------------------------------------

The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.

[](#citation)Citation
---------------------

If you find our work useful, please cite:

    @article{mehtaOpenELMEfficientLanguage2024,
        title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open} {Training} and {Inference} {Framework}},
        shorttitle = {{OpenELM}},
        url = {https://arxiv.org/abs/2404.14619v1},
        language = {en},
        urldate = {2024-04-24},
        journal = {arXiv.org},
        author = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
        month = apr,
        year = {2024},
    }
    
    @inproceedings{mehta2022cvnets, 
         author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
         title = {CVNets: High Performance Library for Computer Vision}, 
         year = {2022}, 
         booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
         series = {MM '22} 
    }

## Model overview

The `OpenELM-270M` model is part of the OpenELM family of open-source efficient language models developed by researchers at Apple. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. The models were pretrained on a large dataset containing RefinedWeb, deduplicated PILE, and subsets of RedPajama and Dolma v1.6, totaling around 1.8 trillion tokens. OpenELM models are available in different sizes, including 270M, 450M, 1.1B, and 3B parameters, with both pretrained and instruction-tuned versions.
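
To make the idea of layer-wise scaling more concrete, here is a minimal sketch. It assumes the scaling is a linear interpolation of two per-layer multipliers, one for the number of attention heads and one for the feed-forward width, so that early layers are narrower and later layers wider. The function name, the multiplier ranges, and the dimensions are hypothetical and do not correspond to any released OpenELM configuration; see the technical report for the actual schedule.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a per-layer config where width grows linearly with depth.

    alpha scales the attention width (number of heads),
    beta scales the feed-forward hidden size. Both ranges are
    illustrative assumptions, not published hyperparameters.
    """
    configs = []
    for i in range(num_layers):
        # Position of this layer in [0, 1]; a single-layer model uses the minimum.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        configs.append({
            "layer": i,
            "num_heads": max(1, round(a * d_model / head_dim)),
            "ffn_dim": round(b * d_model),
        })
    return configs


configs = layerwise_scaling(num_layers=16, d_model=1280, head_dim=64)
for cfg in (configs[0], configs[-1]):
    print(cfg)
```

In contrast, a uniform transformer would use the same `num_heads` and `ffn_dim` in every layer; the non-uniform allocation is what the overview refers to as allocating parameters "within each layer".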

## Model inputs and outputs

### Inputs
- **Text prompt**: The model takes in a text prompt as input, which can be used to generate continued text.

### Outputs
- **Continued text**: The model outputs a continuation of the provided text prompt, generating coherent and contextually relevant text.

## Capabilities

The `OpenELM-270M` model is evaluated on common-sense reasoning, reading comprehension, and question-answering benchmarks, including ARC, BoolQ, HellaSwag, PIQA, SciQ, and WinoGrande. On the OpenLLM Leaderboard task set reported above, it averages 45.13, and the instruction-tuned `OpenELM-270M-Instruct` model raises that average to 46.66, with gains on ARC-c, HellaSwag, MMLU, and PIQA.

## What can I use it for?

The OpenELM models can be used for a wide range of natural language processing tasks, such as text generation, question answering, and language understanding. Developers and researchers can leverage these efficient models to build applications that require language-based capabilities, while benefiting from the open-source nature and transparency of the project. As with any large language model, it is important to carefully evaluate the model's performance and potential biases for your specific use case.

## Things to try

One interesting aspect of the OpenELM models is the ability to leverage different techniques to improve inference speed, such as lookup token speculative generation and model-wise speculative generation with an assistive model. Developers can experiment with these strategies to find the right balance between performance and efficiency for their particular applications.
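
The intuition behind lookup token speculative generation can be shown with a toy example: when the most recent tokens have already appeared earlier in the sequence, the tokens that followed them last time are proposed as cheap draft candidates. The sketch below covers only this proposal step on plain integer token IDs. The function name and parameters are hypothetical; the actual implementation lives in the `transformers` library, is enabled through `prompt_lookup_num_tokens`, and additionally verifies the candidates with the model.

```python
def propose_draft_tokens(token_ids, ngram_size=2, num_draft=3):
    """Propose up to num_draft tokens by matching the trailing n-gram earlier in the sequence."""
    if len(token_ids) <= ngram_size:
        return []
    tail = token_ids[-ngram_size:]
    # Scan backwards so the most recent earlier occurrence wins;
    # the starting index excludes the trailing n-gram itself.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start:start + ngram_size] == tail:
            end = start + ngram_size
            return token_ids[end:end + num_draft]
    return []


tokens = [5, 6, 7, 8, 9, 5, 6]
print(propose_draft_tokens(tokens))  # [7, 8, 9]
```

When no earlier match exists the function returns an empty list, which corresponds to falling back to ordinary token-by-token decoding.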

[](#sd-xl-10-base-model-card-core-ml)SD-XL 1.0-base Model Card (Core ML)
========================================================================

This model was generated by Hugging Face using [Apple's repository](https://github.com/apple/ml-stable-diffusion), which is released under the [ASCL](https://github.com/apple/ml-stable-diffusion/blob/main/LICENSE.md). This version contains Core ML weights with the `ORIGINAL` attention implementation, suitable for running on macOS GPUs.

The Core ML weights are also distributed as a zip archive for use in the [Hugging Face demo app](https://github.com/huggingface/swift-coreml-diffusers) and other third-party tools. The zip archive was created from the contents of the `original/compiled` folder in this repo. Please refer to [https://huggingface.co/blog/diffusers-coreml](https://huggingface.co/blog/diffusers-coreml) for details.

The remaining contents of this model card were copied from the [original repo](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).

[![row01](/apple/coreml-stable-diffusion-xl-base/resolve/main/01.png)](/apple/coreml-stable-diffusion-xl-base/blob/main/01.png)

[](#model)Model
---------------

[![pipeline](/apple/coreml-stable-diffusion-xl-base/resolve/main/pipeline.png)](/apple/coreml-stable-diffusion-xl-base/blob/main/pipeline.png)

[SDXL](https://arxiv.org/abs/2307.01952) consists of an [ensemble of experts](https://arxiv.org/abs/2211.01324) pipeline for latent diffusion: In a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: [https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/)) specialized for the final denoising steps. Note that the base model can be used as a standalone module.

Alternatively, we can use a two-stage pipeline as follows: First, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit ([https://arxiv.org/abs/2108.01073](https://arxiv.org/abs/2108.01073), also known as "img2img") to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations.

Source code is available at [https://github.com/Stability-AI/generative-models](https://github.com/Stability-AI/generative-models) .

### [](#model-description)Model Description

*   **Developed by:** Stability AI
*   **Model type:** Diffusion-based text-to-image generative model
*   **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
*   **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses two fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)).
*   **Resources for more information:** Check out our [GitHub Repository](https://github.com/Stability-AI/generative-models) and the [SDXL report on arXiv](https://arxiv.org/abs/2307.01952).

### [](#model-sources)Model Sources

For research purposes, we recommend our `generative-models` GitHub repository ([https://github.com/Stability-AI/generative-models](https://github.com/Stability-AI/generative-models)), which implements the most popular diffusion frameworks (both training and inference) and for which new functionalities like distillation will be added over time. [Clipdrop](https://clipdrop.co/stable-diffusion) provides free SDXL inference.

*   **Repository:** [https://github.com/Stability-AI/generative-models](https://github.com/Stability-AI/generative-models)
*   **Demo:** [https://clipdrop.co/stable-diffusion](https://clipdrop.co/stable-diffusion)

[](#evaluation)Evaluation
-------------------------

[![comparison](/apple/coreml-stable-diffusion-xl-base/resolve/main/comparison.png)](/apple/coreml-stable-diffusion-xl-base/blob/main/comparison.png) The chart above evaluates user preference for SDXL (with and without refinement) over SDXL 0.9 and Stable Diffusion 1.5 and 2.1. The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance.

### [](#-diffusers) Diffusers

Make sure to upgrade diffusers to >= 0.18.0:

    pip install diffusers --upgrade
    

In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as the invisible watermark package:

    pip install invisible_watermark transformers accelerate safetensors
    

You can then use the model as follows:

    from diffusers import DiffusionPipeline
    import torch
    
    pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
    pipe.to("cuda")
    
    # if using torch < 2.0
    # pipe.enable_xformers_memory_efficient_attention()
    
    prompt = "An astronaut riding a green horse"
    
    # .images is a list of generated images; take the first (and only) one
    image = pipe(prompt=prompt).images[0]
    

When using `torch >= 2.0`, you can improve the inference speed by 20-30% with `torch.compile`. Simply wrap the UNet with `torch.compile` before running the pipeline:

    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    

If you are limited by GPU VRAM, you can enable _cpu offloading_ by calling `pipe.enable_model_cpu_offload()` instead of `.to("cuda")`:

    - pipe.to("cuda")
    + pipe.enable_model_cpu_offload()
    

[](#uses)Uses
-------------

### [](#direct-use)Direct Use

The model is intended for research purposes only. Possible research areas and tasks include

*   Generation of artworks and use in design and other artistic processes.
*   Applications in educational or creative tools.
*   Research on generative models.
*   Safe deployment of models which have the potential to generate harmful content.
*   Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

### [](#out-of-scope-use)Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

[](#limitations-and-bias)Limitations and Bias
---------------------------------------------

### [](#limitations)Limitations

*   The model does not achieve perfect photorealism.
*   The model cannot render legible text.
*   The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere".
*   Faces and people in general may not be generated properly.
*   The autoencoding part of the model is lossy.

### [](#bias)Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.

## Model overview

The `coreml-stable-diffusion-xl-base` model is a Core ML conversion of Stable Diffusion XL (SDXL) 1.0-base, a text-to-image generation model developed by Stability AI; the Core ML weights were generated by Hugging Face using Apple's conversion repository. SDXL consists of an ensemble-of-experts pipeline for latent diffusion: the base model generates (noisy) latents, which a refinement model then processes during the final denoising steps. The base model can also be used as a standalone module, or as the first stage of a two-stage pipeline in which a specialized high-resolution model applies SDEdit ("img2img") to the latents produced by the base model.

## Model inputs and outputs

The `coreml-stable-diffusion-xl-base` model takes text prompts as input and generates corresponding images as output. The text prompts can describe a wide variety of scenes, objects, and concepts, which the model then translates into visual form.

### Inputs
- **Text prompt**: A natural language description of the desired image, such as "a photo of an astronaut riding a horse on mars".

### Outputs
- **Generated image**: The model outputs a corresponding image based on the input text prompt.

## Capabilities

The `coreml-stable-diffusion-xl-base` model generates images from text prompts covering a wide range of scenes, objects, and concepts. In the user-preference evaluation above, the SDXL base model performs significantly better than SDXL 0.9 and Stable Diffusion 1.5 and 2.1, and combining it with the refinement module gives the best overall performance. As noted under Limitations, it does not achieve perfect photorealism and cannot render legible text.

## What can I use it for?

The `coreml-stable-diffusion-xl-base` model is intended for research purposes, such as the generation of artworks, applications in educational or creative tools, and probing the limitations and biases of generative models. The model should not be used to create content that is harmful, offensive, or misrepresents people or events.

## Things to try

Experiment with different text prompts to see the variety of images the model can generate. Try combining the base model with the [stable-diffusion-xl-refiner-1.0](https://aimodels.fyi/models/huggingFace/stable-diffusion-xl-refiner-10-stabilityai) model to see if the additional refinement step improves the image quality. Explore the model's capabilities and limitations, and consider how it could be applied in creative or educational contexts.

[](#openelm)OpenELM
===================

_Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari_

We introduce **OpenELM**, a family of **Open** **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters. We release the complete framework, encompassing data preparation, training, fine-tuning, and evaluation procedures, alongside multiple pre-trained checkpoints and training logs, to facilitate open research.

Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them.

[](#usage)Usage
---------------

We have provided an example function to generate output from OpenELM models loaded via [HuggingFace Hub](https://huggingface.co/docs/hub/) in `generate_openelm.py`.

You can try the model by running the following command:

    python generate_openelm.py --model apple/OpenELM-1_1B-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2
    

Please refer to [this link](https://huggingface.co/docs/hub/security-tokens) to obtain your Hugging Face access token.

Additional arguments to the Hugging Face `generate` function can be passed via `generate_kwargs`. As an example, to speed up inference, you can try [lookup token speculative generation](https://huggingface.co/docs/transformers/generation_strategies) by passing the `prompt_lookup_num_tokens` argument as follows:

    python generate_openelm.py --model apple/OpenELM-1_1B-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10
    

Alternatively, try model-wise speculative generation with an [assistive model](https://huggingface.co/blog/assisted-generation) by passing a smaller model through the `assistant_model` argument, for example:

    python generate_openelm.py --model apple/OpenELM-1_1B-Instruct --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL]
    

[](#main-results)Main Results
-----------------------------

### [](#zero-shot)Zero-Shot

| **Model Size** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** |

### [](#llm360)LLM360

| **Model Size** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** |

### [](#openllm-leaderboard)OpenLLM Leaderboard

| **Model Size** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 |
| [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** |
| [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 |
| [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** |
| [OpenELM-1\_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 |
| [OpenELM-1\_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** |
| [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 |
| [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | 72.33 | **76.87** | 24.80 | **79.00** | 38.47 | **38.76** | **67.96** | **55.73** |

See the technical report for more results and comparison.

[](#evaluation)Evaluation
-------------------------

### [](#setup)Setup

Install the following dependencies:

    
    # install public lm-eval-harness
    
    harness_repo="public-lm-eval-harness"
    git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
    cd ${harness_repo}
    # use main branch on 03-15-2024, SHA is dc90fec
    git checkout dc90fec
    pip install -e .
    cd ..
    
    # 66d6242 is the main branch on 2024-04-01 
    pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
    # quote the specifiers so the shell does not treat ">=" as an output redirection
    pip install "tokenizers>=0.15.2" "transformers>=4.38.2" "sentencepiece>=0.2.0"
    

### [](#evaluate-openelm)Evaluate OpenELM

    
    # OpenELM-1_1B-Instruct
    hf_model=apple/OpenELM-1_1B-Instruct
    
    # This flag is needed because lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True
    tokenizer=meta-llama/Llama-2-7b-hf
    add_bos_token=True
    batch_size=1
    
    mkdir -p lm_eval_output
    
    shot=0
    task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=5
    task=mmlu,winogrande
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=25
    task=arc_challenge,crows_pairs_english
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    
    shot=10
    task=hellaswag
    lm_eval --model hf \
            --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
            --tasks ${task} \
            --device cuda:0 \
            --num_fewshot ${shot} \
            --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
            --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
    

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
-----------------------------------------------------------

The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models. Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.

[](#citation)Citation
---------------------

If you find our work useful, please cite:

    @article{mehtaOpenELMEfficientLanguage2024,
        title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open} {Training} and {Inference} {Framework}},
        shorttitle = {{OpenELM}},
        url = {https://arxiv.org/abs/2404.14619v1},
        language = {en},
        urldate = {2024-04-24},
        journal = {arXiv.org},
        author = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
        month = apr,
        year = {2024},
    }
    
    @inproceedings{mehta2022cvnets, 
         author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
         title = {CVNets: High Performance Library for Computer Vision}, 
         year = {2022}, 
         booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
         series = {MM '22} 
    }

## Model overview

`OpenELM` is a family of open-source efficient language models developed by the team at Apple. These models use a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. The available model sizes range from 270M to 3B parameters, with both pretrained and instruction-tuned versions.

The pretraining dataset for these models contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Similar models in this family include the [OpenELM-3B-Instruct](https://aimodels.fyi/models/huggingFace/openelm-3b-instruct-apple), [OpenELM-270M-Instruct](https://aimodels.fyi/models/huggingFace/openelm-270m-instruct-apple), and [OpenELM-270M](https://aimodels.fyi/models/huggingFace/openelm-270m-apple).

## Model inputs and outputs

`OpenELM` is a text-to-text model, taking natural language prompts as input and generating relevant text responses. 

### Inputs
- Natural language prompts of varying lengths

### Outputs
- Coherent, context-appropriate text continuations and completions
- Responses to questions, instructions, or other prompts

## Capabilities

`OpenELM` models are evaluated on language understanding and generation benchmarks covering question answering, common-sense reasoning, and text completion. In the Main Results tables above, each instruction-tuned model has a higher average score than its pretrained counterpart (for example, a zero-shot average of 65.50 versus 63.44 for the 1.1B model).
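
The gap between pretrained and instruction-tuned checkpoints can be read directly off the zero-shot **Average** column in the Main Results section. The sketch below simply subtracts the two averages for each model size; the variable and function names are made up for this illustration, and the numbers are copied from the zero-shot results.

```python
# Zero-shot averages from the Main Results section: (pretrained, instruction-tuned).
zero_shot_average = {
    "270M": (54.37, 55.11),
    "450M": (57.56, 59.95),
    "1.1B": (63.44, 65.50),
    "3B": (67.39, 69.15),
}


def instruct_gain(averages):
    """Difference in average score (instruct minus pretrained), in points, per model size."""
    return {size: round(tuned - base, 2) for size, (base, tuned) in averages.items()}


gains = instruct_gain(zero_shot_average)
for size, gain in gains.items():
    print(f"OpenELM-{size}: +{gain:.2f} points")
```

On these numbers the gain ranges from 0.74 points (270M) to 2.39 points (450M), so the improvement is consistent across sizes but modest.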

## What can I use it for?

`OpenELM` models can be applied to a wide range of natural language processing tasks, such as:

- Automated text generation (e.g. story writing, article summarization, dialogue generation)
- Question answering and knowledge retrieval
- Language-based task completion (e.g. code generation, data analysis, creative writing)
- Conversational AI and chatbots
- Content creation and personalization

Developers and researchers can fine-tune these models on domain-specific data to create customized language models for their particular use cases.

## Things to try

One interesting aspect of the `OpenELM` models is that they are evaluated in zero-shot and few-shot settings (0, 5, 10, and 25 shots in the evaluation commands above), where the task is specified entirely in the prompt and no additional training is performed. This makes them a convenient starting point for exploring prompting, transfer learning, and rapid model adaptation.

Developers can also experiment with different generation strategies, such as lookup token speculative generation or speculative generation with a smaller assistant model, to speed up inference.
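
To illustrate why a smaller assistant model can save work without changing greedy output, here is a toy sketch of the verification step: the assistant drafts several tokens, and the main model keeps them only for as long as they match what it would have produced itself. Everything here is hypothetical and simplified: `accept_draft` and `toy_target` are invented names, the "model" is a plain Python function, and a real implementation scores all draft positions in a single forward pass rather than one call per token.

```python
def accept_draft(prefix, draft, target_next_token):
    """Greedy verification of drafted tokens against the target model.

    Returns the tokens to append: the matching draft tokens, followed by
    one token chosen by the target itself (a correction or a bonus token).
    """
    accepted = []
    context = list(prefix)
    for token in draft:
        expected = target_next_token(context)
        if token != expected:
            # First mismatch: keep the target's choice and discard the rest of the draft.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        context.append(token)
    # Every draft token matched, so the target adds one more token for free.
    accepted.append(target_next_token(context))
    return accepted


def toy_target(context):
    """Stand-in for the large model: always predicts last token + 1."""
    return context[-1] + 1


print(accept_draft([1, 2, 3], [4, 5, 9], toy_target))  # [4, 5, 6]
```

In the example the first two drafted tokens are accepted and the third is replaced by the target's own prediction, so three tokens are produced for one verification round.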