gpt-neo-125m

Maintainer: EleutherAI

Total Score: 163

Last updated: 5/28/2024

🌀

Property        Value
Model Link      View on HuggingFace
API Spec        View on HuggingFace
Github Link     No Github link provided
Paper Link      No paper link provided

Model overview

The gpt-neo-125m is a 125 million parameter transformer model developed by EleutherAI, a collective of AI researchers and engineers. It is a replication of the GPT-3 architecture, with the "GPT-Neo" referring to the class of models. This particular model was trained on the Pile, a large-scale curated dataset created by EleutherAI, for 300 billion tokens over 572,300 steps.

Compared to similar models, the gpt-neo-125m is a smaller and more lightweight version of GPT-Neo 2.7B and GPT-NeoX-20B, which have 2.7 billion and 20 billion parameters respectively. These larger models demonstrate improved performance on various benchmarks compared to the 125M version.

Model inputs and outputs

Inputs

  • Text prompt: The model takes in a text prompt as input, which it uses to generate the next token in a sequence.

Outputs

  • Generated text: The model outputs a sequence of generated text, continuing from the provided prompt. The generated text is produced in an autoregressive manner, with the model predicting the next token based on the previous tokens in the sequence.
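The autoregressive loop described above can be sketched in a few lines. The `next_token` function below is a stand-in for the model's next-token prediction (a hypothetical canned continuation, not real GPT-Neo output); the point is the loop structure, where each generated token becomes context for the next step:

```python
# Toy sketch of autoregressive decoding. A hypothetical next_token()
# stands in for the model, which in reality scores the whole vocabulary.
CONTINUATION = ["a", "small", "language", "model", "."]

def next_token(generated_count):
    """Stand-in for the model's next-token prediction (illustrative only)."""
    if generated_count < len(CONTINUATION):
        return CONTINUATION[generated_count]
    return "<eos>"

def generate(prompt_tokens, max_new_tokens=8):
    tokens = list(prompt_tokens)
    for step in range(max_new_tokens):
        tok = next_token(step)
        if tok == "<eos>":
            break
        tokens.append(tok)  # each new token extends the context for the next step
    return tokens

print(" ".join(generate(["GPT-Neo", "is"])))
# -> GPT-Neo is a small language model .
```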

Capabilities

The gpt-neo-125m model is a capable language generation model that can be used to produce human-like text from a given prompt. It has learned an internal representation of the English language that allows it to generate coherent and contextually relevant text. However, as an autoregressive model, it is best suited for tasks like text generation and may not perform as well on other NLP tasks that require more sophisticated reasoning.

What can I use it for?

The gpt-neo-125m model can be used for a variety of text generation tasks, such as creative writing, content generation, and chatbots. For example, you could use the model to generate product descriptions, short stories, or engaging dialog. The model's relatively small size also makes it suitable for deployment on resource-constrained devices or platforms.

However, it's important to note that the model was trained on a dataset that contains potentially offensive content, so the generated text may include biases, profanity, or other undesirable content. It's recommended to carefully curate and filter the model's outputs before using them in production or releasing them to end-users.
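One crude form of the filtering suggested above is a blocklist check over generated candidates; a production system would normally use a trained moderation classifier instead. The blocklist terms here are placeholders:

```python
# Hypothetical blocklist filter for generated text. Keyword matching is
# deliberately simplistic; real deployments typically use a moderation model.
BLOCKLIST = {"damn", "offensiveword"}  # placeholder terms

def is_clean(text):
    words = {w.strip(".,!?").lower() for w in text.split()}
    return BLOCKLIST.isdisjoint(words)

def filter_outputs(candidates):
    """Keep only generations that pass the blocklist check."""
    return [c for c in candidates if is_clean(c)]

print(filter_outputs(["A nice story.", "Damn, that failed."]))
# -> ['A nice story.']
```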

Things to try

One interesting aspect of the gpt-neo-125m model is its ability to capture and generate long-range dependencies in text. Try providing the model with a long, multi-sentence prompt and see how it continues the narrative, maintaining coherence and consistency over several paragraphs. This can showcase the model's understanding of contextual information and its capacity for generating coherent, extended passages of text.

Additionally, you can experiment with providing the model with prompts that require some level of reasoning or world knowledge, such as answering questions or completing tasks. While the model may not excel at these types of tasks out-of-the-box, observing its strengths and limitations can provide valuable insights into its capabilities and potential areas for improvement.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

🛸

gpt-neo-1.3B

EleutherAI

Total Score: 235

GPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. The model was trained on the Pile, a large-scale curated dataset created by EleutherAI. GPT-Neo refers to the class of models, while 1.3B represents the number of parameters of this particular pre-trained model. Compared to similar models like GPT-Neo 2.7B and GPT-J 6B, GPT-Neo 1.3B has a smaller parameter count but still demonstrates strong performance on a variety of language tasks. The model was trained using a similar approach to GPT-3, learning an inner representation of the English language that can then be used to extract features useful for downstream applications.

Model inputs and outputs

GPT-Neo 1.3B is a language model that takes in a string of text as input and generates the next token in the sequence. The model can be used for a variety of text-to-text tasks, such as text generation, summarization, and question answering.

Inputs

  • A string of text, which the model will use to predict the next token

Outputs

  • A predicted token that continues the input text sequence
  • Full text passages, produced by repeatedly applying the model to generate the next token

Capabilities

GPT-Neo 1.3B demonstrates strong performance on a variety of language understanding and generation tasks. On the LAMBADA task, which measures language modeling ability, the model achieves a perplexity of 7.498. It also performs well on other benchmarks like Winogrande (55.01% accuracy) and Hellaswag (38.66% accuracy). While the model was not specifically fine-tuned for downstream tasks, its general language understanding capabilities make it useful for applications like text summarization, question answering, and creative writing assistance. The model can generate fluent and contextually relevant text, though users should be mindful of potential biases or inaccuracies in the generated output.

What can I use it for?

GPT-Neo 1.3B can be a valuable tool for a variety of natural language processing applications. Researchers and developers may find it useful for pre-training on language tasks or as a starting point for fine-tuning on specific domains or applications. For example, the model could be fine-tuned for summarization tasks, where it generates concise summaries of longer text passages. It could also be used for question answering, where the model is prompted with a question and generates a relevant answer. In the creative writing domain, the model can assist with ideation and text generation to help writers overcome writer's block.

However, as with all language models, users should be cautious about deploying GPT-Neo 1.3B in high-stakes applications without thorough testing and curation of the model outputs. The model was trained on a dataset that may contain biases or inaccuracies, so it's important to carefully evaluate the model's behavior and outputs before relying on them for critical tasks.

Things to try

One interesting aspect of GPT-Neo 1.3B is its strong performance on the Winogrande benchmark, which tests the model's ability to reason about complex linguistic phenomena. Developers could explore using the model for tasks that require deeper language understanding, such as commonsense reasoning or natural language inference.

Another area to explore is the model's potential for open-ended text generation. By providing the model with creative prompts, users can see what kinds of imaginative and engaging text it can produce. This could be useful for applications like story writing assistance or chatbots that engage in open-ended dialogue.

Ultimately, the versatility of GPT-Neo 1.3B means that there are many possibilities for experimentation and exploration. By understanding the model's strengths and limitations, developers can find innovative ways to apply it to a wide range of natural language processing tasks.
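The token-by-token generation process can be made concrete with the Hugging Face transformers library. This sketch uses greedy decoding and, to keep the download small, the 125M checkpoint; swapping the model id for "EleutherAI/gpt-neo-1.3B" runs the model discussed here (several GB of weights):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 125M checkpoint keeps the download small for a quick demo;
# use "EleutherAI/gpt-neo-1.3B" for the larger model described here.
model_id = "EleutherAI/gpt-neo-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

input_ids = tokenizer("The Pile is", return_tensors="pt").input_ids
prompt_len = input_ids.shape[1]

with torch.no_grad():
    for _ in range(10):  # greedy decoding: one forward pass per new token
        logits = model(input_ids).logits
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

In practice `model.generate()` wraps this loop with extras like sampling and early stopping; writing it out shows what "repeatedly applying the model" means.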

Read more


🔎

gpt-neo-2.7B

EleutherAI

Total Score: 390

gpt-neo-2.7B is a transformer language model developed by EleutherAI. It is a replication of the GPT-3 architecture with 2.7 billion parameters. The model was trained on the Pile, a large-scale curated dataset created by EleutherAI, using an autoregressive language modeling approach. Similar models include the GPT-NeoX-20B and GPT-J-6B models, also developed by EleutherAI. These models use the same underlying architecture but have different parameter counts and training datasets.

Model inputs and outputs

gpt-neo-2.7B is a language model that can be used for text generation. The model takes a string of text as input and generates the next token in the sequence. This allows the model to continue a given prompt and generate coherent text.

Inputs

  • A string of text to be used as a prompt for the model.

Outputs

  • A continuation of the input text, generated by the model.

Capabilities

gpt-neo-2.7B excels at generating human-like text from a given prompt. It can be used to continue stories, write articles, and generate other forms of natural language. The model has also shown strong performance on downstream tasks like question answering and text summarization.

What can I use it for?

gpt-neo-2.7B can be a useful tool for a variety of natural language processing tasks, such as:

  • Content generation: generating text for blog posts, stories, scripts, and other creative writing projects.
  • Chatbots and virtual assistants: fine-tuning the model to engage in more natural, human-like conversations.
  • Question answering: answering questions based on provided context.
  • Text summarization: generating concise summaries of longer passages of text.

Things to try

One interesting aspect of gpt-neo-2.7B is its flexibility in handling different prompts. Try providing the model with a wide range of inputs, from creative writing prompts to more analytical tasks, and observe how it responds. This can help you understand the model's strengths and limitations, and identify potential use cases that fit your needs.
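Downstream uses like the ones above generally work by framing the task as a text-continuation prompt. A sketch of a few-shot summarization prompt builder (the format and the example pair are illustrative, not from the model card):

```python
def build_summarization_prompt(examples, passage):
    """Frame summarization as text continuation: show worked examples,
    then leave the final 'Summary:' for the model to complete."""
    parts = []
    for text, summary in examples:
        parts.append(f"Text: {text}\nSummary: {summary}\n")
    parts.append(f"Text: {passage}\nSummary:")
    return "\n".join(parts)

demo = [("The cat sat on the warm mat all afternoon.",
         "A cat lounged on a mat.")]
prompt = build_summarization_prompt(
    demo, "The dog chased the ball across the park.")
print(prompt)
```

The resulting string is passed to the model as an ordinary prompt; the generated continuation after the final "Summary:" is the model's summary.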

Read more


💬

gpt-neox-20b

EleutherAI

Total Score: 499

gpt-neox-20b is a 20 billion parameter autoregressive language model developed by EleutherAI. Its architecture is similar to that of GPT-J-6B, with the key difference being a larger model size. Like GPT-J-6B, gpt-neox-20b was trained on a diverse corpus of English-language text using the GPT-NeoX library.

Model inputs and outputs

gpt-neox-20b is a general-purpose language model that can be used for a variety of text-to-text tasks. The model takes in a sequence of text as input and generates a continuation of that text as output.

Inputs

  • Text prompt: A sequence of text that the model will use to generate additional text.

Outputs

  • Generated text: The model's attempt at continuing or completing the input text prompt.

Capabilities

gpt-neox-20b is capable of generating coherent and contextually relevant text across a wide range of domains, from creative writing to question answering. The model's large size and broad training data allow it to capture complex linguistic patterns and generate fluent, human-like text.

What can I use it for?

The gpt-neox-20b model can be used as a foundation for a variety of natural language processing tasks and applications. Researchers may find it useful for probing the capabilities and limitations of large language models, while practitioners may choose to fine-tune the model for specific use cases such as chatbots, content generation, or knowledge extraction.

Things to try

One interesting aspect of gpt-neox-20b is its ability to handle long-range dependencies and generate coherent text over extended sequences. Experimenting with prompts that require the model to maintain context and logical consistency over many tokens can be a good way to explore the model's strengths and weaknesses.
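When working with long prompts, it helps to check that the input still fits in the model's 2048-token context window before generating. This can be done with just the tokenizer (a few MB to download, no 20B weights involved); the 256-token reserve for output is an assumed default, not a documented value:

```python
from transformers import AutoTokenizer

# Only the tokenizer is downloaded here, not the model weights.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
MAX_CONTEXT = 2048  # gpt-neox-20b's context window, in tokens

def fits_in_context(prompt, reserve_for_output=256):
    """Check whether a prompt leaves room for generated tokens."""
    n = len(tokenizer(prompt).input_ids)
    return n + reserve_for_output <= MAX_CONTEXT

print(fits_in_context("A short prompt."))  # -> True
```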

Read more


🖼️

gpt-j-6b

EleutherAI

Total Score: 1.4K

The gpt-j-6b is a large language model trained by EleutherAI, a research group dedicated to developing open-source AI systems. The model has 6 billion trainable parameters and uses the same tokenizer as GPT-2 and GPT-3, with a vocabulary size of 50,257. It utilizes Rotary Position Embedding (RoPE) for positional encoding. Similar models include GPT-2B-001 and ChatGLM2-6B, which are also large transformer models trained for language generation tasks. However, the gpt-j-6b model differs in its specific architecture, training data, and intended use cases.

Model inputs and outputs

Inputs

  • Text prompts, which can be of varying length up to the model's context window of 2048 tokens.

Outputs

  • Human-like text continuation based on the provided prompt. The output can be of arbitrary length, though the model is typically used to generate short- to medium-length responses.

Capabilities

The gpt-j-6b model is adept at generating coherent and contextually relevant text continuations. It can be used for a variety of language generation tasks, such as creative writing, dialogue generation, and content summarization. However, the model has not been fine-tuned for specific downstream applications like chatbots or commercial use cases.

What can I use it for?

The gpt-j-6b model is well-suited for research and experimentation purposes, as it provides a powerful language generation capability that can be further fine-tuned or incorporated into larger AI systems. Potential use cases include:

  • Prototyping conversational AI agents
  • Generating creative writing prompts and story continuations
  • Summarizing long-form text
  • Augmenting existing language models with additional capabilities

However, the model should not be deployed for human-facing applications without appropriate supervision, as it may generate harmful or offensive content.

Things to try

One interesting aspect of the gpt-j-6b model is its ability to generate long-form text continuations. Researchers could experiment with prompting the model to write multi-paragraph essays or short stories, and analyze the coherence and creativity of the generated output. Additionally, the model could be fine-tuned on specific datasets or tasks to explore its potential for specialized language generation applications.
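The Rotary Position Embedding mentioned above rotates pairs of feature dimensions by position-dependent angles, so relative position is encoded directly in the dot products attention computes. A minimal numpy sketch of the idea (simplified, not GPT-J's exact implementation, which applies rotation only to part of each attention head's dimensions):

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Apply rotary position embedding to a single vector x (even length).
    Feature pairs (x1[i], x2[i]) are rotated by position-dependent angles."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)  # one frequency per pair
    theta = position * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

v = np.arange(8, dtype=np.float64)
rotated = rope(v, position=5)
# Rotations preserve length, so the vector norm is unchanged.
print(np.allclose(np.linalg.norm(rotated), np.linalg.norm(v)))  # -> True
```

Because each pair is merely rotated, position 0 leaves the vector untouched and every position preserves its norm, which is what makes RoPE compatible with pretrained attention weights.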

Read more
