[](#gpt-neo-13b)GPT-Neo 1.3B
============================

[](#model-description)Model Description
---------------------------------------

GPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. GPT-Neo refers to the class of models, while 1.3B represents the number of parameters of this particular pre-trained model.

[](#training-data)Training data
-------------------------------

GPT-Neo 1.3B was trained on the Pile, a large scale curated dataset created by EleutherAI for the purpose of training this model.

[](#training-procedure)Training procedure
-----------------------------------------

This model was trained on the Pile for 380 billion tokens over 362,000 steps. It was trained as a masked autoregressive language model, using cross-entropy loss.

[](#intended-use-and-limitations)Intended Use and Limitations
-------------------------------------------------------------

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating texts from a prompt.

### [](#how-to-use)How to use

You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run:

    >>> from transformers import pipeline
    >>> generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')
    >>> generator("EleutherAI has", do_sample=True, min_length=50)
    
    [{'generated_text': 'EleutherAI has made a commitment to create new software packages for each of its major clients and has'}]
    

### [](#limitations-and-biases)Limitations and Biases

GPT-Neo was trained as an autoregressive language model. This means that its core functionality is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work.

GPT-Neo was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending on your usecase GPT-Neo may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile.

As with all language models, it is hard to predict in advance how GPT-Neo will respond to particular prompts and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.

[](#eval-results)Eval results
-----------------------------

### [](#linguistic-reasoning)Linguistic Reasoning

Model and Size

Pile BPB

Pile PPL

Wikitext PPL

Lambada PPL

Lambada Acc

Winogrande

Hellaswag

**GPT-Neo 1.3B**

**0.7527**

**6.159**

**13.10**

**7.498**

**57.23%**

**55.01%**

**38.66%**

GPT-2 1.5B

1.0468

\-----

17.48

10.634

51.21%

59.40%

40.03%

GPT-Neo 2.7B

0.7165

5.646

11.39

5.626

62.22%

56.50%

42.73%

GPT-3 Ada

0.9631

\-----

\-----

9.954

51.60%

52.90%

35.93%

### [](#physical-and-scientific-reasoning)Physical and Scientific Reasoning

Model and Size

MathQA

PubMedQA

Piqa

**GPT-Neo 1.3B**

**24.05%**

**54.40%**

**71.11%**

GPT-2 1.5B

23.64%

58.33%

70.78%

GPT-Neo 2.7B

24.72%

57.54%

72.14%

GPT-3 Ada

24.29%

52.80%

68.88%

### [](#down-stream-applications)Down-Stream Applications

TBD

### [](#bibtex-entry-and-citation-info)BibTeX entry and citation info

To cite this model, please use

    @software{gpt-neo,
      author       = {Black, Sid and
                      Leo, Gao and
                      Wang, Phil and
                      Leahy, Connor and
                      Biderman, Stella},
      title        = {{GPT-Neo: Large Scale Autoregressive Language 
                       Modeling with Mesh-Tensorflow}},
      month        = mar,
      year         = 2021,
      note         = {{If you use this software, please cite it using 
                       these metadata.}},
      publisher    = {Zenodo},
      version      = {1.0},
      doi          = {10.5281/zenodo.5297715},
      url          = {https://doi.org/10.5281/zenodo.5297715}
    }
    
    @article{gao2020pile,
      title={The Pile: An 800GB Dataset of Diverse Text for Language Modeling},
      author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and others},
      journal={arXiv preprint arXiv:2101.00027},
      year={2020}
    }
    

[](#open-llm-leaderboard-evaluation-results)Open LLM Leaderboard Evaluation Results
===================================================================================

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_EleutherAI__gpt-neo-1.3B)

Metric

Value

Avg.

29.44

ARC (25-shot)

31.23

HellaSwag (10-shot)

48.47

MMLU (5-shot)

24.82

TruthfulQA (0-shot)

39.63

Winogrande (5-shot)

56.91

GSM8K (5-shot)

0.45

DROP (3-shot)

4.6

## Model overview

`GPT-Neo 1.3B` is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. The model was trained on the Pile, a large-scale curated dataset created by EleutherAI. `GPT-Neo` refers to the class of models, while `1.3B` represents the number of parameters of this particular pre-trained model.

Compared to similar models like `GPT-Neo 2.7B` and `GPT-J 6B`, `GPT-Neo 1.3B` has a smaller parameter size but still demonstrates strong performance on a variety of language tasks. The model was trained using a similar approach to GPT-3, learning an inner representation of the English language that can then be used to extract features useful for downstream applications.

## Model inputs and outputs

`GPT-Neo 1.3B` is a language model that takes in a string of text as input and generates the next token in the sequence. The model can be used for a variety of text-to-text tasks, such as text generation, summarization, and question answering.

### Inputs
- A string of text, which the model will use to predict the next token

### Outputs
- A predicted token that continues the input text sequence
- The model can be used to generate full text passages by repeatedly applying the model to generate the next token

## Capabilities

`GPT-Neo 1.3B` demonstrates strong performance on a variety of language understanding and generation tasks. On the LAMBADA task, which measures language modeling ability, the model achieves a perplexity of 7.498. It also performs well on other benchmarks like Winogrande (55.01% accuracy) and Hellaswag (38.66% accuracy).

While the model was not specifically fine-tuned for downstream tasks, its general language understanding capabilities make it useful for applications like text summarization, question answering, and creative writing assistance. The model can generate fluent and contextually relevant text, though users should be mindful of potential biases or inaccuracies in the generated output.

## What can I use it for?

`GPT-Neo 1.3B` can be a valuable tool for a variety of natural language processing applications. Researchers and developers may find it useful for pre-training on language tasks or as a starting point for fine-tuning on specific domains or applications.

For example, the model could be fine-tuned for summarization tasks, where it generates concise summaries of longer text passages. It could also be used for question answering, where the model is prompted with a question and generates a relevant answer. In the creative writing domain, the model can assist with ideation and text generation to help writers overcome writer's block.

However, as with all language models, users should be cautious about deploying `GPT-Neo 1.3B` in high-stakes applications without thorough testing and curation of the model outputs. The model was trained on a dataset that may contain biases or inaccuracies, so it's important to carefully evaluate the model's behavior and outputs before relying on them for critical tasks.

## Things to try

One interesting aspect of `GPT-Neo 1.3B` is its strong performance on the Winogrande benchmark, which tests the model's ability to reason about complex linguistic phenomena. Developers could explore using the model for tasks that require deeper language understanding, such as commonsense reasoning or natural language inference.

Another area to explore is the model's potential for open-ended text generation. By providing the model with creative prompts, users can see what kinds of imaginative and engaging text it can produce. This could be useful for applications like story writing assistance or chatbots that engage in open-ended dialogue.

Ultimately, the versatility of `GPT-Neo 1.3B` means that there are many possibilities for experimentation and exploration. By understanding the model's strengths and limitations, developers can find innovative ways to apply it to a wide range of natural language processing tasks.