[](#gpt-2)GPT-2
===============

Test the whole generation capabilities here: [https://transformer.huggingface.co/doc/gpt2-large](https://transformer.huggingface.co/doc/gpt2-large)

Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and first released at [this page](https://openai.com/blog/better-language-models/).

Disclaimer: The team releasing GPT-2 also wrote a [model card](https://github.com/openai/gpt-2/blob/master/model_card.md) for their model. Content from this model card has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.

[](#model-description)Model description
---------------------------------------

GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences.

More precisely, inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. The model uses internally a mask-mechanism to make sure the predictions for the token `i` only uses the inputs from `1` to `i` but not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating texts from a prompt.

This is the **smallest** version of GPT-2, with 124M parameters.

**Related Models:** [GPT-Large](https://huggingface.co/gpt2-large), [GPT-Medium](https://huggingface.co/gpt2-medium) and [GPT-XL](https://huggingface.co/gpt2-xl)

[](#intended-uses--limitations)Intended uses & limitations
----------------------------------------------------------

You can use the raw model for text generation or fine-tune it to a downstream task. See the [model hub](https://huggingface.co/models?filter=gpt2) to look for fine-tuned versions on a task that interests you.

### [](#how-to-use)How to use

You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='gpt2')
    >>> set_seed(42)
    >>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
    
    [{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
     {'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
     {'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"},
     {'generated_text': "Hello, I'm a language model, a system model. I want to know my language so that it might be more interesting, more user-friendly"},
     {'generated_text': 'Hello, I\'m a language model, not a language model"\n\nThe concept of "no-tricks" comes in handy later with new'}]
    

Here is how to use this model to get the features of a given text in PyTorch:

    from transformers import GPT2Tokenizer, GPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2Model.from_pretrained('gpt2')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    

and in TensorFlow:

    from transformers import GPT2Tokenizer, TFGPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = TFGPT2Model.from_pretrained('gpt2')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='tf')
    output = model(encoded_input)
    

### [](#limitations-and-bias)Limitations and bias

The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. As the openAI team themselves point out in their [model card](https://github.com/openai/gpt-2/blob/master/model_card.md#out-of-scope-use-cases):

> Because large-scale language models like GPT-2 do not distinguish fact from fiction, we dont support use-cases that require the generated text to be true.
> 
> Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans > unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.

Here's an example of how the model can have biased predictions:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='gpt2')
    >>> set_seed(42)
    >>> generator("The White man worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The White man worked as a mannequin for'},
     {'generated_text': 'The White man worked as a maniser of the'},
     {'generated_text': 'The White man worked as a bus conductor by day'},
     {'generated_text': 'The White man worked as a plumber at the'},
     {'generated_text': 'The White man worked as a journalist. He had'}]
    
    >>> set_seed(42)
    >>> generator("The Black man worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The Black man worked as a man at a restaurant'},
     {'generated_text': 'The Black man worked as a car salesman in a'},
     {'generated_text': 'The Black man worked as a police sergeant at the'},
     {'generated_text': 'The Black man worked as a man-eating monster'},
     {'generated_text': 'The Black man worked as a slave, and was'}]
    

This bias will also affect all fine-tuned versions of this model.

[](#training-data)Training data
-------------------------------

The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weights 40GB of texts but has not been publicly released. You can find a list of the top 1,000 domains present in WebText [here](https://github.com/openai/gpt-2/blob/master/domains.txt).

[](#training-procedure)Training procedure
-----------------------------------------

### [](#preprocessing)Preprocessing

The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.

The larger model was trained on 256 cloud TPU v3 cores. The training duration was not disclosed, nor were the exact details of training.

[](#evaluation-results)Evaluation results
-----------------------------------------

The model achieves the following results without any fine-tuning (zero-shot):

Dataset

LAMBADA

LAMBADA

CBT-CN

CBT-NE

WikiText2

PTB

enwiki8

text8

WikiText103

1BW

(metric)

(PPL)

(ACC)

(ACC)

(ACC)

(PPL)

(PPL)

(BPB)

(BPC)

(PPL)

(PPL)

35.13

45.99

87.65

83.4

29.41

65.85

1.16

1,17

37.50

75.20

### [](#bibtex-entry-and-citation-info)BibTeX entry and citation info

    @article{radford2019language,
      title={Language Models are Unsupervised Multitask Learners},
      author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
      year={2019}
    }
    

[![](https://cdn-media.huggingface.co/exbert/button.png)](https://huggingface.co/exbert/?model=gpt2)

## Model overview

`gpt2` is a transformer-based language model created and released by OpenAI. It is the smallest version of the GPT-2 model, with 124 million parameters. Like other GPT-2 models, `gpt2` is a causal language model pretrained on a large corpus of English text using a self-supervised objective to predict the next token in a sequence. This allows the model to learn a general understanding of the English language that can be leveraged for a variety of downstream tasks.

The `gpt2` model is related to larger GPT-2 variations such as [GPT2-Large](https://huggingface.co/gpt2-large), [GPT2-Medium](https://huggingface.co/gpt2-medium), and [GPT2-XL](https://huggingface.co/gpt2-xl), which have 355 million, 774 million, and 1.5 billion parameters respectively. These larger models were also developed and released by the OpenAI community.

## Model inputs and outputs

### Inputs
- **Text sequence**: The model takes a sequence of text as input, which it uses to generate additional text.

### Outputs
- **Generated text**: The model outputs a continuation of the input text sequence, generating new text one token at a time in an autoregressive fashion.

## Capabilities

The `gpt2` model is capable of generating fluent, coherent text in English on a wide variety of topics. It can be used for tasks like creative writing, text summarization, and language modeling. However, as the OpenAI team notes, the model does not distinguish fact from fiction, so it should not be used for applications that require the generated text to be truthful.

## What can I use it for?

The `gpt2` model can be used for a variety of text generation tasks. Researchers may use it to better understand the behaviors, capabilities, and biases of large-scale language models. The model could also be fine-tuned for applications like grammar assistance, auto-completion, creative writing, and chatbots. However, users should be aware of the model's limitations and potential for biased or harmful output, as discussed in the [OpenAI model card](https://github.com/openai/gpt-2/blob/master/model_card.md).

## Things to try

One interesting aspect of the `gpt2` model is its ability to generate diverse and creative text from a given prompt. You can experiment with providing the model with different types of starting prompts, such as the beginning of a story, a description of a scene, or even a single word, and see what kind of coherent and imaginative text it generates in response. Additionally, you can try fine-tuning the model on a specific domain or task to see how its performance and output changes compared to the base model.

[](#gpt-2-xl)GPT-2 XL
=====================

[](#table-of-contents)Table of Contents
---------------------------------------

*   [Model Details](#model-details)
*   [How To Get Started With the Model](#how-to-get-started-with-the-model)
*   [Uses](#uses)
*   [Risks, Limitations and Biases](#risks-limitations-and-biases)
*   [Training](#training)
*   [Evaluation](#evaluation)
*   [Environmental Impact](#environmental-impact)
*   [Technical Specifications](#technical-specifications)
*   [Citation Information](#citation-information)
*   [Model Card Authors](#model-card-authors)

[](#model-details)Model Details
-------------------------------

**Model Description:** GPT-2 XL is the **1.5B parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.

*   **Developed by:** OpenAI, see [associated research paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [GitHub repo](https://github.com/openai/gpt-2) for model developers.
*   **Model Type:** Transformer-based language model
*   **Language(s):** English
*   **License:** [Modified MIT License](https://github.com/openai/gpt-2/blob/master/LICENSE)
*   **Related Models:** [GPT-2](https://huggingface.co/gpt2), [GPT-Medium](https://huggingface.co/gpt2-medium) and [GPT-Large](https://huggingface.co/gpt2-large)
*   **Resources for more information:**
    *   [Research Paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    *   [OpenAI Blog Post](https://openai.com/blog/better-language-models/)
    *   [GitHub Repo](https://github.com/openai/gpt-2)
    *   [OpenAI Model Card for GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md)
    *   [OpenAI GPT-2 1.5B Release Blog Post](https://openai.com/blog/gpt-2-1-5b-release/)
    *   Test the full generation capabilities here: [https://transformer.huggingface.co/doc/gpt2-large](https://transformer.huggingface.co/doc/gpt2-large)

[](#how-to-get-started-with-the-model)How to Get Started with the Model
-----------------------------------------------------------------------

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

    from transformers import pipeline, set_seed
    generator = pipeline('text-generation', model='gpt2-xl')
    set_seed(42)
    generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
    

Here is how to use this model to get the features of a given text in PyTorch:

    from transformers import GPT2Tokenizer, GPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')
    model = GPT2Model.from_pretrained('gpt2-xl')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    

and in TensorFlow:

    from transformers import GPT2Tokenizer, TFGPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')
    model = TFGPT2Model.from_pretrained('gpt2-xl')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='tf')
    output = model(encoded_input)
    

[](#uses)Uses
-------------

#### [](#direct-use)Direct Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> The primary intended users of these models are AI researchers and practitioners.
> 
> We primarily imagine these language models will be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale generative language models.

#### [](#downstream-use)Downstream Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> Here are some secondary use cases we believe are likely:
> 
> *   Writing assistance: Grammar assistance, autocompletion (for normal prose or code)
> *   Creative writing and art: exploring the generation of creative, fictional texts; aiding creation of poetry and other literary art.
> *   Entertainment: Creation of games, chat bots, and amusing generations.

#### [](#misuse-and-out-of-scope-use)Misuse and Out-of-scope Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> Because large-scale language models like GPT-2 do not distinguish fact from fiction, we dont support use-cases that require the generated text to be true.
> 
> Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.

[](#risks-limitations-and-biases)Risks, Limitations and Biases
--------------------------------------------------------------

**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propogate historical and current stereotypes.**

#### [](#biases)Biases

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).

The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

    from transformers import pipeline, set_seed
    generator = pipeline('text-generation', model='gpt2-xl')
    set_seed(42)
    generator("The man worked as a", max_length=10, num_return_sequences=5)
    
    set_seed(42)
    generator("The woman worked as a", max_length=10, num_return_sequences=5)
    

This bias will also affect all fine-tuned versions of this model. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

#### [](#risks-and-limitations)Risks and Limitations

When they released the 1.5B parameter model, OpenAI wrote in a [blog post](https://openai.com/blog/gpt-2-1-5b-release/):

> GPT-2 can be fine-tuned for misuse. Our partners at the Middlebury Institute of International Studies Center on Terrorism, Extremism, and Counterterrorism (CTEC) found that extremist groups can use GPT-2 for misuse, specifically by fine-tuning GPT-2 models on four ideological positions: white supremacy, Marxism, jihadist Islamism, and anarchism. CTEC demonstrated that its possible to create models that can generate synthetic propaganda for these ideologies. They also show that, despite having low detection accuracy on synthetic outputs, ML-based detection methods can give experts reasonable suspicion that an actor is generating synthetic text.

The blog post further discusses the risks, limitations, and biases of the model.

[](#training)Training
---------------------

#### [](#training-data)Training Data

The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weights 40GB of texts but has not been publicly released. You can find a list of the top 1,000 domains present in WebText [here](https://github.com/openai/gpt-2/blob/master/domains.txt).

#### [](#training-procedure)Training Procedure

The model is pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences.

More precisely, inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. The model uses internally a mask-mechanism to make sure the predictions for the token `i` only uses the inputs from `1` to `i` but not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks.

The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.

[](#evaluation)Evaluation
-------------------------

The following evaluation information is extracted from the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

#### [](#testing-data-factors-and-metrics)Testing Data, Factors and Metrics

The model authors write in the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) that:

> Since our model operates on a byte level and does not require lossy pre-processing or tokenization, we can evaluate it on any language model benchmark. Results on language modeling datasets are commonly reported in a quantity which is a scaled or ex- ponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word. We evaluate the same quantity by computing the log-probability of a dataset according to a WebText LM and dividing by the number of canonical units. For many of these datasets, WebText LMs would be tested significantly out- of-distribution, having to predict aggressively standardized text, tokenization artifacts such as disconnected punctuation and contractions, shuffled sentences, and even the string which is extremely rare in WebText - occurring only 26 times in 40 billion bytes. We report our main results...using invertible de-tokenizers which remove as many of these tokenization / pre-processing artifacts as possible. Since these de-tokenizers are invertible, we can still calculate the log probability of a dataset and they can be thought of as a simple form of domain adaptation.

#### [](#results)Results

The model achieves the following results without any fine-tuning (zero-shot):

Dataset

LAMBADA

LAMBADA

CBT-CN

CBT-NE

WikiText2

PTB

enwiki8

text8

WikiText103

1BW

(metric)

(PPL)

(ACC)

(ACC)

(ACC)

(PPL)

(PPL)

(BPB)

(BPC)

(PPL)

(PPL)

8.63

63.24

93.30

89.05

18.34

35.76

0.93

0.98

17.48

42.16

[](#environmental-impact)Environmental Impact
---------------------------------------------

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware type and hours used are based on information provided by one of the model authors on [Reddit](https://bit.ly/2Tw1x4L).

*   **Hardware Type:** 32 TPUv3 chips
*   **Hours used:** 168
*   **Cloud Provider:** Unknown
*   **Compute Region:** Unknown
*   **Carbon Emitted:** Unknown

[](#technical-specifications)Technical Specifications
-----------------------------------------------------

See the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) for details on the modeling architecture, objective, and training details.

[](#citation-information)Citation Information
---------------------------------------------

    @article{radford2019language,
      title={Language models are unsupervised multitask learners},
      author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others},
      journal={OpenAI blog},
      volume={1},
      number={8},
      pages={9},
      year={2019}
    }
    

[](#model-card-authors)Model Card Authors
-----------------------------------------

This model card was written by the Hugging Face team.

## Model overview

The `gpt2-xl` model is a large, 1.5 billion parameter transformer-based language model developed and released by OpenAI. It is a scaled-up version of the original GPT-2 model, with improvements to the model architecture and increased training data. Compared to similar models like [DistilGPT2](https://aimodels.fyi/models/huggingFace/distilgpt2-distilbert), `gpt2-xl` has significantly more parameters, allowing it to capture more complex patterns in language. However, the larger size also means it requires more computational resources to run. The model was trained on a large corpus of English text data, giving it broad knowledge and capabilities in generating natural language.

## Model inputs and outputs

The `gpt2-xl` model takes text as input and generates additional text as output. The input can be a single sentence, a paragraph, or even multiple paragraphs, and the model will attempt to continue the text in a coherent and natural way. The output is also text, with the length determined by the user. The model can be used for a variety of language generation tasks, such as story writing, summarization, and query answering.

### Inputs
- **Text**: The input text that the model will use to generate additional text.

### Outputs
- **Generated Text**: The text generated by the model, continuing the input text in a coherent and natural way.

## Capabilities

The `gpt2-xl` model excels at language generation tasks, where it can produce human-like text that is fluent and coherent. It has been used for a variety of applications, such as creative writing, text summarization, and question answering. The model's large size and broad training data allow it to adapt to a wide range of topics and styles, making it a versatile tool for natural language processing.

## What can I use it for?

The `gpt2-xl` model can be used for a variety of natural language processing tasks, such as:

- **Creative writing**: The model can be used to generate original stories, poems, or other creative content by providing it with a prompt or starting point.
- **Summarization**: By inputting a longer text, the model can generate a concise summary of the key points.
- **Question answering**: The model can be used to answer questions by generating relevant and informative responses.
- **Dialogue generation**: The model can be used to create chatbots or virtual assistants that can engage in natural conversations.

Additionally, the model can be fine-tuned on specific datasets or tasks to improve its performance in those areas. For example, fine-tuning the model on a domain-specific corpus could make it better suited for generating technical or scientific content.

## Things to try

One interesting aspect of the `gpt2-xl` model is its ability to generate text that maintains coherence and consistency over long sequences. This makes it well-suited for generating extended narratives or dialogues, where the model needs to keep track of context and character development.

Another interesting experiment would be to explore the model's ability to handle different writing styles or genres. By providing the model with prompts or examples in various styles, such as formal academic writing, creative fiction, or casual conversational language, you could see how the generated output adapts and reflects those stylistic qualities.

Additionally, you could investigate the model's performance on multilingual tasks. While the `gpt2-xl` model was primarily trained on English data, the related [XLM-RoBERTa](https://aimodels.fyi/models/huggingFace/xlm-roberta-base-facebookai) model has been trained on a multilingual corpus and may be better suited for tasks involving multiple languages.

[](#gpt-2-large)GPT-2 Large
===========================

[](#table-of-contents)Table of Contents
---------------------------------------

*   [Model Details](#model-details)
*   [How To Get Started With the Model](#how-to-get-started-with-the-model)
*   [Uses](#uses)
*   [Risks, Limitations and Biases](#risks-limitations-and-biases)
*   [Training](#training)
*   [Evaluation](#evaluation)
*   [Environmental Impact](#environmental-impact)
*   [Technical Specifications](#technical-specifications)
*   [Citation Information](#citation-information)
*   [Model Card Authors](#model-card-author)

[](#model-details)Model Details
-------------------------------

**Model Description:** GPT-2 Large is the **774M parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.

*   **Developed by:** OpenAI, see [associated research paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [GitHub repo](https://github.com/openai/gpt-2) for model developers.
*   **Model Type:** Transformer-based language model
*   **Language(s):** English
*   **License:** [Modified MIT License](https://github.com/openai/gpt-2/blob/master/LICENSE)
*   **Related Models:** [GPT-2](https://huggingface.co/gpt2), [GPT-Medium](https://huggingface.co/gpt2-medium) and [GPT-XL](https://huggingface.co/gpt2-xl)
*   **Resources for more information:**
    *   [Research Paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    *   [OpenAI Blog Post](https://openai.com/blog/better-language-models/)
    *   [GitHub Repo](https://github.com/openai/gpt-2)
    *   [OpenAI Model Card for GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md)
    *   Test the full generation capabilities here: [https://transformer.huggingface.co/doc/gpt2-large](https://transformer.huggingface.co/doc/gpt2-large)

[](#how-to-get-started-with-the-model)How to Get Started with the Model
-----------------------------------------------------------------------

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='gpt2-large')
    >>> set_seed(42)
    >>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
    
    [{'generated_text': "Hello, I'm a language model, I can do language modeling. In fact, this is one of the reasons I use languages. To get a"},
     {'generated_text': "Hello, I'm a language model, which in its turn implements a model of how a human can reason about a language, and is in turn an"},
     {'generated_text': "Hello, I'm a language model, why does this matter for you?\n\nWhen I hear new languages, I tend to start thinking in terms"},
     {'generated_text': "Hello, I'm a language model, a functional language...\n\nI don't need to know anything else. If I want to understand about how"},
     {'generated_text': "Hello, I'm a language model, not a toolbox.\n\nIn a nutshell, a language model is a set of attributes that define how"}]
    

Here is how to use this model to get the features of a given text in PyTorch:

    from transformers import GPT2Tokenizer, GPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2-large')
    model = GPT2Model.from_pretrained('gpt2-large')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    

and in TensorFlow:

    from transformers import GPT2Tokenizer, TFGPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2-large')
    model = TFGPT2Model.from_pretrained('gpt2-large')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='tf')
    output = model(encoded_input)
    

[](#uses)Uses
-------------

#### [](#direct-use)Direct Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> The primary intended users of these models are AI researchers and practitioners.
> 
> We primarily imagine these language models will be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale generative language models.

#### [](#downstream-use)Downstream Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> Here are some secondary use cases we believe are likely:
> 
> *   Writing assistance: Grammar assistance, autocompletion (for normal prose or code)
> *   Creative writing and art: exploring the generation of creative, fictional texts; aiding creation of poetry and other literary art.
> *   Entertainment: Creation of games, chat bots, and amusing generations.

#### [](#misuse-and-out-of-scope-use)Misuse and Out-of-scope Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> Because large-scale language models like GPT-2 do not distinguish fact from fiction, we dont support use-cases that require the generated text to be true.
> 
> Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.

[](#risks-limitations-and-biases)Risks, Limitations and Biases
--------------------------------------------------------------

**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propogate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).

The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='gpt2-large')
    >>> set_seed(42)
    >>> generator("The man worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The man worked as a security guard in a hotel'},
     {'generated_text': 'The man worked as a salesman in Mexico and in'},
     {'generated_text': 'The man worked as a supervisor at the warehouse for'},
     {'generated_text': "The man worked as a cleaner for the store's"},
     {'generated_text': 'The man worked as a barbershop apprentice.'}]
    
    >>> set_seed(42)
    >>> generator("The woman worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The woman worked as a clerk at the bank.'},
     {'generated_text': 'The woman worked as a caregiver, and her'},
     {'generated_text': 'The woman worked as a customer service agent for a'},
     {'generated_text': 'The woman worked as a cleaner at the store,'},
     {'generated_text': 'The woman worked as a barista and was "'}]
    

This bias will also affect all fine-tuned versions of this model. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

[](#training)Training
---------------------

#### [](#training-data)Training Data

The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weights 40GB of texts but has not been publicly released. You can find a list of the top 1,000 domains present in WebText [here](https://github.com/openai/gpt-2/blob/master/domains.txt).

#### [](#training-procedure)Training Procedure

The model is pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences.

More precisely, inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. The model uses internally a mask-mechanism to make sure the predictions for the token `i` only uses the inputs from `1` to `i` but not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks.

The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.

[](#evaluation)Evaluation
-------------------------

The following evaluation information is extracted from the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

#### [](#testing-data-factors-and-metrics)Testing Data, Factors and Metrics

The model authors write in the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) that:

> Since our model operates on a byte level and does not require lossy pre-processing or tokenization, we can evaluate it on any language model benchmark. Results on language modeling datasets are commonly reported in a quantity which is a scaled or ex- ponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word. We evaluate the same quantity by computing the log-probability of a dataset according to a WebText LM and dividing by the number of canonical units. For many of these datasets, WebText LMs would be tested significantly out- of-distribution, having to predict aggressively standardized text, tokenization artifacts such as disconnected punctuation and contractions, shuffled sentences, and even the string which is extremely rare in WebText - occurring only 26 times in 40 billion bytes. We report our main results...using invertible de-tokenizers which remove as many of these tokenization / pre-processing artifacts as possible. Since these de-tokenizers are invertible, we can still calculate the log probability of a dataset and they can be thought of as a simple form of domain adaptation.

#### [](#results)Results

The model achieves the following results without any fine-tuning (zero-shot):

Dataset

LAMBADA

LAMBADA

CBT-CN

CBT-NE

WikiText2

PTB

enwiki8

text8

WikiText103

1BW

(metric)

(PPL)

(ACC)

(ACC)

(ACC)

(PPL)

(PPL)

(BPB)

(BPC)

(PPL)

(PPL)

10.87

60.12

93.45

88.0

19.93

40.31

0.97

1.02

22.05

44.575

[](#environmental-impact)Environmental Impact
---------------------------------------------

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Unknown
*   **Hours used:** Unknown
*   **Cloud Provider:** Unknown
*   **Compute Region:** Unknown
*   **Carbon Emitted:** Unknown

[](#technical-specifications)Technical Specifications
-----------------------------------------------------

See the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) for details on the modeling architecture, objective, compute infrastructure, and training details.

[](#citation-information)Citation Information
---------------------------------------------

    @article{radford2019language,
      title={Language models are unsupervised multitask learners},
      author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others},
      journal={OpenAI blog},
      volume={1},
      number={8},
      pages={9},
      year={2019}
    }
    

[](#model-card-authors)Model Card Authors
-----------------------------------------

This model card was written by the Hugging Face team.

## Model overview

The `gpt2-large` model is a 774M parameter transformer-based language model created and released by OpenAI. It is a pretrained model on English language using a causal language modeling (CLM) objective. The `gpt2-large` model is the largest version of the GPT-2 family of models, which also includes the [GPT-Medium](https://aimodels.fyi/models/huggingFace/gpt2-medium-openai-community) and [GPT-XL](https://aimodels.fyi/models/huggingFace/gpt2-xl-openai-community) versions.

## Model inputs and outputs

### Inputs
- The model accepts text prompts as input, which it uses to generate additional text.

### Outputs
- The model outputs generated text, which can be used for a variety of language generation tasks.

## Capabilities

The `gpt2-large` model is capable of generating coherent and contextually relevant text based on the provided prompt. It can be used for tasks like article generation, story writing, and creative text composition. The model's large size allows it to capture complex patterns in language and generate more sophisticated output compared to smaller language models.

## What can I use it for?

The `gpt2-large` model can be used for a wide range of text generation tasks, such as:

- Authoring articles, stories, or scripts
- Generating product descriptions or marketing copy
- Aiding in creative writing and ideation
- Building chatbots and conversational agents
- Providing autocompletion and language assistance tools

While the model is powerful, users should be aware of its potential biases and limitations, as discussed in the [OpenAI Model Card for GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md).

## Things to try

One interesting aspect of the `gpt2-large` model is its ability to generate diverse and imaginative text based on a simple prompt. Try providing the model with a short phrase or sentence and see how it expands and elaborates on the idea. You can also experiment with different prompting techniques, such as using specific keywords or persona descriptions, to guide the model's output in different directions.

[](#openai-gpt-1)OpenAI GPT 1
=============================

[](#table-of-contents)Table of Contents
---------------------------------------

*   [Model Details](#model-details)
*   [How To Get Started With the Model](#how-to-get-started-with-the-model)
*   [Uses](#uses)
*   [Risks, Limitations and Biases](#risks-limitations-and-biases)
*   [Training](#training)
*   [Evaluation](#evaluation)
*   [Environmental Impact](#environmental-impact)
*   [Technical Specifications](#technical-specifications)
*   [Citation Information](#citation-information)
*   [Model Card Authors](#model-card-authors)

[](#model-details)Model Details
-------------------------------

**Model Description:** `openai-gpt` (a.k.a. "GPT-1") is the first transformer-based language model created and released by OpenAI. The model is a causal (unidirectional) transformer pre-trained using language modeling on a large corpus with long range dependencies.

*   **Developed by:** Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. See [associated research paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) and [GitHub repo](https://github.com/openai/finetune-transformer-lm) for model developers and contributors.
*   **Model Type:** Transformer-based language model
*   **Language(s):** English
*   **License:** [MIT License](https://github.com/openai/finetune-transformer-lm/blob/master/LICENSE)
*   **Related Models:** [GPT2](https://huggingface.co/gpt2), [GPT2-Medium](https://huggingface.co/gpt2-medium), [GPT2-Large](https://huggingface.co/gpt2-large) and [GPT2-XL](https://huggingface.co/gpt2-xl)
*   **Resources for more information:**
    *   [Research Paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)
    *   [OpenAI Blog Post](https://openai.com/blog/language-unsupervised/)
    *   [GitHub Repo](https://github.com/openai/finetune-transformer-lm)
    *   Test the full generation capabilities here: [https://transformer.huggingface.co/doc/gpt](https://transformer.huggingface.co/doc/gpt)

[](#how-to-get-started-with-the-model)How to Get Started with the Model
-----------------------------------------------------------------------

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='openai-gpt')
    >>> set_seed(42)
    >>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
    
    [{'generated_text': "Hello, I'm a language model,'he said, when i was finished.'ah well,'said the man,'that's"},
     {'generated_text': 'Hello, I\'m a language model, " she said. \n she reached the bottom of the shaft and leaned a little further out. it was'},
     {'generated_text': 'Hello, I\'m a language model, " she laughed. " we call that a\'white girl.\'or as we are called by the'},
     {'generated_text': 'Hello, I\'m a language model, " said mr pin. " an\'the ones with the funny hats don\'t. " the rest of'},
     {'generated_text': 'Hello, I\'m a language model, was\'ere \'bout to do some more dancin \', " he said, then his voice lowered to'}]
    

Here is how to use this model in PyTorch:

    from transformers import OpenAIGPTTokenizer, OpenAIGPTModel
    import torch
    
    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTModel.from_pretrained("openai-gpt")
    
    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    outputs = model(**inputs)
    
    last_hidden_states = outputs.last_hidden_state
    

and in TensorFlow:

    from transformers import OpenAIGPTTokenizer, TFOpenAIGPTModel
    
    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = TFOpenAIGPTModel.from_pretrained("openai-gpt")
    
    inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
    outputs = model(inputs)
    
    last_hidden_states = outputs.last_hidden_state
    

[](#uses)Uses
-------------

#### [](#direct-use)Direct Use

This model can be used for language modeling tasks.

#### [](#downstream-use)Downstream Use

Potential downstream uses of this model include tasks that leverage language models. In the [associated paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), the model developers discuss evaluations of the model for tasks including natural language inference (NLI), question answering, semantic similarity, and text classification.

#### [](#misuse-and-out-of-scope-use)Misuse and Out-of-scope Use

The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

[](#risks-limitations-and-biases)Risks, Limitations and Biases
--------------------------------------------------------------

#### [](#biases)Biases

**CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by this model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='openai-gpt')
    >>> set_seed(42)
    >>> generator("The man worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The man worked as a teacher for the college he'},
     {'generated_text': 'The man worked as a janitor at the club.'},
     {'generated_text': 'The man worked as a bodyguard in america. the'},
     {'generated_text': 'The man worked as a clerk for one of the'},
     {'generated_text': 'The man worked as a nurse, but there was'}]
    
    >>> set_seed(42)
    >>> generator("The woman worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The woman worked as a medical intern but is a'},
     {'generated_text': 'The woman worked as a midwife, i know that'},
     {'generated_text': 'The woman worked as a prostitute in a sex club'},
     {'generated_text': 'The woman worked as a secretary for one of the'},
     {'generated_text': 'The woman worked as a nurse, but she had'}]
    

This bias may also affect fine-tuned versions of this model. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

#### [](#risks-and-limitations)Risks and Limitations

The model developers also wrote in a [blog post](https://openai.com/blog/language-unsupervised/) about risks and limitations of the model, including:

> *   **Compute Requirements:** Many previous approaches to NLP tasks train relatively small models on a single GPU from scratch. Our approach requires an expensive pre-training step - 1 month on 8 GPUs. Luckily, this only has to be done once and were releasing our model so others can avoid it. It is also a large model (in comparison to prior work) and consequently uses more compute and memory  we used a 37-layer (12 block) Transformer architecture, and we train on sequences of up to 512 tokens. Most experiments were conducted on 4 and 8 GPU systems. The model does fine-tune to new tasks very quickly which helps mitigate the additional resource requirements.
> *   **The limits and bias of learning about the world through text:** Books and text readily available on the internet do not contain complete or even accurate information about the world. Recent work ([Lucy and Gauthier, 2017](https://arxiv.org/abs/1705.11168)) has shown that certain kinds of information are difficult to learn via just text and other work ([Gururangan et al., 2018](https://arxiv.org/abs/1803.02324)) has shown that models learn and exploit biases in data distributions.
> *   **Still brittle generalization:** Although our approach improves performance across a broad range of tasks, current deep learning NLP models still exhibit surprising and counterintuitive behavior - especially when evaluated in a systematic, adversarial, or out-of-distribution way. Our approach is not immune to these issues, though we have observed some indications of progress. Our approach shows improved lexical robustness over previous purely neural approaches to textual entailment. On the dataset introduced in Glockner et al. (2018) our model achieves 83.75%, performing similarly to KIM, which incorporates external knowledge via WordNet.

[](#training)Training
---------------------

#### [](#training-data)Training Data

The model developers [write](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf):

> We use the BooksCorpus dataset ([Zhu et al., 2015](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Zhu_Aligning_Books_and_ICCV_2015_paper.pdf)) for training the language model. It contains over 7,000 unique unpublished books from a variety of genres including Adventure, Fantasy, and Romance. Crucially, it contains long stretches of contiguous text, which allows the generative model to learn to condition on long-range information.

#### [](#training-procedure)Training Procedure

The model developers [write](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf):

> Our model largely follows the original transformer work \[62\]. We trained a 12-layer decoder-only transformer with masked self-attention heads (768 dimensional states and 12 attention heads). For the position-wise feed-forward networks, we used 3072 dimensional inner states. We used the Adam optimization scheme \[27\] with a max learning rate of 2.5e-4. The learning rate was increased linearly from zero over the first 2000 updates and annealed to 0 using a cosine schedule. We train for 100 epochs on minibatches of 64 randomly sampled, contiguous sequences of 512 tokens. Since layernorm \[2\] is used extensively throughout the model, a simple weight initialization of N (0, 0.02) was sufficient. We used a bytepair encoding (BPE) vocabulary with 40,000 merges \[53\] and residual, embedding, and attention dropouts with a rate of 0.1 for regularization. We also employed a modified version of L2 regularization proposed in \[37\], with w = 0.01 on all non bias or gain weights. For the activation function, we used the Gaussian Error Linear Unit (GELU) \[18\]. We used learned position embeddings instead of the sinusoidal version proposed in the original work. We use the ftfy library2 to clean the raw text in BooksCorpus, standardize some punctuation and whitespace, and use the spaCy tokenizer.

See the paper for further details and links to citations.

[](#evaluation)Evaluation
-------------------------

The following evaluation information is extracted from the [associated blog post](https://openai.com/blog/language-unsupervised/). See the [associated paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) for further details.

#### [](#testing-data-factors-and-metrics)Testing Data, Factors and Metrics

The model developers report that the model was evaluated on the following tasks and datasets using the listed metrics:

*   **Task:** Textual Entailment
    
    *   **Datasets:** [SNLI](https://huggingface.co/datasets/snli), [MNLI Matched](https://huggingface.co/datasets/glue), [MNLI Mismatched](https://huggingface.co/datasets/glue), [SciTail](https://huggingface.co/datasets/scitail), [QNLI](https://huggingface.co/datasets/glue), [RTE](https://huggingface.co/datasets/glue)
    *   **Metrics:** Accuracy
*   **Task:** Semantic Similarity
    
    *   **Datasets:** [STS-B](https://huggingface.co/datasets/glue), [QQP](https://huggingface.co/datasets/glue), [MRPC](https://huggingface.co/datasets/glue)
    *   **Metrics:** Accuracy
*   **Task:** Reading Comprehension
    
    *   **Datasets:** [RACE](https://huggingface.co/datasets/race)
    *   **Metrics:** Accuracy
*   **Task:** Commonsense Reasoning
    
    *   **Datasets:** [ROCStories](https://huggingface.co/datasets/story_cloze), [COPA](https://huggingface.co/datasets/xcopa)
    *   **Metrics:** Accuracy
*   **Task:** Sentiment Analysis
    
    *   **Datasets:** [SST-2](https://huggingface.co/datasets/glue)
    *   **Metrics:** Accuracy
*   **Task:** Linguistic Acceptability
    
    *   **Datasets:** [CoLA](https://huggingface.co/datasets/glue)
    *   **Metrics:** Accuracy
*   **Task:** Multi Task Benchmark
    
    *   **Datasets:** [GLUE](https://huggingface.co/datasets/glue)
    *   **Metrics:** Accuracy

#### [](#results)Results

The model achieves the following results without any fine-tuning (zero-shot):

Task

TE

TE

TE

TE

TE

TE

SS

SS

SS

RC

CR

CR

SA

LA

MTB

Dataset

SNLI

MNLI Matched

MNLI Mismatched

SciTail

QNLI

RTE

STS-B

QQP

MPRC

RACE

ROCStories

COPA

SST-2

CoLA

GLUE

89.9

82.1

81.4

88.3

88.1

56.0

82.0

70.3

82.3

59.0

86.5

78.6

91.3

45.4

72.8

[](#environmental-impact)Environmental Impact
---------------------------------------------

The model developers [report that](https://openai.com/blog/language-unsupervised/):

> The total compute used to train this model was 0.96 petaflop days (pfs-days).

> 8 P600 GPU's \* 30 days \* 12 TFLOPS/GPU \* 0.33 utilization = .96 pfs-days

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** 8 P600 GPUs
*   **Hours used:** 720 hours (30 days)
*   **Cloud Provider:** Unknown
*   **Compute Region:** Unknown
*   **Carbon Emitted:** Unknown

[](#technical-specifications)Technical Specifications
-----------------------------------------------------

See the [associated paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) for details on the modeling architecture, objective, compute infrastructure, and training details.

[](#citation-information)Citation Information
---------------------------------------------

    @article{radford2018improving,
      title={Improving language understanding by generative pre-training},
      author={Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya and others},
      year={2018},
      publisher={OpenAI}
    }
    

APA: _Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training._

[](#model-card-authors)Model Card Authors
-----------------------------------------

This model card was written by the Hugging Face team.

## Model Overview

`openai-gpt` is the first transformer-based language model created and released by OpenAI. The model is a causal (unidirectional) transformer pre-trained using language modeling on a large corpus with long range dependencies. It was developed by Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever, as described in the [associated research paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf). The model is related to other GPT models like [GPT2](https://huggingface.co/gpt2), [GPT2-Medium](https://huggingface.co/gpt2-medium), [GPT2-Large](https://huggingface.co/gpt2-large), and [GPT2-XL](https://huggingface.co/gpt2-xl).

## Model Inputs and Outputs

The `openai-gpt` model is a text-to-text model, taking text as input and generating text as output. It can be used for a variety of language generation tasks, such as open-ended text generation, summarization, and question answering.

### Inputs
- Text prompts or passages to be used as input for the model

### Outputs
- Generated text in response to the input, such as completions, summaries, or answers to questions

## Capabilities

The `openai-gpt` model can be used to generate human-like text on a wide range of topics. It has been shown to perform well on tasks like language modeling, question answering, and text summarization. However, as with many large language models, it can also exhibit biases and generate content that is factually incorrect or harmful.

## What Can I Use It For?

The `openai-gpt` model is well-suited for applications that involve generating text, such as content creation, dialogue systems, and creative writing. Researchers and developers may find it useful for exploring the capabilities and limitations of transformer-based language models. However, it's important to be aware of the potential risks and to use the model responsibly.

## Things to Try

One interesting thing to try with `openai-gpt` is to experiment with different prompting techniques, such as using specific templates or incorporating instructions to the model. This can help you understand how the model responds to different input formats and how to get the most useful outputs for your specific use case. Additionally, you can try fine-tuning the model on domain-specific data to see how it performs on more specialized tasks.

[](#gpt-2-medium)GPT-2 Medium
=============================

[](#model-details)Model Details
-------------------------------

**Model Description:** GPT-2 Medium is the **355M parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.

*   **Developed by:** OpenAI, see [associated research paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [GitHub repo](https://github.com/openai/gpt-2) for model developers.
*   **Model Type:** Transformer-based language model
*   **Language(s):** English
*   **License:** [Modified MIT License](https://github.com/openai/gpt-2/blob/master/LICENSE)
*   **Related Models:** [GPT2](https://huggingface.co/gpt2), [GPT2-Large](https://huggingface.co/gpt2-large) and [GPT2-XL](https://huggingface.co/gpt2-xl)
*   **Resources for more information:**
    *   [Research Paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
    *   [OpenAI Blog Post](https://openai.com/blog/better-language-models/)
    *   [GitHub Repo](https://github.com/openai/gpt-2)
    *   [OpenAI Model Card for GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md)
    *   Test the full generation capabilities here: [https://transformer.huggingface.co/doc/gpt2-large](https://transformer.huggingface.co/doc/gpt2-large)

[](#how-to-get-started-with-the-model)How to Get Started with the Model
-----------------------------------------------------------------------

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='gpt2-medium')
    >>> set_seed(42)
    >>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
    
    [{'generated_text': "Hello, I'm a language model, I'm a language. I'm a compiler, I'm a parser, I'm a server process. I"},
     {'generated_text': "Hello, I'm a language model, and I'd like to join an existing team. What can I do to get started?\n\nI'd"},
     {'generated_text': "Hello, I'm a language model, why does my code get created? Can't I just copy it? But why did my code get created when"},
     {'generated_text': "Hello, I'm a language model, a functional language...\n\nI'm a functional language. Is it hard? A little, yes. But"},
     {'generated_text': "Hello, I'm a language model, not an object model.\n\nIn a nutshell, I need to give me objects from which I can get"}]
    

Here is how to use this model to get the features of a given text in PyTorch:

    from transformers import GPT2Tokenizer, GPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
    model = GPT2Model.from_pretrained('gpt2-medium')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    

and in TensorFlow:

    from transformers import GPT2Tokenizer, TFGPT2Model
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
    model = TFGPT2Model.from_pretrained('gpt2-medium')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='tf')
    output = model(encoded_input)
    

[](#uses)Uses
-------------

#### [](#direct-use)Direct Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> The primary intended users of these models are AI researchers and practitioners.
> 
> We primarily imagine these language models will be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale generative language models.

#### [](#downstream-use)Downstream Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> Here are some secondary use cases we believe are likely:
> 
> *   Writing assistance: Grammar assistance, autocompletion (for normal prose or code)
> *   Creative writing and art: exploring the generation of creative, fictional texts; aiding creation of poetry and other literary art.
> *   Entertainment: Creation of games, chat bots, and amusing generations.

#### [](#misuse-and-out-of-scope-use)Misuse and Out-of-scope Use

In their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote:

> Because large-scale language models like GPT-2 do not distinguish fact from fiction, we dont support use-cases that require the generated text to be true.
> 
> Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.

[](#risks-limitations-and-biases)Risks, Limitations and Biases
--------------------------------------------------------------

**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propogate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).

The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

    >>> from transformers import pipeline, set_seed
    >>> generator = pipeline('text-generation', model='gpt2-medium')
    >>> set_seed(42)
    >>> generator("The man worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The man worked as a security guard in a military'},
     {'generated_text': 'The man worked as a salesman in Mexico and eventually'},
     {'generated_text': 'The man worked as a supervisor at the department for'},
     {'generated_text': 'The man worked as a cleaner for the same corporation'},
     {'generated_text': 'The man worked as a barman and was involved'}]
    
    >>> set_seed(42)
    >>> generator("The woman worked as a", max_length=10, num_return_sequences=5)
    
    [{'generated_text': 'The woman worked as a social worker in a children'},
     {'generated_text': 'The woman worked as a marketing manager, and her'},
     {'generated_text': 'The woman worked as a customer service agent in a'},
     {'generated_text': 'The woman worked as a cleaner for the same corporation'},
     {'generated_text': 'The woman worked as a barista and was involved'}]
    

This bias will also affect all fine-tuned versions of this model. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

[](#training)Training
---------------------

#### [](#training-data)Training Data

The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weights 40GB of texts but has not been publicly released. You can find a list of the top 1,000 domains present in WebText [here](https://github.com/openai/gpt-2/blob/master/domains.txt).

#### [](#training-procedure)Training Procedure

The model is pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences.

More precisely, inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. The model uses internally a mask-mechanism to make sure the predictions for the token `i` only uses the inputs from `1` to `i` but not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks.

The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.

[](#evaluation)Evaluation
-------------------------

The following evaluation information is extracted from the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

#### [](#testing-data-factors-and-metrics)Testing Data, Factors and Metrics

The model authors write in the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) that:

> Since our model operates on a byte level and does not require lossy pre-processing or tokenization, we can evaluate it on any language model benchmark. Results on language modeling datasets are commonly reported in a quantity which is a scaled or ex- ponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word. We evaluate the same quantity by computing the log-probability of a dataset according to a WebText LM and dividing by the number of canonical units. For many of these datasets, WebText LMs would be tested significantly out- of-distribution, having to predict aggressively standardized text, tokenization artifacts such as disconnected punctuation and contractions, shuffled sentences, and even the string which is extremely rare in WebText - occurring only 26 times in 40 billion bytes. We report our main results...using invertible de-tokenizers which remove as many of these tokenization / pre-processing artifacts as possible. Since these de-tokenizers are invertible, we can still calculate the log probability of a dataset and they can be thought of as a simple form of domain adaptation.

#### [](#results)Results

The model achieves the following results without any fine-tuning (zero-shot):

Dataset

LAMBADA

LAMBADA

CBT-CN

CBT-NE

WikiText2

PTB

enwiki8

text8

WikiText103

1BW

(metric)

(PPL)

(ACC)

(ACC)

(ACC)

(PPL)

(PPL)

(BPB)

(BPC)

(PPL)

(PPL)

15.60

55.48

92.35

87.1

22.76

47.33

1.01

1.06

26.37

55.72

[](#environmental-impact)Environmental Impact
---------------------------------------------

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Unknown
*   **Hours used:** Unknown
*   **Cloud Provider:** Unknown
*   **Compute Region:** Unknown
*   **Carbon Emitted:** Unknown

[](#technical-specifications)Technical Specifications
-----------------------------------------------------

See the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) for details on the modeling architecture, objective, compute infrastructure, and training details.

[](#citation-information)Citation Information
---------------------------------------------

    @article{radford2019language,
      title={Language models are unsupervised multitask learners},
      author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others},
      journal={OpenAI blog},
      volume={1},
      number={8},
      pages={9},
      year={2019}
    }
    

[](#model-card-authors)Model Card Authors
-----------------------------------------

This model card was written by the Hugging Face team.

## Model overview

The `gpt2-medium` model is a 355M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective. It was developed by the OpenAI team, as detailed in the [associated research paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [GitHub repo](https://github.com/openai/gpt-2). The model is a medium-sized version of the GPT-2 family, with the [GPT2](https://huggingface.co/gpt2), [GPT2-Large](https://huggingface.co/gpt2-large) and [GPT2-XL](https://huggingface.co/gpt2-xl) models being larger in size.

## Model inputs and outputs

### Inputs
- Text prompts of up to 1024 tokens

### Outputs
- Continued text generation based on the provided prompt

## Capabilities

The `gpt2-medium` model can be used to generate human-like text continuations based on the given prompt. It exhibits strong language understanding and generation capabilities, allowing it to be used for a variety of natural language tasks such as writing assistance, creative writing, and chatbot applications.

## What can I use it for?

The `gpt2-medium` model can be used for a variety of text generation tasks, such as:

- **Writing Assistance**: The model can be used to provide autocompletion and grammar assistance for normal prose or code.
- **Creative Writing**: The model can be used to explore the generation of creative, fictional texts and aid in the creation of poetry and other literary works.
- **Entertainment**: The model can be used to create games, chatbots, and generate amusing text.

However, users should be aware of the model's limitations and biases, as detailed in the [OpenAI model card](https://github.com/openai/gpt-2/blob/master/model_card.md). The model does not distinguish fact from fiction and reflects the biases present in its training data, so it should be used with caution, especially in applications that interact with humans.

## Things to try

One interesting aspect of the `gpt2-medium` model is its ability to capture long-range dependencies in text, allowing it to generate coherent and contextually-relevant continuations. Try providing the model with a prompt that sets up an interesting scenario or narrative, and see how it develops the story in creative and unexpected ways. You can also experiment with adjusting the generation parameters, such as temperature and top-k/top-p sampling, to explore different styles of text generation.

[](#roberta-base-openai-detector)RoBERTa Base OpenAI Detector
=============================================================

[](#table-of-contents)Table of Contents
---------------------------------------

*   [Model Details](#model-details)
*   [Uses](#uses)
*   [Risks, Limitations and Biases](#risks-limitations-and-biases)
*   [Training](#training)
*   [Evaluation](#evaluation)
*   [Environmental Impact](#environmental-impact)
*   [Technical Specifications](#technical-specifications)
*   [Citation Information](#citation-information)
*   [Model Card Authors](#model-card-author)
*   [How To Get Started With the Model](#how-to-get-started-with-the-model)

[](#model-details)Model Details
-------------------------------

**Model Description:** RoBERTa base OpenAI Detector is the GPT-2 output detector model, obtained by fine-tuning a RoBERTa base model with the outputs of the 1.5B-parameter GPT-2 model. The model can be used to predict if text was generated by a GPT-2 model. This model was released by OpenAI at the same time as OpenAI released the weights of the [largest GPT-2 model](https://huggingface.co/gpt2-xl), the 1.5B parameter version.

*   **Developed by:** OpenAI, see [GitHub Repo](https://github.com/openai/gpt-2-output-dataset/tree/master/detector) and [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for full author list
*   **Model Type:** Fine-tuned transformer-based language model
*   **Language(s):** English
*   **License:** MIT
*   **Related Models:** [RoBERTa base](https://huggingface.co/roberta-base), [GPT-XL (1.5B parameter version)](https://huggingface.co/gpt2-xl), [GPT-Large (the 774M parameter version)](https://huggingface.co/gpt2-large), [GPT-Medium (the 355M parameter version)](https://huggingface.co/gpt2-medium) and [GPT-2 (the 124M parameter version)](https://huggingface.co/gpt2)
*   **Resources for more information:**
    *   [Research Paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) (see, in particular, the section beginning on page 12 about Automated ML-based detection).
    *   [GitHub Repo](https://github.com/openai/gpt-2-output-dataset/tree/master/detector)
    *   [OpenAI Blog Post](https://openai.com/blog/gpt-2-1-5b-release/)
    *   [Explore the detector model here](https://huggingface.co/openai-detector)

[](#uses)Uses
-------------

#### [](#direct-use)Direct Use

The model is a classifier that can be used to detect text generated by GPT-2 models. However, it is strongly suggested not to use it as a ChatGPT detector for the purposes of making grave allegations of academic misconduct against undergraduates and others, as this model might give inaccurate results in the case of ChatGPT-generated input.

#### [](#downstream-use)Downstream Use

The model's developers have stated that they developed and released the model to help with research related to synthetic text generation, so the model could potentially be used for downstream tasks related to synthetic text generation. See the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for further discussion.

#### [](#misuse-and-out-of-scope-use)Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model developers discuss the risk of adversaries using the model to better evade detection in their [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf), suggesting that using the model for evading detection or for supporting efforts to evade detection would be a misuse of the model.

[](#risks-limitations-and-biases)Risks, Limitations and Biases
--------------------------------------------------------------

**CONTENT WARNING: Readers should be aware this section may contain content that is disturbing, offensive, and can propagate historical and current stereotypes.**

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

#### [](#risks-and-limitations)Risks and Limitations

In their [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf), the model developers discuss the risk that the model may be used by bad actors to develop capabilities for evading detection, though one purpose of releasing the model is to help improve detection research.

In a related [blog post](https://openai.com/blog/gpt-2-1-5b-release/), the model developers also discuss the limitations of automated methods for detecting synthetic text and the need to pair automated detection tools with other, non-automated approaches. They write:

> We conducted in-house detection research and developed a detection model that has detection rates of ~95% for detecting 1.5B GPT-2-generated text. We believe this is not high enough accuracy for standalone detection and needs to be paired with metadata-based approaches, human judgment, and public education to be more effective.

The model developers also [report](https://openai.com/blog/gpt-2-1-5b-release/) finding that classifying content from larger models is more difficult, suggesting that detection with automated tools like this model will be increasingly difficult as model sizes increase. The authors find that training detector models on the outputs of larger models can improve accuracy and robustness.

#### [](#bias)Bias

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by RoBERTa base and GPT-2 1.5B (which this model is built/fine-tuned on) can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups (see the [RoBERTa base](https://huggingface.co/roberta-base) and [GPT-2 XL](https://huggingface.co/gpt2-xl) model cards for more information). The developers of this model discuss these issues further in their [paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf).

[](#training)Training
---------------------

#### [](#training-data)Training Data

The model is a sequence classifier based on RoBERTa base (see the [RoBERTa base model card](https://huggingface.co/roberta-base) for more details on the RoBERTa base training data) and then fine-tuned using the outputs of the 1.5B GPT-2 model (available [here](https://github.com/openai/gpt-2-output-dataset)).

#### [](#training-procedure)Training Procedure

The model developers write that:

> We based a sequence classifier on RoBERTaBASE (125 million parameters) and fine-tuned it to classify the outputs from the 1.5B GPT-2 model versus WebText, the dataset we used to train the GPT-2 model.

They later state:

> To develop a robust detector model that can accurately classify generated texts regardless of the sampling method, we performed an analysis of the models transfer performance.

See the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for further details on the training procedure.

[](#evaluation)Evaluation
-------------------------

The following evaluation information is extracted from the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf).

#### [](#testing-data-factors-and-metrics)Testing Data, Factors and Metrics

The model is intended to be used for detecting text generated by GPT-2 models, so the model developers test the model on text datasets, measuring accuracy by:

> testing 510-token test examples comprised of 5,000 samples from the WebText dataset and 5,000 samples generated by a GPT-2 model, which were not used during the training.

#### [](#results)Results

The model developers [find](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf):

> Our classifier is able to detect 1.5 billion parameter GPT-2-generated text with approximately 95% accuracy...The models accuracy depends on sampling methods used when generating outputs, like temperature, Top-K, and nucleus sampling ([Holtzman et al., 2019](https://arxiv.org/abs/1904.09751). Nucleus sampling outputs proved most difficult to correctly classify, but a detector trained using nucleus sampling transfers well across other sampling methods. As seen in Figure 1 \[in the paper\], we found consistently high accuracy when trained on nucleus sampling.

See the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf), Figure 1 (on page 14) and Figure 2 (on page 16) for full results.

[](#environmental-impact)Environmental Impact
---------------------------------------------

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Unknown
*   **Hours used:** Unknown
*   **Cloud Provider:** Unknown
*   **Compute Region:** Unknown
*   **Carbon Emitted:** Unknown

[](#technical-specifications)Technical Specifications
-----------------------------------------------------

The model developers write that:

See the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for further details on the modeling architecture and training details.

[](#citation-information)Citation Information
---------------------------------------------

    @article{solaiman2019release,
      title={Release strategies and the social impacts of language models},
      author={Solaiman, Irene and Brundage, Miles and Clark, Jack and Askell, Amanda and Herbert-Voss, Ariel and Wu, Jeff and Radford, Alec and Krueger, Gretchen and Kim, Jong Wook and Kreps, Sarah and others},
      journal={arXiv preprint arXiv:1908.09203},
      year={2019}
    }
    

APA:

*   Solaiman, I., Brundage, M., Clark, J., Askell, A., Herbert-Voss, A., Wu, J., ... & Wang, J. (2019). Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203.

[https://huggingface.co/papers/1908.09203](https://huggingface.co/papers/1908.09203)

[](#model-card-authors)Model Card Authors
-----------------------------------------

This model card was written by the team at Hugging Face.

[](#how-to-get-started-with-the-model)How to Get Started with the Model
-----------------------------------------------------------------------

This model can be instantiated and run with a Transformers pipeline:

    from transformers import pipeline
    pipe = pipeline("text-classification", model="roberta-base-openai-detector")
    print(pipe("Hello world! Is this content AI-generated?"))  # [{'label': 'Real', 'score': 0.8036582469940186}]

## Model overview

The `roberta-base-openai-detector` model is a fine-tuned RoBERTa base model that can be used to predict if a given text was generated by a GPT-2 model. It was developed by OpenAI and released at the same time as the 1.5B parameter version of the GPT-2 model. The model is related to other GPT-2 and RoBERTa models, such as the [GPT-XL (1.5B parameter version)](https://huggingface.co/gpt2-xl), [GPT-Large (the 774M parameter version)](https://huggingface.co/gpt2-large), [GPT-Medium (the 355M parameter version)](https://huggingface.co/gpt2-medium), [GPT-2 (the 124M parameter version)](https://huggingface.co/gpt2), and the [RoBERTa base](https://huggingface.co/roberta-base) model.

## Model inputs and outputs

### Inputs
- **Text**: The model takes in a piece of text as input.

### Outputs
- **Prediction**: The model outputs a prediction of whether the input text was generated by a GPT-2 model or not.

## Capabilities

The `roberta-base-openai-detector` model can be used to detect if a given text was generated by a GPT-2 model. This can be useful for identifying potential cases of AI-generated text, such as for content moderation or plagiarism detection. However, the model's developers strongly caution against using it as a "ChatGPT detector" to make allegations of academic misconduct, as it may not be accurate in that context.

## What can I use it for?

The primary intended use of the `roberta-base-openai-detector` model is for researchers and practitioners to better understand the behaviors, capabilities, biases, and constraints of large-scale generative language models like GPT-2. The model's developers have stated that they released it to help with research into the detection of AI-generated text.

## Things to try

One interesting thing to try with the `roberta-base-openai-detector` model is to compare its predictions on text samples that were and were not generated by a GPT-2 model. This could help you develop a better understanding of the model's capabilities and limitations in detecting AI-generated text. You could also experiment with fine-tuning the model on your own datasets to see if it can be adapted for specific use cases.