Model-attribution-challenge


Average Model Cost: $0.0000

Number of Runs: 14,769

Models by this creator

bloom-350m


model-attribution-challenge

No description available.


$-/run

14.1K

Huggingface

gpt2-xl


GPT-2 XL

Model Details

Model Description: GPT-2 XL is the 1.5B parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is pretrained on English-language text using a causal language modeling (CLM) objective.

Developed by: OpenAI; see the associated research paper and GitHub repo for the model developers.
Model Type: Transformer-based language model
Language(s): English
License: Modified MIT License
Related Models: GPT-2, GPT-2 Medium and GPT-2 Large
Resources for more information: Research Paper, OpenAI Blog Post, GitHub Repo, OpenAI Model Card for GPT-2, OpenAI GPT-2 1.5B Release Blog Post
Test the full generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

How to Get Started with the Model

You can use this model directly with a pipeline for text generation. Because generation relies on some randomness, set a seed for reproducibility. The model can also be used to get the features of a given text in PyTorch or in TensorFlow.

Uses

See OpenAI's model card about GPT-2 for its statements on how the model is intended to be used.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. This bias will also affect all fine-tuned versions of this model. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

The OpenAI blog post that accompanied the release of the 1.5B parameter model further discusses the risks, limitations, and biases of the model.

Training

The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weighs 40GB of text but has not been publicly released. OpenAI has published a list of the top 1,000 domains present in WebText.

The model is pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts. It was trained to guess the next word in sentences. More precisely, inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. Internally, the model uses a masking mechanism to make sure the predictions for token i only use the inputs from 1 to i and not the future tokens. This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks.

The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.

Evaluation

The following evaluation information is extracted from the associated paper, which reports results obtained without any fine-tuning (zero-shot). See the paper for the figures.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware type and hours used are based on information provided by one of the model authors on Reddit.

Hardware Type: 32 TPUv3 chips
Hours used: 168
Cloud Provider: Unknown
Compute Region: Unknown
Carbon Emitted: Unknown

Technical Specifications

See the associated paper for details on the modeling architecture, objective, and training details.

Model Card Authors

This model card was written by the Hugging Face team.
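The training objective described above (targets are the inputs shifted by one token, with a mask that hides future positions) can be illustrated with a small sketch. This is not OpenAI's training code; the function names and the token ids are hypothetical and chosen only for illustration.

```python
def make_inputs_and_targets(token_ids):
    """Split a token sequence into (inputs, targets) for next-token prediction.

    targets[i] is the token that follows inputs[i], so the targets are the
    same sequence shifted by one position.
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form one prediction")
    return token_ids[:-1], token_ids[1:]


def causal_mask(length):
    """Return a length x length matrix of booleans.

    mask[i][j] is True when position i may use position j as input,
    which is only the case for j <= i (never a future position).
    """
    return [[j <= i for j in range(length)] for i in range(length)]


tokens = [11, 22, 33, 44, 55]  # hypothetical token ids, not real GPT-2 ids
inputs, targets = make_inputs_and_targets(tokens)
mask = causal_mask(len(inputs))

print("inputs: ", inputs)
print("targets:", targets)
for row in mask:
    # "x" marks a visible position, "." a hidden (future) one
    print("".join("x" if allowed else "." for allowed in row))
```

Each row of the printed mask has one more visible position than the row before it, which is what restricts the prediction at position i to the tokens up to and including i.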


$-/run

291

Huggingface

DialoGPT-large


A State-of-the-Art Large-scale Pretrained Response Generation Model (DialoGPT)

DialoGPT is a state-of-the-art large-scale pretrained dialogue response generation model for multi-turn conversations. Human evaluation results indicate that responses generated by DialoGPT are comparable in quality to human responses under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogues from Reddit discussion threads.

Information about preprocessing, training, and full details of DialoGPT can be found in the original DialoGPT repository.

ArXiv paper: https://arxiv.org/abs/1911.00536

How to use

The model can be tried out interactively as a chat partner.
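A minimal sketch of multi-turn use, assuming the Hugging Face `transformers` library with PyTorch and the upstream checkpoint name `microsoft/DialoGPT-large` (the copy listed on this page may live under a different identifier). It also assumes that every turn is terminated by the tokenizer's end-of-text token. The helper names `build_history` and `reply` are hypothetical.

```python
EOS = "<|endoftext|>"  # GPT-2 end-of-text token, assumed to terminate each turn


def build_history(turns, eos=EOS):
    """Join the dialogue turns into one string, ending every turn with eos."""
    return "".join(turn + eos for turn in turns)


def reply(turns, checkpoint="microsoft/DialoGPT-large", max_length=1000):
    """Generate the next turn for a dialogue given as a list of strings.

    Downloads the checkpoint on first use. The import is kept inside the
    function so that build_history can be used without transformers.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    history = build_history(turns, eos=tokenizer.eos_token)
    input_ids = tokenizer.encode(history, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_length=max_length,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, not the echoed history.
    new_tokens = output_ids[0, input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


history = build_history(["Does money buy happiness?", "Depends how much you have."])
print(history)
```

Calling `reply(["Does money buy happiness?"])` would return the model's answer as a string; appending that answer to the list and calling `reply` again continues the conversation.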


$-/run

56

Huggingface

codegen-350M-multi


CodeGen (CodeGen-Multi 350M)

Model description

CodeGen is a family of autoregressive language models for program synthesis from the paper "A Conversational Paradigm for Program Synthesis" by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. The models were originally released in the authors' repository, under 3 pre-training data variants (NL, Multi, Mono) and 4 model size variants (350M, 2B, 6B, 16B).

The checkpoint included in this repository is denoted as CodeGen-Multi 350M in the paper, where "Multi" means the model is initialized with CodeGen-NL 350M and further pre-trained on a dataset of multiple programming languages, and "350M" refers to the number of trainable parameters.

Training data

This checkpoint (CodeGen-Multi 350M) was first initialized with CodeGen-NL 350M, and then pre-trained on BigQuery, a large-scale dataset of multiple programming languages from GitHub repositories. The data consists of 119.2B tokens and includes C, C++, Go, Java, JavaScript, and Python.

Training procedure

CodeGen was trained using cross-entropy loss to maximize the likelihood of sequential inputs. The family of models was trained using multiple TPU-v4-512 by Google, leveraging data and model parallelism. See Section 2.3 of the paper for more details.

Evaluation results

The authors evaluate the models on two code generation benchmarks: HumanEval and MTPB. Please refer to the paper for more details.

Intended Use and Limitations

As an autoregressive language model, CodeGen is capable of extracting features from given natural language and programming language texts, and calculating their likelihood. However, the model is intended for and best at program synthesis, that is, generating executable code given English prompts, where the prompts should be in the form of a comment string. The model can complete partially-generated code as well.

How to use

This model can be loaded using the AutoModelForCausalLM functionality.
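A minimal sketch of loading the model and prompting it with a comment string, assuming the Hugging Face `transformers` library with PyTorch and the upstream checkpoint name `Salesforce/codegen-350M-multi` (the copy listed on this page may use a different identifier). The helpers `as_comment_prompt` and `complete` are hypothetical names introduced for illustration.

```python
def as_comment_prompt(task, signature=""):
    """Format an English task description as a comment-string prompt.

    Every line of the description becomes a '#' comment; an optional
    function signature can follow so the model completes its body.
    """
    lines = ["# " + line for line in task.strip().splitlines()]
    if signature:
        lines.append(signature)
    return "\n".join(lines) + "\n"


def complete(prompt, checkpoint="Salesforce/codegen-350M-multi", max_length=128):
    """Return the prompt followed by the model's continuation.

    Downloads the checkpoint on first use. The import is kept inside the
    function so the prompt helper works without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generated_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)


prompt = as_comment_prompt("Return the sum of two integers.", "def add(a, b):")
print(prompt)
```

Passing `prompt` to `complete` would ask the model to fill in the body of `add`; the same function also covers completion of partially written code, since any code prefix can be used as the prompt.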


$-/run

50

Huggingface

distilgpt2


DistilGPT2

DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the smallest version of Generative Pre-trained Transformer 2 (GPT-2). Like GPT-2, DistilGPT2 can be used to generate text. Users of this model card should also consider information about the design, training, and limitations of GPT-2.

Model Details

Developed by: Hugging Face
Model type: Transformer-based Language Model
Language: English
License: Apache 2.0
Model Description: DistilGPT2 is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using knowledge distillation and was designed to be a faster, lighter version of GPT-2.
Resources for more information: See the Distil* repository for more about Distil* (a class of compressed models including Distilled-GPT2), Sanh et al. (2019) for more information about knowledge distillation and the training procedure, and the GPT-2 page for more about GPT-2.

Uses, Limitations and Risks

Since DistilGPT2 is a distilled version of GPT-2, it is intended to be used for similar use cases, with the added benefit of being smaller and easier to run than the base model.

The developers of GPT-2 state in their model card that they envisioned GPT-2 would be used by researchers to better understand large-scale generative language models, with possible secondary use cases listed in that model card. See the GPT-2 model card for OpenAI's further statements on use.

Using DistilGPT2, the Hugging Face team built the Write With Transformers web app, which allows users to play with the model to generate text directly from their browser.

Training Data

DistilGPT2 was trained using OpenWebTextCorpus, an open-source reproduction of OpenAI's WebText dataset, which was used to train GPT-2. See the OpenWebTextCorpus Dataset Card for additional information about OpenWebTextCorpus and Radford et al. (2019) for additional information about WebText.

Training Procedure

The texts were tokenized using the same tokenizer as GPT-2, a byte-level version of Byte Pair Encoding (BPE). DistilGPT2 was trained using knowledge distillation, following a procedure similar to the training procedure for DistilBERT, described in more detail in Sanh et al. (2019).

Evaluation Results

The creators of DistilGPT2 report that, on the WikiText-103 benchmark, GPT-2 reaches a perplexity of 16.3 on the test set, compared to 21.1 for DistilGPT2 (after fine-tuning on the train set).

Environmental Impact

Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were used to estimate the carbon impact.

Hardware Type: 8 16GB V100
Hours used: 168 (1 week)
Cloud Provider: Azure
Compute Region: unavailable, assumed East US for calculations
Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid): 149.2 kg eq. CO2

Glossary

Knowledge Distillation: As described in Sanh et al. (2019) (https://arxiv.org/pdf/1910.01108.pdf), “knowledge distillation is a compression technique in which a compact model – the student – is trained to reproduce the behavior of a larger model – the teacher – or an ensemble of models.” Also see Bucila et al. (2006) (https://www.cs.cornell.edu/~caruana/compression.kdd06.pdf) and Hinton et al. (2015) (https://arxiv.org/abs/1503.02531).
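To make the glossary definition concrete, here is a generic sketch of a soft-target distillation loss in the style of Hinton et al. (2015): the student is penalized by the KL divergence between the teacher's and its own temperature-softened output distributions. This is an illustration of the general technique only; it is not the DistilGPT2 training code, and the temperature value and logits are assumptions made up for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is scaled by temperature**2, the usual convention for
    keeping gradient magnitudes comparable across temperatures.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]            # hypothetical teacher logits
matching = distillation_loss(teacher, teacher)
mismatch = distillation_loss([0.5, 1.0, 4.0], teacher)
print(matching, mismatch)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions diverge, which is the sense in which the student "is trained to reproduce the behavior" of the teacher.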


$-/run

50

Huggingface
