Huggingface

Rank:

Average Model Cost: $0.0000

Number of Runs: 155,502,136

Models by this creator

bert-base-uncased

BERT-base-uncased is a pretrained language model trained on a large corpus of English text with a masked language modeling (MLM) objective, through which it learns an inner representation of the English language. The pretrained model can be fine-tuned for tasks such as sequence classification, token classification, and question answering. It was trained on the BookCorpus dataset and English Wikipedia with a vocabulary size of 30,000, on 4 cloud TPUs with a batch size of 256 using the Adam optimizer. When fine-tuned on downstream tasks, BERT-base-uncased achieves good performance on tasks like sentiment analysis and text classification. Note, however, that the model can make biased predictions, and this bias carries over to all fine-tuned versions of the model.
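
As a quick illustration of the masked language modeling objective, the model can fill in a masked token via the transformers pipeline API (a minimal sketch assuming transformers and PyTorch are installed; the example sentence is arbitrary):

```python
from transformers import pipeline

# Load bert-base-uncased behind the fill-mask pipeline.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model returns the most probable tokens for the [MASK] position.
for prediction in unmasker("Hello, I'm a [MASK] model."):
    print(prediction["token_str"], round(prediction["score"], 3))
```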

$-/run

49.6M

Huggingface

xlm-roberta-large

XLM-RoBERTa is a multilingual version of the RoBERTa model, pre-trained on a large corpus of text covering 100 languages. It is trained using a masked language modeling (MLM) objective, where 15% of the words in a sentence are randomly masked and the model has to predict the masked words. This allows the model to learn a bidirectional representation of the sentence. The model can be used to extract features for downstream tasks such as classification or question answering, and it can also be fine-tuned for specific tasks.
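
To use it as a feature extractor, the last hidden states can be pulled out directly (a minimal sketch assuming PyTorch and the transformers library; the input sentence is arbitrary):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModel.from_pretrained("xlm-roberta-large")

# Any of the 100 training languages goes through the same tokenizer.
inputs = tokenizer("Bonjour tout le monde", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 1024-dimensional vector per input token, usable as features downstream.
print(outputs.last_hidden_state.shape)  # (1, seq_len, 1024)
```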

$-/run

23.7M

Huggingface

gpt2

GPT-2 is a transformer model pretrained on a large corpus of English data in a self-supervised fashion: it was trained to predict the next word in a sentence. A causal attention mask ensures that each prediction relies only on past tokens, never on future ones. GPT-2 can be used for text generation as-is or fine-tuned for downstream tasks. It was trained on a dataset called WebText, which consists of web pages linked from Reddit posts; because this content is largely unfiltered internet text, the model may reproduce its biases in its predictions. The texts are tokenized with a byte-level version of Byte Pair Encoding (BPE) using a vocabulary size of 50,257. The model achieves impressive results even without fine-tuning, although the exact training duration and details were not disclosed.
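
For example, next-word prediction can be exercised through the text-generation pipeline, and the BPE vocabulary size can be checked from the tokenizer (a minimal sketch; the prompt and sampling settings are arbitrary):

```python
from transformers import pipeline, AutoTokenizer, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampled continuations reproducible

# GPT-2 continues the prompt one token at a time, attending only to past tokens.
for out in generator("Hello, I'm a language model,", max_length=30, num_return_sequences=2):
    print(out["generated_text"])

# The byte-level BPE vocabulary has 50,257 entries.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.vocab_size)
```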

$-/run

17.8M

Huggingface

xlm-roberta-base

XLM-RoBERTa is a multilingual version of the RoBERTa model that is pre-trained on a large amount of CommonCrawl data covering 100 languages. It uses a masked language modeling (MLM) objective: words in a sentence are randomly masked and the model learns to predict them, giving it a bidirectional representation of the sentence. These representations can be used for downstream tasks such as sequence classification and token classification. The model is primarily intended to be fine-tuned on specific tasks, and it can also be used directly with a pipeline for masked language modeling.
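
As a sketch of the masked language modeling pipeline mentioned above (the French example sentence is an arbitrary choice; note that this model expects the <mask> token rather than BERT's [MASK]):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="xlm-roberta-base")

# The same checkpoint handles any of its 100 training languages.
for prediction in unmasker("Paris est la <mask> de la France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```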

$-/run

16.3M

Huggingface

roberta-base

RoBERTa is a transformer model pretrained on a large corpus of English data in a self-supervised manner. It uses a masked language modeling (MLM) objective: 15% of the words in an input sentence are randomly masked and the model predicts them, which lets it learn a bidirectional representation of the sentence that can be used for downstream tasks such as sequence classification, token classification, and question answering. The model was trained on multiple datasets totaling 160GB of text and uses a byte-level version of Byte-Pair Encoding (BPE) with a vocabulary size of 50,000. It was trained on 1024 V100 GPUs for 500K steps with the Adam optimizer. When fine-tuned on downstream tasks, it achieves good results on the GLUE benchmark. Note, however, that the training data contains biased content from the internet, which can lead to biased predictions.
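
A fine-tuning run for sequence classification might look roughly like the following (a hypothetical sketch using the datasets library and the GLUE SST-2 task; the hyperparameters are illustrative, not the settings used in the original training):

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# A fresh classification head is added on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-base-sst2", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```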

$-/run

11.4M

Huggingface

distilbert-base-uncased

The DistilBERT model is a smaller and faster version of the BERT model, pretrained on the same raw texts through knowledge distillation, with BERT serving as the teacher model. Like BERT, it was trained with a masked language modeling objective to predict missing words in sentences, and it can be fine-tuned for various downstream tasks, primarily those that use whole sentences such as sequence classification or question answering. Its training data consists of the BookCorpus dataset, a collection of unpublished books, together with English Wikipedia. Note that the model may make biased predictions, since it inherits biases from its teacher model.
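
The "smaller and faster" claim can be checked by comparing parameter counts of the student and its BERT teacher (a minimal sketch assuming PyTorch; the helper function is just for illustration):

```python
from transformers import AutoModel

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

distilbert = AutoModel.from_pretrained("distilbert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

# DistilBERT keeps roughly 60% of BERT-base's parameters (about 66M vs. 110M).
print(f"distilbert-base-uncased: {count_parameters(distilbert) / 1e6:.0f}M parameters")
print(f"bert-base-uncased:       {count_parameters(bert) / 1e6:.0f}M parameters")
```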

$-/run

11.3M

Huggingface

distilbert-base-multilingual-cased

distilbert-base-multilingual-cased is a distilled version of the BERT base multilingual model. It is trained on Wikipedia in 104 different languages and has 6 layers, 768 dimensions, and 12 heads, for a total of 134M parameters, and it runs approximately twice as fast as mBERT-base. The model is primarily intended for fine-tuning on downstream tasks such as sequence classification, token classification, or question answering. It should not be used to intentionally create hostile or alienating environments, and users should be aware of the model's risks, biases, and limitations. The developers report accuracy results for DistilmBERT, and the model can be used directly for masked language modeling via a pipeline.
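
The layer, dimension, and head counts quoted above can be read straight from the model's configuration without downloading the weights (a minimal sketch):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("distilbert-base-multilingual-cased")

# 6 transformer layers, 768-dimensional hidden states, 12 attention heads.
print(config.n_layers, config.dim, config.n_heads)
```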

$-/run

10.6M

Huggingface

bert-base-cased

bert-base-cased is a pre-trained model based on the BERT architecture (Bidirectional Encoder Representations from Transformers). It is trained on cased English text and can be fine-tuned for natural language processing tasks such as text classification, named entity recognition, and question answering. Out of the box, the model can fill in masked tokens in a given sentence and return the most probable predictions for those tokens.
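
The practical difference from the uncased variant is that capitalization is preserved at the tokenizer level, which matters for tasks like named entity recognition (a minimal sketch; the example sentence is arbitrary):

```python
from transformers import AutoTokenizer

cased = AutoTokenizer.from_pretrained("bert-base-cased")
uncased = AutoTokenizer.from_pretrained("bert-base-uncased")

# The cased tokenizer keeps capitalization; the uncased one lowercases first.
print(cased.tokenize("Paris is in France"))
print(uncased.tokenize("Paris is in France"))
```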

$-/run

5.9M

Huggingface

roberta-large

RoBERTa is a pretrained transformer model trained on a large corpus of English data in a self-supervised manner. It uses a masked language modeling (MLM) objective: words in a sentence are randomly masked and the model predicts them, which allows it to learn a bidirectional representation of the sentence. The model can be fine-tuned for downstream tasks such as sequence classification, token classification, or question answering. The training data contains unfiltered content from the internet, which may introduce bias into its predictions. The model achieves good performance on various downstream tasks according to results on the GLUE benchmark.
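
As a sketch of attaching a task-specific head for extractive question answering before fine-tuning (the question/context pair is arbitrary, and the head's weights are freshly initialized, so its outputs are meaningless until the model is trained):

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
# Adds a randomly initialized span-prediction head on top of the pretrained encoder.
model = AutoModelForQuestionAnswering.from_pretrained("roberta-large")

question = "What objective was RoBERTa trained with?"
context = "RoBERTa was pretrained with a masked language modeling objective."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One start logit and one end logit per token; fine-tuning teaches the head
# to point these at the answer span in the context.
print(outputs.start_logits.shape, outputs.end_logits.shape)
```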

$-/run

4.4M

Huggingface

Similar creators