Olm
Rank:
Average Model Cost: $0.0000
Number of Runs: 5,801
Models by this creator
olm-roberta-base-dec-2022
$-/run
5.1K
Huggingface
olm-roberta-base-latest
OLM RoBERTa/BERT December 2022

This is a more up-to-date version of the original BERT and the original RoBERTa. In addition to being more up-to-date, it also tends to perform better than the original BERT on standard benchmarks. We think it is fair to compare our model directly to the original BERT because our model was trained with roughly the same amount of compute as the original BERT, and the architectures of BERT and RoBERTa are essentially the same. The original RoBERTa took an order of magnitude more compute, although our model is not far from the original RoBERTa in performance on many standard benchmarks. Our model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.

This model was created as part of the OLM project, whose goal is to continuously train and release models that are up-to-date and comparable in standard language model performance to their static counterparts. This matters because we want our models to know about events like COVID or a presidential election right after they happen.

Intended uses
You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task such as sequence classification, token classification, or question answering.

How to use
You can use this model directly with a pipeline for masked language modeling, or load it in PyTorch to get the features of a given text (a sketch follows below).

Dataset
The model and tokenizer were trained with this December 2022 cleaned Common Crawl dataset plus this December 2022 cleaned Wikipedia dataset. The tokenized version of these concatenated datasets is here. The datasets were created with this repo.

Training
The model was trained according to the OLM BERT/RoBERTa instructions at this repo.

Evaluation results
The model achieves the following results after fine-tuning on GLUE tasks:
For both the original BERT and our model, we used the Hugging Face run_glue.py script here. For both models, we used the default fine-tuning hyperparameters and averaged the results over five training seeds. These are results on the GLUE dev sets, which can differ somewhat from results on the test sets.
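A minimal usage sketch for the masked-language-modeling pipeline, assuming the checkpoint is published on the Hugging Face Hub under the id olm/olm-roberta-base-latest (adjust the repository id if it differs):

```python
from transformers import pipeline

# Assumed Hub repository id for this checkpoint; change it if the
# model is hosted under a different path.
unmasker = pipeline("fill-mask", model="olm/olm-roberta-base-latest")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
print(unmasker("Hello, I'm a <mask> model."))
```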
$-/run
455
Huggingface
olm-gpt2-oct-2022
OLM GPT-2 October 2022

This is a more up-to-date version of the original GPT-2. In addition to being more up-to-date, it also tends to perform better than the original GPT-2 on standard benchmarks. It was trained on a cleaned October 2022 snapshot of Common Crawl and Wikipedia.

This model was created as part of the OLM project, whose goal is to continuously train and release models that are up-to-date and comparable in standard language model performance to their static counterparts. This matters because we want our models to know about events like COVID or a presidential election right after they happen.

Intended uses
You can use the raw model for text generation or fine-tune it on a downstream task.

How to use
You can use this model directly with a pipeline for text generation; since generation relies on some randomness, we set a seed for reproducibility. You can also load the model in PyTorch to get the features of a given text (a sketch follows below).

Dataset
The model and tokenizer were trained with this October 2022 cleaned Common Crawl dataset plus this October 2022 cleaned Wikipedia dataset. The tokenized version of these concatenated datasets is here. The datasets were created with this repo.

Training
The model was trained according to the OLM GPT2 instructions at this repo.

Evaluation results
The model achieves the following results without any fine-tuning (zero-shot):
To get these results, we used commit 4f0410a4be0049729078376ce36a42dc308b6e38 of the EleutherAI evaluation harness here, which can produce results different from those reported in the GPT-2 paper. We added a change here to enable evaluation of the OLM GPT-2, which has a very slightly different vocab size. The p-values come from the standard errors reported by the evaluation harness, plus a normal-distribution assumption.
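A minimal text-generation sketch with a fixed seed, assuming the Hub repository id is olm/olm-gpt2-oct-2022 (adjust if the model lives elsewhere):

```python
from transformers import pipeline, set_seed

# Assumed Hub repository id; change it if the checkpoint is hosted
# under a different path.
generator = pipeline("text-generation", model="olm/olm-gpt2-oct-2022")

set_seed(42)  # sampling is random, so fix a seed for reproducibility
print(generator("Hello, I'm a language model,",
                max_length=30, num_return_sequences=3))
```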
$-/run
124
Huggingface
olm-gpt2-dec-2022
OLM GPT-2 December 2022

This is a more up-to-date version of the original GPT-2. In addition to being more up-to-date, it also tends to perform better than the original GPT-2 on standard benchmarks. It was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.

This model was created as part of the OLM project, whose goal is to continuously train and release models that are up-to-date and comparable in standard language model performance to their static counterparts. This matters because we want our models to know about events like COVID or a presidential election right after they happen.

Intended uses
You can use the raw model for text generation or fine-tune it on a downstream task.

How to use
You can use this model directly with a pipeline for text generation; since generation relies on some randomness, we set a seed for reproducibility. You can also load the model in PyTorch to get the features of a given text (a sketch follows below).

Dataset
The model and tokenizer were trained with this December 2022 cleaned Common Crawl dataset plus this December 2022 cleaned Wikipedia dataset. The tokenized version of these concatenated datasets is here. The datasets were created with this repo.

Training
The model was trained according to the OLM GPT2 instructions at this repo.

Evaluation results
The model achieves the following results without any fine-tuning (zero-shot):
To get these results, we used commit f079e322b857714fcef1ada9e78ddc606fe51e84 of the EleutherAI evaluation harness here, which can produce results different from those reported in the GPT-2 paper. We added a change here to enable evaluation of the OLM GPT-2, which has a very slightly different vocab size. The p-values come from the standard errors reported by the evaluation harness, plus a normal-distribution assumption.
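A sketch of extracting per-token features in PyTorch, assuming the Hub repository id is olm/olm-gpt2-dec-2022 (adjust if needed):

```python
from transformers import AutoTokenizer, AutoModel

# Assumed Hub repository id; change it if the checkpoint is hosted
# under a different path.
tokenizer = AutoTokenizer.from_pretrained("olm/olm-gpt2-dec-2022")
model = AutoModel.from_pretrained("olm/olm-gpt2-dec-2022")

text = "Replace me with any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)

# output.last_hidden_state holds one contextual feature vector per input token.
print(output.last_hidden_state.shape)
```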
$-/run
60
Huggingface
olm-gpt2-latest
OLM GPT-2 December 2022

This is a more up-to-date version of the original GPT-2. In addition to being more up-to-date, it also tends to perform better than the original GPT-2 on standard benchmarks. It was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.

This model was created as part of the OLM project, whose goal is to continuously train and release models that are up-to-date and comparable in standard language model performance to their static counterparts. This matters because we want our models to know about events like COVID or a presidential election right after they happen.

Intended uses
You can use the raw model for text generation or fine-tune it on a downstream task.

How to use
You can use this model directly with a pipeline for text generation; since generation relies on some randomness, we set a seed for reproducibility. You can also load the model in PyTorch to get the features of a given text (a sketch follows below).

Dataset
The model and tokenizer were trained with this December 2022 cleaned Common Crawl dataset plus this December 2022 cleaned Wikipedia dataset. The tokenized version of these concatenated datasets is here. The datasets were created with this repo.

Training
The model was trained according to the OLM GPT2 instructions at this repo.

Evaluation results
The model achieves the following results without any fine-tuning (zero-shot):
To get these results, we used commit f079e322b857714fcef1ada9e78ddc606fe51e84 of the EleutherAI evaluation harness here, which can produce results different from those reported in the GPT-2 paper. We added a change here to enable evaluation of the OLM GPT-2, which has a very slightly different vocab size. The p-values come from the standard errors reported by the evaluation harness, plus a normal-distribution assumption.
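A sketch of seeded generation with the causal-LM API, assuming a rolling checkpoint is published as olm/olm-gpt2-latest (the repository id is an assumption; adjust if it differs):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed Hub repository id for the rolling "latest" checkpoint.
tokenizer = AutoTokenizer.from_pretrained("olm/olm-gpt2-latest")
model = AutoModelForCausalLM.from_pretrained("olm/olm-gpt2-latest")

torch.manual_seed(0)  # sampling is random, so fix a seed for reproducibility
inputs = tokenizer("Hello, I'm a language model,", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=30,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```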
$-/run
35
Huggingface
olm-roberta-base-oct-2022
OLM RoBERTa/BERT October 2022

This is a more up-to-date version of the original BERT and the original RoBERTa. In addition to being more up-to-date, it also tends to perform better than the original BERT on standard benchmarks. We think it is fair to compare our model directly to the original BERT because our model was trained with roughly the same amount of compute as the original BERT, and the architectures of BERT and RoBERTa are essentially the same. The original RoBERTa took an order of magnitude more compute, although our model is not far from the original RoBERTa in performance on many standard benchmarks. Our model was trained on a cleaned October 2022 snapshot of Common Crawl and Wikipedia.

This model was created as part of the OLM project, whose goal is to continuously train and release models that are up-to-date and comparable in standard language model performance to their static counterparts. This matters because we want our models to know about events like COVID or a presidential election right after they happen.

Intended uses
You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task such as sequence classification, token classification, or question answering.

How to use
You can use this model directly with a pipeline for masked language modeling, or load it in PyTorch to get the features of a given text (a sketch follows below).

Dataset
The model and tokenizer were trained with this October 2022 cleaned Common Crawl dataset plus this October 2022 cleaned Wikipedia dataset. The tokenized version of these concatenated datasets is here. The datasets were created with this repo.

Training
The model was trained according to the OLM BERT/RoBERTa instructions at this repo.

Evaluation results
The model achieves the following results after fine-tuning on GLUE tasks:
For both the original BERT and our model, we used the Hugging Face run_glue.py script here. For both models, we used the default fine-tuning hyperparameters and averaged the results over five training seeds. These are results on the GLUE dev sets, which can differ somewhat from results on the test sets.
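A sketch of extracting features of a given text in PyTorch, assuming the Hub repository id is olm/olm-roberta-base-oct-2022 (adjust if the checkpoint is hosted elsewhere):

```python
from transformers import AutoTokenizer, AutoModel

# Assumed Hub repository id; change it if the model path differs.
tokenizer = AutoTokenizer.from_pretrained("olm/olm-roberta-base-oct-2022")
model = AutoModel.from_pretrained("olm/olm-roberta-base-oct-2022")

text = "Replace me with any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)

# Per-token contextual embeddings from the final encoder layer.
print(output.last_hidden_state.shape)
```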
$-/run
16
Huggingface