MosaicML

Rank:

Average Model Cost: $0.0000

Number of Runs: 3,742,411

Models by this creator

mpt-7b-instruct

The mpt-7b-instruct model is a text generation model from MosaicML, fine-tuned from MPT-7B for short-form instruction following. Given an instruction prompt, it produces clear, concise, and complete responses, which makes it a useful tool for developers, engineers, and other technical professionals who want to automate instruction-style tasks. A minimal usage sketch follows this entry.

$-/run

3.4M

Huggingface
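The listing above does not include usage code, so here is a minimal, hedged sketch of querying mpt-7b-instruct with the Hugging Face transformers library. The dolly-style prompt template and the generation settings are illustrative assumptions, not taken from the listing; check the model card for the exact format.

```python
import torch
import transformers

MODEL = "mosaicml/mpt-7b-instruct"

# MPT repos ship custom modeling code, so trust_remote_code=True is required.
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL)
model = transformers.AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,  # reduce memory; use float32 on hardware without bf16
    trust_remote_code=True,
)

# Assumed dolly-style prompt template; verify against the model card.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a Dockerfile is in two sentences.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the MPT repositories ship their own modeling code, loading without trust_remote_code=True will fail; everything else is standard transformers usage.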

mpt-30b-instruct

The mpt-30b-instruct model is a text generation model from MosaicML, fine-tuned from MPT-30B for short-form instruction following. Trained on a large body of instructional text, it can provide step-by-step guidance and explanations for a wide range of tasks. It is useful for developers and researchers working on natural language processing tasks, as well as for applications that need to respond to user instructions. A sketch of loading this larger checkpoint follows this entry.

$-/run

37.5K

Huggingface
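As noted in the description above, loading the 30B checkpoint mostly differs from the 7B case in memory handling. The sketch below is an assumption, not an official recipe: it uses bfloat16 weights and accelerate's device_map="auto" to spread the model across available devices, and the short prompt template is likewise illustrative only.

```python
import torch
import transformers

MODEL = "mosaicml/mpt-30b-instruct"

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL)
model = transformers.AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,   # halves memory vs. float32
    device_map="auto",            # requires the accelerate package
    trust_remote_code=True,       # MPT ships custom modeling code
)

# Assumed instruction/response prompt format; verify against the model card.
prompt = "### Instruction:\nSummarize what ALiBi does in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```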

mpt-1b-redpajama-200b-dolly

The mpt-1b-redpajama-200b-dolly model is a roughly 1-billion-parameter, GPT-style decoder-only transformer pre-trained on 200 billion tokens of the RedPajama dataset and then fine-tuned on the Databricks Dolly instruction dataset. It generates detailed, coherent, and contextually relevant text in response to given prompts, and can be useful in a variety of natural language processing (NLP) applications such as chatbots, question answering systems, and content generation.

$-/run

8.3K

Huggingface

mosaic-bert-base-seqlen-1024

MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining. MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against Hugging Face's bert-base-uncased, incorporating efficiency insights from the past half-decade of transformer research, from RoBERTa to T5 and GPT.

This checkpoint was trained with ALiBi on a sequence length of 1024 tokens. ALiBi allows a model trained with a sequence length n to extrapolate to sequence lengths greater than 2n during finetuning; for details, see Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (Press et al. 2022). It is part of a family of MosaicBERT-Base models trained with ALiBi on different sequence lengths. The primary use case of these models is research on efficient pretraining and finetuning for long-context embeddings. Model date: April 2023.

The tokenizer for this model is simply the Hugging Face bert-base-uncased tokenizer. To use the model directly for masked language modeling, use the fill-mask pipeline (a hedged sketch follows this entry). To continue MLM pretraining, follow the MLM pre-training section of the mosaicml/examples/bert repo; to fine-tune the model for classification, follow the Single-task fine-tuning section of the same repo.

This model requires that trust_remote_code=True be passed to the from_pretrained method, because it is trained with FlashAttention (Dao et al. 2022), which is not part of the transformers library and depends on Triton and some custom PyTorch code. Since this involves executing arbitrary code, you should consider passing a git revision argument that pins the exact commit of that code. If the model or code is later updated while you have pinned a revision, you will need to check for the changes yourself and update the commit hash accordingly.

To build MosaicBERT, MosaicML adopted architectural choices from the recent transformer literature, including FlashAttention (Dao et al. 2022), ALiBi (Press et al. 2021), and Gated Linear Units (Shazeer 2020). In addition, padding is removed inside the transformer block and LayerNorm is applied in low precision.

MosaicBERT is pretrained using a standard masked language modeling (MLM) objective: the model is given a sequence of text with some tokens hidden, and it has to predict the masked tokens. It is trained on the English "Colossal, Cleaned, Common Crawl" (C4) dataset, which contains roughly 365 million curated text documents scraped from the internet (equivalent to 156 billion tokens), used in place of traditional BERT pretraining corpora such as English Wikipedia and BooksCorpus. Many of the pretraining optimizations were informed by MosaicML's BERT results for the MLPerf v2.1 speed benchmark.

This model is intended to be finetuned on downstream tasks. Please cite it using the format given on the model card.

$-/run

3.0K

Huggingface
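The description above mentions using the fill-mask pipeline and pinning a git revision when passing trust_remote_code=True, but the original code snippets are not included in this listing. The following is a hedged sketch of both patterns; the commented-out revision line is a placeholder for a commit hash you would look up and audit yourself.

```python
import transformers

MODEL = "mosaicml/mosaic-bert-base-seqlen-1024"

# The tokenizer is simply the standard bert-base-uncased tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-uncased")

model = transformers.AutoModelForMaskedLM.from_pretrained(
    MODEL,
    trust_remote_code=True,   # executes custom FlashAttention/Triton code from the repo
    # revision="...",         # optionally pin the exact commit you have audited
)

# Masked language modeling via the fill-mask pipeline.
fill_mask = transformers.pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer,
)

for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], candidate["score"])
```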
