OpenAccess AI Collective

Rank:

Average Model Cost: $0.0000

Number of Runs: 4,796

Models by this creator

wizard-mega-13b


openaccess-ai-collective

Wizard Mega 13B has been updated and is now Manticore 13B. 💵 Donate to OpenAccess AI Collective to help us keep building great tools and models! Manticore is available at https://huggingface.co/openaccess-ai-collective/manticore-13b; it fixes many issues with Wizard Mega and adds new datasets to the training.

Wizard Mega is a Llama 13B model fine-tuned on the ShareGPT, WizardLM, and Wizard-Vicuna datasets. These datasets have all been filtered to remove responses where the model replies with "As an AI language model..." or refuses to respond.

Demo
Try out the model in HF Spaces. The demo uses a quantized GGML version of the model to quickly return predictions on smaller GPUs (and even CPUs). Quantized GGML may have some minimal loss of model quality. https://huggingface.co/spaces/openaccess-ai-collective/wizard-mega-ggml

Release (Epoch Two)
The Wizard Mega 13B SFT model is being released after two epochs, as the eval loss increased during the third (final planned) epoch. Because of this, we have preliminarily decided to use the epoch 2 checkpoint as the final release candidate. https://wandb.ai/wing-lian/vicuna-13b/runs/5uebgm49

Build
Wizard Mega was built with Axolotl on 8xA100 80GB for 15 hours. The configuration to duplicate this build is provided in this repo's /config folder.

Bias, Risks, and Limitations
Wizard Mega has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). Wizard Mega was fine-tuned from the base model LLaMA 13B; please refer to its model card's Limitations section for relevant information.

Examples
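A minimal sketch of querying the model through the standard transformers API. The repo id below is an assumption inferred from the creator's naming pattern, and the instruction-style prompt is illustrative; the linked Space instead serves a quantized GGML build.

```python
# Hedged sketch: load Wizard Mega 13B with Hugging Face transformers and generate.
# Repo id and prompt format are assumptions, not confirmed by the card above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openaccess-ai-collective/wizard-mega-13b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so a 13B model fits on one large GPU
    device_map="auto",          # requires the `accelerate` package
)

prompt = "### Instruction: Explain what fine-tuning is in one paragraph.\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```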

Read more

$-/run

1.4K

Huggingface

manticore-13b


Manticore 13B (previously Wizard Mega). 💵 Donate to OpenAccess AI Collective to help us keep building great tools and models! Questions, comments, feedback, looking to donate, or want to help? Reach out on our Discord or email wing@openaccessaicollective.org.

Manticore 13B is a Llama 13B model fine-tuned on the following datasets:
- ShareGPT - based on a cleaned and de-duped subset
- WizardLM
- Wizard-Vicuna
- subset of QingyiSi/Alpaca-CoT for roleplay and CoT
- GPT4-LLM-Cleaned
- GPTeacher-General-Instruct
- ARC-Easy & ARC-Challenge - instruct augmented for detailed responses
- mmlu - instruct augmented for detailed responses; subset including abstract_algebra, conceptual_physics, formal_logic, high_school_physics, logical_fallacies
- hellaswag - 5K-row subset, instruct augmented for concise responses
- metaeval/ScienceQA_text_only - instruct for concise responses
- openai/summarize_from_feedback - instruct augmented tl;dr summarization

Demo
Try out the model in HF Spaces. The demo uses a quantized GGML version of the model to quickly return predictions on smaller GPUs (and even CPUs). Quantized GGML may have some minimal loss of model quality. https://huggingface.co/spaces/openaccess-ai-collective/manticore-ggml

Release Notes
https://wandb.ai/wing-lian/manticore-13b/runs/nq3u3uoh/workspace

Build
Manticore was built with Axolotl on 8xA100 80GB. Preview release: 3 epochs taking approximately 24 hours. The configuration to duplicate this build is provided in this repo's /config folder.

Bias, Risks, and Limitations
Manticore has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). Manticore was fine-tuned from the base model LLaMA 13B; please refer to its model card's Limitations section for relevant information.

Examples
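The demo above runs a GGML quantization so it fits on small GPUs or CPUs; below is a hedged sketch of doing the same locally with llama-cpp-python. The file name is hypothetical (use whichever quantization you downloaded), GGML files require an older llama-cpp-python release (recent releases read GGUF only), and the USER:/ASSISTANT: prompt style is an assumption borrowed from the creator's other models.

```python
# Hedged sketch: run a quantized GGML build of Manticore 13B on CPU.
# Assumes an older llama-cpp-python release that still reads GGML files.
from llama_cpp import Llama

llm = Llama(model_path="./manticore-13b.ggmlv3.q4_0.bin", n_ctx=2048)  # hypothetical local file

# Assumed chat-style prompt; adjust to the format the checkpoint was trained with.
prompt = "USER: Summarize the difference between supervised fine-tuning and RLHF.\nASSISTANT:"
result = llm(prompt, max_tokens=256, temperature=0.7, stop=["USER:"])
print(result["choices"][0]["text"])
```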

Read more

$-/run

872

Huggingface

minotaur-13b


💵 Donate to OpenAccess AI Collective to help us keep building great tools and models!

DEPRECATION! This model has been superseded by Minotaur 13B Fixed: https://huggingface.co/openaccess-ai-collective/minotaur-13b-fixed. Due to a bug, the initial release dropped a few datasets during training. We've corrected the issue and retrained the model.

Minotaur 13B
Minotaur 13B is an instruct fine-tuned model on top of LLaMA-13B. Minotaur 13B is fine-tuned only on completely open datasets, making this model reproducible by anyone. Questions, comments, feedback, looking to donate, or want to help? Reach out on our Discord or email wing@openaccessaicollective.org.

Prompts
Chat-only style prompts using USER: and ASSISTANT:.

Training Datasets
Minotaur 13B is fine-tuned on the following openly available datasets:
- WizardLM
- subset of QingyiSi/Alpaca-CoT for roleplay and CoT
- GPTeacher-General-Instruct
- metaeval/ScienceQA_text_only - instruct for concise responses
- openai/summarize_from_feedback - instruct augmented tl;dr summarization
- camel-ai/math
- camel-ai/physics
- camel-ai/chemistry
- camel-ai/biology
- winglian/evals - instruct augmented datasets
  - custom synthetic datasets around misconceptions, in-context QA, jokes, N-task problems, and context-insensitivity
  - ARC-Easy & ARC-Challenge - instruct augmented for detailed responses, derived from the train split
  - hellaswag - 30K+ rows, instruct augmented for detailed explanations, derived from the train split
  - riddle_sense - instruct augmented, derived from the train split
  - gsm8k - instruct augmented, derived from the train split
  - prose generation

Shoutouts
Special thanks to Nanobit for helping with Axolotl and to TheBloke for quantizing these models so they are more accessible to all.

Demo
HF Demo in Spaces available in the Community ChatBot Arena under the OAAIC Chatbots tab.

Release Notes
https://wandb.ai/wing-lian/minotaur-13b/runs/5zji06u6

Build
Minotaur was built with Axolotl on 6xA100 80GB; 1 epoch taking approximately 4.5 hours.

Bias, Risks, and Limitations
Minotaur has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). Minotaur was fine-tuned from the base model LLaMA-13B; please refer to its model card's Limitations section for relevant information (included below).

Benchmarks
hf-causal-experimental (pretrained=openaccess-ai-collective/minotaur-13b), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

Examples - results may vary based on temperature and other settings
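A minimal sketch of the chat-only USER:/ASSISTANT: prompt style the card describes, using plain transformers generation. The repo id is inferred from the deprecation notice's naming and is otherwise an assumption; per that notice, prefer the fixed variant.

```python
# Sketch of the chat-only prompt format (USER:/ASSISTANT:) with standard transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openaccess-ai-collective/minotaur-13b"  # assumed repo id; the card recommends the -fixed variant

def build_prompt(user_message: str) -> str:
    # One USER turn followed by an open ASSISTANT turn, as described in the Prompts section.
    return f"USER: {user_message}\nASSISTANT:"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # device_map requires `accelerate`
)

inputs = tokenizer(build_prompt("List three uses of the Pythagorean theorem."), return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```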

Read more

$-/run

431

Huggingface

minotaur-mpt-7b


💵 Donate to OpenAccess AI Collective to help us keep building great tools and models!

Minotaur MPT 7B
Minotaur 7B is an instruct fine-tuned model on top of MPT-7B. Minotaur 7B is fine-tuned only on completely open datasets, making this model reproducible by anyone. Questions, comments, feedback, looking to donate, or want to help? Reach out on our Discord or email wing@openaccessaicollective.org.

Prompts
Chat-only style prompts using USER: and ASSISTANT:.

Training Datasets
Minotaur 7B is fine-tuned on the following datasets:
- WizardLM
- subset of QingyiSi/Alpaca-CoT for roleplay and CoT
- GPTeacher-General-Instruct
- metaeval/ScienceQA_text_only - instruct for concise responses
- openai/summarize_from_feedback - instruct augmented tl;dr summarization
- camel-ai/math
- camel-ai/physics
- camel-ai/chemistry
- camel-ai/biology
- winglian/evals
  - custom synthetic datasets around misconceptions, in-context QA, jokes, N-task problems, and context-insensitivity
  - ARC-Easy & ARC-Challenge - instruct augmented for detailed responses, derived from the train split
  - hellaswag - 30K+ rows, instruct augmented for detailed explanations, derived from the train split
  - riddle_sense - instruct augmented
  - gsm8k - instruct augmented

Shoutouts
Special thanks to Nanobit for helping with Axolotl and to TheBloke for quantizing these models so they are more accessible to all.

Demo
HF Demo in Spaces coming soon.

Release Notes
https://wandb.ai/wing-lian/mpt-7b-4k-minotaur/runs/i4zib0j4

Build
Minotaur was built with Axolotl on 7xA100 80GB; 3 epochs taking approximately 6 hours.

Bias, Risks, and Limitations
Minotaur has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). Minotaur was fine-tuned from the base model MPT-7B; please refer to its model card's Limitations section for relevant information (included below).

Examples - results may vary based on temperature and other settings.

MPT-7B
MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML. MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's FasterTransformer. This model uses the MosaicML LLM codebase, which can be found in the llm-foundry repository. It was trained by MosaicML's NLP team on the MosaicML platform for LLM pretraining, finetuning, and inference.

How is this model different?
- Licensed for the possibility of commercial use (unlike LLaMA).
- Trained on a large amount of data (1T tokens, like LLaMA, vs. 300B for Pythia, 300B for OpenLLaMA, and 800B for StableLM).
- Prepared to handle extremely long inputs thanks to ALiBi (we finetuned MPT-7B-StoryWriter-65k+ on inputs of up to 65k tokens, and it can handle up to 84k, vs. 2k-4k for other open-source models).
- Capable of fast training and inference (via FlashAttention and FasterTransformer).
- Equipped with highly efficient open-source training code via the llm-foundry repository.

Models finetuned off MPT-7B:
- MPT-7B-StoryWriter-65k+: a model designed to read and write fictional stories with super long context lengths. Built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. At inference time, thanks to ALiBi, MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens. We demonstrate generations as long as 80k tokens on a single A100-80GB GPU in our blogpost. License: Apache 2.0. 
- MPT-7B-Instruct: a model for short-form instruction following. Built by finetuning MPT-7B on a dataset we also release, derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets. License: CC-By-SA-3.0. Demo on Hugging Face Spaces.
- MPT-7B-Chat: a chatbot-like model for dialogue generation. Built by finetuning MPT-7B on the ShareGPT-Vicuna, HC3, Alpaca, HH-RLHF, and Evol-Instruct datasets. License: CC-By-NC-SA-4.0. Demo on Hugging Face Spaces.

Model Date: May 5, 2023
Model License: Apache-2.0
Documentation: blog post "Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs"; codebase (mosaicml/llm-foundry repo). Questions: feel free to contact us via the MosaicML Community Slack!

How to Use
This model is best used with the MosaicML llm-foundry repository for training and finetuning. Note: this model requires that trust_remote_code=True be passed to the from_pretrained method, because we use a custom MPT model architecture that is not yet part of the Hugging Face transformers package. MPT includes options for many training efficiency features such as FlashAttention, ALiBi, QK LayerNorm, and more. To use the optimized triton implementation of FlashAttention, you can load the model on GPU (cuda:0) with attn_impl='triton' and bfloat16 precision. Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference, as shown in the sketch after this card. This model was trained with the EleutherAI/gpt-neox-20b tokenizer.

Model Description
The architecture is a modification of a standard decoder-only transformer. The model has been modified from a standard transformer in the following ways:
- It uses FlashAttention.
- It uses ALiBi (Attention with Linear Biases) and does not use positional embeddings.
- It does not use biases.

Training Data
Streaming Datasets: data was formatted using the MosaicML StreamingDataset library to host our data in object storage and efficiently stream it to our compute cluster during training. StreamingDataset obviates the need to download the whole dataset before starting training, and allows instant resumption of training from any point in the dataset.

Data Mix: the model was trained for 1T tokens (with batch size 1760 and sequence length 2048) on a weighted mix of datasets. Samples for each batch were selected from one of the datasets with its specified probability. The examples were shuffled within each dataset, and each example was constructed from as many sequences from that dataset as were necessary to fill the 2048 sequence length. The data was tokenized using the EleutherAI/gpt-neox-20b tokenizer. This BPE tokenizer has a number of desirable characteristics, most of which are relevant for tokenizing code: (1) it was trained on a diverse mix of data that includes code (The Pile); (2) it applies consistent space delimitation, unlike the GPT2 tokenizer, which tokenizes inconsistently depending on the presence of prefix spaces; (3) it contains tokens for repeated space characters, which allows superior compression of text with large amounts of repeated space characters. The model vocabulary size of 50432 was set to be a multiple of 128 (as in MEGATRON-LM), which increased model flop utilization (MFU) by up to four percentage points.

Training Configuration
This model was trained on 440 A100-40GB GPUs for about 9.5 days using the MosaicML Platform. The model was trained with sharded data parallelism using FSDP and used the LION optimizer.

Limitations and Biases
The following language is modified from EleutherAI's GPT-NeoX-20B. MPT-7B (Base) is not intended for deployment without finetuning. It should not be used for human-facing interactions without further guardrails and user consent. MPT-7B can produce factually incorrect output, and should not be relied on to produce factually accurate information. MPT-7B was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

MosaicML Platform
If you're interested in training and deploying your own MPT or LLMs on the MosaicML Platform, sign up here.

Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Citation
Please cite this model using the following format:
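The "How to Use" notes above describe loading MPT-7B with trust_remote_code=True, selecting the triton FlashAttention kernel, and raising the maximum sequence length that ALiBi allows. A sketch following that description; the 4096 max_seq_len is illustrative, and the base MPT-7B repo id (rather than this Minotaur fine-tune) is assumed for the example.

```python
# Sketch of the loading pattern described in the MPT-7B notes above.
import torch
import transformers

name = "mosaicml/mpt-7b"  # base model repo; assumed here for illustration

# trust_remote_code=True is required because the MPT architecture ships as custom code.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"  # optimized FlashAttention kernel (GPU only)
config.max_seq_len = 4096                   # ALiBi lets this exceed the 2048 used in pretraining

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.to("cuda:0")

# MPT-7B was trained with the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```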

Read more

$-/run

331

Huggingface

openllama-7b-4k


OpenLLaMA: An Open Reproduction of LLaMA
In this repo, we present a permissively licensed open source reproduction of Meta AI's LLaMA large language model. We are releasing a 7B and a 3B model trained on 1T tokens, as well as a preview of a 13B model trained on 600B tokens. We provide PyTorch and JAX weights of pre-trained OpenLLaMA models, as well as evaluation results and a comparison against the original LLaMA models. Please see the OpenLLaMA project homepage for more details.

Weights Release, License and Usage
We release the weights in two formats: an EasyLM format to be used with our EasyLM framework, and a PyTorch format to be used with the Hugging Face transformers library. Both our training framework EasyLM and the checkpoint weights are licensed permissively under the Apache 2.0 license.

Loading the Weights with Hugging Face Transformers
Preview checkpoints can be loaded directly from Hugging Face Hub. Please note that it is advised to avoid the Hugging Face fast tokenizer for now, as we've observed that the auto-converted fast tokenizer sometimes gives incorrect tokenizations. This can be achieved by using the LlamaTokenizer class directly, or by passing the use_fast=False option to the AutoTokenizer class; see the sketch after this model card. For more advanced usage, please follow the transformers LLaMA documentation.

Evaluating with LM-Eval-Harness
The model can be evaluated with lm-eval-harness. However, due to the aforementioned tokenizer issue, we need to avoid the fast tokenizer to obtain correct results. This can be achieved by passing use_fast=False to the tokenizer construction in lm-eval-harness.

Loading the Weights with EasyLM
To use the weights in our EasyLM framework, please refer to the LLaMA documentation of EasyLM. Note that unlike the original LLaMA model, our OpenLLaMA tokenizer and weights are trained completely from scratch, so it is no longer necessary to obtain the original LLaMA tokenizer and weights. Note that we use the BOS (beginning of sentence) token (id=1) during training, so it is best to prepend this token for best performance during few-shot evaluation.

Dataset and Training
We train our models on the RedPajama dataset released by Together, which is a reproduction of the LLaMA training dataset containing over 1.2 trillion tokens. We follow exactly the same preprocessing steps and training hyperparameters as the original LLaMA paper, including model architecture, context length, training steps, learning rate schedule, and optimizer. The only difference between our setting and the original one is the dataset used: OpenLLaMA employs the RedPajama dataset rather than the one utilized by the original LLaMA. We train the models on cloud TPU-v4s using EasyLM, a JAX-based training pipeline we developed for training and fine-tuning large language models. We employ a combination of normal data parallelism and fully sharded data parallelism (also known as ZeRO stage 3) to balance training throughput and memory usage. Overall we reach a throughput of over 2200 tokens/second/TPU-v4 chip for our 7B model.

Evaluation
We evaluated OpenLLaMA on a wide range of tasks using lm-evaluation-harness. The LLaMA results are generated by running the original LLaMA model on the same evaluation metrics. We note that our results for the LLaMA model differ slightly from the original LLaMA paper, which we believe is a result of different evaluation protocols. Similar differences have been reported in this issue of lm-evaluation-harness.
Additionally, we present the results of GPT-J, a 6B parameter model trained on the Pile dataset by EleutherAI. The original LLaMA model was trained for 1 trillion tokens and GPT-J was trained for 500 billion tokens. We present the results in the table below. OpenLLaMA exhibits comparable performance to the original LLaMA and GPT-J across a majority of tasks, and outperforms them on some tasks. We removed the tasks CB and WSC from our benchmark, as our model performs suspiciously well on these two tasks. We hypothesize that there could be benchmark data contamination in the training set.

Contact
We would love to get feedback from the community. If you have any questions, please open an issue or contact us. OpenLLaMA is developed by Xinyang Geng* and Hao Liu* from Berkeley AI Research. *Equal contribution.

Acknowledgment
We thank the Google TPU Research Cloud program for providing part of the computation resources. We'd like to specially thank Jonathan Caton from TPU Research Cloud for helping us organize compute resources, and Rafi Witten from the Google Cloud team and James Bradbury from the Google JAX team for helping us optimize our training throughput. We'd also like to thank Charlie Snell, Gautier Izacard, Eric Wallace, Lianmin Zheng and our user community for the discussions and feedback. The OpenLLaMA 13B model is trained in collaboration with Stability AI, and we thank Stability AI for providing the computation resources. We'd like to especially thank David Ha and Shivanshu Purohit for coordinating the logistics and providing engineering support.

Reference
If you found OpenLLaMA useful in your research or applications, please cite using the following BibTeX:
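As noted in the "Loading the Weights with Hugging Face Transformers" section above, the auto-converted fast tokenizer can mis-tokenize, so use LlamaTokenizer (or AutoTokenizer with use_fast=False). A minimal sketch, assuming this listing's repo id; substitute whichever OpenLLaMA checkpoint you actually want.

```python
# Sketch: load an OpenLLaMA checkpoint with the slow tokenizer to avoid the
# incorrect tokenizations reported for the auto-converted fast tokenizer.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "openaccess-ai-collective/openllama-7b-4k"  # assumed repo id from this listing

tokenizer = LlamaTokenizer.from_pretrained(model_path)
# Equivalent alternative: AutoTokenizer.from_pretrained(model_path, use_fast=False)

model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")

prompt = "Q: What is the largest animal?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generation = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generation[0], skip_special_tokens=True))
```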

Read more

$-/run

243

Huggingface

minotaur-13b-fixed


💵 Donate to OpenAccess AI Collective to help us keep building great tools and models!

Due to a bug, the initial release of Minotaur 13B dropped a few datasets during training. We have corrected the issue, and this is the retrained model. The affected datasets include: prose generation, classification, and coding.

Minotaur 13B (FIXED)
Minotaur 13B is an instruct fine-tuned model on top of LLaMA-13B. Minotaur 13B is fine-tuned only on completely open datasets, making this model reproducible by anyone. Questions, comments, feedback, looking to donate, or want to help? Reach out on our Discord or email wing@openaccessaicollective.org.

Prompts
Chat-only style prompts using USER: and ASSISTANT:.

Training Datasets
Minotaur 13B is fine-tuned on the following openly available datasets:
- WizardLM
- subset of QingyiSi/Alpaca-CoT for roleplay and CoT
- GPTeacher-General-Instruct
- metaeval/ScienceQA_text_only - instruct for concise responses
- openai/summarize_from_feedback - instruct augmented tl;dr summarization
- camel-ai/math
- camel-ai/physics
- camel-ai/chemistry
- camel-ai/biology
- winglian/evals - instruct augmented datasets
  - custom synthetic datasets around misconceptions, in-context QA, jokes, N-task problems, and context-insensitivity
  - ARC-Easy & ARC-Challenge - instruct augmented for detailed responses, derived from the train split
  - hellaswag - 30K+ rows, instruct augmented for detailed explanations, derived from the train split
  - riddle_sense - instruct augmented, derived from the train split
  - gsm8k - instruct augmented, derived from the train split
  - prose generation

Shoutouts
Special thanks to Nanobit for helping with Axolotl and to TheBloke for quantizing these models so they are more accessible to all.

Demo
HF Demo in Spaces available in the Community ChatBot Arena under the OAAIC Chatbots tab.

Release Notes
https://wandb.ai/wing-lian/minotaur-13b/runs/5ystr7w6/workspace

Build
Minotaur was built with Axolotl on 6xA100 80GB; 1 epoch taking approximately 7.5 hours.

Bias, Risks, and Limitations
Minotaur has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). Minotaur was fine-tuned from the base model LLaMA-13B; please refer to its model card's Limitations section for relevant information (included below).

Benchmarks
hf-causal-experimental (pretrained=openaccess-ai-collective/minotaur-13b-fixed), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

Examples - results may vary based on temperature (0.7 for this run) and other settings
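A hedged sketch of reproducing the benchmark header above with lm-evaluation-harness's Python API. The signature shown matches older (v0.3.x-era) releases of the harness and may differ in newer ones, and the task list is illustrative rather than the full suite the card reports.

```python
# Sketch: zero-shot evaluation of minotaur-13b-fixed with lm-evaluation-harness,
# mirroring the settings in the benchmark header (num_fewshot=0, batch_size=None, limit=None).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=openaccess-ai-collective/minotaur-13b-fixed",
    tasks=["arc_challenge", "hellaswag"],  # illustrative subset of tasks
    num_fewshot=0,
    batch_size=None,
    limit=None,
)
print(results["results"])
```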

Read more

$-/run

196

Huggingface
