dbrx-instruct

Maintainer: databricks

Total Score

1.0K

Last updated 4/28/2024

🎯

Property | Value
Model Link | View on HuggingFace
API Spec | View on HuggingFace
GitHub Link | No GitHub link provided
Paper Link | No paper link provided


Model overview

dbrx-instruct is a 132 billion parameter mixture-of-experts (MoE) large language model developed by Databricks. It uses a fine-grained MoE architecture with 16 experts, of which 4 are chosen for any given input. This gives 65x more possible expert combinations than other open MoE models such as Mixtral-8x7B and Grok-1, which Databricks reports improves model quality.
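The 65x figure can be checked with a quick count of possible expert subsets. The sketch below assumes the comparison models route each input to 2 of 8 experts; that configuration is not stated in this summary, and the function name is purely illustrative.

```python
from math import comb


def expert_combinations(num_experts: int, experts_per_input: int) -> int:
    """Number of distinct expert subsets a router can select for one input."""
    return comb(num_experts, experts_per_input)


# dbrx-instruct: 16 experts, 4 chosen per input (from the text above).
dbrx_combos = expert_combinations(16, 4)

# Assumption: the comparison models use 8 experts and choose 2.
baseline_combos = expert_combinations(8, 2)

ratio = dbrx_combos // baseline_combos
print(dbrx_combos, baseline_combos, ratio)  # 1820 28 65
```

Under that assumption the count is 1,820 subsets versus 28, which matches the 65x ratio quoted above.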

dbrx-instruct was pretrained on 12 trillion tokens of carefully curated data, which Databricks estimates is at least 2x better token-for-token than the data used to pretrain the MPT family of models. It uses techniques like curriculum learning, rotary position encodings, gated linear units, and grouped query attention to further improve performance.

Model inputs and outputs

Inputs

  • dbrx-instruct accepts only text-based inputs, with a context length of up to 32,768 tokens.

Outputs

  • dbrx-instruct only produces text-based outputs.

Capabilities

dbrx-instruct is instruction-tuned and specializes in few-turn interactions. It can engage in natural conversations, answer questions, and complete a variety of text-based tasks with high quality.

What can I use it for?

dbrx-instruct can be used for any natural language generation task where a high-performance, open-source model is needed. This could include building conversational assistants, question-answering systems, text summarization tools, and more. The model's broad capabilities make it a versatile choice for many AI and ML applications.

Things to try

One interesting aspect of dbrx-instruct is its ability to handle long-form inputs and outputs effectively, thanks to its large context window of 32,768 tokens. This makes it well-suited for tasks that require processing and generating longer pieces of text, such as summarizing research papers or engaging in multi-turn dialogues. Developers may want to experiment with pushing the boundaries of what the model can do in terms of the length and complexity of the inputs and outputs.
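When experimenting with long inputs, it helps to budget the context window explicitly. The following is a minimal sketch, not part of any Databricks tooling: it assumes the prompt and the generated tokens share the 32,768-token window, uses a hypothetical reserve of 1,024 tokens for the output, and works on plain lists of integers standing in for token IDs from whatever tokenizer is in use.

```python
CONTEXT_LENGTH = 32_768  # context length stated for dbrx-instruct


def split_into_windows(token_ids, reserved_for_output=1_024,
                       context_length=CONTEXT_LENGTH):
    """Split token IDs into chunks that leave room for generated tokens.

    reserved_for_output is a hypothetical budget chosen for illustration.
    """
    budget = context_length - reserved_for_output
    if budget <= 0:
        raise ValueError("reserved_for_output must be smaller than context_length")
    return [token_ids[i:i + budget] for i in range(0, len(token_ids), budget)]


# Stand-in for a tokenized document of 70,000 tokens.
fake_token_ids = list(range(70_000))
windows = split_into_windows(fake_token_ids)
print([len(w) for w in windows])  # [31744, 31744, 6512]
```

Each chunk could then be summarized separately and the partial summaries combined, which is one simple way to handle documents longer than the window.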



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

📈

dbrx-base

databricks

Total Score

532

dbrx-base is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. It uses a fine-grained MoE architecture with 132B total parameters, of which 36B are active on any input. Compared to other open MoE models like Mixtral-8x7B and Grok-1, dbrx-base has 16 experts and chooses 4, providing 65x more possible expert combinations. This fine-grained approach improves model quality. dbrx-base was pretrained on 12T tokens of carefully curated data, which is estimated to be 2x better than the data used for the Databricks MPT models. DBRX Instruct is a related model that has been instruction-tuned, specializing in few-turn interactions.

Model inputs and outputs

Inputs

  • dbrx-base only accepts text-based inputs and accepts a context length of up to 32,768 tokens.

Outputs

  • dbrx-base only produces text-based outputs.

Capabilities

dbrx-base outperforms established open-source and open-weight base models on benchmarks like the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval, which measure performance across a range of tasks including world knowledge, common sense reasoning, language understanding, reading comprehension, symbolic problem solving, and programming.

What can I use it for?

dbrx-base and dbrx-instruct are intended for commercial and research use in English. The instruction-tuned dbrx-instruct model can be used as an off-the-shelf model for few-turn question answering related to general English-language and coding tasks. Both models can also be further fine-tuned for various domain-specific natural language and coding tasks.

Things to try

While dbrx-base demonstrates strong performance on a variety of benchmarks, users should exercise judgment and evaluate model outputs for accuracy and appropriateness before using or sharing them, as all foundation models carry risks. Databricks recommends using retrieval-augmented generation (RAG) in scenarios where accuracy and fidelity are important, and performing additional safety testing when fine-tuning the models.


🤔

DeciLM-7B-instruct

Deci

Total Score

96

DeciLM-7B-instruct is a 7 billion parameter language model developed by Deci that has been fine-tuned for short-form instruction following. It is built by LoRA fine-tuning on the SlimOrca dataset. The model leverages an optimized transformer decoder architecture with variable Grouped-Query Attention to achieve strong performance and efficiency. Compared to similar models like DeciLM-6B-instruct and DeciLM-7B, DeciLM-7B-instruct offers enhanced instruction-following capabilities while retaining the speed and accuracy of its base model.

Model inputs and outputs

DeciLM-7B-instruct is a text generation model that takes prompts as input and generates relevant text outputs. It can be used for a variety of natural language tasks, including question answering, summarization, and open-ended conversation.

Inputs

  • Prompts: Free-form text that the model uses as a starting point to generate relevant output.

Outputs

  • Generated text: The model's response to the input prompt, which can range from a single sentence to multiple paragraphs depending on the task.

Capabilities

DeciLM-7B-instruct is highly capable at understanding and following instructions provided in natural language. It can break down complex tasks into step-by-step instructions, provide detailed explanations, and generate relevant text outputs. The model's strong performance and efficiency make it a compelling choice for a wide range of applications, from customer service chatbots to task-oriented virtual assistants.

What can I use it for?

DeciLM-7B-instruct is well-suited for commercial and research use cases that require a language model with strong instruction-following capabilities. Some potential applications include:

  • Customer service: The model can be used to power chatbots that can provide detailed, step-by-step instructions to assist customers with product usage, troubleshooting, and other queries.
  • Virtual assistants: By leveraging the model's ability to understand and follow instructions, virtual assistants can be developed to help users with a variety of tasks, from scheduling appointments to providing cooking instructions.
  • Content generation: The model can be used to generate high-quality, relevant content for websites, blogs, and other digital platforms, with the ability to follow specific instructions or guidelines.

Things to try

One interesting aspect of DeciLM-7B-instruct is its ability to break down complex tasks into clear, step-by-step instructions. Try providing the model with prompts that involve multi-step processes, such as "How do I bake a cake?" or "Walk me through the process of changing a tire." Observe how the model responds, noting the level of detail and the clarity of the instructions provided.

Another interesting experiment would be to explore the model's ability to follow instructions that involve creative or open-ended tasks, such as "Write a short story about a talking giraffe" or "Design a poster for a new music festival." This can help demonstrate the model's flexibility and its capacity for generating diverse and engaging content.


🏋️

DeciLM-6b-instruct

Deci

Total Score

133

DeciLM-6b-instruct is a 6 billion parameter language model developed by Deci that is optimized for short-form instruction following. It is built by fine-tuning the DeciLM 6B model on a subset of the OpenOrca dataset. The model uses an optimized transformer decoder architecture that includes variable Grouped-Query Attention, which allows for efficient processing while maintaining performance.

Model inputs and outputs

Inputs

  • Natural language instructions or queries

Outputs

  • Coherent and relevant text responses to the provided inputs

Capabilities

DeciLM-6b-instruct is capable of following a wide range of instructions and generating appropriate responses. It can assist with tasks like answering questions, providing step-by-step instructions, and generating creative content. The model has demonstrated strong performance on benchmarks like ARC Challenge, BoolQ, and PIQA.

What can I use it for?

DeciLM-6b-instruct can be used for various commercial and research applications that require short-form instruction following in English. This includes virtual assistants, content generation, and task automation. The model can also be fine-tuned on additional data to adapt it to specific use cases or languages. For example, the DeciLM-7B-instruct model is a larger version of the DeciLM-6b-instruct model that has been fine-tuned for instruction following.

Things to try

One interesting aspect of DeciLM-6b-instruct is its use of variable Grouped-Query Attention, which allows it to maintain high performance while being computationally efficient. You could experiment with this model's ability to generate concise and accurate responses to a variety of instructions, and compare its performance to other instruction-following language models like Falcon-7B-Instruct or MPT-7B-Instruct. This could provide insights into the tradeoffs between model size, architecture, and instruction-following capabilities.


🧠

Phi-3-mini-128k-instruct

microsoft

Total Score

1.2K

The Phi-3-mini-128k-instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family with the Mini version in two variants, 4K and 128K, which is the context length (in tokens) that it can support. After initial training, the model underwent a post-training process that involved supervised fine-tuning and direct preference optimization to enhance its ability to follow instructions and adhere to safety measures. When evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning, the Phi-3 Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters.

Model inputs and outputs

Inputs

  • Text prompts

Outputs

  • Generated text responses

Capabilities

The Phi-3-mini-128k-instruct model is designed to excel in memory/compute constrained environments, latency-bound scenarios, and tasks requiring strong reasoning skills, especially in areas like code, math, and logic. It can be used to accelerate research on language and multimodal models, serving as a building block for generative AI-powered features.

What can I use it for?

The Phi-3-mini-128k-instruct model is intended for commercial and research use in English. It can be particularly useful for applications that require efficient performance in resource-constrained settings or low-latency scenarios, such as mobile devices or edge computing environments. Given its strong reasoning capabilities, the model can be leveraged for tasks involving coding, mathematical reasoning, and logical problem-solving.

Things to try

One interesting aspect of the Phi-3-mini-128k-instruct model is its ability to perform well on benchmarks testing common sense, language understanding, and logical reasoning, even with a relatively small parameter count compared to larger language models. This suggests it could be a useful starting point for exploring ways to build efficient and capable AI assistants that can understand and reason about the world in a robust manner.
