dbrx-base

Maintainer: databricks

Total Score

532

Last updated 4/28/2024


  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided


Model overview

dbrx-base is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. It uses a fine-grained MoE architecture with 132B total parameters, of which 36B are active on any given input. Compared to other open MoE models like Mixtral-8x7B and Grok-1, which use 8 experts and activate 2, dbrx-base uses 16 experts and activates 4, providing 65x more possible expert combinations. This fine-grained approach improves model quality. dbrx-base was pretrained on 12T tokens of carefully curated data, which Databricks estimates is at least 2x better token-for-token than the data used to pretrain the MPT family of models.
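For intuition on that 65x figure: choosing 4 of 16 experts allows C(16,4) = 1820 distinct routing combinations per token, while an 8-expert, 2-active design like Mixtral-8x7B or Grok-1 allows C(8,2) = 28, and 1820 / 28 = 65. A quick check:

```python
from math import comb

# Fine-grained MoE: 16 experts, 4 active per token (DBRX)
dbrx_combinations = comb(16, 4)    # 1820

# Coarser MoE: 8 experts, 2 active per token (e.g. Mixtral-8x7B, Grok-1)
coarse_combinations = comb(8, 2)   # 28

print(dbrx_combinations, coarse_combinations,
      dbrx_combinations // coarse_combinations)  # 1820 28 65
```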

DBRX Instruct is a related model that has been instruction-tuned, specializing in few-turn interactions.

Model inputs and outputs

Inputs

  • dbrx-base accepts only text-based inputs and supports a context length of up to 32,768 tokens.

Outputs

  • dbrx-base produces only text-based outputs (see the loading sketch below).
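A minimal sketch of this text-in/text-out interface, assuming the base checkpoint is published as databricks/dbrx-base on Hugging Face (a gated repo requiring an access token) and that the standard transformers causal-LM API applies; exact loading arguments may vary with your transformers version:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes access to the gated databricks/dbrx-base repo and a valid HF token.
model_id = "databricks/dbrx-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",    # shard the 132B-parameter model across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
)

# Plain text in, plain text out.
inputs = tokenizer("Databricks was founded in", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```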

Capabilities

dbrx-base outperforms established open-source and open-weight base models on benchmarks such as the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. These benchmarks measure performance across a range of tasks, including world knowledge, common-sense reasoning, language understanding, reading comprehension, symbolic problem solving, and programming.

What can I use it for?

dbrx-base and dbrx-instruct are intended for commercial and research use in English. The instruction-tuned dbrx-instruct model can be used as an off-the-shelf model for few-turn question answering related to general English-language and coding tasks. Both models can also be further fine-tuned for various domain-specific natural language and coding tasks.

Things to try

While dbrx-base demonstrates strong performance on a variety of benchmarks, users should exercise judgment and evaluate model outputs for accuracy and appropriateness before using or sharing them, as all foundation models carry risks. Databricks recommends using retrieval-augmented generation (RAG) in scenarios where accuracy and fidelity are important, and performing additional safety testing when fine-tuning the models.
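As a concrete illustration of the RAG pattern recommended above, here is a deliberately simplified sketch: the corpus, the keyword-overlap retrieve function, and build_prompt are hypothetical stand-ins for a real document store, embedding-based retriever, and prompt template, and the resulting prompt would then be passed to dbrx-base or dbrx-instruct for generation.

```python
# Naive RAG sketch: retrieve supporting passages, then ground the prompt in them.
# The corpus and scoring below are placeholders for a real vector store / embedding index.

corpus = [
    "DBRX is a mixture-of-experts LLM with 132B total and 36B active parameters.",
    "DBRX was pretrained on 12T tokens of curated text and code.",
    "DBRX Instruct is fine-tuned for few-turn instruction following.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank passages by crude keyword overlap with the question (illustrative only)."""
    terms = set(question.lower().split())
    scored = sorted(corpus, key=lambda p: -len(terms & set(p.lower().split())))
    return scored[:k]

def build_prompt(question: str) -> str:
    """Assemble a grounded prompt from the retrieved passages."""
    context = "\n".join(f"- {p}" for p in retrieve(question))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("How many active parameters does DBRX use?"))
# Send the resulting prompt to the DBRX model of your choice for generation.
```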



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

🎯

dbrx-instruct

databricks

Total Score

1.0K

dbrx-instruct is a 132 billion parameter mixture-of-experts (MoE) large language model developed by Databricks. It uses a fine-grained MoE architecture with 16 experts, choosing 4 on any given input, which provides 65x more possible expert combinations than other open MoE models like Mixtral-8x7B and Grok-1. This allows dbrx-instruct to achieve higher-quality outputs than those models. dbrx-instruct was pretrained on 12 trillion tokens of carefully curated data, which Databricks estimates is at least 2x better token-for-token than the data used to pretrain the MPT family of models. It uses techniques like curriculum learning, rotary position encodings, gated linear units, and grouped query attention to further improve performance.

Model inputs and outputs

Inputs

  • dbrx-instruct accepts only text-based inputs and supports a context length of up to 32,768 tokens.

Outputs

  • dbrx-instruct produces only text-based outputs.

Capabilities

dbrx-instruct exhibits strong few-turn interaction capabilities, thanks to its fine-grained MoE architecture. It can engage in natural conversations, answer questions, and complete a variety of text-based tasks with high quality.

What can I use it for?

dbrx-instruct can be used for any natural language generation task where a high-performance, open model is needed. This could include building conversational assistants, question-answering systems, text summarization tools, and more. The model's broad capabilities make it a versatile choice for many AI and ML applications.

Things to try

One interesting aspect of dbrx-instruct is its ability to handle long-form inputs and outputs effectively, thanks to its large context window of 32,768 tokens. This makes it well suited for tasks that require processing and generating longer pieces of text, such as summarizing research papers or engaging in multi-turn dialogues. Developers may want to experiment with pushing the boundaries of what the model can do in terms of the length and complexity of the inputs and outputs.
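For few-turn use of this kind, a hedged sketch assuming the instruct checkpoint is published as databricks/dbrx-instruct (also gated) and exposes a chat template through transformers, which is common for instruction-tuned models but should be confirmed against the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # gated repo; requires an accepted license and HF token
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

# Single-turn chat-style prompt, formatted via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what a mixture-of-experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```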


xlm-roberta-base

FacebookAI

Total Score

513

The xlm-roberta-base model is a multilingual version of the RoBERTa transformer model, developed by FacebookAI. It was pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages, building on the innovations of the original RoBERTa model. Like RoBERTa, xlm-roberta-base uses the masked language modeling (MLM) objective, which randomly masks 15% of the words in the input and has the model predict the masked words. This allows the model to learn a robust, bidirectional representation of the sentences. The xlm-roberta-base model can be contrasted with other large multilingual models like BERT-base-multilingual-cased, which was trained on 104 languages but used a simpler pre-training objective. The xlm-roberta-base model aims to provide strong cross-lingual transfer learning capabilities by leveraging a much larger and more diverse training dataset.

Model inputs and outputs

Inputs

  • Text: The xlm-roberta-base model takes natural language text as input.

Outputs

  • Masked word predictions: The primary output of the model is a probability distribution over the vocabulary for each masked token in the input.
  • Contextual text representations: The model can also be used to extract feature representations of the input text, which can be useful for downstream tasks like text classification or sequence labeling.

Capabilities

The xlm-roberta-base model has been shown to perform well on a variety of cross-lingual tasks, outperforming other multilingual models on benchmarks like XNLI and MLQA. It is particularly well-suited for applications that require understanding text in multiple languages, such as multilingual customer support, cross-lingual search, and translation assistance.

What can I use it for?

The xlm-roberta-base model can be fine-tuned on a wide range of downstream tasks, from text classification to question answering. Some potential use cases include:

  • Multilingual text classification: Classify documents, social media posts, or other text into categories like sentiment, topic, or intent, across multiple languages.
  • Cross-lingual search and retrieval: Retrieve relevant documents in one language based on a query in another language.
  • Multilingual question answering: Build systems that can answer questions posed in different languages by leveraging the model's cross-lingual understanding.
  • Multilingual conversational AI: Power chatbots and virtual assistants that can communicate fluently in multiple languages.

Things to try

One interesting aspect of the xlm-roberta-base model is its ability to handle code-switching - the practice of alternating between multiple languages within a single sentence or paragraph. You could experiment with feeding the model text that mixes languages, and observe how well it is able to understand and process the input. Additionally, you could try fine-tuning the model on specialized datasets in different languages to see how it adapts to specific domains and use cases.
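A short fill-mask example that exercises the MLM objective described above; it assumes the checkpoint id FacebookAI/xlm-roberta-base and that the model's mask token is <mask>, per the Hugging Face convention for XLM-R:

```python
from transformers import pipeline

# XLM-R uses <mask> as its mask token.
unmasker = pipeline("fill-mask", model="FacebookAI/xlm-roberta-base")

# The same checkpoint handles many languages without switching models.
print(unmasker("Hello, I'm a <mask> model.")[:2])
print(unmasker("Bonjour, je suis un modèle <mask>.")[:2])
```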


🛸

dolly-v2-3b

databricks

Total Score

281

The dolly-v2-3b is a 2.8 billion parameter causal language model created by Databricks, a leading cloud data and AI company. It is derived from EleutherAI's pythia-2.8b model and fine-tuned on a ~15K record instruction corpus generated by Databricks employees. This makes dolly-v2-3b an instruction-following model, trained to perform a variety of tasks like brainstorming, classification, QA, generation, and summarization. While dolly-v2-3b is not a state-of-the-art model, it exhibits surprisingly high-quality instruction-following behavior compared to its foundation model. Databricks has also released larger versions of the Dolly model, including dolly-v2-12b and dolly-v2-7b, which leverage larger pretrained models from EleutherAI.

Model inputs and outputs

Inputs

  • Instruction: The model takes a natural language instruction as input, which can cover a wide range of tasks like question answering, text generation, language understanding, and more.

Outputs

  • Generated text: The model generates text in response to the given instruction. The output length and quality will depend on the complexity of the instruction and the model's capabilities.

Capabilities

The dolly-v2-3b model demonstrates strong instruction-following behavior across a variety of domains, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. For example, it can generate coherent and relevant responses to prompts like "Write a press release announcing a new underwater research facility" or "Classify these sentences as positive, negative, or neutral in sentiment."

What can I use it for?

The dolly-v2-3b model can be a valuable tool for developers and researchers working on natural language processing applications that require instruction-following or generation capabilities. Some potential use cases include:

  • Chatbots and virtual assistants: The model's ability to understand and respond to natural language instructions can be leveraged to build more engaging and capable conversational AI systems.
  • Content generation: dolly-v2-3b can be used to generate a wide range of text-based content, from creative writing to technical documentation, based on high-level instructions.
  • Task automation: The model can be used to automate various text-based tasks, like research, summarization, and data extraction, by translating high-level instructions into concrete actions.

Things to try

One key capability of dolly-v2-3b is its ability to follow complex instructions and generate coherent responses, even for tasks that may be outside the scope of its training data. For example, you can try providing the model with instructions that require reasoning, such as "Explain the difference between nuclear fission and fusion in a way that a 10-year-old would understand." The model's ability to break down technical concepts and explain them clearly is an impressive feature. Another interesting aspect to explore is the model's performance on open-ended tasks, where the instruction leaves room for creative interpretation. For instance, you could try prompting the model with "Write a short story about a robot who discovers their true purpose" and see how it generates an engaging narrative.
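A sketch of instruction-following use through the transformers pipeline, following the pattern described in the Dolly model card; trust_remote_code=True loads the repo's custom instruction-text-generation pipeline, so treat this as an example to adapt rather than a drop-in recipe:

```python
import torch
from transformers import pipeline

# The custom instruction pipeline code lives in the model repo,
# hence trust_remote_code=True.
generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

result = generate_text(
    "Explain the difference between nuclear fission and fusion "
    "in a way that a 10-year-old would understand."
)
print(result[0]["generated_text"])
```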


dolly-v2-7b

databricks

Total Score

146

dolly-v2-7b is a 6.9 billion parameter causal language model created by Databricks that is derived from EleutherAI's Pythia-6.9b and fine-tuned on a ~15K record instruction corpus generated by Databricks employees. It is designed to exhibit high-quality instruction-following behavior, though it is not considered a state-of-the-art model. Dolly v2 is also available in other sizes, including the larger dolly-v2-12b and the smaller dolly-v2-3b.

Model inputs and outputs

dolly-v2-7b is an instruction-following language model, meaning it takes natural language instructions as input and generates corresponding text responses. The model was trained on a diverse set of instruction-response pairs, allowing it to handle a wide range of tasks such as brainstorming, classification, question answering, text generation, and summarization.

Inputs

  • Natural language instructions or prompts

Outputs

  • Text responses that complete the given instruction or prompt

Capabilities

dolly-v2-7b exhibits strong performance on instruction-following tasks, capable of generating coherent and relevant responses across a variety of domains. For example, it can help with brainstorming ideas, providing summaries of text, answering questions, and generating text on specified topics. However, the model is not state-of-the-art and has limitations, such as struggling with complex prompts, mathematical operations, and open-ended question answering.

What can I use it for?

dolly-v2-7b could be useful for a variety of applications that involve natural language processing and generation, such as:

  • Content creation: Generating text for blog posts, marketing materials, or other written content
  • Question answering: Providing informative responses to user questions on a wide range of topics
  • Task assistance: Helping with brainstorming, research, or other open-ended tasks that require text generation

However, it's important to keep in mind the model's limitations and use it accordingly. The model may not be suitable for high-stakes or safety-critical applications.

Things to try

One interesting aspect of dolly-v2-7b is its ability to exhibit instruction-following behavior that is more advanced than its underlying foundation model, Pythia-6.9b. This suggests that fine-tuning on a focused dataset can meaningfully improve a model's capabilities in specific domains, even if it does not outperform more recent state-of-the-art models. Experimenting with different prompts and task types could reveal interesting insights about the model's strengths and weaknesses. Additionally, comparing the performance of dolly-v2-7b to the larger dolly-v2-12b and smaller dolly-v2-3b models could provide useful information about the relationship between model size and instruction-following capabilities.
