
Databricks

Models by this creator

📶

dolly-v2-12b

databricks

Total Score

1.9K

dolly-v2-12b is a 12 billion parameter causal language model created by Databricks. It is derived from EleutherAI's Pythia-12b and fine-tuned on a ~15K record instruction corpus generated by Databricks employees, and it exhibits instruction-following behavior beyond what is typical of the foundation model it is based on. Dolly v2 is also available in smaller 7B and 3B parameter sizes, and all Dolly models are licensed for commercial use.

Model Inputs and Outputs

Inputs
- Text prompts that the model should follow as instructions

Outputs
- Textual responses generated by the model based on the provided instruction

Capabilities

dolly-v2-12b demonstrates strong performance on a range of instruction-following tasks, including brainstorming, classification, closed-ended QA, generation, information extraction, open-ended QA, and summarization. It outperforms its foundation model, pythia-12b, on these capabilities.

What Can I Use It For?

The dolly-v2-12b model can be useful for a variety of commercial and research applications that require following open-ended instructions, such as:
- Building conversational AI assistants that can engage in helpful, open-ended dialogue
- Automating content generation tasks like summarization, question answering, and creative writing
- Developing applications that require robust language understanding and generation capabilities

Things to Try

One interesting aspect of dolly-v2-12b is its ability to follow instructions that require reasoning and multi-step problem solving, going beyond simple language generation. Developers could experiment with using the model for tasks like code generation, task planning, and complex data analysis, and compare how it performs against other instruction-following language models.
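
A minimal sketch of prompting dolly-v2-12b through the standard Hugging Face transformers pipeline is shown below. It assumes the published databricks/dolly-v2-12b checkpoint, a GPU large enough for a 12B parameter model, and an illustrative prompt; trust_remote_code=True is required because Dolly ships its own instruction-following pipeline code alongside the weights.

```python
# Minimal sketch: instruction prompting with dolly-v2-12b via the transformers pipeline.
# Assumes transformers + accelerate are installed and a GPU with enough memory for 12B params.
import torch
from transformers import pipeline

generate_text = pipeline(
    model="databricks/dolly-v2-12b",  # published Hugging Face checkpoint
    torch_dtype=torch.bfloat16,       # halves memory relative to fp32
    trust_remote_code=True,           # loads Dolly's bundled instruction pipeline
    device_map="auto",                # spreads layers across available devices
)

# Illustrative instruction-style prompt
result = generate_text(
    "Explain the difference between nuclear fission and fusion "
    "in a way a 10-year-old would understand."
)
print(result[0]["generated_text"])
```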


Updated 5/17/2024

🎯

dbrx-instruct

databricks

Total Score

1.0K

dbrx-instruct is a mixture-of-experts (MoE) large language model developed by Databricks, with 132 billion total parameters of which 36 billion are active on any input. It uses a fine-grained MoE architecture with 16 experts, 4 of which are chosen for any given input, providing 65x more possible expert combinations than other open MoE models such as Mixtral-8x7B and Grok-1; Databricks found that this fine-grained approach improves model quality. dbrx-instruct was pretrained on 12 trillion tokens of carefully curated data, which Databricks estimates is at least 2x better token-for-token than the data used to pretrain the MPT family of models. It also uses techniques such as curriculum learning, rotary position encodings, gated linear units, and grouped query attention to further improve performance.

Model inputs and outputs

Inputs
- dbrx-instruct accepts only text-based inputs, with a context length of up to 32,768 tokens.

Outputs
- dbrx-instruct produces only text-based outputs.

Capabilities

dbrx-instruct is instruction-tuned and specializes in few-turn interactions. It can engage in natural conversations, answer questions, and complete a variety of text-based tasks with high quality.

What can I use it for?

dbrx-instruct can be used for any natural language generation task where a high-performance, open-weight model is needed. This could include building conversational assistants, question-answering systems, text summarization tools, and more. The model's broad capabilities make it a versatile choice for many AI and ML applications.

Things to try

One interesting aspect of dbrx-instruct is its ability to handle long-form inputs and outputs, thanks to its 32,768-token context window. This makes it well suited to tasks that require processing and generating longer pieces of text, such as summarizing research papers or engaging in multi-turn dialogues. Developers may want to experiment with pushing the boundaries of the length and complexity of the model's inputs and outputs.
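
Because dbrx-instruct is tuned for few-turn chat, the usual way to prompt it is through the tokenizer's chat template. The sketch below is an assumption-laden outline rather than the canonical recipe: it uses the gated databricks/dbrx-instruct checkpoint on Hugging Face (access must be requested), assumes multiple large GPUs for the 132B-total / 36B-active weights, and the prompt is illustrative.

```python
# Minimal sketch: few-turn chat with dbrx-instruct via transformers.
# Assumes access to the gated databricks/dbrx-instruct repo and several 80GB-class GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Build the prompt with the model's chat template (a single user turn here).
messages = [{"role": "user", "content": "Summarize the key ideas behind mixture-of-experts language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```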


Updated 4/28/2024

📈

dbrx-base

databricks

Total Score

532

dbrx-base is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. It uses a fine-grained MoE architecture with 132B total parameters, of which 36B are active on any input. Compared to other open MoE models like Mixtral-8x7B and Grok-1, dbrx-base has 16 experts and chooses 4, providing 65x more possible expert combinations; this fine-grained approach improves model quality. dbrx-base was pretrained on 12T tokens of carefully curated data, which Databricks estimates is 2x better token-for-token than the data used for its MPT models. DBRX Instruct is a related model that has been instruction-tuned, specializing in few-turn interactions.

Model inputs and outputs

Inputs
- dbrx-base accepts only text-based inputs, with a context length of up to 32,768 tokens.

Outputs
- dbrx-base produces only text-based outputs.

Capabilities

dbrx-base outperforms established open-source and open-weight base models on benchmarks such as the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval, which measure performance across a range of tasks including world knowledge, common sense reasoning, language understanding, reading comprehension, symbolic problem solving, and programming.

What can I use it for?

dbrx-base and dbrx-instruct are intended for commercial and research use in English. The instruction-tuned dbrx-instruct model can be used as an off-the-shelf model for few-turn question answering on general English-language and coding tasks. Both models can also be further fine-tuned for domain-specific natural language and coding tasks.

Things to try

While dbrx-base performs strongly on a variety of benchmarks, users should exercise judgment and evaluate model outputs for accuracy and appropriateness before using or sharing them, as all foundation models carry risks. Databricks recommends using retrieval-augmented generation (RAG) in scenarios where accuracy and fidelity are important, and performing additional safety testing when fine-tuning the models.
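
Unlike the instruct variant, dbrx-base is a raw completion model: it simply continues whatever text it is given, with no chat template. A minimal sketch follows, assuming the gated databricks/dbrx-base checkpoint and hardware sized for 36B active parameters; the prompt and sampling settings are illustrative.

```python
# Minimal sketch: plain text completion with dbrx-base (no chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# The base model continues the given text rather than answering an instruction.
prompt = "A mixture-of-experts language model works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```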


Updated 4/28/2024

🤔

dolly-v1-6b

databricks

Total Score

309

dolly-v1-6b is a 6 billion parameter causal language model developed by Databricks that is derived from EleutherAI's GPT-J model and fine-tuned on a ~52K record instruction corpus from the Stanford Alpaca dataset. This model demonstrates that a relatively small amount of fine-tuning on a focused dataset can imbue an existing language model with surprisingly high-quality instruction-following capabilities. The Dolly model family represents Databricks' first steps towards democratizing powerful AI technologies. Dolly v2, which includes larger model sizes, has since been released and is recommended over this initial v1 model.

Model Inputs and Outputs

Inputs
- Text prompts: dolly-v1-6b accepts natural language text prompts as input, which it uses to generate relevant output text.

Outputs
- Textual responses: given an input prompt, the model generates a textual response attempting to follow the instructions or answer the query posed in the prompt.

Capabilities

dolly-v1-6b exhibits surprisingly high-quality instruction-following behavior compared to its GPT-J foundation model, despite being fine-tuned in just 30 minutes on a relatively small dataset. This suggests that the ability to create powerful AI technologies is more accessible than previously thought.

What Can I Use It For?

The Dolly model family is intended for research, experimentation, and the development of creative or educational tools that leverage language model capabilities. Potential use cases include generating text-based content, answering questions, and following instructions, though the model may exhibit biases or limitations in certain domains.

Things to Try

Since dolly-v1-6b is derived from an older GPT-J model, it may not exhibit the same level of performance or capability as more recent, larger language models. Experimenting with prompts and evaluating the model's outputs can help uncover its strengths and limitations. Additionally, exploring the newer Dolly v2 models can provide insight into how fine-tuning and scaling enhance a model's instruction-following abilities.
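
Because dolly-v1-6b was fine-tuned on Alpaca-style instruction/response records, a common way to prompt it is with that instruction template. The sketch below is an approximation: the databricks/dolly-v1-6b model id is the published checkpoint, but the exact template text and generation settings are assumptions, so consult the model card for the canonical prompting helpers.

```python
# Minimal sketch: prompting dolly-v1-6b with an Alpaca-style instruction template.
# The template mirrors the Stanford Alpaca format the model was fine-tuned on;
# treat it as an approximation of the model card's official helpers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dolly-v1-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

prompt = PROMPT.format(instruction="List three uses of a language model in education.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens so only the generated response is printed.
response = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```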


Updated 5/17/2024

🛸

dolly-v2-3b

databricks

Total Score

281

The dolly-v2-3b is a 2.8 billion parameter causal language model created by Databricks, a leading cloud data and AI company. It is derived from EleutherAI's pythia-2.8b model and fine-tuned on a ~15K record instruction corpus generated by Databricks employees. This makes dolly-v2-3b an instruction-following model, trained to perform a variety of tasks like brainstorming, classification, QA, generation, and summarization. While dolly-v2-3b is not a state-of-the-art model, it exhibits surprisingly high-quality instruction-following behavior compared to its foundation model. Databricks has also released larger versions of the Dolly model, including dolly-v2-12b and dolly-v2-7b, which are built on larger pretrained models from EleutherAI.

Model inputs and outputs

Inputs
- Instruction: the model takes a natural language instruction as input, which can cover a wide range of tasks like question answering, text generation, language understanding, and more.

Outputs
- Generated text: the model generates text in response to the given instruction. The output length and quality depend on the complexity of the instruction and the model's capabilities.

Capabilities

The dolly-v2-3b model demonstrates strong instruction-following behavior across a variety of domains, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. For example, it can generate coherent and relevant responses to prompts like "Write a press release announcing a new underwater research facility" or "Classify these sentences as positive, negative, or neutral in sentiment."

What can I use it for?

The dolly-v2-3b model can be a valuable tool for developers and researchers working on natural language processing applications that require instruction-following or generation capabilities. Some potential use cases include:
- Chatbots and virtual assistants: the model's ability to understand and respond to natural language instructions can be leveraged to build more engaging and capable conversational AI systems.
- Content generation: dolly-v2-3b can be used to generate a wide range of text-based content, from creative writing to technical documentation, based on high-level instructions.
- Task automation: the model can be used to automate various text-based tasks, like research, summarization, and data extraction, by translating high-level instructions into concrete actions.

Things to try

One key capability of dolly-v2-3b is its ability to follow complex instructions and generate coherent responses, even for tasks that may be outside the scope of its training data. For example, you can try providing the model with instructions that require reasoning, such as "Explain the difference between nuclear fission and fusion in a way that a 10-year-old would understand." The model's ability to break down technical concepts and explain them clearly is an impressive feature. Another interesting aspect to explore is the model's performance on open-ended tasks, where the instruction leaves room for creative interpretation. For instance, you could try prompting the model with "Write a short story about a robot who discovers their true purpose" and see how it generates an engaging narrative.
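
As a concrete example of the classification-style instructions mentioned above, the sketch below sends a sentiment-labeling prompt to the 3B checkpoint through the same transformers pipeline pattern used for the larger Dolly models. The model id databricks/dolly-v2-3b is the published checkpoint; the instruction and sentences are illustrative.

```python
# Minimal sketch: instruction-style sentiment classification with dolly-v2-3b.
import torch
from transformers import pipeline

generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # loads Dolly's bundled instruction-following pipeline
    device_map="auto",
)

instruction = (
    "Classify each sentence as positive, negative, or neutral in sentiment:\n"
    "1. The new dashboard is much faster than the old one.\n"
    "2. The meeting was rescheduled to Thursday.\n"
    "3. The upgrade broke our nightly jobs."
)
result = generate_text(instruction)
print(result[0]["generated_text"])
```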


Updated 5/17/2024

dolly-v2-7b

databricks

Total Score

146

dolly-v2-7b is a 6.9 billion parameter causal language model created by Databricks that is derived from EleutherAI's Pythia-6.9b and fine-tuned on a ~15K record instruction corpus generated by Databricks employees. It is designed to exhibit high-quality instruction-following behavior, though it is not considered a state-of-the-art model. Dolly v2 is also available in other model sizes, including the larger dolly-v2-12b and the smaller dolly-v2-3b.

Model inputs and outputs

dolly-v2-7b is an instruction-following language model, meaning it takes natural language instructions as input and generates corresponding text responses. The model was trained on a diverse set of instruction-response pairs, allowing it to handle a wide range of tasks such as brainstorming, classification, question answering, text generation, and summarization.

Inputs
- Natural language instructions or prompts

Outputs
- Text responses that complete the given instruction or prompt

Capabilities

dolly-v2-7b performs well on instruction-following tasks, generating coherent and relevant responses across a variety of domains. For example, it can help with brainstorming ideas, summarizing text, answering questions, and generating text on specified topics. However, the model is not state of the art and has limitations, such as struggling with complex prompts, mathematical operations, and open-ended question answering.

What can I use it for?

dolly-v2-7b could be useful for a variety of applications that involve natural language processing and generation, such as:
- Content creation: generating text for blog posts, marketing materials, or other written content
- Question answering: providing informative responses to user questions on a wide range of topics
- Task assistance: helping with brainstorming, research, or other open-ended tasks that require text generation

However, keep the model's limitations in mind and use it accordingly; it may not be suitable for high-stakes or safety-critical applications.

Things to try

One interesting aspect of dolly-v2-7b is that its instruction-following behavior is more advanced than that of its underlying foundation model, Pythia-6.9b. This suggests that fine-tuning on a focused dataset can meaningfully improve a model's capabilities in specific domains, even if it does not outperform more recent state-of-the-art models. Experimenting with different prompts and task types can reveal the model's strengths and weaknesses. Additionally, comparing the performance of dolly-v2-7b to the larger dolly-v2-12b and smaller dolly-v2-3b models can show how model size relates to instruction-following capability.
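
The size comparison suggested above is straightforward to script: run the same instruction through two Dolly v2 checkpoints and compare the outputs. The sketch below is one way to do that, assuming enough GPU memory to hold one model at a time; the model ids are the published checkpoints and the prompt is illustrative.

```python
# Minimal sketch: comparing dolly-v2-3b and dolly-v2-7b on the same instruction.
# Models are loaded one at a time and freed between runs to limit peak GPU memory.
import gc
import torch
from transformers import pipeline

instruction = (
    "Summarize, in two sentences, why instruction tuning changes "
    "a base language model's behavior."
)

for model_id in ["databricks/dolly-v2-3b", "databricks/dolly-v2-7b"]:
    generate_text = pipeline(
        model=model_id,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )
    result = generate_text(instruction)
    print(f"=== {model_id} ===")
    print(result[0]["generated_text"])

    # Free the current model before loading the next size.
    del generate_text
    gc.collect()
    torch.cuda.empty_cache()
```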


Updated 5/17/2024