Maintainer: microsoft



Last updated 5/28/2024


Model Link: View on HuggingFace
API Spec: View on HuggingFace
Github Link: No Github link provided
Paper Link: No paper link provided


Model overview

llava-med-7b-delta is a large language-and-vision assistant for the biomedical domain, developed by researchers at Microsoft and based on the LLaMA model. It was initialized from the general-domain LLaVA and then trained further with a two-stage curriculum: first biomedical concept alignment, then full instruction tuning.
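The two-stage curriculum described above can be pictured as an ordered list of training stages, each with its own data and its own set of trainable modules. The sketch below is illustrative only: the module names (`vision_encoder`, `projector`, `language_model`), the dataset labels, and the choice of which modules are updated in each stage are assumptions made for this example and are not stated in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    data: str             # hypothetical dataset label
    trainable: frozenset  # hypothetical names of modules updated in this stage


# Assumption: which modules are trained per stage. The summary only names the
# two stages and their order, not what is frozen in each.
CURRICULUM = (
    Stage("concept_alignment", "biomedical image-caption pairs",
          frozenset({"projector"})),
    Stage("instruction_tuning", "biomedical instruction-following data",
          frozenset({"projector", "language_model"})),
)

ALL_MODULES = ("vision_encoder", "projector", "language_model")


def freeze_plan(stage):
    """Map each module name to True if it is updated during this stage."""
    return {module: module in stage.trainable for module in ALL_MODULES}


def run_curriculum(stages, train_fn):
    """Call train_fn once per stage, in order, and collect the results."""
    return [train_fn(stage, freeze_plan(stage)) for stage in stages]


if __name__ == "__main__":
    # A stand-in for a real training loop: just report what would be trained.
    def report(stage, plan):
        return stage.name, sorted(m for m, on in plan.items() if on)

    for name, modules in run_curriculum(CURRICULUM, report):
        print(name, "->", modules)
```

The point of the structure is that the order of stages is explicit and each stage carries its own freeze plan, so a real training function could be swapped in for `report` without changing the schedule.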

This model is similar to other medical-focused language models like MedAlpaca 13b and MedAlpaca 7b, which are also fine-tuned on medical datasets to improve performance on tasks like question answering and medical dialogue. However, llava-med-7b-delta goes beyond text-only capabilities: because it builds on the general-domain LLaVA model, it can also interpret images.

The model was also trained on the PMC-15M dataset, a large-scale parallel image-text dataset for biomedical vision-language processing, which is the same dataset used to train the BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 model.

Model inputs and outputs


Inputs

  • Images: The model can accept images as input, enabling it to perform visual reasoning and understanding tasks in the biomedical domain.
  • Text: The model can also accept text input, allowing it to engage in language-based interactions and tasks.


Outputs

  • Text generation: The model can generate relevant and coherent text in response to prompts, leveraging its biomedical knowledge.
  • Multimodal understanding: The model can combine its understanding of both images and text to perform tasks like visual question answering or image captioning.


Capabilities

llava-med-7b-delta is strongest on biomedical tasks that require both language and visual understanding. For example, it can describe the contents of a medical image, answer questions about a radiological scan, or outline the steps of a medical procedure.

The model's visual understanding capabilities are a key strength, allowing it to excel at tasks like interpreting medical images and diagrams. This sets it apart from language-only models that may struggle with visual inputs.

What can I use it for?

Researchers and developers working on biomedical applications could use llava-med-7b-delta for a variety of projects, such as:

  • Medical image analysis: The model could be used to build tools that analyze medical images, such as X-rays or MRI scans, and provide insights or recommendations.
  • Biomedical question answering: The model could be integrated into chatbots or virtual assistants to answer questions about medical conditions, treatments, or procedures.
  • Multimodal medical education: The model could be used to create interactive learning experiences that combine text and images to teach medical concepts.

However, it's important to note that the model should only be used for research purposes and not for any clinical or deployed applications, as it has not been thoroughly tested for real-world use.

Things to try

One interesting aspect of llava-med-7b-delta is its ability to combine visual and language understanding to tackle complex biomedical tasks. For example, you could try prompting the model with a medical image and asking it to provide a step-by-step explanation of the procedure or condition depicted. This would showcase the model's capacity to integrate its knowledge of both visual and textual information.
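As a minimal sketch of the experiment described above, the snippet below assembles a single-turn prompt that pairs an image placeholder with a question and asks for a step-by-step explanation. The `<image>` placeholder, the `USER:`/`ASSISTANT:` template, the system message, and the helper names are all assumptions for illustration; the actual prompt format should be taken from the model's own documentation.

```python
IMAGE_TOKEN = "<image>"  # assumed placeholder; check the model's own prompt format


def step_by_step(question):
    """Append an instruction asking for numbered reasoning steps."""
    return question.rstrip() + " Explain your answer step by step, numbering each step."


def build_prompt(question,
                 system="You are a biomedical vision-language assistant.",
                 with_image=True):
    """Assemble a single-turn prompt string (illustrative template only)."""
    user_turn = f"{IMAGE_TOKEN}\n{question}" if with_image else question
    return f"{system}\n\nUSER: {user_turn}\nASSISTANT:"


if __name__ == "__main__":
    question = step_by_step("What abnormality is visible in this chest X-ray?")
    print(build_prompt(question))
```

Keeping the question wording separate from the template makes it easy to compare how the same image is handled with and without the step-by-step instruction.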

Another avenue to explore would be using the model for exploratory tasks, such as generating captions or plain-language explanations for medical illustrations and diagrams. Because the model produces text rather than images, this works in the image-to-text direction, and it could inspire new ways of communicating biomedical concepts.

Ultimately, the versatility of llava-med-7b-delta makes it a valuable tool for researchers and developers working to advance the state of the art in biomedical artificial intelligence.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models





The LLaVA-13b-delta-v0 model is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an autoregressive language model based on the transformer architecture. The model was developed by liuhaotian, who has also created similar models such as llava-v1.6-mistral-7b. It is also related to llava-med-7b-delta.

Model inputs and outputs

The LLaVA-13b-delta-v0 model is a language model that can generate human-like text given a prompt. It also has multimodal capabilities, allowing it to generate text based on both textual and visual inputs.

Inputs

  • Text prompts: The model can accept text prompts to generate relevant responses.
  • Images: The model can also accept images as part of the input, allowing it to generate text describing or relating to the provided image.

Outputs

  • Textual responses: The primary output of the model is human-like textual responses to the provided prompts or image-text combinations.

Capabilities

The LLaVA-13b-delta-v0 model has been trained to engage in open-ended conversation, answer questions, and describe images. It demonstrates strong language understanding and generation capabilities, as well as the ability to reason about and describe visual information. The model can be particularly useful for research on large multimodal models and chatbots.

What can I use it for?

The primary intended use of the LLaVA-13b-delta-v0 model is for research on large multimodal models and chatbots. Researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence may find this model useful for exploring various multimodal applications and advancing the state of the art in these fields.

Things to try

Some interesting things to try with the LLaVA-13b-delta-v0 model include:

  • Evaluating the model's ability to understand and describe complex visual scenes by providing it with a diverse set of images.
  • Exploring the model's language understanding and generation capabilities by engaging it in open-ended conversations on a variety of topics.
  • Investigating the model's reasoning abilities by asking it to answer questions that require combining information from both text and visual inputs.
  • Experimenting with different prompting strategies to see how the model's responses can be tailored for specific use cases or applications.



Llama3-OpenBioLLM-8B is an advanced open-source language model designed specifically for the biomedical domain. Developed by Saama AI Labs, this model leverages cutting-edge techniques to achieve state-of-the-art performance on a wide range of biomedical tasks. It builds upon the powerful foundations of the Meta-Llama-3-8B model, incorporating the DPO dataset and fine-tuning recipe along with a custom diverse medical instruction dataset.

Compared to Llama3-OpenBioLLM-70B, the 8B version has a smaller parameter count but still outperforms other open-source biomedical language models of similar scale. It has also demonstrated better results compared to larger proprietary and open-source models like GPT-3.5 on biomedical benchmarks.

Model inputs and outputs

Inputs

  • Text data from the biomedical domain, such as research papers, clinical notes, and medical literature.

Outputs

  • Generated text responses to biomedical queries, questions, and prompts.
  • Summarization of complex medical information.
  • Extraction of biomedical entities, such as diseases, symptoms, and treatments.
  • Classification of medical documents and data.

Capabilities

Llama3-OpenBioLLM-8B can efficiently analyze and summarize clinical notes, extract key medical information, answer a wide range of biomedical questions, and perform advanced clinical entity recognition. The model's strong performance on domain-specific tasks, such as Medical Genetics and PubMedQA, highlights its ability to effectively capture and apply biomedical knowledge.

What can I use it for?

Llama3-OpenBioLLM-8B can be a valuable tool for researchers, clinicians, and developers working in the healthcare and life sciences fields. It can be used to accelerate medical research, improve clinical decision-making, and enhance access to biomedical knowledge. Some potential use cases include:

  • Summarizing complex medical records and literature
  • Answering medical queries and providing information to patients or healthcare professionals
  • Extracting relevant biomedical entities from text
  • Classifying medical documents and data
  • Generating medical reports and content

Things to try

One interesting aspect of Llama3-OpenBioLLM-8B is its ability to leverage its deep understanding of medical terminology and context to accurately annotate and categorize clinical entities. This capability can support various downstream applications, such as clinical decision support, pharmacovigilance, and medical research. You could try experimenting with the model's entity recognition abilities on your own biomedical text data to see how it performs.

Another interesting feature is the model's strong performance on biomedical question-answering tasks, such as PubMedQA. You could try prompting the model with a range of medical questions and see how it responds, paying attention to the level of detail and accuracy in the answers.
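One way to make the question-answering experiment above more systematic is to score the model's free-text answers against gold labels. The sketch below assumes PubMedQA-style yes/no/maybe labels and a deliberately naive rule that takes the first such word in each answer; the helper names and the sample answers are hypothetical and no real model output is shown.

```python
import re

LABELS = ("yes", "no", "maybe")


def extract_label(answer):
    """Return the first yes/no/maybe word in a free-text answer, else None.

    Naive on purpose: an answer like "There is no doubt, yes" is read as "no".
    """
    match = re.search(r"\b(yes|no|maybe)\b", answer.lower())
    return match.group(1) if match else None


def accuracy(answers, gold):
    """Fraction of answers whose extracted label equals the gold label."""
    if len(answers) != len(gold):
        raise ValueError("answers and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_label(a) == g for a, g in zip(answers, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical answers, written by hand for illustration.
    answers = ["Yes, the trial showed a benefit.", "Maybe, more data is needed.", "No."]
    gold = ["yes", "maybe", "yes"]
    print(f"accuracy = {accuracy(answers, gold):.2f}")
```

Answers that contain no recognizable label count as wrong, which keeps the score conservative when the model gives long or evasive responses.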



Llama3-OpenBioLLM-70B is an advanced open-source biomedical large language model developed by Saama AI Labs. It builds upon the powerful foundations of the Meta-Llama-3-70B-Instruct model, incorporating novel training techniques like Direct Preference Optimization to achieve state-of-the-art performance on a wide range of biomedical tasks. Compared to other open-source models like Meditron-70B and proprietary models like GPT-4, it demonstrates superior results on biomedical benchmarks.

Model inputs and outputs

Inputs

  • Llama3-OpenBioLLM-70B is a text-to-text model, taking in textual inputs only.

Outputs

  • The model generates fluent and coherent text responses, suitable for a variety of natural language processing tasks in the biomedical domain.

Capabilities

Llama3-OpenBioLLM-70B is designed for specialized performance on biomedical tasks. It excels at understanding and generating domain-specific language, allowing for accurate responses to queries about medical conditions, treatments, and research. The model's advanced training techniques enable it to outperform other open-source and proprietary language models on benchmarks evaluating tasks like medical exam question answering, disease information retrieval, and supporting differential diagnosis.

What can I use it for?

Llama3-OpenBioLLM-70B is well-suited for a variety of biomedical applications, such as powering virtual assistants to enhance clinical decision-making, providing general health information to the public, and supporting research efforts by automating tasks like literature review and hypothesis generation. Its strong performance on biomedical benchmarks suggests it could be a valuable tool for developers and researchers working in the life sciences and healthcare fields.

Things to try

Developers can explore using Llama3-OpenBioLLM-70B as a foundation for building custom biomedical natural language processing applications. The model's specialized knowledge and capabilities could be leveraged to create chatbots, question-answering systems, and text generation tools tailored to the needs of the medical and life sciences communities. Additionally, the model could be further fine-tuned on domain-specific datasets to optimize it for specific biomedical use cases.



The llava-v1.6-34b is an open-source chatbot developed by liuhaotian that is trained by fine-tuning a large language model (LLM) on multimodal instruction-following data. It is based on the transformer architecture and uses NousResearch/Nous-Hermes-2-Yi-34B as its base LLM. The model is part of the LLaVA family, which includes similar versions like llava-v1.5-13b, llava-v1.5-7b, llava-v1.6-mistral-7b, and LLaVA-13b-delta-v0. These models differ in their base LLM, training dataset, and model size.

Model inputs and outputs

Inputs

  • The model accepts natural language instructions and prompts as input.
  • It can also accept image data as input for multimodal tasks.

Outputs

  • The model generates human-like responses in natural language, including for multimodal tasks that take an image as input.

Capabilities

The llava-v1.6-34b model has been trained to engage in a wide range of tasks, including natural language processing, computer vision, and multimodal reasoning. It has shown strong performance on tasks such as answering complex questions, following detailed instructions, and describing images.

What can I use it for?

The primary use of the llava-v1.6-34b model is for research on large multimodal models and chatbots. It can be particularly useful for researchers and hobbyists working in computer vision, natural language processing, machine learning, and artificial intelligence. Some potential use cases for the model include:

  • Building chatbots and virtual assistants with multimodal capabilities
  • Developing visual question answering systems
  • Exploring new techniques for instruction-following in language models
  • Advancing research on multimodal reasoning and understanding

Things to try

One interesting aspect of the llava-v1.6-34b model is its ability to combine text and image data to perform complex tasks. Researchers could experiment with using the model to describe images in detail, or to answer questions that require both visual and linguistic understanding.

Another area to explore is the model's performance on tasks that require strong reasoning and problem-solving skills, such as scientific question answering or task-oriented dialogue. By probing the model's capabilities in these areas, researchers can gain valuable insights into the strengths and limitations of large multimodal language models.
