
Llava V1.6 Vicuna 7b

yorickvp

LLaVA v1.6: Large Language and Vision Assistant (Vicuna-7B) is an image-to-text model that interprets and describes input images. The user supplies an image URL and a prompt, such as "What is unusual about this image?", and the model returns a string of text that describes and interprets the image in response to the prompt, potentially identifying unusual or noteworthy details. For the example prompt, it describes an unusual and potentially dangerous situation: a person ironing clothes atop a moving vehicle.
Image-to-Text
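
The input/output contract described above (an image URL plus a prompt in, a string of text out) can be sketched in Python. This is a minimal sketch, not the creator's code: it assumes the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set, that the model reference is `yorickvp/llava-v1.6-vicuna-7b` (Replicate may also require a pinned version suffix, which is not given here), and that the input fields are named `image` and `prompt`. The helper names `build_input` and `describe_image` are hypothetical. Check the API spec on Replicate before relying on any of these.

```python
from urllib.parse import urlparse

# Assumed model reference; a ":<version>" suffix may be needed on Replicate.
MODEL_REF = "yorickvp/llava-v1.6-vicuna-7b"


def build_input(image_url: str, prompt: str) -> dict:
    """Validate the two inputs and assemble the request payload."""
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) image URL: {image_url!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Field names "image" and "prompt" are assumptions, not confirmed here.
    return {"image": image_url, "prompt": prompt.strip()}


def describe_image(image_url: str, prompt: str, model_ref: str = MODEL_REF) -> str:
    """Send the payload to the model and return its text description."""
    # Imported lazily so build_input works without the third-party client.
    import replicate

    output = replicate.run(model_ref, input=build_input(image_url, prompt))
    # The client may return a single string or an iterable of text chunks.
    return output if isinstance(output, str) else "".join(output)
```

Calling `describe_image("https://example.com/photo.jpg", "What is unusual about this image?")` would perform a billed run on Replicate, so it is worth checking the payload with `build_input` first. The example URL is a placeholder.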

Pricing

Cost per run: $- USD
Avg run time: - seconds
Prediction hardware: -

Creator Models

Model | Cost | Runs
Temporalnet Sdxl | $? | 117
Llava V1.6 Vicuna 13b | $? | 175,659
Jina Embeddings V2 Base En | $? | 38
Jina Embeddings V2 Small En | $? | 32
Llava V1.6 34b | $? | 223,157

Try it!

You can use this area to play around with demo applications that incorporate the Llava V1.6 Vicuna 7b model. These demos are maintained and hosted externally by third-party creators. If you see an error, message me on Twitter.

Currently, there are no demos available for this model.

Overview

Summary of this model and related resources.

Property | Value
Creator | yorickvp
Model Name | Llava V1.6 Vicuna 7b
Description | LLaVA v1.6: Large Language and Vision Assistant (Vicuna-7B)
Tags | Image-to-Text
Model Link | View on Replicate
API Spec | View on Replicate
Github Link | View on Github
Paper Link | No paper link provided

Popularity

How popular is this model, by number of runs? How popular is the creator, by the sum of all their runs?

Property | Value
Runs | 5,168
Model Rank | -
Creator Rank | -

Cost

How much does it cost to run this model? How long, on average, does it take to complete a run?

Property | Value
Cost per Run | $-
Prediction Hardware | -
Average Completion Time | -