Llava 13b



llava-13b is a multimodal (image and text to text) model that interprets a given image and responds to an associated text prompt. The model, described as having GPT-4 level capabilities, analyzes the image at the provided URL together with the text prompt to generate a response. Its input schema takes the image URL, a text prompt, the maximum number of tokens to generate, and a temperature that controls response variability. The output is a text response to the prompt, grounded in the contents of the image.
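To make the input schema concrete, here is a minimal Python sketch that validates the four inputs and assembles them into a payload. The key names (`image`, `prompt`, `max_tokens`, `temperature`), the default values, and both helper functions are assumptions made for illustration; the API spec on Replicate is the authority on the real field names and limits. The optional `describe_image` helper assumes the third-party `replicate` client is installed and that you supply the full model reference yourself.

```python
from urllib.parse import urlparse


def build_llava_input(image_url, prompt, max_tokens=1024, temperature=0.2):
    """Validate the four inputs and return them as a payload dict.

    Key names and default values are assumptions, not taken from the API spec.
    """
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("image_url must be an absolute http(s) URL")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return {
        "image": image_url,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": float(temperature),
    }


def describe_image(model_ref, image_url, prompt, **options):
    """Hypothetical helper: send the payload to Replicate and return text.

    model_ref is the "owner/name:version" string copied from the model page.
    """
    import replicate  # third-party client; reads REPLICATE_API_TOKEN

    payload = build_llava_input(image_url, prompt, **options)
    output = replicate.run(model_ref, input=payload)
    # Text models may return the response in chunks; join them if so.
    return output if isinstance(output, str) else "".join(str(p) for p in output)
```

Keeping validation separate from the network call means malformed inputs fail locally, before a run is started. A lower temperature should give more repeatable answers, while a higher one gives more varied ones.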

Use cases

The llava-13b model combines visual interpretation with language comprehension, so it can be applied wherever an image needs to be described or reasoned about in text. In surveillance and safety systems, it could interpret frames taken from video footage and produce text reports on suspicious activity or potential hazards. In e-commerce, its ability to follow visual instructions and turn them into articulate language could help automate customer service: the model can identify objects in images sent by customers and give specific responses. It could also support interactive educational programs that interpret diagrams, images, or other visual content and provide clear explanations. In travel applications, it could judge which activities appear permissible at a location shown in an image, such as swimming in a particular body of water. More broadly, the same capability could be used in industries ranging from healthcare, to assist in interpreting medical images, to advertising, to describe images so that ads can be more dynamic and responsive.




Creator Models

Temporalnet Sdxl: cost $?, 79 runs
Jina Embeddings V2 Base En: cost $?, 19 runs
Jina Embeddings V2 Small En: cost $?, 12 runs
Llava 13b: cost $?, 191,009 runs

Similar Models

Try it!

You can use this area to play around with demo applications that incorporate the Llava 13b model. These demos are maintained and hosted externally by third-party creators. If you see an error, message me on Twitter.

Currently, there are no demos available for this model.


Summary of this model and related resources.

Model Name: Llava 13b

Visual instruction tuning towards large language and vision models with GPT...

Model Link: View on Replicate
API Spec: View on Replicate
Github Link: View on Github
Paper Link: View on Arxiv


How popular is this model, by number of runs? How popular is the creator, by the sum of all their runs?

Model Rank
Creator Rank


How much does it cost to run this model? How long, on average, does it take to complete a run?

Cost per Run: $-
Prediction Hardware: -
Average Completion Time: -