
Owlvit Base Patch32



The owlvit-base-patch32 model is a zero-shot / open vocabulary object detection tool. Given an image URL and a list of query items to look for, the model returns the bounding-box coordinates of each queried item it finds in the image, along with a confidence score for each detection. It can also return a visualisation of the image with bounding boxes drawn around the detected objects. The model does this without having been trained on the specific object categories in the query, which is what makes it zero-shot.
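The input/output flow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the hosted model's own API: it assumes the publicly available Hugging Face checkpoint `google/owlvit-base-patch32` and the `transformers`, `torch`, and Pillow packages, and the names `Detection`, `filter_detections`, and `detect` are hypothetical helpers introduced for illustration. The post-processing method name may differ between `transformers` versions.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # the query string that matched
    score: float  # confidence score in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def filter_detections(detections, min_score=0.1):
    """Keep detections at or above min_score, highest confidence first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


def detect(image, queries, min_score=0.1):
    """Run OWL-ViT on a PIL image for a list of text queries.

    Assumption: uses the Hugging Face checkpoint, not the hosted endpoint.
    Imports are local so the helpers above work without torch installed.
    """
    import torch
    from transformers import OwlViTForObjectDetection, OwlViTProcessor

    name = "google/owlvit-base-patch32"
    processor = OwlViTProcessor.from_pretrained(name)
    model = OwlViTForObjectDetection.from_pretrained(name)

    # One image, one list of queries: text is a list of lists.
    inputs = processor(text=[queries], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs=outputs, target_sizes=target_sizes, threshold=0.0
    )[0]

    detections = [
        Detection(queries[int(label)], float(score), tuple(float(v) for v in box))
        for score, label, box in zip(
            result["scores"], result["labels"], result["boxes"]
        )
    ]
    return filter_detections(detections, min_score)
```

Calling `detect(Image.open("photo.jpg"), ["a cat", "a remote control"])` would return a list of `Detection` objects sorted by confidence; the file name and queries here are placeholders.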

Use cases

The owlvit-base-patch32 model is designed for zero-shot, or open vocabulary, object detection: it takes an image and a set of text queries and locates the queried objects in the image. This could be useful in several fields.

In surveillance and security, it could be used to detect objects of interest in surveillance images, potentially improving response times. In healthcare diagnostics, it could in principle be used to locate and label anomalies in medical images such as MRI scans, X-rays, and CT scans, which might make diagnosis more efficient.

In e-commerce, the model could be deployed to label the objects shown in product images, which could in turn support descriptions for visually impaired customers. It might also facilitate the automated tagging and categorization of large numbers of product images, improving search efficiency.

In terms of products, this model could be integrated into surveillance software, healthcare diagnostic tools, and mobile apps or web platforms for e-commerce. It could also be used to build smart cameras for wildlife monitoring that identify and label animal species. Even in recreational activities such as bird-watching or stargazing, it could help users identify species or celestial bodies.
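As an illustration of the product-tagging use case, the sketch below turns per-image detections into a tag index for search. It is a hedged example, not part of the model: it assumes detections have already been obtained as `(label, score)` pairs, and the function names, the `min_score` default, and the sample data are hypothetical.

```python
from collections import defaultdict


def tags_from_detections(detections, min_score=0.2):
    """Return the distinct labels scoring at least min_score, best first."""
    best = {}
    for label, score in detections:
        if score >= min_score and score > best.get(label, 0.0):
            best[label] = score
    return sorted(best, key=best.get, reverse=True)


def build_index(image_detections, min_score=0.2):
    """Map each tag to the list of image ids in which it was detected."""
    index = defaultdict(list)
    for image_id, detections in image_detections.items():
        for tag in tags_from_detections(detections, min_score):
            index[tag].append(image_id)
    return dict(index)


# Hypothetical detections for two product images.
sample = {
    "a.jpg": [("shoe", 0.8), ("shoe", 0.3), ("bag", 0.1)],
    "b.jpg": [("bag", 0.6), ("shoe", 0.25)],
}
index = build_index(sample)
```

Looking up `index["shoe"]` then gives the images tagged with that label. The threshold trades recall for precision and would need tuning per catalogue.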



Creator Models

Inst Inpaint  $?  251
T2i Adapter Sdxl Openpose  $?  4,645
Lightweight Openpose  $?  1,442

Similar Models

No similar models found



Summary of this model and related resources.

Model Name: Owlvit Base Patch32
Description: Zero-shot / open vocabulary object detection
Model Link: View on Replicate
API Spec: View on Replicate
Github Link: View on Github
Paper Link: View on Arxiv


How popular is this model, by number of runs? How popular is the creator, by the sum of all their runs?

Model Rank
Creator Rank


How much does it cost to run this model? How long, on average, does it take to complete a run?

Cost per Run: $-
Prediction Hardware: -
Average Completion Time: -