LAION

Average Model Cost: $0.0000

Number of Runs: 13,924,054

Models by this creator

CLIP-ViT-B-16-laion2B-s34B-b88K

CLIP-ViT-B-16-laion2B-s34B-b88K is a CLIP ViT-B/16 model trained on the LAION-2B English subset of the LAION-5B dataset. It is intended for research purposes and can be used for zero-shot image classification, image and text retrieval, image classification fine-tuning, linear-probe image classification, and image generation guiding and conditioning. The model achieves 70.2% zero-shot top-1 accuracy on ImageNet-1k. It has not been tested or evaluated on languages other than English. The training dataset is uncurated and contains potentially disturbing content, so it should be used for research purposes only and its links accessed with caution. Deployed, commercial, surveillance, facial-recognition, and non-English use cases are out of scope.
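
As a hedged illustration of the zero-shot classification use case described above, the sketch below loads the checkpoint through OpenCLIP's Hugging Face hub loader and scores a local image against a few candidate prompts. The image path and label prompts are placeholders, not part of the original listing.

```python
# Minimal zero-shot classification sketch (assumptions: open_clip_torch, torch, and
# Pillow are installed; "example.jpg" and the label prompts are placeholders).
import torch
from PIL import Image
import open_clip

repo = "hf-hub:laion/CLIP-ViT-B-16-laion2B-s34B-b88K"
model, _, preprocess = open_clip.create_model_and_transforms(repo)
tokenizer = open_clip.get_tokenizer(repo)
model.eval()

labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then softmax over the image-text similarity scores.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```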

$-/run

10.9M

Huggingface

CLIP-ViT-H-14-laion2B-s32B-b79K

CLIP-ViT-H-14-laion2B-s32B-b79K is a zero-shot image classification model. It is built on the Vision Transformer (ViT) architecture and was trained with OpenCLIP on the LAION-2B English subset of the LAION-5B dataset. The model can classify images into a wide range of categories, including categories it has not been explicitly trained on, by leveraging a contrastive learning objective that aligns images with their textual descriptions.

$-/run

1.4M

Huggingface

CLIP-ViT-B-32-laion2B-s34B-b79K

CLIP-ViT-B-32-laion2B-s34B-b79K is a CLIP ViT-B/32 model trained on the LAION-2B English subset of the LAION-5B dataset. It is intended for research purposes and can be used for zero-shot image classification, image and text retrieval, image classification fine-tuning, linear-probe image classification, and image generation guiding and conditioning. The model achieves a zero-shot top-1 accuracy of 66.6% on ImageNet-1k and was trained on the roughly 2 billion samples of the LAION-2B subset. The dataset is uncurated and may contain disturbing content, so caution is advised when using it. The model's performance has not been evaluated on languages other than English, and proper testing and evaluation are recommended before deploying the model in any use case. The model was trained on the stability.ai cluster; citations should acknowledge stability.ai, the LAION-5B paper, the OpenAI CLIP paper, and the OpenCLIP software. Code snippets for getting started with the model are provided (see the sketch below).
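
The getting-started snippets mentioned above are not reproduced in this listing; the following is a minimal retrieval-style sketch written under the assumption that the checkpoint loads through OpenCLIP's `hf-hub:` path. It ranks a handful of local images (placeholder filenames) against a single text query by cosine similarity.

```python
# Text-to-image retrieval sketch: rank local images against one query by cosine
# similarity (assumptions: open_clip_torch, torch, Pillow; filenames are placeholders).
import torch
from PIL import Image
import open_clip

repo = "hf-hub:laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
model, _, preprocess = open_clip.create_model_and_transforms(repo)
tokenizer = open_clip.get_tokenizer(repo)
model.eval()

image_paths = ["beach.jpg", "mountain.jpg", "city.jpg"]
images = torch.stack([preprocess(Image.open(p)) for p in image_paths])
query = tokenizer(["a sunny beach with palm trees"])

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(query)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(1)

# Print images from best to worst match.
for idx in scores.argsort(descending=True).tolist():
    print(f"{image_paths[idx]}: {scores[idx].item():.3f}")
```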

$-/run

1.2M

Huggingface

CLIP-ViT-bigG-14-laion2B-39B-b160k

The CLIP-ViT-bigG-14-laion2B-39B-b160k model is a zero-shot image classification model. It combines Contrastive Language-Image Pretraining (CLIP) with a Vision Transformer (ViT) image tower and was trained on the LAION-2B English subset of the LAION-5B dataset. Because it aligns images with accompanying text descriptions, it can classify images into categories it was never explicitly trained on, without additional training for specific labels.

$-/run

283.3K

Huggingface

CLIP-ViT-B-32-xlm-roberta-base-laion5B-s13B-b90k

The CLIP ViT-B/32 XLM-RoBERTa base model, trained on the LAION-5B dataset, can be used for zero-shot image classification and image and text retrieval, as well as downstream tasks such as image classification fine-tuning, linear-probe image classification, and image generation guiding and conditioning. It was trained with a batch size of 90k for 13B samples of LAION-5B. The model achieves competitive results on several benchmarks, including ImageNet-1k, MSCOCO, and Flickr30k, and performs well in multilingual evaluation. The code and resources for this model are available in the OpenCLIP GitHub repository.
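
Because this checkpoint pairs a ViT-B/32 image tower with an XLM-RoBERTa text encoder, prompts can in principle be written in several languages. The sketch below assumes the repository loads through OpenCLIP's `hf-hub:` mechanism and that `open_clip.get_tokenizer` resolves the multilingual tokenizer; the image path and prompts are illustrative only.

```python
# Multilingual zero-shot sketch: the XLM-RoBERTa text tower accepts prompts in
# several languages (assumptions: open_clip_torch, torch, Pillow; "dog.jpg" and
# the prompts are placeholders).
import torch
from PIL import Image
import open_clip

repo = "hf-hub:laion/CLIP-ViT-B-32-xlm-roberta-base-laion5B-s13B-b90k"
model, _, preprocess = open_clip.create_model_and_transforms(repo)
tokenizer = open_clip.get_tokenizer(repo)
model.eval()

prompts = [
    "a photo of a dog",         # English
    "une photo d'un chien",     # French
    "ein Foto von einem Hund",  # German
    "una foto de un gato",      # Spanish (a cat, as a contrasting label)
]
image = preprocess(Image.open("dog.jpg")).unsqueeze(0)
text = tokenizer(prompts)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.3f}")
```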

$-/run

37.2K

Huggingface

CLIP-ViT-L-14-laion2B-s32B-b82K

CLIP-ViT-L-14-laion2B-s32B-b82K is a zero-shot image classification model. It is a Vision Transformer (ViT) variant trained with the Contrastive Language-Image Pretraining (CLIP) objective. The model can classify images into a wide range of categories without being specifically trained on them: by learning the relationship between images and natural language descriptions, it generalizes to unseen classes.

$-/run

33.6K

Huggingface

CLIP-ViT-g-14-laion2B-s34B-b88K

CLIP-ViT-g-14-laion2B-s34B-b88K combines a Vision Transformer (ViT) image tower with Contrastive Language-Image Pretraining (CLIP) to perform zero-shot image classification. It can classify images from natural language descriptions without additional training or fine-tuning. The model was trained on the LAION-2B English subset of LAION-5B (roughly 2 billion image-text pairs) and achieves strong accuracy across a wide range of image classification tasks.

$-/run

23.5K

Huggingface

CLIP-convnext_base_w-laion2B-s13B-b82K-augreg

The CLIP-convnext_base_w-laion2B-s13B-b82K-augreg model is a series of CLIP ConvNeXt-Base models trained on subsets of the LAION-5B dataset using OpenCLIP. These models utilize the timm ConvNeXt-Base model for the image tower and the same text tower as the RN50x4 model in OpenAI CLIP. The models were trained for 13 billion samples and have a zero-shot top-1 accuracy of >= 70.8% on ImageNet-1k. They were trained with increased augmentation and regularization techniques. The model can be used for zero-shot image classification, image and text retrieval, and other downstream tasks such as fine-tuning and linear probe image classification. The models were trained on an uncurated dataset, so caution should be exercised when using the model.
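
For the linear-probe use case mentioned above, one hedged recipe is to freeze the image tower, embed a labeled dataset, and fit a simple classifier on the embeddings. The sketch below uses CIFAR-10 and scikit-learn purely for illustration and assumes the checkpoint loads via OpenCLIP's `hf-hub:` path.

```python
# Linear-probe sketch: freeze the CLIP image tower, extract features, and fit a
# logistic-regression classifier (assumptions: open_clip_torch, torch, torchvision,
# and scikit-learn are installed; CIFAR-10 is used purely as an example dataset).
import numpy as np
import torch
import open_clip
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

repo = "hf-hub:laion/CLIP-convnext_base_w-laion2B-s13B-b82K-augreg"
model, _, preprocess = open_clip.create_model_and_transforms(repo)
model.eval()

def extract_features(train: bool):
    """Run the frozen image tower over one CIFAR-10 split and collect embeddings."""
    dataset = CIFAR10(root="data", train=train, download=True, transform=preprocess)
    loader = DataLoader(dataset, batch_size=64)
    feats, labels = [], []
    with torch.no_grad():
        for images, targets in loader:
            feats.append(model.encode_image(images).cpu().numpy())
            labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

train_x, train_y = extract_features(train=True)
test_x, test_y = extract_features(train=False)

# C=0.316 is a commonly used starting point for CLIP linear probes, not a tuned value.
clf = LogisticRegression(max_iter=1000, C=0.316)
clf.fit(train_x, train_y)
print("Linear-probe accuracy:", clf.score(test_x, test_y))
```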

$-/run

13.7K

Huggingface
