Kakaobrain
Rank:
Average Model Cost: $0.0000
Number of Runs: 22,288
Models by this creator
align-base
ALIGN (base model) is a dual-encoder model that aligns visual and text representations using contrastive learning. It pairs an EfficientNet vision encoder with a BERT text encoder and is trained on COYO-700M, a large-scale noisy dataset of 700 million image-text pairs. The model targets zero-shot image classification and multi-modal embedding retrieval, can be used with the Transformers library, and is intended for research into zero-shot image classification and the potential impact of such models.
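The zero-shot classification workflow described above can be sketched with Transformers' ALIGN classes. A minimal sketch, assuming the `kakaobrain/align-base` checkpoint on the Hugging Face Hub and network access to download weights and a sample image:

```python
# Hedged sketch: zero-shot image classification with ALIGN via Transformers.
# The checkpoint id and sample image URL are assumptions from the listing.
import requests
import torch
from PIL import Image
from transformers import AlignProcessor, AlignModel

processor = AlignProcessor.from_pretrained("kakaobrain/align-base")
model = AlignModel.from_pretrained("kakaobrain/align-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
candidate_labels = ["a photo of a cat", "a photo of a dog"]

inputs = processor(images=image, text=candidate_labels, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity; softmax turns it into
# per-label probabilities for zero-shot classification.
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)
```

The same embeddings (`outputs.image_embeds`, `outputs.text_embeds`) can be reused for the multi-modal retrieval use case the description mentions.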
$-/run
9.2K
Huggingface
karlo-v1-alpha
The karlo-v1-alpha model is a text-to-image model that generates images from textual prompts, enabling applications that need visual representations of text descriptions.
$-/run
5.8K
Huggingface
kogpt
KoGPT is a language model developed by Kakao Brain. It is based on the GPT architecture and trained on a large corpus of Korean text. The model uses deep learning techniques to generate human-like text given a prompt. KoGPT can understand and generate Korean, making it useful for natural language processing tasks such as chatbots, text generation, and question answering.
$-/run
5.7K
Huggingface
karlo-v1-alpha-image-variations
Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture, with an improved super-resolution module that upscales 64px to 256px while recovering high-frequency details in only a small number of denoising steps. Karlo is available in diffusers!

Karlo is a text-conditional diffusion model based on unCLIP, composed of prior, decoder, and super-resolution modules. This repository includes the improved version of the standard super-resolution module, which upscales 64px to 256px in only 7 reverse steps, as illustrated in the figure below. Specifically, the standard SR module, trained with the DDPM objective, upscales 64px to 256px in the first 6 denoising steps using the respacing technique. An additional SR module, fine-tuned with a VQ-GAN-style loss, then performs the final reverse step to recover high-frequency details. We observe that this approach is very effective at upscaling low-resolution images in a small number of reverse steps.

We train all components from scratch on 115M image-text pairs, including COYO-100M, CC3M, and CC12M. For the prior and decoder, we use ViT-L/14 from OpenAI's CLIP repository; unlike the original unCLIP implementation, we replace the trainable transformer in the decoder with the ViT-L/14 text encoder for efficiency. The SR module is first trained with the DDPM objective for 1M steps, followed by an additional 234K steps to fine-tune the extra component. The table below summarizes the important statistics of our components. In the checkpoint links, ViT-L-14 is equivalent to the original version and is included for convenience; note that ViT-L-14-stats is required to normalize the outputs of the prior module.

We quantitatively measure the performance of Karlo-v1.0.alpha on the validation splits of CC3M and MS-COCO. The table below presents CLIP score and FID. To measure FID, we resize the shorter side of each image to 256px and crop at the center. Classifier-free guidance scales for the prior and decoder are set to 4 and 8 in all cases. We observe that the model achieves reasonable performance even with 25 decoder sampling steps. For more information, please refer to the upcoming technical report.

This alpha version of Karlo is trained on 115M image-text pairs, including a high-quality COYO-100M subset, CC3M, and CC12M. For a better version of Karlo trained on larger-scale, high-quality datasets, please visit the landing page of our application B^DISCOVER. If you find this repository useful in your research, please cite:
$-/run
1.6K
Huggingface
vit-large-patch16-512
$-/run
33
Huggingface
vit-large-patch16-384
$-/run
17
Huggingface
coyo-align-b7-base
$-/run
0
Huggingface
vit-l16-coyo-labeled-300m-i1k512
Platform did not provide a description for this model.
$-/run
0
Huggingface
vit-l16-coyo-labeled-300m-i1k384
Platform did not provide a description for this model.
$-/run
0
Huggingface
vit-l16-coyo-labeled-300m
$-/run
0
Huggingface