Stabilityai
Rank:
Average Model Cost: $0.0000
Number of Runs: 2,792,802
Models by this creator
sd-vae-ft-mse
The sd-vae-ft-mse model is an improved autoencoder fine-tuned for image reconstruction. It is a variant of the kl-f8 autoencoder trained further on a combination of the LAION-Aesthetics and LAION-Humans datasets. The model comes in two versions: ft-EMA, which uses exponential moving average (EMA) weights and the same loss configuration as the original checkpoint, and ft-MSE, which uses an MSE reconstruction loss combined with a small LPIPS loss. Both versions have been evaluated on the COCO 2017 and LAION-Aesthetics datasets, show improved reconstruction quality over the original autoencoder, and can be used as drop-in replacements for the existing autoencoder.
$-/run · 902.5K runs · Huggingface
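Since the fine-tuned VAE is a drop-in replacement, swapping it into a diffusers text-to-image pipeline is a one-line change. A minimal sketch is shown below; the base checkpoint and prompt are illustrative choices, not something the model card prescribes.

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Load the fine-tuned VAE and hand it to the pipeline in place of the default autoencoder.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative base checkpoint
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```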
stable-diffusion-2-1
The stable-diffusion-2-1 model is a text-to-image latent diffusion model that generates images from text prompts. It works by iteratively denoising a latent representation conditioned on the prompt and then decoding the result into an image, producing high-quality and consistent visual renderings of textual descriptions.
$-/run · 782.0K runs · Huggingface
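A minimal sketch of generating an image with this checkpoint through the diffusers library; the scheduler swap and prompt are illustrative choices:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
# Swap in a faster scheduler (optional; not required by the model).
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut_mars.png")
```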
stable-diffusion-2-1-base
The stable-diffusion-2-1-base model is a text-to-image latent diffusion model trained on a large, filtered image-text dataset. It takes a text prompt as input and generates a corresponding 512x512 image, and can be used to create visuals or artwork from textual descriptions.
$-/run · 434.1K runs · Huggingface
stable-diffusion-2-inpainting
The stable-diffusion-2-inpainting model is a diffusion-based text-to-image generation model; specifically, a latent diffusion model with a fixed, pretrained text encoder. Given an input image, a mask marking the region to replace, and a text prompt, it regenerates the masked region to match the prompt. The model was trained on a large-scale dataset and inherits that data's limitations and biases; it is intended for research purposes only and should not be used to create harmful or offensive content. Its model card documents the training procedure, the available checkpoints, and an estimate of the training run's CO2 emissions.
$-/run · 184.3K runs · Huggingface
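Inpainting takes an image and a mask alongside the prompt. A minimal sketch with diffusers follows; the image and mask URLs are placeholders:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

# Placeholder URLs; white pixels in the mask mark the region to be repainted.
image = load_image("https://example.com/photo.png")
mask = load_image("https://example.com/mask.png")

result = pipe(
    prompt="a white cat sitting on a park bench",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```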
sd-vae-ft-ema
The sd-vae-ft-ema model is a fine-tuned VAE (variational autoencoder) decoder intended for use with the diffusers library. It is a variant of the kl-f8 autoencoder trained further on a dataset containing images of humans to improve the reconstruction of faces. The model comes in two versions: ft-EMA, trained for 313,198 steps with EMA (exponential moving average) weights and the same loss configuration as the original checkpoint, and ft-MSE, trained for an additional 280k steps with a loss that emphasizes MSE (mean squared error) reconstruction. Both versions fine-tune only the decoder part and can be used as replacements for the existing autoencoder. They have been evaluated on the COCO 2017 and LAION-Aesthetics datasets, and the model card provides visualizations of reconstructions of 256x256 images.
$-/run · 106.6K runs · Huggingface
stable-diffusion-x4-upscaler
The Stable Diffusion x4 Upscaler is a text-guided latent upscaling diffusion model trained on a subset of the LAION dataset. It takes a low-resolution input image, a text prompt, and a noise level parameter, and generates a 4x higher-resolution image guided by the prompt. The model has limitations: it cannot achieve perfect photorealism, struggles to render legible text and complex compositions, and may exhibit biases, particularly with non-English prompts. It is intended for research purposes only and should not be used to generate harmful or offensive content; its training data was filtered for explicit and inappropriate material.
$-/run · 103.7K runs · Huggingface
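A minimal sketch of upscaling with diffusers; the input URL, prompt, and noise level value are illustrative:

```python
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# Placeholder URL; the pipeline expects a low-resolution PIL image.
low_res = load_image("https://example.com/low_res_cat.png").resize((128, 128))

upscaled = pipe(
    prompt="a white cat",
    image=low_res,
    noise_level=20,  # noise added to the input; higher values let the model reinterpret more
).images[0]
upscaled.save("upscaled_cat.png")  # 4x the input resolution, here 512x512
```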
stable-diffusion-2
The stable-diffusion-2 model is a text-to-image latent diffusion model that generates images from textual descriptions. It samples an image by iteratively denoising a latent representation conditioned on the text, producing visually consistent and high-quality results. The model was trained on a large dataset of images with corresponding text descriptions, enabling it to turn textual input into realistic images.
$-/run · 102.7K runs · Huggingface
sdxl-vae
sdxl-vae is a fine-tuned VAE (variational autoencoder) used by Stable Diffusion XL. A VAE is a neural network that encodes high-dimensional data, such as images, into a lower-dimensional latent space and decodes latents back into that data; in Stable Diffusion pipelines, it converts between pixel space and the latent space in which diffusion runs. The sdxl-vae checkpoint offers improved reconstruction quality over earlier Stable Diffusion autoencoders and can be loaded as a drop-in replacement for the VAE in SDXL pipelines.
$-/run · 97.9K runs · Huggingface
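To make the encode/decode role concrete, here is a sketch of a round trip through the VAE alone, outside any pipeline; the image URL is a placeholder and the preprocessing is an illustrative convention (RGB scaled to [-1, 1]):

```python
import numpy as np
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae").eval()

# Placeholder URL; convert the PIL image to a (1, 3, 512, 512) tensor in [-1, 1].
img = load_image("https://example.com/photo.png").resize((512, 512))
x = torch.from_numpy(np.array(img)).float().div(127.5).sub(1.0)
x = x.permute(2, 0, 1).unsqueeze(0)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()  # 8x downsampling: (1, 4, 64, 64)
    recon = vae.decode(latents).sample            # back to pixel space: (1, 3, 512, 512)
```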
sd-x2-latent-upscaler
The sd-x2-latent-upscaler model upscales images by a factor of two. Rather than working on pixels, it operates on Stable Diffusion's latent representations, using a learned model of the latent space's structure to improve image quality as it upscales. It can be chained after a Stable Diffusion pipeline wherever higher output resolution is needed, such as in image processing and computer vision tasks.
$-/run · 49.8K runs · Huggingface
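Because the upscaler works on latents, it is typically chained after a base pipeline whose output is kept in latent space. A minimal sketch with diffusers; the base checkpoint and prompt are illustrative choices:

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionLatentUpscalePipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",  # illustrative base checkpoint
    torch_dtype=torch.float16,
).to("cuda")
upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a red fox in a snowy forest"
# Keep the base pipeline's output in latent space instead of decoding it to pixels.
low_res_latents = pipe(prompt, output_type="latent").images

image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
).images[0]
image.save("fox_2x.png")
```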
stable-diffusion-2-base
The stable-diffusion-2-base model is the 512x512 base checkpoint of Stable Diffusion 2: a text-to-image latent diffusion model trained from scratch on a filtered subset of LAION-5B. It generates images from text prompts and serves as the starting point for the other Stable Diffusion 2 variants.
$-/run · 29.3K runs · Huggingface