coreml-stable-diffusion-xl-base

Maintainer: apple

Total Score: 58

Last updated 5/28/2024


  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided


Model overview

The coreml-stable-diffusion-xl-base model is a text-to-image generation model developed by Apple. It is a Core ML version of the Stable Diffusion XL (SDXL) base model, which uses an ensemble-of-experts pipeline for latent diffusion: the base model generates initial noisy latents, which are then further processed by a refinement model to produce the final denoised image. Alternatively, the base model can be used on its own in a two-stage pipeline, first generating latents and then applying a specialized high-resolution model to produce the final image.

Model inputs and outputs

The coreml-stable-diffusion-xl-base model takes text prompts as input and generates corresponding images as output. The text prompts can describe a wide variety of scenes, objects, and concepts, which the model then translates into visual form.

Inputs

  • Text prompt: A natural language description of the desired image, such as "a photo of an astronaut riding a horse on mars".

Outputs

  • Generated image: The model outputs a corresponding image based on the input text prompt.
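
To make the input/output relationship concrete, here is a minimal, hedged sketch of a text-to-image call using the diffusers library with the underlying PyTorch SDXL base checkpoint (the Core ML weights themselves are run through Apple's Core ML Stable Diffusion tooling rather than this API). The model ID, prompt, and device are assumptions for the example, not settings documented for this Core ML conversion.

```python
import torch
from diffusers import DiffusionPipeline

# Assumed setup: the PyTorch SDXL base checkpoint, run on a CUDA GPU.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

# A single text prompt in, a single generated image out.
image = pipe(prompt="a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```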

Capabilities

The coreml-stable-diffusion-xl-base model is capable of generating high-quality, photorealistic images from text prompts. It can create a wide range of scenes, objects, and concepts, and performs significantly better than previous versions of Stable Diffusion. The model can also be used in a two-stage pipeline with a specialized high-resolution refinement model to further improve image quality.

What can I use it for?

The coreml-stable-diffusion-xl-base model is intended for research purposes, such as the generation of artworks, applications in educational or creative tools, and probing the limitations and biases of generative models. The model should not be used to create content that is harmful, offensive, or misrepresents people or events.

Things to try

Experiment with different text prompts to see the variety of images the model can generate. Try combining the base model with the stable-diffusion-xl-refiner-1.0 model to see if the additional refinement step improves the image quality. Explore the model's capabilities and limitations, and consider how it could be applied in creative or educational contexts.
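
As a rough sketch of that base-plus-refiner experiment, the snippet below shows the two-stage ensemble pattern (the base model produces latents, the refiner finishes the denoising) using the diffusers library with the PyTorch SDXL checkpoints. The 0.8 denoising split and other parameter values are illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

# Assumed checkpoints: the PyTorch SDXL base and refiner from Stability AI.
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

# The base model handles the first ~80% of denoising and returns latents.
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
# The refiner picks up where the base left off and produces the final image.
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
image.save("astronaut_refined.png")
```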




Related Models


stable-diffusion-xl-base-1.0

Maintainer: stabilityai

Total Score: 5.3K

The stable-diffusion-xl-base-1.0 model is a text-to-image generative AI model developed by Stability AI. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). The model is an ensemble-of-experts pipeline, where the base model generates latents that are then further processed by a specialized refinement model. Alternatively, the base model can be used on its own to generate latents, which can then be processed using a high-resolution model and the SDEdit technique for image-to-image generation. Similar models include the stable-diffusion-xl-refiner-1.0 and stable-diffusion-xl-refiner-0.9 models, which serve as the refinement modules for the base stable-diffusion-xl-base-1.0 model.

Model inputs and outputs

Inputs

  • Text prompt: A natural language description of the desired image to generate.

Outputs

  • Generated image: An image generated from the input text prompt.

Capabilities

The stable-diffusion-xl-base-1.0 model can generate a wide variety of images based on text prompts, ranging from photorealistic scenes to more abstract and stylized imagery. The model performs particularly well on tasks like generating artworks, fantasy scenes, and conceptual designs. However, it struggles with more complex tasks involving compositionality, such as rendering an image of a red cube on top of a blue sphere.

What can I use it for?

The stable-diffusion-xl-base-1.0 model is intended for research purposes, such as:

  • Generation of artworks and use in design and other artistic processes.
  • Applications in educational or creative tools.
  • Research on generative models and their limitations and biases.
  • Safe deployment of models with the potential to generate harmful content.

For commercial use, Stability AI provides a membership program, as detailed on their website.

Things to try

One interesting aspect of the stable-diffusion-xl-base-1.0 model is its ability to generate high-quality images with relatively few inference steps. By using the specialized refinement model or the SDEdit technique, users can achieve impressive results with a more efficient inference process. Additionally, the model's performance can be further optimized by utilizing techniques like CPU offloading or torch.compile, as mentioned in the provided documentation.
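
As a hedged sketch of those optimizations (again using the diffusers library with the PyTorch checkpoint; the exact settings are assumptions rather than documented recommendations), one might enable CPU offloading or compile the UNet:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
)

# Option 1: offload idle sub-models to the CPU to reduce GPU memory pressure
# (requires the accelerate package).
pipe.enable_model_cpu_offload()

# Option 2 (alternative; keep the pipeline on the GPU instead of offloading):
# pipe.to("cuda")
# pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe(prompt="an astronaut riding a green horse").images[0]
```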


coreml-stable-diffusion-2-base

Maintainer: apple

Total Score: 77

The coreml-stable-diffusion-2-base model is a text-to-image generation model developed by Apple. It is a version of the Stable Diffusion v2 model that has been converted for use on Apple Silicon hardware. This model is capable of generating high-quality images from text prompts and can be used with the diffusers library.

The model was trained on a filtered subset of the large-scale LAION-5B dataset, with a focus on images with high aesthetic quality and the removal of explicit pornographic content. It uses a Latent Diffusion Model architecture that combines an autoencoder with a diffusion model, along with a fixed, pretrained text encoder (OpenCLIP-ViT/H).

There are four variants of the Core ML weights available, with different attention mechanisms and compilation targets. Users can choose the version that best fits their needs, whether that's Swift-based or Python-based inference, and the "original" or "split_einsum" attention mechanisms.

Model inputs and outputs

Inputs

  • Text prompt: A natural language description of the desired image.

Outputs

  • Generated image: The model outputs a high-quality image that corresponds to the input text prompt.

Capabilities

The coreml-stable-diffusion-2-base model is capable of generating a wide variety of images from text prompts, including scenes, objects, and abstract concepts. It can produce photorealistic images, as well as more stylized or imaginative compositions. The model performs well on a range of prompts, though it may struggle with more complex or compositional tasks.

What can I use it for?

The coreml-stable-diffusion-2-base model is intended for research purposes only. Possible applications include:

  • Safe deployment of generative models: Researching techniques to safely deploy models that have the potential to generate harmful content.
  • Understanding model biases: Probing the limitations and biases of the model to improve future iterations.
  • Creative applications: Generating artwork, designs, and other creative content.
  • Educational tools: Developing interactive educational or creative applications.
  • Generative model research: Furthering the state of the art in text-to-image generation.

The model should not be used to create content that is harmful, offensive, or in violation of copyrights.

Things to try

One interesting aspect of the coreml-stable-diffusion-2-base model is the availability of different attention mechanisms and compilation targets. Users can experiment with the "original" and "split_einsum" attention variants to see how they perform on their specific use cases and hardware setups. Additionally, the model's ability to generate high-quality images at 512x512 resolution makes it a compelling tool for creative applications and research.
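
As a hedged sketch of selecting one of those variants, the snippet below downloads a single attention/compilation combination from the Hugging Face Hub. The folder names are assumptions based on the variant naming described above, not paths confirmed by this summary.

```python
from huggingface_hub import snapshot_download

# Assumed repo layout: one folder per variant, e.g. "original/packages" for
# Python inference and "split_einsum/compiled" for Swift inference with
# compiled Core ML models.
local_dir = snapshot_download(
    "apple/coreml-stable-diffusion-2-base",
    allow_patterns=["split_einsum/compiled/*"],  # fetch only the variant you need
)
print("Core ML resources downloaded to:", local_dir)
```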


coreml-stable-diffusion-v1-5

Maintainer: apple

Total Score: 53

The coreml-stable-diffusion-v1-5 model is a version of the Stable Diffusion v1-5 model that has been converted to Core ML format for use on Apple Silicon hardware. It was developed by Hugging Face using Apple's repository, which has an ASCL license. The Stable Diffusion v1-5 model is a latent text-to-image diffusion model capable of generating photo-realistic images from text prompts. This model was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned to improve classifier-free guidance sampling. There are four variants of the Core ML weights available, including different attention implementations and compilation options for Swift and Python inference.

Model inputs and outputs

Inputs

  • Text prompt: The text prompt describing the desired image to be generated.

Outputs

  • Generated image: The photo-realistic image generated based on the input text prompt.

Capabilities

The coreml-stable-diffusion-v1-5 model is capable of generating a wide variety of photo-realistic images from text prompts, ranging from landscapes and scenes to intricate illustrations and creative concepts. Like other Stable Diffusion models, it excels at rendering detailed, imaginative imagery, but may struggle with tasks involving more complex compositionality or generating legible text.

What can I use it for?

The coreml-stable-diffusion-v1-5 model is intended for research purposes, such as exploring the capabilities and limitations of generative models, generating artworks and creative content, and developing educational or creative tools. However, the model should not be used to intentionally create or disseminate images that could be harmful, disturbing, or offensive, or to impersonate individuals without their consent.

Things to try

One interesting aspect of the coreml-stable-diffusion-v1-5 model is the availability of different attention implementations and compilation options, which can affect the performance and memory usage of the model on Apple Silicon hardware. Developers may want to experiment with these variants to find the best balance of speed and efficiency for their specific use cases.


stable-diffusion-xl-base-0.9

Maintainer: stabilityai

Total Score: 1.4K

The stable-diffusion-xl-base-0.9 model is a text-to-image generative model developed by Stability AI. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). The model consists of a two-step pipeline for latent diffusion: first generating latents of the desired output size, then refining them using a specialized high-resolution model and a technique called SDEdit (https://arxiv.org/abs/2108.01073). This model builds upon the capabilities of previous Stable Diffusion models, improving image quality and prompt following.

Model inputs and outputs

Inputs

  • Prompt: A text description of the desired image to generate.

Outputs

  • Image: A 512x512 pixel image generated based on the input prompt.

Capabilities

The stable-diffusion-xl-base-0.9 model can generate a wide variety of images based on text prompts, from realistic scenes to fantastical creations. It performs significantly better than previous Stable Diffusion models in terms of image quality and prompt following, as demonstrated by user preference evaluations. The model can be particularly useful for tasks like artwork generation, creative design, and educational applications.

What can I use it for?

The stable-diffusion-xl-base-0.9 model is intended for research purposes, such as generation of artworks, applications in educational or creative tools, research on generative models, and probing the limitations and biases of the model. While the model is not suitable for generating factual or true representations of people or events, it can be a powerful tool for artistic expression and exploration. For commercial use, please refer to Stability AI's membership options.

Things to try

One interesting aspect of the stable-diffusion-xl-base-0.9 model is its ability to generate high-quality images using a two-step pipeline. Try experimenting with different combinations of the base model and refinement model to see how the results vary in terms of image quality, detail, and prompt following. You can also explore the model's capabilities in generating specific types of imagery, such as surreal or fantastical scenes, and see how it handles more complex prompts involving compositional elements.
