diffae

Maintainer: cjwbw

Total Score

16

Last updated 5/21/2024
  • Model Link: View on Replicate
  • API Spec: View on Replicate
  • Github Link: View on Github
  • Paper Link: View on Arxiv


Model overview

diffae is an AI model for image manipulation developed by Konpat Preechakul, Nattanat Chatthee, Suttisak Wizadwongsa, and Supasorn Suwajanakorn. It is based on their research paper "Diffusion Autoencoders: Toward a Meaningful and Decodable Representation", which was presented at CVPR 2022. The model pairs a learnable semantic encoder with a conditional diffusion (DDIM) decoder, so that the latent code captures meaningful, decodable image semantics and enables controllable image manipulation.

Model inputs and outputs

diffae takes an input image and allows for targeted manipulation along various semantic attributes, such as adding bangs, eyeglasses, or changing hairstyles. The key inputs are the original image, the target manipulation class, and the desired manipulation amplitude. The model then outputs the manipulated image.

Inputs

  • image: The input image to be manipulated. The image will be aligned and cropped before processing.
  • target_class: The semantic attribute to be modified, such as "Bangs", "Eyeglasses", or "Wavy Hair".
  • manipulation_amplitude: The strength of the desired manipulation, from -0.5 to 0.5.
  • T_step: The number of steps to use for the image generation process, with a default of 100.
  • T_inv: The number of steps to use for the inversion process, with a default of 200.

Outputs

  • Array of manipulated images: The model outputs an array of manipulated images, with the target attribute adjusted based on the provided inputs.
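As a rough sketch of how these inputs fit together when calling the model through the Replicate API, the helper below assembles a request payload and clamps the amplitude to its documented range. The `build_diffae_input` function and the model version string are hypothetical illustrations, not part of the model's own API:

```python
# Hypothetical sketch of assembling a diffae request payload.
# Field names follow the inputs listed above; the model identifier
# in the commented call is an assumption -- check the model page.

def build_diffae_input(image, target_class="Bangs",
                       manipulation_amplitude=0.3, T_step=100, T_inv=200):
    """Assemble the input payload, clamping the amplitude to [-0.5, 0.5]."""
    amplitude = max(-0.5, min(0.5, manipulation_amplitude))
    return {
        "image": image,                 # file handle or URL of the face photo
        "target_class": target_class,   # e.g. "Bangs", "Eyeglasses", "Wavy Hair"
        "manipulation_amplitude": amplitude,
        "T_step": T_step,               # generation steps (default 100)
        "T_inv": T_inv,                 # inversion steps (default 200)
    }

# With the replicate client installed and REPLICATE_API_TOKEN set,
# the call would look roughly like:
#   import replicate
#   images = replicate.run("cjwbw/diffae:<version>",
#                          input=build_diffae_input(open("face.jpg", "rb")))
```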

Capabilities

diffae is capable of performing high-quality, semantic-aware image manipulation on faces. Unlike simple attribute editing, diffae can seamlessly blend the target attribute with the original image, producing natural-looking results. The model has been trained on the FFHQ, Bedroom, and Horse datasets, allowing for manipulation of a variety of image types.

What can I use it for?

diffae can be a powerful tool for content creation, photo editing, and creative exploration. By enabling targeted modifications to facial attributes, the model can be used to experiment with different styles, hairstyles, or accessories on portraits and selfies, which makes it useful for applications like virtual try-on and social media content generation.

Things to try

One interesting aspect of diffae is its ability to perform latent-space manipulation, where changes are made directly to the model's internal representation of the image. This can lead to more nuanced and seamless modifications compared to traditional image editing techniques. Users can experiment with different manipulation amplitudes to find the right balance of change and preservation of the original image.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


stable-diffusion

stability-ai

Total Score

107.9K

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. Developed by Stability AI, it can create detailed visuals from simple text prompts. The model has several versions, with each newer version trained for longer and producing higher-quality images than the previous ones. Its main advantage is the ability to generate highly detailed and realistic images from a wide range of textual descriptions, making it a powerful tool for creative applications. The model has been trained on a large and diverse dataset, enabling it to handle a broad spectrum of subjects and styles.

Model inputs and outputs

Inputs

  • Prompt: The text prompt that describes the desired image. This can be a simple description or a more detailed, creative prompt.
  • Seed: An optional random seed value to control the randomness of the image generation process.
  • Width and Height: The desired dimensions of the generated image, which must be multiples of 64.
  • Scheduler: The algorithm used to generate the image, with options like DPMSolverMultistep.
  • Num Outputs: The number of images to generate (up to 4).
  • Guidance Scale: The scale for classifier-free guidance, which controls the trade-off between image quality and faithfulness to the input prompt.
  • Negative Prompt: Text that specifies things the model should avoid including in the generated image.
  • Num Inference Steps: The number of denoising steps to perform during the image generation process.

Outputs

  • Array of image URLs: The generated images are returned as an array of URLs pointing to the created images.

Capabilities

Stable Diffusion is capable of generating a wide variety of photorealistic images from text prompts. It can create images of people, animals, landscapes, architecture, and more, with a high level of detail and accuracy. The model is particularly skilled at rendering complex scenes and capturing the essence of the input prompt. One of its key strengths is the ability to handle diverse prompts, from simple descriptions to more creative and imaginative ideas: it can generate fantastical creatures, surreal landscapes, and even abstract concepts with impressive results.

What can I use it for?

Stable Diffusion can be used for a variety of creative applications, such as:

  • Visualizing ideas and concepts for art, design, or storytelling
  • Generating images for use in marketing, advertising, or social media
  • Aiding in the development of games, movies, or other visual media
  • Exploring and experimenting with new ideas and artistic styles

The model's versatility and high-quality output make it a valuable tool for anyone looking to bring their ideas to life through visual art. By combining the power of AI with human creativity, Stable Diffusion opens up new possibilities for visual expression and innovation.

Things to try

One interesting aspect of Stable Diffusion is its ability to generate images with a high level of detail and realism. Users can experiment with prompts that combine specific elements, such as "a steam-powered robot exploring a lush, alien jungle," to see how the model handles complex and imaginative scenes. Additionally, the model's support for different image sizes and resolutions allows users to explore the limits of its capabilities: by generating images at various scales, users can see how the model handles the detail and complexity required for different use cases, such as high-resolution artwork or smaller social media graphics. By experimenting with different prompts, settings, and output formats, users can unlock the full potential of this text-to-image technology.
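The multiple-of-64 constraint on Width and Height can be enforced before a request is ever sent. The helper below is an illustrative sketch (the function names are ours, not Stability AI's); it snaps dimensions down to a valid size and caps Num Outputs at the documented maximum of 4:

```python
# Illustrative pre-flight helper for a Stable Diffusion request payload,
# using the input names described above. Function names are hypothetical.

def snap_to_64(x, minimum=64):
    """Round down to a multiple of 64, never below the minimum."""
    return max(minimum, (x // 64) * 64)

def build_sd_input(prompt, width=512, height=512, guidance_scale=7.5,
                   num_outputs=1, negative_prompt="", num_inference_steps=50,
                   seed=None):
    payload = {
        "prompt": prompt,
        "width": snap_to_64(width),
        "height": snap_to_64(height),
        "guidance_scale": guidance_scale,
        "num_outputs": max(1, min(4, num_outputs)),  # the API allows up to 4
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
    }
    if seed is not None:
        payload["seed"] = seed  # omit for a random seed
    return payload
```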



vq-diffusion

cjwbw

Total Score

20

vq-diffusion is a text-to-image synthesis model developed by cjwbw. It is similar to other diffusion models like stable-diffusion, stable-diffusion-v2, latent-diffusion-text2img, clip-guided-diffusion, and van-gogh-diffusion, all of which are capable of generating photorealistic images from text prompts. The key innovation in vq-diffusion is the use of vector quantization to improve the quality and coherence of the generated images.

Model inputs and outputs

vq-diffusion takes in a text prompt and various parameters to control the generation process. The outputs are one or more high-quality images that match the input prompt.

Inputs

  • prompt: The text prompt describing the desired image.
  • image_class: The ImageNet class label to use for generation (if generation_type is set to ImageNet class label).
  • guidance_scale: A value that controls the strength of the text guidance during sampling.
  • generation_type: Specifies whether to generate from in-the-wild text, MSCOCO datasets, or ImageNet class labels.
  • truncation_rate: A value between 0 and 1 that controls the amount of truncation applied during sampling.

Outputs

  • An array of generated images that match the input prompt.

Capabilities

vq-diffusion can generate a wide variety of photorealistic images from text prompts, spanning scenes, objects, and abstract concepts. It uses vector quantization to improve the coherence and fidelity of the generated images compared to other diffusion models.

What can I use it for?

vq-diffusion can be used for a variety of creative and commercial applications, such as visual art, product design, marketing, and entertainment. For example, you could use it to generate concept art for a video game, create unique product visuals for an e-commerce store, or produce promotional images for a new service or event.

Things to try

One interesting aspect of vq-diffusion is its ability to generate images that mix different visual styles and concepts. For example, you could try prompting it to create a "photorealistic painting of a robot in the style of Van Gogh" and see the results. Experimenting with different prompts and parameter settings can lead to fascinating and unexpected outputs.
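Since truncation_rate must lie in [0, 1] and image_class only applies to class-conditional generation, a caller might validate inputs along these lines. The helper and the exact strings accepted for generation_type are assumptions drawn from the parameter descriptions above, not the model's documented API:

```python
# Hypothetical input validation for a vq-diffusion request; parameter
# names and generation_type values are assumed from the description.

GENERATION_TYPES = ("in-the-wild text", "MSCOCO datasets", "ImageNet class label")

def build_vq_diffusion_input(prompt, generation_type="in-the-wild text",
                             guidance_scale=5.0, truncation_rate=0.86,
                             image_class=None):
    if generation_type not in GENERATION_TYPES:
        raise ValueError(f"generation_type must be one of {GENERATION_TYPES}")
    if not 0.0 <= truncation_rate <= 1.0:
        raise ValueError("truncation_rate must be between 0 and 1")
    payload = {
        "prompt": prompt,
        "generation_type": generation_type,
        "guidance_scale": guidance_scale,
        "truncation_rate": truncation_rate,
    }
    if generation_type == "ImageNet class label":
        payload["image_class"] = image_class  # only used for class-conditional runs
    return payload
```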



latent-diffusion-text2img

cjwbw

Total Score

4

The latent-diffusion-text2img model is a text-to-image AI model developed by cjwbw, a creator on Replicate. It uses latent diffusion, a technique that allows for high-resolution image synthesis from text prompts. This model is similar to other text-to-image models like stable-diffusion, stable-diffusion-v2, and stable-diffusion-2-1-unclip, which are also capable of generating photo-realistic images from text.

Model inputs and outputs

The latent-diffusion-text2img model takes a text prompt as input and generates an image as output. The text prompt can describe a wide range of subjects, from realistic scenes to abstract concepts, and the model will attempt to generate a corresponding image.

Inputs

  • Prompt: A text description of the desired image.
  • Seed: An optional seed value to enable reproducible sampling.
  • Ddim steps: The number of diffusion steps to use during sampling.
  • Ddim eta: The eta parameter for the DDIM sampler, which controls the amount of noise injected during sampling.
  • Scale: The unconditional guidance scale, which controls the balance between the text prompt and the model's own prior.
  • Plms: Whether to use the PLMS sampler instead of the default DDIM sampler.
  • N samples: The number of samples to generate for each prompt.

Outputs

  • Image: A high-resolution image generated from the input text prompt.

Capabilities

The latent-diffusion-text2img model is capable of generating a wide variety of photo-realistic images from text prompts. It can create scenes with detailed objects, characters, and environments, as well as more abstract and surreal imagery. The model's ability to capture the essence of a text prompt and translate it into a visually compelling image makes it a powerful tool for creative expression and visual storytelling.

What can I use it for?

You can use the latent-diffusion-text2img model to create custom images for various applications, such as:

  • Illustrations and artwork for books, magazines, or websites
  • Concept art for games, films, or other media
  • Product visualization and design
  • Social media content and marketing assets
  • Personal creative projects and artistic exploration

The model's versatility allows you to experiment with different text prompts and see how they are interpreted visually, opening up new possibilities for artistic expression and collaboration between text and image.

Things to try

One interesting aspect of the latent-diffusion-text2img model is its ability to generate images that go beyond the typical 256x256 resolution. By adjusting the H and W arguments, you can instruct the model to generate larger images, up to 384x1024 or more. This can result in intriguing and unexpected visual outcomes, as the model tries to scale up the generated imagery while maintaining its coherence and detail. Another thing to try is the model's "retrieval-augmented" mode, which allows you to condition the generation on both the text prompt and a set of related images retrieved from a database. This can help the model better understand the context and visual references associated with the prompt, potentially leading to more interesting and faithful image generation.
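A hedged sketch of a payload that pushes H and W past the default 256x256, as described above. The helper is hypothetical, and the multiple-of-32 snapping is an assumption about latent-space models in general rather than a documented requirement of this one:

```python
# Hypothetical request builder for latent-diffusion-text2img, using the
# input names from the description above. The multiple-of-32 constraint
# on H and W is an assumption, not taken from the model's docs.

def snap32(d, minimum=32):
    """Round a dimension down to a multiple of 32 (a common latent-grid size)."""
    return max(minimum, (d // 32) * 32)

def build_ldm_input(prompt, H=256, W=256, ddim_steps=50, ddim_eta=0.0,
                    scale=5.0, plms=True, n_samples=1, seed=None):
    payload = {
        "prompt": prompt,
        "H": snap32(H),          # e.g. 384 for a taller-than-default image
        "W": snap32(W),          # e.g. 1024 for a wide panorama
        "ddim_steps": ddim_steps,
        "ddim_eta": ddim_eta,
        "scale": scale,
        "plms": plms,
        "n_samples": n_samples,
    }
    if seed is not None:
        payload["seed"] = seed   # set for reproducible sampling
    return payload
```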



repaint

cjwbw

Total Score

3

repaint is an AI model for inpainting, or filling in missing parts of an image, using denoising diffusion probabilistic models. It was developed by cjwbw, who has created several other notable AI models like stable-diffusion-v2-inpainting, analog-diffusion, and pastel-mix. The repaint model can fill in missing regions of an image while keeping the known parts harmonized, and can handle a variety of mask shapes and sizes, including extreme cases like every other line or large upscaling.

Model inputs and outputs

The repaint model takes in an input image, a mask indicating which regions are missing, and a model to use (e.g. CelebA-HQ, ImageNet, Places2). It then generates a new image with the missing regions filled in, while maintaining the integrity of the known parts. The user can also adjust the number of inference steps to control the speed vs. quality tradeoff.

Inputs

  • Image: The input image, which is expected to be aligned for facial images.
  • Mask: The type of mask to apply to the image, such as random strokes, half the image, or a sparse pattern.
  • Model: The pre-trained model to use for inpainting, based on the content of the input image.
  • Steps: The number of denoising steps to perform, which affects the speed and quality of the output.

Outputs

  • Mask: The mask used to generate the output image.
  • Masked Image: The input image with the mask applied.
  • Inpaint: The final output image with the missing regions filled in.

Capabilities

The repaint model can handle a wide variety of inpainting tasks, from filling in random strokes or half an image, to more extreme cases like upscaling an image or inpainting every other line. It is able to generate meaningful and harmonious fillings, incorporating details like expressions, features, and logos into the missing regions. The model outperforms state-of-the-art autoregressive and GAN-based inpainting methods in user studies across multiple datasets and mask types.

What can I use it for?

The repaint model could be useful for a variety of image editing and content creation tasks, such as:

  • Repairing damaged or corrupted images
  • Removing unwanted elements from photos (e.g. power lines, obstructions)
  • Generating new image content to expand or modify existing images
  • Upscaling low-resolution images while maintaining visual coherence

By leveraging the power of denoising diffusion models, repaint can produce high-quality, realistic inpaintings that seamlessly blend with the known parts of the image.

Things to try

One interesting aspect of the repaint model is its ability to handle extreme inpainting cases, such as filling in every other line of an image or upscaling with a large mask. These challenging scenarios showcase the model's strength in generating coherent and meaningful fillings, even when faced with a significant amount of missing information. Another possibility is to experiment with the number of denoising steps, which lets you balance speed against quality: fewer steps give faster inference but may produce less harmonious fillings, while more steps improve visual quality at the cost of longer processing times. Overall, the repaint model is a powerful tool for image inpainting and manipulation, with the potential to unlock new creative possibilities for artists, designers, and content creators.
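The speed-versus-quality tradeoff on the Steps input lends itself to a simple sweep. The builder below is a hypothetical sketch using the input names listed above; the mask and model strings are assumptions drawn from the description:

```python
# Hypothetical sweep over the Steps input for repaint; input names and
# the accepted model strings are assumed from the description above.

MODELS = ("CelebA-HQ", "ImageNet", "Places2")

def build_repaint_input(image, mask="Random strokes",
                        model="CelebA-HQ", steps=250):
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}")
    return {"image": image, "mask": mask, "model": model, "steps": steps}

# Fewer steps run faster but may harmonize the filled region less well;
# comparing the three outputs side by side shows the tradeoff.
requests = [build_repaint_input("face.jpg", steps=s) for s in (50, 100, 250)]
```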
