Evt\_V4-preview
==================================

The EVT series is an experimental project that fine-tunes anime-style models on large datasets. Evt\_V4 uses a larger dataset than earlier versions, and its cosine similarity with ACertainty reaches 85%. It may behave differently from other models; we hope you enjoy it.
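
The card does not say how the 85% figure was measured. As a minimal sketch, assuming it refers to the cosine similarity between two flattened vectors (for example, corresponding weights or layer outputs of the two models), the quantity could be computed as follows. The vectors are made-up toy values, not real model data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for flattened tensors (hypothetical values).
base = [1.0, 0.0, 2.0]
tuned = [1.0, 1.0, 2.0]
similarity = cosine_similarity(base, tuned)  # 5 / sqrt(30), about 0.913
```

A value of 1.0 means the two vectors point in the same direction, and 0.0 means they are orthogonal.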

Diffusers
-----------------------------

This model can be used just like any other Stable Diffusion model. For more information, see the [Stable Diffusion pipeline documentation](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion).

You can also export the model to [ONNX](https://huggingface.co/docs/diffusers/optimization/onnx), [MPS](https://huggingface.co/docs/diffusers/optimization/mps) and/or FLAX/JAX.

    from diffusers import StableDiffusionPipeline
    import torch
    
    model_id = "haor/Evt_V4-preview"
    branch_name = "main"
    
    # Load the weights in half precision and move the pipeline to the GPU.
    pipe = StableDiffusionPipeline.from_pretrained(model_id, revision=branch_name, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    
    # Generate a single image from a text prompt.
    prompt = "1girl"
    image = pipe(prompt).images[0]
    
    image.save("./1girl.png")
    

Examples
---------------------

**Prompt1:** [![Prompt1](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-05-09.png)](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-05-09.png) [![Prompt1](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-08-53.png)](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-08-53.png)

    1girl in black serafuku standing in a field solo, food, fruit, lemon, bubble, planet, moon, orange \(fruit\), lemon slice, leaf, fish, orange slice, by (tabi:1.25), spot color, looking at viewer, closeup cowboy shot
    Negative prompt: (bad:0.81), (comic:0.81), (cropped:0.81), (error:0.81), (extra:0.81), (low:0.81), (lowres:0.81), (speech:0.81), (worst:0.81), (blush:0.9), 2koma, 3koma, 4koma, collage, lipstick
    Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 2285895007, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2
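
The settings line above resembles the parameter text written by the AUTOMATIC1111 web UI. As a rough sketch, assuming you want to reuse those settings with the Diffusers pipeline shown earlier, the line can be parsed and mapped to `StableDiffusionPipeline` call arguments. The helper names `parse_infotext` and `to_pipeline_kwargs` are hypothetical and not part of any library.

```python
def parse_infotext(line):
    """Split 'Key: value, Key: value' settings into a dict of strings."""
    settings = {}
    for part in line.split(", "):
        key, _, value = part.partition(": ")
        settings[key.strip()] = value.strip()
    return settings

def to_pipeline_kwargs(settings):
    """Map the parsed settings to StableDiffusionPipeline call arguments."""
    width, height = (int(v) for v in settings["Size"].split("x"))
    return {
        "num_inference_steps": int(settings["Steps"]),
        "guidance_scale": float(settings["CFG scale"]),
        "width": width,
        "height": height,
    }

line = ("Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, "
        "Seed: 2285895007, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2")
settings = parse_infotext(line)
kwargs = to_pipeline_kwargs(settings)
seed = int(settings["Seed"])
```

The result could then be passed as `pipe(prompt, negative_prompt=..., generator=torch.Generator("cuda").manual_seed(seed), **kwargs)`. The sampler, denoising strength, and clip skip are not mapped here, so the output should not be expected to match the sample images exactly.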
    

**Prompt2:** [![Prompt2](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-11-36.png)](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-11-36.png) [![Prompt2](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-15-39.png)](https://huggingface.co/haor/Evt_V4-preview/resolve/main/samples/image_2023-01-09_17-15-39.png)

    {Masterpiece, Kaname_Madoka, tall and long double tails, well rooted hair, (pink hair), pink eyes, crossed bangs, ojousama, jk, thigh bandages, wrist cuffs, (pink bow: 1.2)}, plain color, sketch, masterpiece, high detail, masterpiece portrait, best quality, ray tracing, {:<, look at the edge}
    Negative prompt: ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)),extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((bad proportions))), ((extra limbs)), (((deformed))), (((disfigured))), cloned face, gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), too many fingers, (((long neck))), (((low quality))), normal quality, blurry, bad feet, text font ui, ((((worst quality)))), anatomical nonsense, (((bad shadow))), unnatural body, liquid body, 3D, 3D game, 3D game scene, 3D character, bad hairs, poorly drawn hairs, fused hairs, big muscles, bad face, extra eyes, furry, pony, mosaic, disappearing calf, disappearing legs, extra digit, fewer digit, fused digit, missing digit, fused feet, poorly drawn eyes, big face, long face, bad eyes, thick lips, obesity, strong girl, beardExcess legs
    Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 2468255263, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2
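
The negative prompt above relies heavily on nested parentheses. Assuming the prompts were written for the AUTOMATIC1111 web UI (the card does not say so explicitly), each pair of parentheses multiplies a token's attention weight by 1.1, while a form such as `(tabi:1.25)` sets the weight directly. The helpers below are hypothetical and only illustrate the arithmetic for the nested form.

```python
def count_nesting(token):
    """Count the parentheses wrapped around a token like '(((x)))'."""
    depth = 0
    while token.startswith("(") and token.endswith(")"):
        token = token[1:-1]
        depth += 1
    return depth, token

def nested_weight(depth, base=1.1):
    """Weight implied by `depth` nested pairs of parentheses."""
    return base ** depth

depth, word = count_nesting("((((ugly))))")
weight = nested_weight(depth)  # 1.1 ** 4, about 1.4641
```

Four levels of nesting therefore behave roughly like writing `(ugly:1.46)`.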
    

Training
---------------------

Base model: [ACertainty](https://huggingface.co/JosephusCheung/ACertainty)  
Trained for 10 epochs on around 550k anime-style images (from pixiv and yandere).  
Resolution: 512  
UCG: 0.1  
Use arb: True  
Trainer: [Mikubill/naifu-diffusion](https://github.com/Mikubill/naifu-diffusion)
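
The card does not define `UCG`. Assuming it stands for unconditional guidance, meaning the fraction of training samples whose caption is replaced by an empty string so that the model also learns an unconditional prediction for classifier-free guidance, a value of 0.1 can be sketched like this. The function `apply_ucg` is hypothetical and is not taken from the trainer.

```python
import random

def apply_ucg(caption, ucg_rate, rng):
    """Return an empty caption with probability `ucg_rate`, else the caption."""
    return "" if rng.random() < ucg_rate else caption

rng = random.Random(0)  # fixed seed so the sketch is repeatable
captions = ["1girl, solo"] * 10_000
dropped = sum(apply_ucg(c, 0.1, rng) == "" for c in captions)
drop_fraction = dropped / len(captions)  # close to 0.1
```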

    arb:
      enabled: true
      debug: false
      base_res: [512, 512]
      max_size: [768, 512]
      divisible: 64
      max_ar_error: 4
      min_dim: 256
      dim_limit: 1024
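
To show what these `arb` (aspect ratio bucketing) settings might produce, here is a minimal sketch that enumerates resolution buckets using the scheme NovelAI described publicly for bucketing. Whether naifu-diffusion builds its buckets in exactly this way is an assumption, and `max_ar_error` is not modeled.

```python
def make_buckets(base_res=(512, 512), max_size=(768, 512), divisible=64,
                 min_dim=256, dim_limit=1024):
    """Enumerate (width, height) buckets whose area fits within max_size."""
    max_area = max_size[0] * max_size[1]
    buckets = {tuple(base_res)}
    for side in range(min_dim, dim_limit + 1, divisible):
        # Largest multiple of `divisible` that keeps the area within budget.
        other = (max_area // side) // divisible * divisible
        other = min(other, dim_limit)
        if other < min_dim:
            continue
        buckets.add((side, other))  # one orientation
        buckets.add((other, side))  # and its transpose
    return sorted(buckets)

buckets = make_buckets()
```

With the values above, the result includes the square `(512, 512)` bucket as well as portrait and landscape shapes such as `(512, 768)` and `(768, 512)`. During training, each image would be assigned to the bucket with the closest aspect ratio.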
    

    scheduler:
      name: diffusers.DDIMScheduler
      params:
        beta_end: 0.012
        beta_schedule: "scaled_linear"
        beta_start: 0.00085
        clip_sample: false
        num_train_timesteps: 1000
        set_alpha_to_one: false
        steps_offset: 1
        trained_betas: null
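
For reference, the `scaled_linear` schedule in Diffusers spaces the betas linearly between the square roots of `beta_start` and `beta_end` and then squares them. The plain-Python sketch below reproduces that calculation with the values from this config; the function names are illustrative only.

```python
def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Betas spaced linearly in sqrt space, then squared."""
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]

def alphas_cumprod(betas):
    """Running product of (1 - beta_t), the share of signal kept at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

betas = scaled_linear_betas()
signal = alphas_cumprod(betas)
```

The first beta equals `beta_start`, the last equals `beta_end`, and the cumulative product falls toward zero as the timestep increases.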
    
    optimizer:
      name: bitsandbytes.optim.AdamW8bit
      params:
        lr: 2e-6
        weight_decay: 5e-2
        eps: 1e-7
    
    lr_scheduler:
      name: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
      warmup: 
        enabled: true
        init_lr: 2e-8
        num_warmup: 50
        strategy: "cos"  
      params:
        T_0: 5
        T_mult: 1
        eta_min: 6e-7
        last_epoch: -1
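
The learning rate curve implied by these `lr_scheduler` values can be sketched with the formula PyTorch documents for `CosineAnnealingWarmRestarts`. This plain-Python version covers only the `T_mult: 1` case used here. The warmup phase is handled by the trainer and is not modeled, and whether `t` counts epochs or optimizer steps depends on when the trainer steps the scheduler, which the card does not state.

```python
import math

def cosine_warm_restart_lr(t, base_lr=2e-6, eta_min=6e-7, t_0=5, t_mult=1):
    """Learning rate at scheduler time t, for equal-length cycles only."""
    if t_mult != 1:
        raise NotImplementedError("this sketch covers only t_mult == 1")
    t_cur = t % t_0  # position inside the current cycle
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0)) / 2

schedule = [cosine_warm_restart_lr(t) for t in range(11)]
```

The rate starts each cycle at `lr` (2e-6), decays along a cosine curve toward `eta_min` (6e-7), and jumps back to 2e-6 every `T_0` units.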
    

Training took about 300 V100 GPU hours.

License
-------------------

This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies:

1.  You can't use the model to deliberately produce or share illegal or harmful outputs or content.
2.  The authors claim no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license.
3.  You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully). [Please read the full license here](https://huggingface.co/spaces/CompVis/stable-diffusion-license).

## Model overview

The `Evt_V4-preview` model is an experimental text-to-image diffusion model created by maintainer [haor](https://aimodels.fyi/creators/huggingFace/haor) that focuses on generating animation-style images. It is part of the EVT series, which fine-tunes animation-style models on large datasets. Compared to previous EVT models, Evt_V4-preview uses a larger dataset, and its reported cosine similarity with its base model, ACertainty, is 85%.

Similar models include [Stable Diffusion v1-4](https://aimodels.fyi/models/huggingFace/stable-diffusion-v1-4-compvis), a general-purpose text-to-image diffusion model, and [Epic Diffusion](https://aimodels.fyi/models/huggingFace/epic-diffusion-johnslegers), a highly customized version of Stable Diffusion aimed at producing high-quality results in a wide range of styles.

## Model inputs and outputs

### Inputs
- **Prompt**: A text description of the desired image, which can include specific details about the content, style, and artistic references.

### Outputs
- **Image**: A generated image that corresponds to the provided text prompt. The model can produce images in a variety of artistic styles, including animation-influenced aesthetics.

## Capabilities

The `Evt_V4-preview` model is capable of generating diverse, artistically-styled images from text prompts. The model excels at producing anime-inspired artwork, as evidenced by the provided samples that feature detailed characters, fantastical environments, and a vibrant color palette.

## What can I use it for?

The `Evt_V4-preview` model is well-suited for artistic and creative applications, such as generating concept art, character designs, and illustrations. It could be used to quickly produce draft images for creative projects or as a tool for ideation and exploration. Because it was fine-tuned on anime-style images, results in other artistic genres are untested here and may vary.

## Things to try

One interesting aspect of the `Evt_V4-preview` model is its potential to generate unique animation-inspired styles that differ from traditional anime or manga aesthetics. Experimenting with different prompts that blend various artistic influences, such as combining anime elements with western comic book styles or surreal, dreamlike compositions, could yield intriguing results. Additionally, trying the model with prompts that focus on less common subject matter, such as sci-fi or fantasy settings, might uncover new creative directions for the model's animation-influenced capabilities.