Maintainer: mattmdjaga

Total Score


Last updated 5/17/2024


Model Link: View on HuggingFace
API Spec: View on HuggingFace
Github Link: No Github link provided
Paper Link: No paper link provided


Model Overview

The segformer_b2_clothes model is a Segformer B2 model fine-tuned by maintainer mattmdjaga on the ATR dataset for clothes segmentation, using the "mattmdjaga/human_parsing_dataset" version of the data. It can also be used for human segmentation.

The Segformer architecture combines a vision transformer with a segmentation head, allowing the model to learn global and local features for effective image segmentation. This fine-tuned version focuses on accurately segmenting clothes and human parts in images.

Model Inputs and Outputs

Inputs

  • Images of people or scenes containing people
  • The model takes the image as input and returns segmentation logits

Outputs

  • Segmentation masks identifying various parts of the human body and clothing
  • The model outputs a tensor of logits, which can be post-processed to obtain the final segmentation map
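The post-processing step can be sketched with plain numpy. The logits below are random stand-ins for the model's output (a real pipeline would get them from the segformer_b2_clothes model, and would typically upsample with bilinear interpolation rather than the nearest-neighbour repeat used here for simplicity):

```python
import numpy as np

# Stand-in for SegFormer-style logits: shape (batch, num_classes, H/4, W/4).
# The spatial resolution of the raw output is reduced relative to the input.
batch, num_classes, h, w = 1, 18, 128, 128
rng = np.random.default_rng(0)
logits = rng.standard_normal((batch, num_classes, h, w))

# Upsample back to the input resolution (nearest-neighbour repeat as a
# simple stand-in for bilinear interpolation), then take the argmax over
# the class dimension to obtain the final segmentation map.
upsampled = logits.repeat(4, axis=2).repeat(4, axis=3)  # (1, 18, 512, 512)
seg_map = upsampled.argmax(axis=1)                      # (1, 512, 512)

print(seg_map.shape)  # (1, 512, 512)
```

Each pixel of `seg_map` holds a class index in `[0, 18)`, which can then be mapped to a label such as "Pants" or "Face".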


The segformer_b2_clothes model is capable of accurately segmenting clothes and human body parts in images. It can identify 18 different classes, including hats, hair, sunglasses, upper-clothes, skirts, pants, dresses, shoes, face, legs, arms, bags, and scarves.
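A class-index-to-label mapping is needed to interpret the segmentation map. The dictionary below is an assumption based on common ATR-style label ordering; the authoritative mapping is in the model card on HuggingFace:

```python
# Assumed label map for the 18 ATR-style classes. The index order here is an
# assumption -- verify against the segformer_b2_clothes model card.
ID2LABEL = {
    0: "Background", 1: "Hat", 2: "Hair", 3: "Sunglasses",
    4: "Upper-clothes", 5: "Skirt", 6: "Pants", 7: "Dress",
    8: "Belt", 9: "Left-shoe", 10: "Right-shoe", 11: "Face",
    12: "Left-leg", 13: "Right-leg", 14: "Left-arm", 15: "Right-arm",
    16: "Bag", 17: "Scarf",
}

print(len(ID2LABEL))  # 18
```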

The model achieves high performance, with a mean IoU of 0.69 and mean accuracy of 0.80 on the test set. It particularly excels at segmenting background, pants, face, and legs.

What Can I Use it For?

This model can be useful for a variety of applications involving human segmentation and clothing analysis, such as:

  • Fashion and retail applications, to automatically detect and extract clothing items from images
  • Virtual try-on and augmented reality experiences, by accurately segmenting the human body and clothing
  • Semantic understanding of scenes with people, for applications like video surveillance or human-computer interaction
  • Data annotation and dataset creation, by automating the labeling of human body parts and clothing

The maintainer has also provided the training code, which can be fine-tuned further on custom datasets for specialized use cases.

Things to Try

One interesting aspect of this model is its ability to segment a wide range of clothing and body parts. Try experimenting with different types of images, such as full-body shots, close-ups, or images with multiple people, to see how the model performs.

You can also try incorporating the segmentation outputs into downstream applications, such as virtual clothing try-on or fashion recommendation systems. The detailed segmentation masks can provide valuable information about the person's appearance and clothing.

Additionally, the maintainer has mentioned plans to release a colab notebook and a blog post to make the model more user-friendly. Keep an eye out for these resources, as they may provide further insights and guidance on using the segformer_b2_clothes model effectively.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models






The segformer-b0-finetuned-ade-512-512 model is a version of the SegFormer model fine-tuned on the ADE20k dataset for semantic segmentation. SegFormer is a Transformer-based architecture that uses a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve strong results on semantic segmentation benchmarks. This particular model was pre-trained on ImageNet-1k and then fine-tuned on the ADE20k dataset at a resolution of 512x512.

The SegFormer architecture is similar to the Vision Transformer (ViT) in that it treats an image as a sequence of patches and uses a Transformer encoder to process them. However, SegFormer uses a more efficient hierarchical design and a lightweight decode head, making it simpler and faster than traditional semantic segmentation models. The segformer-b2-clothes model is another example of a SegFormer variant fine-tuned for a specific task, in this case clothes segmentation.

Model inputs and outputs

Inputs

  • Images: The model takes in images, which are split into a sequence of fixed-size patches that are then linearly embedded and processed by the Transformer encoder.

Outputs

  • Segmentation maps: The model outputs a segmentation map in which each pixel is assigned a class label corresponding to its semantic category (e.g., person, car, building). The output resolution is lower than the input resolution, typically by a factor of 4.

Capabilities

The segformer-b0-finetuned-ade-512-512 model performs semantic segmentation, the task of assigning a semantic label to each pixel in an image. It can accurately identify and delineate the objects, scenes, and regions present in an image, which makes it useful for applications like autonomous driving, scene understanding, and image editing.

What can I use it for?

This SegFormer model can be used for a variety of semantic segmentation tasks, such as:

  • Autonomous driving: Identify and segment objects on the road (cars, pedestrians, traffic signs, etc.) to support self-driving capabilities.
  • Scene understanding: Decompose a scene into semantic regions (sky, buildings, vegetation, etc.), which is useful for applications like robotics and augmented reality.
  • Image editing: Perform precise segmentation of objects in an image, allowing selective editing, masking, and manipulation of specific elements.

The model hub provides access to a range of SegFormer models fine-tuned on different datasets, so you can explore options that best suit your use case.

Things to try

One interesting aspect of the SegFormer architecture is its hierarchical Transformer encoder, which captures features at multiple scales. This lets the model understand the context and relationships between semantic elements in an image, leading to more accurate and detailed segmentation. To see this in action, try the segformer-b0-finetuned-ade-512-512 model on a diverse set of images, from indoor scenes to outdoor landscapes, and observe how it segments the various objects, textures, and regions.
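The "image as a sequence of patches" idea mentioned above can be illustrated with a few lines of numpy. This is a simplified, non-overlapping ViT-style patchification (SegFormer itself uses overlapping patch embeddings, so treat this purely as an illustration; the patch size of 4 is also just an example):

```python
import numpy as np

# A dummy 512x512 RGB image, split into non-overlapping 4x4 patches and
# flattened into one row per patch -- the sequence a Transformer encoder sees.
img = np.zeros((512, 512, 3))
patch = 4
h_p, w_p = img.shape[0] // patch, img.shape[1] // patch

# (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
patches = img.reshape(h_p, patch, w_p, patch, 3).transpose(0, 2, 1, 3, 4)
seq = patches.reshape(h_p * w_p, patch * patch * 3)

print(seq.shape)  # (16384, 48)
```

The hierarchical encoder then processes such sequences at several resolutions, which is what lets SegFormer capture both local and global context.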

Read more







The fashion-clip model is a CLIP-based model developed by maintainer patrickjohncyh to produce general product representations for fashion concepts. Starting from the pre-trained ViT-B/32 checkpoint released by OpenAI, the model was trained on a large, high-quality novel fashion dataset to study whether domain-specific fine-tuning of CLIP-like models is sufficient to produce product representations that transfer zero-shot to entirely new datasets and tasks. The model was later fine-tuned from the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint, which the maintainer found worked better than the original OpenAI CLIP on fashion tasks. This updated "FashionCLIP 2.0" model achieves higher performance across several fashion-related benchmarks than both the original OpenAI CLIP and the initial FashionCLIP model.

Model inputs and outputs

Inputs

  • Images: The fashion-clip model takes images as input to generate product representations.
  • Text: The model can also accept text prompts, which are used to guide representation learning.

Outputs

  • Image embeddings: The primary output is a vector representation (embedding) of the input image, which can be used for tasks like image retrieval, zero-shot classification, and downstream fine-tuning.

Capabilities

The fashion-clip model produces general product representations that can be used for a variety of fashion-related tasks in a zero-shot manner. Its performance has been evaluated on several benchmarks, including Fashion-MNIST, KAGL, and DEEP, where it outperforms the original OpenAI CLIP model, with the updated "FashionCLIP 2.0" version achieving state-of-the-art results.

What can I use it for?

The fashion-clip model can be used for a variety of fashion-related applications, such as:

  • Image retrieval: Use the model's image embeddings for efficient retrieval, letting users find similar products based on visual similarity.
  • Zero-shot classification: Classify fashion items into categories without task-specific fine-tuning, which is valuable for applications that need flexible, adaptable classification.
  • Downstream fine-tuning: Use the pre-trained representations as a strong starting point for fine-tuning on more specific fashion tasks, such as product recommendation, attribute prediction, or outfit generation.

Things to try

One interesting aspect of the fashion-clip model is that its representations are "zero-shot transferable" to new datasets and tasks. Researchers and developers could explore how well these representations generalize to fashion tasks beyond the initial benchmarks, such as fashion trend analysis, clothing compatibility prediction, or virtual try-on. Additionally, the performance gains from fine-tuning on the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint suggest that large-scale, domain-specific pretraining data could yield even more capable fashion-oriented models. Experimenting with different fine-tuning strategies and data sources could reveal the limits and potential of this approach.
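Zero-shot classification with a CLIP-style model reduces to comparing embeddings. The sketch below uses random vectors as stand-ins for the outputs of the fashion-clip image and text encoders; only the similarity logic is meaningful here:

```python
import numpy as np

# Random stand-ins for embeddings a fashion-clip-style model would produce.
rng = np.random.default_rng(1)
image_emb = rng.standard_normal(512)
text_embs = rng.standard_normal((3, 512))  # e.g. "a dress", "a shirt", "shoes"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Zero-shot prediction: pick the text prompt most similar to the image.
scores = np.array([cosine(image_emb, t) for t in text_embs])
best = int(scores.argmax())
```

In a real pipeline, the label whose prompt embedding scores highest becomes the predicted category, with no task-specific training required.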

Read more







maskformer-swin-large-ade is a semantic segmentation model created by Facebook. It is based on the MaskFormer architecture, which addresses instance, semantic, and panoptic segmentation with the same approach: predicting a set of masks and corresponding labels. This model was trained on the ADE20k dataset and uses a Swin Transformer backbone.

Model inputs and outputs

The model takes an image as input and outputs class logits and a segmentation mask for each query. The image processor can post-process these outputs into a final semantic segmentation map.

Inputs

  • Image

Outputs

  • Class logits for each predicted query
  • Segmentation masks for each predicted query

Capabilities

maskformer-swin-large-ade excels at dense pixel-level segmentation, accurately identifying and delineating individual objects and regions within an image. It can be used for tasks like scene understanding, autonomous driving, and medical image analysis.

What can I use it for?

You can use this model for semantic segmentation of natural scenes, as it was trained on the diverse ADE20k dataset. The predicted segmentation maps provide detailed, pixel-level understanding of an image, which can be valuable for applications like autonomous navigation, image editing, and visual analysis.

Things to try

Try the model on a variety of natural images to see how it performs. You can also fine-tune it on a more specialized dataset to adapt it to a particular domain or task. The documentation provides helpful examples and resources for working with the MaskFormer architecture.
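The step from per-query outputs to a semantic map can be sketched in numpy. This is a simplified approximation of what the image processor does (shapes and the random inputs are illustrative, not the real model's):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Stand-ins for MaskFormer-style outputs: per-query class logits (with an
# extra "no object" class) and per-query mask logits.
num_queries, num_classes, H, W = 100, 150, 64, 64
rng = np.random.default_rng(2)
class_logits = rng.standard_normal((num_queries, num_classes + 1))
mask_logits = rng.standard_normal((num_queries, H, W))

class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop "no object"
mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))       # sigmoid per pixel

# Per-pixel class scores: sum over queries of class prob * mask prob,
# then argmax over classes to get the semantic segmentation map.
semantic = np.einsum("qc,qhw->chw", class_probs, mask_probs)
seg_map = semantic.argmax(axis=0)  # (H, W)

print(seg_map.shape)  # (64, 64)
```

This combination of mask predictions with class probabilities is what lets the same architecture serve instance, semantic, and panoptic segmentation.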

Read more







The yolos-fashionpedia model is a fine-tuned object detection model for fashion. It was developed by Valentina Feve and is based on the YOLOS architecture. The model was trained on the Fashionpedia dataset, which contains over 50,000 annotated fashion product images across 80 different categories. Similar models include yolos-tiny, a smaller YOLOS model fine-tuned on COCO, and adetailer, a suite of YOLOv8 detection models for visual tasks like face, hand, and clothing detection.

Model Inputs and Outputs

Inputs

  • Image data: The model takes in images and detects and classifies the fashion products they contain.

Outputs

  • Object detections: Bounding boxes around detected fashion items, along with predicted class labels from the 80 categories in the Fashionpedia dataset. These include items like shirts, pants, dresses, and accessories, as well as fine-grained details like collars, sleeves, and patterns.

Capabilities

The yolos-fashionpedia model accurately detects and categorizes a wide range of fashion products within images. This is particularly useful for applications like e-commerce, virtual try-on, and visual search, where precise product identification is crucial.

What Can I Use It For?

The yolos-fashionpedia model can be leveraged in a variety of fashion-related applications:

  • E-commerce product tagging: Automatically tag and categorize product images to improve search, recommendation, and visual browsing experiences.
  • Virtual try-on: Integrate the model into virtual fitting room technologies to detect garment types and sizes.
  • Visual search: Power fashion-focused visual search engines that let users query with images of products they are interested in.
  • Fashion analytics: Analyze fashion trends, inventory, and consumer preferences by processing large datasets of fashion images.

Things to Try

One interesting aspect of the yolos-fashionpedia model is its ability to detect fine-grained fashion details like collars, sleeves, and patterns. Developers could use this capability for more advanced features, such as:

  • Generating detailed product descriptions from images
  • Recommending complementary fashion items based on detected garment attributes
  • Analyzing runway shows or street style to identify emerging trends

By leveraging the model's detailed understanding of fashion elements, researchers and practitioners can create novel applications that go beyond basic product detection.
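Turning raw detection outputs into a usable list of items typically involves a confidence filter. The sketch below uses random arrays as stand-ins for YOLOS-style per-prediction class scores and boxes; only the filtering logic is the point:

```python
import numpy as np

# Stand-ins for detection outputs: per-prediction class scores and
# normalized (cx, cy, w, h) boxes. Shapes are illustrative.
num_preds, num_classes = 100, 80
rng = np.random.default_rng(3)
scores = rng.random((num_preds, num_classes))
boxes = rng.random((num_preds, 4))

# Keep each prediction's best class, then drop low-confidence detections.
best_scores = scores.max(axis=1)
best_labels = scores.argmax(axis=1)
keep = best_scores > 0.9

kept_boxes = boxes[keep]
kept_labels = best_labels[keep]
```

Each surviving `(box, label)` pair could then be mapped to a Fashionpedia category name for tagging or search applications.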

Read more
