DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

2401.16465

Published 5/2/2024 by Kai He, Kaixin Yao, Qixuan Zhang, Lingjie Liu, Jingyi Yu, Lan Xu

Abstract

Apparel's significant role in human appearance underscores the importance of garment digitalization for digital human creation. Recent advances in 3D content creation are pivotal for digital human creation. Nonetheless, garment generation from text guidance is still nascent. We introduce a text-driven 3D garment generation framework, DressCode, which aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation. We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns with text guidance. We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments. By leveraging a large language model, our framework generates CG-friendly garments through natural language interaction. It also facilitates pattern completion and texture editing, streamlining the design process through user-friendly interaction. This framework fosters innovation by allowing creators to freely experiment with designs and incorporate unique elements into their work. With comprehensive evaluations and comparisons with other state-of-the-art methods, our method showcases superior quality and alignment with input prompts. User studies further validate our high-quality rendering results, highlighting its practical utility and potential in production settings. Our project page is https://IHe-KaiI.github.io/DressCode/.

Overview

• The paper "DressCode: Autoregressively Sewing and Generating Garments from Text Guidance" presents a text-driven framework for generating detailed 3D garments from textual descriptions.

• The framework, called DressCode, autoregressively generates sewing patterns for garments such as shirts, pants, and dresses, given input text that describes the desired style and attributes.

• This research advances text-to-3D generation, enabling more detailed and controllable synthesis of clothing.

Plain English Explanation

• The researchers developed an AI system that can create 3D models of clothes based on written descriptions. For example, if you give the system a text description like "a long-sleeved blue denim jacket with front pockets," it can generate a 3D digital model of that jacket.

• This is a challenging task because clothing has complex shapes, textures, and folds that are difficult for computers to capture. The key innovation in this paper is the "autoregressive" approach, where the system builds the garment piece-by-piece, similar to how a human might sew a garment.

• By generating the garment in this step-by-step way, rather than all at once, the system is able to capture the intricate details and realistic draping of the final 3D model. This could be useful for applications like virtual fashion design, online clothing visualization, and even 3D printing of custom garments.
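
The piece-by-piece idea can be illustrated with a toy decoding loop. This is only a sketch under loud assumptions: the panel names, the NEXT_PANELS table, and the keyword check on the prompt are all hypothetical stand-ins for a learned model, and none of them come from the paper. What it shows is the general autoregressive pattern, where each new piece is chosen from the pieces generated so far plus the text prompt.

```python
# Toy illustration of autoregressive generation (not the paper's model).
# Each token stands for one garment panel; all names here are hypothetical.
END = "<end>"
NEXT_PANELS = {
    "<start>": ["front_panel"],
    "front_panel": ["back_panel"],
    "back_panel": ["left_sleeve", END],
    "left_sleeve": ["right_sleeve"],
    "right_sleeve": [END],
}


def next_token(prefix, wants_sleeves):
    """Stand-in for a learned model: choose the next token from the prefix."""
    options = NEXT_PANELS[prefix[-1]]
    if len(options) == 1:
        return options[0]
    # The only branching point: add sleeves only if the prompt asks for them.
    return "left_sleeve" if wants_sleeves else END


def generate(prompt, max_len=16):
    """Build the panel sequence one token at a time, conditioned on the prompt."""
    wants_sleeves = "sleeve" in prompt and "sleeveless" not in prompt
    sequence = ["<start>"]
    while len(sequence) < max_len:
        token = next_token(sequence, wants_sleeves)
        if token == END:
            break
        sequence.append(token)
    return sequence[1:]  # drop the start marker


jacket = generate("a long-sleeved blue denim jacket with front pockets")
dress = generate("a sleeveless summer dress")
```

In a real system the lookup table would be replaced by a neural network that outputs a probability distribution over the next token, but the loop structure stays the same.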

Technical Explanation

• DressCode uses SewingGPT, a GPT-based architecture that integrates cross-attention with text-conditioned embeddings, to autoregressively generate sewing patterns from the input text description.

• The model first encodes the text into an embedding, then conditions on it to generate a sequence of 2D "sewing patterns." These sewing patterns are then stitched together to form the final 3D garment mesh.

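• Beyond pattern generation, the abstract describes a pre-trained Stable Diffusion model tailored to generate tile-based physically-based rendering (PBR) textures for the garments, and a large language model that lets users guide generation through natural language interaction.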

• Experiments show DressCode can generate a diverse range of clothing types, from simple t-shirts to more complex dresses and coats, with high fidelity to the input text prompts.
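
The abstract states that SewingGPT integrates cross-attention with a text-conditioned embedding. As a rough illustration of what cross-attention computes, here is a minimal single-head sketch in plain Python. It assumes no learned projection matrices and uses made-up 2-D vectors; the names pattern_queries, text_keys, and text_values are hypothetical and do not reflect the paper's implementation. Queries come from the sequence being generated, while keys and values come from the text.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # How strongly this query matches each text token.
        weights = softmax([dot(q, k) * scale for k in keys])
        # Weighted average of the text value vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Made-up embeddings: two text tokens and one pattern-token query.
text_keys = [[1.0, 0.0], [0.0, 1.0]]
text_values = [[10.0, 0.0], [0.0, 10.0]]
pattern_queries = [[4.0, 0.0]]

attended = cross_attention(pattern_queries, text_keys, text_values)
```

Because the query points in the same direction as the first text key, the output is dominated by the first text value. This is the mechanism by which generated pattern tokens can draw on the relevant parts of the prompt.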

Critical Analysis

• A limitation of the current work is that the generated garments are not fully physically simulated, so the dynamics and motion of the clothing may not be perfectly accurate.

• Additionally, the system is trained on a relatively limited dataset of garment types and styles, so its ability to generalize to more diverse or custom clothing designs may be constrained.

• Future research could explore ways to integrate physical simulation or leverage large-scale fashion datasets to further improve the realism and versatility of the generated garments.

Conclusion

• The DressCode model represents an important step forward in text-to-3D garment generation, enabling more detailed and controllable synthesis of clothing from natural language descriptions.

• This technology could have significant implications for virtual fashion design, online shopping, and even custom clothing manufacturing, by allowing users to easily visualize and create desired garments.

• While the current system has some limitations, the core autoregressive approach and other technical innovations showcased in this paper point to promising directions for continued progress in this emerging field.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FashionEngine: Interactive Generation and Editing of 3D Clothed Humans

Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu

We present FashionEngine, an interactive 3D human generation and editing system that allows us to design 3D digital humans in a way that aligns with how humans interact with the world, such as natural languages, visual perceptions, and hand-drawing. FashionEngine automates the 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, which provides strong priors for diverse generation and editing tasks. 2) Multimodality-UV Space encoding the texture appearance, shape topology, and textual semantics of human clothing in a canonical UV-aligned space, which faithfully aligns the user multimodal inputs with the implicit UV latent space for controllable 3D human editing. The multimodality-UV space is shared across different user inputs, such as texts, images, and sketches, which enables various joint multimodal editing tasks. 3) Multimodality-UV Aligned Sampler learns to sample high-quality and diverse 3D humans from the diffusion prior for multimodal user inputs. Extensive experiments validate FashionEngine's state-of-the-art performance for conditional generation/editing tasks. In addition, we present an interactive user interface for our FashionEngine that enables both conditional and unconditional generation tasks, and editing tasks including pose/view/shape control, text-, image-, and sketch-driven 3D human editing and 3D virtual try-on, in a unified framework. Our project page is at: https://taohuumd.github.io/projects/FashionEngine.

4/8/2024

Fashion Style Editing with Generative Human Prior

Chaerin Kong, Seungyong Lee, Soohyeok Im, Wonsuk Yang

Image editing has been a long-standing challenge in the research community with its far-reaching impact on numerous applications. Recently, text-driven methods started to deliver promising results in domains like human faces, but their applications to more complex domains have been relatively limited. In this work, we explore the task of fashion style editing, where we aim to manipulate the fashion style of human imagery using text descriptions. Specifically, we leverage a generative human prior and achieve fashion style editing by navigating its learned latent space. We first verify that the existing text-driven editing methods fall short for our problem due to their overly simplified guidance signal, and propose two directions to reinforce the guidance: textual augmentation and visual referencing. Combined with our empirical findings on the latent space structure, our Fashion Style Editing framework (FaSE) successfully projects abstract fashion concepts onto human images and introduces exciting new applications to the field.

4/3/2024

TELA: Text to Layer-wise 3D Clothed Human Generation

Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle for clothing editing and meanwhile lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control capacity for the generation process. The basic idea is progressively generating a minimal-clothed human body and layer-wise clothes. During clothing generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothing model from the human body. The proposed method achieves high-quality disentanglement, which thereby provides an effective way for 3D garment generation. Extensive experiments demonstrate that our approach achieves state-of-the-art 3D clothed human generation while also supporting cloth editing applications such as virtual try-on. Project page: http://jtdong.com/tela_layer/

4/26/2024

FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion

Abhishek Kumar Singh, Ioannis Patras

The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI. This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models. Utilizing ControlNet and LoRA fine-tuning, our approach generates high-quality images from multimodal inputs such as text and sketches. We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data. Our evaluation, utilizing metrics like FID, CLIP Score, and KID, demonstrates that our model significantly outperforms traditional stable diffusion models. The results not only highlight the effectiveness of our model in generating fashion-appropriate outputs but also underscore the potential of diffusion models in revolutionizing fashion design workflows. This research paves the way for more interactive, personalized, and technologically enriched methodologies in fashion design and representation, bridging the gap between creative vision and practical application.

4/30/2024