DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

2404.06903

Published 4/11/2024 by Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi

cs.CV cs.AI

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Abstract

The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary flat (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360$^{circ}$ perspective, providing an enhanced immersive experience over existing techniques. Project website at: http://dreamscene360.github.io/

Get summaries of the top AI research delivered straight to your inbox:

Overview

This paper presents a novel 3D scene generation system called DreamScene360 that can create panoramic 3D scenes from text descriptions.
The system uses a unique panoramic Gaussian splatting technique to generate 3D content that seamlessly integrates with the 360-degree environment.
DreamScene360 allows for unconstrained text-to-3D scene generation, enabling users to describe complex scenes without limitations.

Plain English Explanation

The DreamScene360 system allows users to describe a scene using regular text, and then automatically generates a corresponding 3D panoramic environment. For example, you could type "A cozy cabin in a snowy forest with a roaring fireplace," and the system would create a 3D scene with a cabin, trees, snow, and a fireplace, all viewable in 360 degrees.

This is made possible through a unique Gaussian splatting technique that smoothly integrates the generated 3D content into the surrounding panoramic view. Previous text-to-3D systems were often limited in the types of scenes they could create, but DreamScene360 allows for much more open-ended and detailed descriptions.

The ability to generate fully 3D panoramic scenes from text could have many applications, such as virtual tourism, game development, and architectural visualization. It allows users to easily create immersive 3D environments without the need for 3D modeling expertise.

Technical Explanation

DreamScene360 utilizes a two-stage pipeline to generate the 3D panoramic scenes. First, a text-to-3D model converts the input text description into a collection of 3D objects and their properties. Then, a panoramic Gaussian splatting module seamlessly integrates these 3D elements into a 360-degree environment.

The text-to-3D model is a large language model that has been fine-tuned on a dataset of text descriptions paired with 3D scene data. This allows it to learn the mapping between language and 3D content. The panoramic Gaussian splatting module uses a depth-aware splatting technique to blend the generated 3D objects into the surrounding environment, creating a cohesive and immersive panoramic scene.

Experiments show that DreamScene360 can generate high-quality 3D scenes from a wide range of text descriptions, outperforming previous methods in both qualitative and quantitative evaluations.

Critical Analysis

The DreamScene360 system represents a significant advancement in the field of text-to-3D scene generation, allowing for more open-ended and detailed scene descriptions. However, the paper does note some limitations, such as the system's ability to handle complex object interactions or dynamic scenes.

Additionally, while the panoramic Gaussian splatting technique is a novel contribution, it is not clear how well it would scale to very large or highly complex scenes. The paper also does not address potential biases or safety concerns that could arise from an unconstrained text-to-3D system.

Further research could explore ways to enhance the system's understanding of spatial relationships, object physics, and scene semantics to enable the generation of even more realistic and coherent 3D environments. Incorporating safety and bias mitigation measures would also be an important area for future work.

Conclusion

The DreamScene360 system represents a significant advance in the field of text-to-3D scene generation, enabling users to create immersive panoramic 3D environments from natural language descriptions. The novel panoramic Gaussian splatting technique allows for seamless integration of generated 3D content into the surrounding 360-degree view.

This technology could have widespread applications in areas such as virtual tourism, game development, and architectural visualization, empowering users to easily create detailed 3D scenes without specialized 3D modeling skills. While the system has some limitations, the overall approach demonstrates the potential of language-driven 3D content generation to transform how we interact with and experience virtual environments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting

Yikun Ma, Dandan Zhan, Zhi Jin

Text-driven 3D indoor scene generation holds broad applications, ranging from gaming and smart homes to AR/VR applications. Fast and high-fidelity scene generation is paramount for ensuring user-friendly experiences. However, existing methods are characterized by lengthy generation processes or necessitate the intricate manual specification of motion parameters, which introduces inconvenience for users. Furthermore, these methods often rely on narrow-field viewpoint iterative generations, compromising global consistency and overall scene quality. To address these issues, we propose FastScene, a framework for fast and higher-quality 3D scene generation, while maintaining the scene consistency. Specifically, given a text prompt, we generate a panorama and estimate its depth, since the panorama encompasses information about the entire scene and exhibits explicit geometric constraints. To obtain high-quality novel views, we introduce the Coarse View Synthesis (CVS) and Progressive Novel View Inpainting (PNVI) strategies, ensuring both scene consistency and view quality. Subsequently, we utilize Multi-View Projection (MVP) to form perspective views, and apply 3D Gaussian Splatting (3DGS) for scene reconstruction. Comprehensive experiments demonstrate FastScene surpasses other methods in both generation speed and quality with better scene consistency. Notably, guided only by a text prompt, FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods, making it a paradigm for user-friendly scene generation.

5/10/2024

cs.CV

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

Xuening Yuan, Hongyu Yang, Yueming Zhao, Di Huang

Recent progress in text-to-3D creation has been propelled by integrating the potent prior of Diffusion Models from text-to-image generation into the 3D domain. Nevertheless, generating 3D scenes characterized by multiple instances and intricate arrangements remains challenging. In this study, we present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions, leveraging the strong 3D representation capabilities of Gaussian Splatting and the complex arrangement abilities of large language models (LLMs). Our approach involves a 3D Gaussian Guide ($3{DG^2}$) for scene representation, consisting of semantic primitives (objects) and their spatial transformations and relationships derived directly from text prompts using LLMs. This compositional representation allows for local-to-global optimization of the entire scene. A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene, which addresses training instability issue arising from simple blending in the subsequent global optimization stage. To mitigate potential biases of LLM priors, we model collision relationships between objects at the global level, enhancing physical correctness and overall realism. Additionally, to generate pervasive objects like rain and snow distributed extensively across the scene, we introduce a sparse initialization and densification strategy. Experiments demonstrate that DreamScape offers high usability and controllability, enabling the generation of high-fidelity 3D scenes from only text prompts and achieving state-of-the-art performance compared to other methods.

4/16/2024

cs.CV

🌐

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang, Huaping Liu

Automatic text-to-3D generation that combines Score Distillation Sampling (SDS) with the optimization of volume rendering has achieved remarkable progress in synthesizing realistic 3D objects. Yet most existing text-to-3D methods by SDS and volume rendering suffer from inaccurate geometry, e.g., the Janus issue, since it is hard to explicitly integrate 3D priors into implicit 3D representations. Besides, it is usually time-consuming for them to generate elaborate 3D models with rich colors. In response, this paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation. GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under 3D point cloud diffusion prior along with the ordinary 2D SDS optimization, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative appearance refinement to enrich texture details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D assets with delicate details and accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. Our code is available at https://github.com/gsgen3d/gsgen

4/3/2024

cs.CV

🛸

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik-hang Lee, Pengyuan Zhou

Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io .

4/5/2024

cs.CV