Generative models for 3D content creation

Publication Type: Thesis
Issue Date: 2025
Creating photorealistic and controllable 3D objects has been a long-standing problem in computer vision and graphics, with downstream applications in fields such as digital human avatars, virtual and augmented reality, immersive telepresence, video games, and movie production. Recent progress in deep learning and generative models has shown great potential for automating 3D content generation, offering scalable and efficient solutions. Despite these advances, several challenges remain, including ensuring geometric consistency in generated 3D content, modeling compositional and scalable 3D scenes, handling dynamic objects with large articulated motions, and achieving fine-grained controllability in 3D synthesis. This thesis therefore focuses on advancing 3D generation techniques by addressing four questions: 1) How can we generate 3D content that is both structurally sound and visually realistic? 2) How can we ensure compositionality in synthesized content and scalability to complex scenes? 3) How can we model dynamic 3D objects with large motions and deformations? 4) How can we create 3D content that satisfies specific design conditions? The thesis addresses these challenges by proposing novel generative frameworks for high-fidelity, scalable, and controllable 3D generation. First, we introduce a geometry-constrained generative model to improve multi-view consistency in 3D-aware image synthesis. Second, we present a compositional generation approach that enhances scalability to complex multi-object scenes by modeling foreground and background elements separately. Third, we develop a framework for animatable 3D human avatar generation that supports realistic shape deformations and articulated motions. Finally, we introduce a framework for creating high-fidelity animatable 3D human avatars from text prompts.
Extensive experiments, including quantitative and qualitative evaluations, demonstrate the effectiveness of the proposed approaches in generating structurally sound, visually realistic, and highly controllable 3D content. In summary, this thesis contributes to the advancement of AI-driven 3D generation, making automated content creation more accessible and efficient for real-world applications.