Towards structured visual perception

Publication Type:
Thesis
Issue Date:
2025
Human visual perception, the foundation of our understanding of the world, is characterized by its ability to interpret scenes as structured, coherent wholes rather than mere collections of isolated objects. Although deep learning has driven significant progress in computer vision, current visual perception models still fall short of this holistic comprehension. This thesis argues that attaining human-like visual intelligence requires a fundamental shift towards structured visual perception, and presents a body of research developing computational methods that can explicitly model, learn, and reason with visual structures. The dissertation advances this vision through three interconnected and progressively deepening research thrusts. First, I model dynamic visual structures by leveraging temporal correspondences to capture the evolution of scenes and objects over time. Second, the focus extends to spatial relational structures, developing approaches that uncover the rich connections between objects and their components to build structured representations of scenes. Finally, I investigate general principles for structured perception through the integration of symbolic knowledge, using commonsense or domain-specific constraints to guide both the learning and inference processes of deep models. Collectively, this thesis outlines a comprehensive roadmap towards equipping machines with visual intelligence that more closely emulates the structured, holistic nature of human visual perception.