Exploiting and Transferring Generalizable Knowledge for 2D/3D Object Recognition

Publication Type:
Thesis
Issue Date:
2024
In recent years, deep neural networks have significantly advanced the field of computer vision. However, these advances have largely relied on the assumption that training and test data are independently and identically distributed. In real-world scenarios, violations of this assumption due to covariate shift can cause performance degradation, highlighting the challenge of out-of-distribution (o.o.d.) generalization. Humans, in contrast, excel at o.o.d. generalization by drawing on acquired generalizable knowledge, whereas current deep learning models struggle with biased dataset confounders that hinder their acquisition of such knowledge. This research therefore conducts experiments to explore the mechanisms and principles behind the acquisition and exploitation of generalizable knowledge, in order to address the challenge of o.o.d. generalization. Our initial explorations focus on the learnability of generalizable knowledge using 2D transformation estimation tasks. The results demonstrate that a convolutional neural network that accepts image pairs as input, trained on causally constructed synthetic datasets, can acquire knowledge about 2D transformations that generalizes to unrelated semantic domains. Building on this insight, this research introduces InterpretNet, a novel architecture that explicitly exploits generalizable knowledge of 2D transformations and achieves improved test accuracy and explainability in handwritten digit classification. Expanding the scope, we integrate the learning methodology into a contrastive learning paradigm to exploit the generalizable knowledge implicitly. The results demonstrate enhanced model representation capability and classification accuracy in point cloud understanding tasks.
Finally, to further validate the potential of disentangling more confounding mechanisms in real-world tasks, we propose PCExpert, a self-supervised representation learning approach that transfers knowledge from a pre-trained image-text model to 3D point cloud understanding. Our results show that PCExpert outperforms state-of-the-art models across various tasks with enhanced representation capability, while substantially reducing the number of trainable parameters. In summary, this research investigates knowledge acquisition of target concepts grounded in causal theory, and introduces InterpretNet and a regression loss to exploit the acquired knowledge explicitly and implicitly, respectively. The methodology is further validated through the PCExpert architecture on 3D understanding tasks. The findings of this research offer new insights and methodologies for future studies on o.o.d. generalization.