Efficient learning for fine-grained recognition

Publication Type:
Thesis
Issue Date:
2025
Fine-grained image classification, which aims to distinguish visually similar subcategories, has gained increasing attention in computer vision. However, compared to standard image classification, fine-grained classification presents unique challenges due to the limited availability of labeled training data and the combination of low inter-class variance with high intra-class variance. This thesis addresses these challenges by exploring efficient learning strategies for deep models in fine-grained recognition.

First, we provide a comprehensive review of existing literature, summarizing key advancements in fine-grained recognition. Through an in-depth analysis of the underlying mechanisms of existing methods, we identify key limitations and draw attention to the critical yet under-explored problem of efficient learning, encompassing both data efficiency and model efficiency.

Second, few-shot learning has emerged as a promising approach to mitigate data inefficiency. However, existing methods often fail to fully leverage the representation power of unseen categories and the weight generation capacity in feature learning, leading to performance bottlenecks. To address this, we propose a multi-level weight-centric feature learning framework. This approach enhances the dual role of the feature extractor in few-shot learning through two key techniques: (1) a weight-centric training strategy, which improves the prototype-ability of features, enabling the construction of more discriminative decision boundaries with only a few samples, and (2) a multi-level feature incorporation mechanism, which integrates mid-level and relation-level information to enhance transferability for novel categories while preserving classification accuracy for base classes. Extensive experiments on low-shot classification benchmarks demonstrate that our method significantly outperforms existing approaches.
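The thesis's weight-centric framework is not reproduced here, but the prototype-based few-shot setting it builds on can be illustrated with a minimal sketch: class weight vectors are formed by averaging the few available support features per class, and queries are assigned to the most similar prototype. All function names and the toy feature values below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize feature vectors so similarity reduces to a dot product.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def build_prototypes(support_feats, support_labels, n_classes):
    """Average the normalized support features of each class to form a
    prototype, which then acts as that class's classifier weight vector."""
    feats = l2_normalize(support_feats)
    return np.stack([feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feats, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    sims = l2_normalize(query_feats) @ l2_normalize(prototypes).T
    return sims.argmax(axis=1)

# Toy 2-way, 2-shot episode with 4-D features (values are synthetic).
rng = np.random.default_rng(0)
c0 = rng.normal(loc=[1, 0, 0, 0], scale=0.1, size=(2, 4))
c1 = rng.normal(loc=[0, 1, 0, 0], scale=0.1, size=(2, 4))
support = np.vstack([c0, c1])
labels = np.array([0, 0, 1, 1])
protos = build_prototypes(support, labels, n_classes=2)

query = np.array([[0.9, 0.05, 0.0, 0.0],
                  [0.1, 1.1, 0.0, 0.0]])
print(classify(query, protos))  # → [0 1]
```

A "weight-centric" training strategy, in this view, trains the feature extractor so that such averaged prototypes already behave like good classifier weights, rather than relying on fine-tuning at test time.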
Additionally, we tackle the low-resolution classification problem by proposing a dynamic semantic structure distillation framework. Our approach perturbs semantic structures to facilitate knowledge distillation while introducing a decoupled distillation objective to preserve essential part relations. We evaluate our approach on two knowledge distillation tasks: high-to-low resolution and large-to-small model distillation. Experimental results confirm its superiority in low-resolution fine-grained classification and its effectiveness in general image classification.

In summary, this thesis advances fine-grained image classification by introducing novel techniques for few-shot learning and low-resolution knowledge distillation, both of which contribute to improving data and model efficiency. Our findings provide valuable insights into efficient deep-learning strategies for fine-grained recognition.
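The distillation framework itself is specific to the thesis, but the generic mechanism it extends, matching a student's output distribution to a teacher's via a temperature-scaled KL divergence, can be sketched as follows. The logit values and temperature here are illustrative assumptions, not results from the thesis.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T produces softer distributions.
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation loss: KL divergence between the teacher's
    soft targets and the student's predictions, scaled by T^2."""
    p = softmax(teacher_logits, T)   # soft targets from the teacher
    q = softmax(student_logits, T)   # student predictions
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[4.0, 1.0, 0.5]])   # e.g. a high-resolution teacher's logits
aligned = np.array([[3.8, 1.1, 0.6]])   # student close to the teacher
off     = np.array([[0.5, 4.0, 1.0]])   # student far from the teacher

print(kd_loss(aligned, teacher) < kd_loss(off, teacher))  # → True
```

In high-to-low resolution distillation, the teacher sees high-resolution inputs and the student low-resolution ones; the thesis's contribution lies in distilling perturbed semantic structures with a decoupled objective rather than raw logits alone.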