Efficient learning for fine-grained recognition

Publication Type:
Thesis
Issue Date:
2025
Fine-grained image classification, which aims to distinguish visually similar subcategories, has gained increasing attention in computer vision. However, compared to standard image classification, fine-grained classification presents unique challenges due to the limited availability of labeled training data and the combination of low inter-class variance with high intra-class variance. This thesis addresses these challenges by exploring efficient learning strategies for deep models in fine-grained recognition.

First, we provide a comprehensive review of existing literature, summarizing key advancements in fine-grained recognition. Through an in-depth analysis of the underlying mechanisms of existing methods, we identify key limitations and draw attention to the critical yet under-explored problem of efficient learning, encompassing both data efficiency and model efficiency.

Second, few-shot learning has emerged as a promising approach to mitigate data inefficiency. However, existing methods often fail to fully leverage the representation power of unseen categories and the weight generation capacity in feature learning, leading to performance bottlenecks. To address this, we propose a multi-level weight-centric feature learning framework. This approach enhances the dual role of the feature extractor in few-shot learning through two key techniques: (1) a weight-centric training strategy, which improves the prototype-ability of features, enabling the construction of more discriminative decision boundaries with only a few samples, and (2) a multi-level feature incorporation mechanism, which integrates mid-level and relation-level information to enhance transferability for novel categories while preserving classification accuracy for base classes. Extensive experiments on low-shot classification benchmarks demonstrate that our method significantly outperforms existing approaches.
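The thesis's weight-centric framework is not reproduced here, but the prototype-based few-shot setting it builds on can be illustrated with a minimal sketch: class weight vectors are formed by averaging the few available support features per class, and queries are assigned to the most similar prototype. All function names and the toy feature values below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize feature vectors so similarity reduces to a dot product.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def build_prototypes(support_feats, support_labels, n_classes):
    """Average the normalized support features of each class to form a
    prototype, which then acts as that class's classifier weight vector."""
    feats = l2_normalize(support_feats)
    return np.stack([feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feats, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    sims = l2_normalize(query_feats) @ l2_normalize(prototypes).T
    return sims.argmax(axis=1)

# Toy 2-way, 2-shot episode with 4-D features (values are synthetic).
rng = np.random.default_rng(0)
c0 = rng.normal(loc=[1, 0, 0, 0], scale=0.1, size=(2, 4))
c1 = rng.normal(loc=[0, 1, 0, 0], scale=0.1, size=(2, 4))
support = np.vstack([c0, c1])
labels = np.array([0, 0, 1, 1])
protos = build_prototypes(support, labels, n_classes=2)

query = np.array([[0.9, 0.05, 0.0, 0.0],
                  [0.1, 1.1, 0.0, 0.0]])
print(classify(query, protos))  # → [0 1]
```

A "weight-centric" training strategy, in this view, trains the feature extractor so that such averaged prototypes already behave like good classifier weights, rather than relying on fine-tuning at test time.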
Additionally, we tackle the low-resolution classification problem by proposing a dynamic semantic structure distillation framework. Our approach perturbs semantic structures to facilitate knowledge distillation while introducing a decoupled distillation objective to preserve essential part relations. We evaluate our approach on two knowledge distillation tasks: high-to-low resolution and large-to-small model distillation. Experimental results confirm its superiority in low-resolution fine-grained classification and its effectiveness in general image classification.

In summary, this thesis advances fine-grained image classification by introducing novel techniques for few-shot learning and low-resolution knowledge distillation, both of which contribute to improving data and model efficiency. Our findings provide valuable insights into efficient deep-learning strategies for fine-grained recognition.
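The distillation framework itself is specific to the thesis, but the generic mechanism it extends, matching a student's output distribution to a teacher's via a temperature-scaled KL divergence, can be sketched as follows. The logit values and temperature here are illustrative assumptions, not results from the thesis.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T produces softer distributions.
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation loss: KL divergence between the teacher's
    soft targets and the student's predictions, scaled by T^2."""
    p = softmax(teacher_logits, T)   # soft targets from the teacher
    q = softmax(student_logits, T)   # student predictions
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[4.0, 1.0, 0.5]])   # e.g. a high-resolution teacher's logits
aligned = np.array([[3.8, 1.1, 0.6]])   # student close to the teacher
off     = np.array([[0.5, 4.0, 1.0]])   # student far from the teacher

print(kd_loss(aligned, teacher) < kd_loss(off, teacher))  # → True
```

In high-to-low resolution distillation, the teacher sees high-resolution inputs and the student low-resolution ones; the thesis's contribution lies in distilling perturbed semantic structures with a decoupled objective rather than raw logits alone.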