Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification.

Liang, Y; Zhu, L; Wang, X; Yang, Y

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type: Journal Article
Citation: IEEE Trans Neural Netw Learn Syst, 2022, PP, (99)
Issue Date: 2022-11-21

This item is open access.

The embargo period expires on 21 Nov 2024

Full metadata record

Field	Value	Language
dc.contributor.author	Liang, Y https://orcid.org/0000-0002-3429-9798
dc.contributor.author	Zhu, L https://orcid.org/0000-0002-4093-7557
dc.contributor.author	Wang, X
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date.accessioned	2023-03-22T04:38:20Z
dc.date.available	2023-03-22T04:38:20Z
dc.date.issued	2022-11-21
dc.identifier.citation	IEEE Trans Neural Netw Learn Syst, 2022, PP, (99)
dc.identifier.issn	2162-237X
dc.identifier.issn	2162-2388
dc.identifier.uri	http://hdl.handle.net/10453/168056
dc.description.abstract	Though significant progress has been achieved on fine-grained visual classification (FGVC), severe overfitting still hinders model generalization. A recent study shows that hard samples in the training set can be easily fit, but most existing FGVC methods fail to classify some hard examples in the test set. The reason is that the model overfits those hard examples in the training set, but does not learn to generalize to unseen examples in the test set. In this article, we propose a moderate hard example modulation (MHEM) strategy to properly modulate the hard examples. MHEM encourages the model to not overfit hard examples and offers better generalization and discrimination. First, we introduce three conditions and formulate a general form of a modulated loss function. Second, we instantiate the loss function and provide a strong baseline for FGVC, where the performance of a naive backbone can be boosted and be comparable with recent methods. Moreover, we demonstrate that our baseline can be readily incorporated into the existing methods and empower these methods to be more discriminative. Equipped with our strong baseline, we achieve consistent improvements on three typical FGVC datasets, i.e., CUB-200-2011, Stanford Cars, and FGVC-Aircraft. We hope the idea of moderate hard example modulation will inspire future research work toward more effective fine-grained visual recognition.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation	http://purl.org/au-research/grants/arc/DP200100938
dc.relation.ispartof	IEEE Trans Neural Netw Learn Syst
dc.relation.isbasedon	10.1109/TNNLS.2022.3213563
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification.
dc.type	Journal Article
utslib.citation.volume	PP
utslib.location.activity	United States
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2024-11-21T00:00:00+1000Z
dc.date.updated	2023-03-22T04:38:13Z
pubs.issue	99
pubs.publication-status	Published online
pubs.volume	PP
utslib.citation.issue	99

Abstract:

Though significant progress has been achieved on fine-grained visual classification (FGVC), severe overfitting still hinders model generalization. A recent study shows that hard samples in the training set can be easily fit, but most existing FGVC methods fail to classify some hard examples in the test set. The reason is that the model overfits those hard examples in the training set, but does not learn to generalize to unseen examples in the test set. In this article, we propose a moderate hard example modulation (MHEM) strategy to properly modulate the hard examples. MHEM encourages the model to not overfit hard examples and offers better generalization and discrimination. First, we introduce three conditions and formulate a general form of a modulated loss function. Second, we instantiate the loss function and provide a strong baseline for FGVC, where the performance of a naive backbone can be boosted and be comparable with recent methods. Moreover, we demonstrate that our baseline can be readily incorporated into the existing methods and empower these methods to be more discriminative. Equipped with our strong baseline, we achieve consistent improvements on three typical FGVC datasets, i.e., CUB-200-2011, Stanford Cars, and FGVC-Aircraft. We hope the idea of moderate hard example modulation will inspire future research work toward more effective fine-grained visual recognition.
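
The abstract does not state the three conditions or the exact form of the modulated loss, so the following is only a minimal sketch of the general idea of penalizing hard examples less strongly without ignoring them. It is not the MHEM loss from the article. The function name `modulated_ce`, the threshold `tau`, the exponent `gamma`, and the weighting rule itself are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_ce(logits, target, tau=0.5, gamma=1.0):
    """Cross-entropy scaled by a hypothetical modulation weight.

    Examples whose true-class probability p is at least tau keep the
    plain cross-entropy loss. Harder examples (p < tau) are scaled by
    (p / tau) ** gamma, so they are still penalized, but less strongly.
    tau and gamma are illustrative hyperparameters, not from the article.
    """
    p = softmax(logits)[target]
    ce = -math.log(p)
    weight = min(1.0, p / tau) ** gamma
    return weight * ce


# An easy example: the true class (index 0) has the largest logit.
easy = modulated_ce([4.0, 0.0, 0.0], target=0)

# A hard example: the true class (index 0) is confidently mispredicted.
hard = modulated_ce([0.0, 4.0, 0.0], target=0)

# Unmodulated cross-entropy on the same hard example, for comparison.
plain_hard = -math.log(softmax([0.0, 4.0, 0.0])[0])

print(f"easy={easy:.4f} hard={hard:.4f} plain_hard={plain_hard:.4f}")
```

In this sketch the hard example still contributes a positive loss, but far less than it would under plain cross-entropy, while the easy example is left unchanged. When used for training in an automatic differentiation framework, one plausible design choice would be to treat the weight as a constant so that gradients flow only through the cross-entropy term; that choice is also an assumption here.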

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/168056