Benchmarking In-the-Wild Multimodal Disease Recognition and A Versatile Baseline

Wei, T; Chen, Z; Huang, Z; Yu, X

Publisher: Association for Computing Machinery (ACM)
Publication Type: Conference Proceeding
Citation: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 1593-1601
Issue Date: 2024-10-28

Full metadata record

Field	Value	Language
dc.contributor.author	Wei, T
dc.contributor.author	Chen, Z
dc.contributor.author	Huang, Z
dc.contributor.author	Yu, X https://orcid.org/0000-0002-0269-5649
dc.date.accessioned	2025-02-26T21:29:33Z
dc.date.available	2025-02-26T21:29:33Z
dc.date.issued	2024-10-28
dc.identifier.citation	MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 1593-1601
dc.identifier.uri	http://hdl.handle.net/10453/185353
dc.language	en
dc.publisher	Association for Computing Machinery (ACM)
dc.relation.ispartof	MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
dc.relation.ispartof	Proceedings of the 32nd ACM International Conference on Multimedia
dc.relation.isbasedon	10.1145/3664647.3680599
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Benchmarking In-the-Wild Multimodal Disease Recognition and A Versatile Baseline
dc.type	Conference Proceeding
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access	*
dc.date.updated	2025-02-26T21:29:30Z
pubs.publication-status	Published

Abstract:

Existing plant disease classification models have achieved remarkable performance in recognizing in-laboratory diseased images. However, their performance often degrades significantly when classifying in-the-wild images. Furthermore, we observe that in-the-wild plant images may exhibit similar appearances across different diseases (i.e., small inter-class discrepancy), while images of the same disease may look quite different (i.e., large intra-class variance). Motivated by this observation, we propose an in-the-wild multimodal plant disease recognition dataset that contains not only the largest number of disease classes but also text-based descriptions for each disease. In particular, the text descriptions provide rich information in the textual modality and facilitate in-the-wild disease classification under small inter-class discrepancy and large intra-class variance. Our proposed dataset can therefore be regarded as an ideal testbed for evaluating disease recognition methods in the real world. In addition, we present a strong yet versatile baseline that models text descriptions and visual data through multiple prototypes for a given class. By fusing the contributions of multimodal prototypes in classification, our baseline can effectively address the small inter-class discrepancy and large intra-class variance issues. Remarkably, our baseline model can not only classify diseases but also recognize diseases in few-shot or training-free scenarios. Extensive benchmarking results demonstrate that our proposed in-the-wild multimodal dataset poses many new challenges for the plant disease recognition task and that there is substantial room for improvement in future work.
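
The abstract describes classification by fusing multiple visual and text prototypes per class, but this record does not give the details. The following is only a minimal sketch of that general idea, not the authors' method. It assumes that image and text embeddings live in a shared space, that each class is scored by its best-matching prototype (cosine similarity), and that the two modalities are combined with a fixed weight. The function names, the `alpha` weight, the class labels "rust" and "blight", and all vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_score(query, prototypes):
    """Score a class by its best-matching prototype.

    Keeping several prototypes per class is one way to cover
    large intra-class variance (an assumption of this sketch).
    """
    return max(cosine(query, p) for p in prototypes)


def classify(query, visual_protos, text_protos, alpha=0.5):
    """Fuse visual and text prototype scores with a fixed weight alpha."""
    fused = {}
    for label in visual_protos:
        visual = class_score(query, visual_protos[label])
        textual = class_score(query, text_protos[label])
        fused[label] = alpha * visual + (1 - alpha) * textual
    best = max(fused, key=fused.get)
    return best, fused


# Toy 3-dimensional embeddings; real ones would come from an encoder.
visual_protos = {
    "rust": [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]],
    "blight": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.6]],
}
text_protos = {
    "rust": [[0.9, 0.1, 0.0]],
    "blight": [[0.1, 0.9, 0.0]],
}

query = [0.9, 0.3, 0.0]  # hypothetical embedding of a test image
label, scores = classify(query, visual_protos, text_protos)
print(label, scores)
```

Because the prototypes here are just stored embeddings, nothing is trained, which loosely mirrors the training-free scenario mentioned in the abstract. How the paper actually builds and fuses its prototypes may differ.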

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/185353