Meta Parsing Networks: Towards Generalized Few-Shot Scene Parsing with Adaptive Metric Learning

Li, P; Wei, Y; Yang, Y

Publisher: Association for Computing Machinery
Publication Type: Conference Proceeding
Citation: Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 64–72
Issue Date: 2020

Closed Access

Filename	Description	Size	Format
3394171.3413944.pdf	Published version	8.37 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, P https://orcid.org/0000-0003-1809-2137
dc.contributor.author	Wei, Y
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date.accessioned	2021-04-11T06:30:00Z
dc.date.available	2021-04-11T06:30:00Z
dc.date.issued	2020
dc.identifier.citation	Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 64–72
dc.identifier.isbn	9781450379885
dc.identifier.uri	http://hdl.handle.net/10453/147988
dc.description.abstract	Recent work in few-shot segmentation typically aims to segment novel objects using a few annotated examples as guidance. In this work, we advance the few-shot segmentation paradigm towards a more challenging yet more general scenario, i.e., Generalized Few-shot Scene Parsing (GFSP). In this task, we take a fully annotated image as guidance to segment all pixels in a query image. Our aim is to study a generalizable and robust segmentation network from the meta-learning perspective so that both seen and unseen categories can be correctly recognized. Unlike previous approaches, this task performs segmentation on a joint label space consisting of both previously seen and novel categories. Moreover, pixels from these multiple categories need to be taken into account simultaneously, which has not been well explored before. Accordingly, we present Meta Parsing Networks (MPNet) to better exploit the guidance information in the support set. MPNet contains two basic modules, i.e., the Adaptive Deep Metric Learning (ADML) module and the Contrastive Inter-class Distraction (CID) module. Specifically, the ADML module takes the annotated pixels from the support image as guidance and adaptively produces high-quality prototypes for learning a deep comparison metric. In addition, MPNet introduces the CID module, which learns to enlarge the feature discrepancy between different categories in the embedding space, leading MPNet to generate more discriminative feature embeddings. We conduct experiments on two newly constructed benchmarks, i.e., GFSP-Cityscapes and GFSP-Pascal-Context. Extensive ablation studies demonstrate the effectiveness and generalization ability of MPNet.
dc.language	en
dc.publisher	Association for Computing Machinery
dc.relation.ispartof	Proceedings of the 28th ACM International Conference on Multimedia
dc.relation.ispartof	MM '20: The 28th ACM International Conference on Multimedia
dc.relation.ispartofseries	MM ’20
dc.relation.isbasedon	10.1145/3394171.3413944
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Meta Parsing Networks: Towards Generalized Few-Shot Scene Parsing with Adaptive Metric Learning
dc.type	Conference Proceeding
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2021-04-11T06:29:47Z
pubs.place-of-publication	New York, NY, USA
pubs.publication-status	Published
dc.location	New York, NY, USA
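
The abstract describes building class prototypes from the annotated support pixels and comparing query pixels against them with a learned metric. The record gives no implementation details, so the following is only a minimal sketch of the general prototype-matching idea, not MPNet's ADML module. It assumes that a prototype is the plain mean of a class's support features and that cosine similarity is the comparison metric; the function names and the hand-written 2-D feature vectors (standing in for network embeddings) are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(support_feats, support_labels):
    """Average the support feature vectors of each annotated class.

    Assumption: a simple per-class mean, not the adaptive scheme in the paper.
    """
    sums, counts = {}, defaultdict(int)
    for feat, label in zip(support_feats, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def parse_query(query_feats, prototypes):
    """Label each query pixel with the class of its most similar prototype."""
    return [
        max(prototypes, key=lambda c: cosine(q, prototypes[c]))
        for q in query_feats
    ]


# Toy per-pixel features from one fully annotated support image (hypothetical).
support_feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
support_labels = ["road", "road", "sky", "sky"]

prototypes = class_prototypes(support_feats, support_labels)
predictions = parse_query([[0.9, 0.1], [0.2, 0.7]], prototypes)
print(predictions)
```

Because the label set is read from the support annotation, this kind of matching can in principle cover seen and novel categories in one joint label space, which is the setting the abstract calls GFSP.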

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/147988