MoE-SPNet: A mixture-of-experts scene parsing network

Fu, H; Gong, M; Wang, C; Tao, D

Publication Type: Journal Article
Citation: Pattern Recognition, 2018, 84 pp. 226 - 236
Issue Date: 2018-12-01

Closed Access

	Filename	Description	Size	Format
	1-s2.0-S0031320318302541-main.pdf	Published Version	2.83 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Fu, H	en_US
dc.contributor.author	Gong, M	en_US
dc.contributor.author	Wang, C	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.date.issued	2018-12-01	en_US
dc.identifier.citation	Pattern Recognition, 2018, 84 pp. 226 - 236	en_US
dc.identifier.issn	0031-3203	en_US
dc.identifier.uri	http://hdl.handle.net/10453/132239
dc.description.abstract	© 2018 Elsevier Ltd. Scene parsing is an indispensable component in understanding the semantics within a scene. Traditional methods rely on handcrafted local features and probabilistic graphical models to incorporate local and global cues. Recently, methods based on fully convolutional neural networks have set new records on scene parsing. A strategy common to these methods is the aggregation of hierarchical features produced by a deep convolutional neural network. However, typical algorithms aggregate hierarchical convolutional features via concatenation or linear combination, which cannot sufficiently exploit the diversity of contextual information in multi-scale features or the spatial inhomogeneity of a scene. In this paper, we propose a mixture-of-experts scene parsing network (MoE-SPNet) that incorporates a convolutional mixture-of-experts layer to assess the importance of features from different levels and at different spatial locations. In addition, we propose a variant of mixture-of-experts, the adaptive hierarchical feature aggregation (AHFA) mechanism, which can be incorporated into existing scene parsing networks that use skip-connections to fuse features layer by layer. In the proposed networks, features from different levels at each spatial location are adaptively re-weighted according to the local structure and surrounding contextual information before aggregation. We demonstrate the effectiveness of the proposed methods on two scene parsing datasets, PASCAL VOC 2012 and SceneParse150, using two baseline models, FCN-8s and DeepLab-ASPP.	en_US
dc.relation.ispartof	Pattern Recognition	en_US
dc.relation.isbasedon	10.1016/j.patcog.2018.07.020	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	MoE-SPNet: A mixture-of-experts scene parsing network	en_US
dc.type	Journal Article
utslib.citation.volume	84	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	84	en_US

Abstract:

© 2018 Elsevier Ltd. Scene parsing is an indispensable component in understanding the semantics within a scene. Traditional methods rely on handcrafted local features and probabilistic graphical models to incorporate local and global cues. Recently, methods based on fully convolutional neural networks have set new records on scene parsing. A strategy common to these methods is the aggregation of hierarchical features produced by a deep convolutional neural network. However, typical algorithms aggregate hierarchical convolutional features via concatenation or linear combination, which cannot sufficiently exploit the diversity of contextual information in multi-scale features or the spatial inhomogeneity of a scene. In this paper, we propose a mixture-of-experts scene parsing network (MoE-SPNet) that incorporates a convolutional mixture-of-experts layer to assess the importance of features from different levels and at different spatial locations. In addition, we propose a variant of mixture-of-experts, the adaptive hierarchical feature aggregation (AHFA) mechanism, which can be incorporated into existing scene parsing networks that use skip-connections to fuse features layer by layer. In the proposed networks, features from different levels at each spatial location are adaptively re-weighted according to the local structure and surrounding contextual information before aggregation. We demonstrate the effectiveness of the proposed methods on two scene parsing datasets, PASCAL VOC 2012 and SceneParse150, using two baseline models, FCN-8s and DeepLab-ASPP.
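
The per-location re-weighting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`softmax`, `gate_weights_at`, `moe_aggregate`), the 1x1 linear gating map, and the softmax normalisation are all assumptions chosen for illustration, and the code uses plain Python lists rather than a tensor library so that it stays self-contained. A 1x1 gate sees only the features at one pixel; a gate that also uses surrounding context, as the abstract describes, would need a larger receptive field.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights_at(features, gate_w, gate_b, y, x):
    """Hypothetical 1x1 gating map: one weight per feature level at pixel (y, x).

    features[k][c][y][x] holds channel c of level k; all levels are assumed
    to be already resized to the same height and width.
    gate_w has one row per level, each of length (levels * channels).
    """
    # Concatenate the channels of every level at this pixel.
    stacked = [chan[y][x] for level in features for chan in level]
    logits = [
        sum(w * v for w, v in zip(row, stacked)) + b
        for row, b in zip(gate_w, gate_b)
    ]
    return softmax(logits)


def moe_aggregate(features, gate_w, gate_b):
    """Fuse levels as a per-pixel weighted sum (assumed mixture-of-experts form)."""
    num_levels = len(features)
    num_channels = len(features[0])
    height = len(features[0][0])
    width = len(features[0][0][0])
    fused = [[[0.0] * width for _ in range(height)] for _ in range(num_channels)]
    for y in range(height):
        for x in range(width):
            gates = gate_weights_at(features, gate_w, gate_b, y, x)
            for c in range(num_channels):
                fused[c][y][x] = sum(
                    gates[k] * features[k][c][y][x] for k in range(num_levels)
                )
    return fused


# Two levels, one channel, a 1x2 map (toy values, not from the paper).
levels = [[[[1.0, 2.0]]], [[[3.0, 4.0]]]]
zero_gate = [[0.0, 0.0], [0.0, 0.0]]
uniform = moe_aggregate(levels, zero_gate, [0.0, 0.0])
favour_first = moe_aggregate(levels, zero_gate, [50.0, 0.0])
```

With all-zero gate parameters the gates are uniform, so the fusion reduces to a plain average of the levels, which is the fixed linear combination the abstract contrasts against. Once the gate parameters are non-zero, the weights change from pixel to pixel according to the input features.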

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/132239