MAAL: Multimodality-Aware Autoencoder-based Affordance Learning for 3D Articulated Objects

Liang, Y; Wang, X; Zhu, L; Yang, Y

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Conference Proceeding
Citation: Proceedings of the IEEE International Conference on Computer Vision, 2023, 00, pp. 217-227
Issue Date: 2023-01-01

Closed Access

Filename	Description	Size	Format
MAAL_Multimodality-Aware_Autoencoder-based_Affordance_Learning_for_3D_Articulated_Objects.pdf	Published version	1.59 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liang, Y
dc.contributor.author	Wang, X
dc.contributor.author	Zhu, L
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date	2023-10-01
dc.date.accessioned	2024-03-18T01:08:03Z
dc.date.available	2024-03-18T01:08:03Z
dc.date.issued	2023-01-01
dc.identifier.citation	Proceedings of the IEEE International Conference on Computer Vision, 2023, 00, pp. 217-227
dc.identifier.isbn	9798350307184
dc.identifier.issn	1550-5499
dc.identifier.uri	http://hdl.handle.net/10453/176834
dc.description.abstract	Inferring affordance for 3D articulated objects is a challenging and practical problem. It is a primary problem for applying robots to real-world scenarios. The exploration can be summarized as figuring out where to act and how to act. Correspondingly, the task mainly requires producing actionability scores, action proposals, and success likelihood scores according to the given 3D object information and robotic information. Current works usually directly process multi-modal inputs with early fusion and apply critic networks to produce scores, which leads to insufficient multi-modal learning ability and inefficiently iterative training in multiple stages. This paper proposes a novel Multimodality-Aware Autoencoder-based affordance Learning (MAAL) for the 3D object affordance problem. It is an efficient pipeline, trained in one go, and only requires a few positive samples in training data. More importantly, MAAL contains a MultiModal Energized Encoder (MME) for better multi-modal learning. It comprehensively models all multi-modal inputs from 3D objects and robotic actions. Jointly considering information from multiple modalities, the encoder further learns interactions between robots and objects. MME empowers the better multi-modal learning ability for understanding object affordance. Experimental results and visualizations, based on a large-scale dataset PartNet-Mobility, show the effectiveness of MAAL in learning multi-modal data and solving the 3D articulated object affordance problem.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation	http://purl.org/au-research/grants/arc/DP200100938
dc.relation.ispartof	Proceedings of the IEEE International Conference on Computer Vision
dc.relation.ispartof	2023 IEEE/CVF International Conference on Computer Vision (ICCV)
dc.relation.isbasedon	10.1109/ICCV51070.2023.00027
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	MAAL: Multimodality-Aware Autoencoder-based Affordance Learning for 3D Articulated Objects
dc.type	Conference Proceeding
utslib.citation.volume	00
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access	*
dc.date.updated	2024-03-18T01:08:02Z
pubs.finish-date	2023-10-06
pubs.publication-status	Published
pubs.start-date	2023-10-01
pubs.volume	00

Abstract:

Inferring affordance for 3D articulated objects is a challenging and practical problem, and a primary one for applying robots to real-world scenarios. The exploration can be summarized as figuring out where to act and how to act. Correspondingly, the task mainly requires producing actionability scores, action proposals, and success likelihood scores from the given 3D object information and robotic information. Current works usually process multi-modal inputs directly with early fusion and apply critic networks to produce scores, which leads to insufficient multi-modal learning ability and inefficient iterative training in multiple stages. This paper proposes a novel Multimodality-Aware Autoencoder-based affordance Learning (MAAL) method for the 3D object affordance problem. It is an efficient pipeline that is trained in one go and requires only a few positive samples in the training data. More importantly, MAAL contains a MultiModal Energized Encoder (MME) for better multi-modal learning. The MME comprehensively models all multi-modal inputs from 3D objects and robotic actions. By jointly considering information from multiple modalities, the encoder further learns interactions between robots and objects, which strengthens multi-modal learning for understanding object affordance. Experimental results and visualizations on the large-scale PartNet-Mobility dataset show the effectiveness of MAAL in learning multi-modal data and solving the 3D articulated object affordance problem.
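
The "where to act and how to act" framing can be illustrated with a small sketch of how the three outputs named in the abstract (actionability scores, action proposals, success likelihood scores) might be consumed downstream. This is not the MAAL model or the authors' code: the `ActionProposal` container, the `select_action` helper, the greedy selection rule, and all numbers are hypothetical and used for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ActionProposal:
    """Hypothetical container for one candidate interaction at a contact point."""
    direction: Tuple[float, float, float]  # assumed gripper approach direction
    success_likelihood: float              # assumed score in [0, 1]


def select_action(
    actionability: Sequence[float],
    proposals: Sequence[List[ActionProposal]],
) -> Tuple[int, ActionProposal]:
    """Pick where to act (highest actionability), then how to act
    (the proposal with the highest success likelihood at that point)."""
    if len(actionability) != len(proposals):
        raise ValueError("need one proposal list per point")
    if not actionability:
        raise ValueError("no points given")
    # "Where to act": index of the point with the highest actionability score.
    best_point = max(range(len(actionability)), key=lambda i: actionability[i])
    candidates = proposals[best_point]
    if not candidates:
        raise ValueError("selected point has no action proposals")
    # "How to act": the most promising proposal at that point.
    best_action = max(candidates, key=lambda p: p.success_likelihood)
    return best_point, best_action


# Toy scores for three points; the numbers are made up for illustration.
scores = [0.10, 0.85, 0.40]
props = [
    [ActionProposal((0.0, 0.0, 1.0), 0.20)],
    [ActionProposal((1.0, 0.0, 0.0), 0.30), ActionProposal((0.0, 1.0, 0.0), 0.90)],
    [ActionProposal((0.0, 0.0, -1.0), 0.50)],
]
point, action = select_action(scores, props)
print(point, action.direction, action.success_likelihood)
```

Greedy selection is only one possible policy; a real system could instead sample among high-scoring points or rank point and action pairs jointly.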

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/176834