The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose

Ben-Shabat, Y; Yu, X; Saleh, F; Campbell, D; Rodriguez-Opazo, C; Li, H; Gould, S

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, 00, pp. 846-858
Issue Date: 2021-06-14

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Ben-Shabat, Y
dc.contributor.author	Yu, X https://orcid.org/0000-0002-0269-5649
dc.contributor.author	Saleh, F
dc.contributor.author	Campbell, D
dc.contributor.author	Rodriguez-Opazo, C
dc.contributor.author	Li, H
dc.contributor.author	Gould, S
dc.date	2021-01-03
dc.date.accessioned	2022-03-31T00:37:34Z
dc.date.available	2022-03-31T00:37:34Z
dc.date.issued	2021-06-14
dc.identifier.citation	2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, 00, pp. 846-858
dc.identifier.isbn	9780738142661
dc.identifier.issn	2472-6737
dc.identifier.uri	http://hdl.handle.net/10453/155741
dc.description.abstract	The availability of a large labeled dataset is a key requirement for applying deep learning methods to solve various computer vision tasks. In the context of understanding human activities, existing public datasets, while large in size, are often limited to a single RGB camera and provide only per-frame or per-clip action annotations. To enable richer analysis and understanding of human activities, we introduce IKEA ASM, a three-million-frame, multi-view furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human poses. Additionally, we benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset. The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
dc.relation.ispartof	2021 IEEE Winter Conference on Applications of Computer Vision
dc.relation.ispartofseries	IEEE Winter Conference on Applications of Computer Vision
dc.relation.isbasedon	10.1109/wacv48630.2021.00089
dc.rights	info:eu-repo/semantics/openAccess
dc.title	The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Waikoloa, HI, USA
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
dc.date.updated	2022-03-31T00:37:32Z
pubs.finish-date	2021-01-08
pubs.publication-status	Published
pubs.start-date	2021-01-03
pubs.volume	00

Abstract:

The availability of a large labeled dataset is a key requirement for applying deep learning methods to solve various computer vision tasks. In the context of understanding human activities, existing public datasets, while large in size, are often limited to a single RGB camera and provide only per-frame or per-clip action annotations. To enable richer analysis and understanding of human activities, we introduce IKEA ASM, a three-million-frame, multi-view furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human poses. Additionally, we benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset. The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/155741