Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step towards the Screenless Retailing

Publisher: ACM
Publication Type: Conference Proceeding
Citation: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 1120-1128
Issue Date: 2021-10-17
Filename: 3474085.3481538.pdf
Description: Published version
Size: 3.57 MB
Format: Adobe PDF
Nowadays, almost all online orders are placed through screened devices such as mobile phones, tablets, and computers. With the rapid development of the Internet of Things (IoT) and smart appliances, more and more screenless smart devices, e.g., smart speakers and smart refrigerators, are appearing in our daily lives. They open up new means of interaction and may provide an excellent opportunity to reach new customers and increase sales. However, not all items are suitable for screenless shopping, since the appearance of some items plays an important role in consumer decision making. Typical examples include clothes, dolls, bags, and shoes. In this paper, we aim to infer the significance of each item's appearance in consumer decision making and to identify the group of items that are suitable for screenless shopping. Specifically, we formulate the problem as a classification task that predicts whether an item's appearance has a significant impact on people's purchase behavior. To solve this problem, we extract multi-modal features from three different views and collect a set of necessary labels via crowdsourcing. We then propose an iterative semi-supervised learning framework with a carefully designed multi-modal enhancement module. Experimental results verify the effectiveness of the proposed method.
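The abstract does not spell out the iterative semi-supervised framework, so the following is only a generic self-training (pseudo-labeling) sketch of what "iterative semi-supervised" classification can look like. It is not the authors' method and does not implement their multi-modal enhancement module. All names (`fit_centroids`, `predict_with_margin`, `self_train`, `min_margin`) are hypothetical, the nearest-centroid classifier is a stand-in chosen for brevity, and each feature vector is assumed to be an already fused representation of an item's multi-modal features. Label 1 means "appearance matters" and label 0 means "suitable for screenless shopping".

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_centroids(features, labels):
    """Stand-in classifier (assumption): one mean vector per class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(pts) for y, pts in by_class.items()}


def predict_with_margin(centroids, x):
    """Return (label, margin), where margin is the distance gap
    between the nearest and second-nearest class centroids."""
    ranked = sorted((dist(x, c), y) for y, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Generic self-training loop: fit on the labeled set, pseudo-label the
    confident unlabeled items, add them to the training set, and repeat."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        model = fit_centroids(x, y)
        confident, uncertain = [], []
        for u in pool:
            label, margin = predict_with_margin(model, u)
            if margin >= min_margin:
                confident.append((u, label))
            else:
                uncertain.append(u)
        if not confident:
            break  # nothing new to learn from; stop iterating
        for u, label in confident:
            x.append(u)
            y.append(label)
        pool = uncertain
    return fit_centroids(x, y), pool


# Toy data (made up for illustration): two crowdsourced labels, three unlabeled items.
labeled_x = [(0.0, 0.0), (4.0, 4.0)]
labeled_y = [0, 1]
unlabeled_x = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)]

model, leftover = self_train(labeled_x, labeled_y, unlabeled_x)
```

In this toy run, the two unlabeled items close to a labeled example are pseudo-labeled in the first round, while the ambiguous item `(2.0, 2.0)` never clears the confidence margin and stays in `leftover`. A real system would replace the centroid classifier and the margin rule with whatever model and confidence criterion the paper defines.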