Scalable factorization model to discover implicit and explicit similarities across domains

Do, Duc Minh Quan

Publication Type: Thesis
Issue Date: 2018

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Do, Duc Minh Quan
dc.date.accessioned	2019-05-08T03:10:32Z
dc.date.available	2019-05-08T03:10:32Z
dc.date.issued	2018
dc.identifier.uri	http://hdl.handle.net/10453/133197
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/133197/2/02whole.pdf
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.subject	Factorization.
dc.subject	E-commerce.
dc.subject	Data parallelism.
dc.subject	Apache Spark.
dc.subject	Latent variable.
dc.subject	Algorithm.
dc.title	Scalable factorization model to discover implicit and explicit similarities across domains	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

E-commerce businesses increasingly depend on recommendation systems to introduce personalized services and products to their target customers. Accurate recommendations require a sufficient understanding of user preferences and item characteristics. With current innovations on the Web, coupled datasets are abundantly available across domains. Analyzing these datasets provides broader knowledge of the underlying relationships between users and items. This deeper understanding gives collaborative filtering more power and leads to higher recommendation accuracy. However, how to use this knowledge effectively for recommendation remains a challenging problem. In this research, we propose to exploit both explicit and implicit similarities extracted from latent factors across domains with matrix tri-factorization. On the coupled dimensions, the common parts of the coupled factors are shared across domains, while their domain-specific parts are preserved. We show that this configuration of common and domain-specific parts benefits cross-domain recommendation significantly. Moreover, on the non-coupled dimensions, we propose using the middle factor of the tri-factorization to match closely related clusters across datasets and to align the matched clusters so that cross-domain implicit similarities are transferred, further improving recommendation. Furthermore, when dealing with data coupled from different sources, the scalability of the analytical method is another significant concern. We design a distributed factorization model that scales up as the observed data across domains increases. Our data parallelism, based on Apache Spark, keeps the model's communication cost to a minimum. The model is also equipped with an optimized solver that converges faster. We demonstrate that these key features stabilize our model’s performance as the data grows.
Validated on real-world datasets, our model outperforms existing algorithms in recommendation accuracy and scalability. These empirical results illustrate the potential of exploiting both explicit and implicit similarities across domains to improve recommendation performance.
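
The coupled tri-factorization idea described in the abstract can be illustrated with a small NumPy sketch. This is not the thesis's algorithm or solver. It is a minimal example that assumes two fully observed rating matrices X1 and X2 sharing the same users (the coupled dimension), plain gradient descent, and L2 regularization. Each matrix is approximated as U S V^T, where the user factor is split into a common block C shared by both domains and a domain-specific block (D1 or D2). The function name, parameter names, and rank choices are all hypothetical, and the cluster matching on the non-coupled dimension is not shown.

```python
import numpy as np


def coupled_tri_factorization(X1, X2, k_common=2, k_specific=1, k_item=3,
                              steps=3000, lr=0.01, reg=0.01, seed=0):
    """Jointly fit X1 ~ [C|D1] S1 V1^T and X2 ~ [C|D2] S2 V2^T.

    C holds the user factors shared by both domains; D1 and D2 hold the
    domain-specific user factors. Illustrative assumption: both matrices
    are fully observed and share the same row (user) ordering.
    """
    rng = np.random.default_rng(seed)
    n, m1 = X1.shape
    m2 = X2.shape[1]
    k_user = k_common + k_specific

    def init(*shape):
        # Small positive random start keeps early gradient steps stable.
        return 0.5 * rng.random(shape)

    C = init(n, k_common)
    D1, D2 = init(n, k_specific), init(n, k_specific)
    S1, S2 = init(k_user, k_item), init(k_user, k_item)
    V1, V2 = init(m1, k_item), init(m2, k_item)

    history = []
    for _ in range(steps):
        # Rebuild each domain's user factor from common + specific parts.
        U1, U2 = np.hstack([C, D1]), np.hstack([C, D2])
        E1 = U1 @ S1 @ V1.T - X1
        E2 = U2 @ S2 @ V2.T - X2

        params = (C, D1, D2, S1, S2, V1, V2)
        penalty = sum(np.sum(P ** 2) for P in params)
        history.append(0.5 * (np.sum(E1 ** 2) + np.sum(E2 ** 2))
                       + 0.5 * reg * penalty)

        # Gradients of the squared reconstruction error.
        gU1, gU2 = E1 @ V1 @ S1.T, E2 @ V2 @ S2.T
        gS1, gS2 = U1.T @ E1 @ V1, U2.T @ E2 @ V2
        gV1, gV2 = E1.T @ U1 @ S1, E2.T @ U2 @ S2

        # The common block receives gradient from both domains;
        # each specific block only from its own domain.
        gC = gU1[:, :k_common] + gU2[:, :k_common]
        gD1, gD2 = gU1[:, k_common:], gU2[:, k_common:]

        C = C - lr * (gC + reg * C)
        D1, D2 = D1 - lr * (gD1 + reg * D1), D2 - lr * (gD2 + reg * D2)
        S1, S2 = S1 - lr * (gS1 + reg * S1), S2 - lr * (gS2 + reg * S2)
        V1, V2 = V1 - lr * (gV1 + reg * V1), V2 - lr * (gV2 + reg * V2)

    return {"C": C, "D1": D1, "D2": D2, "S1": S1, "S2": S2,
            "V1": V1, "V2": V2, "history": history}


# Toy usage: 6 shared users, 5 items in one domain and 4 in the other.
data_rng = np.random.default_rng(42)
X1 = data_rng.random((6, 5))
X2 = data_rng.random((6, 4))
result = coupled_tri_factorization(X1, X2)
print("objective at start:", result["history"][0])
print("objective at end:  ", result["history"][-1])
```

A real recommender would mask unobserved ratings rather than treat the matrices as dense, and the distributed Spark implementation mentioned in the abstract would partition this computation across workers. Both are outside the scope of this sketch.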

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/133197