Scalable Multimodal Factorization for Learning from Very Big Data

Publication Type:
Multimodal Analytics for Next-Generation Big Data Technologies and Applications, 2018
Issue Date:
File: Chapter 10.pdf (published version, Adobe PDF, 827.17 kB)
Recent advances in data acquisition technology bring research communities new opportunities as well as new challenges. They enable researchers to acquire multiple modes of information about the real world. Such multimodal data can be represented naturally and efficiently by multi-way structures called tensors, which can be analyzed to extract the underlying core patterns of the observed data. Multiple datasets obtained from different acquisition methods and sensors are increasingly available, and these modalities, captured in correlated tensors, give a more complete picture of the patterns in the data. For large-scale datasets, existing distributed methods for the joint analysis of multi-dimensional data from multiple sources decompose the data across several computing nodes following the Map-Reduce paradigm. How to improve the performance of Map-Reduce-based factorization algorithms as the observed data grows is still an open problem. It calls for a more efficient solution that not only reduces communication overhead but also optimizes the factors faster. In this book chapter, we introduce readers to Tensor Factorization and the joint analysis of several correlated tensors. We propose a Scalable Multimodal Factorization (SMF) algorithm for analyzing correlated big multimodal data. Two key features enable it to handle big multimodal data. First, SMF's design, based on Apache Spark, gives it the smallest communication cost. Second, its optimized solver converges faster. Together, these advantages reduce the time complexity of factorization, so SMF remains highly efficient as the data grows. In our experiments with 1 billion known entries, SMF outperforms the currently fastest Coupled Tensor Factorization and Tensor Factorization methods by 17.8 and 3.8 times, respectively, while also achieving the highest accuracy.
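To make the idea of jointly factorizing correlated data concrete, the following is a minimal sketch of a coupled matrix-tensor factorization on a single machine. It is not the SMF algorithm described in the chapter: it assumes dense, fully observed data, a CP model for the tensor, and plain alternating least squares in NumPy, with no Spark distribution. The function names (`cp_reconstruct`, `coupled_loss`, `coupled_als`), the toy sizes, and the rank are illustrative assumptions.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: X[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def coupled_loss(X, Y, A, B, C, D):
    """Squared error of the tensor fit plus the matrix fit; A is the shared factor."""
    tensor_err = np.sum((X - cp_reconstruct(A, B, C)) ** 2)
    matrix_err = np.sum((Y - A @ D.T) ** 2)
    return float(tensor_err + matrix_err)


def coupled_als(X, Y, rank, n_iter=50, seed=0):
    """Alternating least squares for X ~ [[A, B, C]] and Y ~ A D^T (illustrative only)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    M = Y.shape[1]
    A, B, C, D = (rng.standard_normal((n, rank)) for n in (I, J, K, M))
    losses = [coupled_loss(X, Y, A, B, C, D)]
    for _ in range(n_iter):
        # The shared factor A is fitted against both datasets at once.
        G = (B.T @ B) * (C.T @ C) + D.T @ D
        A = (np.einsum("ijk,jr,kr->ir", X, B, C) + Y @ D) @ np.linalg.pinv(G)
        # B and C only appear in the tensor term.
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        # D only appears in the matrix term.
        D = (Y.T @ A) @ np.linalg.pinv(A.T @ A)
        losses.append(coupled_loss(X, Y, A, B, C, D))
    return (A, B, C, D), losses


# Toy data (assumed sizes): a 6x5x4 tensor and a 6x3 matrix sharing their first mode.
rng = np.random.default_rng(42)
A0, B0, C0, D0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4, 3))
X = cp_reconstruct(A0, B0, C0)
Y = A0 @ D0.T

factors, losses = coupled_als(X, Y, rank=2)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3e}")
```

Each update solves its least-squares subproblem exactly, so the combined loss should not increase from one sweep to the next. A method aimed at billions of known entries would instead work only on observed entries and distribute the computation, which this sketch does not attempt.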