Advances in multi-output learning via nearest neighbours

Multi-output learning aims to simultaneously predict multiple outputs given a single input. It is an important learning problem due to the pressing need for sophisticated decision making in real-world applications. Inspired by big data, the 4Vs characteristics of multi-output data impose a set of challenges on multi-output learning, in terms of the 𝘷𝘰𝘭𝘶𝘮𝘦, 𝘷𝘢𝘳𝘪𝘦𝘵𝘺, 𝘷𝘦𝘭𝘰𝘤𝘪𝘵𝘺 and 𝘷𝘦𝘳𝘢𝘤𝘪𝘵𝘺 of the outputs. 𝘝𝘰𝘭𝘶𝘮𝘦 refers to the explosive growth of generated output labels, which leads to two challenges: large output dimensions and unseen outputs. 𝘝𝘢𝘳𝘪𝘦𝘵𝘺 refers to the heterogeneous nature of output labels, which results in complex output structures. 𝘝𝘦𝘭𝘰𝘤𝘪𝘵𝘺 refers to the speed of output label acquisition, including the phenomenon of concept drift and the need to update the model. The challenge imposed by velocity is the change of output distributions, where the target outputs change over time in unforeseen ways. Nearest neighbours is one of the most classic frameworks for handling multi-output problems. In this thesis, I focus on overcoming the challenges posed by the first three of the 4Vs characteristics of multi-output data, using nearest neighbours-based methods.

The first work of this thesis deals with the challenges imposed by the 𝘷𝘰𝘭𝘶𝘮𝘦 and 𝘷𝘢𝘳𝘪𝘦𝘵𝘺 of multi-output data. It focuses on nearest neighbours-based semantic retrieval and zero-shot learning, which are sub-problems of multi-output learning. I propose a novel concept-based information retrieval system that combines a general semantic feature representation with a metric learning model. It achieves better semantic retrieval performance on domain-specific information retrieval problems. Together with the better-learned semantic representation, the distance metric generalizes to unseen output labels and can be applied to zero-shot learning.

The second work of the thesis handles the challenge of changing output distributions caused by the 𝘷𝘦𝘭𝘰𝘤𝘪𝘵𝘺 of multi-output data.
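As a concrete illustration of the first work's retrieval setting, the sketch below ranks output labels by nearest-neighbour distance in a semantic space; because unseen labels are just extra rows of the embedding matrix, the same ranking applies to zero-shot prediction. The linear projection `W` stands in for the thesis's learned metric model and, like all names here, is an illustrative assumption rather than the actual method.

```python
import numpy as np

def nn_zero_shot_rank(x, label_embeds, W):
    """Rank output labels (seen or unseen) for input x.

    x            : (d,) input feature vector
    label_embeds : (L, k) semantic embeddings, one row per label;
                   unseen labels simply add rows, no retraining needed
    W            : (k, d) learned linear projection (illustrative stand-in
                   for a learned distance metric)
    """
    z = W @ x                                      # map input into semantic space
    dists = np.linalg.norm(label_embeds - z, axis=1)  # distance to each label
    return np.argsort(dists)                       # label indices, nearest first

# Toy usage: 2-D semantic space, three candidate labels.
W = np.eye(2)
x = np.array([1.0, 0.0])
label_embeds = np.array([[0.9, 0.1],   # close to the projected input
                         [5.0, 5.0],
                         [-3.0, 0.0]])
ranking = nn_zero_shot_rank(x, label_embeds, W)
```

A better-learned `W` pulls semantically related inputs and labels together, which is what lets the ranking transfer to labels never seen during training.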
Nearest neighbours cannot be directly adapted to this challenge because of its inefficiency. This work therefore focuses on improving the efficiency of nearest neighbours for multi-output learning problems. An online product quantization (online PQ) model is developed to accommodate streaming data under time and memory constraints, and a loss bound is derived to guarantee the performance of the model.
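The second work's online PQ idea can be sketched as follows: split each vector into sub-vectors, quantize each against its own small codebook, and update the chosen centroids with running means as vectors stream in. This toy class is a minimal sketch under those assumptions (the class name, running-mean update rule, and random initialization are illustrative, not the thesis's actual algorithm or its loss bound).

```python
import numpy as np

class OnlinePQ:
    """Toy online product quantizer: M sub-codebooks of K centroids each,
    updated with per-centroid running means as vectors stream in."""

    def __init__(self, dim, M=2, K=4, seed=0):
        assert dim % M == 0, "dim must split evenly into M sub-vectors"
        self.M, self.sub = M, dim // M
        rng = np.random.default_rng(seed)
        self.codebooks = rng.standard_normal((M, K, self.sub))
        self.counts = np.ones((M, K))  # per-centroid update counts

    def encode(self, x):
        """Index of the nearest centroid in each sub-codebook (the PQ code)."""
        parts = x.reshape(self.M, self.sub)
        return np.array([
            np.argmin(np.linalg.norm(self.codebooks[m] - parts[m], axis=1))
            for m in range(self.M)
        ])

    def update(self, x):
        """Encode x, then move each chosen centroid toward its sub-vector
        with a 1/count step (a running mean, as in streaming k-means)."""
        code = self.encode(x)
        parts = x.reshape(self.M, self.sub)
        for m, k in enumerate(code):
            self.counts[m, k] += 1
            lr = 1.0 / self.counts[m, k]
            self.codebooks[m, k] += lr * (parts[m] - self.codebooks[m, k])
        return code

    def reconstruction_error(self, x):
        """Distance between x and its quantized approximation, per sub-space."""
        code = self.encode(x)
        parts = x.reshape(self.M, self.sub)
        return sum(float(np.linalg.norm(parts[m] - self.codebooks[m, k]))
                   for m, k in enumerate(code))

# Toy usage: repeatedly feeding the same vector drives its chosen
# centroids toward it, so the quantization error shrinks over time.
pq = OnlinePQ(dim=4, M=2, K=3)
x = np.ones(4)
before = pq.reconstruction_error(x)
for _ in range(50):
    pq.update(x)
after = pq.reconstruction_error(x)
```

Storing only `M` short codebooks and per-vector codes is what keeps the memory footprint fixed as the stream grows, while each update touches just the `M` selected centroids.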