Mining heterogeneous enterprise data

Heterogeneity is becoming one of the key characteristics of enterprise data. Globalization and competition make it increasingly important to leverage the huge amounts of data that enterprises accumulate under differing organizational processes, resources, and standards. Effectively deriving meaningful insights from complex, large-scale heterogeneous enterprise data poses an interesting but critical challenge. The aim of this thesis is to investigate the theoretical foundations of mining heterogeneous enterprise data in light of this challenge. It also develops new algorithms and frameworks that effectively and efficiently handle heterogeneity in four elements of the data: objects, events, context, and domains.

Objects describe the variety of business roles and instruments involved in business systems. Object heterogeneity means that object information is heterogeneous at both the data level and the structural level. The proposed cost-sensitive hybrid neural network (Cs-HNN) combines parallel network architectures with an algorithm designed specifically for minority-class classification, producing a robust model for learning from heterogeneous objects.

Events trace an object's behaviours or activities. Event heterogeneity reflects the variety of business events and is normally expressed in the types and formats of their features. The approach proposed in this thesis takes fleet tracking as a practical example of an application with a high degree of event heterogeneity.

Context describes the environment and circumstances surrounding objects and events. Context heterogeneity reflects the diversity of contextual features. The proposed coupled collaborative filtering (CCF) approach provides context-aware recommendations by measuring the non-independent and identically distributed (non-IID) relationships across diverse contexts.
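The abstract does not describe the internals of Cs-HNN. The following is therefore only a generic illustration of the cost-sensitive idea behind minority-class learning, not the thesis's method. It assumes binary labels and class weights inversely proportional to class frequency, a common heuristic that may differ from the weighting the thesis actually uses. The function names `inverse_frequency_weights` and `weighted_log_loss` are hypothetical. The sketch shows how weighting the loss makes misclassifying a rare class costlier than misclassifying a common one.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed heuristic: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Mean class-weighted binary cross-entropy.

    probs[i] is the predicted probability that labels[i] == 1.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        nll = -math.log(p) if y == 1 else -math.log(1 - p)
        total += weights[y] * nll  # rare classes get larger weights
    return total / len(labels)


# Hypothetical imbalanced data: 8 majority (0) and 2 minority (1) examples.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)  # {0: 0.625, 1: 2.5}

# The same confidence error costs more on a minority example than on a majority one.
minority_miss = weighted_log_loss([1], [0.1], weights)
majority_miss = weighted_log_loss([0], [0.9], weights)
```

Because both misses have the same unweighted error (about 2.30 nats), the difference between `minority_miss` and `majority_miss` comes entirely from the class weights. That difference is what pushes a cost-sensitive model to stop ignoring the minority class.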
Domains are the sources of information and reflect the nature of the business or function that generated the data. The proposed cross-domain deep learning (Cd-DLA) offers a potential avenue for overcoming the complexity and nonlinearity of heterogeneous domains. Each of the approaches, algorithms, and frameworks for heterogeneous enterprise data mining presented in this thesis outperforms state-of-the-art methods across a range of backgrounds and scenarios, as evidenced by theoretical analysis, empirical study, or both. All outcomes of this research have been published or accepted for publication, and follow-up work has also been recognised, which demonstrates scholarly interest in mining heterogeneous enterprise data as a research topic. Nonetheless, heterogeneous data mining continues to offer increasingly attractive opportunities for further exploration and development in both academia and industry.