Detection of transmissible service failure in distributed service-based systems

Publication Type: Journal Article
Citation: Journal of Parallel and Distributed Computing, 2018, 119, pp. 36-49
Issue Date: 2018-09-01
© 2018 Elsevier Inc.

Detection of service failure, also known as service monitoring, is an important research problem in distributed service-based systems (SBSs). Service failure is a transmissible threat in distributed SBSs. Services may depend on one another, so the failure of one service can cause other services to fail. Such transmissible failures therefore have to be detected promptly while consuming as few resources as possible. Most existing service monitoring approaches are centralised. This exposes them to a single point of failure and makes them unsuitable for large-scale distributed SBSs. Moreover, these centralised approaches are designed only for single-tenant SBSs. Today, distributed SBSs are extremely large and comprise a large number of services and clients. Monitoring approaches must therefore scale to large distributed SBSs and support multi-tenancy. To this end, this paper develops a novel agent-based decentralised service monitoring approach for distributed SBSs. Compared with centralised approaches, the proposed decentralised approach avoids the single point of failure and balances computation across the monitoring agents. Existing approaches consider only single tenancy, whereas the proposed approach takes multi-tenancy into account. Experimental results demonstrate that the proposed approach responds as quickly as centralised approaches with much lower computation overhead.
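The abstract calls service failure "transmissible" because failures propagate along dependency relationships. The sketch below is not the authors' monitoring approach. It is a generic illustration, assuming the dependencies are given as a simple directed graph, of which services can be affected once one service fails. It walks the reverse dependency edges breadth-first. The function name `affected_services` and the example service names are hypothetical.

```python
from collections import deque


def affected_services(depends_on: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that transitively depends on `failed`.

    Hypothetical illustration: depends_on[s] lists the services that s calls.
    """
    # Invert the edges so that dependents[d] lists the services that call d.
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search from the failed service along the reverse edges.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, []):
            # Skip services already visited, and the failed service itself
            # (guards against dependency cycles).
            if caller not in affected and caller != failed:
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical dependency graph for a small SBS.
example = {
    "checkout": ["payment", "inventory"],
    "payment": ["auth"],
    "inventory": [],
    "auth": [],
    "search": ["inventory"],
}

print(sorted(affected_services(example, "auth")))  # ['checkout', 'payment']
```

A failure of `auth` reaches `checkout` indirectly through `payment`. This indirect reach is why a monitor that checks services only in isolation can miss the wider impact of a single failure.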