Machine Teaching-Based Efficient Labelling for Cross-unit Healthcare Data Modelling

Wang, Y; Peng, X; Clarke, A; Schlegel, C; Jiang, J

Publisher: Springer International Publishing
Publication Type: Chapter
Citation: AI 2021: Advances in Artificial Intelligence, 2022, 13151 LNAI, pp. 320-331
Issue Date: 2022-01-01

This item is open access.

The embargo period expires on 1 Jan 2024

Full metadata record

Field	Value	Language
dc.contributor.author	Wang, Y
dc.contributor.author	Peng, X https://orcid.org/0000-0002-8901-1472
dc.contributor.author	Clarke, A
dc.contributor.author	Schlegel, C
dc.contributor.author	Jiang, J
dc.date.accessioned	2022-05-26T06:04:58Z
dc.date.available	2022-05-26T06:04:58Z
dc.date.issued	2022-01-01
dc.identifier.citation	AI 2021: Advances in Artificial Intelligence, 2022, 13151 LNAI, pp. 320-331
dc.identifier.isbn	9783030975456
dc.identifier.uri	http://hdl.handle.net/10453/157721
dc.description.abstract	The data custodian of a large organization (such as a Commonwealth Data Integrating Authority), referred to here as the teacher, can readily build an intelligent model trained on comprehensive data collected from multiple sources. However, because of information security and privacy regulations, full access to the trained model and its training data is usually restricted to the teacher and is not available to any unit (or branch) of the organization. Consequently, a unit, referred to as the student, that needs a function similar to the teacher’s model has to train its own model from scratch on its own dataset. Such a dataset is usually unlabelled, so labelling it requires substantial human effort. Inspired by Iterative Machine Teaching, we propose a novel collaboration pipeline in which the teacher iteratively guides the student to select, from the student’s own dataset, the samples most worth labelling. This significantly reduces the amount of human labelling required while avoiding regulatory and information security breaches. The effectiveness and efficiency of the proposed pipeline are demonstrated empirically on two publicly available healthcare datasets in comparison with baseline methods. This work has broad implications for the healthcare sector, facilitating data modelling where large labelled datasets are not accessible to each unit.
dc.language	en
dc.publisher	Springer International Publishing
dc.relation	Commonwealth Department of Health
dc.relation.ispartof	AI 2021: Advances in Artificial Intelligence
dc.relation.isbasedon	10.1007/978-3-030-97546-3_26
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Machine Teaching-Based Efficient Labelling for Cross-unit Healthcare Data Modelling
dc.type	Chapter
utslib.citation.volume	13151 LNAI
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2024-01-01T00:00:00+1000Z
dc.date.updated	2022-05-26T06:04:57Z
pubs.publication-status	Published
pubs.volume	13151 LNAI

Abstract:

The data custodian of a large organization (such as a Commonwealth Data Integrating Authority), referred to here as the teacher, can readily build an intelligent model trained on comprehensive data collected from multiple sources. However, because of information security and privacy regulations, full access to the trained model and its training data is usually restricted to the teacher and is not available to any unit (or branch) of the organization. Consequently, a unit, referred to as the student, that needs a function similar to the teacher’s model has to train its own model from scratch on its own dataset. Such a dataset is usually unlabelled, so labelling it requires substantial human effort. Inspired by Iterative Machine Teaching, we propose a novel collaboration pipeline in which the teacher iteratively guides the student to select, from the student’s own dataset, the samples most worth labelling. This significantly reduces the amount of human labelling required while avoiding regulatory and information security breaches. The effectiveness and efficiency of the proposed pipeline are demonstrated empirically on two publicly available healthcare datasets in comparison with baseline methods. This work has broad implications for the healthcare sector, facilitating data modelling where large labelled datasets are not accessible to each unit.
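
The abstract does not specify how the teacher chooses samples, so the following is only an illustrative sketch of a teacher-guided labelling loop, not the chapter's actual method. It assumes, hypothetically, that both parties use a logistic-regression model, that the teacher may score the student's unlabelled samples, and that the teacher returns nothing but sample indices. The selection rule (largest gap between teacher and student predictions) and all names such as `teacher_select`, `run_pipeline`, and `oracle` are assumptions made for illustration, and the data is synthetic.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # Probability of the positive class under a linear model with weights w.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def teacher_select(teacher_w, pool, student_probs, candidates, k):
    # Hypothetical teacher step: rank unlabelled samples by how far the
    # student's prediction is from the teacher's, and return indices only.
    # The teacher's weights are used here but never returned to the student.
    def gap(i):
        return abs(predict(teacher_w, pool[i]) - student_probs[i])

    return sorted(candidates, key=gap, reverse=True)[:k]


def student_update(w, x, y, lr=0.5):
    # One stochastic-gradient step of logistic regression on a labelled sample.
    err = predict(w, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]


def run_pipeline(pool, oracle, teacher_w, rounds=10, k=2):
    student_w = [0.0] * len(pool[0])
    unlabelled = set(range(len(pool)))
    labelled = {}
    for _ in range(rounds):
        # The student shares its current predictions on the unlabelled pool.
        probs = {i: predict(student_w, pool[i]) for i in unlabelled}
        chosen = teacher_select(teacher_w, pool, probs, sorted(unlabelled), k)
        for i in chosen:
            labelled[i] = oracle(pool[i])  # the only place a human label is spent
            unlabelled.discard(i)
        # The student retrains on everything labelled so far.
        for i, y in labelled.items():
            student_w = student_update(student_w, pool[i], y)
    return student_w, labelled


random.seed(0)
# Synthetic pool: two features plus a constant bias feature.
pool = [[random.uniform(-1, 1), random.uniform(-1, 1), 1.0] for _ in range(200)]
true_w = [2.0, -1.0, 0.1]  # stands in for the teacher's private model


def oracle(x):
    # Stands in for a human annotator at the student's unit.
    return 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0


student_w, labelled = run_pipeline(pool, oracle, teacher_w=true_w)
print(len(labelled), "labels requested out of", len(pool))
```

In this sketch the labelling budget is fixed at `rounds * k` samples, which makes the saving in human effort explicit; how the real pipeline sets its budget and what information actually crosses the teacher-student boundary would need to be taken from the chapter itself.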

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/157721