Preserving Privacy for Distributed Genome-Wide Analysis Against Identity Tracing Attacks

Publisher: IEEE Computer Society
Publication Type: Journal Article
Citation: IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3341-3357, 2023
Issue Date: 2023-07-01
Genome-wide analysis has demonstrated both health and social benefits. However, large-scale sharing of such data may reveal sensitive information about individuals. One emerging challenge is the identity tracing attack, which exploits correlations among genomic data to reveal the identity of DNA samples. In this paper, we first demonstrate that an adversary can narrow down a sample's identity by detecting the individual's genetic relatives, and we quantify this privacy threat with a Shannon entropy-based measure. For example, when the dataset covers 30% of the population, the uncertainty about the identity of any target from that population drops to merely 2.3 bits of entropy (i.e., the identity is pinned down to within 5 people). Existing approaches such as differential privacy (DP), secure multiparty computation (MPC), and homomorphic encryption (HE) cannot be applied directly to this challenge in genome-wide analysis because they compromise utility (i.e., accuracy or efficiency). To address this challenge, this paper proposes a framework named υFrag that facilitates privacy-preserving data sharing and computation in genome-wide analysis. υFrag mitigates privacy risks by using vertical fragmentation to disrupt the genetic architecture on which the adversary relies for identity tracing, without sacrificing the capability of genome-wide analysis. We theoretically prove that it preserves the correctness of primitive functionalities and algorithms ranging from basic summary statistics to advanced neural networks. Our experiments demonstrate that υFrag outperforms MPC and HE protocols, with a speedup of more than 221x for training neural networks, and in most settings also outperforms traditional non-private algorithms and a state-of-the-art noise-based DP solution.
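As a rough, hypothetical illustration of the entropy-based measure (the posterior and numbers below are assumptions, not taken from the paper): if the adversary's remaining uncertainty about the target's identity is modeled as a probability distribution over candidate individuals, its Shannon entropy in bits quantifies how far the identity has been narrowed down; a uniform posterior over 5 candidates carries log2(5) ≈ 2.32 bits, consistent with the "within 5 people" figure above.

import math

def identity_entropy(posterior):
    # Shannon entropy (in bits) of the adversary's posterior over candidate
    # identities; lower entropy means the target is narrowed down further.
    return -sum(p * math.log2(p) for p in posterior if p > 0)

# Uniform uncertainty over 5 remaining candidates: log2(5) ~= 2.32 bits.
print(identity_entropy([0.2] * 5))
# A skewed posterior after kinship matching carries even less uncertainty.
print(identity_entropy([0.6, 0.2, 0.1, 0.05, 0.05]))

The next sketch conveys the general idea of vertical fragmentation under assumed names and an assumed column-interleaving scheme; it is not the paper's actual υFrag construction. The sample-by-SNP genotype matrix is split column-wise so that no single fragment retains the full multi-locus structure an adversary would exploit for identity tracing, while per-fragment summary statistics remain computable.

import numpy as np

def vertical_fragment(genotypes, num_fragments):
    # Split the sample-by-SNP genotype matrix column-wise so each fragment
    # holds only a subset of loci; interleaving spreads neighboring
    # (correlated) loci across different fragments.
    return [genotypes[:, f::num_fragments] for f in range(num_fragments)]

# Toy genotype matrix: 4 individuals x 8 SNPs, coded as 0/1/2 minor-allele counts.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(4, 8))
fragments = vertical_fragment(G, num_fragments=4)

# Per-SNP allele frequencies can still be computed fragment by fragment,
# without any party reassembling full genomes.
per_fragment_freqs = [frag.mean(axis=0) / 2.0 for frag in fragments]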