Semi-supervised Heterogeneous Domain Adaptation: Theory and Algorithm

Fang, Z; Jie, L; Feng, L; Guangquan, Z

Semi-supervised Heterogeneous Domain Adaptation: Theory and Algorithm

Fang, Z

Jie, L Feng, L Guangquan, Z

Permalink

Publisher:: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:: Journal Article
Citation:: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, PP, (1), pp. 1087-1105
Issue Date:: 2022-01-27

Open Access

Copyright Clearance Process

Recently Added
In Progress
Open Access

This item is open access.

Adobe PDF

Download Accepted versionAdobe PDF (778.92 kB)

View on publisher's site

View statistics

Full metadata record

Field	Value	Language
dc.contributor.author	Fang, Z https://orcid.org/0000-0003-0602-6255
dc.contributor.author	Jie, L
dc.contributor.author	Feng, L
dc.contributor.author	Guangquan, Z
dc.date.accessioned	2023-03-02T07:01:38Z
dc.date.available	2023-03-02T07:01:38Z
dc.date.issued	2022-01-27
dc.identifier.citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, PP, (1), pp. 1087-1105
dc.identifier.issn	0162-8828
dc.identifier.issn	1939-3539
dc.identifier.uri	http://hdl.handle.net/10453/166639
dc.description.abstract	Semi-supervised heterogeneous domain adaptation (SsHeDA) aims to train a classifier for the target domain, in which only unlabeled and a small number of labeled data are available. This is done by leveraging knowledge acquired from a heterogeneous source domain. From algorithmic perspectives, several methods have been proposed to solve the SsHeDA problem; yet there is still no theoretical foundation to explain the nature of the SsHeDA problem or to guide new and better solutions. Motivated by compatibility condition in semi-supervised probably approximately correct (PAC) theory, we explain the SsHeDA problem by proving its generalization error that is, why labeled heterogeneous source data and unlabeled target data help to reduce the target risk. Guided by our theory, we devise two algorithms as proof of concept. One, kernel heterogeneous domain alignment (KHDA), is a kernel-based algorithm; the other, joint mean embedding alignment (JMEA), is a neural network-based algorithm. When a dataset is small, KHDA's training time is less than JMEA's. When a dataset is large, JMEA is more accurate in the target domain. Comprehensive experiments with image/text classification tasks show KHDA to be the most accurate among all non-neural network baselines, and JMEA to be the most accurate among all baselines.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation	http://purl.org/au-research/grants/arc/FL190100149
dc.relation.ispartof	IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.relation.isbasedon	10.1109/TPAMI.2022.3146234
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0806 Information Systems, 0906 Electrical and Electronic Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Semi-supervised Heterogeneous Domain Adaptation: Theory and Algorithm
dc.type	Journal Article
utslib.citation.volume	PP
utslib.location.activity	United States
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0806 Information Systems
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2023-03-02T07:01:36Z
pubs.issue	1
pubs.publication-status	Submitted
pubs.volume	PP
utslib.citation.issue	1

Abstract:

Semi-supervised heterogeneous domain adaptation (SsHeDA) aims to train a classifier for the target domain, in which only unlabeled and a small number of labeled data are available. This is done by leveraging knowledge acquired from a heterogeneous source domain. From algorithmic perspectives, several methods have been proposed to solve the SsHeDA problem; yet there is still no theoretical foundation to explain the nature of the SsHeDA problem or to guide new and better solutions. Motivated by compatibility condition in semi-supervised probably approximately correct (PAC) theory, we explain the SsHeDA problem by proving its generalization error that is, why labeled heterogeneous source data and unlabeled target data help to reduce the target risk. Guided by our theory, we devise two algorithms as proof of concept. One, kernel heterogeneous domain alignment (KHDA), is a kernel-based algorithm; the other, joint mean embedding alignment (JMEA), is a neural network-based algorithm. When a dataset is small, KHDA's training time is less than JMEA's. When a dataset is large, JMEA is more accurate in the target domain. Comprehensive experiments with image/text classification tasks show KHDA to be the most accurate among all non-neural network baselines, and JMEA to be the most accurate among all baselines.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/166639