Robust domain adaptation for relation extraction via clustering consistency

Nguyen, ML; Tsang, IW; Chai, KMA; Chieu, HL

Publication Type: Conference Proceeding
Citation: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference, 2014, vol. 1, pp. 807-817
Issue Date: 2014-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Nguyen, ML	en_US
dc.contributor.author	Tsang, IW https://orcid.org/0000-0001-8095-4637	en_US
dc.contributor.author	Chai, KMA	en_US
dc.contributor.author	Chieu, HL	en_US
dc.date.issued	2014-01-01	en_US
dc.identifier.citation	52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference, 2014, 1 pp. 807 - 817	en_US
dc.identifier.isbn	9781937284725	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121621
dc.description.abstract	We propose a two-phase framework to adapt existing relation extraction classifiers to extract relations for new target domains. We address two challenges: negative transfer when knowledge in source domains is used without considering the differences in relation distributions; and a lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and imbalanced relation distributions. Our framework leverages both labeled and unlabeled data in the target domain. First, we determine the relevance of each source domain to the target domain for each relation type, using the consistency between the clustering given by the target domain labels and the clustering given by the predictors trained for the source domain. To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain. Second, we trade off between using relevance-weighted source-domain predictors and the labeled target data. Again, to overcome the imbalanced distribution, the source-domain predictors operate on the unlabeled target data. Our method outperforms numerous baselines and a weakly-supervised relation extraction method on ACE 2004 and YAGO. © 2014 Association for Computational Linguistics.	en_US
dc.relation.ispartof	52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference	en_US
dc.title	Robust domain adaptation for relation extraction via clustering consistency	en_US
dc.type	Conference Proceeding
utslib.citation.volume	1	en_US
utslib.for	080101 Adaptive Agents and Intelligent Robotics	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	1	en_US

Abstract:

We propose a two-phase framework to adapt existing relation extraction classifiers to extract relations for new target domains. We address two challenges: negative transfer when knowledge in source domains is used without considering the differences in relation distributions; and a lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and imbalanced relation distributions. Our framework leverages both labeled and unlabeled data in the target domain. First, we determine the relevance of each source domain to the target domain for each relation type, using the consistency between the clustering given by the target domain labels and the clustering given by the predictors trained for the source domain. To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain. Second, we trade off between using relevance-weighted source-domain predictors and the labeled target data. Again, to overcome the imbalanced distribution, the source-domain predictors operate on the unlabeled target data. Our method outperforms numerous baselines and a weakly-supervised relation extraction method on ACE 2004 and YAGO. © 2014 Association for Computational Linguistics.
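
The first phase described in the abstract, scoring each source domain by how consistent its predictor's clustering is with the clustering implied by the target labels, can be illustrated with a rough sketch. The abstract does not state which consistency measure the authors use, so the pair-counting agreement (Rand index) below is an assumption chosen for simplicity, as are the function names `pair_agreement` and `source_relevance`, the normalisation of scores into weights, and the restriction to labeled target items (the paper also uses unlabeled data). Labels are +1 or -1 for a single relation type.

```python
from itertools import combinations


def pair_agreement(labels_a, labels_b):
    """Rand index: fraction of item pairs on which two clusterings agree.

    Two clusterings agree on a pair when both put the two items in the
    same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing to disagree on
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def source_relevance(target_labels, source_predictions):
    """Turn per-source consistency scores into weights that sum to 1.

    source_predictions maps a (hypothetical) source-domain name to that
    domain's predictions on the same target items, for one relation type.
    """
    scores = {
        name: pair_agreement(target_labels, preds)
        for name, preds in source_predictions.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No source agrees with the target at all: fall back to uniform.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}


if __name__ == "__main__":
    target = [1, 1, -1, -1]
    predictions = {
        "source_A": [1, 1, -1, -1],  # same grouping as the target labels
        "source_B": [1, -1, 1, -1],  # mostly inconsistent grouping
    }
    print(source_relevance(target, predictions))
```

Under these assumptions, a source domain whose predictor groups the target items the way the target labels do receives a larger weight, which is one simple way to limit negative transfer from poorly matched source domains. Note that a pair-counting measure compares groupings only, so it is insensitive to which cluster is called positive.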

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121621