DAFS: a domain aware few shot generative model for event detection

Publisher:
SPRINGER
Publication Type:
Journal Article
Citation:
Machine Learning, 2022
Issue Date:
2022-01-01
Large-scale pre-trained models show clear advantages in event detection (ED), the task of classifying events by identifying their trigger words. However, such models depend heavily on labeled training data, which is scarce in particular domains such as finance because annotation is costly. Moreover, manually labeled training data suffers from problems such as uneven sampling distributions, poor diversity, and a massive long tail. Recently, some researchers have used generative models to label data, but training a generative model requires rich domain knowledge, which cannot be obtained in a few-shot setting. We therefore propose a Domain-Aware Few-Shot (DAFS) generative model that generates domain-specific training data from a relatively small amount of labeled data. First, DAFS uses self-supervised information from various categories of sentences to compute word transition probabilities under different domains and to retain the key triggers in each sentence. Then, we apply a joint algorithm to generate labeled training data that balances diversity and effectiveness. Experimental results demonstrate that the training data generated by DAFS significantly improves ED performance on real financial data; even with no more than 20 training examples, DAFS still maintains generation quality to a certain extent. It also achieves new state-of-the-art results on the ACE2005 multilingual corpora.
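The abstract does not give the exact formulation, but the first step it describes, estimating word transition probabilities separately for each domain, can be illustrated with a minimal bigram sketch. This is an assumption-laden illustration, not the authors' method: the function name `domain_transition_probs`, the bigram model, and the toy corpus are all hypothetical.

```python
from collections import Counter

def domain_transition_probs(sentences_by_domain):
    """Estimate bigram transition probabilities P(curr | prev) per domain.

    `sentences_by_domain` maps a domain label to a list of tokenized
    sentences (lists of words). Returns {domain: {(prev, curr): prob}}.
    This is a simplified stand-in for DAFS's domain-conditioned
    transition statistics.
    """
    probs = {}
    for domain, sentences in sentences_by_domain.items():
        bigrams = Counter()   # counts of (prev, curr) word pairs
        prev_counts = Counter()  # counts of each word in the prev position
        for tokens in sentences:
            for prev, curr in zip(tokens, tokens[1:]):
                bigrams[(prev, curr)] += 1
                prev_counts[prev] += 1
        probs[domain] = {
            pair: count / prev_counts[pair[0]]
            for pair, count in bigrams.items()
        }
    return probs

# Hypothetical toy corpus; in DAFS the statistics would come from
# category-labeled sentences in the target domain (e.g., finance).
corpus = {
    "finance": [
        ["the", "bank", "acquired", "the", "firm"],
        ["the", "firm", "acquired", "assets"],
    ],
}
p = domain_transition_probs(corpus)
print(p["finance"][("the", "bank")])  # 1/3: "the" precedes a word 3 times
```

Words whose transitions are highly domain-specific (such as trigger words like "acquired" in finance) could then be kept fixed while the rest of the sentence is regenerated, in the spirit of the trigger-retention step the abstract describes.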