TCSST: Transfer classification of short & sparse text using external data

Long, G; Chen, L; Zhu, X; Zhang, C

Publication Type: Conference Proceeding
Citation: ACM International Conference Proceeding Series, 2012, pp. 764 - 772
Issue Date: 2012-12-19

Closed Access

Filename	Description	Size	Format
2011008002OK.pdf		1.07 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Long, G https://orcid.org/0000-0003-3740-9515	en_US
dc.contributor.author	Chen, L https://orcid.org/0000-0002-6468-5729	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2012-12-19	en_US
dc.identifier.citation	ACM International Conference Proceeding Series, 2012, pp. 764 - 772	en_US
dc.identifier.isbn	9781450311564	en_US
dc.identifier.uri	http://hdl.handle.net/10453/22288
dc.description.abstract	Short & sparse text, such as search snippets, micro-blogs and product reviews, is becoming more prevalent on the web. Accurately classifying short & sparse text has emerged as an important yet challenging task. Existing work has considered utilizing external data (e.g. Wikipedia) to alleviate data sparseness by appending topics detected from the external data as new features. However, training a classifier on features concatenated from different spaces is not easy, considering that the features have different physical meanings and different significance to the classification task. Moreover, it exacerbates the "curse of dimensionality" problem. In this study, we propose a transfer classification method, TCSST, that exploits external data to tackle the data sparsity issue. The transfer classifier is learned in the original feature space. Considering that labels for the external data may not be readily available or sufficient, TCSST further exploits unlabeled external data to aid the transfer classification. We develop novel strategies that allow TCSST to iteratively select high-quality unlabeled external data to help with the classification. We evaluate the performance of TCSST on both benchmark and real-world data sets. Our experimental results demonstrate that the proposed method is effective in classifying very short & sparse text, consistently outperforming existing and baseline methods. © 2012 ACM.	en_US
dc.relation	http://purl.org/au-research/grants/arc/DP1093762
dc.relation	http://purl.org/au-research/grants/arc/LP120100566
dc.relation.ispartof	ACM International Conference Proceeding Series	en_US
dc.relation.isbasedon	10.1145/2396761.2396859	en_US
dc.title	TCSST: Transfer classification of short & sparse text using external data	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Hawaii, USA	en_US
dc.location.activity	Auckland, New Zealand
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/22288
