Detection of spam reviews through a hierarchical attention architecture with N-gram CNN and Bi-LSTM

Liu, Y; Wang, L; Shi, T; Li, J

Publisher: Elsevier
Publication Type: Journal Article
Citation: Information Systems, 2022, 103, pp. 101865
Issue Date: 2022-01-01

Closed Access

Filename	Description	Size	Format
1-s2.0-S0306437921000934-main.pdf		753.22 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liu, Y
dc.contributor.author	Wang, L
dc.contributor.author	Shi, T
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413
dc.date.accessioned	2023-06-30T04:21:07Z
dc.date.available	2023-06-30T04:21:07Z
dc.date.issued	2022-01-01
dc.identifier.citation	Information Systems, 2022, 103, pp. 101865
dc.identifier.issn	0306-4379
dc.identifier.uri	http://hdl.handle.net/10453/171034
dc.description.abstract	Spam reviews mislead consumers' decision making and may seriously undermine fair trading in online markets. Existing methods for detecting spam reviews mainly focus on designing features from linguistic and psychological clues, but they hardly reveal the underlying semantics. Recent research applies deep learning to capture semantic features, but these models neither extract multi-granularity information from the text structure nor consider the mutual influence among sentences. We propose a hierarchical attention network in which distinct attention mechanisms are purposely used at the two layers to capture important, comprehensive, and multi-granularity semantic information. At the first layer, we use an N-gram CNN to extract the multi-granularity semantics of the sentences. At the second layer, we use a combination of a convolution structure and a Bi-LSTM to extract important and comprehensive semantics in a document. Extensive experiments on public datasets demonstrate that our model has superior detection performance over the state-of-the-art baselines, improving the F1 score in the mixed domain to 89.3% (an absolute improvement of 4.8 points), in the Doctor domain to 92.8% (9.9 points), in the Hotel domain to 86.1% (2.4 points), and in the cross-domain setting to 84.7% (10.4 points).
dc.language	en
dc.publisher	Elsevier
dc.relation.ispartof	Information Systems
dc.relation.isbasedon	10.1016/j.is.2021.101865
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0806 Information Systems
dc.subject.classification	Information Systems
dc.title	Detection of spam reviews through a hierarchical attention architecture with N-gram CNN and Bi-LSTM
dc.type	Journal Article
utslib.citation.volume	103
utslib.for	0806 Information Systems
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Centre for Health Technologies (CHT)
utslib.copyright.status	closed_access	*
dc.date.updated	2023-06-30T04:21:06Z
pubs.publication-status	Published
pubs.volume	103

Abstract:

Spam reviews mislead consumers' decision making and may seriously undermine fair trading in online markets. Existing methods for detecting spam reviews mainly focus on designing features from linguistic and psychological clues, but they hardly reveal the underlying semantics. Recent research applies deep learning to capture semantic features, but these models neither extract multi-granularity information from the text structure nor consider the mutual influence among sentences. We propose a hierarchical attention network in which distinct attention mechanisms are purposely used at the two layers to capture important, comprehensive, and multi-granularity semantic information. At the first layer, we use an N-gram CNN to extract the multi-granularity semantics of the sentences. At the second layer, we use a combination of a convolution structure and a Bi-LSTM to extract important and comprehensive semantics in a document. Extensive experiments on public datasets demonstrate that our model has superior detection performance over the state-of-the-art baselines, improving the F1 score in the mixed domain to 89.3% (an absolute improvement of 4.8 points), in the Doctor domain to 92.8% (9.9 points), in the Hotel domain to 86.1% (2.4 points), and in the cross-domain setting to 84.7% (10.4 points).
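
As a rough illustration of the two ideas the abstract names for the first layer and for combining sentences, the sketch below applies convolution filters of several n-gram widths to a sentence, max-pools each filter over positions, and then merges the sentence vectors with softmax attention. It is a minimal pure-Python sketch, not the authors' implementation: the function names (`ngram_conv`, `encode_sentence`, `attention_pool`), the widths 1 to 3, the dimensions, and the random untrained weights are all assumptions, and the second-layer convolution plus Bi-LSTM described in the abstract is left out.

```python
import math
import random

def ngram_conv(words, filters, n):
    """Slide a window of n word vectors over a sentence; one ReLU feature per filter."""
    dim = len(words[0])
    # Zero-pad sentences shorter than the window so at least one window exists.
    padded = words + [[0.0] * dim] * max(0, n - len(words))
    feats = []
    for start in range(len(padded) - n + 1):
        window = [x for vec in padded[start:start + n] for x in vec]
        feats.append([max(0.0, sum(w * x for w, x in zip(f, window)))
                      for f in filters])
    return feats

def max_over_time(feats):
    """Keep the strongest response of each filter across all window positions."""
    return [max(column) for column in zip(*feats)]

def encode_sentence(words, filter_bank):
    """Concatenate pooled features from every n-gram width (multi-granularity)."""
    encoded = []
    for n, filters in sorted(filter_bank.items()):
        encoded.extend(max_over_time(ngram_conv(words, filters, n)))
    return encoded

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, context):
    """Weight each vector by its softmax-normalised dot product with a context vector."""
    scores = [sum(c * v for c, v in zip(context, vec)) for vec in vectors]
    weights = softmax(scores)
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors))
              for i in range(len(vectors[0]))]
    return pooled, weights

# Hypothetical toy setup: 4-dimensional word vectors, 2 filters per n-gram width.
random.seed(0)
DIM, NUM_FILTERS, WIDTHS = 4, 2, (1, 2, 3)
filter_bank = {
    n: [[random.uniform(-1, 1) for _ in range(n * DIM)] for _ in range(NUM_FILTERS)]
    for n in WIDTHS
}
# A "review" of three sentences with 5, 3, and 6 random word vectors.
review = [[[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(length)]
          for length in (5, 3, 6)]

sentence_vecs = [encode_sentence(sentence, filter_bank) for sentence in review]
context = [random.uniform(-1, 1) for _ in range(len(WIDTHS) * NUM_FILTERS)]
doc_vec, weights = attention_pool(sentence_vecs, context)
print("attention weights per sentence:", [round(w, 3) for w in weights])
```

In a trained model the filters and the context vector would be learned, and the document vector would feed a classifier that labels the review as spam or genuine; here they are random, so only the shapes and the data flow are meaningful.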

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/171034