Sequential and unsupervised document authorial clustering based on Hidden Markov Model

Aldebei, K; Farhood, H; Jia, W; Nanda, P; He, X

Publication Type: Conference Proceeding
Citation: Proceedings - 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 11th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Conference on Embedded Software and Systems, Trustcom/BigDataSE/ICESS 2017, 2017, pp. 379 - 385
Issue Date: 2017-09-07

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Aldebei, K	en_US
dc.contributor.author	Farhood, H	en_US
dc.contributor.author	Jia, W https://orcid.org/0000-0002-0940-3338	en_US
dc.contributor.author	Nanda, P https://orcid.org/0000-0002-5748-155X	en_US
dc.contributor.author	He, X https://orcid.org/0000-0001-8962-540X	en_US
dc.date.issued	2017-09-07	en_US
dc.identifier.citation	Proceedings - 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 11th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Conference on Embedded Software and Systems, Trustcom/BigDataSE/ICESS 2017, 2017, pp. 379 - 385	en_US
dc.identifier.isbn	9781509049059	en_US
dc.identifier.uri	http://hdl.handle.net/10453/115397
dc.description.abstract	© 2017 IEEE. Document clustering groups documents with similar characteristics into one cluster. It has shown advantages for the organization, retrieval, navigation and summarization of huge numbers of text documents on the Internet. This paper presents a novel, unsupervised approach for clustering single-author documents into groups based on authorship. The key novelty is that we propose to extract contextual correlations that capture the writing style hidden among the sentences of each document, and to use them for clustering the documents. For this purpose, we build a Hidden Markov Model (HMM) to represent the relations between sequential sentences, and construct a two-level, unsupervised framework. The proposed approach is evaluated on four benchmark datasets that are widely used for document authorship analysis. A scientific paper is also used to demonstrate the performance of the approach on clustering short segments of a text into authorial components. Experimental results show that the proposed approach outperforms the state-of-the-art approaches.	en_US
dc.relation.ispartof	Proceedings - 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 11th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Conference on Embedded Software and Systems, Trustcom/BigDataSE/ICESS 2017	en_US
dc.relation.isbasedon	10.1109/Trustcom/BigDataSE/ICESS.2017.261	en_US
dc.title	Sequential and unsupervised document authorial clustering based on Hidden Markov Model	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CRIN - Realtime Information Networks
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US

Abstract:

© 2017 IEEE. Document clustering groups documents with similar characteristics into one cluster. It has shown advantages for the organization, retrieval, navigation and summarization of huge numbers of text documents on the Internet. This paper presents a novel, unsupervised approach for clustering single-author documents into groups based on authorship. The key novelty is that we propose to extract contextual correlations that capture the writing style hidden among the sentences of each document, and to use them for clustering the documents. For this purpose, we build a Hidden Markov Model (HMM) to represent the relations between sequential sentences, and construct a two-level, unsupervised framework. The proposed approach is evaluated on four benchmark datasets that are widely used for document authorship analysis. A scientific paper is also used to demonstrate the performance of the approach on clustering short segments of a text into authorial components. Experimental results show that the proposed approach outperforms the state-of-the-art approaches.
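
The abstract does not give the details of the model, so the following is only a toy sketch of the general idea that sequential context between sentences can help authorial labelling. It is not the two-level framework of the paper. It assumes two hidden states (standing in for two writing styles), hand-picked per-sentence likelihoods, and a "sticky" transition matrix; the function name `viterbi` and all numbers are hypothetical and chosen purely for illustration.

```python
import math

def viterbi(obs_loglik, log_trans, log_init):
    """Return the most likely hidden-state path of an HMM.

    obs_loglik[t][s] is the log-likelihood of sentence t under state s.
    log_trans[r][s] is the log-probability of moving from state r to s.
    log_init[s] is the log-probability of starting in state s.
    """
    n_states = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(obs_loglik)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + obs_loglik[t][s])
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Assumed (made-up) likelihoods of six consecutive sentences under two styles.
# Sentence 2 is ambiguous and, taken alone, leans slightly towards style 1.
likelihoods = [(0.9, 0.1), (0.8, 0.2), (0.4, 0.6),
               (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
obs_loglik = [[math.log(p) for p in row] for row in likelihoods]

# Assumed sticky transitions: consecutive sentences usually share a style.
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

# Labelling each sentence in isolation versus decoding the whole sequence.
independent = [max(range(2), key=lambda s: row[s]) for row in obs_loglik]
smoothed = viterbi(obs_loglik, log_trans, log_init)
print(independent)
print(smoothed)
```

In this toy setting, labelling each sentence on its own flips the ambiguous sentence to the other style, whereas decoding the sequence as a whole keeps it with its neighbours. That is the kind of contextual correlation between sentences the abstract refers to, shown here under much simpler assumptions than the paper's.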

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/115397