A novel clustering methodology based on modularity optimisation for detecting authorship affinities in Shakespearean era plays

Naeni, LM; Craig, H; Berretta, R; Moscato, P

Publication Type: Journal Article
Citation: PLoS ONE, 2016, 11 (8)
Issue Date: 2016-08-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Naeni, LM https://orcid.org/0000-0002-3360-7680	en_US
dc.contributor.author	Craig, H	en_US
dc.contributor.author	Berretta, R	en_US
dc.contributor.author	Moscato, P	en_US
dc.date.available	2016-06-08	en_US
dc.date.issued	2016-08-01	en_US
dc.identifier.citation	PLoS ONE, 2016, 11 (8)	en_US
dc.identifier.uri	http://hdl.handle.net/10453/52841
dc.description.abstract	© 2016 Naeni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in information theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains the frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method in identifying high-quality clusters that reflect commonalities in the literary style of the plays.	en_US
dc.relation.ispartof	PLoS ONE	en_US
dc.relation.isbasedon	10.1371/journal.pone.0157988	en_US
dc.subject.classification	General Science & Technology	en_US
dc.subject.mesh	Cluster Analysis	en_US
dc.subject.mesh	Algorithms	en_US
dc.title	A novel clustering methodology based on modularity optimisation for detecting authorship affinities in Shakespearean era plays	en_US
dc.type	Journal Article
utslib.citation.volume	8	en_US
utslib.citation.volume	11	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Design, Architecture and Building
pubs.organisational-group	/University of Technology Sydney/Faculty of Design, Architecture and Building/School of Built Environment
utslib.copyright.status	open_access
pubs.issue	8	en_US
pubs.publication-status	Published	en_US
pubs.volume	11	en_US
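
The abstract names three computational ingredients: a Jensen-Shannon distance between the word-frequency profiles of plays, a proximity graph built from those distances, and the modularity function that iMA-Net maximises. The Python below is a minimal sketch of those ingredients only, not the authors' implementation. The function names (js_distance, knn_graph, modularity) are hypothetical; the k-nearest-neighbour rule for building the proximity graph is an assumption chosen for illustration (the paper's own proximity-graph construction may differ); base-2 logarithms are a choice that bounds the distance in [0, 1]; and the memetic search itself is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance: square root of the JS divergence (log base 2).

    p and q are word counts over the same vocabulary; each is normalised to a
    probability distribution first, so both must have a positive total.
    """
    sp, sq = sum(p), sum(q)
    p_norm = [x / sp for x in p]
    q_norm = [x / sq for x in q]
    mid = [(a + b) / 2 for a, b in zip(p_norm, q_norm)]

    def kl(a: List[float], b: List[float]) -> float:
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0.
        # b_i >= a_i / 2 > 0 whenever a_i > 0, so the log is always defined.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    jsd = 0.5 * kl(p_norm, mid) + 0.5 * kl(q_norm, mid)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


def knn_graph(profiles: Sequence[Sequence[float]], k: int) -> Set[Edge]:
    """Hypothetical proximity graph: link each item to its k nearest neighbours."""
    n = len(profiles)
    edges: Set[Edge] = set()
    for i in range(n):
        ranked = sorted(
            (js_distance(profiles[i], profiles[j]), j) for j in range(n) if j != i
        )
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))  # undirected edge, stored once
    return edges


def modularity(n: int, edges: Set[Edge], community: Sequence[int]) -> float:
    """Newman's modularity for an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the edge count, L_c the edges inside c, d_c the degree total of c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = [0] * n
    intra: Dict[int, int] = {}
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if community[i] == community[j]:
            intra[community[i]] = intra.get(community[i], 0) + 1
    deg_total: Dict[int, int] = {}
    for v in range(n):
        deg_total[community[v]] = deg_total.get(community[v], 0) + degree[v]
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in deg_total.items())


# Toy example with made-up counts over a three-word vocabulary (not data from the paper).
plays = [[10, 1, 0], [9, 2, 0], [0, 1, 10], [0, 2, 9]]
graph = knn_graph(plays, k=1)
print(sorted(graph))                                # [(0, 1), (2, 3)]
print(modularity(len(plays), graph, [0, 0, 1, 1]))  # 0.5
```

A search method such as the memetic algorithm described in the abstract would explore many candidate partitions of the graph and keep those with the highest modularity; this sketch only scores a partition that is handed to it.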

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/52841