Semi-automated extraction of new requirements from online reviews for software product evolution

Buchan, J; Bano, M; Zowghi, D; Volabouth, P

Publication Type: Conference Proceeding
Citation: Proceedings - 25th Australasian Software Engineering Conference, ASWEC 2018, 2018, pp. 31-40
Issue Date: 2018-12-24

Closed Access

Filename	Description	Size	Format
ASWEC2018_paper_21.pdf	Accepted Manuscript version	1.18 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Buchan, J	en_US
dc.contributor.author	Bano, M https://orcid.org/0000-0002-1447-9521	en_US
dc.contributor.author	Zowghi, D https://orcid.org/0000-0002-6051-0155	en_US
dc.contributor.author	Volabouth, P	en_US
dc.date.issued	2018-12-24	en_US
dc.identifier.citation	Proceedings - 25th Australasian Software Engineering Conference, ASWEC 2018, 2018, pp. 31 - 40	en_US
dc.identifier.isbn	9781728112411	en_US
dc.identifier.uri	http://hdl.handle.net/10453/127806
dc.description.abstract	© 2018 IEEE. In order to improve and increase their utility, software products must evolve continually and incrementally to meet the new requirements of current and future users. Online reviews from users of the software provide a rich and readily available resource for discovering candidate new features for future software releases. However, it is challenging to manually analyze a large volume of potentially unstructured and noisy data to extract useful information to support software release planning decisions. This paper investigates machine learning techniques to automatically identify text that represents users' ideas for new features from their online reviews. A binary classification approach to categorize extracted text as either a feature or non-feature was evaluated experimentally. Three machine learning algorithms were evaluated in the experiments: Naïve Bayes (with multinomial and Bernoulli variants), Support Vector Machines (with linear and multinomial variants) and Logistic Regression. Variations on the configurations of k-fold cross validation, the use of n-grams and review sentiment were also experimentally evaluated. Based on binary classification of over a thousand separate reviews of two products, Trello and Jira, linear Support Vector Machines with review sentiment as an input, using n-gram (1,4) together with k-fold 10 cross validation gave the best performance. The results have confirmed the feasibility and accuracy of semi-automated extraction of candidate requirements from a large volume of unstructured and noisy online user reviews. The next steps planned are to experiment with machine supported grouping, prioritizing and visualizing the extracted features to best support release planners' work, as well as extending the sources of candidate requirements.	en_US
dc.relation.ispartof	Proceedings - 25th Australasian Software Engineering Conference, ASWEC 2018	en_US
dc.relation.isbasedon	10.1109/ASWEC.2018.00013	en_US
dc.title	Semi-automated extraction of new requirements from online reviews for software product evolution	en_US
dc.type	Conference Proceeding
utslib.for	0803 Computer Software	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (Research)
pubs.organisational-group	/University of Technology Sydney/DVC (Research)/Graduate Research School
pubs.organisational-group	/University of Technology Sydney/Faculty of Arts and Social Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Arts and Social Sciences/FASS Faculty Administration
pubs.organisational-group	/University of Technology Sydney/Strength - HCTD - Human Centred Technology Design
pubs.organisational-group	/University of Technology Sydney/Strength - STEM Education Futures
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

© 2018 IEEE. In order to improve and increase their utility, software products must evolve continually and incrementally to meet the new requirements of current and future users. Online reviews from users of the software provide a rich and readily available resource for discovering candidate new features for future software releases. However, it is challenging to manually analyze a large volume of potentially unstructured and noisy data to extract useful information to support software release planning decisions. This paper investigates machine learning techniques to automatically identify text that represents users' ideas for new features from their online reviews. A binary classification approach to categorize extracted text as either a feature or non-feature was evaluated experimentally. Three machine learning algorithms were evaluated in the experiments: Naïve Bayes (with multinomial and Bernoulli variants), Support Vector Machines (with linear and multinomial variants) and Logistic Regression. Variations on the configurations of k-fold cross validation, the use of n-grams and review sentiment were also experimentally evaluated. Based on binary classification of over a thousand separate reviews of two products, Trello and Jira, linear Support Vector Machines with review sentiment as an input, using n-gram (1,4) together with k-fold 10 cross validation gave the best performance. The results have confirmed the feasibility and accuracy of semi-automated extraction of candidate requirements from a large volume of unstructured and noisy online user reviews. The next steps planned are to experiment with machine supported grouping, prioritizing and visualizing the extracted features to best support release planners' work, as well as extending the sources of candidate requirements.
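The abstract names two experimental settings, n-gram (1,4) and 10-fold cross validation, without showing what they involve. The following is a minimal sketch of both, using only the Python standard library. It is not the authors' implementation: the function names `word_ngrams` and `k_fold_indices` are hypothetical, tokenization is assumed to be simple lowercasing and whitespace splitting, and the folds are assigned round-robin without shuffling or stratification. The classifier itself (for example, a linear Support Vector Machine) is left out.

```python
from collections import Counter
from typing import Iterator


def word_ngrams(text: str, n_min: int = 1, n_max: int = 4) -> Counter:
    """Count every word n-gram of length n_min..n_max in one review.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the review.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def k_fold_indices(n_items: int, k: int = 10) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists so each item is tested exactly once.

    Assumption: items are assigned to folds round-robin, with no shuffling.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical review text, used only for illustration.
    features = word_ngrams("please add a calendar view")
    print(sorted(features))

    # Ten train/test splits over 25 hypothetical reviews.
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

In a full pipeline, each review's n-gram counts would become one row of a feature matrix, and the classifier would be trained and scored once per fold, with the per-fold scores averaged.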

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127806