Semi-automated Extraction of New Requirements from Online Reviews for Software Product Evolution
- Publication Type: Conference Proceeding
- Issue Date:
To increase their utility, software products must evolve continually and incrementally to meet the new requirements of current and future users. Online reviews written by users of the software are a rich and readily available resource for discovering candidate new features for future releases. However, manually analyzing a large volume of potentially unstructured and noisy data to extract information that supports software release planning decisions is challenging. This paper investigates machine learning techniques for automatically identifying text in online reviews that expresses users’ ideas for new features. A binary classification approach, which categorizes extracted text as either feature or non-feature, was evaluated experimentally. Three machine learning algorithms were evaluated: Naïve Bayes (with multinomial and Bernoulli variants), Support Vector Machines (with linear and multinomial variants), and Logistic Regression. Variations in the k-fold cross-validation configuration, the use of n-grams, and the inclusion of review sentiment were also evaluated. In binary classification of over a thousand reviews of two products, Trello and Jira, a linear Support Vector Machine with review sentiment as an input, n-grams in the range (1, 4), and 10-fold cross-validation gave the best performance. The results confirm the feasibility and accuracy of semi-automated extraction of candidate requirements from a large volume of unstructured and noisy online user reviews. Planned next steps are to experiment with machine-supported grouping, prioritization, and visualization of the extracted features to better support release planners, and to extend the sources of candidate requirements.
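The abstract does not describe the implementation, so the following is only a minimal standard-library sketch of the kind of pipeline it names: word n-grams of length 1 to 4, review sentiment appended as an extra input, a linear SVM, and 10-fold cross-validation. Several choices are assumptions not taken from the paper: whitespace tokenization, a precomputed sentiment score per review (for example in [-1, 1]), labels encoded as +1 (feature) and -1 (non-feature), and a Pegasos-style stochastic gradient solver swapped in for whatever SVM implementation the authors used. All function names are hypothetical.

```python
import random
from collections import Counter


def ngrams(text, n_min=1, n_max=4):
    """Word n-grams of length n_min..n_max; the default mirrors an (1, 4) range."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def featurize(text, sentiment):
    """Sparse features: n-gram counts, the review sentiment score, and a bias term."""
    feats = Counter(ngrams(text))
    feats["__sentiment__"] = sentiment  # hypothetical precomputed sentiment score
    feats["__bias__"] = 1.0
    return feats


def predict(w, x):
    """Return +1 (feature request) or -1 (non-feature) from the sign of w . x."""
    score = sum(w.get(k, 0.0) * v for k, v in x.items())
    return 1 if score >= 0 else -1


def train_linear_svm(samples, labels, lam=0.01, epochs=20, seed=0):
    """Linear SVM trained with Pegasos-style SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = samples[i], labels[i]
            margin = y * sum(w.get(k, 0.0) * v for k, v in x.items())
            for k in w:  # regularization step: shrink every existing weight
                w[k] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: move w toward y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def k_fold_accuracy(samples, labels, k=10, seed=0):
    """Pooled accuracy over all k held-out folds (k=10 as in the best configuration)."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        w = train_linear_svm([samples[i] for i in train],
                             [labels[i] for i in train], seed=seed)
        correct += sum(predict(w, samples[i]) == labels[i] for i in fold)
    return correct / len(samples)
```

For example, training on two made-up reviews, `featurize("please add dark mode", 0.0)` labeled +1 and `featurize("app crashes on login", 0.0)` labeled -1, yields weights that classify both correctly; meaningful cross-validation scores would require the labeled Trello and Jira review data, which is not reproduced here. In practice a library would replace the hand-written parts (for instance scikit-learn's `CountVectorizer` with `ngram_range=(1, 4)`, `LinearSVC`, and `cross_val_score` with `cv=10`); the version above only makes each step explicit.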