A topic modeling based approach to novel document automatic summarization
- Publication Type:
- Journal Article
- Expert Systems with Applications, 2017, 84 pp. 12 - 23
- Issue Date:
|A Topic Modeling based Approach to Novel Document Automatic Summarization.pdf||Accepted Manuscript||305.7 kB|
Copyright Clearance Process
- Recently Added
- In Progress
- Open Access
This item is currently unavailable due to the publisher's embargo.
The embargo period expires on 31 Oct 2019
© 2017 Elsevier Ltd Most of existing text automatic summarization algorithms are targeted for multi-documents of relatively short length, thus difficult to be applied immediately to novel documents of structure freedom and long length. In this paper, aiming at novel documents, we propose a topic modeling based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial novel summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary readability. We evaluate experimentally our proposed approach on a real novel dataset. The experiment results show that compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio, but also better summarization quality.
Please use this identifier to cite or link to this item: