Symbolic and Statistical Learning Approaches to Speech Summarization: A Scoping Review

Publisher: Elsevier BV
Publication Type: Journal Article
Published in: Computer Speech and Language, 2022, 72, pp. 101305
Speech summarization techniques take human speech as input and output an abridged version as text or speech. Speech summarization has applications in many domains, from information technology to health care, for example improving speech archives or reducing clinical documentation burden. This scoping review maps close to two decades of speech summarization literature, spanning the early machine-learning work through ensemble models, with no restrictions on the language summarized, the research method, or the paper type. We reviewed a total of 110 papers out of a set of 188 found through a literature search and extracted the speech features used, methods, scope, and training corpora. Most studies employ one of four speech summarization architectures: (1) sentence extraction and compaction; (2) feature extraction and classification or rank-based sentence selection; (3) sentence compression and compression summarization; and (4) language modelling. We also discuss the strengths and weaknesses of these different methods and speech features. Overall, supervised methods (e.g. hidden Markov support vector machines, ranking support vector machines, conditional random fields) performed better than unsupervised methods. Because supervised methods require manually annotated training data, which can be costly to produce, there has been more interest in unsupervised methods. Recent research into unsupervised methods focuses on extending language modelling, for example by combining unigram modelling with deep neural networks. This review does not include recent work in deep learning.
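To make the second architecture above (feature extraction followed by rank-based sentence selection) concrete, here is a minimal, hypothetical sketch that ranks sentences by a single lexical feature (average term frequency) and keeps the top-k in original order. It is an illustrative toy, not any specific system from the reviewed literature; real speech summarizers additionally use acoustic and prosodic features over ASR transcripts.

```python
from collections import Counter


def extractive_summary(sentences, k=2):
    """Toy rank-and-select summarizer (illustrative only): score each
    sentence by the mean document-level frequency of its tokens, then
    keep the k highest-scoring sentences in their original order."""
    # Document-level term frequencies over lowercased whitespace tokens.
    tf = Counter(tok for s in sentences for tok in s.lower().split())
    # Feature extraction: one lexical score per sentence.
    scores = [
        sum(tf[tok] for tok in s.lower().split()) / max(len(s.split()), 1)
        for s in sentences
    ]
    # Rank-based selection: indices of the top-k sentences, reordered
    # back into document order so the summary reads coherently.
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]


sents = [
    "speech summarization takes speech as input",
    "the weather is nice today",
    "summarization outputs an abridged version of speech",
]
print(extractive_summary(sents, k=2))
```

Supervised variants of this architecture replace the hand-set score with a trained classifier or ranker (e.g. a ranking SVM) over richer feature vectors.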