Speech2EEG: Leveraging Pretrained Speech Model for EEG Signal Recognition.

Zhou, J; Duan, Y; Zou, Y; Chang, Y-C; Wang, Y-K; Lin, C-T

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type: Journal Article
Citation: IEEE Trans Neural Syst Rehabil Eng, 2023, 31, pp. 2140-2153
Issue Date: 2023

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhou, J https://orcid.org/0000-0002-6620-604X
dc.contributor.author	Duan, Y https://orcid.org/0000-0003-1517-994X
dc.contributor.author	Zou, Y
dc.contributor.author	Chang, Y-C
dc.contributor.author	Wang, Y-K
dc.contributor.author	Lin, C-T
dc.date.accessioned	2024-03-05T03:47:42Z
dc.date.available	2024-03-05T03:47:42Z
dc.date.issued	2023
dc.identifier.citation	IEEE Trans Neural Syst Rehabil Eng, 2023, 31, pp. 2140-2153
dc.identifier.issn	1534-4320
dc.identifier.issn	1558-0210
dc.identifier.uri	http://hdl.handle.net/10453/176126
dc.format	Print-Electronic
dc.language	eng
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation	http://purl.org/au-research/grants/arc/DP210101093
dc.relation	United States Department of the Navy N629091912058
dc.relation	http://purl.org/au-research/grants/arc/DP220100803
dc.relation.ispartof	IEEE Trans Neural Syst Rehabil Eng
dc.relation.isbasedon	10.1109/TNSRE.2023.3268751
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	0903 Biomedical Engineering, 0906 Electrical and Electronic Engineering
dc.subject.classification	Biomedical Engineering
dc.subject.classification	4003 Biomedical engineering
dc.subject.classification	4007 Control engineering, mechatronics and robotics
dc.subject.mesh	Humans
dc.subject.mesh	Speech
dc.subject.mesh	Imagination
dc.subject.mesh	Brain-Computer Interfaces
dc.subject.mesh	Neural Networks, Computer
dc.subject.mesh	Electroencephalography
dc.subject.mesh	Algorithms
dc.title	Speech2EEG: Leveraging Pretrained Speech Model for EEG Signal Recognition.
dc.type	Journal Article
utslib.citation.volume	31
utslib.location.activity	United States
utslib.for	0903 Biomedical Engineering
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
dc.date.updated	2024-03-05T03:47:36Z
pubs.publication-status	Published
pubs.volume	31

Abstract:

Identifying meaningful brain activities is critical in brain-computer interface (BCI) applications. Recently, an increasing number of neural network approaches have been proposed to recognize EEG signals. However, these approaches depend heavily on complex network structures to improve EEG recognition performance and suffer from a shortage of training data. Inspired by the waveform characteristics and processing methods shared between EEG and speech signals, we propose Speech2EEG, a novel EEG recognition method that leverages pretrained speech features to improve the accuracy of EEG recognition. Specifically, a pretrained speech processing model is adapted to the EEG domain to extract multichannel temporal embeddings. Then, several aggregation methods, including the weighted average, channelwise aggregation, and channel-and-depthwise aggregation, are implemented to exploit and integrate the multichannel temporal embeddings. Finally, a classification network is used to predict EEG categories based on the integrated features. Our work is the first to explore the use of pretrained speech models for EEG signal analysis as well as effective ways to integrate the multichannel temporal embeddings from the EEG signal. Extensive experimental results suggest that the proposed Speech2EEG method achieves state-of-the-art performance on two challenging motor imagery (MI) datasets, the BCI IV-2a and BCI IV-2b datasets, with accuracies of 89.5% and 84.07%, respectively. Visualization analysis of the multichannel temporal embeddings shows that the Speech2EEG architecture can capture useful patterns related to MI categories, which can provide a novel solution for subsequent research under the constraints of a limited dataset scale.
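
The abstract names three ways of integrating the multichannel temporal embeddings but does not define them here. The sketch below is one possible reading in NumPy, not the authors' implementation. The tensor layouts, the function names, the use of softmax-normalized weights, and the interpretation of "depth" as a stack of encoder layers are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def weighted_average(emb, channel_logits):
    """Assumed form: one scalar weight per EEG channel.

    emb: (C, T, F) embeddings for C channels, T time steps, F features.
    channel_logits: (C,) hypothetical learnable scores.
    Returns a (T, F) integrated embedding.
    """
    w = softmax(channel_logits)
    return np.tensordot(w, emb, axes=(0, 0))


def channelwise_aggregation(emb, kernel):
    """Assumed form: one weight per channel and per feature.

    emb: (C, T, F); kernel: (C, F) hypothetical learnable weights.
    Returns a (T, F) integrated embedding.
    """
    return np.einsum("ctf,cf->tf", emb, kernel)


def channel_and_depthwise_aggregation(emb_layers, channel_logits, depth_logits):
    """Assumed form: separate weights over channels and over encoder layers.

    emb_layers: (D, C, T, F) embeddings from D layers of the speech model.
    channel_logits: (C,), depth_logits: (D,) hypothetical learnable scores.
    Returns a (T, F) integrated embedding.
    """
    wc = softmax(channel_logits)
    wd = softmax(depth_logits)
    return np.einsum("dctf,d,c->tf", emb_layers, wd, wc)
```

In a full pipeline, the (T, F) output of any of these functions would be passed to the classification network. In practice the weights would be trained jointly with that network rather than fixed by hand.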

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/176126