Trends and Limitations in Transformer-Based BCI Research

Pfeffer, MA; Wong, JKW; Ling, SH

Publisher: MDPI
Publication Type: Journal Article
Citation: Applied Sciences Switzerland, 2025, 15, (20)
Issue Date: 2025-10-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Pfeffer, MA
dc.contributor.author	Wong, JKW
dc.contributor.author	Ling, SH
dc.date.accessioned	2026-02-09T03:17:14Z
dc.date.available	2026-02-09T03:17:14Z
dc.date.issued	2025-10-01
dc.identifier.citation	Applied Sciences Switzerland, 2025, 15, (20)
dc.identifier.issn	2076-3417
dc.identifier.uri	http://hdl.handle.net/10453/193065
dc.description.abstract	Transformer-based models have accelerated EEG motor imagery (MI) decoding by using self-attention to capture long-range temporal structures while complementing spatial inductive biases. This systematic survey of Scopus-indexed works from 2020 to 2025 indicates that reported advances are concentrated in offline, protocol-heterogeneous settings; inconsistent preprocessing, non-standard data splits, and sparse efficiency reporting frequently cloud claims of generalization and real-time suitability. Under session- and subject-aware evaluation on the BCIC IV 2a/2b datasets, typical performance clusters in the high-80% range for binary MI and the mid-70% range for multi-class tasks, with gains of roughly 5–10 percentage points achieved by strong hybrids (CNN/TCN–Transformer; hierarchical attention) rather than by extreme figures often driven by leakage-prone protocols. In parallel, transformer-driven denoising, particularly with diffusion–transformer hybrids, yields strong signal-level metrics but remains weakly linked to task benefit; denoise → decode validation is rarely standardized despite being the most relevant proxy when artifact-free ground truth is unavailable. Three priorities emerge for translation: protocol discipline (fixed train/test partitions, transparent preprocessing, mandatory reporting of parameters, FLOPs, per-trial latency, and acquisition-to-feedback delay); task relevance (shared denoise → decode benchmarks for MI and related paradigms); and adaptivity at scale (self-supervised pretraining on heterogeneous EEG corpora and resource-aware co-optimization of preprocessing and hybrid transformer topologies). Evidence from subject-adjusting evolutionary pipelines that jointly tune preprocessing, attention depth, and CNN–Transformer fusion demonstrates reproducible inter-subject gains over established baselines under controlled protocols.
Implementing these practices positions transformer-driven BCIs to move beyond inflated offline estimates toward reliable, real-time neurointerfaces with concrete clinical and assistive relevance.
dc.language	English
dc.publisher	MDPI
dc.relation.ispartof	Applied Sciences Switzerland
dc.relation.isbasedon	10.3390/app152011150
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Trends and Limitations in Transformer-Based BCI Research
dc.type	Journal Article
utslib.citation.volume	15
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	University of Technology Sydney/UTS Groups
pubs.organisational-group	University of Technology Sydney/UTS Groups/Centre for Health Technologies (CHT)
pubs.organisational-group	University of Technology Sydney/Faculty of Design and Society
pubs.organisational-group	University of Technology Sydney/Faculty of Design and Society/School of Built Environment
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/Engineering and IT Related HDR Students
pubs.organisational-group	University of Technology Sydney/Faculty of Design and Society/School of Built Environment/Construction Discipline
utslib.copyright.status	open_access	*
dc.rights.license	This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
dc.date.updated	2026-02-09T03:17:13Z
pubs.issue	20
pubs.publication-status	Published
pubs.volume	15
utslib.citation.issue	20

Abstract:

Transformer-based models have accelerated EEG motor imagery (MI) decoding by using self-attention to capture long-range temporal structures while complementing spatial inductive biases. This systematic survey of Scopus-indexed works from 2020 to 2025 indicates that reported advances are concentrated in offline, protocol-heterogeneous settings; inconsistent preprocessing, non-standard data splits, and sparse efficiency reporting frequently cloud claims of generalization and real-time suitability. Under session- and subject-aware evaluation on the BCIC IV 2a/2b datasets, typical performance clusters in the high-80% range for binary MI and the mid-70% range for multi-class tasks, with gains of roughly 5–10 percentage points achieved by strong hybrids (CNN/TCN–Transformer; hierarchical attention) rather than by extreme figures often driven by leakage-prone protocols. In parallel, transformer-driven denoising, particularly with diffusion–transformer hybrids, yields strong signal-level metrics but remains weakly linked to task benefit; denoise → decode validation is rarely standardized despite being the most relevant proxy when artifact-free ground truth is unavailable. Three priorities emerge for translation: protocol discipline (fixed train/test partitions, transparent preprocessing, mandatory reporting of parameters, FLOPs, per-trial latency, and acquisition-to-feedback delay); task relevance (shared denoise → decode benchmarks for MI and related paradigms); and adaptivity at scale (self-supervised pretraining on heterogeneous EEG corpora and resource-aware co-optimization of preprocessing and hybrid transformer topologies). Evidence from subject-adjusting evolutionary pipelines that jointly tune preprocessing, attention depth, and CNN–Transformer fusion demonstrates reproducible inter-subject gains over established baselines under controlled protocols.
Implementing these practices positions transformer-driven BCIs to move beyond inflated offline estimates toward reliable, real-time neurointerfaces with concrete clinical and assistive relevance.
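
The abstract contrasts subject-aware evaluation with leakage-prone protocols but does not spell out a splitting procedure. As a minimal sketch, assuming a leave-one-subject-out scheme (one common subject-aware choice, not necessarily the one used in the surveyed works), the following shows how trials can be partitioned so that no subject appears on both sides of a split. The function name `leave_one_subject_out` and the subject labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    Every trial of the held-out subject goes to the test set, so no
    subject contributes trials to both sides of a split.
    """
    # Group trial indices by the subject that produced them.
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for trial_index, subject in enumerate(subject_ids):
        by_subject[subject].append(trial_index)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = [
            i for s, idxs in by_subject.items() if s != held_out for i in idxs
        ]
        yield held_out, sorted(train_idx), test_idx


# Hypothetical trial-to-subject labels (illustrative, not from the paper).
subjects = ["A01", "A01", "A02", "A02", "A03", "A03"]
splits = list(leave_one_subject_out(subjects))
for held_out, train_idx, test_idx in splits:
    train_subjects = {subjects[i] for i in train_idx}
    assert held_out not in train_subjects  # no subject leakage
    print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by shuffled trial is what keeps a model from being tested on subjects it has already seen during training; a session-aware variant would group trials by (subject, session) pairs instead.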

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/193065