Multi-Scale Hybrid Fusion Network for Mandarin Audio-Visual Speech Recognition
- Publisher: IEEE
- Publication Type: Conference Proceeding
- Citation: 2023 IEEE International Conference on Multimedia and Expo (ICME), 2023, 2023-July, pp. 642-647
- Issue Date: 2023-01-01
Closed Access
Filename | Description | Size
---|---|---
1665709.pdf | Published version | 2.55 MB
This item is closed access and not available.
Compared to feature or decision fusion, hybrid fusion can improve audio-visual speech recognition accuracy. Existing works mainly focus on designing the multi-modality feature extraction, interaction, and prediction processes, neglecting useful information across the modalities and the optimal combination of the different predicted results. In this paper, we propose a multi-scale hybrid fusion network (MSHF) for Mandarin audio-visual speech recognition. Our MSHF consists of a feature extraction subnetwork, which exploits the proposed multi-scale feature extraction module (MSFE) to obtain multi-scale features, and a hybrid fusion subnetwork, which integrates the intrinsic correlation of different modality information and optimizes the weights of the prediction results for different modalities to achieve the best classification. We further design a feature recognition module (FRM) for accurate audio-visual speech recognition. We conducted experiments on the CAS-VSR-W1k dataset. The experimental results show that the proposed method outperforms the selected competitive baselines and the state of the art, indicating the superiority of our proposed modules.
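The abstract does not specify how the prediction weights are computed, so the following is only a minimal sketch of one ingredient it mentions: weighting the prediction results of different modalities before the final classification. It assumes three branches (audio, visual, and a jointly fused branch), each producing class logits, and a learned scalar score per branch that is normalized with a softmax. The function name `hybrid_fuse`, the branch layout, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_fuse(branch_logits, branch_scores):
    """Weighted decision fusion (illustrative, not the MSHF implementation).

    branch_logits: one list of class logits per branch.
    branch_scores: one unnormalized score per branch; in a trained model
                   these would be learned parameters (an assumption here).
    Returns a single class-probability vector.
    """
    weights = softmax(branch_scores)            # branch weights sum to 1
    probs = [softmax(logits) for logits in branch_logits]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]


# Hypothetical logits for a 3-word vocabulary.
audio_logits = [2.0, 0.5, 0.1]
visual_logits = [0.3, 1.8, 0.2]
joint_logits = [1.5, 1.0, 0.2]   # branch fed by feature-level fusion

fused = hybrid_fuse(
    [audio_logits, visual_logits, joint_logits],
    [0.0, 0.0, 1.0],             # favors the joint branch
)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the branch weights are normalized, the fused output remains a valid probability distribution, and the weights can be tuned or learned without rescaling the per-branch predictions.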