Multimodal Deep Learning Approach for Bangla Sign Language Recognition: Integrating Spatial and Geometric Features

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Conference Proceeding
Citation:
2025 2nd International Conference on Next Generation Computing, IoT and Machine Learning (NCIM 2025), 2025, pp. 1-6
Issue Date:
2025-01-01
Abstract:
In South Asian countries such as Bangladesh, deaf and hard-of-hearing communities predominantly use Bangla Sign Language (BdSL) for communication. However, existing sign language recognition systems often rely either on spatial features derived from images or on geometric features extracted from hand landmarks, which limits their applicability and effectiveness. This study proposes a novel multimodal deep learning architecture that improves BdSL recognition by fusing CNN-based spatial features with landmark-based geometric features. The model is trained on the BdSL47 dataset, which contains 37,103 images, and uses real-time data augmentation to improve generalizability. The proposed architecture consists of two concurrent streams: a CNN that extracts spatial characteristics from RGB images and a fully connected network that processes 63-dimensional hand-landmark data. These representations are concatenated and refined by additional fully connected layers, yielding robust classification. Experimental evaluation using 10-fold cross-validation shows that the proposed approach outperforms both classical machine learning classifiers and state-of-the-art deep learning models, achieving 99.96% accuracy. Training on an NVIDIA Tesla P100 GPU provides the computational throughput needed for real-time applications. The findings set a new benchmark for BdSL recognition and demonstrate the efficacy of multimodal learning for sign language classification.
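The two-stream fusion described in the abstract can be sketched as follows. This is a minimal illustrative implementation in PyTorch, not the paper's exact design: the layer widths, the 64x64 input resolution, the 47-class output (inferred from the dataset name BdSL47), and the reading of the 63-dimensional input as 21 hand keypoints x 3 coordinates (as produced by tools such as MediaPipe Hands) are all assumptions.

```python
import torch
import torch.nn as nn

class DualStreamBdSL(nn.Module):
    """Sketch of a two-stream model: a CNN over RGB images plus a fully
    connected network over 63-dim landmark vectors, fused by concatenation.
    Layer sizes are illustrative assumptions, not the paper's exact design."""

    def __init__(self, num_classes: int = 47):  # 47 classes assumed from "BdSL47"
        super().__init__()
        # Spatial stream: small CNN over RGB images (64x64 input assumed).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(),
        )
        # Geometric stream: fully connected layers over the 63-dim landmark
        # vector (assumed: 21 hand keypoints x 3 coordinates).
        self.mlp = nn.Sequential(
            nn.Linear(63, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # Fusion head: concatenate both representations, then classify.
        self.head = nn.Sequential(
            nn.Linear(128 + 128, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, image: torch.Tensor, landmarks: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.cnn(image), self.mlp(landmarks)], dim=1)
        return self.head(fused)

# Forward pass on a dummy batch of 2 images and 2 landmark vectors.
model = DualStreamBdSL()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 63))
print(logits.shape)  # torch.Size([2, 47])
```

Concatenating the two 128-dim representations lets the classifier weight spatial and geometric evidence jointly, which is the core idea behind the multimodal design.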