Sign Spotting via Multi-modal Fusion and Testing Time Transferring
- Publisher: Springer Nature
- Publication Type: Chapter
- Citation: Computer Vision – ECCV 2022 Workshops, LNCS vol. 13808, 2023, pp. 271-287
- Issue Date: 2023-01-01
Filename | Description | Size
---|---|---
978-3-031-25085-9_16.pdf | | 1.64 MB
This item is closed access and not available.
This work aims to locate a query isolated sign in a continuous sign video. In this task, the domain gap between isolated and continuous sign videos often degrades localization performance. To address this issue, we propose a parallel multi-modal sign spotting framework. In a nutshell, our framework first exploits multi-modal information (RGB frames, 2D key-points, and 3D key-points) to obtain representative sign features. The modalities complement one another, compensating for the deficiencies of any single modality and yielding informative representations for sign spotting. Moreover, we introduce a testing-time top-k transferring technique into our framework to reduce the aforementioned domain gap. Concretely, we first compare the query sign with extracted sign video clips, and then update the feature of the query sign with the features of the top-k best-matching clips. In this manner, the updated query feature exhibits a smaller domain gap with respect to continuous signs, facilitating feature matching in subsequent iterations. Experiments on the challenging OSLWL-Test-Set benchmark demonstrate that our method achieves superior performance (0.559 F1-score) compared to the baseline (0.395 F1-score). Our code is available at https://github.com/bb12346/OpenSLR.
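The testing-time top-k transferring step lends itself to a short sketch. The snippet below is a minimal illustration under assumed names and shapes (the function `topk_transfer`, a 256-dimensional feature space, cosine similarity as the matching score, and the momentum-style blend are all assumptions for illustration, not the paper's implementation); it shows how a query feature can be pulled toward the continuous-video domain by mixing in its top-k best-matching clip features over a few matching iterations.

```python
# Minimal sketch of testing-time top-k transferring (illustrative assumptions;
# names, shapes, and the blending rule are not taken from the paper's code).
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Normalize features so that dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def topk_transfer(query_feat, clip_feats, k=5, momentum=0.5):
    """Update an isolated-sign query feature toward the continuous domain.

    query_feat: (D,) feature of the isolated query sign.
    clip_feats: (N, D) features of sliding-window clips from the continuous video.
    Returns the updated query feature, blending in the mean of the top-k
    best-matching clip features (cosine similarity).
    """
    q = l2_normalize(query_feat)
    c = l2_normalize(clip_feats)
    sims = c @ q                    # (N,) cosine similarity of each clip to the query
    topk = np.argsort(sims)[-k:]    # indices of the k most similar clips
    transfer = clip_feats[topk].mean(axis=0)
    # Blend the original query with continuous-domain evidence.
    return l2_normalize(momentum * query_feat + (1.0 - momentum) * transfer)

# Usage: iterate matching and updating, so the query feature drifts toward
# the continuous-sign feature distribution before the final localization.
rng = np.random.default_rng(0)
query = rng.normal(size=256)        # stand-in for an isolated-sign feature
clips = rng.normal(size=(100, 256)) # stand-ins for continuous-video clip features
for _ in range(3):
    query = topk_transfer(query, clips, k=5)
```

Here the blend weight plays the role of a hedge against noisy matches: a larger `momentum` trusts the original isolated-sign feature more, while a smaller one moves the query faster toward the continuous domain.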