MGRW-Transformer: Multigranularity Random Walk Transformer Model for Interpretable Learning.

Publisher:
IEEE - Institute of Electrical and Electronics Engineers Inc.
Publication Type:
Journal Article
Citation:
IEEE Transactions on Neural Networks and Learning Systems, 2023, vol. PP, no. 99
Issue Date:
2023-11-08
Deep-learning models are widely used in image recognition tasks because of their strong feature-learning ability. However, most current deep-learning models are "black box" systems that lack a semantic explanation of how they reach their conclusions, which makes them difficult to apply to complex medical image recognition tasks. The vision transformer (ViT) is the most commonly used deep-learning model with a self-attention mechanism; unlike traditional convolutional networks, its attention indicates the image regions that influence a decision, so it offers greater interpretability. However, medical images often contain lesions of variable size in different locations, which makes it difficult for a model with a self-attention module alone to reach correct and explainable conclusions. We propose a multigranularity random walk transformer (MGRW-Transformer) model guided by an attention mechanism to find the regions that influence the recognition task. Our method divides the image into multiple subimage blocks and passes them to the ViT module for classification. Simultaneously, the attention matrix output by the multihead attention layer is fused with the multigranularity random walk module. Within the multigranularity random walk module, the segmented image blocks are used as nodes to construct an undirected graph, with the attention node serving as the starting node that guides the coarse-grained random walk. We divide coarse blocks into finer ones as appropriate to manage the computational cost and combine the results according to the importance of the discovered features. As a result, the model offers a semantic interpretation of the input image, a visualization of that interpretation, and insight into how the decision was reached. Experimental results show that our method improves classification performance on medical images while presenting an interpretation that medical professionals can understand.
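To make the abstract's pipeline concrete, the sketch below illustrates one plausible reading of the attention-guided coarse-grained random walk over patch nodes: patches form an undirected grid graph, the walk starts at the most attended patch, and transition probabilities follow attention scores. This is a minimal illustration, not the authors' implementation; the grid size, neighbourhood structure, step count, and all function names (build_patch_graph, attention_guided_walk) are assumptions introduced here for illustration, and the multigranularity refinement is only indicated in a comment.

import numpy as np

def build_patch_graph(grid_h, grid_w):
    # Undirected 4-neighbour graph over a grid of patch nodes (assumed topology).
    adj = {}
    for r in range(grid_h):
        for c in range(grid_w):
            node = r * grid_w + c
            nbrs = []
            if r > 0:
                nbrs.append((r - 1) * grid_w + c)
            if r < grid_h - 1:
                nbrs.append((r + 1) * grid_w + c)
            if c > 0:
                nbrs.append(node - 1)
            if c < grid_w - 1:
                nbrs.append(node + 1)
            adj[node] = nbrs
    return adj

def attention_guided_walk(attn, adj, n_steps=50, seed=0):
    # Random walk that starts at the most attended patch and prefers
    # neighbours with higher attention; returns normalized visit counts,
    # used here as a stand-in for patch importance.
    rng = np.random.default_rng(seed)
    visits = np.zeros(len(attn))
    node = int(np.argmax(attn))            # attention node used as the start node
    for _ in range(n_steps):
        visits[node] += 1
        nbrs = adj[node]
        w = attn[nbrs] + 1e-8              # transition probabilities from attention
        node = int(rng.choice(nbrs, p=w / w.sum()))
    return visits / visits.sum()

if __name__ == "__main__":
    gh, gw = 14, 14                        # e.g. a 224x224 image with 16x16 patches
    attn = np.random.rand(gh * gw)         # placeholder for ViT attention scores
    graph = build_patch_graph(gh, gw)
    importance = attention_guided_walk(attn, graph)
    # In the multigranularity step, coarse patches with high visit frequency
    # would be subdivided into finer blocks and the walk repeated there.
    top = np.argsort(importance)[::-1][:5]
    print("most influential coarse patches:", top)

Under these assumptions, the visit frequencies play the role of the importance scores that the paper combines across granularities to produce the final visual explanation.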