Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search

Publisher:
AAAI Press
Publication Type:
Conference Proceeding
Citation:
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), 2021, 1, pp. 706-714
Issue Date:
2021-05-18
Filename:
16151-Article Text-19645-1-2-20210518.pdf
Description:
Published version
Size:
268.51 kB (Adobe PDF)
Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, such as character-level, word-level, and sentence-level attacks. However, it remains a challenge to minimize the number of word distortions needed to induce misclassification while simultaneously ensuring lexical correctness, syntactic correctness, and semantic similarity. In this paper, we propose the Bigram and Unigram based Monotonic Heuristic Search (BU-MHS) method to examine the vulnerability of deep models. Our method has three major merits. First, we attack text documents not only at the unigram word level but also at the bigram level, to avoid producing meaningless outputs. Second, we propose a hybrid method that replaces input words with both their synonym and sememe candidates, which greatly enriches the pool of potential substitutions compared with using synonyms alone. Last, we design a search algorithm, Monotonic Heuristic Search (MHS), to determine the priority of word replacements, aiming to reduce the modification cost of an adversarial attack. We evaluate the effectiveness of BU-MHS on the IMDB, AG's News, and Yahoo! Answers text datasets by attacking four state-of-the-art DNN models. Experimental results show that BU-MHS achieves the highest attack success rate while changing the smallest number of words, compared with existing attack methods.
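To make the idea of a word-substitution attack over unigrams and bigrams concrete, the following is a minimal Python sketch. It is not the paper's MHS algorithm: it uses a plain greedy search that, at each step, applies whichever single substitution most lowers the classifier's confidence in the true label. The classifier `toy_positive_prob`, the tables `UNIGRAM_CANDIDATES` and `BIGRAM_CANDIDATES`, and the function `greedy_attack` are all hypothetical names introduced here for illustration; a real attack would draw candidates from synonym and sememe resources and query a trained model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical substitution tables (illustrative only, not from the paper).
UNIGRAM_CANDIDATES: Dict[str, List[str]] = {
    "great": ["fine"],
    "wonderful": ["okay"],
}
BIGRAM_CANDIDATES: Dict[Tuple[str, str], List[Tuple[str, ...]]] = {
    ("really", "good"): [("passable",)],
}


def toy_positive_prob(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in classifier: probability of the 'positive' label."""
    positive = {"great", "good", "wonderful"}
    negative = {"bad", "poor", "dull"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens) - 0.5
    return 1.0 / (1.0 + math.exp(-score))


def candidate_edits(tokens: Sequence[str]) -> List[Tuple[int, int, Tuple[str, ...]]]:
    """List (start, length, replacement) for every unigram and bigram match."""
    edits = []
    for i, tok in enumerate(tokens):
        for sub in UNIGRAM_CANDIDATES.get(tok, []):
            edits.append((i, 1, (sub,)))
        if i + 1 < len(tokens):
            for phrase in BIGRAM_CANDIDATES.get((tok, tokens[i + 1]), []):
                edits.append((i, 2, phrase))
    return edits


def greedy_attack(
    tokens: Sequence[str],
    true_label_prob: Callable[[Sequence[str]], float],
    threshold: float = 0.5,
) -> Tuple[List[str], int, bool]:
    """Greedy search (an assumption of this sketch, not MHS).

    Returns (adversarial tokens, number of original words replaced, success).
    """
    current = list(tokens)
    words_changed = 0
    while true_label_prob(current) >= threshold:
        best_prob = true_label_prob(current)
        best = None
        for start, length, sub in candidate_edits(current):
            trial = current[:start] + list(sub) + current[start + length:]
            prob = true_label_prob(trial)
            if prob < best_prob:  # keep the edit that hurts the true label most
                best_prob, best = prob, (trial, length)
        if best is None:  # no remaining substitution lowers the confidence
            return current, words_changed, False
        current, replaced = best
        words_changed += replaced
    return current, words_changed, True


adversarial, changed, success = greedy_attack(
    "the film was really good and great".split(), toy_positive_prob
)
print(adversarial, changed, success)
```

The sketch counts a bigram substitution as two modified words, which is one possible way to measure modification cost; the paper may define the cost differently.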