TY - JOUR
AB - Model stealing attacks involve creating copies of machine learning models that have similar functionality to the original model without proper authorization. Such attacks raise significant concerns about the intellectual property of machine learning models. Nonetheless, current defense mechanisms against such attacks tend to exhibit certain drawbacks, notably in terms of utility and robustness. For example, watermarking-based defenses require victim models to be retrained to embed watermarks, which can impact main-task performance. Moreover, other defenses, especially fingerprinting-based methods, often rely on specific samples such as adversarial examples to verify ownership of the target model. These approaches may prove less robust against adaptive attacks, such as model stealing with adversarial training. It remains unclear whether normal examples, as opposed to adversarial ones, can effectively reflect the characteristics of stolen models. To tackle these challenges, we propose a novel method that leverages a neural network as a decoder to invert the suspicious model's outputs. Inspired by model inversion attacks, we argue that this decoding process will unveil hidden patterns inherent in the original outputs of the suspicious model. Drawing on these decoding outcomes, we calculate specific metrics to determine the legitimacy of suspicious models. We validate the efficacy of our defense technique against diverse model stealing attacks, specifically within the domain of classification tasks based on deep neural networks.
AU - Zhou, S
AU - Zhu, T
AU - Ye, D
AU - Zhou, W
AU - Zhao, W
DA - 2024/01/01
DO - 10.1109/TIFS.2024.3376190
EP - 4145
JO - IEEE Transactions on Information Forensics and Security
PB - IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
PY - 2024/01/01
SP - 4130
TI - Inversion-Guided Defense: Detecting Model Stealing Attacks by Output Inverting
VL - 19
Y1 - 2024/01/01
Y2 - 2026/05/16
ER -