Detection of Diffusion Model-Generated Faces by Assessing Smoothness and Noise Tolerance

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Conference Proceeding
Citation:
IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, BMSB, 2024, 00, pp. 1-6
Issue Date:
2024-01-01
Filename Description Size
2024_BMSB-Detection of Diffusion Model-Generated Faces by.pdfAccepted version1.16 MB
Adobe PDF
Full metadata record
The fast growth of artificial intelligence (AI) raises much concern about the misinformation brought by AI-generated content (AIGC), especially Deepfake techniques that generate fake human faces. The recent development of Diffusion Models (DMs) moves another critical step forward to generate high-resolution and realistic human faces, which has become a challenge for existing Deepfake detectors. In this paper, we propose a DM-generated image detector by looking into the generation pipeline of DMs and the details of DM-generated images. The detector is based on the observation that DM-generated human faces show over-smooth textures and do not contain details as real human faces. Through a comprehensive analysis of DM-generated faces in spatial and frequency domains, we noticed that the over-smoothness improves the tolerance of Gaussian noise since excessive smoothness mitigates some of the impact of noise. Inspired by the observations, we propose a Deepfake detector capable of recognizing challenging DM-generated faces. We mainly propose the Noise Residual Unit (NRD) in our framework to collect the frequency response of images to Gaussian noise as distinctive features for classification. In detail, for an input face image, we add Gaussian noise to it and get the noise-degraded image. Then, the NRU generates the Noise Residual Image (NRI) by calculating the residual of the high-pass-filtered original image and the high-pass-filtered degraded image. The NRI indicates the high-frequency impact brought by the Gaussian noise and, therefore, suggests the tolerance of the original image to noise degradation. The original image and NRI are encoded and fused to obtain the joint representation, which is then fed to a classifier to predict the binary label. We conducted comprehensive experiments to evaluate the effectiveness of the proposed detector. The results indicate that our proposed detector achieves state-of-the-art detection performance on DM-generated faces and generalizes well to unseen DM-generated and GAN-generated face datasets.
Please use this identifier to cite or link to this item: