MMOOC: A Multimodal Misinformation Dataset for Out-of-Context News Analysis

Springer Nature
Information Security and Privacy, 2024, 14897 LNCS, pp. 444-459
Misinformation in the news media poses a significant challenge, particularly with the rise of Out-of-Context (OOC) media, in which real news content is placed in a misleading context. Existing datasets for studying OOC media cover at most two modalities and are limited in source and topic scope. These limitations undermine the effectiveness of models trained on them to identify real-world misinformation. In this paper, we first introduce a comprehensive OOC media dataset compiled from various sources, dubbed the Multimodal Misinformation Dataset for Out-of-Context News Analysis (MMOOC). We collect 91K authentic multimodal news items from 60 influential news outlets around the world, such as ABC News and BBC News. We then produce 364K fabricated OOC news items by recombining the authentic ones. Furthermore, we propose the MMOOC-Checker, which detects OOC media by leveraging not only the semantic consistency among different modalities but also the temporal consistency between the first release dates of the modalities. Specifically, we develop an internal OOC-checker to examine the semantic consistency between modalities, and an external OOC-checker to exploit the temporal closeness between the news in different modalities. Experiments on MMOOC demonstrate the effectiveness of the MMOOC-Checker. The dataset will be released soon.
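The recombination and the two consistency signals described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the abstract does not specify the modalities, the recombination procedure, or how the checker combines its signals. The sketch uses a hypothetical two-field item (image and caption), swaps captions between items to fabricate OOC samples, and flags a sample when either a supplied semantic score is low or the first release dates are far apart. The names `NewsItem`, `make_ooc`, `temporal_gap_days`, and `looks_out_of_context`, and both thresholds, are hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NewsItem:
    # Hypothetical schema: one image and one caption, each with its
    # own first release date. The real dataset's fields are not given.
    image_id: str
    caption: str
    image_date: date
    caption_date: date


def make_ooc(items, per_item, seed=0):
    """Fabricate OOC samples by pairing each image with other items' captions."""
    rng = random.Random(seed)
    fabricated = []
    for i, item in enumerate(items):
        others = items[:i] + items[i + 1:]
        # Draw distinct donor items; never reuse the item's own caption.
        for donor in rng.sample(others, k=min(per_item, len(others))):
            fabricated.append(
                NewsItem(item.image_id, donor.caption,
                         item.image_date, donor.caption_date)
            )
    return fabricated


def temporal_gap_days(item):
    """Absolute gap, in days, between the two first release dates."""
    return abs((item.image_date - item.caption_date).days)


def looks_out_of_context(item, semantic_score, max_gap_days=30, min_score=0.5):
    """Flag an item if either signal is inconsistent.

    semantic_score stands in for the output of some cross-modal
    similarity model (assumed to lie in [0, 1]); both thresholds
    are arbitrary illustrative values.
    """
    return semantic_score < min_score or temporal_gap_days(item) > max_gap_days
```

With `per_item=4`, each authentic item yields four fabricated ones, which matches the 91K to 364K ratio reported above, although the abstract does not confirm that this is how the ratio arises.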