Dynamic and Explainable Multimodal Fake News Detection Using Transformer-Based Text–Image Fusion Framework
DOI:
https://doi.org/10.7492/kjvba932
Abstract
Social media and digital news platforms let information reach large audiences within seconds, but the same platforms also accelerate the spread of misleading or false news. Detecting such misinformation has become an important challenge for researchers and technology developers. This study presents a multimodal fake news detection system that evaluates both the textual and the visual content of online news. Rather than training a new model from scratch, the system uses pre-trained transformer models for inference: the RoBERTa language model analyzes text for contextual meaning, and the Data-efficient Image Transformer (DeiT) examines images. For transparency, an explainability layer identifies important keywords through TF-IDF analysis and evaluates the emotional tone of the text. The framework accepts text, images, and URLs as input, allowing flexible evaluation of news content obtained from online sources. Functional testing shows consistent prediction behavior across these input types. The results indicate that combining transformer-based text and image analysis with simple explainability techniques offers a practical and understandable approach to detecting potentially misleading news content.
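
The TF-IDF keyword step of the explainability layer can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant, tokenizer, or reference corpus the system uses, so the smoothed IDF formula, the regex tokenizer, the function name `top_keywords`, and the sample texts below are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_keywords(document, corpus, k=3):
    """Return the k highest-scoring (term, score) pairs of `document`.

    Scores are term frequency times a smoothed inverse document frequency
    computed over `corpus`. This is a hypothetical helper, not the paper's code.
    """
    corpus_tokens = [tokenize(d) for d in corpus]
    n_docs = len(corpus_tokens)

    # Document frequency: in how many corpus documents each term appears.
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))

    tokens = tokenize(document)
    if not tokens:
        return []

    tf = Counter(tokens)
    scores = {}
    for term, count in tf.items():
        # Smoothed IDF (assumption): never divides by zero for unseen terms.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = (count / len(tokens)) * idf

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Made-up example texts, used only to exercise the function.
corpus = [
    "the economy grew this quarter",
    "the team won the match",
    "the miracle cure shocks doctors",
]
headline = "Miracle cure: doctors hate the miracle cure"
keywords = [term for term, _ in top_keywords(headline, corpus, k=3)]
print(keywords)
```

Terms that are frequent in the headline but rare in the reference corpus rank highest, while common words such as "the" fall to the bottom, which is what makes the highlighted keywords useful as a simple explanation of a prediction.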








