ENHANCING PDF MALWARE CLASSIFICATION USING CTGAN-BASED DATA AUGMENTATION AND SUPERVISED LEARNING
- Faculty of Mathematics and Computer Science, University Félix Houphouët-Boigny, Côte d'Ivoire.
- National High School of Architecture and Urban Planning, University of Bondoukou, Côte d'Ivoire.
- Faculty of Sciences and Technologies, University Abdelmalek Essaadi, Tangier, Morocco.
The increasing sophistication of cyberattacks that exploit PDF files poses a critical challenge to digital security. This study presents a detection framework that combines synthetic data augmentation with supervised machine learning to identify malicious PDF documents with high precision. To address the class imbalance often found in cybersecurity datasets, we employ Conditional Tabular GAN (CTGAN) to generate realistic synthetic samples, thereby enriching the training set and improving the generalization capability of the classifiers. Six supervised models are assessed on the augmented dataset: Decision Tree, Random Forest, XGBoost, Support Vector Machine, Naive Bayes, and Neural Network. Among them, XGBoost consistently delivers the most robust performance. To foster transparency and trust, the framework integrates SHapley Additive exPlanations (SHAP), which makes the contribution of each feature to a classification decision interpretable. Overall, this work introduces a comprehensive and explainable approach to strengthening PDF document security, offering a promising path for deployment in sensitive organizational environments such as government, education, and enterprise systems.
[Amadou Diabagate, Yazid Hambally Yacouba, Adama Coulibaly and Abdellah Azmani (2025); ENHANCING PDF MALWARE CLASSIFICATION USING CTGAN-BASED DATA AUGMENTATION AND SUPERVISED LEARNING. Int. J. of Adv. Res. (Sep). 738-757] (ISSN 2320-5407). www.journalijar.com
Corresponding author affiliation: National High School of Architecture and Urban Planning, University of Bondoukou, Côte d'Ivoire