Paper Title
A Hybrid Machine Learning Framework for Adaptive Phishing Detection using NLP and Canopy Feature Selection
Abstract
Phishing remains a serious cybersecurity threat that exploits both human and technical weaknesses to steal sensitive credentials and financial information. This paper proposes a hybrid machine learning framework that combines supervised and unsupervised learning models with Natural Language Processing (NLP)-based semantic analysis and Canopy feature selection to improve the accuracy and efficiency of phishing detection. The proposed system extracts lexical, content-based, and technical features from standard phishing datasets and uses a stacking ensemble to combine the predictions of the individual models into a robust final classification. Canopy clustering is applied for dimensionality reduction and feature selection, while the NLP modules identify contextual and linguistic patterns in phishing emails. Experimental results show that the hybrid model achieves 98.8% accuracy, 98.9% precision, and 98.6% recall, with a reduced false positive rate of 1.4%, outperforming existing single-model methods. These findings indicate that the proposed hybrid architecture is adaptable, scalable, and suitable for real-time use in enterprise email security and web-based anti-phishing systems.
Keywords - Phishing, Cyber Security, Machine Learning, Hybrid Model, NLP, Feature Selection, Email Security
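
As an illustration of the Canopy clustering step named in the abstract, the following is a minimal sketch of the generic canopy procedure with a loose threshold and a tight threshold. It is not the implementation used in this work: the function name `canopy_clusters`, the parameters `t1` and `t2`, and the choice of Euclidean distance are assumptions made for the example, and the abstract does not specify how canopies are mapped to a selected feature subset.

```python
import math


def canopy_clusters(points, t1, t2):
    """Group points into possibly overlapping canopies.

    Hypothetical helper for illustration only.
    t1 is the loose threshold and t2 the tight threshold (0 < t2 < t1).
    Returns a list of (center_index, member_indices) tuples.
    """
    if not 0 < t2 < t1:
        raise ValueError("thresholds must satisfy 0 < t2 < t1")

    remaining = list(range(len(points)))  # indices still eligible as centers
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy,
        # so a point may belong to more than one canopy.
        members = [
            i for i in range(len(points))
            if math.dist(points[center], points[i]) < t1
        ]
        # Points within the tight threshold can no longer become centers.
        # The center itself is at distance 0, so the loop always terminates.
        remaining = [
            i for i in remaining
            if math.dist(points[center], points[i]) >= t2
        ]
        canopies.append((center, members))
    return canopies
```

Calling `canopy_clusters(vectors, t1, t2)` on a list of equal-length numeric tuples returns the canopy centers and their members. One plausible use for feature selection, stated here only as an assumption, is to cluster feature vectors and keep one representative per canopy.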