Paper Title
Phish Guard: A Hybrid Multimodal Phishing Detection System Using XGBoost and Deep Learning Transformers
Abstract
Phishing attacks remain a critical cybersecurity threat and evolve constantly to bypass traditional filters. Early and accurate detection is crucial to preventing data breaches and financial loss. This work presents an automated, production-ready phishing detection system built on a dual-branch ensemble architecture that combines the structural analysis of eXtreme Gradient Boosting (XGBoost) with the semantic understanding of DistilBERT. The XGBoost branch analyzes 38 structural URL features for low-latency inference, while the DistilBERT branch treats URLs as natural-language text to detect zero-day threats. Preprocessing consists of feature extraction (entropy, character distribution) for the structural branch and tokenization for the semantic branch. The system is implemented as a Flask-based API with a Chrome Extension for real-time protection. The model achieves high precision, with an inference time of under 5 ms for structural checks and 150 ms for semantic analysis. This approach offers a scalable, efficient, and user-accessible detection tool for modern web security.
Keywords - Phishing Detection, Cybersecurity, Machine Learning, Deep Learning, XGBoost, DistilBERT, Hybrid Ensemble Model, URL Analysis, Chrome Extension
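The entropy and character-distribution features named in the abstract can be illustrated with a minimal sketch. The function names and the particular ratios below are assumptions made for illustration; they are not the system's actual 38-feature set, which the abstract does not enumerate.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def char_distribution(url: str) -> dict:
    """Fractions of digits, letters, and all other characters in the URL."""
    total = len(url) or 1  # avoid division by zero on an empty string
    digits = sum(c.isdigit() for c in url)
    letters = sum(c.isalpha() for c in url)
    return {
        "digit_ratio": digits / total,
        "letter_ratio": letters / total,
        "special_ratio": (len(url) - digits - letters) / total,
    }


def extract_features(url: str) -> dict:
    """Hypothetical feature vector: length, entropy, and character ratios."""
    features = {"length": len(url), "entropy": shannon_entropy(url)}
    features.update(char_distribution(url))
    return features
```

A dictionary such as the one returned by `extract_features` could be converted to a numeric row and passed to a gradient-boosted classifier; how the actual system encodes and combines its features is not stated here.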