Performance Analysis of Support Vector Classification and Random Forest in Phishing Email Classification

Chaerul Umam; Lekso Budi Handoko; Folasade Olubusola Isinkaye

doi:10.15294/sji.v11i2.3301

Authors

Chaerul Umam, Universitas Dian Nuswantoro
Lekso Budi Handoko, Universitas Dian Nuswantoro
Folasade Olubusola Isinkaye, Ekiti State University

Keywords:

Email phishing, Classification, Support Vector Classification, Random Forest

Abstract

Purpose: This study analyzes the performance of a phishing email classification system built with two machine learning algorithms: Random Forest and Support Vector Classification (SVC).

Methods/Study design/approach: The study employed a systematic approach to develop a phishing email classification system utilizing machine learning algorithms. Implementation of the system was conducted within the Jupyter Notebook IDE using the Python programming language. The dataset, sourced from kaggle.com, comprised 18,650 email samples categorized into secure and phishing emails. Prior to model training, the dataset was divided into training and testing sets using three distinct split percentages: 60:40, 70:30, and 80:20. Subsequently, parameters for both the Random Forest and Support Vector Classification models were carefully selected to optimize performance. The TF-IDF Vectorizer method was employed to convert text data into vector form, facilitating structured data processing.
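The workflow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses a small hypothetical toy corpus in place of the Kaggle dataset, and picks hyperparameters (a linear SVC kernel, 100 trees, a fixed random seed) that the abstract does not specify. The function name `evaluate` is likewise an illustrative choice.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the 18,650-email Kaggle dataset.
PHISHING = [
    "verify your account now or it will be suspended",
    "urgent: confirm your password to avoid suspension",
    "you have won a prize, click here to claim it",
    "your bank account is locked, login to verify",
    "claim your free gift card by clicking this link",
    "security alert: update your payment details now",
    "final notice: verify your identity immediately",
    "click here to reset your password and unlock account",
    "congratulations, you won the lottery, send your details",
    "your mailbox is full, login here to restore access",
]
SAFE = [
    "meeting moved to 3pm, see the updated agenda attached",
    "please review the draft report before friday",
    "lunch on thursday? let me know what works",
    "the quarterly results are attached for your review",
    "thanks for the feedback on the project plan",
    "reminder: team standup tomorrow at 10am",
    "here are the notes from yesterday's call",
    "can you send me the latest version of the slides",
    "the build passed, merging the branch this afternoon",
    "happy birthday! hope you have a great day",
]
TEXTS = PHISHING + SAFE
LABELS = [1] * len(PHISHING) + [0] * len(SAFE)  # 1 = phishing, 0 = safe


def evaluate(texts, labels, test_sizes=(0.4, 0.3, 0.2), seed=42):
    """Return accuracy per (model name, test size) for the 60:40, 70:30, 80:20 splits."""
    results = {}
    for test_size in test_sizes:
        x_train, x_test, y_train, y_test = train_test_split(
            texts, labels, test_size=test_size, stratify=labels, random_state=seed
        )
        # Hyperparameters below are assumptions, not values taken from the study.
        models = {
            "SVC": SVC(kernel="linear"),
            "RandomForest": RandomForestClassifier(n_estimators=100, random_state=seed),
        }
        for name, clf in models.items():
            # The pipeline fits TF-IDF on the training split only, so no
            # vocabulary or IDF statistics leak in from the test split.
            pipeline = make_pipeline(TfidfVectorizer(), clf)
            pipeline.fit(x_train, y_train)
            results[(name, test_size)] = accuracy_score(y_test, pipeline.predict(x_test))
    return results


results = evaluate(TEXTS, LABELS)
for (name, test_size), acc in sorted(results.items()):
    print(f"{name:>12}  test_size={test_size:.1f}  accuracy={acc:.2%}")
```

Wrapping the vectorizer and classifier in one pipeline is a design choice made here for the sketch; accuracies on the toy corpus are not comparable to the figures reported in the study.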

Result/Findings: Both models achieve high accuracy across all three split percentages, with Support Vector Classification consistently outperforming Random Forest. Support Vector Classification attains its highest accuracy, 97.52%, at the 70:30 split, followed closely by 97.37% at the 60:40 split.

Novelty/Originality/Value: Comparisons with previous studies underscore the superiority of the Support Vector Classification model. This research therefore contributes novel insights into the effectiveness of these machine learning algorithms in phishing email classification, emphasizing their potential to enhance cybersecurity measures.
