Recursive Journal of Informatics
https://journal.unnes.ac.id/sju/rji
<p align="justify"><img src="/sju//public/site/images/yahyanurifriza/Banner_RJI1.jpg"></p> <div><strong>Starting in 2024, Recursive Journal of Informatics has migrated from OJS 2 to OJS 3 to better protect the journal against threats such as hacking.</strong><strong> <em>To submit, please visit our journal's new website at <a href="https://journal.unnes.ac.id/journals/rji" target="_blank" rel="noopener">https://journal.unnes.ac.id/journals/rji</a></em></strong></div> <div><strong>OFFICIAL MIGRATION STATEMENT <a href="https://drive.google.com/drive/folders/1980A0R8NA3En1577jOx6NI3mWJxsNawB?usp=sharing" target="_blank" rel="noopener">HERE</a></strong></div> <p align="justify"><strong>Recursive Journal of Informatics</strong> (P-ISSN <a href="https://issn.brin.go.id/terbit/detail/20221117511485432" target="_blank" rel="noopener">2963-5551</a> | E-ISSN <a href="https://issn.brin.go.id/terbit/detail/20230327461429425" target="_blank" rel="noopener">2986-6588</a>), published by the Department of Computer Science, Universitas Negeri Semarang, is a journal of Information Systems and Information Technology. It publishes scholarly writing on pure and applied research in information systems and information technology, as well as general reviews of developments in related theory, methods, and applied sciences. We invite colleagues to submit articles to, and cite articles from, our journal, and we would appreciate your submitting a paper for publication in RJI. 
RJI is published twice a year, in <strong>March</strong> and <strong>September</strong>.<br> <br><strong>Recursive Journal of Informatics</strong> is also indexed in <a href="https://search.crossref.org/?q=2986-6588&from_ui=yes">Crossref</a> | <a href="https://www.base-search.net/Search/Results?lookfor=Recursive+Journal+of+Informatics&name=&oaboost=1&newsearch=1&refid=dcbasen">Base</a> | <a href="https://app.dimensions.ai/discover/publication?search_mode=content&and_facet_source_title=jour.1456111" target="_blank" rel="noopener">Dimension</a> | <a href="https://garuda.kemdikbud.go.id/journal/view/32555" target="_blank" rel="noopener">Garuda</a> | <a href="https://www.scilit.net/sources/134953" target="_blank" rel="noopener">Scilit</a></p>
en-US | [email protected] (Riza Arifudin) | [email protected] (Yahya Nur Ifriza) | Mon, 30 Sep 2024 00:00:00 +0700 | OJS 3.1.1.2
Neural Network Optimization Using Hybrid Adaptive Mutation Particle Swarm Optimization and Levenberg-Marquardt in Cases of Cardiovascular Disease
https://journal.unnes.ac.id/sju/rji/article/view/78550
<p><strong>Abstract. </strong>Cardiovascular disease is a condition generally characterized by the narrowing or blockage of blood vessels, which can lead to heart attacks, chest pain, or strokes. It is the leading cause of death worldwide, accounting for approximately 31% of global deaths, or about 17.9 million deaths each year. Deaths caused by cardiovascular disease are projected to continue increasing until 2030, reaching 23.3 million. As deaths due to cardiovascular disease become more prevalent, early detection is crucial to reduce mortality rates.</p> <p><strong>Purpose: </strong>Many previous researchers have conducted studies on predicting cardiovascular disease using neural network methods. This study extends these methods by incorporating feature selection and optimization with Hybrid AMPSO-LMA. The research is designed to explore the implementation and predictive outcomes of Hybrid AMPSO-LMA in optimizing an MLP for cases of cardiovascular disease.</p> <p><strong>Methods/Study design/approach: </strong>The first step in this research is to download the Heart Disease Dataset from Kaggle.com. The dataset is preprocessed by removing duplicates and transforming the data. Data mining is then carried out using the MLP algorithm optimized with Hybrid AMPSO-LMA to obtain results and conclusions. The system is implemented in the Python programming language and uses Flask to serve the web interface in HTML.</p> <p><strong>Result/Findings: </strong>The research results demonstrate that the proposed method successfully improves the accuracy of predicting cardiovascular disease. 
Predicting cardiovascular disease using the MLP algorithm yields an accuracy of 86.1%, and after optimization with Hybrid AMPSO-LMA, the accuracy increases to 86.88%.</p> <p><strong>Novelty/Originality/Value: </strong>This effort will contribute to the development of a more reliable and effective cardiovascular disease prediction system, with the goal of early identification of individuals exhibiting symptoms of cardiovascular disease.</p>Rima Ayu Cahyani, Aji Purwinarko
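The feed entry does not include the paper's AMPSO-LMA code; as a minimal sketch of the particle-swarm side that AMPSO extends with adaptive mutation, a classic PSO loop in pure Python looks like the following (the function name `pso`, its parameters, and the sphere objective are illustrative, not from the paper):

```python
import random

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization: minimize `objective` over R^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = pbest[pbest_val.index(min(pbest_val))][:]    # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # classic velocity update: inertia + cognitive + social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (g[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < objective(g):
                    g = pos[i][:]
    return g, objective(g)

# Toy usage: minimize the sphere function; the paper instead tunes MLP weights.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=3)
```

In the hybrid scheme the paper describes, a swarm search of this kind would provide the starting weights that the Levenberg-Marquardt algorithm then refines locally.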
https://journal.unnes.ac.id/sju/rji/article/view/78550 | Mon, 30 Sep 2024 00:00:00 +0700
Implementation of Random Forest with Synthetic Minority Oversampling Technique and Particle Swarm Optimization for Predicting Survival of Heart Failure Patients
https://journal.unnes.ac.id/sju/rji/article/view/76142
<p><strong>Abstract. </strong>Heart failure is caused by a disruption in the heart’s muscle wall, which results in the heart’s inability to pump blood in sufficient quantities to meet the body’s demand for blood. The increasing prevalence and mortality rates of heart failure can be reduced through early disease detection using data mining processes. Data mining is believed to aid in discovering and interpreting specific patterns in decision-making based on processed information. Data mining has also been applied in various fields, one of which is the healthcare sector. One of the data mining techniques used to predict a decision is the classification technique.</p> <p><strong>Purpose: </strong>This research aims to apply SMOTE and PSO to the Random Forest classification algorithm in predicting the survival of heart failure patients and to determine its accuracy results.</p> <p><strong>Methods/Study design/approach: </strong>To predict the survival of heart failure patients, we utilize the Random Forest classification algorithm and incorporate data imbalance handling with SMOTE and feature selection techniques with PSO on the Heart Failure Clinical Records Dataset. The data mining process consists of three distinct phases.</p> <p><strong>Result/Findings: </strong>The application of SMOTE and PSO on the Heart Failure Clinical Records Dataset in the Random Forest classification process resulted in an accuracy rate of 93.9%. In contrast, the Random Forest classification process without SMOTE and PSO resulted in an accuracy rate of only 88.33%. This indicates that the proposed method combination can optimize the performance of the classification algorithm, achieving a higher accuracy compared to previous research.</p> <p><strong>Novelty/Originality/Value: </strong>Data imbalance and irrelevant features in the Heart Failure Clinical Records Dataset significantly impact the classification process. 
Therefore, this research utilizes SMOTE as a data balancing method and PSO as a feature selection technique in the Heart Failure Clinical Records Dataset before the classification process of the Random Forest algorithm.</p>Untsa Zaaidatunni'mah, Endang Sugiharti
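The core idea of SMOTE, as used above for data balancing, is to create synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbors. A pure-Python sketch under that assumption (function name and toy data are illustrative, not the paper's code):

```python
import math, random

def smote(minority, n_synthetic, k=3, seed=0):
    """Sketch of SMOTE: synthesize minority samples by interpolating between
    a minority point and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself)
        neighbors = sorted((p for p in minority if p is not x),
                           key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# Toy 2-D minority class; real use would be rows of the clinical dataset.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=5)
```

Because each synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the minority data already occupies.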
https://journal.unnes.ac.id/sju/rji/article/view/76142 | Mon, 30 Sep 2024 00:00:00 +0700
Analysis Of The Use Of Nazief-Adriani Stemming And Porter Stemming In Covid-19 Twitter Sentiment Analysis With Term Frequency-Inverse Document Frequency Weighting Based On K-Nearest Neighbor Algorithm
https://journal.unnes.ac.id/sju/rji/article/view/74267
<p><strong>Abstract. </strong>This system was developed to determine the accuracy of sentiment analysis on Twitter regarding the COVID-19 issue using the Nazief-Adriani and Porter stemmers with TF-IDF weighting, along with a classification process using K-Nearest Neighbor (KNN), which resulted in accuracies of 50.98% for Nazief-Adriani and 48.24% for Porter.</p> <p><strong>Purpose: </strong>This research aims to determine the accuracy of the Nazief-Adriani and Porter stemmer algorithms in performing text preprocessing using a dataset from Indonesian-language Twitter. This research involves word weighting using TF-IDF and classification using the K-Nearest Neighbor (KNN) algorithm.</p> <p><strong>Methods/Study design/approach: </strong>The experimentation was conducted by applying the Nazief-Adriani and Porter stemmer algorithms, utilizing data sourced from Twitter related to COVID-19. The data then underwent text preprocessing, stemming, TF-IDF weighting, and accuracy testing of training and testing data using the K-Nearest Neighbor (KNN) algorithm, and the accuracy of both stemmers was calculated using a confusion matrix.</p> <p><strong>Result/Findings: </strong>This study obtained reasonably accurate results in testing the Nazief-Adriani stemmer, with an accuracy of 50.98%, applied to sentiment analysis of Indonesian-language COVID-19-related Twitter data. The Porter stemmer achieved an accuracy of 48.24%.</p> <p><strong>Novelty/Originality/Value: </strong>Feature selection is crucial in stemmer accuracy testing. Therefore, in this study, feature selection is carried out using the Nazief-Adriani and Porter stemmers, and classification for accuracy testing is conducted using the K-Nearest Neighbor (KNN) algorithm.</p>Muhammad Fikri, Zaenal Abidin
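The TF-IDF weighting step named above can be sketched in pure Python: tf(t, d) is the term's relative frequency in a document, and idf(t) = log(N / df(t)) down-weights terms that appear in every document (the function name and the toy Indonesian tokens are illustrative, not from the paper):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Sketch of TF-IDF weighting: weight(t, d) = tf(t, d) * log(N / df(t)),
    where df(t) is the number of documents containing term t."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return weights

# Toy tokenized tweets; in the study these would be stemmed Indonesian tokens.
docs = [["vaksin", "covid", "aman"],
        ["vaksin", "covid", "bahaya"],
        ["covid", "varian", "baru"]]
w = tf_idf(docs)
```

A term present in all documents (here "covid") gets weight zero, while rarer terms such as "aman" carry positive weight; KNN then classifies tweets by distance between these weight vectors.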
https://journal.unnes.ac.id/sju/rji/article/view/74267 | Mon, 30 Sep 2024 00:00:00 +0700
Implementation of Raita Algorithm in Manado-Indonesia Translation Application with Text Suggestion Using Levenshtein Distance Algorithm
https://journal.unnes.ac.id/sju/rji/article/view/73651
<p><strong>Abstract. </strong>Manado City is a multidimensional and multicultural city, possessing assets considered highly promising for development into tourism attractions. The tourism assets currently being developed by the Manado City government are cultural tourism, as they hold charm and allure for tourists. Hence, a communication tool in the form of a translation application is necessary to facilitate communication between visiting tourists and the native community of North Sulawesi, as well as newcomers who intend to reside there, given that the Manado language serves as the primary means of communication within the community. This research employs a combination of the Raita algorithm and the Levenshtein distance algorithm, along with the confusion matrix method to calculate the accuracy of translation results using the Levenshtein distance algorithm with a text suggestion feature. The research begins by collecting a dataset consisting of Manado language vocabulary and its translations in the Indonesian language, sourced from literature studies and original respondents from North Sulawesi, validated by a validator to prevent translation data errors. The subsequent stage involves preprocessing the dataset: converting the entire content of the dataset to lowercase using case folding and removing spaces at the start and end of texts using the trim function. Next, both algorithms are implemented, with the Raita algorithm serving for translation and the Levenshtein distance algorithm providing text suggestions for typing errors during the translation process. 
The accuracy results derived from the confusion matrix calculations during the translation of 100 vocabulary words, accounting for typing errors, indicate that the Levenshtein distance algorithm can effectively translate vocabulary accurately and correctly even in the presence of typing errors, with a high accuracy rate of 94.17%.</p> <p><strong>Purpose: </strong>To determine the implementation of the Levenshtein distance and Raita algorithms in the Manado-Indonesian translation application, as well as the resulting accuracy level.</p> <p><strong>Methods/Study design/approach: </strong>In this study, a combination of the Raita and Levenshtein distance algorithms is utilized in the translation application system, along with the confusion matrix method to calculate accuracy.</p> <p><strong>Result/Findings: </strong>The accuracy achieved in the translation process using text suggestions from the Levenshtein distance algorithm is 94.17%.</p> <p><strong>Novelty/Originality/Value: </strong>This research demonstrates that the combination of the Raita and Levenshtein distance algorithms yields optimal results in the vocabulary translation process and provides accurate outcomes from the use of effective text suggestions. This is attributed to the fact that nearly all the data used was successfully translated by the system, even in the presence of typographical errors.</p>Novanka Agnes Sekartaji, Riza Arifudin
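The Levenshtein distance that drives the text-suggestion feature is a standard dynamic program; a minimal pure-Python sketch, with a hypothetical `suggest` helper standing in for the application's suggestion step (the Manado vocabulary below is a toy example, not the paper's dataset):

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming: minimum number of insertions,
    deletions, and substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def suggest(word, vocabulary, max_dist=2):
    """Text-suggestion sketch: vocabulary entries within max_dist edits,
    closest first."""
    return sorted((v for v in vocabulary if levenshtein(word, v) <= max_dist),
                  key=lambda v: levenshtein(word, v))
```

A mistyped query is thus matched to the nearest dictionary entries before the Raita algorithm performs the exact-match lookup for translation.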
https://journal.unnes.ac.id/sju/rji/article/view/73651 | Mon, 30 Sep 2024 15:30:23 +0700
Optimizing Random Forest for Predicting Thoracic Surgery Success in Lung Cancer Using Recursive Feature Elimination and GridSearchCV
https://journal.unnes.ac.id/sju/rji/article/view/73154
<p><strong>Abstract. </strong>Lung cancer is one of the deadliest forms of cancer, claiming numerous lives annually. Thoracic surgery is a strategy to manage lung cancer patients; however, it poses high risks, including potential nerve damage and fatal complications leading to mortality. Predicting the success rate of thoracic surgery for lung cancer patients can be accomplished using data mining techniques based on classification principles. Medical data mining involves employing mathematical, statistical, and computational methods. In this study, the prediction of thoracic surgery success employs the random forest algorithm with recursive feature elimination for feature selection. The feature selection process yields the top 8 features: 'DGN', 'PRE4', 'PRE5', 'PRE6', 'PRE10', 'PRE14', 'PRE30', and 'AGE'. Hyperparameter tuning using GridSearchCV is then applied to enhance classification accuracy. The results of this method demonstrate a predictive accuracy of 91.41%.</p> <p><strong>Purpose: </strong>The study aims to develop and evaluate a Random Forest model with Recursive Feature Elimination feature selection and hyperparameter tuning via GridSearchCV for predicting the thoracic surgery success rate.</p> <p><strong>Methods: </strong>This study uses the thoracic surgery dataset and applies various data preprocessing techniques. The dataset is then classified using the Random Forest algorithm with Recursive Feature Elimination feature selection to obtain the best features. GridSearchCV is used in this study for hyperparameter tuning.</p> <p><strong>Result: </strong>The Random Forest algorithm with Recursive Feature Elimination feature selection and hyperparameter tuning via GridSearchCV resulted in an accuracy of 91.41%. 
The accuracy was obtained with the following parameter values: bootstrap set to false, criterion set to gini, n_estimators set to 100, max_depth set to none, min_samples_split set to 4, min_samples_leaf set to 1, max_features set to auto, n_jobs set to -1, and verbose set to 2, with 10-fold cross-validation.</p> <p><strong>Novelty: </strong>In this study, a comparison and analysis of various dataset preprocessing methods and model configurations are conducted to find the best model for predicting the success rate of thoracic surgery. The study also employs feature selection to choose the best features for the classification process, as well as hyperparameter tuning to achieve optimal accuracy and discover the optimal values for these hyperparameters.</p>Deonisius Germandy Cahaya Putra, Anggyi Trisnawan Putra
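The essence of GridSearchCV is an exhaustive sweep over every combination in a parameter grid, keeping the combination with the best cross-validated score. A dependency-free sketch of that loop (the `grid_search` function and the toy scoring function are illustrative stand-ins, not scikit-learn itself):

```python
from itertools import product

def grid_search(evaluate, param_grid):
    """Sketch of exhaustive grid search: try every parameter combination
    and keep the one with the highest score."""
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)  # in real use: mean k-fold CV accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy grid over a few of the random forest parameters named in the abstract;
# the scoring lambda stands in for 10-fold cross-validated accuracy.
grid = {"n_estimators": [50, 100],
        "min_samples_split": [2, 4],
        "min_samples_leaf": [1, 2]}
score = lambda p: ((p["n_estimators"] == 100)
                   + (p["min_samples_split"] == 4)
                   + (p["min_samples_leaf"] == 1))
params, s = grid_search(score, grid)
```

The grid's cost grows multiplicatively with each parameter added (here 2 × 2 × 2 = 8 fits), which is why the study restricts tuning to a handful of parameters.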
https://journal.unnes.ac.id/sju/rji/article/view/73154 | Mon, 30 Sep 2024 00:00:00 +0700
Sentiment Analysis on Twitter Social Media Regarding Covid-19 Vaccination with Naive Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT)
https://journal.unnes.ac.id/sju/rji/article/view/67502
<p><strong>Abstract. </strong>The Covid-19 vaccine is an important tool to stop the Covid-19 pandemic; however, there are pros and cons from the public regarding it.</p> <p><strong>Purpose: </strong>These responses were conveyed by the public in many ways, one of which is through social media such as Twitter. Responses given by the public regarding the Covid-19 vaccination can be analyzed and categorized into responses with positive, neutral, or negative sentiments.</p> <p><strong>Methods: </strong>In this study, sentiment analysis was carried out on Covid-19 vaccination data originating from Twitter using the Naïve Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT) algorithms. The data used in this study is public tweet data regarding the Covid-19 vaccination, totaling 29,447 tweets in English.</p> <p><strong>Result: </strong>Sentiment analysis begins with data preprocessing on the dataset for data normalization and data cleaning before classification. Word vectorization was then performed with TF-IDF, and data classification was performed using the Naïve Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT) algorithms. From the classification results, an accuracy of 73% was obtained for the Naïve Bayes Classifier (NBC) algorithm and 83% for the Bidirectional Encoder Representations from Transformers (BERT) algorithm.</p> <p><strong>Novelty: </strong>A direct comparison between classical models such as NBC and modern deep learning models such as BERT offers new insights into the advantages and disadvantages of both approaches in processing Twitter data. Additionally, this study proposes temporal sentiment analysis, which allows evaluating changes in public sentiment regarding vaccination over time. 
Another innovation is the implementation of a hybrid approach to data cleansing that combines traditional methods with the natural language processing capabilities of BERT, which more effectively addresses typical Twitter data issues such as slang and spelling errors. Finally, this research also expands sentiment classification to be multi-label, identifying more specific sentiment categories such as trust, fear, or doubt, which provides a deeper understanding of public opinion.</p>Angga Riski Dwi Saputra, Budi Prasetiyo
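The feed entry does not include the paper's code; the NBC side of the comparison can be sketched as a multinomial Naive Bayes with add-one smoothing in pure Python (the `train_nb` function and the toy tweets are illustrative; the BERT side is omitted as it requires a pretrained model):

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial Naive Bayes sketch with Laplace (add-one) smoothing.
    Returns a predict function for tokenized documents."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}       # per-class token counts
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    vocab = {t for doc in docs for t in doc}

    def predict(doc):
        def log_post(c):
            total = sum(counts[c].values())
            # log P(c) + sum over tokens of log P(t | c), smoothed
            return math.log(prior[c]) + sum(
                math.log((counts[c][t] + 1) / (total + len(vocab))) for t in doc)
        return max(classes, key=log_post)

    return predict

# Toy tokenized tweets standing in for the 29,447-tweet dataset.
docs = [["good", "vaccine", "safe"], ["great", "safe"],
        ["bad", "side", "effect"], ["bad", "fear"]]
labels = ["pos", "pos", "neg", "neg"]
predict = train_nb(docs, labels)
```

In the study, these token counts would come from the TF-IDF/preprocessing pipeline, and the resulting predictions would be compared against BERT's over the same test split.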
https://journal.unnes.ac.id/sju/rji/article/view/67502 | Mon, 30 Sep 2024 15:36:47 +0700
Development of Digital Forensic Framework for Anti-Forensic and Profiling Using Open Source Intelligence in Cyber Crime Investigation
https://journal.unnes.ac.id/sju/rji/article/view/73731
<p><strong>Abstract. </strong>Cybercrime increases every year, and its development exploits mobile devices such as smartphones. A scientific discipline is therefore needed to study and handle cybercrime activities. Digital forensics is one such discipline; one of its branches is mobile forensics, which studies forensic processes on mobile devices. However, cybercriminals also apply various techniques to thwart the forensic investigation process. These techniques are called anti-forensics.</p> <p><strong>Purpose: </strong>A process or framework is needed as a reference for handling cybercrime cases in the forensic process. This research modifies the digital forensic investigation process to handle anti-forensics and to enrich the information extracted from digital evidence.</p> <p><strong>Methods/Study design/approach: </strong>This research modifies the digital forensic investigation process. The stages of the digital forensic investigation consist of preparation, preservation, acquisition, examination, analysis, reporting, and presentation. Open Source Intelligence (OSINT) and toolset centralization are added at the analysis stage to handle anti-forensics and add information from the digital evidence obtained in the previous stages. 
By testing the scenario data, results are obtained in the form of additional information processed from the recovered files and information related to usernames.</p> <p><strong>Result/Findings: </strong>The result is a digital forensic framework whose analysis phase focuses on anti-forensic identification in media files and utilizes OSINT to profile crime suspects based on the evidence collected during the digital forensic investigation.</p> <p><strong>Novelty/Originality/Value: </strong>Three new findings in the form of string data, one of which is a link, and seven new findings in the form of usernames were obtained that were not found using digital forensic tools alone. Relative to the 408 initial data points, the 10 new findings represent an increase of 2.45%.</p>Muhamad Faishol Hakim, Alamsyah Alamsyah
https://journal.unnes.ac.id/sju/rji/article/view/73731 | Mon, 30 Sep 2024 00:00:00 +0700