Classification of Spiral and Non-Spiral Galaxies using Decision Tree Analysis and Random Forest Model: A Study on the Zoo Galaxy Dataset

Lulut Alfaris(1), Ruben Cornelius Siagian(2), Aldi Cahya Muhammad(3), Ukta Indra Nyuswantoro(4), Nazish Laeiq(5), Froilan Delute Mobo(6)


(1) Department of Marine Technology, Politeknik Kelautan dan Perikanan Pangandaran, Indonesia
(2) Department of Physics, Universitas Negeri Medan, Indonesia
(3) Department of Business Development, Radiant Utama Interinsco, Indonesia
(4) Department of Structure Engineering, Asiatek Energi Mitratama, Indonesia
(5) Department of Computer Science, Institute of Technology and Management Aligarh, India
(6) Department of Research Development and Extension, Philippine Merchant Marine Academy, Philippines

Abstract

Purpose: This research aims to build an accurate prediction model that distinguishes spiral from non-spiral galaxies in the Zoo galaxy dataset. The model is constructed with decision tree analysis and a random forest, which classify the data according to conditions found within the dataset. Its performance is evaluated with a confusion matrix, and the predicted probability of a galaxy being spiral is analyzed. The research also investigates differences in Total Power among signal types and identifies the Peak Frequency and Bandwidth values that are consistent across all signal types. The study is expected to provide insights into galaxy classification and signal characteristics in astronomy and astrophysics.
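To illustrate the confusion-matrix evaluation mentioned above, the following is a minimal sketch in plain Python. The label names ("spiral", "other"), the function names, and the six example predictions are hypothetical and are not taken from the study's data or results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="spiral"):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def summarize(cm):
    """Derive accuracy, sensitivity, and specificity from the four counts."""
    total = sum(cm.values())
    positives = cm["tp"] + cm["fn"]
    negatives = cm["tn"] + cm["fp"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "sensitivity": cm["tp"] / positives if positives else 0.0,
        "specificity": cm["tn"] / negatives if negatives else 0.0,
    }


# Hypothetical labels, for illustration only (not the study's data).
y_true = ["spiral", "spiral", "other", "other", "spiral", "other"]
y_pred = ["spiral", "other", "other", "spiral", "spiral", "other"]

cm = confusion_matrix(y_true, y_pred)
metrics = summarize(cm)
print(cm)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy helps when spiral and non-spiral galaxies are not equally frequent, since accuracy alone can hide poor performance on the rarer class.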

Methods: This study used decision tree analysis to build a predictive model for identifying spiral galaxies in the Zoo galaxy dataset. The data were analyzed before the prediction model was constructed. Because the study did not involve random sampling, it is observational. Decision tree analysis was used to partition the galaxies into homogeneous groups, and a random forest model was used to classify galaxy types. The work shows how decision tree analysis can aid the understanding of galaxy classification and can serve as a foundation for future research. Its conclusions could be strengthened by combining the analysis with other approaches, such as experiments or random sampling.
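The general idea behind the two methods can be sketched in plain Python as follows. This is a deliberately simplified illustration, not the study's implementation: each "tree" is a depth-one stump chosen by Gini impurity, the forest is a majority vote over stumps trained on bootstrap samples with one randomly chosen feature each, and the two numeric features and all data values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimizes weighted Gini impurity."""
    best = None
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        thr = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= thr]
        right = [y for row, y in zip(rows, labels) if row[feature] > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, thr, majority(left), majority(right))
    if best is None:  # feature is constant in this sample: always predict the majority
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature, best[1], best[2], best[3])


def predict_stump(stump, row):
    feature, thr, left_label, right_label = stump
    return left_label if row[feature] <= thr else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical two-feature data, for illustration only (not the study's data).
rows = [(0.80, 0.70), (0.85, 0.75), (0.90, 0.90), (0.82, 0.80),
        (0.10, 0.20), (0.15, 0.30), (0.20, 0.25), (0.12, 0.10)]
labels = ["spiral"] * 4 + ["other"] * 4

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.95, 0.90)))
print(predict_forest(forest, (0.05, 0.10)))
```

A full decision tree would apply the same impurity-minimizing split recursively within each group, which is what produces the progressively more homogeneous groups described above.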

Results: This study developed a predictive model that classifies galaxies by spiral type using decision tree analysis on the Zoo galaxy dataset. The model divided the data into groups according to specific conditions, and the random forest model categorized galaxy types with exceptional accuracy. The study also examined the signal types in galaxies and found that Total Power varied among them, whereas Peak Frequency and Bandwidth were both 2 for all signals. These findings provide insights into galaxy classification and signal characteristics that could have practical applications in communication, signal processing, and analysis. Using decision tree analysis and random forest models together for galaxy classification and signal analysis represents an innovative approach in this field.
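The abstract does not define how Total Power, Peak Frequency, and Bandwidth were computed. As a hedged sketch only, the following plain-Python example shows one common way to obtain such quantities from a discrete Fourier transform. The definitions used here are assumptions, not the study's: total power as the mean squared amplitude, peak frequency as the strongest positive-frequency bin, and bandwidth as the number of bins at or above half the peak power. The test signal is also hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def signal_features(x):
    """Return (total_power, peak_bin, bandwidth) under the assumed definitions."""
    n = len(x)
    # Normalized so that, by Parseval's theorem, the sum equals the mean of x squared.
    power = [abs(c) ** 2 / n ** 2 for c in dft(x)]
    total_power = sum(power)
    positive = power[1 : n // 2 + 1]  # bins 1..n/2, skipping the DC term
    peak_bin = 1 + max(range(len(positive)), key=positive.__getitem__)
    # Assumed bandwidth: count of positive-frequency bins within half the peak power.
    bandwidth = sum(1 for p in positive if p >= 0.5 * power[peak_bin])
    return total_power, peak_bin, bandwidth


# Hypothetical test signal: a sine wave with 2 cycles over 16 samples.
n_samples = 16
signal = [math.sin(2 * math.pi * 2 * t / n_samples) for t in range(n_samples)]

total_power, peak_bin, bandwidth = signal_features(signal)
print(total_power, peak_bin, bandwidth)
```

Under these assumed definitions, two signals can share the same peak bin and bandwidth while differing in total power, for example when one is simply a scaled copy of the other.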

Novelty: The novelty of this research lies in categorizing galaxy types with decision tree and random forest models. Previously, galaxy types were categorized through visual methods and telescope observations. The new approach offers a potentially more efficient way of processing galaxy image data, which may yield faster and more accurate categorization. This research also contributes to the development of signal analysis measures such as Total Power, Peak Frequency, and Bandwidth. These measures were previously used only in astronomy and astrophysics, but they have potential for wider application in communication, signal processing, and analysis.

Keywords

Galaxy classification, Decision tree analysis, Random forest model, Spiral and non-spiral galaxies, Signal characteristics


Scientific Journal of Informatics (SJI)
p-ISSN 2407-7658 | e-ISSN 2460-0040
Published By Department of Computer Science Universitas Negeri Semarang
Website: https://journal.unnes.ac.id/nju/index.php/sji

This work is licensed under a Creative Commons Attribution 4.0 International License.