
Dataset and Feature Analysis for Diabetes Mellitus Classification using Random Forest

Mustofa, Fachrul, Safriandono, Ahmad Nuruddin, Muslikh, Ahmad Rofiqul ORCID: https://orcid.org/0009-0000-2457-6803 and Setiadi, De Rosal Ignatius Moses (2023) Dataset and Feature Analysis for Diabetes Mellitus Classification using Random Forest. Journal of Computing Theories and Applications (JCTA), 1 (1). pp. 41-49. ISSN 3024-9104


Abstract

Diabetes mellitus is a dangerous disease, and according to the World Health Organization (WHO), diabetes will be one of the main causes of death by 2030. One of the most popular diabetes datasets is PIMA Indians, which has been widely tested with various machine learning (ML) methods and even deep learning (DL). On average, however, ML methods have not produced good accuracy on it. The quality of the dataset and its features is the most influential factor here, so a deeper investigation of this dataset is needed. This research analyzes and compares the PIMA Indians and Abelvikas datasets using the Random Forest (RF) method. Both datasets are imbalanced; the Abelvikas dataset is more imbalanced and has a larger number of classes, so it is more complex. RF was chosen because it is one of the ML methods with the best results on various diabetes datasets. The tests produced sharply contrasting results on the two datasets: Abelvikas reached 100% for accuracy, precision, and recall, whereas PIMA Indians achieved only 75% for accuracy, 87% for precision, and 80% for the best recall. Testing was done with 3, 5, 7, 10, and 15 trees, and k-fold validation was also used to obtain valid results. These results indicate that the features in the Abelvikas dataset are much better because they are supported by more complete glucose features.
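
The abstract reports accuracy, precision, and recall under k-fold validation. As a rough illustration of how those quantities are computed, here is a minimal standard-library Python sketch. It is not the authors' code: the function names `binary_metrics` and `k_fold_indices` are hypothetical, the labels are invented toy values rather than data from either dataset, and it covers only the binary case (a multi-class dataset such as Abelvikas would additionally need per-class averaging).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Invented toy labels (assumption): 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
folds = k_fold_indices(len(y_true), 3)
```

In a k-fold run, each fold in turn serves as the test set while the remaining folds train the model, and the per-fold metrics are then averaged. On imbalanced data, contiguous folds like these can leave a class missing from a fold, so a stratified split is usually preferable in practice.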

Item Type: Article
Additional Information: Ahmad Rofikul Muslikh NIDN: 0724038903
Uncontrolled Keywords: Classification Diabetes Types; Comprehensive analysis for diabetes types classification; Prediction for health technology; Random Forest; Feature Analysis; Abelvikas Dataset.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
R Medicine > R Medicine (General)
Divisions: Fakultas Teknologi Informasi > S1 Sistem Informasi
Depositing User: Rita Juliani
Date Deposited: 13 Mar 2024 03:35
Last Modified: 13 Mar 2024 03:35
URI: https://eprints.unmer.ac.id/id/eprint/4097
