yoris95 / classification-of-obesity-status-in-indonesia-using-xgboost-and-adasyn-n-method

Classification of Obesity Status in Indonesia Using XGBoost & ADASYN-N Method

R 100.00%
machine-learning adasyn decision-tree feature-selection obesity rstudio xgboost boosting ensemble-machine-learning

Introduction

Obesity is a condition caused by excessive body fat that can endanger health. Risk factors for obesity in adult women include marital status, household income, area of residence, physical activity, and energy and carbohydrate intake. Other contributing factors include genetics, psychological factors, an unhealthy lifestyle, poor eating habits, and stress. The growing availability of data and knowledge in the medical field has driven rapid development in this field, and powerful machine learning methods are needed to meet the pattern recognition demands of medical data, including obesity data. This study aims to determine the factors that influence obesity status in Indonesia. XGBoost (Extreme Gradient Boosting) is a widely used classification method with several advantages over classical classification methods. The Adaptive Synthetic Nominal algorithm (ADASYN-N) can be used to improve prediction accuracy on imbalanced data. Both methods are applied to the obesity data from the 2013 Indonesian Basic Health Research.

Data Source

The data used in this study are secondary data: Indonesian obesity data obtained from the 2013 Indonesian Basic Health Research. The obesity data had 722,329 observations and 12 variables. Filtering found 17,352 observations with "NA" (missing values); removing them reduced the data to 704,977 observations.
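The filtering step can be illustrated with a minimal base R sketch. The data frame and its column names below are hypothetical stand-ins, since the survey file itself is not part of this README; only the idea of dropping rows with any "NA" comes from the text above.

```r
# Toy stand-in for the survey data; column names are hypothetical.
obesity <- data.frame(
  obese  = c("yes", "no", NA, "no", "yes"),
  gender = c("female", "male", "female", NA, "male"),
  age    = c("35-54", "18-34", "35-54", "55+", "18-34"),
  stringsAsFactors = TRUE
)

n_total    <- nrow(obesity)
n_missing  <- sum(!complete.cases(obesity))  # rows with at least one NA
obesity_cc <- na.omit(obesity)               # keep complete rows only

# The counts must add up, as they do in the study: 722329 - 17352 = 704977.
stopifnot(nrow(obesity_cc) == n_total - n_missing)
```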

Algorithm

  1. Data Preparation
  • Data filtering is conducted to obtain complete, ready-to-use data for the research. The variables used are selected based on the relevant literature and consist of categorical data.
  2. Data Partition
  • Data is divided into training data and testing data in proportions of 80% and 20%.
  3. ADASYN Method
  • Balancing the imbalanced data using ADASYN-N.
  4. Modelling Stage
  • Using XGBoost to build the model.
  5. Model Performance Evaluation
  • Evaluating the model by calculating accuracy, sensitivity, specificity, and AUC.
  6. Feature Importance
  • Ranking the features using the best model obtained after comparing the XGBoost model with ADASYN-N and the XGBoost model without ADASYN-N.
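The partition and balancing steps above can be sketched in base R. This is a simplified illustration, not the study's implementation: the README does not specify its ADASYN-N variant, so the sketch assumes Hamming distance between categorical rows and draws each synthetic feature value from a minority row and its nearest minority neighbours. The function `adasyn_nominal`, the toy data, and all column names are hypothetical.

```r
set.seed(1)

# Toy categorical data (hypothetical columns); class "yes" is the minority.
n <- 200
dat <- data.frame(
  gender   = factor(sample(c("female", "male"), n, replace = TRUE)),
  age      = factor(sample(c("18-34", "35-54", "55+"), n, replace = TRUE)),
  activity = factor(sample(c("light", "strenuous"), n, replace = TRUE)),
  obese    = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.85, 0.15)))
)

# Stratified 80/20 split, so both classes appear in train and test.
train_idx <- unlist(lapply(split(seq_len(n), dat$obese), function(i) {
  i[sample.int(length(i), floor(0.8 * length(i)))]
}), use.names = FALSE)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Simplified ADASYN-style oversampling for nominal features (assumption:
# Hamming distance; the study's exact ADASYN-N variant may differ).
adasyn_nominal <- function(X, y, minority, k = 5) {
  Xc <- as.matrix(data.frame(lapply(X, as.character), stringsAsFactors = FALSE))
  tX <- t(Xc)
  is_min  <- y == minority
  min_idx <- which(is_min)
  G <- sum(!is_min) - sum(is_min)          # synthetic rows needed for balance
  if (G <= 0) return(list(X = X, y = y))
  hamming <- function(i) colSums(tX != Xc[i, ])  # distance from row i to all rows

  # r: share of majority rows among the k nearest neighbours of each minority row
  r <- vapply(min_idx, function(i) {
    d <- hamming(i)
    d[i] <- Inf                            # exclude the row itself
    mean(!is_min[order(d)[seq_len(k)]])
  }, numeric(1))
  if (sum(r) == 0) r[] <- 1                # no hard cases: spread evenly
  g <- round(r / sum(r) * G)               # synthetic rows per minority row

  synth <- lapply(seq_along(min_idx), function(s) {
    if (g[s] == 0) return(NULL)
    d    <- hamming(min_idx[s])[min_idx]
    pool <- min_idx[order(d)[seq_len(min(k + 1, length(min_idx)))]]
    # Draw each feature from the pool, so only observed categories appear.
    cols <- lapply(seq_len(ncol(Xc)), function(j) sample(Xc[pool, j], g[s], replace = TRUE))
    matrix(unlist(cols), nrow = g[s])
  })
  S <- do.call(rbind, synth)
  if (is.null(S)) return(list(X = X, y = y))
  S <- as.data.frame(S, stringsAsFactors = FALSE)
  names(S) <- names(X)
  for (j in names(X)) S[[j]] <- factor(S[[j]], levels = levels(X[[j]]))
  out <- rbind(X, S)
  rownames(out) <- NULL
  list(X = out,
       y = factor(c(as.character(y), rep(minority, nrow(S))), levels = levels(y)))
}

bal <- adasyn_nominal(train[, c("gender", "age", "activity")], train$obese, minority = "yes")
table(train$obese)   # imbalanced class counts before balancing
table(bal$y)         # roughly balanced after balancing (rounding leaves a small gap)
```

Only the training data is balanced here; the test set keeps its original class ratio so that evaluation reflects the real distribution. The pairwise distance computation is quadratic in the number of rows, so a data set of the size described above would need a more efficient neighbour search.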

Result

Model Comparison

Based on the model comparison, the best model obtained is XGBoost with ADASYN-N. Feature importance is then computed from this best model.
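The four metrics used for this comparison can be computed in base R as in the sketch below. The labels and predicted probabilities are made up for illustration and are not the study's results; the helper `evaluate` and the 0.5 threshold are assumptions.

```r
# Hypothetical test-set labels (1 = obese) and predicted probabilities.
actual <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0)
prob   <- c(0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3, 0.2, 0.5)

evaluate <- function(actual, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & actual == 1)
  tn <- sum(pred == 0 & actual == 0)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  # AUC via the rank-sum (Mann-Whitney) identity; ties get average ranks.
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  auc <- (sum(rank(prob)[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  c(accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),   # share of obese cases correctly detected
    specificity = tn / (tn + fp),   # share of non-obese cases correctly detected
    auc         = auc)
}

metrics <- evaluate(actual, prob)
print(round(metrics, 3))
```

Running `evaluate` once on the predictions of the model trained with ADASYN-N and once on those of the model trained without it gives the two rows of such a comparison. On imbalanced data, sensitivity and AUC are usually more informative than accuracy alone.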

Feature Importance

Based on the XGBoost model with ADASYN-N, the most important factor influencing obesity status in the 2013 Basic Health Research data is gender (female), meaning that gender (X1) reduces heterogeneity the most. Other influential factors are age 35-54 years (X2), strenuous activity (X4), and eating vegetables for 6 days (X7).
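A ranking of this kind can be produced with the `xgboost` R package, as in the minimal sketch below. It assumes the ranking is based on the Gain measure, which matches the description of reducing heterogeneity but is not confirmed by the README. The data are synthetic, with an outcome deliberately driven by gender so that the ranking is visible; the column names and hyperparameters are hypothetical.

```r
library(xgboost)
set.seed(42)

# Synthetic categorical predictors (hypothetical names and levels).
n <- 500
X <- data.frame(
  gender   = factor(sample(c("female", "male"), n, replace = TRUE)),
  age      = factor(sample(c("18-34", "35-54", "55+"), n, replace = TRUE)),
  activity = factor(sample(c("light", "strenuous"), n, replace = TRUE))
)
# Synthetic outcome driven mostly by gender, purely for illustration.
p <- plogis(-1.5 + 2 * (X$gender == "female") + 0.5 * (X$age == "35-54"))
y <- rbinom(n, 1, p)

# Dummy-code the factors; XGBoost expects a numeric matrix.
M <- model.matrix(~ . - 1, data = X)
colnames(M) <- make.names(colnames(M))

dtrain <- xgb.DMatrix(data = M, label = y)
bst <- xgb.train(
  params  = list(objective = "binary:logistic", max_depth = 3, eta = 0.3),
  data    = dtrain,
  nrounds = 20
)

# Gain: each feature's share of the total loss reduction across all splits.
imp <- xgb.importance(model = bst)
print(imp)
```

Because the factors are dummy-coded, importance is reported per category level (for example one column for the female level) rather than per original variable, which is consistent with the result above naming a specific level such as gender (female).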
