This GitHub repository contains a project for predicting obesity levels from factors such as dietary habits, physical activity, and genetic predisposition. It uses machine learning models including RandomForestClassifier, XGBoost, and LightGBM, and covers data retrieval from Kaggle, preprocessing, feature engineering, and model evaluation in order to explore and predict obesity risk factors.
Contains scripts and Jupyter notebooks for data retrieval, preprocessing, analysis, and visualization. This includes code for setting up environment variables, downloading data with the Kaggle API, cleaning the data, performing exploratory data analysis, and training machine learning models.
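As a rough illustration of the environment-variable and download steps, here is a minimal sketch, not the repository's actual script. It assumes the official `kaggle` Python package is installed and that credentials are supplied through the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. The helper names, the `data/raw` target directory, and the dataset slug passed by the caller are hypothetical.

```python
import os
from pathlib import Path


def missing_kaggle_credentials(env=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in ("KAGGLE_USERNAME", "KAGGLE_KEY") if not env.get(name)]


def download_dataset(dataset_slug, target_dir="data/raw"):
    """Download and unzip a Kaggle dataset into target_dir (hypothetical layout)."""
    missing = missing_kaggle_credentials()
    if missing:
        raise RuntimeError(
            "Set these environment variables first: " + ", ".join(missing)
        )

    # Imported lazily so the credential check above runs before the
    # kaggle package is loaded (importing it can trigger authentication).
    from kaggle.api.kaggle_api_extended import KaggleApi

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    api = KaggleApi()
    api.authenticate()
    # dataset_slug has the form "owner/dataset-name"; the real slug is not
    # stated in this README, so the caller must supply it.
    api.dataset_download_files(dataset_slug, path=str(target), unzip=True)
    return target
```

If the data comes from a Kaggle competition rather than a dataset page, `competition_download_files` would be the call to use instead of `dataset_download_files`.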
Holds the raw and processed datasets along with a detailed explanation of the data cleaning, preprocessing steps, and feature engineering techniques applied. A README within this folder provides insights into the data query process and the structure of the dataset.
Documents the methodology used for data analysis: the rationale for choosing specific machine learning models, how the models are trained and evaluated, and how the results are interpreted. The goal is to make the analytical approach and the statistical techniques used in the project clear.
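To make the training and evaluation step concrete, the sketch below scores a `RandomForestClassifier` with stratified k-fold cross-validation. It is an assumption-laden illustration, not the project's actual pipeline: the evaluation strategy (5-fold accuracy), the hyperparameters, and the function name are chosen for the example, and synthetic data from `make_classification` stands in for the obesity dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def evaluate_random_forest(X, y, n_splits=5, seed=42):
    """Return per-fold accuracy for a random forest (illustrative settings)."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Stratified folds keep the class proportions similar in every split,
    # which matters when some obesity levels are rarer than others.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


# Synthetic multi-class data used only as a placeholder for the real features.
X_demo, y_demo = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=6,
    n_classes=4,
    random_state=0,
)
demo_scores = evaluate_random_forest(X_demo, y_demo)
print("Fold accuracies:", demo_scores)
print("Mean accuracy:", demo_scores.mean())
```

The same function could be adapted to compare models by passing in a different estimator, for example the scikit-learn compatible `XGBClassifier` or `LGBMClassifier`, provided those packages are installed.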
Zhe Niu
Zhe Niu is pursuing a Bachelor of Science in Data Science at Duke Kunshan University, with graduation expected in June 2024. He has a strong background in finance and AI, with experience in both research and practical applications of data science in the financial industry.