This project investigates various models for predicting short-term (24 hours in advance) air pollution in the three towns.
Python
Pandas, NumPy, Jupyter
xgboost
sklearn
Prophet
Data cleaning, machine learning, regression, neural networks, predictive modelling
Data processing/cleaning
Processing data to generate autoregressive features
Data exploration and visualization
Statistical modeling
(Auto)correlation analysis of model features and feature selection
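The autoregressive feature step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name `make_autoregressive_features`, the number of lags, the hourly sampling, and the synthetic `pollutant` series are all assumptions made for the example. It builds lagged copies of a series as predictors, pairs them with the value 24 steps ahead as the target, and checks the autocorrelation at a 24-hour lag.

```python
import numpy as np
import pandas as pd


def make_autoregressive_features(series, n_lags=3, horizon=24):
    """Build lagged predictors and a target `horizon` steps ahead (hypothetical helper)."""
    frame = pd.DataFrame({"target": series.shift(-horizon)})
    for lag in range(n_lags):
        # lag_0 is the most recent observation, lag_1 the one before it, and so on.
        frame[f"lag_{lag}"] = series.shift(lag)
    # Rows at either end lack a full set of lags or a target, so drop them.
    return frame.dropna()


# Synthetic series with a 24-step cycle; each row is assumed to be one hour.
t = np.arange(240)
pollutant = pd.Series(10 + 5 * np.sin(2 * np.pi * t / 24), name="pollutant")

features = make_autoregressive_features(pollutant, n_lags=3, horizon=24)

# Autocorrelation at a 24-hour lag, one input to feature selection.
daily_autocorr = pollutant.autocorr(lag=24)

print(features.head())
print("autocorrelation at lag 24:", daily_autocorr)
```

Lags with high autocorrelation are natural candidates to keep as features; the lags that matter for real pollution data would have to be determined from the data itself.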
A regression model is a statistical procedure for estimating the relationship between two or more variables; in its simplest form the relationship is linear. The variables fall into two groups: independent variables (the predictors) and a dependent variable (the response). Regression is used for forecasting, time series modeling, and finding relationships between variables, which makes it an important tool in data analysis. Its main benefits are: i. It indicates whether the relationships between the dependent variable and the independent variables are significant. ii. It indicates the strength of the impact of multiple independent variables on the dependent variable. Regression analysis comprises many prediction techniques, distinguished mainly by three characteristics: the number of independent variables, the type of the dependent variable, and the shape of the regression line.
A linear relationship is a straight-line relationship between X and Y: Y = a + bX, where a is the Y-intercept and b is the slope (gradient) of the line.
Random Forests regression (RFR)
Random Forests regression (RFR) was introduced by Breiman [1] to improve the robustness of regression trees. An RFR model combines tree predictors, where each tree depends on the values of a random vector sampled independently; the forest's prediction is the average of the individual tree predictions.
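Both the straight-line model Y = a + bX and a random forest can be fitted with sklearn. The sketch below uses synthetic data generated with a known intercept of 2 and slope of 3; the data, the number of trees, and the variable names are assumptions for illustration and do not reflect the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data: Y = 2 + 3X plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(0, 0.5, size=200)

# Linear regression recovers the intercept a and the slope b.
linear = LinearRegression().fit(X, y)
a, b = linear.intercept_, linear.coef_[0]

# Random forest: each tree is fitted on a bootstrap sample of the data,
# and the forest averages the tree predictions.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
forest_pred = forest.predict([[5.0]])[0]

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"forest prediction at X = 5: {forest_pred:.2f}")
```

On data that really is linear, the linear model is the more economical choice; the forest becomes useful when the relationship between predictors and target is nonlinear or involves interactions.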
Gaussian process regression (GPR) is an efficient nonlinear modeling method that forecasts and interprets data using covariance functions derived from base kernels.
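As a rough illustration of a covariance function built from base kernels, the sketch below sums an RBF kernel (smooth signal) and a WhiteKernel (observation noise) in sklearn. The kernel choice, the initial hyperparameters, and the synthetic sine data are assumptions for the example, not the configuration used in this project.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic noisy sine data (illustrative only).
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 40).reshape(-1, 1)
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=40)

# Composite covariance function: smooth component plus noise component.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

# Kernel hyperparameters are tuned during fit by maximizing the marginal likelihood.
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)

# Predictions come with a standard deviation, which quantifies uncertainty.
X_new = np.array([[2.5], [7.5]])
mean, std = gpr.predict(X_new, return_std=True)

print("fitted kernel:", gpr.kernel_)
print("mean:", mean, "std:", std)
```

The fitted kernel's hyperparameters (length scale and noise level) are what make the model interpretable: they describe how quickly the signal varies and how much of the variance is attributed to noise.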