This project focuses on predicting loan approval outcomes using machine learning. The dataset consists of various features such as applicant income, coapplicant income, loan amount, gender, marital status, dependents, education, self-employment status, property area, credit history, and loan amount term.
The following data preprocessing steps have been applied to prepare the data for machine learning models:
- Handling Dependents Column:
  - Replaced '3+' in the 'Dependents' column with the numeric value 3.
- Removing Columns:
  - Dropped the 'Loan_ID' column from the dataset.
- Categorical Columns:
  - Identified and printed the categorical columns in the dataset.
  - Used these categorical columns for ordinal encoding.
- Numerical Columns:
  - Identified and printed the numerical columns in the dataset.
- Train-Test Split:
  - Split the data into training and testing sets using the `train_test_split` function.
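
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the column names other than `Loan_ID` and `Dependents` (including the `Loan_Status` target) are assumed, and the `test_size` and `random_state` values are placeholders.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny made-up sample standing in for the real loan dataset (illustrative values only).
df = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004", "LP005", "LP006"],
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Dependents": ["0", "1", "3+", "2", "0", "3+"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417],
    "LoanAmount": [128.0, 128.0, 66.0, 120.0, 141.0, 267.0],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y", "N"],  # assumed target column name
})

# Replace '3+' with 3 and convert the column to a numeric dtype.
df["Dependents"] = pd.to_numeric(df["Dependents"].replace("3+", "3"))

# Drop the identifier column, which carries no predictive signal.
df = df.drop(columns=["Loan_ID"])

# Separate features from the target.
X = df.drop(columns=["Loan_Status"])
y = df["Loan_Status"]

# Identify categorical (non-numeric) and numerical columns.
categorical_cols = X.select_dtypes(exclude=["number"]).columns.tolist()
numerical_cols = X.select_dtypes(include=["number"]).columns.tolist()
print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)

# Hold out part of the data for testing (split ratio and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Selecting columns by "numeric vs. non-numeric" rather than by a specific dtype name keeps the sketch independent of how a given pandas version stores string columns.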
Three machine learning models have been implemented for loan approval prediction:
- Logistic Regression:
  - Utilized a logistic regression model with balanced class weights.
  - Implemented a pipeline with ordinal encoding.
- Naive Bayes:
  - Implemented a Gaussian Naive Bayes model.
  - Utilized ordinal encoding in a pipeline.
- Decision Tree Classifier:
  - Implemented a decision tree classifier with specified parameters (random_state, max_depth, min_samples_split).
  - Used ordinal encoding in a pipeline.
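
One way the three pipelines might be assembled with scikit-learn is sketched below. The toy data, the column names, the helper `build_pipeline`, and all hyperparameter values (`max_iter`, `random_state`, `max_depth`, `min_samples_split`) are assumptions made for illustration; the project's actual settings are not stated here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up rows standing in for the preprocessed training data.
X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male", "Female", "Male"],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban",
                      "Rural", "Semiurban", "Urban", "Rural"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036],
    "Credit_History": [1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0],
})
y = pd.Series(["Y", "N", "Y", "Y", "Y", "N", "Y", "N"], name="Loan_Status")

categorical_cols = ["Gender", "Property_Area"]


def build_pipeline(model):
    """Hypothetical helper: ordinal-encode categorical columns, then fit `model`."""
    encoder = ColumnTransformer(
        transformers=[("ordinal", OrdinalEncoder(), categorical_cols)],
        remainder="passthrough",  # numerical columns pass through unchanged
    )
    return Pipeline(steps=[("encode", encoder), ("model", model)])


# Hyperparameter values below are placeholders, not the project's actual choices.
models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree Classifier": DecisionTreeClassifier(
        random_state=42, max_depth=3, min_samples_split=2
    ),
}

# Fit each pipeline and collect its predictions on the same toy data.
pipelines = {name: build_pipeline(model).fit(X, y) for name, model in models.items()}
predictions = {name: pipe.predict(X) for name, pipe in pipelines.items()}
```

Wrapping the encoder and the model in a single `Pipeline` means the encoding learned on the training data is reapplied automatically at prediction time.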
Each model was evaluated using the following metrics:
- F1 Score:
  - Logistic Regression: 0.8132
  - Naive Bayes: 0.8718
  - Decision Tree Classifier: 0.8377
- Accuracy:
  - Logistic Regression: 0.7236
  - Naive Bayes: 0.7967
  - Decision Tree Classifier: 0.7480
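
For reference, both metrics can be computed with `sklearn.metrics`. The labels below are a small made-up example (not the project's predictions), and encoding "approved" as 1 is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels: 1 = loan approved, 0 = not approved (encoding is an assumption).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

# F1 is the harmonic mean of precision and recall for the positive class.
f1 = f1_score(y_true, y_pred)

# Accuracy is the fraction of predictions that match the true labels.
accuracy = accuracy_score(y_true, y_pred)

print(f"F1 score: {f1:.4f}")      # 4 TP, 1 FP, 1 FN -> precision = recall = 0.8 -> F1 = 0.8000
print(f"Accuracy: {accuracy:.4f}")  # 6 of 8 correct -> 0.7500
```

If the target is kept as string labels such as 'Y'/'N', `f1_score` needs the positive class passed explicitly via its `pos_label` parameter.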
The three models have been trained and evaluated for loan approval prediction. Naive Bayes achieved both the highest F1 score (0.8718) and the highest accuracy (0.7967). Because the F1 score is the harmonic mean of precision and recall, this suggests Naive Bayes balances the two better than the other models on this dataset. The models can be further fine-tuned, and additional features can be explored to improve predictive accuracy.