ecx-4.0-21-days-data-science-challenge's Introduction

ECX 4.0 21 Days Data Science Challenge: Iris Classification

This is a 21-day challenge organized by the Engineering Career Expo Unilag to test our data science skills. In this challenge, I worked with the Iris dataset and built a model that predicts the species of an Iris flower from its measurements.

Tools

The following tools were used for different areas of the project:

Python Libraries:
- Pandas: for data analysis and manipulation
- Seaborn: a high-level data visualization library built on top of matplotlib
- matplotlib: for data visualization
- Joblib: for saving the trained model for deployment
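The pandas and Joblib steps can be sketched as follows. This is a minimal example under stated assumptions, not the project's actual code: it loads the copy of Iris bundled with scikit-learn (`load_iris(as_frame=True)`) instead of a CSV file, uses a `DecisionTreeClassifier` as a stand-in model, writes to a hypothetical `iris_model.joblib` in a temporary directory, and leaves out the Seaborn/matplotlib plots.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Iris data, loaded as a pandas DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame  # 150 rows: four measurement columns plus an integer "target" column
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Basic pandas analysis: summary statistics and per-species mean measurements.
print(df.describe())
species_means = df.groupby("species")[iris.feature_names].mean()
print(species_means)

# Train a stand-in model, then persist it with joblib for later deployment.
X = df[iris.feature_names]
y = df["species"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Hypothetical file name, written to a temporary directory for this sketch.
model_path = os.path.join(tempfile.mkdtemp(), "iris_model.joblib")
joblib.dump(model, model_path)

# Reload the saved model and confirm it predicts exactly like the original.
restored = joblib.load(model_path)
assert (restored.predict(X) == model.predict(X)).all()
```

In a deployed app, the saved file would be loaded once with `joblib.load` at start-up instead of retraining the model.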
Scikit-learn (Python machine learning library):
- GridSearchCV and RandomizedSearchCV: for hyperparameter tuning
- StandardScaler: for standardization of numeric features
- LabelEncoder: for encoding our categorical labels (the species column) as integers
- RandomForestClassifier, SVC, LogisticRegression, DecisionTreeClassifier: ML algorithms for classification problems
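A minimal sketch of how these scikit-learn pieces might fit together. It assumes string species labels (as they would appear in a CSV `Species` column), and the search spaces, the 80/20 split and the `random_state` values are illustrative assumptions, not the values used in the challenge. Only SVC and RandomForestClassifier are shown; LogisticRegression and DecisionTreeClassifier plug in the same way.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

# Assumption: string labels, as they would appear in a CSV "Species" column.
iris = load_iris()
X = iris.data
species = iris.target_names[iris.target]  # e.g. "setosa", "versicolor", "virginica"

# LabelEncoder maps the sorted unique labels to integers 0..n_classes-1.
encoder = LabelEncoder()
y = encoder.fit_transform(species)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# StandardScaler inside a Pipeline is refit on each CV training fold,
# so the validation fold never influences the scaling statistics.
svc_pipe = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

# GridSearchCV evaluates every combination: 3 values of C x 2 kernels = 6 candidates.
grid = GridSearchCV(
    svc_pipe,
    param_grid={"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
    cv=5,
)
grid.fit(X_train, y_train)
print("SVC best params:", grid.best_params_, "CV accuracy:", round(grid.best_score_, 3))

# RandomizedSearchCV samples n_iter candidates from the given lists/distributions.
# Tree ensembles are insensitive to feature scale, so no scaler is used here.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": randint(20, 101),  # integers 20..100 inclusive
        "max_depth": [None, 2, 3, 4],
    },
    n_iter=4,
    cv=5,
    random_state=42,
)
rand.fit(X_train, y_train)
print("RF best params:", rand.best_params_, "CV accuracy:", round(rand.best_score_, 3))

# Decode integer predictions back to species names.
test_pred = encoder.inverse_transform(grid.predict(X_test))
print(test_pred[:5])
```

GridSearchCV is exhaustive and suits small grids; RandomizedSearchCV caps the cost at `n_iter` fits per fold, which matters once the search space grows.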
Evaluation Metrics:
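- Accuracy Score: the number of correct predictions divided by the total number of predictions
- Precision: the ratio of correctly predicted positives to all predicted positives, i.e. true positives / (true positives + false positives)
- Recall: the ratio of correctly predicted positives to all actual positives, i.e. true positives / (true positives + false negatives)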
- Classification report: a report showing the precision, recall, F1-score and support for each class
- ROC Curve: a plot of the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds
- Confusion matrix: a table of actual versus predicted classes, showing which classes the model confuses with each other
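These metrics can be computed with `sklearn.metrics`. This sketch assumes a scaled LogisticRegression as the model and a 70/30 stratified split (both illustrative choices, not necessarily the project's). It also recomputes per-class precision and recall by hand from the confusion matrix to show the definitions, and treats the three-class ROC curve one-vs-rest, since ROC is defined for binary problems.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: a scaled logistic regression stands in for the final model.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)  # one probability column per class

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Per-class precision and recall by hand from the confusion matrix.
tp = np.diag(cm)
precision_manual = tp / cm.sum(axis=0)  # TP / (TP + FP): divide by predicted counts
recall_manual = tp / cm.sum(axis=1)     # TP / (TP + FN): divide by actual counts

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Macro precision:", precision_score(y_test, y_pred, average="macro"))
print("Macro recall:", recall_score(y_test, y_pred, average="macro"))
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# One-vs-rest ROC curve for class 2 (virginica): binarize the labels first.
is_virginica = (y_test == 2).astype(int)
fpr, tpr, thresholds = roc_curve(is_virginica, y_proba[:, 2])
print("ROC AUC (one-vs-rest, macro):", roc_auc_score(y_test, y_proba, multi_class="ovr"))
```

The `fpr` and `tpr` arrays are what a ROC plot draws (for example with matplotlib's `plt.plot(fpr, tpr)`), and the confusion matrix is often shown as a Seaborn heatmap.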
Deployment: Streamlit