This challenge was organized by the Engineering Career Expo Unilag to test our data science skills over 21 days. For it, I worked with the Iris dataset and built a model that predicts the species of an iris from its measurements.
The following tools were used for different areas of the project:
- Python Libraries:
  - Pandas: for data analysis and manipulation
  - Seaborn: a library built on matplotlib that provides a high-level interface for data visualization
  - matplotlib: for data visualization
  - Joblib: for saving the model for deployment
- Scikit Learn (Python Machine Learning Library):
  - GridSearchCV and RandomizedSearchCV: for hyperparameter tuning
  - StandardScaler: for standardization of numeric features
  - LabelEncoder: for encoding the categorical species labels
  - RandomForestClassifier, SVC, LogisticRegression, DecisionTreeClassifier: ML algorithms for classification problems
- Evaluation Metrics:
  - Accuracy Score: number of correct predictions divided by the total number of predictions
  - Precision: ratio of correctly predicted positives to all predicted positives
  - Recall: ratio of correctly predicted positives to all actual positives
  - Classification report: a report showing precision, recall, and F1 score for each class
  - ROC Curve: a plot of the true positive rate (TPR) against the false positive rate (FPR)
  - Confusion matrix: a table of predicted versus actual classes for assessing the quality of the model's predictions
- Deployment:
  - Streamlit
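To show how several of these tools can fit together, here is a minimal sketch, not the exact code used in the challenge. It assumes the copy of the Iris dataset bundled with scikit-learn (`load_iris`), an `SVC` classifier, a small illustrative parameter grid, and an 80/20 train/test split; all of these choices, along with the variable names, are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

# Assumption: use the Iris data bundled with scikit-learn.
iris = load_iris()
X = iris.data
species = iris.target_names[iris.target]  # species names as strings

# LabelEncoder turns the species names into integer class labels.
encoder = LabelEncoder()
y = encoder.fit_transform(species)

# Hold out 20% of the rows for evaluation (illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize the measurements, then classify.
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

# Illustrative grid; GridSearchCV tries every combination with 5-fold CV.
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the best model on the held-out rows.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# Iris has three classes, so precision and recall are averaged per class.
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
cm = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print("Accuracy:", accuracy)
print("Macro precision:", precision)
print("Macro recall:", recall)
print(cm)
print(classification_report(y_test, y_pred, target_names=encoder.classes_))
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so the validation folds do not leak into the scaling statistics. The fitted model (`search.best_estimator_`) could then be saved with `joblib.dump` and loaded from a Streamlit app.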