mahmoudsallem / airline-passenger-satisfaction-1

This project forked from yogeshwaran-shanmuganathan/airline-passenger-satisfaction

Determining the important factors that influence customer or passenger satisfaction with an airline, using the CRISP-DM methodology in Python and RapidMiner.

License: MIT License

Jupyter Notebook 100.00%

airline-passenger-satisfaction-1's Introduction

Airline-Passenger-Satisfaction

Objective

The goal of this project is to help an airline company determine the important factors that influence customer (passenger) satisfaction.
Customer satisfaction strongly affects a company's business, so analysing and improving the factors closely related to it is important for the company's growth and reputation.

In this project, the CRISP-DM methodology is applied to derive an appropriate solution to a business problem. It is carried out in six phases: Business Understanding, Data Understanding, Data Preparation, Data Modelling, Evaluation and Deployment.

About Data

The dataset for this project is obtained from Kaggle and contains data from a survey conducted by airlines on the satisfaction level of passengers across various factors. It consists of 25 columns, including Age, Gender, Travel class, and Arrival and Departure delays, as well as features that influence customer satisfaction such as On-board service, Cleanliness, Seat comfort and Baggage handling.
A column named ‘satisfaction’ describes the overall satisfaction level of the customer. It has two values, ‘neutral or dissatisfied’ and ‘satisfied’. This column is used as the label, since it conveys the customer's overall experience based on the ratings given for the other features. The train and test sets contain 103904 and 25976 records respectively.
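As a minimal sketch of how the label could be prepared for modelling, the two ‘satisfaction’ values can be mapped to 0 and 1. The sample rows and the name `label_map` are made up for illustration and are not taken from the project's notebook; only the two label values and the record counts come from the description above.

```python
import pandas as pd

# Hypothetical sample of the 'satisfaction' column (values as described above).
labels = pd.Series(
    [
        "satisfied",
        "neutral or dissatisfied",
        "satisfied",
        "neutral or dissatisfied",
        "neutral or dissatisfied",
    ],
    name="satisfaction",
)

# Assumed encoding: 1 for 'satisfied', 0 otherwise.
label_map = {"neutral or dissatisfied": 0, "satisfied": 1}
y = labels.map(label_map)

# Share of all records that fall in the test set, from the stated counts.
train_rows, test_rows = 103904, 25976
test_share = test_rows / (train_rows + test_rows)

print(y.tolist(), round(test_share, 3))
```

With the stated record counts, the test set is one fifth of all records, i.e. an 80/20 split.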

Data Cleaning and Visualisation

Data cleaning plays a key role in the quality of a machine learning model's output. It usually involves detecting outliers and removing or imputing them, removing or replacing missing values, removing duplicate values, and removing values of little or no importance.
In this project, the ‘Arrival Delay in Minutes’ column has 310 missing values. These are imputed with the mean of the non-missing values in the same column.
Data visualisation plays an important role in understanding the data, as it gives an overview of the data before model implementation. Exploratory Data Analysis is performed on the dataset.
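The mean imputation described above could look roughly like the following in pandas. This is a sketch on a tiny made-up frame, not the project's actual notebook code; only the column name ‘Arrival Delay in Minutes’ comes from the dataset description, and the values are invented.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame; the real data would be loaded from the Kaggle files.
df = pd.DataFrame(
    {
        "Age": [25, 41, 33, 58],
        "Arrival Delay in Minutes": [0.0, np.nan, 10.0, 20.0],
    }
)

col = "Arrival Delay in Minutes"
n_missing = int(df[col].isna().sum())

# Series.mean() skips NaN by default, so this is the mean of the
# non-missing values only.
col_mean = df[col].mean()
df[col] = df[col].fillna(col_mean)

print(n_missing, col_mean)
```

If the model is evaluated on a separate test set, it is usually safer to compute the mean on the training data only and reuse it for the test data.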

Feature Selection

Correlation among the features is examined by generating a correlation map. The top ten features are selected using the Chi-Square method. The importance of the features is determined using a wrapper method and the permutation feature importance technique.
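A rough sketch of the Chi-Square selection and permutation importance steps with scikit-learn is shown below. It runs on synthetic data from `make_classification`, and the choice of Random Forest as the estimator, the min-max scaling, and all parameter values are assumptions made for illustration. The wrapper-method step is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the survey features (assumption: 20 numeric columns).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# chi2 requires non-negative inputs, so rescale using the training range.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Keep the ten features with the highest chi-square score.
selector = SelectKBest(score_func=chi2, k=10).fit(X_train_s, y_train)
top10 = selector.get_support(indices=True)

# Permutation importance: shuffle one feature at a time on held-out data
# and measure how much the ROC AUC drops.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train_s[:, top10], y_train)
result = permutation_importance(
    model, X_test_s[:, top10], y_test,
    scoring="roc_auc", n_repeats=5, random_state=0,
)

# Original column indices, most important first.
ranking = top10[np.argsort(result.importances_mean)[::-1]]
print(ranking)
```

On the real dataset the survey ratings are already non-negative, so the scaling step may be unnecessary there.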

Models

Eight models are used in this project and compared to find the best performer. They are:

  • Logistic Regression
  • Naive Bayes
  • KNN
  • Decision Tree
  • Neural Network
  • Random Forest
  • XGBoost
  • AdaBoost

Conclusion

Random Forest and AdaBoost performed equally well and produced a high ROC AUC score (~90%). However, Random Forest took less time than AdaBoost. Therefore, we conclude that Random Forest is the best model.
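The comparison described above (ROC AUC plus time taken) can be sketched as follows for the two leading models. This is an illustration on synthetic data with assumed hyperparameters, so the scores and timings it prints will not match the project's results. Measuring only the fit time is also an assumption about what "time" refers to.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the airline dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical settings; the project's actual hyperparameters are not stated.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    # ROC AUC is computed from the predicted probability of the positive class.
    scores = model.predict_proba(X_test)[:, 1]
    results[name] = (roc_auc_score(y_test, scores), fit_seconds)

for name, (auc, secs) in results.items():
    print(f"{name}: ROC AUC={auc:.3f}, fit time={secs:.2f}s")
```

The same loop extends to the other models in the list by adding entries to `models`, provided each one exposes `predict_proba`.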

Note: This was part of my academic assignment for the Data Mining module.

airline-passenger-satisfaction-1's People

Contributors

yogeshwaran-shanmuganathan
