johntoro-czaf / speeddating-sc1015-project

This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence) that analyzes speed dating using data from the Speed Dating Experiment.

Jupyter Notebook 100.00%

speeddating-sc1015-project's Introduction

Welcome to the Speed-Dating-analysis repository

About

This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence) that analyzes speed dating using data from the Speed Dating Experiment. For detailed information, please read the source code in the following order:

  1. Preliminary Exploratory Analysis
  2. Attributes And satIncome
  3. Similarities
  4. Logistic Regression
  5. OtherModels(RandomForest, XGBoost, OLS)

Contributors

  1. Ng Zheng Kai - U2122921J - @nzkai
  2. Kelvin Pang - U2122086A - @kelpjr
  3. Phan Nhat Hoang - U2120111G - @JohnToro

Contribution List

  • Ng Zheng Kai: Presentation Slides, EDA, Categories, Other Models
  • Kelvin Pang: Presentation Slides, EDA, Other Models
  • Phan Nhat Hoang: Data Splitting, Similarities, Logistic Regression

Background

The choice of a marriage partner is one of the most serious decisions people face. In contemporary Western societies, this decision usually follows a long learning period during which people engage in more informal and often polygamous relationships, i.e., dating. In this project, we examine gender differences in dating preferences by analyzing the speed-dating dataset.

As in all matching markets, determining dating preferences from equilibrium outcomes is difficult because a given correlation of attributes across partners is often consistent with various preference structures such as:

Women put greater weight on the intelligence and the race of a partner, while men respond more to physical attractiveness. Moreover, men do not value women's intelligence or ambition when it exceeds their own.

To overcome this problem, we use the speed-dating dataset, which comes from a survey that researchers conducted in a carefully controlled dating environment. What we find in this dataset may be consistent with known social structure theory, or it may reveal something not yet known.

Dataset

This dataset records speed dates, each of which is a date between a subject and his or her partner. Everything about the date was recorded, including the partner's and the subject's dating preferences (scores on the attributes they want in a partner) and personal information such as age, income, and SAT score. The most important variable is match, which indicates whether the date was successful. We focus on this variable as the response.
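
As a minimal sketch of the record shape described above, the snippet below models a handful of made-up dates and computes the share that ended in a match. The field names (`subject_gender`, `age`, `match`) and all values are illustrative assumptions, not the dataset's actual column names or data.

```python
# Toy records, one per date. Every field name and value is hypothetical.
dates = [
    {"subject_gender": "F", "age": 24, "match": 1},
    {"subject_gender": "M", "age": 27, "match": 0},
    {"subject_gender": "F", "age": 22, "match": 0},
    {"subject_gender": "M", "age": 25, "match": 1},
    {"subject_gender": "M", "age": 30, "match": 0},
]

def match_rate(rows):
    """Fraction of dates whose response variable `match` equals 1."""
    return sum(r["match"] for r in rows) / len(rows)

# Match rate over all dates, then split by the subject's gender.
overall = match_rate(dates)
by_gender = {
    g: match_rate([r for r in dates if r["subject_gender"] == g])
    for g in ("F", "M")
}
```

Splitting the rate by the subject's gender mirrors the goal of analyzing male and female matches separately.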

Problem Definition

  • With so many different factors that can affect the result of a match, which ones have the most impact?
  • How can we identify which variables are the most important to a match?
  • Which variables in the dataset affect a male's or a female's match when the two are analyzed separately? From there, what are the differences in dating preferences between males and females?

Solving the problem

  • Through preliminary exploratory analysis, we observed that many variables are irrelevant to the response and that the dataset is full of missing values. We therefore decided to look only at specific categories of variables: attributes, satIncome and similarity.
  • We cleaned the data and did EDA on each category of variables. By limiting the scope of our exploration, we were able to handle the complicated missing values within each category and produce a better analysis for each one.
  • We aggregated existing variables to create new variables that portray dating preferences in ways the originals cannot.
  • In a generalized linear probability model, the coefficients of a suitable predictive or regression model can serve as a measure of each variable's importance. By using different models and normalization techniques, we were able to extract statistically significant dating preferences for both males and females.
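
The idea of reading variable importance from normalized coefficients can be sketched as follows. This is a from-scratch logistic regression fitted by gradient descent, used here as a dependency-free stand-in rather than the notebooks' actual sklearn code. The ratings, outcomes, learning rate, and epoch count are all made-up assumptions for illustration.

```python
import math

def standardize(column):
    """Rescale a feature to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var)
    return [(x - mean) / std for x in column]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical partner ratings (1-10) and match outcomes; not real data.
attractive = [8, 7, 3, 9, 2, 6, 4, 8, 3, 7]
sincere = [5, 6, 6, 4, 5, 7, 5, 6, 4, 5]
match = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]

# Standardizing first puts both coefficients on a comparable scale.
X = list(zip(standardize(attractive), standardize(sincere)))
weights, bias = fit_logistic(X, match)
importance = dict(zip(["attractive", "sincere"], (abs(w) for w in weights)))
```

Because both features are standardized before fitting, the absolute coefficients in `importance` can be compared directly. Without that step, a feature measured on a larger scale would get a smaller coefficient for the same effect.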

Categories we split the problem into

  • 1st Category ( 6 Key Attributes )
    • Attractiveness, Sincerity, Intelligence, Fun, Ambition and Shared Interest
  • 2nd Category ( Intelligence & Income )
    • SAT Score & Income
  • 3rd Category ( Similarities )
    • Shared Interest, Race, Field Of Study, Region
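
One way the similarity variables in the 3rd category could be derived is sketched below: each date's subject and partner are compared to produce 0/1 indicators. The function name, dictionary keys, and sample values are hypothetical. Shared interest is left out because it would be a rating rather than an equality check.

```python
def similarity_features(subject, partner):
    """Return 0/1 indicators for the traits the pair has in common."""
    return {
        "same_race": int(subject["race"] == partner["race"]),
        "same_field": int(subject["field"] == partner["field"]),
        "same_region": int(subject["region"] == partner["region"]),
    }

# Hypothetical pair: same race and region, different field of study.
subject = {"race": "A", "field": "Law", "region": "Asia"}
partner = {"race": "A", "field": "Engineering", "region": "Asia"}
features = similarity_features(subject, partner)
```

Indicators like these are aggregated variables in the sense described earlier: neither person's race alone says anything about similarity, but the comparison does.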

Models Used

  1. Logistic Regression
  2. Random Forest
  3. XGBoost
  4. Ordinary Least Squares(OLS)

Conclusion

  • The ideal traits that Males look for in Females are:
    • Fun
    • Attractive
    • Intelligence
    • Similar interest
  • The ideal traits that Females look for in Males are:
    • Humorous
    • Intelligence
    • Similar field of study
  • Finally, we look at the importance of similarity.
    • Women strongly discriminate on the basis of race. They are more than 8% more likely to accept a partner of their own race. Given the underlying match rate is only around 38%, this is a large effect.
    • Men, on the other hand, do not exhibit a significant racial preference. Whether this difference stems from gender-specific dating goals or reflects a more fundamental gender difference is difficult to ascertain from our data.
    • Being in the same field of study has no predictive power, but both men and women prefer partners from the same region of the world.
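
To see why the same-race effect counts as large, it helps to compare it with the base rate. The arithmetic below assumes the 8% figure is a difference in percentage points and treats it as a lower bound; both are assumptions about how to read the numbers above.

```python
base_rate = 0.38  # approximate underlying match rate quoted above
uplift = 0.08     # assumed to be percentage points, taken as a lower bound

# Size of the uplift relative to the base rate.
relative_increase = uplift / base_rate
print(f"Relative increase: {relative_increase:.1%}")
```

Under these assumptions, the uplift is roughly a fifth of the base rate.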

What did we learn from this project?

  • Handling imbalanced datasets, using regEx, and categorizing similar data
  • Logistic Regression from sklearn
  • XGBoost & OLS
  • API Usage
  • Other packages such as pandasql
  • Collaborating using GitHub
  • Normalizing data

Future work

  • Exploring racial dating preference
  • Building a high-accuracy model for predicting a match from the significant features, and benchmarking different methods of sampling the data
