
sigmod2020's Introduction

SIGMOD PROJECT

Team members:

  • Andrea Veneziano
  • Georgios Fotiadis
  • Gerald Sula

Code information

You can find all of our code in the src/ folder. The following list explains what each notebook does.

  • description_exploration.ipynb: See the 20 attributes that appear the most often per store, to select which ones we keep for cleaning.
  • cleaning_george/: Folder with notebooks that perform data cleaning for a subset of the stores and save the results to new csv files.
  • cleaning_gerald/: Folder with notebooks that perform data cleaning for a subset of the stores and save the results to new csv files.
  • data_cleaning_andrea: Notebook that performs data cleaning for a subset of the stores and saves the results to new csv files.
  • clustering_with_details.ipynb: This is our main notebook. It reads the cleaned data, performs some additional cleaning and standardization, and calculates the matches. It saves the result to a csv.
  • clustering.ipynb: Our first, naive approach (and the one with the highest score). We group by brand, keep only the title and clean it, then find similarities within each cluster. It might not run now because the data has since been changed and cleaned.
  • add_info_to_labeled.ipynb: Add all the information we had to the given labeled dataset for training.
  • ML_approach.ipynb: Our very unsuccessful machine learning approach.
  • alibaba_ebay_clustering.ipynb: Tried clustering per store but gave up because it was taking too long to run (>12 hours).
  • add_store_cluster_to_brands.ipynb: Merge the clusters per brand and clusters per store together and save the results to a csv.
  • one_last_ride.ipynb: Our very last effort to try a different approach and increase our score. We put all the cleaned attributes together in a list, group the data by every size-two subset of that list, and mark as matches the elements that end up in the same cluster.
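
The grouping idea described for one_last_ride.ipynb can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the records, the attribute names (brand, model, megapixels) and the helper match_on_attribute_pairs are all hypothetical, and skipping records with a missing value is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical cleaned records; attribute names and values are illustrative only.
records = [
    {"id": "a1", "brand": "canon", "model": "eos 5d", "megapixels": "12"},
    {"id": "b7", "brand": "canon", "model": "eos 5d", "megapixels": None},
    {"id": "c3", "brand": "canon", "model": "eos 7d", "megapixels": "12"},
    {"id": "d9", "brand": "nikon", "model": "d90", "megapixels": "12"},
]
attributes = ["brand", "model", "megapixels"]


def match_on_attribute_pairs(records, attributes):
    """Return id pairs that share the same values on at least one attribute pair."""
    matches = set()
    # Visit every size-two subset of the attribute list.
    for attr_a, attr_b in combinations(attributes, 2):
        groups = defaultdict(list)
        for rec in records:
            key = (rec[attr_a], rec[attr_b])
            if None in key:
                # Assumption: a record missing either value is not grouped.
                continue
            groups[key].append(rec["id"])
        # Every two records in the same group are marked as a match.
        for ids in groups.values():
            for left, right in combinations(sorted(ids), 2):
                matches.add((left, right))
    return matches


matches = match_on_attribute_pairs(records, attributes)
print(sorted(matches))  # [('a1', 'b7'), ('a1', 'c3')]
```

Note that a1 and c3 are paired only because they share brand and megapixels, even though their models differ. Grouping on two attributes at a time can over-match in this way, so the choice of attributes matters.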
