podyapolskiy / where-is-waldo

This repository contains my solution for Task 1 of the Introduction to Machine Learning course at Innopolis University. The key techniques here are data preprocessing and training an ANN on a highly imbalanced dataset.

Jupyter Notebook 100.00%
ann data-preprocessing imbalanced-learning keras ml ipynb json

where-is-waldo's Introduction

Task 1: Where's Waldo? (50%)

Fingerprinting

Browser fingerprinting is a technique used to identify and track individuals based on characteristics of their web browser configuration. These characteristics can include the browser type, version, installed plugins, and screen resolution, among others. By combining these attributes, websites can create a digital fingerprint that can be used to track user behavior across multiple sites, even if users clear their cookies. This has raised concerns about privacy and about the use of this technology for targeted advertising, surveillance, and other purposes.

Read more about Fingerprinting

What You Need to Do

In this task, you are required to employ a fully connected feed-forward Artificial Neural Network (ANN) to tackle a classification problem. This involves several key steps, each critical to the development and performance of your model:

  • Exploratory Data Analysis (EDA) (10%): Begin by conducting a thorough exploratory analysis of the provided dataset. Your goal here is to uncover patterns, anomalies, relationships, or trends that could influence your modeling decisions. Share the insights you gather from this process and explain how they informed your subsequent steps.

  • Data Preprocessing and Feature Engineering (10%): Based on your EDA insights, choose and implement the most appropriate data preprocessing steps and feature engineering techniques. This may include handling missing values, encoding categorical variables, normalizing data, and creating new features that could enhance your model's ability to learn from the data.

  • Model Design and Training (10%): Design a fully connected feed-forward ANN model. You will need to experiment with different architectures, layer configurations, and hyperparameters to find the most effective solution for the classification problem at hand.

  • Feature Importance Analysis (10%): After developing your model, analyze which features are most important for making predictions. Discuss how this analysis aligns with your initial EDA insights and what it reveals about the characteristics most indicative of specific user behaviors or identities.

  • Evaluation (10%): Submit your model's predictions on a hidden dataset.
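The repository description notes that the dataset is highly imbalanced. One common way to account for this during training is to weight each class inversely to its frequency. The following is a minimal sketch of that heuristic in plain Python; the function name `balanced_class_weights` and the toy label list are illustrative assumptions, not taken from the notebook.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive large weights, frequent classes small ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# Toy example (assumed data): 5 Waldo sessions (label 1) among 100.
labels = [1] * 5 + [0] * 95
weights = balanced_class_weights(labels)
print(weights[1])  # 10.0
```

A dictionary like this can be passed as the `class_weight` argument of Keras `Model.fit`, so that mistakes on the rare class contribute more to the loss. Note that the label here is 1 for Waldo and 0 otherwise, which is a different encoding from the raw `user_id`.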

Data

You will be using the data in Task_1.json to identify Waldo (user_id=0). The dataset includes:

  • "browser", "os" and "locale": Information about the software used.
  • "user_id": A unique identifier for each user.
  • "location": Geolocation based on the IP address used.
  • "sites": A list of visited URLs and the time spent there in seconds.
  • "time" and "date": When the session started in GMT.
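As a rough illustration of how one record with these fields might be flattened into model-ready features, here is a minimal sketch. It assumes that each entry in "sites" is an object with a URL key and a duration key; the key names `site` and `length` are hypothetical defaults and should be adjusted to the actual file. The "Country/City" split of "location" follows the sample record shown in the Evaluation section.

```python
from datetime import datetime


def flatten_session(session, site_key="site", length_key="length"):
    """Turn one raw session dict into a flat feature dict.

    site_key and length_key are assumed names for the fields of each
    entry in session["sites"]; they are not confirmed by the task text.
    """
    started = datetime.strptime(
        f"{session['date']} {session['time']}", "%Y-%m-%d %H:%M:%S"
    )
    sites = session.get("sites", [])
    # "Russia/Moscow" -> ("Russia", "Moscow")
    country, _, city = session["location"].partition("/")
    features = {
        "browser": session["browser"],
        "os": session["os"],
        "locale": session["locale"],
        "country": country,
        "city": city,
        "hour": started.hour,          # session start hour, GMT
        "weekday": started.weekday(),  # Monday == 0
        "n_sites": len(sites),
        "n_unique_sites": len({s[site_key] for s in sites}),
        "total_seconds": sum(s[length_key] for s in sites),
    }
    # The binary target can only be derived where user_id is present.
    if "user_id" in session:
        features["is_waldo"] = session["user_id"] == 0
    return features
```

The categorical fields (browser, os, locale, country, city) would still need encoding, for example one-hot encoding, before being fed to a neural network.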

Evaluation

After training, evaluate your model by printing the classification report on your test set. Then predict whether each session in task_1_verify.json belongs to Waldo, and record the result by adding a boolean is_waldo property to every entry in task_1_verify.json:

  [
    {
+     "is_waldo": false,
      "browser": "Chrome",
      "os": "Debian",
      "locale": "ur-PK",
      "location": "Russia/Moscow",
      "sites": [
          // ...
      ],
      "time": "04:12:00",
      "date": "2017-06-29"
    }
    // ...
  ]
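The annotation step above could be sketched as follows. This is a minimal illustration under stated assumptions: `predict` is a hypothetical callable that maps one session dict to a truthy or falsy value (for example, a wrapper around a trained model), and the helper names and the output path are placeholders rather than part of the task.

```python
import json


def add_predictions(sessions, predictions):
    """Return copies of the sessions with "is_waldo" as the first key."""
    if len(sessions) != len(predictions):
        raise ValueError("need exactly one prediction per session")
    # bool() converts e.g. NumPy booleans or 0/1 into JSON true/false.
    return [
        {"is_waldo": bool(pred), **session}
        for session, pred in zip(sessions, predictions)
    ]


def annotate_file(in_path, out_path, predict):
    """Read sessions from in_path, label them, and write to out_path."""
    with open(in_path, encoding="utf-8") as f:
        sessions = json.load(f)
    labelled = add_predictions(sessions, [predict(s) for s in sessions])
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(labelled, f, ensure_ascii=False, indent=2)
```

Building new dictionaries instead of mutating the loaded ones keeps the original records intact and places `is_waldo` first, matching the sample above.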

Learning Objectives

  • Exploratory Data Analysis: Apply suitable analysis techniques to gain insights and better understand the dataset.
  • Classification Approach: Identify the most appropriate method for the given problem.
  • Data Preprocessing: Select and execute proper preprocessing and encoding techniques.
  • Model Implementation: Utilize ANNs to address a classification problem, including training, validation, and testing phases.
  • Feature Importance Analysis: Determine and report which features are most critical for the model's predictions to uncover insights into specific user behaviors.
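For the feature importance objective, one model-agnostic option is permutation importance: shuffle one feature at a time and measure how much the model's score drops. The sketch below is a plain-Python illustration of that idea, not the method used in the notebook; `predict` is a hypothetical callable that maps a feature dict to a label, and accuracy is used only as a default score.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, score=accuracy,
                           n_repeats=10, seed=0):
    """Mean drop in score when each feature is shuffled across rows."""
    rng = random.Random(seed)
    baseline = score(predict, rows, labels)
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this feature's column; keep the rest fixed.
            column = [r[name] for r in rows]
            rng.shuffle(column)
            permuted = [{**r, name: v} for r, v in zip(rows, column)]
            drops.append(baseline - score(predict, permuted, labels))
        importances[name] = sum(drops) / n_repeats
    return importances
```

On a highly imbalanced dataset accuracy can be misleading, so a score based on recall or F1 for the Waldo class may be a better choice for the `score` argument.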
