fetch_rewards_de_exercise

Coding challenge for Fetch Rewards

Included in this git repo is a program that takes two texts as input and scores how similar they are on a scale from 0.0 to 1.0. A score of 0 means the texts have no similarity, and a score of 1 means the two texts are exactly the same. The program is built using Python, Flask and Docker.

Getting Started

To run this program:

  • Create a folder to store this repo locally (e.g. mkdir applicant_program)

  • Clone this GitHub repo: git clone https://github.com/amp5/fetch_rewards_de_exercise.git

  • Determine how you want to run the program: locally, locally with Docker, or with Docker Hub

1. Run Locally

  • Ensure you are inside the web folder: cd web
  • Make sure the venv module is available (it ships with Python 3; some Linux distributions package it separately, e.g. as python3-venv)
  • Create a virtual environment: python3 -m venv env
  • Activate your virtual environment: source env/bin/activate
  • Install the requirements for the program: pip install -r requirements.txt
  • Run the Flask app: flask run
  • Open a web browser (e.g. Chrome) and go to http://127.0.0.1:5000/, which is your localhost

2. Run via Docker Locally

  • Ensure you have Docker installed on your machine
  • Build the Docker image and start the container: docker-compose up
  • Open a web browser (e.g. Chrome) and go to http://127.0.0.1:5000/, which is your localhost

3. Run via Docker Hub **

  • Ensure you have Docker installed on your machine
  • Pull the Docker image from Docker Hub: docker pull amp555/flask-fetch
  • Run the container: docker run -d -p 5000:5000 amp555/flask-fetch
  • Open a web browser (e.g. Chrome) and go to http://127.0.0.1:5000/, which is your localhost

**Note: Options 1 and 2 have been tested and work. I am new to using Docker, but I believe Option 3 works as well, since the Docker image is currently in a public repo on Docker Hub.

Your browser should display the following program:

[Image of web app]
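If you would rather confirm from a script that the app is reachable, a minimal sketch using only the Python standard library might look like the following. The function name app_is_up is hypothetical, and the sketch assumes the app answers a plain GET on the root URL given above with HTTP 200; it says nothing about what the page contains.

```python
import urllib.request


def app_is_up(url: str = "http://127.0.0.1:5000/", timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` succeeds with HTTP 200.

    Hypothetical helper: the URL is the localhost address from the steps
    above; treating "200 on /" as "the app is up" is an assumption.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all
        # subclasses of OSError, so any of them means "not reachable".
        return False


if __name__ == "__main__":
    print("app reachable:", app_is_up())
```

Run it after starting the app with any of the three options; it prints False if nothing is listening on port 5000.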

Determining the metric for similarity

Because the coding exercise directions define similarity only in terms of how many words the two texts have in common, and do not get into the specifics of how this metric might be used, this program's calculation does not factor in the following:

  • The difference between upper and lower case text (e.g. "hello world" vs "Hello World" have 2 words in common).
  • Sentiment between texts (e.g. "I love pizza" vs "I hate pizza" have 2 of their 3 words in common. The difference between love and hate is not a factor in similarity here, even though it would certainly be something to account for in production, as these two texts sit at opposite ends of the sentiment spectrum).
  • Punctuation between texts (e.g. "Have a great day?" vs "Have a great day!" have 4 words in common. Punctuation often conveys emotion or sentiment, but it doesn't provide much useful information in terms of potential trends among the text inputs).
  • Order of words between texts (e.g. "Sally bought icecream for Rick" vs "Rick bought icecream for Sally" have 5/5 words in common, but the meaning is not the same: the buyer and the recipient are swapped). Thus this program does not implement a sequence-based approach.
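The simplifications listed above amount to treating each text as an unordered set of lowercase words with punctuation removed. A minimal sketch of that normalization, assuming whitespace-separated words and ASCII punctuation (the function name tokenize is hypothetical and not taken from the repo):

```python
import string


def tokenize(text: str) -> set[str]:
    """Lowercase the text, delete ASCII punctuation, and return its unique words.

    Hypothetical helper for illustration; the repo's own code may differ.
    """
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    return set(text.lower().translate(table).split())


# Case, punctuation and word order all disappear after normalization.
print(tokenize("Hello World") == tokenize("hello world"))              # True
print(tokenize("Have a great day?") == tokenize("Have a great day!"))  # True
print(tokenize("Sally bought icecream for Rick")
      == tokenize("Rick bought icecream for Sally"))                   # True
```

Using a set also discards repeated words, which matches a metric based on unique tokens.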

To calculate similarity, this program utilizes a token-based approach that compares the sets of tokens in the two texts. Two common algorithms for this approach are the Jaccard index and the Sørensen-Dice coefficient. I will be using the Jaccard index, which divides the number of tokens the two texts have in common by the total number of unique tokens across both texts. The Sørensen-Dice coefficient is similar, but its score is never lower than the Jaccard index for the same inputs, so it can overestimate the similarity between two strings.

Below is the formula for the Jaccard index as referenced from this article:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where A and B are the sets of unique tokens in the two texts.
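As a rough illustration of the Jaccard index, and of how the Sørensen-Dice coefficient compares on the same input, here is a short sketch that works on already-tokenized sets. The function names are hypothetical, and returning 1.0 for two empty inputs is an assumption made here to avoid dividing by zero, not something the exercise specifies.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: shared tokens divided by all unique tokens."""
    if not a and not b:
        return 1.0  # assumption: two empty texts count as identical
    return len(a & b) / len(a | b)


def sorensen_dice(a: set, b: set) -> float:
    """Sørensen-Dice coefficient: twice the shared tokens over the summed set sizes."""
    if not a and not b:
        return 1.0  # same assumption as above
    return 2 * len(a & b) / (len(a) + len(b))


a = {"i", "love", "pizza"}
b = {"i", "hate", "pizza"}

print(jaccard(a, b))                  # 0.5   (2 shared / 4 unique)
print(round(sorensen_dice(a, b), 3))  # 0.667 (2 * 2 / (3 + 3))
```

For the "I love pizza" vs "I hate pizza" pair, the Sørensen-Dice score is higher than the Jaccard score, which is the behavior that motivated choosing Jaccard.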

Similarity metric 2.0

A more robust future refactoring of this metric could remove stop words (e.g. "I", "or", "the", "to") and stem the inputs (e.g. "buy" and "buying" would both be counted as "buy").
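A rough sketch of what that refinement could look like is below. The stop word list contains only the examples mentioned above, and naive_stem is a deliberately crude, hypothetical suffix stripper used for illustration; a production version would more likely rely on an established stop word list and stemmer.

```python
import string

# Illustrative only: just the stop words named in the text above.
STOP_WORDS = {"i", "or", "the", "to"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> set[str]:
    """Lowercase, drop punctuation and stop words, then stem each word."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return {naive_stem(w) for w in words if w not in STOP_WORDS}


print(normalize("I like buying the pizza") == normalize("like to buy pizza"))  # True
```

With stop words removed and "buying" reduced to "buy", the two example texts produce the same token set and would score 1.0 under the Jaccard index.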

Potential Use Cases

  • Identifying duplicate or fake reviews, for example reviews of Fetch Rewards on Google Play or the iTunes store.
  • Identifying similar receipts (this step would most likely come after image processing), for example distinguishing clothing receipts from grocery receipts.
