ct-link-scout's Introduction

Link Scout

This is a simple application that takes a person_id and finds all of that person's connections, using two main criteria:

  1. A person is a connection if they worked at the same company and their employment timelines overlapped by at least 6 months (i.e. 182.5 days).
  2. A person is a connection if either one has the other's phone number in their list of contacts.
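
The two criteria above can be sketched in plain Python as follows. This is only an illustration, not the application's actual code (which runs on Spark): the function names and the dictionary fields (`company`, `start`, `end`, `phone`, `contacts`) are assumptions, since the layout of persons.json and contacts.json is not described here.

```python
from datetime import date

MIN_OVERLAP_DAYS = 182.5  # "at least 6 months" per criterion 1


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days two employment periods share (0 if they do not overlap)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max((earliest_end - latest_start).days, 0)


def worked_together(job_a, job_b):
    """Criterion 1: same company and timelines overlapping by >= 182.5 days."""
    if job_a["company"] != job_b["company"]:
        return False
    days = overlap_days(job_a["start"], job_a["end"], job_b["start"], job_b["end"])
    return days >= MIN_OVERLAP_DAYS


def shares_contact(person_a, person_b):
    """Criterion 2: either person has the other's phone number in their contacts."""
    return (person_b["phone"] in person_a["contacts"]
            or person_a["phone"] in person_b["contacts"])


# Hypothetical sample records, for illustration only.
job_1 = {"company": "Acme", "start": date(2019, 1, 1), "end": date(2020, 1, 1)}
job_2 = {"company": "Acme", "start": date(2019, 9, 1), "end": date(2021, 1, 1)}
job_3 = {"company": "Acme", "start": date(2019, 3, 1), "end": date(2021, 1, 1)}

print(worked_together(job_1, job_2))  # overlap is 122 days, below the threshold
print(worked_together(job_1, job_3))  # overlap is 306 days, above the threshold
```

Note that the second criterion is symmetric: it is enough for one of the two people to have the other's number.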

The data comes in the form of 2 JSON files: persons.json and contacts.json.

Libraries required (pre-provisioned)

This application utilizes the following libraries/technologies:

  • PySpark = for all data processing via Apache Spark
  • FuzzyWuzzy = for fuzzy matching (e.g. using the Levenshtein ratio)
  • Pytest = for unit tests and integration tests
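
To give a feel for what fuzzy matching buys here, the sketch below scores two strings on a 0-100 scale and treats them as the same above a threshold. It uses the standard library's `difflib.SequenceMatcher` as a stand-in so it runs without extra installs; in the application itself, FuzzyWuzzy's `fuzz.ratio` would play this role. Which fields are fuzzily matched is not stated here, so matching on company names, the helper names, and the threshold of 90 are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case and surrounding whitespace."""
    a, b = a.strip().lower(), b.strip().lower()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def same_company(name_a, name_b, threshold=90):
    """Treat two company names as equal if they are similar enough (assumed threshold)."""
    return similarity(name_a, name_b) >= threshold


print(same_company("Acme Corp", "Acme Corp."))  # near-identical spellings match
print(same_company("Acme Corp", "Globex"))      # unrelated names do not
```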

The Docker image this application ships in already provisions everything it needs. See the Dockerfile.

As such, the only real requirement is that you have Docker Engine installed.

Running the application

There are two ways to run this application, listed in order of preference:

Run using the pre-built docker image

There is a pre-built Docker image on Docker Hub (gadm01/link-scout), which means you don't even need to clone this repository. In a command line, run the following:

#Pull the docker image
docker pull gadm01/link-scout

#Run the docker container
docker run --detach --name link-scout -it gadm01/link-scout

#Go inside the container's shell to run the CLI of link-scout
docker exec -ti link-scout bash

Once you're inside the container, everything will be provisioned for you so you can simply use link-scout's CLI. Here are sample commands:

#Navigate to link-scout's root directory:
cd /home/scout/app
#Find all connections of person with id=4 (using default json paths, persons.json and contacts.json):
python3 link_scout.py -i 4

#Find all connections of person with id=3 with verbose printing:
python3 link_scout.py -i 3 -v

#Find all connections of person with id=2 and passing a new persons.json file:
python3 link_scout.py -p "test/fixture/persons_load_test.json" -i 2

#Print usage help
python3 link_scout.py -h
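
For readers curious how flags like these are typically wired up, here is a minimal `argparse` sketch that accepts the options used in the sample commands above. It is not the actual contents of link_scout.py: the destination names (`person_id`, `persons_path`, `verbose`) and the assumption that the id is an integer are illustrative, and only the flags shown above (`-i`, `-p`, `-v`, `-h`) are covered.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the sample commands (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="link_scout.py",
        description="Find all connections of a person.",
    )
    # -h/--help is added by argparse automatically.
    parser.add_argument("-i", dest="person_id", type=int, required=True,
                        help="id of the person whose connections to find")
    parser.add_argument("-p", dest="persons_path", default="persons.json",
                        help="path to the persons JSON file")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="enable verbose printing")
    return parser


# Parse the same arguments as the third sample command.
args = build_parser().parse_args(
    ["-p", "test/fixture/persons_load_test.json", "-i", "2"]
)
print(args.person_id, args.persons_path, args.verbose)
```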

Run by building from source (i.e. docker build)

If you really want to build from source, here are the steps:

#Make sure you are in the same directory as the Dockerfile, then run
docker build -t link-scout .

#Run the docker container
docker run --detach --name link-scout -it link-scout

#Go inside the container's shell to run the CLI of link-scout
docker exec -ti link-scout bash

Once you're inside the container, you can use link-scout's CLI as shown in the previous section.

Running the tests

Assuming all steps before this section were successful, you can simply run pytest to run both the integration tests and the unit tests:

#Go to application's home
cd /home/scout/app
#To run all tests:
python3 -m pytest
#To run just unit tests:
python3 -m pytest test/ls_unit_test.py
#To run just integration tests:
python3 -m pytest test/ls_integration_test.py

Areas of improvement

  • More unit tests and integration tests
  • Bigger test files of varying use cases

Questions/Clarifications

Please contact:

Contributors

kevinpalis
