amarhod / skyscanner-cheapest-day

Finds the cheapest day to buy a flight ticket on Skyscanner before a trip. Kafka is used to simulate a live stream of price information for a number of routes. The information is saved to a Cassandra table, which is then used to calculate the point in time with the lowest price across all queried flights.

Skyscanner cheapest day

Finds the best point in time to buy flight tickets on Skyscanner. Developed in collaboration with @EleonoraBorzis for a course project.

Code functionality

  • skyscanner_consumer - Queries the Skyscanner Flight Search API for a number of routes and receives the minimum price for each route across multiple days. Useful information for each route and date (e.g. cached timestamp, minimum price, origin, destination) is stored in a list.
  • kafka_producer - Calls skyscanner_consumer and produces each message in the list to a Kafka topic.
  • kafka_consumer - Consumes each message from the Kafka topic and UPSERTs it into a Cassandra table.
  • analyzer - Calculates the best day before departure to buy a ticket within a given range (e.g. 1-30 days). It queries the Cassandra table, calculates the best day for each route, and then finds and prints the day that occurs most frequently across all routes.
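
The analyzer step described above can be illustrated with a minimal sketch. This is not the code in analyzer.py: the row layout (origin, destination, departure date, quote date, minimum price), the function names, and the tie-breaking rule are all assumptions made for illustration, and the rows are passed in as plain tuples rather than read from Cassandra.

```python
from collections import Counter
from datetime import date


def best_day_per_route(rows, min_days=1, max_days=30):
    """Map each (origin, destination) to the days-before-departure with the lowest price.

    `rows` is an iterable of hypothetical tuples:
    (origin, destination, departure_date, quote_date, min_price).
    Quotes outside the [min_days, max_days] window are ignored.
    """
    best = {}  # route -> (days_before, price)
    for origin, destination, departure, quoted, price in rows:
        days_before = (departure - quoted).days
        if not min_days <= days_before <= max_days:
            continue
        route = (origin, destination)
        if route not in best or price < best[route][1]:
            best[route] = (days_before, price)
    return {route: days for route, (days, _price) in best.items()}


def most_frequent_day(best_days):
    """Return the day that is best for the most routes, or None if there is no data.

    Ties are broken by picking the smallest day (an arbitrary choice for this sketch).
    """
    if not best_days:
        return None
    counts = Counter(best_days.values())
    top = max(counts.values())
    return min(day for day, count in counts.items() if count == top)


if __name__ == "__main__":
    # Made-up sample quotes, for illustration only.
    sample = [
        ("ARN", "LHR", date(2020, 6, 30), date(2020, 6, 10), 95.0),
        ("ARN", "LHR", date(2020, 6, 30), date(2020, 6, 20), 120.0),
        ("ARN", "CDG", date(2020, 6, 30), date(2020, 6, 10), 80.0),
        ("ARN", "FCO", date(2020, 6, 30), date(2020, 6, 23), 60.0),
    ]
    print(most_frequent_day(best_day_per_route(sample)))
```

With the sample quotes, two of the three routes are cheapest 20 days before departure, so that is the day the sketch reports.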

How to run

How to run in the terminal:

1. Start Zookeeper
zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties

2. Start Kafka
kafka-server-start.sh $KAFKA_HOME/config/server.properties

3. Create a topic named skyscanner_test
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic skyscanner_test

4. Start Cassandra 
cassandra -f

5. Create a virtualenv and install packages with requirements.txt
pip3 install virtualenv
virtualenv skyscanner
source skyscanner/bin/activate
pip3 install -r requirements.txt

6. Submit Spark job
$SPARK_HOME/bin/spark-submit --jars /.../spark-streaming-kafka-0-8-assembly_2.11-2.4.3.jar kafka_consumer.py

7. Start Kafka producer
source skyscanner/bin/activate
python3 kafka_producer.py

8. Start analyzer
source skyscanner/bin/activate
python3 analyzer.py
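
If one of the later steps fails to connect, it can help to confirm that the services from the first four steps are actually listening. The following is a small sketch, not part of the repository: it assumes everything runs on localhost with the default ports (2181 for Zookeeper as used in the topic command above, 9092 for Kafka, 9042 for Cassandra), and the helper names are hypothetical.

```python
import socket

# Assumed default ports; adjust if your local configuration differs.
SERVICES = {"zookeeper": 2181, "kafka": 9092, "cassandra": 9042}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=SERVICES):
    """Return {service_name: bool} for each service in `services`."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'not reachable'}")
```

An open port only shows that something is accepting connections; it does not prove that the topic exists or that the Cassandra table has been created.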
