Coder Social home page Coder Social logo

dit-bigdata-project3's Introduction

Page Rank

  • Project 3 of Big Data subject
  • Implemented by:
    • Theodoros Mandilaras
    • cs2.190018
  • MSc DI/EKPA 2019-2020

Implementation and evaluation of two different PageRank algorithms on the Google-web dataset.

Algorithm 1

Simple PageRank equation

p = M⋅p

where p is a vector of size Nx1 (where N is the number of nodes in the graph). p[i,1] represents the PageRank of page i. Initially every page’s PageRank is equal to 1. We repeat the above computation, i.e., multiplying M with p and storing in p for a pre-specified number of steps.

Algorithm 2

Improved version of PageRank

p = α⋅M⋅p + (1-α)⋅I​n

where α is a numeric value, N is the number of pages (nodes) in the Web graph, and In​ is a unary vector (i.e., all values are set to 1) of size Nx1.


About Implementation

The program can be executed in three different modes:

Default Mode

In the default mode, the program gets from the arguments the desired method, the path for the graph and the number of the iterations (if improved method has been selected, it also gets the a numeric value). Then, performs the selected algorithm for the desired number of iterations. Output: In the end, it stores the top and bottom 20 in a txt file in the results directory, and also it makes the histogram for the current ranks.

Evaluate Mode:

The evaluate mode implemented to execute and store the outputs(top20/bottom20 and the plots) for all the requested iterations (10, 50, 100, 200) with a single run.

Converge Mode:

In this mode, the selected algorithm iterates until it converge. That means that, it iterates until the sum of the absolute differences between last rank vector and the new one is less than a small value.

Results structure

The program creates a directory called “results” in the working directory. Inside it, also creates three folders:

  • results: In this folder the top and bottom 20 csv are stored. Files format: <method>_<a value>_<iterations>_<bottom20 or top20>.txt
  • plots: In this folder the histogram plots are stored. Files format: <method>_<a value>_<iterations>.png
  • ranks: That was a folder for development purpose. It is no more used.

How to run?

The program makes use of the Argparse module to parse the arguments. The required arguments are:

  • -p or --path: the path for the .txt file with the graph
  • -m or --method: to set the selected method. Options are 'simple', 'improved'

Optional arguments:

  • -i or --iterations: to set how many iterations the program to perform.
  • -a or --a: The numeric value used for the improved algorithm (it is required for this algorithm).
  • -c or --converge: To enable the mode with the convergence.
  • -e or --eval: To execute the evaluate mode..

Example:

Simple execution for the default mode with the simple algorithm for 100 iterations: python page_ranking.py -m simple -p web-Google.txt -i 100

Similarly for the improved algorithm python page_ranking.py -m improved -p web-Google.txt -a 0.2 -i 100

Evaluate mode: python page_ranking.py -m improved -p web-Google.txt -a=0.85 -e

Convergence mode. If iterations mentioned, they will be ignored.

python page_ranking.py -m simple -p web-Google.txt -c

dit-bigdata-project3's People

Contributors

teomandi avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.