

PageRank

This repository provides two PageRank implementations for the internal Wikipedia link graph: one in Spark and one in Hadoop MapReduce.
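To illustrate how PageRank maps onto MapReduce, here is a minimal single-machine sketch of one iteration of the update given in the Formula section, written as a mapper and a reducer in plain Python. It is not the repository's Hadoop or Spark code: the names `mapper`, `reducer` and `iterate`, the `(rank, out_links)` record layout, and the in-memory dictionary standing in for Hadoop's shuffle phase are all hypothetical.

```python
from collections import defaultdict

DAMPING = 0.85  # damping factor from PR(u) = 0.15 + 0.85 * sum(PR(v) / L(v))


def mapper(page, rank, out_links):
    """Hypothetical mapper: emit the adjacency list plus one contribution per out-link."""
    # Re-emit the link structure so the reducer can rebuild the page's record.
    yield page, ("links", out_links)
    for target in out_links:
        # Each out-link receives an equal share PR(v) / L(v).
        yield target, ("rank", rank / len(out_links))


def reducer(page, values):
    """Hypothetical reducer: sum incoming contributions and apply the damped update."""
    out_links = []
    total = 0.0
    for kind, value in values:
        if kind == "links":
            out_links = value
        else:
            total += value
    return page, 0.15 + DAMPING * total, out_links


def iterate(graph):
    """Run one PageRank iteration; graph maps page -> (rank, out_links)."""
    groups = defaultdict(list)  # stands in for the shuffle/sort between map and reduce
    for page, (rank, out_links) in graph.items():
        for key, value in mapper(page, rank, out_links):
            groups[key].append(value)
    new_graph = {}
    for page, values in groups.items():
        page, rank, out_links = reducer(page, values)
        new_graph[page] = (rank, out_links)
    return new_graph
```

Re-emitting the adjacency list from the mapper is what lets the output of one iteration serve directly as the input of the next, which is the usual way to chain PageRank iterations as successive MapReduce jobs.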

Original authors: Joseph Moukarzel and Constantinos Ioannidis (https://github.com/constantinos07)

This repository is for reference only; it lets the original authors keep track of their code. Do not use it if you plan to cheat.

Input Format

Each input record consists of the following fields:

• REVISION: revision metadata, consisting of:

        o article_id: a large integer, uniquely identifying each page.

        o rev_id: a large number uniquely identifying each revision.

        o article_title: a string denoting the page’s title (and the last part of the URL of the page).

        o timestamp: the exact date and time of the revision, in ISO 8601 format; e.g., 13:45:00 UTC on 30 September 2013 becomes 2013-09-30T13:45:00Z, where T separates the date from the time and Z denotes that the time is in UTC.

        o [ip:]username: the name of the user who performed the revision, or her DNS-resolved IP address (e.g., ip:office.dcs.gla.ac.uk) if anonymous.

        o user_id: a large number uniquely identifying the user who performed the revision, or her IP address as above if anonymous.

• CATEGORY: list of categories this page is assigned to.

• IMAGE: list of images in the page, each listed as many times as it occurs.

• MAIN, TALK, USER, USER_TALK, OTHER: cross-references to pages in other namespaces.

• EXTERNAL: list of hyperlinks to pages outside Wikipedia.

• TEMPLATE: list of all templates used by the page, each listed as many times as it occurs.

• COMMENT: revision comments as entered by the revision author.

• MINOR: a Boolean flag (0|1) denoting whether the edit was marked as minor by the author.

• TEXTDATA: the word count of the revision's plain text.

• An empty line, denoting the end of the current record.
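The record layout above could be parsed with a short sketch like the following. It rests on assumptions this README does not state: that each field sits on its own line, starting with its tag and followed by space-separated values; that the REVISION line lists its six values in the order given above; and that the MAIN line holds a page's out-links to other articles. `parse_record` and `read_records` are hypothetical names, not functions from this repository.

```python
from datetime import datetime


def parse_record(lines):
    """Parse one record (the lines between two empty lines) into a dict.

    Assumes each field is a single line of the form "TAG value1 value2 ...".
    """
    fields = {}
    for line in lines:
        tag, _, rest = line.partition(" ")
        fields[tag] = rest.split()
    article_id, rev_id, title, timestamp, username, user_id = fields["REVISION"][:6]
    return {
        "article_id": int(article_id),
        "rev_id": int(rev_id),
        "title": title,
        # Replace the trailing "Z" so fromisoformat also accepts it before Python 3.11.
        "timestamp": datetime.fromisoformat(timestamp.replace("Z", "+00:00")),
        "user": username,
        "user_id": user_id,  # kept as a string: anonymous users have an IP address here
        "out_links": fields.get("MAIN", []),  # assumed: MAIN lists links to articles
        "minor": fields.get("MINOR") == ["1"],
    }


def read_records(text):
    """Split the input on empty lines and parse every non-empty record."""
    record = []
    for line in text.splitlines():
        if line.strip():
            record.append(line)
        elif record:
            yield parse_record(record)
            record = []
    if record:  # the last record may not be followed by an empty line
        yield parse_record(record)
```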

Formula

$$PR(u) = 0.15 + 0.85 \sum_{v \,:\, (v,u) \in S} \frac{PR(v)}{L(v)}$$

where S is the set of links (v, u) from page v to page u, and L(v) is the number of out-links of page v.
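A direct, in-memory reading of this formula is sketched below. It assumes every page starts with a score of 1.0 and that the update is applied for a fixed number of iterations; neither the starting value nor the iteration count is specified here, so both are assumptions, as is the `pagerank` function name. Pages with no out-links contribute nothing, and pages with no in-links settle at 0.15.

```python
def pagerank(out_links, iterations=10):
    """Apply PR(u) = 0.15 + 0.85 * sum(PR(v) / L(v)) for a fixed number of rounds.

    out_links maps each page v to the list of pages it links to; L(v) is its length.
    """
    # Every page seen as a link source or as a link target gets a score.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    ranks = {page: 1.0 for page in pages}  # assumed starting score
    for _ in range(iterations):
        incoming = {page: 0.0 for page in pages}
        for v, targets in out_links.items():
            for u in targets:
                # Page v splits its current score evenly over its L(v) out-links.
                incoming[u] += ranks[v] / len(targets)
        ranks = {page: 0.15 + 0.85 * incoming[page] for page in pages}
    return ranks
```

Because the formula adds 0.15 rather than 0.15/N, the scores are not a probability distribution: when every page has at least one out-link, they sum to the number of pages rather than to 1.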

Output Format

Each output line contains an article title followed by its PageRank score:

Article_1 score1

Article_2 score2

Article_3 score3
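Producing lines in this output format could look like the sketch below. Ordering by descending score (ties broken by title) is an assumption, since the format above does not specify an order, and `format_output` is a hypothetical helper.

```python
def format_output(ranks):
    """Render a {title: score} dict as "title score" lines, highest score first."""
    # Assumed ordering: descending score, then alphabetical title for ties.
    ordered = sorted(ranks.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{title} {score}" for title, score in ordered)
```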
