
wiki-pageview-floor's Introduction

An attempt to find and analyse the least viewed articles on English Wikipedia. For a writeup of this investigation, see my blog post, "In search of the least viewed article on Wikipedia".

Data pipeline

In the course of this investigation, I looked at a few different sets of articles. In each case, the processing steps were basically the same.

The first step is to use Quarry to run a SQL query which generates a csv file with page metadata. The main datasets and corresponding queries were:

The next step is to run get_views.py, passing in the filename of the csv downloaded from Quarry. This creates a csv with one row per article: a column with the article name, 12 columns with that article's monthly page views in 2021, and a final convenience column with the total for the year.
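As a rough illustration of the output shape described above, here is a minimal sketch that builds one such row per article. It is not the code in get_views.py: the README does not say where that script gets its counts, so the use of the public Wikimedia Pageviews REST API, the column names, and the helper names (`fetch_items`, `to_row`, `write_views`) are all assumptions made for this example.

```python
import csv
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: counts come from the Wikimedia Pageviews REST API, restricted to
# human ("user") traffic on English Wikipedia. get_views.py may do otherwise.
API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/user/{title}/monthly/2021010100/2021123100")

MONTHS = [f"2021-{m:02d}" for m in range(1, 13)]


def fetch_items(title):
    """Fetch the raw monthly items for one article (performs a network call)."""
    quoted = urllib.parse.quote(title.replace(" ", "_"), safe="")
    req = urllib.request.Request(
        API.format(title=quoted),
        headers={"User-Agent": "pageview-sketch/0.1 (example)"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["items"]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the API reports "no data in range" as 404
            return []
        raise


def to_row(title, items):
    """Turn API items into [title, 12 monthly counts, yearly total].

    Months missing from the response are treated as zero views.
    """
    by_month = {}
    for item in items:
        ts = item["timestamp"]  # formatted as YYYYMMDDHH, e.g. "2021030100"
        by_month[f"{ts[:4]}-{ts[4:6]}"] = item["views"]
    monthly = [by_month.get(m, 0) for m in MONTHS]
    return [title, *monthly, sum(monthly)]


def write_views(titles, out_path, fetch=fetch_items):
    """Write one row per title; `fetch` can be swapped out for offline use."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", *MONTHS, "total"])
        for title in titles:
            writer.writerow(to_row(title, fetch(title)))
```

Keeping the parsing in `to_row` separate from the network call makes the row-building logic easy to check without hitting the API.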

merge.py merges the csv files from the previous two steps: the page metadata from Quarry and the monthly page views from get_views.py.
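The merge is essentially a join on the article title. The sketch below shows one way to do it with only the standard library. It is not the contents of merge.py, and the key column names (`page_title` for the Quarry file, `title` for the views file) are hypothetical; adjust them to match the real headers.

```python
import csv


def merge_rows(meta_rows, view_rows, meta_key="page_title", views_key="title"):
    """Inner-join two lists of dict rows on the article title."""
    views_by_title = {row[views_key]: row for row in view_rows}
    merged = []
    for meta in meta_rows:
        views = views_by_title.get(meta[meta_key])
        if views is None:
            continue  # no pageview row for this article, so skip it
        combined = dict(meta)
        # Copy every views column except the duplicate title column.
        combined.update({k: v for k, v in views.items() if k != views_key})
        merged.append(combined)
    return merged


def merge_files(meta_path, views_path, out_path):
    """Read both csv files, join them, and write the result. Returns row count."""
    with open(meta_path, newline="", encoding="utf-8") as f:
        meta_rows = list(csv.DictReader(f))
    with open(views_path, newline="", encoding="utf-8") as f:
        view_rows = list(csv.DictReader(f))
    merged = merge_rows(meta_rows, view_rows)
    if not merged:
        return 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(merged[0].keys()))
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

An inner join drops articles that appear in only one file. If the real pipeline needs to keep them, a left join with zero-filled view counts would be the alternative.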

The subsequent analysis and visualization of the merged data is done in the included IPython notebooks.
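The notebooks themselves are the reference for the analysis. Purely as an illustration of the kind of question the merged data can answer, this sketch picks the rows with the lowest yearly totals. The `total` column name and the `least_viewed` helper are assumptions for this example, not names taken from the notebooks.

```python
import heapq


def least_viewed(rows, n=10, total_key="total"):
    """Return the n rows with the smallest yearly view totals, lowest first."""
    # Values read from a csv are strings, so convert before comparing.
    return heapq.nsmallest(n, rows, key=lambda row: int(row[total_key]))
```

For example, `least_viewed(rows, n=20)` on rows loaded with `csv.DictReader` would give the 20 lowest-traffic articles in the merged file.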

wiki-pageview-floor's People

Contributors

colinmorris

