Wikipedia Data Analysis Toolkit
Authors: Felipe Ortega, Aaron Halfaker. License: GPLv3 (http://www.gnu.org/licenses/gpl.txt).
WikiDAT (Wikipedia Data Analysis Toolkit) aims to provide an extensible toolkit for Wikipedia data analysis, based on MySQL, Python and R.
Each module implements a different type of analysis and stores its output in the subdirectories results, figs or traces, which are created in the module's directory. Module source code includes Python and R code, with inline comments, covering both the data preparation/cleaning and the data analysis steps. An important goal is to illustrate case examples of interesting analyses with Wikipedia data, following a didactic approach.
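As a rough illustration of the output layout described above, the following minimal Python sketch creates the results, figs and traces subdirectories inside a module's directory. The helper name prepare_output_dirs and the example module path are hypothetical and are not part of WikiDAT itself. Only the three subdirectory names come from the description above.

```python
from pathlib import Path

# Subdirectory names taken from the module description above.
OUTPUT_SUBDIRS = ("results", "figs", "traces")


def prepare_output_dirs(module_dir):
    """Create the output subdirectories inside module_dir.

    Hypothetical helper for illustration only; returns a dict that maps
    each subdirectory name to its Path.
    """
    module_path = Path(module_dir)
    created = {}
    for name in OUTPUT_SUBDIRS:
        subdir = module_path / name
        # exist_ok=True makes repeated runs of the same module harmless.
        subdir.mkdir(parents=True, exist_ok=True)
        created[name] = subdir
    return created
```

A module script could call prepare_output_dirs(Path(__file__).parent) once at start-up and then write tables, figures and log traces to the returned paths.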
The long-term goal is to add more case examples progressively, so that the toolkit covers many of the common quantitative analyses that can be undertaken with Wikipedia data. In the future, this may also include tools for distributed computing to support the analysis of very large data sets in high-resolution studies.