matmarex / bio-index

A set of scripts used to import data from the Polish Wikipedia's index of biographies into Wikidata, and to generate and manage that index.

License: Other

JavaScript 37.53% CSS 1.62% Ruby 59.94% Shell 0.91%

bio-index's Introduction

Synopsis

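This is a set of scripts used for:

  • importing data from the Polish Wikipedia's index of biographies into Wikidata;
  • generating and managing that index.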

Written in Ruby and JavaScript.

License

The MIT License, partially dual-licensed under CC BY-SA to allow certain files to be freely pasted on pages of Wikimedia projects.

For details and a list of contributors, see LICENSE.

Libraries

A whole lot. Apart from the Ruby standard library, some of the scripts require the following gems (in the latest versions available as of 2013-09-27):

  • roman
  • json
  • nokogiri
  • parallel
  • sunflower
  • unicode_utils
  • unidecoder

The code has only been tested on Ruby 1.9.3. It will probably run on newer Rubies, too.

Details

Most of the text (in Polish) and configuration (for the Polish Wikipedia) is hardcoded in the .rb and .js files. Sorry 'bout that.

Brief description of each file:

  • Wikipedia gadget

    • bioindex-editor.css and bioindex-editor.js – a gadget that allows editors to modify the Wikidata descriptions and Wikipedia defaultsorts straight from the index itself.
    • bioindex-editor-bootstrap.js – minimal loader for the gadget, to be added to common.js.
  • Primary scripts

    • build-index.rb – aggregates data from all sources and uploads it to the index. Takes a few hours to run; generates temporary 'savepoints' which are used as the starting point of the next run (this allows it to be terminated at will without losing all the work).
    • parse-index.rb – parses the old index of biographies and dumps the data in JSON format to the current directory.
    • upload-index.rb – uploads the data generated by parse-index.rb to Wikidata.
    • sprzeczne.rb – compares birth and death years aggregated from categories with those from the old index of biographies and returns a pretty table.
  • Mini-libraries

    • intro-extractor.rb – extracts brief descriptions and lifetime information from given Wikipedia pages.
    • roman.rb – wrapper for the roman gem to fix its broken handling of negative numbers (used to deal with centuries BC).
    • savepoint.rb – short wrapper for Marshal.load and Marshal.dump to/from a file.
  • Miscellanea

    • .gitignore – lists the temporary files that running the Ruby scripts might generate.
    • LICENSE – MIT / CC BY-SA.
    • README.md – this file.
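The savepoint mechanism behind build-index.rb (Marshal.dump to a file, Marshal.load on the next run) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `Savepoint` module, its `save`/`load` methods and the `process_all` loop are hypothetical names, and `title.upcase` stands in for the real per-page work.

```ruby
# Hypothetical sketch of savepoint.rb; the module and method names are
# illustrative and not taken from the repository.
module Savepoint
  # Serialize any Marshal-able object (Hash, Array, String, ...) to a file.
  # Writing to a temporary file first and renaming it afterwards means an
  # interruption mid-write does not clobber the previous savepoint.
  def self.save(path, object)
    tmp = "#{path}.tmp"
    File.open(tmp, 'wb') { |f| Marshal.dump(object, f) }
    File.rename(tmp, path)
  end

  # Restore a saved object, or return +default+ if no savepoint exists yet.
  def self.load(path, default = nil)
    return default unless File.exist?(path)
    File.open(path, 'rb') { |f| Marshal.load(f) }
  end
end

# Resumable loop in the spirit of build-index.rb (illustrative only): the state
# is saved after every item, so a run killed halfway through resumes from the
# last savepoint instead of starting over.
def process_all(titles, savepoint_path)
  state = Savepoint.load(savepoint_path, { 'done' => {} })
  titles.each do |title|
    next if state['done'].key?(title)   # finished in an earlier run
    state['done'][title] = title.upcase # placeholder for the real per-page work
    Savepoint.save(savepoint_path, state)
  end
  state['done']
end
```

Only load savepoints you created yourself: `Marshal.load` on untrusted data is unsafe.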
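roman.rb exists because centuries BC show up as negative numbers. The underlying idea, converting the absolute value and marking the sign separately, is shown below as a standalone converter rather than a wrapper, since the roman gem's API is not described here. The `to_roman` name and the Polish "p.n.e." (BC) suffix are assumptions made for illustration.

```ruby
# Standalone sketch (does not use the roman gem): converts a non-zero integer
# to Roman numerals, treating negative values as centuries BC. The Polish
# "p.n.e." suffix is an assumed output format, chosen for illustration.
ROMAN_PAIRS = [
  [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
  [100, 'C'], [90, 'XC'], [50, 'L'], [40, 'XL'],
  [10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I']
].freeze

def to_roman(number)
  raise ArgumentError, 'Roman numerals have no zero' if number.zero?
  remaining = number.abs # convert the magnitude, deal with the sign afterwards
  numeral = String.new
  ROMAN_PAIRS.each do |value, letters|
    count, remaining = remaining.divmod(value)
    numeral << letters * count
  end
  number < 0 ? "#{numeral} p.n.e." : numeral
end
```

For example, `to_roman(19)` gives `"XIX"` and `to_roman(-5)` gives `"V p.n.e."`.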
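The comparison performed by sprzeczne.rb can be sketched like this, assuming each source has already been reduced to a hash mapping an article title to `[birth_year, death_year]` (nil when unknown). The input shape, the `Conflict` struct, `find_conflicts` and `format_table` are all hypothetical, and the plain-text table is only one possible take on "a pretty table".

```ruby
# Hypothetical sketch of the comparison done by sprzeczne.rb. Each source maps
# an article title to [birth_year, death_year], with nil for unknown years.
Conflict = Struct.new(:title, :field, :categories, :old_index)
FIELDS = %w[birth death].freeze

# Report every year that is known in both sources but differs between them.
def find_conflicts(from_categories, from_old_index)
  conflicts = []
  from_categories.keys.sort.each do |title|
    index_years = from_old_index[title]
    next unless index_years                 # article absent from the old index
    FIELDS.each_with_index do |field, i|
      a = from_categories[title][i]
      b = index_years[i]
      next if a.nil? || b.nil? || a == b    # unknown on one side, or in agreement
      conflicts << Conflict.new(title, field, a, b)
    end
  end
  conflicts
end

# Render the conflicts as a fixed-width plain-text table.
def format_table(conflicts)
  header = ['Article', 'Field', 'Categories', 'Old index']
  body = conflicts.map { |c| [c.title, c.field, c.categories.to_s, c.old_index.to_s] }
  rows = [header] + body
  widths = header.each_index.map { |col| rows.map { |row| row[col].length }.max }
  rows.map { |row| row.each_with_index.map { |cell, col| cell.ljust(widths[col]) }.join(' | ').rstrip }
      .join("\n")
end
```

Years missing from either source are skipped rather than reported, on the assumption that only genuine disagreements need an editor's attention.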
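intro-extractor.rb pulls descriptions and lifetime information out of article intros. A rough idea of how that can work on the typical Polish intro "Name (ur. …, zm. …) – description." is sketched below. This is an assumption-laden illustration, not the script itself: it works on plain text instead of HTML, `extract_intro` is a hypothetical name, and it only recognises that single pattern ("ur." and "zm." abbreviate "born" and "died").

```ruby
# Hypothetical sketch, not the actual intro-extractor.rb: works on plain text
# (not HTML) and assumes the common Polish Wikipedia pattern
#   "Name (ur. <date>, zm. <date>) – description."
# Real intros vary far more than this handles.
def extract_intro(text)
  parens = text[/\((.*?)\)/, 1] || ''         # contents of the first parentheses
  birth = parens[/ur\.[^,;]*?(\d{3,4})/, 1]   # first 3-4 digit number after "ur."
  death = parens[/zm\.[^,;]*?(\d{3,4})/, 1]   # first 3-4 digit number after "zm."
  description = text[/\)\s*[–-]\s*(.+?)\.?\s*\z/, 1] # text after ") –", minus the final dot
  {
    birth: birth && birth.to_i,
    death: death && death.to_i,
    description: description
  }
end
```

For a living person the "zm." part is simply absent, so `:death` comes back as nil.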

Contributors

legoktm, matmarex
