Coder Social home page Coder Social logo

wikimedia / articlequality Goto Github PK

View Code? Open in Web Editor NEW
47.0 21.0 30.0 9.07 MB

Github mirror - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing)

Home Page: https://articlequality.readthedocs.io/en/latest/

License: MIT License

Python 84.91% Makefile 15.04% Shell 0.05%
artificial-intelligence

articlequality's Introduction

Wikipedia article quality classification

โš ๏ธ Warning: As of late 2023, the ORES infrastructure is being deprecated by the WMF Machine Learning team, please check https://wikitech.wikimedia.org/wiki/ORES for more info.

While the code in this repository may still work, it is unmaintained, and as such may break at any time. Special consideration should also be given to machine learning models seeing drift in quality of predictions.

The replacement for ORES and associated infrastructure is Lift Wing: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing

Some Revscoring models from ORES run on the Lift Wing infrastructure, but they are otherwise unsupported (no new training or code updates).

They can be downloaded from the links documented at: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Revscoring_models_(migrated_from_ORES)

In the long term, some or all these models may be replaced by newer models specifically tailored to be run on modern ML infrastructure like Lift Wing.

If you have any questions, contact the WMF Machine Learning team: https://wikitech.wikimedia.org/wiki/Machine_Learning

This library provides a set of utilities for performing automatic detection of assessment classes of Wikipedia articles. For more information, see the full documentation at https://articlequality.readthedocs.io .

Compatible with Python 3.x only. Sorry.

Basic usage

>>> import articlequality
>>> from revscoring import Model
>>>
>>> scorer_model = Model.load(open("models/enwiki.nettrom_wp10.gradient_boosting.model", "rb"))
>>>
>>> text = "I am the text of a page.  I have a <ref>word</ref>"
>>> articlequality.score(scorer_model, text)
{'prediction': 'stub',
 'probability': {'stub': 0.27156163795807853,
                 'b': 0.14707452309674252,
                 'fa': 0.16844898943510833,
                 'c': 0.057668704007171959,
                 'ga': 0.21617801281707663,
                 'start': 0.13906813268582238}}

Install

Requirements

  • Python 3.5, 3.6 or 3.7
  • All the system requirements of revscoring

Installation steps

  1. clone this repository
  2. install the package itself and its dependencies python setup.py install
  3. You can verify that your installation worked by running make enwiki_models to build the English Wikipedia article quality model or make wikidatawiki_models to build the item quality model for Wikidata

Retraining the models

To retrain a model, run make -B MODEL e.g. make -B wikidatawiki_models. This will redownload the labels, re-extract the features from the revisions, and then retrain and rescore the model.

To skip re-downloading the training labels and re-extracting the features, it is enough touch the files in the datasets/ directory and run the make command without the -B flag.

Running tests

Example:

pytest -vv tests/feature_lists/test_wikidatawiki.py

Authors

articlequality's People

Contributors

accraze avatar adamwight avatar aikochou avatar chtnnh avatar gloriany avatar gpaumier avatar guergana avatar haksoat avatar halfak avatar he7d3r avatar isaranto avatar kevinbazira avatar ladsgroup avatar mariushoch avatar mavrikant avatar micgro42 avatar nettrom avatar nikitavbv avatar notconfusing avatar pix1234 avatar ragesoss avatar sharingan-coder01 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.