plagiabot's Introduction

plagiabot

Plagiabot is a copyright violation detection bot.

Repository for Turnitin-based plagiarism detection for Wikipedia. See https://en.wikipedia.org/wiki/Wikipedia:Turnitin for details.

Running the bot

The bot supports the standard pywikibot page generators; for most of them, it checks the latest revision. The bot also supports special generators that check a specific edit based on its diff:

  • recentchanges (DB based)
  • recentchanges_api (api based)
  • live - recent changes using streaming or IRC
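
The diff-based generators work on individual edits rather than whole pages. To illustrate what an API-based recent-changes feed has to provide, here is a minimal Python sketch (not plagiabot's own code) that builds a MediaWiki Action API `list=recentchanges` request and extracts each edit as a `(title, old_revid, revid)` tuple, i.e. the two revisions that define the diff. The function names, the English Wikipedia endpoint and the default limit are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; any MediaWiki wiki exposes api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def recentchanges_url(limit=50, namespace=0):
    """Build a URL listing recent edits and page creations in one namespace."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "edit|new",      # edits and page creations, no log entries
        "rcnamespace": namespace,  # 0 = article namespace
        "rcprop": "title|ids",     # "ids" adds revid and old_revid, i.e. the diff
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def diffs_from_response(data):
    """Yield (title, old_revid, revid) for each change in a decoded JSON response."""
    for change in data.get("query", {}).get("recentchanges", []):
        # For page creations the API reports old_revid as 0.
        yield change["title"], change["old_revid"], change["revid"]
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `diffs_from_response` yields one tuple per edit, which is enough to compare exactly the text added by that edit.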

See the command-line help for more details.

valhallasw@lisilwen:~/src/plagiabot$ python -i plagiabot.py
Logging in...
Finding folder to upload into, with name 'Wikipedia'...
Upload test text to iThenticate...
Polling iThenticate until document has been processed... . . .
Part #14558041 has a 62% match. Getting details...
Details are available on https://api.ithenticate.com/report/14557806/similarity
Sources found were:
 * I  62% 42 words at http://lrd.yahooapis.com/_ylc=X3oDMTVnb2
 * I  62% 42 words at http://lrd.yahooapis.com/_ylc=X3oDMTVncn
 * I  62% 42 words at http://lrd.yahooapis.com/_ylc=X3oDMTU4aD
 * I  62% 42 words at http://www.games2.about2006.com/aboutsit
 * I  62% 42 words at http://medlibrary.org/medwiki/All_your_b
 * I  62% 42 words at http://plumbot.com/All_your_base_are_bel
 * I  62% 42 words at http://lembolies.com/Your
 * I  62% 42 words at http://dvdradix.com/capture-flash-video-
 * I  62% 42 words at http://www.dvdradix.com/capture-flash-vi
 * I  62% 42 words at http://www.reachinformation.com/define/A
 * I  60% 41 words at http://www.reference.com/browse/wiki/se/
 * I  60% 41 words at http://www.buellersdownunder.com/archive
 * I  38% 26 words at http://lrd.yahooapis.com/_ylc=X3oDMTVnbm
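
The lines under "Sources found were:" follow a fixed layout. A minimal sketch for turning them into structured records, assuming every line looks like the sample above (a marker column, a percentage, a word count and a URL). The meaning of the single-letter column ("I" in the sample) is not documented here, so the parser keeps it as-is; the URLs in the sample are cut short, so the parser keeps whatever was printed. The names `Source` and `parse_source_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as " * I  62% 42 words at http://example.org/page"
SOURCE_LINE = re.compile(r"^\s*\*\s+(\S+)\s+(\d+)%\s+(\d+) words at (\S+)\s*$")


class Source(NamedTuple):
    kind: str     # single-letter column; meaning not documented in the README
    percent: int  # match percentage as printed by the bot
    words: int    # number of matching words
    url: str      # source URL exactly as printed (possibly truncated)


def parse_source_line(line: str) -> Optional[Source]:
    """Return a Source for a matching output line, or None for any other line."""
    m = SOURCE_LINE.match(line)
    if m is None:
        return None
    kind, percent, words, url = m.groups()
    return Source(kind, int(percent), int(words), url)
```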

API

You can query suspected diffs using the API at: http://tools.wmflabs.org/eranbot/plagiabot/api.py

Examples:
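
A minimal Python sketch of a client for that endpoint. The parameter names are assumptions: `action=suspected_diffs` only borrows the query name mentioned in the issue tracker, and the real API may expect different parameters. Fetching requires network access, so the URL builder is kept separate from the request itself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://tools.wmflabs.org/eranbot/plagiabot/api.py"


def build_query_url(action="suspected_diffs", **params):
    """Build a query URL. "action" and any extra parameters are hypothetical names."""
    query = {"action": action, **params}
    return API_URL + "?" + urlencode(query)


def fetch(url, timeout=30):
    """Return the raw response body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```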

i18n

The bot supports English, French, Portuguese and Hebrew.

To run the bot on a new language:

  1. Make sure the iThenticate backend indexes pages in the desired language: http://www.ithenticate.com/products/faqs
  2. Add the relevant messages so the bot can skip reverted edits.
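
The README does not say which messages are meant. One plausible reading is that they are the localized edit summaries that reverts leave behind, so the bot can recognize a revert from its summary. A minimal sketch under that assumption; the registry and function names are hypothetical, and the English patterns are modeled on MediaWiki's default rollback and undo summaries.

```python
import re

# Hypothetical registry of localized revert summaries, keyed by language code.
# The English entries follow MediaWiki's default rollback ("Reverted edits by ...")
# and undo ("Undid revision N by ...") summaries; wikis often customize these.
REVERT_PATTERNS = {
    "en": [
        re.compile(r"Reverted edits by "),
        re.compile(r"Undid revision \d+ by "),
    ],
}


def is_revert(summary: str, lang: str) -> bool:
    """Return True if the summary starts like a known revert message for lang."""
    return any(pattern.match(summary) for pattern in REVERT_PATTERNS.get(lang, []))
```

Supporting a new language would then mean adding its patterns to the registry; an unknown language simply never matches.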

UI

The bot can either generate simple wiki report pages or write its results to a database for use by other tools.
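
A minimal sketch of the two output modes, assuming one match is described by a page title, a revision ID, a source URL and a match percentage. The table name, schema and functions are hypothetical; CopyPatrol's real database layout is not described here. SQLite stands in for whatever database the bot actually uses.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database and create a hypothetical results table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS suspected_diffs (
               page_title TEXT NOT NULL,
               rev_id INTEGER NOT NULL,
               source_url TEXT NOT NULL,
               percent INTEGER NOT NULL
           )"""
    )
    return conn


def record_match(conn, page_title, rev_id, source_url, percent):
    """Database mode: store one match for other tools to read."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO suspected_diffs VALUES (?, ?, ?, ?)",
            (page_title, rev_id, source_url, percent),
        )


def wiki_report_line(page_title, rev_id, source_url, percent):
    """Report-page mode: format one match as a wikitext list item."""
    return f"* [[{page_title}]] ([[Special:Diff/{rev_id}|diff]]): {percent}% match with {source_url}"
```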

See also: https://github.com/wikimedia/CopyPatrol

Useful links:

plagiabot's People

Contributors

eranroz, framawiki, jjmc89, ladsgroup, musikanimal, niharika29, ragesoss, valhallasw

plagiabot's Issues

suspected_diffs query should return more items

The current limit is 50. It would be much easier to consume if that number were increased to something well above the number of new issues typically added each day. That way, another system (such as dashboard.wikiedu.org, which has started consuming this data) could make a single suspected_diffs query and get all the new diffs.

Can we try increasing it to something like 500?
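
A minimal sketch of how the proposed change might look on the server side, assuming the API reads a hypothetical `limit` parameter from the query string. It keeps 50 as the default and clamps requests to the proposed maximum of 500, so clients can ask for more without being able to request an unbounded result set. The constant and function names are illustrative, not taken from `api.py`.

```python
DEFAULT_LIMIT = 50  # the current limit mentioned in the issue
MAX_LIMIT = 500     # the maximum proposed in the issue


def effective_limit(requested):
    """Clamp a client-supplied limit (string or None) to the range 1..MAX_LIMIT."""
    if requested is None:
        return DEFAULT_LIMIT
    try:
        value = int(requested)
    except (TypeError, ValueError):
        return DEFAULT_LIMIT  # ignore malformed values instead of failing
    return max(1, min(value, MAX_LIMIT))
```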

SyntaxError: invalid syntax on Earwig's copyvio tool

On https://tools.wmflabs.org/copyvios/?lang=en&project=wikipedia&title=Kevin+Byard&oldid=&action=search&use_engine=0&use_links=0&turnitin=1 I get the following error:

Traceback (most recent call last):
  File "/data/project/copyvios/www/python/src/app.py", line 38, in inner
    return func(*args, **kwargs)
  File "/data/project/copyvios/www/python/src/app.py", line 103, in index
    query = do_check()
  File "./copyvios/checker.py", line 38, in do_check
    _get_results(query, follow=not _coerce_bool(query.noredirect))
  File "./copyvios/checker.py", line 74, in _get_results
    query.turnitin_result = search_turnitin(page.title, query.lang)
  File "./copyvios/turnitin.py", line 22, in search_turnitin
    return TurnitinResult(_make_api_request(page_title, lang))
  File "./copyvios/turnitin.py", line 35, in _make_api_request
    parsed_api_result = literal_eval(result.text)
  File "/usr/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    Error encountered while handling the request. Please report a bug: https://github.com/valhallasw/plagiabot/issues
                    ^
SyntaxError: invalid syntax
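
The traceback shows the client passing the response body straight to `ast.literal_eval`, so a plain-text error message from the API surfaces as a confusing `SyntaxError`. A minimal Python 3 sketch of a more defensive parser (the traceback itself is from Python 2.7); `parse_api_result` and `ApiResponseError` are hypothetical names, not part of either tool.

```python
import ast


class ApiResponseError(ValueError):
    """Raised when the API body is not a Python literal (e.g. an error message)."""


def parse_api_result(text):
    """Parse the body with ast.literal_eval, reporting non-literal bodies clearly."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        snippet = text.strip()[:200]  # keep the message short
        raise ApiResponseError(f"API response is not a Python literal: {snippet!r}") from exc
```

A caller can then catch `ApiResponseError` and show "Turnitin results unavailable" instead of a traceback.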

Transfer repository to a separate GitHub organisation

Would it make sense to move this repository under its own organisation, e.g. https://github.com/plagiabot/plagiabot? Apart from the initial implementation many years ago, I haven't really been involved.

I'm not entirely sure what this would do in terms of redirects (e.g. whether an existing git checkout will still be able to pull), but this should be relatively easy to fix.

Another option is moving everything to Wikimedia's new GitLab instance: https://gitlab.wikimedia.org