database-reports's Introduction

Database Reports

Generates statistical reports which are used by community members to improve Wikipedia.

This project allows the Community Tech bot to periodically update these reports on different language Wikipedias. The project currently supports report generation for the English, Vietnamese, Korean and Hindi Wikipedias.

Specific statistics that the reports support:

  • Unused templates
  • Forgotten articles
  • Most used templates
  • New wiki projects
  • Talk pages by size
  • Orphaned talk pages
  • Unused file redirects
  • Pages with most revisions
  • Page count by namespace
  • Most edited articles last month
  • PRODed articles with deletion logs
  • Active editors with the longest-established accounts

Installation

Copy config.py.example to config.py and fill in the credentials for your database user and bot account. Make sure to properly set permissions on the new file with chmod 600 config.py.

Virtualenv is recommended for installing the Python environment:

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

After installation, either activate the virtualenv as shown above or use venv/bin/python to run scripts.

Generating a report

Run python3 main.py test orphaned_talk. The script takes two arguments: in this example, test refers to test.wikipedia.org and orphaned_talk is the type of report you are requesting. The command prints the name of the page to which the report was written.

You can pass the --dry-run flag to print output to stdout rather than editing the wiki.
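
As a rough illustration of the command line described above, the following sketch parses the two positional arguments and the --dry-run flag with argparse. It is not the project's actual main.py: parse_args, publish and save_page are hypothetical names chosen for this example, and the real script may handle its arguments differently.

```python
import argparse


def parse_args(argv):
    """Parse `<wiki> <report> [--dry-run]` (hypothetical layout of the CLI)."""
    parser = argparse.ArgumentParser(description="Generate a database report")
    parser.add_argument("wiki", help="wiki to run against, e.g. 'test'")
    parser.add_argument("report", help="report type, e.g. 'orphaned_talk'")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the report instead of editing the wiki")
    return parser.parse_args(argv)


def publish(title, text, dry_run, save_page):
    """Print the report when dry_run is set, otherwise hand it to save_page."""
    if dry_run:
        print(title)
        print(text)
        return None
    # save_page is a hypothetical callable that would perform the wiki edit.
    return save_page(title, text)


if __name__ == "__main__":
    args = parse_args(["test", "orphaned_talk", "--dry-run"])
    publish("Example report title", "report body", args.dry_run,
            save_page=lambda title, text: None)
```

Passing the save function in as a parameter keeps the dry-run path free of any wiki access, which makes it easy to try out locally.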

On Toolforge, the reports are defined in the jobs.yaml file. See the Toolforge jobs framework documentation for more information.

If you need to run a one-off job on Toolforge, find the corresponding command in jobs.yaml and schedule it using toolforge-jobs.

Adding support for a report

  • To add support for a new statistic that you would like to see in a report, declare a function in main.py and define it in reports.py
  • To provide translations for a specific language, add its dictionary to i18n/i18n.py
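
The two steps above can be sketched in one self-contained file. This is only an illustration under assumptions: the Main and Reports classes stand in for main.py and reports.py, I18N stands in for the dictionaries in i18n/i18n.py, and the report name my_report, the key my-report-page-title and the fallback to English are all invented for the example.

```python
# Stand-in for i18n/i18n.py: one dictionary per language (keys are illustrative).
I18N = {
    "en": {"my-report-page-title": "My report"},
    "fr": {"my-report-page-title": "Mon rapport"},
}


class Reports:
    """Stand-in for reports.py, where each report is defined."""

    def __init__(self, lang):
        self.lang = lang

    def translate(self, key):
        # Falling back to English for a missing language is an assumption.
        return I18N.get(self.lang, {}).get(key, I18N["en"][key])

    def my_report(self):
        # A real report would query the database and publish a page.
        return self.translate("my-report-page-title")


class Main:
    """Stand-in for main.py, where each report is declared and dispatched."""

    def __init__(self, lang):
        self.rep = Reports(lang)

    def my_report(self):
        return self.rep.my_report()

    def run(self, report_name):
        # Look up the declared report by the name given on the command line.
        method = getattr(self, report_name, None)
        if report_name.startswith("_") or report_name == "run" or not callable(method):
            raise ValueError(f"Unknown report: {report_name}")
        return method()


if __name__ == "__main__":
    print(Main("fr").run("my_report"))
```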

Contributing

Bug reports, fixes, and new features are welcome. If you'd like to contribute code, please:

  • Fork the project
  • Start a branch named for your new feature or bug
  • Create a pull request

database-reports's People

Contributors

dayllanmaza, framawiki, kaldari, maxsem, musikanimal, niharika29, revi, ritikbhandari, srish, swinxy, theresnotime, thparkth, usernamekiran

database-reports's Issues

The script should retry after 'Lost connection to MySQL server' error

Wikipedia:Rapports/Modèles inclus sur le plus grand nombre de pages
orphaned_talk
Wikipedia:Rapports/Pages de discussion orphelines
unused_templates
Wikipedia:Rapports/Modèles inutilisés
article_by_size
Wikipedia:Rapports/Articles par taille
most_edited_page_last_month
Traceback (most recent call last):
  File "/shared/pywikipedia/core/pwb.py", line 263, in <module>
    if not main():
  File "/shared/pywikipedia/core/pwb.py", line 256, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "/shared/pywikipedia/core/pwb.py", line 121, in run_python_file
    main_mod.__dict__)
  File "main.py", line 69, in <module>
    main(sys.argv)
  File "main.py", line 17, in main
    method()
  File "main.py", line 63, in most_edited_page_last_month
    self.rep.most_edited_page_last_month()
  File "reports.py", line 375, in most_edited_page_last_month
    cur.execute( query )
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
CRITICAL: Closing network session.
Network session closed.
<class '_mysql_exceptions.OperationalError'>
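
One possible shape for the requested retry is sketched below. It is not the project's code: run_with_retry and LostConnection are hypothetical names, with LostConnection standing in for the driver's OperationalError so the example runs without a database. The only detail taken from the traceback is error code 2013. A real fix would probably also need to reopen the database connection before trying again, which this sketch leaves to the caller's run_query.

```python
import time


class LostConnection(Exception):
    """Stand-in for the driver's OperationalError; args[0] carries the error code."""


# 2013 is the code shown in the traceback above.
RETRYABLE_CODES = {2013}


def run_with_retry(run_query, attempts=3, delay=0.0, error_cls=LostConnection):
    """Call run_query(), retrying when it fails with a retryable error code."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except error_cls as exc:
            code = exc.args[0] if exc.args else None
            # Give up on other errors, or once the attempts are used up.
            if code not in RETRYABLE_CODES or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear back-off
```

With a real driver, error_cls would be the driver's own exception class and run_query a small function that reconnects, runs cur.execute(query) and returns the rows.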

Improve unused templates report

Hello! A request on frwiki asks for improvements to the unused templates report:

  • allow a category to whitelist pages so that they are skipped in the report, something like "this template can't be deleted"
  • add both the creator's username and the creation date

Thanks!

Use pywikibot?

It's really hard to deal with special characters in page titles with mwclient; #4 is still present. @Niharika29, may I port the code to pywikibot?

mwclient.errors.APIError: (u'invalidtitle', u'Bad title "Wikipedia:Rapports/Liens externes les plus utilis%C3%A9ss dans l%27espace principal")
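
The rejected title contains percent escapes (%C3%A9 and %27), which suggests it was URL-encoded somewhere before reaching the API. As a small illustration only, the sketch below decodes that string with the standard library. Where the encoding happens in the tool, and whether decoding is the right fix, are assumptions that this example does not settle.

```python
from urllib.parse import unquote

# Title copied from the error message above.
bad_title = ("Wikipedia:Rapports/Liens externes les plus "
             "utilis%C3%A9ss dans l%27espace principal")

# unquote() turns UTF-8 percent escapes back into the characters they encode.
decoded_title = unquote(bad_title)

if __name__ == "__main__":
    print(decoded_title)
```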

Unicode in page title

Hi, I can't use special chars like "Pages oubliées" in page title for frwiki. It works on other fields.

Traceback (most recent call last):
  File "main.py", line 63, in <module>
    main(sys.argv)
  File "main.py", line 17, in main
    method()
  File "main.py", line 27, in forgotten_articles
    self.rep.forgotten_articles()
  File "/data/project/framabot/test/database-reports/reports.py", line 80, in forgotten_articles
    self.publish_report( 'forgotten-articles-page-title', text )
  File "/data/project/framabot/test/database-reports/reports.py", line 414, in publish_report
    page = self.site.Pages[ reports_base_url + report_title ]
  File "/usr/lib/python2.7/dist-packages/mwclient/listing.py", line 197, in __getitem__
    return self.get(name, None)
  File "/usr/lib/python2.7/dist-packages/mwclient/listing.py", line 209, in get
    namespace = self.guess_namespace(name)
  File "/usr/lib/python2.7/dist-packages/mwclient/listing.py", line 220, in guess_namespace
    if name.startswith(u'%s:' % self.site.namespaces[ns].replace(' ', '_')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 50: ordinal not in range(128)

Thanks

Add whether a page is protected

Useful, for example, to check that the most-transcluded templates in most_used_templates are protected.

German text for forgotten-pages

The forgotten-pages report has been updated to use "last edit time" rather than "last touched" time, since the latter has been producing unhelpful results since 2012, when apparently all pages were "touched".

The intro text and column header names for enwiki have been updated accordingly, but the report is also generated against dewiki and that text needs to be updated. Also the dewiki intro refers to a different report which doesn't appear to be regularly updated, and which is now redundant with the new functionality of forgotten-pages.

Categories in reports

Some report pages, such as pages with most edits (and probably others), end up categorized, because the category links in the table are not disabled.
Perhaps add a colon ":" before category names.
Thanks
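
In wikitext, a leading colon turns [[Category:Foo]] into an ordinary link, [[:Category:Foo]], so the page is no longer placed in the category. A minimal sketch of the suggestion follows. The function name escape_category_links is hypothetical, and the list of namespace names to match (which differs per wiki language) is an assumption the caller must supply.

```python
import re


def escape_category_links(wikitext, namespaces=("Category",)):
    """Prefix category links with ':' so they render as links, not categories."""
    names = "|".join(re.escape(name) for name in namespaces)
    # Match '[[Category:' but not the already-escaped '[[:Category:'.
    pattern = re.compile(r"\[\[\s*(" + names + r")\s*:", re.IGNORECASE)
    return pattern.sub(r"[[:\1:", wikitext)


if __name__ == "__main__":
    row = "| [[Catégorie:Portail]] || 1234"
    print(escape_category_links(row, namespaces=("Category", "Catégorie")))
```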
