wiktionary's People

Contributors

alkamid, peterbowman

wiktionary's Issues

in JS, warn the user if example not wikified

Sometimes it is hard to notice that a word has been left unlinked. Perhaps the script could be changed so that, if not all words are linked, the green approval button cannot be enabled, possibly with an appropriate message? tscaodp 11:44, 27 Feb 2016 (CET)

in orphaned_examples.check_if_wikified(), handle names

An example that should be accepted:

[[ważny|Ważnych]] [[argument]]ów [[na]] [[to]], [[że]] [[Gomułka]] [[powinien]] [[odejść]], [[mieć|miała]] [[dostarczyć]] [[radziecki]]emu [[przywódca|przywódcy]] [[rozmowa]] [[ambasador]]a Aristowa [[z]] [[szef]]em [[polski]]ej [[partia|partii]]. [[dojść|Doszło]] [[do]] [[ona|niej]] [[we]] [[wtorek]] [[15]] [[grudzień|grudnia]] [[po]] [[południe|południu]] [[w]] [[gmach]]u [[KC]]. [[Piotr]] [[Kostikow]] [[relacjonować|relacjonuje]], [[że]] [[Gomułka]] [[zwymyślać|zwymyślał]] Aristowa: - [[co|Co]] [[wy]] [[tu]] [[ja|mnie]] [[egzaminować|egzaminujecie]] - [[krzyknąć|krzyknął]] [[usłyszeć|usłyszawszy]] [[pytanie|pytania]] [[ambasador]]a. - [[mieć|Macie]] [[swój|swoje]] [[informacja|informacje]] [[i]] [[wiedzieć|wiecie]], [[jaki]] [[charakter]] [[mieć|mają]] [[demonstracja|demonstracje]] [[w]] [[Gdańsk]]u.
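Below is a minimal sketch of a check that tolerates names. It assumes that an unlinked word starting with a capital letter is a proper name, as "Aristowa" is in the example above. `unlinked_words` and `is_wikified_allowing_names` are hypothetical helpers, not the repository's `check_if_wikified()`. The gadget idea in the first issue would need the same "are any words left unlinked?" test, only written in JavaScript.

```python
import re

# A wikilink plus any letters glued to its end, e.g. "[[ambasador]]a".
LINK_RE = re.compile(r"\[\[[^\]]+\]\][^\W\d_]*")
# A run of letters, optionally joined by hyphens ("niedzielno-obiadowymi").
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def unlinked_words(wikitext):
    """Return the words left over once every wikilink is blanked out."""
    return WORD_RE.findall(LINK_RE.sub(" ", wikitext))


def is_wikified_allowing_names(wikitext, allow_names=True):
    """True if all words are linked, optionally tolerating capitalised words.

    Assumption: an unlinked word that starts with an upper-case letter is a
    proper name (such as "Aristowa") and does not need a link.
    """
    leftovers = unlinked_words(wikitext)
    if allow_names:
        leftovers = [w for w in leftovers if not w[0].isupper()]
    return not leftovers


example = "[[ambasador]]a Aristowa [[z]] [[szef]]em [[polski]]ej [[partia|partii]]."
print(unlinked_words(example))              # ['Aristowa']
print(is_wikified_allowing_names(example))  # True
```

The capital-letter heuristic also accepts an unlinked word at the start of a sentence. A stricter variant would ignore capitals only after the first word of each sentence.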

wymowa.py - edit conflict

ERROR: Traceback (most recent call last):
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 372, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 374, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1147, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 313, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.4/socket.py", line 371, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.4/ssl.py", line 746, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.4/ssl.py", line 618, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/adapters.py", line 370, in send
    timeout=timeout
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 597, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py", line 245, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/packages/six.py", line 310, in reraise
    raise value
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 376, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 304, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pl.wiktionary.org', port=443): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/shared/pywikipedia/core/pywikibot/data/api.py", line 1954, in submit
    body=body, headers=headers)
  File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1417, in wrapper
    return obj(*__args, **__kw)
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 322, in request
    r = fetch(baseuri, method, body, headers, **kwargs)
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 477, in fetch
    error_handling_callback(request)
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 395, in error_handling_callback
    raise request.data
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 374, in _http_process
    verify=not ignore_validation)
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/adapters.py", line 433, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='pl.wiktionary.org', port=443): Read timed out. (read timeout=30)

WARNING: Waiting 5 seconds before retrying.
WARNING: API error editconflict: Edit conflict detected
Traceback (most recent call last):
  File "/shared/pywikipedia/core/pywikibot/site.py", line 4978, in editpage
    result = req.submit()
  File "/shared/pywikipedia/core/pywikibot/data/api.py", line 2189, in submit
    raise APIError(**result['error'])
pywikibot.data.api.APIError: editconflict: Edit conflict detected [help:See https://pl.wiktionary.org/w/api.php for API usage]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/project/alkamidbot/scripts/wymowa.py", line 232, in <module>
    main()
  File "/data/project/alkamidbot/scripts/wymowa.py", line 189, in main
    output_main.put(final, comment = 'Aktualizacja listy', botflag=False)
  File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1417, in wrapper
    return obj(*__args, **__kw)
  File "/shared/pywikipedia/core/pywikibot/page.py", line 1291, in put
    **kwargs)
  File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1417, in wrapper
    return obj(*__args, **__kw)
  File "/shared/pywikipedia/core/pywikibot/page.py", line 1208, in save
    cc=apply_cosmetic_changes, quiet=quiet, **kwargs)
  File "/shared/pywikipedia/core/pywikibot/page.py", line 1233, in _save
    raise err
  File "/shared/pywikipedia/core/pywikibot/page.py", line 1219, in _save
    watch=watch, bot=botflag, **kwargs)
  File "/shared/pywikipedia/core/pywikibot/site.py", line 1329, in callee
    return fn(self, *args, **kwargs)
  File "/shared/pywikipedia/core/pywikibot/site.py", line 4998, in editpage
    raise self._ep_errors[err.code](page)
pywikibot.exceptions.EditConflict: Page [[pl:Wikipedysta:AlkamidBot/wymowa]] could not be saved due to an edit conflict
CRITICAL: Closing network session.
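The log shows a read timeout on the save request, then a retry, then `editconflict`. One plausible reading, which the log does not confirm, is that the first request did reach the server, so the retry conflicted with the bot's own edit. The sketch below handles that case: on a conflict it re-reads the page and counts the save as successful if the page already contains the intended text. The helper name and its callable parameters are hypothetical.

```python
def save_with_conflict_check(save, fetch_current_text, new_text, conflict_error):
    """Save a page; on an edit conflict, check whether our text is already there.

    save               -- callable that performs the edit (hypothetical wrapper)
    fetch_current_text -- callable returning the page's current wikitext
    new_text           -- the text we tried to save
    conflict_error     -- exception class raised on an edit conflict

    Returns True if the page ends up containing new_text, False otherwise.
    """
    try:
        save()
        return True
    except conflict_error:
        # The conflicting revision may be our own: a request that timed out on
        # the client side can still have been applied by the server.
        # MediaWiki normally strips trailing whitespace on save, so ignore it.
        return fetch_current_text().rstrip() == new_text.rstrip()
```

With pywikibot, the arguments could be filled in as follows:

- `save` wraps the existing `output_main.put(...)` call.
- `fetch_current_text` returns `output_main.get(force=True)`.
- `conflict_error` is `pywikibot.exceptions.EditConflict`, the class named in the traceback.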

phrases_wikilink() not working for title case

In [5]: sjp.phrases_wikilink(sjp.wikilink('a także stało się to dziś w Stolicy Apostolskiej'))
Out[5]: '[[a także]] [[stać|stało]] [[się]] to [[dziś]] [[w]] [[stolica|Stolicy]] [[apostolski|Apostolskiej]]'

timeout on porzucone.py

  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 238, in request
    r = fetch(baseuri, method, body, headers, **kwargs)
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 354, in fetch
    error_handling_callback(request)
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 271, in error_handling_callback
    raise request.data
ConnectionError: HTTPSConnectionPool(host='pl.wiktionary.org', port=443): Max retries exceeded with url: /w/api.php (Caused by <class 'httplib.BadStatusLine'>: '')

WARNING: Waiting 5 seconds before retrying.
Sleeping for 5.0 seconds, 2015-07-12 05:09:09
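pywikibot already retries failed requests on its own; that is what the "Waiting 5 seconds before retrying" lines show. Its retry behaviour can be tuned in `user-config.py` (e.g. `max_retries`, `retry_wait`), but check the configuration documentation of the installed version. If a whole step of porzucone.py also has to survive a longer outage, one option is a generic wrapper with exponential backoff. This is a sketch only. The function name, the defaults, and the exception classes passed as `retry_on` are all assumptions.

```python
import time


def retry_with_backoff(func, retry_on, max_tries=5, first_wait=5.0,
                       max_wait=120.0, sleep=time.sleep):
    """Call func(); on an exception of type retry_on, wait and try again.

    The wait doubles after each failure (capped at max_wait). After
    max_tries failed attempts the last exception is re-raised.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    wait = first_wait
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_tries:
                raise  # give up and propagate the last error
            sleep(wait)
            wait = min(wait * 2, max_wait)
```

`sleep` is passed in as a parameter so that the wrapper can be tested without actually waiting.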

wikified_proportion() bug with numbers

In [2]: orphaned_examples.wikified_proportion('[[tłum|Tłumy]] [[gromadzić|gromadziły]] [[się]] [[przy]] [[ponad]] [[36-metrowy|36-metrowej]] [[daglezjowy|daglezjowej]] [[dłużyca|dłużycy]], [[z]] [[który|której]] [[powstać|powstała]] [[długi|najdłuższa]] [[deska]] [[świat]]a.')
Out[2]: 0.8666666666666667
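Every word in this example is linked, so the expected result is 1.0. The value 0.8666... equals 13/15. That hints that the tokeniser finds one word too many and treats two as unlinked, plausibly around the hyphenated number in "36-metrowej", though this is an unverified guess.

The sketch below computes the proportion in three steps:

1. Render the links to plain text.
2. Remember which character spans came from links, including glued suffixes such as `[[świat]]a`.
3. Count a word as linked only if it lies entirely inside such a span.

`linked_word_ratio` is a hypothetical name. The word pattern (letters or digits, optionally joined by hyphens) is an assumption.

```python
import re

# [[target]] or [[target|label]], plus letters glued after "]]" ("[[świat]]a").
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]([^\W\d_]*)")
# A word: letters/digits, optionally joined by hyphens ("36-metrowej").
WORD_RE = re.compile(r"\w+(?:-\w+)*")


def render_links(wikitext):
    """Return (plain text, list of (start, end) spans that came from links)."""
    parts, spans = [], []
    pos = 0      # read position in wikitext
    length = 0   # length of the plain text produced so far
    for m in LINK_RE.finditer(wikitext):
        before = wikitext[pos:m.start()]
        target, label, suffix = m.groups()
        shown = (label if label is not None else target) + suffix
        parts.extend([before, shown])
        length += len(before)
        spans.append((length, length + len(shown)))
        length += len(shown)
        pos = m.end()
    parts.append(wikitext[pos:])
    return "".join(parts), spans


def linked_word_ratio(wikitext):
    """Share of words that lie entirely inside a link."""
    text, spans = render_links(wikitext)
    words = [m.span() for m in WORD_RE.finditer(text)]
    if not words:
        return 0.0
    linked = sum(1 for s, e in words if any(a <= s and e <= b for a, b in spans))
    return linked / len(words)
```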

add a function to compare two strings

Useful for edit conflicts.

The match doesn't have to be perfect (right now I'm checking for exact matches), because editors might change "-" to "—" or alter specific links.
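Below is a minimal sketch of a tolerant comparison built on the standard library's `difflib.SequenceMatcher`. Before scoring, both strings are reduced to the text a reader sees: link targets are dropped, en and em dashes are mapped to "-", and whitespace is collapsed. The 0.9 threshold is an arbitrary assumption and should be tuned against real edit conflicts.

```python
import re
from difflib import SequenceMatcher

# [[target|label]] -> label, [[target]] -> target
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")
DASHES = {"\u2014": "-", "\u2013": "-"}  # em dash, en dash -> hyphen-minus


def normalise(wikitext):
    """Reduce wikitext to the words a reader sees."""
    text = LINK_RE.sub(r"\1", wikitext)
    for dash, plain in DASHES.items():
        text = text.replace(dash, plain)
    return " ".join(text.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] of the normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def roughly_equal(a, b, threshold=0.9):
    return similarity(a, b) >= threshold
```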

List the most commonly wikised pairs and automate wikisation

Wikisation is not ideal right now (and never will be). Surely there are some words that are never wikified as their alternative base forms, e.g. "lub" always comes from "lub", never from the imperative of "lubić".
A function should go through all verified "good_examples" and compare them with what the bot has put on the pages.
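The sketch below covers the counting step. It assumes the verified examples are available as wikitext strings. It collects every (displayed form, link target) pair and keeps the forms that were always linked to the same target at least `min_count` times. Those forms are candidates for automatic wikification.

The function names are hypothetical. Letters glued after `]]` (as in `[[argument]]ów`) are ignored, so such forms are counted by their bracketed part only.

```python
import re
from collections import Counter, defaultdict

# [[lemma]] or [[lemma|form]]; letters glued after "]]" are ignored here.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def link_pairs(wikitext):
    """Yield (lower-cased displayed form, link target) for every wikilink."""
    for m in LINK_RE.finditer(wikitext):
        lemma, form = m.group(1), m.group(2)
        yield (form if form is not None else lemma).lower(), lemma


def unambiguous_forms(examples, min_count=2):
    """Map forms always linked to one target (at least min_count times) to it."""
    by_form = defaultdict(Counter)
    for text in examples:
        for form, lemma in link_pairs(text):
            by_form[form][lemma] += 1
    result = {}
    for form, lemmas in by_form.items():
        if len(lemmas) == 1:
            lemma, n = next(iter(lemmas.items()))
            if n >= min_count:
                result[form] = lemma
    return result
```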

List the worst sources

By looking at the logs for "channel", "domain" and titles (comparing those for good_examples and bad_examples), find out whether there are any really bad sources and exclude them.
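Below is a sketch of the comparison. It assumes the log entries can be loaded as dicts that share a source field such as `"domain"`; the real log format is not shown here. For each source, it computes the share of examples that ended up rejected. Sources with too few examples to judge are skipped.

```python
from collections import Counter


def rejection_rates(good, bad, key="domain", min_total=5):
    """Rank sources by the share of their examples that were rejected.

    good, bad -- iterables of dicts (hypothetical log records) with a `key` field
    min_total -- ignore sources with fewer examples than this
    """
    good_counts = Counter(r[key] for r in good)
    bad_counts = Counter(r[key] for r in bad)
    rates = {}
    for source in good_counts.keys() | bad_counts.keys():
        total = good_counts[source] + bad_counts[source]
        if total >= min_total:
            rates[source] = bad_counts[source] / total
    # Worst sources first.
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```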

new api bugs

Wikisłownik:Dodawanie przykładów/dane/002 "cichaczem"
Wikisłownik:Dodawanie przykładów/dane/001 "pozostać"

use logs to show stats

  • top verifiers
  • number of added examples
  • number of adopted orphans
  • number of pages without examples
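The sketch below produces the first three numbers. It assumes hypothetical log records with an `"action"` field and an optional `"verified_by"` field. The number of pages without examples is not a logged event, so it would have to come from a scan of the pages (e.g. a dump) rather than from the logs.

```python
from collections import Counter


def example_stats(records, top_n=10):
    """Summarise hypothetical log records such as
    {"action": "added" | "adopted_orphan" | ..., "verified_by": "User" or None}.
    """
    verifiers = Counter(r["verified_by"] for r in records if r.get("verified_by"))
    actions = Counter(r["action"] for r in records)
    return {
        "top_verifiers": verifiers.most_common(top_n),
        "added_examples": actions["added"],
        "adopted_orphans": actions["adopted_orphan"],
    }
```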

An idea to reduce orphans

The bot is already making a list of orphans: http://tools.wmflabs.org/alkamidbot/porzucone.html

It could also make a list of words without examples and then search NKJP (or somewhere else?) for sentences that include both the orphan and the word without examples.

Then make an interface for users to check these examples (unfortunately words have different meanings and the bot cannot really tell the difference between them*). It could be an external site where the user just sees the meaning and the example, and can "tick" if the example is OK.

* Well, in principle I could take N examples from the corpus and build a small ML algorithm to determine which meaning is meant.
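Below is a minimal sketch of the search step on plain text. It assumes the corpus sentences are already available locally. It splits the text into sentences naively and keeps those that contain both words. It matches exact word forms only. A real NKJP query would need to match lemmas so that inflected forms are found too, and that interface is not modelled here.

```python
import re

# Naive sentence splitter: a run of text ending in ., ! or ? (or end of text).
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]?")
WORD_RE = re.compile(r"\w+")


def sentences_with_both(text, word_a, word_b):
    """Return the sentences of text that contain both words (case-insensitive)."""
    a, b = word_a.lower(), word_b.lower()
    hits = []
    for m in SENTENCE_RE.finditer(text):
        sentence = m.group().strip()
        words = {w.lower() for w in WORD_RE.findall(sentence)}
        if a in words and b in words:
            hits.append(sentence)
    return hits
```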

UnicodeDecodeError on wymowa.py

==> wymowa.py.e1498670 <==
Traceback (most recent call last):
  File "/data/project/alkamidbot/scripts/wymowa.py", line 233, in <module>
    main()
  File "/data/project/alkamidbot/scripts/wymowa.py", line 28, in main
    for line in f:
  File "/data/project/alkamidbot/scripts/venv/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 6: ordinal not in range(128)
CRITICAL: Closing network session.
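When no `encoding` is given, Python 3's `open()` falls back to the locale's preferred encoding. If the job runs under an ASCII/POSIX locale, which the `'ascii'` codec in the traceback suggests, it fails on characters such as "ł" or "ć". That happens both when reading (this issue) and when writing (the `UnicodeEncodeError` issue further down). The likely fix is to pass the encoding explicitly.

Below is a minimal sketch. The helper names are hypothetical. The `'<br />'` line format mirrors the `f.write(...)` call in the `UnicodeEncodeError` traceback.

```python
def read_lines(path):
    """Read a UTF-8 text file without relying on the job's locale."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def write_list(path, items):
    """Write one '<br />'-prefixed item per line, always as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write('<br />' + item + '\n')
```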

not all reflexive verbs are wikified properly

[[mariacki|Mariacki]] [[hejnał]] [[kojarzyć|kojarzył]] [[się]] [[z]] niedzielno-obiadowymi woniami, [[przed]]e wszystkim [[z]] [[zapach]]em [[rosół|rosołu]] [[bulgotać|bulgoczącego]] [[na]] kuchennej [[blacha|blasze]], kurzo-wołowego, [[pietruszkowy|pietruszkowego]], [[selerowy|selerowego]]...

UnicodeEncodeError on wymowa.py

Traceback (most recent call last):
  File "/data/project/alkamidbot/scripts/wymowa.py", line 232, in <module>
    main()
  File "/data/project/alkamidbot/scripts/wymowa.py", line 226, in main
    f.write('<br />' + item + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u0107' in position 15: ordinal not in range(128)
CRITICAL: Closing network session.

handler for submit button

Implement a function similar to rebindFormActions (https://pl.wiktionary.org/wiki/MediaWiki:Gadget-edit-form-ui.js).

<PeterBowman> you can convert just once, before saving the page
<PeterBowman> $( '#editform' ).on( 'submit', handler )
<PeterBowman> where handler is a function that converts the JSON into a string and pastes it into the edit box
<PeterBowman> it would also work for previewing changes
<PeterBowman> a bit further up I used mw.confirmCloseWindow(), that's in case the editor wants to close the window without saving the edit
<PeterBowman> it then shows a message asking for confirmation

in porzucone.py, make a temp file until finished

Right now porzucone.html is overwritten in place, so as soon as the script starts, the page is blank. A better idea would be to write to porzucone.html.1 and move it to porzucone.html once the script has finished, as suggested by valhallasw on IRC:

valhallasw`cloud alkamid: the typical solution for these kinds of issues is to write to porzucone.html.1, then move porzucone.html.1 over porzucone.html
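Below is a minimal sketch of the suggested approach. It writes everything to `porzucone.html.1` and only then renames it over `porzucone.html` with `os.replace`. That rename is atomic on POSIX when both paths are on the same filesystem, so readers see either the old page or the complete new one, never a blank page. The function name is hypothetical. If the script writes incrementally, the same idea applies: open the `.1` file at the start and call `os.replace` once at the end.

```python
import os


def write_atomically(path, text, encoding="utf-8"):
    """Write text to path + '.1', then rename it over path in one step."""
    tmp_path = path + ".1"  # same directory, so the rename stays on one filesystem
    with open(tmp_path, "w", encoding=encoding) as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    os.replace(tmp_path, path)  # atomic replacement on POSIX
```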

visits.py - UnboundLocalError

Traceback (most recent call last):
  File "/data/project/alkamidbot/scripts/visits.py", line 113, in <module>
    main()
  File "/data/project/alkamidbot/scripts/visits.py", line 65, in main
    for line in inp:
UnboundLocalError: local variable 'inp' referenced before assignment
CRITICAL: Closing network session.
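The source of visits.py is not shown here, so the cause is a guess. The usual way to get this error is an `open()` inside `try`/`except` that only reports the failure. The code then falls through to `for line in inp` even though `inp` was never assigned. Below is a sketch of the fix with a hypothetical helper: return (or re-raise) from the `except` branch so that `inp` is never used unassigned.

```python
def read_lines_or_empty(path):
    """Return the lines of path, or [] if it cannot be opened."""
    try:
        inp = open(path, encoding="utf-8")
    except OSError as err:
        # Leave the function here instead of falling through to code that
        # uses `inp`, which would raise UnboundLocalError.
        print("cannot open {}: {}".format(path, err))
        return []
    with inp:
        return [line.rstrip("\n") for line in inp]
```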

rewrite porzucone.py to use pagelinks dump

Right now Page.getReferences() is used to count the references to a page, and this method queries the live wiki through the API. This is overkill, and it takes forever to find orphaned pages.

Options:

  1. Use (adapt) pywikibot/lonelypages.py
  2. Use the pagelinks dump, where all links are listed
  3. Use pywikibot's xmlreader?
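Below is a sketch of option 2. It assumes the older `pagelinks` SQL dump layout, in which each row is `(pl_from, pl_namespace, pl_title, pl_from_namespace)`. Newer dumps may use a different schema, so check the column order of the dump you actually use. The sketch collects every main-namespace title that is linked from somewhere; any entry title not in that set is an orphan. Titles in the dump use underscores instead of spaces, so the list of all titles must use the same form. The function names are hypothetical.

```python
import re

# One row of the assumed layout: (pl_from, pl_namespace, 'pl_title', pl_from_namespace)
ROW_RE = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)',(-?\d+)\)")


def linked_titles(sql_lines, namespace=0):
    """Return the set of titles in `namespace` that appear as link targets."""
    linked = set()
    for line in sql_lines:
        if not line.startswith("INSERT INTO"):
            continue  # skip schema definitions and comments
        for m in ROW_RE.finditer(line):
            if int(m.group(2)) == namespace:
                # Undo SQL backslash escaping, e.g. \' -> '
                linked.add(re.sub(r"\\(.)", r"\1", m.group(3)))
    return linked


def orphans(all_titles, sql_lines):
    """Titles from all_titles that no page links to."""
    linked = linked_titles(sql_lines)
    return sorted(t for t in all_titles if t not in linked)
```

The dump is gzipped, so the lines could come from `gzip.open(path, "rt", encoding="utf-8", errors="replace")`.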

timeout on wymowa.py

==> wymowa.py.e607950 <==
    r = fetch(baseuri, method, body, headers, **kwargs)
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 354, in fetch
    error_handling_callback(request)
  File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 271, in error_handling_callback
    raise request.data
ConnectionError: HTTPSConnectionPool(host='commons.wikimedia.org', port=443): Max retries exceeded with url: /w/api.php?titles=File%3APl-genus.OGG&continue=&format=json&prop=imageinfo&iilimit=500&meta=userinfo&indexpageids=&action=query&maxlag=5&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmime%7Cmetadata%7Carchivename&uiprop=blockinfo%7Chasmsg (Caused by <class 'httplib.BadStatusLine'>: '')

WARNING: Waiting 5 seconds before retrying.

(but it then continues and saves the page, so it might not be critical)
