kawu / hist-pl
Programs and libraries related to the historical dictionary of Polish
Images should be shown instead of the "W tył" ("Back") and "W przód" ("Forward") buttons.
When a mouse hovers over a historical word, a tooltip with a definition (or an equivalent) of the word should appear.
Add new entries on the basis of the Linde dictionary, but only those entries that can also be found in historical texts.
Problem: we should think about the potential problem of entry duplication. When adding an entry from the Linde dictionary, our dictionary may already contain an entry representing the same lexeme. We should be able to identify such a situation and merge the two entries accordingly.
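A minimal sketch of the merge step, written in Python for brevity (the project itself is in Haskell). It assumes a hypothetical entry shape, a dict with a `lemma` and a list of `forms`, and treats two entries as the same lexeme when their lemmas match case-insensitively. Real lexeme identity is harder (homonyms, spelling variants), so the key function is only a placeholder.

```python
def merge_entries(existing, incoming):
    """Merge two entry lists, combining entries that share a lemma key.

    Each entry is a dict {"lemma": str, "forms": [str]}; this shape is an
    assumption made for illustration, not the dictionary's real schema.
    """
    merged = {}
    for entry in list(existing) + list(incoming):
        # Placeholder identity test: case-insensitive lemma equality.
        key = entry["lemma"].casefold()
        if key not in merged:
            merged[key] = {"lemma": entry["lemma"], "forms": []}
        for form in entry["forms"]:
            # Keep the first occurrence of each form, preserving order.
            if form not in merged[key]["forms"]:
                merged[key]["forms"].append(form)
    return list(merged.values())


ours = [{"lemma": "Kot", "forms": ["kot", "kota"]}]
linde = [
    {"lemma": "kot", "forms": ["kota", "kotu"]},
    {"lemma": "pies", "forms": ["pies"]},
]
result = merge_entries(ours, linde)
```

Entries already in the dictionary are processed first, so their lemma spelling wins when a duplicate is found.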
The system doesn't perform any sentence-level segmentation right now. The tools assume that the text is already divided into sentences.
Are we going to use SRX rules in the future?
The pop-up window in which the description of the searched phrase is shown should always stay on top of the current window. Is it possible?
After #25 is closed, we should work on adding contexts of form occurrences on the basis of the collection of historical documents.
We have a preliminary implementation of the BaseX API. We can use it to easily perform modifications on the LMF version of the dictionary.
Question: do we need the binary version of the dictionary here? If so, do modifications introduced in the LMF version need to be immediately visible in the binary version? If so, this solution is completely impractical, since it is not possible to update the binary version on-the-fly (yet).
Otherwise, if there is no need to use the binary version (or at least to update it on-the-fly), we can use the BaseX-based solution.
There is no need to update the dictionary (either binary or LMF) on-the-fly. We can keep generated contexts in a key-value store (using e.g. http://hackage.haskell.org/package/cassy), and only in the end perform the update as a single pass on the LMF dictionary.
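The two-phase idea can be sketched as follows. This is only an illustration in Python: an in-memory dict stands in for the key-value store, and the entry shape (`id`, `contexts`) and function names are hypothetical.

```python
from collections import defaultdict


def collect_contexts(occurrences):
    """Phase 1: accumulate contexts per entry ID in a key-value store.

    `occurrences` is an iterable of (entry_id, context) pairs.
    """
    store = defaultdict(list)
    for entry_id, context in occurrences:
        store[entry_id].append(context)
    return store


def apply_contexts(entries, store):
    """Phase 2: one pass over the dictionary, attaching collected contexts."""
    return [
        dict(entry, contexts=entry.get("contexts", []) + store.get(entry["id"], []))
        for entry in entries
    ]


store = collect_contexts([("e1", "ctx a"), ("e2", "ctx b"), ("e1", "ctx c")])
updated = apply_contexts([{"id": "e1"}, {"id": "e2"}, {"id": "e3"}], store)
```

Because the dictionary is touched only in the second phase, neither the LMF nor the binary version has to support incremental updates.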
The extension should provide two kinds of functionality:
The easiest way to do that is to provide a single page which, depending on the type of argument, will present either a description of a word or a sentence with marked historical forms. The extension can then be based on Dictionary Tooltip.
[Testing in Windows Firefox] In an entry description, occurrence contexts are shown using a bigger font than headers! That's strange and should be corrected.
It will be useful when contexts are automatically analysed. A user will be able to look at definitions and use links to get detailed information about individual words.
The current analysis UI is not well adapted to long texts. There are several problems, among others:
The Znakuj button has to be clicked multiple times.
It would be nice to implement some word-level segmentation rules. The question is how such segmentation should work given the structure of the historical dictionary. For example, the word "chciałabyś" is a single word in the dictionary, while in Morfeusz it consists of three segments. Perhaps, then, there won't be any ambiguities in the word-level historical segmentation?
Add support for links which modify only the form parameter of the current request.
The extension should have a more informative description (which is shown, e.g., when installing the extension in Firefox).
In particular, due to the HTML formatting, spaces are not shown properly right now.
In order to make references (in the form of the @SourceID attribute), we need a concise way of identifying documents. That's why we need a file with the (doc path <-> doc ID) correspondence, which will allow us to use IDs as references.
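One possible shape for such a correspondence file is a tab-separated list of `ID<TAB>path` lines. The sketch below (Python, for illustration only) assigns IDs in sorted path order; the `doc` prefix, the file format, and the function names are all assumptions.

```python
import csv
import io


def assign_ids(paths, prefix="doc"):
    """Assign a stable ID to each distinct path, in sorted order."""
    return {path: f"{prefix}{i}" for i, path in enumerate(sorted(set(paths)), start=1)}


def write_mapping(mapping, handle):
    """Write one 'ID<TAB>path' line per document."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for path, doc_id in sorted(mapping.items()):
        writer.writerow([doc_id, path])


def read_mapping(handle):
    """Read the file back into an ID -> path lookup table."""
    return {doc_id: path for doc_id, path in csv.reader(handle, delimiter="\t")}


mapping = assign_ids(["b/x.txt", "a/y.txt", "b/x.txt"])
buffer = io.StringIO()
write_mapping(mapping, buffer)
buffer.seek(0)
by_id = read_mapping(buffer)
```

IDs assigned this way change if documents are added later, so a real file would more likely be append-only.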
LMF element types missing in the binary representation of the dictionary:
Low-level IO exceptions should be accessible to the user. Right now, only generalized descriptions of individual exceptions are shown, for example:
load: failed to open entry with the Key {path = "z", uid = 1} key
The website cannot handle long chunks of text, most likely due to lazy IO: the program opens too many file handles or something like that. The web handler shows, for example:
A web handler threw an exception. Details:
user error (load: failed to open entry with the Key {path = "z", uid = 1} key)
It is not directly related to the binary dictionary, but the problem is conspicuous: there are many word forms in the LMF version of the dictionary which look like this:
<feat att="writtenForm" val="potrzebno&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;"/>
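Values like this look like the result of escaping an already-escaped string many times over. As a rough diagnostic, one could unescape repeatedly until the value stops changing. The Python sketch below uses the standard `html.unescape`; the function name and the round limit are assumptions, and whether the character that remains belongs in the form at all is a separate question.

```python
import html


def unescape_fully(text, max_rounds=100):
    """Undo repeated entity escaping by unescaping until a fixed point."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            return text
        text = decoded
    return text


cleaned = unescape_fully("potrzebno&amp;amp;amp;gt;")
```

Note that this would also decode entities that were escaped on purpose, so it is better suited to inspecting the damage than to blind bulk repair.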
The title attribute doesn't work in the embedded browser, apparently.
While there should be no forms with special characters (e.g. &), the site should support them anyway.
The user should be able to change the address of the web service.
Add a custom subpage for lexical entry presentation.
One of the website pages should provide an alphabetic index which allows viewing the entire dictionary entry by entry.
It should be possible to look up an entry by its identifier (stored as the id attribute of the LexicalEntry element). The reason: we want to be able to follow pointers (which have the form of identifiers) occurring in some dictionary elements, e.g. in Related Forms.
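A minimal sketch of such a lookup, in Python, assuming only what is stated above: entries are LexicalEntry elements with an id attribute. The sample document, its nesting, and the function name are made up for illustration and do not reflect the real dictionary layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified LMF fragment.
SAMPLE = """
<Lexicon>
  <LexicalEntry id="e1"><Lemma><feat att="writtenForm" val="kot"/></Lemma></LexicalEntry>
  <LexicalEntry id="e2"><Lemma><feat att="writtenForm" val="pies"/></Lemma></LexicalEntry>
</Lexicon>
"""


def index_entries(lmf_xml):
    """Build an id -> LexicalEntry element table for pointer resolution."""
    root = ET.fromstring(lmf_xml)
    return {
        entry.get("id"): entry
        for entry in root.iter("LexicalEntry")
        if entry.get("id") is not None
    }


index = index_entries(SAMPLE)
target = index["e2"].find("Lemma/feat").get("val")
```

With such a table, a pointer found in one element can be resolved to its target entry with a single lookup.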
It may be a good idea to make separate packages for:
The hist-pl-lexicon package would link the binary dictionary with a DAWG-based dictionary component.
Add localization for the Polish language
This will most likely resolve issue #13.
Example:
$ hist-pl-website srpsdp.bin -p 10019
hist-pl-website: user error (Pattern match failure in do expression at src/Main.hs:230:5-13)
The problem stems from the fact that we assume, in the main function, that the program takes exactly one argument.
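For comparison, here is how a command line of the shape shown above could be parsed with a declared positional argument and an optional port. This is a Python sketch using the standard `argparse` module, not the project's Haskell code; the long option name and the default port are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse '<dictionary> [-p PORT]' instead of assuming exactly one argument."""
    parser = argparse.ArgumentParser(prog="hist-pl-website")
    parser.add_argument("dictionary", help="path to the binary dictionary")
    # The default port is an arbitrary placeholder.
    parser.add_argument("-p", "--port", type=int, default=8000)
    return parser.parse_args(argv)


args = parse_args(["srpsdp.bin", "-p", "10019"])
```

A parser of this kind reports unexpected or missing arguments with a usage message rather than failing on a pattern match.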
The extension doesn't work properly when the selection includes the ';' character: only the part of the selection before this character is labeled. It seems related to the URL encoding of the query parameter, see #21.
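If the cause is indeed missing URL encoding, percent-encoding the selection before placing it in the URL should keep ';' (and '&') from being treated as a separator. A Python sketch with the standard `urllib.parse` functions follows; the base URL and the parameter name `query` are hypothetical.

```python
from urllib.parse import quote, unquote


def build_query_url(base_url, selection):
    """Percent-encode the whole selection, including ';', '&' and spaces."""
    return f"{base_url}?query={quote(selection, safe='')}"


url = build_query_url("http://localhost:10019/ext", "być; albo nie")
decoded = unquote(url.split("query=", 1)[1])
```

The server side then has to decode the parameter once, and only once, to recover the original selection.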