hist-pl's People

Contributors

gabriella439, kawu

Forkers

gabriella439

hist-pl's Issues

Add entries from Linde

Add new entries based on the Linde dictionary, but only those entries which also occur in historical texts.

Problem: we should consider potential entry duplication. While adding an entry from the Linde dictionary, there may already be an entry representing the same lexeme in our dictionary. We should be able to identify such situations and merge the two entries accordingly.
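A minimal sketch of the filtering and merging step, assuming a hypothetical simplified Entry type and assuming that two entries represent the same lexeme exactly when they share lemma and part of speech. The real LMF entries carry much more information, and the identity criterion would need to be decided by the project.

```haskell
import qualified Data.Map.Strict as M
import Data.List (nub)

-- Hypothetical, simplified entry; real LMF entries carry much more.
data Entry = Entry
  { lemma :: String    -- base form identifying the lexeme
  , pos   :: String    -- part of speech
  , forms :: [String]  -- word forms
  } deriving (Show, Eq)

-- Assumption: same lemma and same part of speech means same lexeme.
lexemeKey :: Entry -> (String, String)
lexemeKey e = (lemma e, pos e)

-- Merge a new entry into an existing one, keeping each form only once.
mergeEntry :: Entry -> Entry -> Entry
mergeEntry new old = old { forms = nub (forms old ++ forms new) }

type Dict = M.Map (String, String) Entry

-- Add a Linde entry only if at least one of its forms is attested in the
-- historical texts; if the lexeme is already present, merge instead of
-- creating a duplicate.
addLindeEntry :: (String -> Bool) -> Entry -> Dict -> Dict
addLindeEntry attested e dict
  | any attested (forms e) = M.insertWith mergeEntry (lexemeKey e) e dict
  | otherwise              = dict
```

M.insertWith passes the new value first and the existing value second, which is why mergeEntry takes its arguments in that order.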

Sentence-level segmentation

The system doesn't perform any sentence-level segmentation right now. Tools assume that the text is already divided into sentences.

Are we going to use SRX rules in the future?
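Until SRX rules are adopted, a naive rule-based splitter could serve as a stopgap. The sketch below is not SRX; it breaks after '.', '!' or '?' only when whitespace and an uppercase letter follow. An abbreviation followed by a capitalised word will still be mis-split, which is exactly the kind of exception SRX rule sets are meant to express.

```haskell
import Data.Char (isSpace, isUpper)

-- Naive sentence splitter (not SRX): break after '.', '!' or '?' when the
-- next non-space character is an uppercase letter.
splitSentences :: String -> [String]
splitSentences = filter (not . null) . map trim . go ""
  where
    go acc [] = [reverse acc]
    go acc (c:rest)
      | c `elem` ".!?", startsSentence rest = reverse (c:acc) : go "" rest
      | otherwise = go (c:acc) rest
    -- At least one whitespace character, then an uppercase letter.
    startsSentence s = case span isSpace s of
      (_:_, x:_) -> isUpper x
      _          -> False
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
```

With this rule "To jest np. dom." stays one sentence because a lowercase word follows the abbreviation.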

Extension: pop-up window on top

The pop-up window, in which the description of the searched phrase is shown, should always stay on top of the current window. Is that possible?

Add contexts of form occurrences

After #25 is closed, we should work on adding contexts of form occurrences on the basis of the collection of historical documents.

Solution 1

We have a preliminary implementation of the BaseX API. We can use it to easily perform modifications on the LMF version of the dictionary.

Question: do we need the binary version of the dictionary here? If so, do modifications introduced in the LMF version need to be immediately visible in the binary version? If so, this solution is completely impractical, since it is not (yet) possible to update the binary version on-the-fly.

Otherwise, if there is no need to use the binary version (or at least to update it on-the-fly), we can use the BaseX-based solution.

Solution 2

There is no need to update the dictionary (either binary or LMF) on-the-fly. We can keep generated contexts in a key-value store (using e.g. http://hackage.haskell.org/package/cassy), and only in the end perform the update as a single pass on the LMF dictionary.
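The two phases of Solution 2 can be sketched as follows. An in-memory Data.Map stands in for the key-value store; this is an assumption for illustration and not the cassy API. The Entry type is likewise a hypothetical simplification of an LMF entry.

```haskell
import qualified Data.Map.Strict as M

type Form = String
type Context = String

-- In-memory stand-in for the key-value store (assumption: a real store
-- would offer a comparable append-by-key operation).
type ContextStore = M.Map Form [Context]

-- Phase 1: record one occurrence context of a form; the dictionary is
-- not touched. Contexts for a form are kept newest first.
addContext :: Form -> Context -> ContextStore -> ContextStore
addContext f c = M.insertWith (++) f [c]

-- Hypothetical simplified entry: its word forms and attached contexts.
data Entry = Entry { wordForms :: [Form], contexts :: [Context] }
  deriving (Show, Eq)

-- Phase 2: a single pass over the dictionary attaching all collected contexts.
applyContexts :: ContextStore -> [Entry] -> [Entry]
applyContexts store = map attach
  where
    attach e = e { contexts = contexts e ++ concatMap lookupCtx (wordForms e) }
    lookupCtx f = M.findWithDefault [] f store
```

Because phase 2 runs once at the end, neither the LMF nor the binary dictionary has to support incremental updates.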

Extension: proposed functionality

The extension should provide two kinds of functionality:

  • Looking up an individual word
  • Annotating a phrase

The easiest way to do that is to provide a single page which, depending on the type of argument, presents either the description of a word or the sentence with its historical forms marked. The extension can then be based on Dictionary Tooltip.
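The dispatch on the argument type could look like the sketch below. It assumes, for illustration only, that whitespace-separated tokens are good enough to tell a single word from a phrase; the Query type and the classify function are hypothetical names.

```haskell
-- What the single page should render for a given argument.
data Query
  = LookupWord String        -- one token: show the word's description
  | AnnotatePhrase [String]  -- several tokens: mark historical forms
  deriving (Show, Eq)

-- Assumption: plain whitespace tokenization decides word vs. phrase.
classify :: String -> Maybe Query
classify input = case words input of
  []  -> Nothing
  [w] -> Just (LookupWord w)
  ws  -> Just (AnnotatePhrase ws)
```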

Website: problem with fonts

[Testing in Windows Firefox] In an entry description, occurrence contexts are shown in a bigger font than headers! That's strange and should be corrected.

Website: analyse contexts

It would be useful if contexts were analysed automatically. A user would then be able to look at definitions and follow links to get detailed information about individual words.

Website UI: viewing long chunks of text

The current analysis UI is not well adapted to long texts. There are several problems, among others:

  • Part of the text is hidden,
  • The Znakuj button has to be clicked multiple times,
  • Focus goes to the top of the page when writing something at the bottom.

Word-level segmentation ambiguities

It would be nice to implement some word-level segmentation rules. The question is how such segmentation should work given the structure of the historical dictionary. For example, "chciałabyś" is a single word in the dictionary, whereas in Morfeusz it consists of three segments. Perhaps, then, there won't be any ambiguities in the word-level historical segmentation?

Website: local links

Add support for links which modify only the form parameter of the current request.
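A sketch of how such a local link's query could be built: replace (or add) the form parameter and keep every other parameter of the current request. It assumes the handler has the decoded query as key/value pairs and that the parameter is literally named "form"; both are assumptions, and the values still need URL encoding before going into an href.

```haskell
-- Decoded query parameters of the current request (assumed representation).
type Params = [(String, String)]

-- Replace the (assumed) "form" parameter, or append it if missing,
-- leaving all other parameters and their order unchanged.
setForm :: String -> Params -> Params
setForm new ps
  | any ((== "form") . fst) ps =
      [ if k == "form" then (k, new) else (k, v) | (k, v) <- ps ]
  | otherwise = ps ++ [("form", new)]
```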

Extension: add description

The extension should have a more informative description (shown, e.g., when installing the extension in Firefox).

Build (doc path <-> doc ID) correspondence

In order to make references (in the form of the @SourceID attribute), we need a concise way of identifying documents. That's why we need a file with the (doc path <-> doc ID) correspondence, which will allow us to use IDs as references.
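A minimal sketch of building the correspondence in both directions and rendering it as a file with one "ID<TAB>path" line per document. Numbering documents in sorted path order and the tab-separated file layout are assumptions, not an existing format.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Tuple (swap)

-- Bidirectional correspondence between document paths and numeric IDs.
data DocIndex = DocIndex
  { pathToId :: M.Map FilePath Int
  , idToPath :: M.Map Int FilePath
  } deriving Show

-- Assign IDs 1, 2, ... to distinct paths in sorted order.
buildIndex :: [FilePath] -> DocIndex
buildIndex paths = DocIndex (M.fromList pairs) (M.fromList (map swap pairs))
  where
    pairs = zip (S.toAscList (S.fromList paths)) [1 ..]

-- One "ID<TAB>path" line per document.
renderIndex :: DocIndex -> String
renderIndex = unlines . map (\(i, p) -> show i ++ "\t" ++ p) . M.toList . idToPath
```

Note that numbering by sorted path shifts IDs when documents are added; if the collection grows, existing IDs should be read from the stored file and only new paths numbered.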

Binary dictionary: missing element types

LMF element types missing in the binary representation of the dictionary:

  • Equivalent (no such entries in LMF version at the moment)
  • Statement (no such entries in LMF version at the moment)
  • Sense Relation (no such entries in LMF version at the moment)

Lexicon: uncover lower-level IO exceptions

Low-level IO exceptions should be accessible to the user. Right now, only generalized descriptions of individual exceptions are shown, for example:

load: failed to open entry with the Key {path = "z", uid = 1} key
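One way to keep the low-level cause is to wrap it rather than replace it. The sketch below assumes a hypothetical LoadError type and withCause helper; it catches the IOException raised by the low-level action and rethrows it together with the key, so the message shows both.

```haskell
import Control.Exception (Exception, IOException, throwIO, try)

-- Hypothetical lexicon error that keeps the original IO exception.
data LoadError = LoadError
  { loadKey :: String       -- key of the entry being loaded
  , cause   :: IOException  -- the underlying low-level exception
  }

instance Show LoadError where
  show (LoadError k e) =
    "load: failed to open entry with the key " ++ k ++ ": " ++ show e

instance Exception LoadError

-- Run a low-level action; on IO failure, rethrow with the cause attached.
withCause :: String -> IO a -> IO a
withCause key act = do
  r <- try act
  case r of
    Right x -> return x
    Left e  -> throwIO (LoadError key e)
```

Callers that need the details can catch LoadError and inspect the cause field, while the default message already includes the OS-level error text.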

Website: handle long chunks of text

The website cannot handle long chunks of text, most likely due to lazy IO: the program opens too many file handles or something like that. The web handler shows, for example:

A web handler threw an exception. Details:
user error (load: failed to open entry with the Key {path = "z", uid = 1} key)
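If lazy IO is indeed the cause, a common remedy is to read each file strictly so that its handle is closed right away instead of staying open until the contents are fully consumed. A minimal sketch using only base; newer versions of base also offer strict readers such as readFile'.

```haskell
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Read a whole file strictly: force the contents while the handle is
-- open, so withFile closes it before returning.
readFileStrict :: FilePath -> IO String
readFileStrict path = withFile path ReadMode $ \h -> do
  s <- hGetContents h
  length s `seq` return s
```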

LMF dictionary: strange word forms

It is not directly related to the binary dictionary, but the problem is conspicuous: there are many word forms in the LMF version of the dictionary which look like this:

<feat att="writtenForm" val="potrzebno&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;"/>

Website: href encoding

While there should be no forms with special characters (e.g. &), the site should support them anyway.
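A minimal, dependency-free sketch of percent-encoding a form before placing it in an href. It assumes UTF-8 (Polish forms contain letters such as 'ł') and leaves only RFC 3986 unreserved characters as-is; in practice a library encoder, for example from the http-types package, could be used instead.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit, ord)

-- Percent-encode a value: UTF-8 bytes, unreserved characters kept,
-- everything else (including '&', ';', '=', '#') written as %XX.
urlEncode :: String -> String
urlEncode = concatMap enc . concatMap utf8
  where
    enc b
      | unreserved (chr b) = [chr b]
      | otherwise          = ['%', hexDigit (b `div` 16), hexDigit (b `mod` 16)]
    unreserved c =
      isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "-._~"
    hexDigit n = "0123456789ABCDEF" !! n

-- Minimal UTF-8 encoder for one code point (surrogates not handled).
utf8 :: Char -> [Int]
utf8 ch
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 + n `div` 0x40, 0x80 + n `mod` 0x40]
  | n < 0x10000 = [ 0xE0 + n `div` 0x1000
                  , 0x80 + (n `div` 0x40) `mod` 0x40
                  , 0x80 + n `mod` 0x40 ]
  | otherwise   = [ 0xF0 + n `div` 0x40000
                  , 0x80 + (n `div` 0x1000) `mod` 0x40
                  , 0x80 + (n `div` 0x40) `mod` 0x40
                  , 0x80 + n `mod` 0x40 ]
  where n = ord ch
```

Encoding ';' as %3B in the same way may also be relevant to the ';' problem reported for the extension.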

Binary dictionary: lookup by ID

It should be possible to look up an entry by its identifier (stored as the id attribute of the LexicalEntry element). The reason: we want to be able to follow pointers (which have the form of identifiers) occurring in some dictionary elements, e.g. in Related Forms.
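A sketch of a secondary index from identifiers to entries, built alongside the main dictionary, plus a helper that follows Related Form pointers. The Entry type and all function names are hypothetical simplifications.

```haskell
import qualified Data.Map.Strict as M
import Data.Maybe (mapMaybe)

type EntryId = String

-- Hypothetical simplified entry: its LMF id and the ids it points to.
data Entry = Entry
  { entryId    :: EntryId
  , lemma      :: String
  , relatedIds :: [EntryId]
  } deriving (Show, Eq)

-- Secondary index: identifier -> entry.
newtype IdIndex = IdIndex (M.Map EntryId Entry)

buildIdIndex :: [Entry] -> IdIndex
buildIdIndex es = IdIndex (M.fromList [(entryId e, e) | e <- es])

lookupById :: IdIndex -> EntryId -> Maybe Entry
lookupById (IdIndex m) i = M.lookup i m

-- Follow an entry's pointers; dangling identifiers are skipped.
related :: IdIndex -> Entry -> [Entry]
related ix = mapMaybe (lookupById ix) . relatedIds
```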

Divide hist-pl-lexicon package

It may be a good idea to make separate packages for:

  • hist-pl-types -- types,
  • hist-pl-lmf -- LMF parsing and printing,
  • hist-pl-binary -- binary representation of the dictionary.

The hist-pl-lexicon package would link the binary dictionary with a DAWG-based dictionary component.

Website: handle snap-related command-line program options

Example:

$ hist-pl-website srpsdp.bin -p 10019
hist-pl-website: user error (Pattern match failure in do expression at src/Main.hs:230:5-13)

The problem stems from the fact that we assume that the program takes exactly one argument in the main function.
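A possible fix is to take the dictionary path from the front of the argument list and leave the rest for Snap. The sketch assumes the path always comes first; splitArgs is a hypothetical helper.

```haskell
-- Separate our positional argument (the binary dictionary path) from the
-- remaining arguments, which are meant for Snap's own option parser.
splitArgs :: [String] -> Either String (FilePath, [String])
splitArgs (path : rest)
  | take 1 path /= "-" = Right (path, rest)
splitArgs _ = Left "usage: hist-pl-website DICT.bin [snap options]"
```

In main, the Right (dict, snapArgs) case could run the existing server code inside System.Environment.withArgs snapArgs, so that options such as -p 10019 reach Snap, assuming Snap reads its options through getArgs.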

Website (extension): support for the ';' character

The extension doesn't work properly when the selection includes the ';' character. Only the part of the selection before this character is labeled. It seems related to the URL encoding of the query parameter, see #21.
