spraakbanken / korp-frontend

Frontend for Korp, a tool using the IMS Open Corpus Workbench (CWB).

Home Page: https://spraakbanken.gu.se/en/tools/korp

License: MIT License

JavaScript 74.18% CSS 1.00% HTML 3.06% TypeScript 17.81% SCSS 3.96%
cwb frontend korp

korp-frontend's People

Contributors

anne17, arildm, g-thor, janiemi, jroxendal, majsan, martinhammarstedt, mmatthiesencsc, phildiderichsen, samiryousuf

korp-frontend's Issues

Escaping of quotes should be done by doubling them

Currently quotes are escaped by prefixing them with backslash. This doesn't always work, and the following query will lead to a crash:

[word = "\""] [word = "och"]

Instead escaping should be done by doubling the quote characters:

[word = """"] [word = "och"]
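A minimal sketch of the doubling approach, assuming a hypothetical helper named quoteCqpValue (it is not a function from the Korp code base). It only deals with the quote character; regular-expression metacharacters in the value would still need separate handling.

```typescript
// Hypothetical helper: wrap a value in double quotes for a CQP query,
// escaping embedded double quotes by doubling them instead of using a backslash.
function quoteCqpValue(value: string): string {
    return '"' + value.replace(/"/g, '""') + '"';
}

// Rebuild the query from this issue with the helper.
const escapedQuery = `[word = ${quoteCqpValue('"')}] [word = ${quoteCqpValue("och")}]`;
console.log(escapedQuery); // [word = """"] [word = "och"]
```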

Trend graph width wrong if opened in background tab

The trend graph gets a default width of 400px if it is opened in a background tab (a tab within the Korp interface, not a background browser tab).
This only occurs if the tab is inactive at the time the graph loads.

Trend diagram shows "no data" for periods with no hits

Currently the trend diagram greys out periods covered by corpora that have no hits, and shows "we have no data for this period" for them.

The problem seems to be that Korp omits corpora with no hits from the trend diagram query.

Example

GP 2001 has no hits for this query. Open the trend diagram and note that 2001 is greyed out due to GP2001 not being part of the parameters to the backend.

This is probably fixed by simply including all corpora in the query.

Auto-selecting a new KWIC tab broken sometimes

For example: clicking a link in statistics to open a new KWIC.

Before any tabs have been closed, everything works as expected. After closing a tab and then opening a new one, the new tab is not selected and the content of the current tab is replaced with nothing.

Wrong dependency head when Swedish is the parallel language

There is a problem with the highlighting of dependency head, when the Swedish corpus is the second language. E.g. here, the word "sågspån" has highlighted the word "ser" as head, but it should be "spillt":

image

This is not a problem when Swedish is the first language. Here is the same example, and "spillt" is correctly marked as the head:

image

I've tested on the corpus "ASPAC svenska-engelska", but the problem is everywhere in that corpus, so I don't think it's a corpus problem but a Korp bug.

Trend diagram table export button does not work in Firefox

Pressing the “Export” button in the trend diagram table view seems to have no effect when using Firefox (version 66.0.5 on Linux, reportedly also versions on Windows and Mac). It works correctly in Chrome and reportedly Safari.

This bug affects at least Korp 7.0.0 at Språkbanken and Korp 5.0.10 at the Language Bank of Finland. I haven’t checked whether it has been fixed in the development version.

(The bug was first reported by Tommi Jauhiainen of FIN-CLARIN.)

Add relative hits to map view

Currently we only use absolute hits, meaning that places from which we have a lot of material become over-represented in the map view, with big circles even when the search word is relatively unusual there. We should let the user switch between absolute and relative hits in the map view, and use the relative_to_struct parameter to get the relative frequencies from the backend. The relative view should possibly be the default one.

Example:
/count?...&group_by_struct=text__geoauthorhome&relative_to_struct=text__geoauthorhome
The relative numbers in this result are different from the same query without relative_to_struct=text__geoauthorhome.

Alphabetic sorting of statistics columns

Currently the statistics columns are sorted by internal corpus names, making "Norstedtsromaner" come before "Bonniersromaner". This is confusing to the user. Alphabetic sorting based on the display names would be better.
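A sketch of sorting by display name, under the assumption that each column carries both an internal id and a display title. The CorpusColumn shape, the function name and the ids "corpus-a" and "corpus-b" are made up for illustration; only the two display names come from this issue.

```typescript
interface CorpusColumn {
    id: string;    // internal corpus name (hypothetical values below)
    title: string; // display name shown to the user
}

// Return a new array sorted by display title, using locale-aware comparison.
// The default locale "sv" is an assumption; pass another one if needed.
function sortColumnsByTitle(columns: CorpusColumn[], locale: string = "sv"): CorpusColumn[] {
    return columns.slice().sort((a, b) => a.title.localeCompare(b.title, locale));
}

const columns: CorpusColumn[] = [
    { id: "corpus-a", title: "Norstedtsromaner" },
    { id: "corpus-b", title: "Bonniersromaner" },
];
const sortedTitles = sortColumnsByTitle(columns).map((c) => c.title);
console.log(sortedTitles);
```

The input array is copied before sorting, so the original column order stays available to other parts of the view.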

Add text length data to ASU corpus

In ASU it is important to be able to get information about text lengths, since it is central to compare frequencies in text units of varying length, and therefore to compute values relative to the number of words in selected parts of the text. Here the number of actual words is a more relevant measure than Korp's token count, which also includes punctuation. The transcription in ASU contains many markers for different syntactic delimiters, pauses, filled pauses and code-switching, and their number varies between text units. Counting these towards the text length can be considerably misleading, not least in comparisons across learning stages, which is often relevant in ASU. There is therefore a need to get the number of real words in the texts.

KWIC downloading broken

KWIC downloading is partially broken.

  • Downloading KWIC from the context view results in an empty file.
  • Downloading KWIC when "in order" is disabled also results in an empty file or, when "one token per row" is selected, a crash.
  • TSV uses spaces instead of tabs.
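For the TSV point, a minimal sketch of serializing rows with tab separators. The function name toTsv and the sample rows are illustrative and not taken from the Korp download code. Tabs and line breaks inside a cell are replaced with a space so that they cannot break the column layout.

```typescript
// Serialize a table of cells as TSV: tabs between cells, newlines between rows.
function toTsv(rows: string[][]): string {
    return rows
        .map((row) => row.map((cell) => cell.replace(/[\t\r\n]+/g, " ")).join("\t"))
        .join("\n");
}

// Illustrative rows only.
const tsv = toTsv([
    ["left context", "match", "right context"],
    ["jag har", "spillt", "sågspån"],
]);
console.log(tsv);
```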

Search history is broken

Selecting an old search from the list does nothing.
The code seems to be looking for "http://", but the world is using https://.
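One possible fix, sketched under the assumption that the history code only needs to recognize an absolute URL: accept both schemes rather than the literal "http://". The function name and the example URLs are hypothetical.

```typescript
// Return true for URLs that start with either http:// or https:// (case-insensitive).
function startsWithHttpScheme(url: string): boolean {
    return /^https?:\/\//i.test(url);
}

console.log(startsWithHttpScheme("https://example.org/korp/#?search=word|och")); // true
console.log(startsWithHttpScheme("http://example.org/korp/#?search=word|och"));  // true
console.log(startsWithHttpScheme("#?search=word|och"));                          // false
```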

"Compile based on" empty after changing corpus

  1. Select a corpus with text attributes.
  2. "Compile based on" one of these text attributes.
  3. Select a new corpus that does not have this attribute.
  4. Deselect the first corpus.
  5. Perform a search.

Result: "Compile based on" is empty, and the statistics tab crashes.
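A sketch of one way to avoid the empty selection: after the corpus selection changes, drop the attributes that no selected corpus offers and fall back to a default when nothing is left. The Corpus shape, the attribute names and the fallback value "word" are assumptions made for illustration, as is the choice to keep an attribute when at least one selected corpus has it.

```typescript
interface Corpus {
    id: string;
    textAttributes: string[];
}

// Keep only the selected attributes that exist in at least one selected corpus.
// If none remain, return the fallback so the selection is never empty.
function pruneCompileBasedOn(selected: string[], corpora: Corpus[], fallback: string = "word"): string[] {
    const kept = selected.filter((attr) =>
        corpora.some((corpus) => corpus.textAttributes.indexOf(attr) !== -1)
    );
    return kept.length > 0 ? kept : [fallback];
}

// Hypothetical corpus: the previously chosen attribute "text_author" is missing here.
const remainingCorpora: Corpus[] = [{ id: "second", textAttributes: ["text_date"] }];
const pruned = pruneCompileBasedOn(["text_author"], remainingCorpora);
console.log(pruned); // [ 'word' ]
```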

Menu opens outside of window

This applies to the current dev branch.

  1. Open the menu in the top right.
  2. Close it.
  3. Change the size of the browser window.
  4. Open the menu again.

Result: Menu is positioned outside of the browser window.

Trend diagram table representation: add links to KWIC

It would be useful if you could get the KWIC of the hits for a certain period of time also from the table representation of the trend diagram, as well as from the graph representations. Each cell of the table would then need to be (or contain) a link to a search for the hits for a value (row) in a period of time (column).

This was requested by a user of Korp at the Language Bank of Finland, who found it difficult to choose an exact period of time in the graph representations.

Secondary KWIC broken in parallel mode

Opening KWIC from statistics does not work.

The corpus parameter to the backend is set to:
EUROPARL-EN,ASPACSVEN-EN
instead of the correct:
EUROPARL-EN|EUROPARL-SV,ASPACSVEN-EN|ASPACSVEN-SV
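The expected format can be sketched as follows, assuming the frontend has each parallel corpus available as a group of linked corpus ids. The function name and the input shape are hypothetical; the corpus ids are the ones from this issue.

```typescript
// Join the corpora within each linked group with "|" and the groups with ",".
function buildParallelCorpusParam(linkedGroups: string[][]): string {
    return linkedGroups.map((group) => group.join("|")).join(",");
}

const corpusParam = buildParallelCorpusParam([
    ["EUROPARL-EN", "EUROPARL-SV"],
    ["ASPACSVEN-EN", "ASPACSVEN-SV"],
]);
console.log(corpusParam); // EUROPARL-EN|EUROPARL-SV,ASPACSVEN-EN|ASPACSVEN-SV
```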

JSON button doesn't properly URL encode CQP queries

  1. Perform the following search: [word = "national.*" & word != "nationaliteter" %c]
  2. Click on the JSON button.

Result: You get an error. The query string can't be parsed due to the CQP query not being properly encoded.
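A sketch of encoding the query before it is put into the URL, using the standard encodeURIComponent function. The helper name buildQueryString is illustrative, and the parameter name cqp is an assumption about the backend call. Without encoding, the "&" inside the CQP expression is read as a parameter separator, which would explain the parse error.

```typescript
// Build a query string in which every key and value is percent-encoded.
function buildQueryString(params: Record<string, string>): string {
    return Object.keys(params)
        .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
        .join("&");
}

const cqpQuery = '[word = "national.*" & word != "nationaliteter" %c]';
const queryString = buildQueryString({ cqp: cqpQuery });
console.log(queryString);
```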

Trend diagram table export options not correctly localized initially

In the trend diagram table export, the values in the selection lists for the type of frequencies and file format are not correctly localized initially: they always show Swedish texts at first, regardless of the UI language selection:

[Screenshot: trend diagram table export in Korp 7.0.0]

Changing the UI language while the trend diagram tab is open resets the values to correctly localized ones. And in fact, even the Swedish texts shown after changing the UI language are different from the values shown at first: the texts are at first “Relativa tal” and “CSV (kommaseparerade värden)”, but after changing the UI language, they become “Relativa frekvenser” and “CSV (semikolonseparerade värden)”:

[Screenshot: trend diagram table export in Korp 7.0.0, Swedish UI]

The initial, non-localized texts seem to be shown every time a new trend diagram tab is opened: changing the UI language while a tab is being shown does not affect tabs opened after the language change.

This bug affects at least Korp 7.0.0 at Språkbanken and Korp 5.0.10 at the Language Bank of Finland. I haven’t tested whether it has been fixed in the development version.

(The bug was first reported by Tommi Jauhiainen of FIN-CLARIN.)
