spraakbanken / korp-frontend
Frontend for Korp, a tool using the IMS Open Corpus Workbench (CWB).
Home Page: https://spraakbanken.gu.se/en/tools/korp
License: MIT License
Result: If you're unlucky it won't use the two you selected.
Currently quotes are escaped by prefixing them with backslash. This doesn't always work, and the following query will lead to a crash:
[word = "\""] [word = "och"]
Instead escaping should be done by doubling the quote characters:
[word = """"] [word = "och"]
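A minimal sketch of the doubling approach, assuming the frontend builds token expressions from plain strings. The helper names escapeCqpValue and wordToken are hypothetical, and regular-expression metacharacters in the value are deliberately left alone here:

```typescript
// Hypothetical helper: escape a literal value for use inside a
// double-quoted CQP string by doubling each double quote,
// instead of prefixing it with a backslash.
function escapeCqpValue(value: string): string {
    return value.replace(/"/g, '""');
}

// Hypothetical helper: build a single [word = "..."] token expression.
function wordToken(value: string): string {
    return `[word = "${escapeCqpValue(value)}"]`;
}

// Reproduces the query from the report: [word = """"] [word = "och"]
const query = [wordToken('"'), wordToken("och")].join(" ");
```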
Currently the trend diagram greys out periods covered by corpora with no hits, which is how it signals "we have no data for this period".
The problem seems to be that Korp omits corpora with no hits from the trend diagram query.
GP 2001 has no hits for this query. Open the trend diagram and note that 2001 is greyed out due to GP2001 not being part of the parameters to the backend.
This is probably fixed by simply including all corpora in the query.
You can open a map tab even when there are no lines selected in the statistics table. This should not be possible.
For example: clicking a link in statistics to open a new KWIC.
Before closing any tabs, everything works as expected. After closing a tab and then opening a new tab, the new tab is not selected and the current tab is replaced with nothing.
The label 1900 is missing completely, and in its place it says 1890. For 1890 it says 1880, and so on. Everything before the year 1910 is offset by 10 years.
This is unfortunately a bug in the library we use to draw the diagram: shutterstock/rickshaw#606
This is the first step towards modernising tooling, with the goal of ending up using TypeScript to improve refactoring support.
Always select the first keyword in the KWIC after doing a search or navigating between KWIC pages. Currently this is only done for the first search.
There is a problem with the highlighting of the dependency head when the Swedish corpus is the second language. E.g. here, the word "sågspån" has highlighted the word "ser" as head, but it should be "spillt":
This is not a problem when Swedish is the first language. Here is the same example, and "spillt" is correctly marked as the head:
I've tested on the corpus "ASPAC svenska-engelska", but the problem is everywhere in that corpus, so I don't think it's a corpus problem but a Korp bug.
The date format of the popup should reflect the current granularity. For example, when the granularity is set to "month", the popup should read "February 2018", not "2018-02-01 00:00:00".
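One way to do this is to derive the format from the granularity with the standard Intl.DateTimeFormat API. This is only a sketch: the Granularity names and the function formatPopupDate are assumptions, not the identifiers used in the trend diagram code, and the date is assumed to be in UTC.

```typescript
// Assumed granularity names; the real trend diagram may use others.
type Granularity = "year" | "month" | "day";

// Hypothetical helper: format a date so that it shows no more
// detail than the current granularity.
function formatPopupDate(date: Date, granularity: Granularity, locale: string = "en-US"): string {
    const options: Intl.DateTimeFormatOptions = { timeZone: "UTC", year: "numeric" };
    if (granularity === "month" || granularity === "day") {
        options.month = "long";
    }
    if (granularity === "day") {
        options.day = "numeric";
    }
    return new Intl.DateTimeFormat(locale, options).format(date);
}

// For 2018-02-01 00:00:00 UTC with granularity "month" this gives "February 2018".
const popupText = formatPopupDate(new Date(Date.UTC(2018, 1, 1)), "month");
```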
When selecting a word in the KWIC for parallel corpora with word linking, the corresponding word(s) in the other language should be highlighted. This has stopped working.
It seems like the related words always pop up for this specific search:
But not for the standard selection of corpora:
https://spraakbanken.gu.se/korp/#?stats_reduce=word&cqp=[]&page=0&search=lemgram|hund\.\.nn\.1
The search should always be a lemgram for it to work. And check it works regardless of login status.
A heatmap mode would be very useful in our map view.
There are several plugins available for this:
https://leafletjs.com/plugins.html#heatmaps
The context view should (as it previously did) show sentences containing the keywords (i.e. the sentences from the KWIC) in black and grey out the rest. Currently everything is greyed out except for the actual keywords.
Clicking anywhere else and then refocusing the input field enables lemgram suggestions again.
Downloading results is broken on the secondary KWIC tabs you can open from word pictures etc.
Pressing the “Export” button in the trend diagram table view seems to have no effect when using Firefox (version 66.0.5 on Linux, reportedly also versions on Windows and Mac). It works correctly in Chrome and reportedly Safari.
This bug affects at least Korp 7.0.0 at Språkbanken and Korp 5.0.10 at the Language Bank of Finland. I haven’t checked whether it has been fixed in the development version.
(The bug was first reported by Tommi Jauhiainen of FIN-CLARIN.)
Currently we only use absolute hits, meaning that places from which we have a lot of material become over-represented in the map view, with big circles even when the search word is relatively unusual there. We should let the user switch between absolute and relative hits in the map view, and use the relative_to_struct parameter to get the relative frequencies from the backend. The relative view should possibly be the default one.
Example:
/count?...&group_by_struct=text__geoauthorhome&relative_to_struct=text__geoauthorhome
The relative numbers in this result are different from the same query without relative_to_struct=text__geoauthorhome.
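As a sketch of how the map view could switch between the two modes, the helper below builds only the two parameters named in the example; everything else in the /count query is left out. The function name structParams and the relative flag are hypothetical.

```typescript
// Hypothetical helper: build the struct-related parameters for a /count
// request. Only group_by_struct and relative_to_struct come from the
// example above; the remaining parameters of the query are not shown.
function structParams(struct: string, relative: boolean): string {
    const params: [string, string][] = [["group_by_struct", struct]];
    if (relative) {
        // Ask the backend for frequencies relative to the same structural attribute.
        params.push(["relative_to_struct", struct]);
    }
    return params
        .map((pair) => pair[0] + "=" + encodeURIComponent(pair[1]))
        .join("&");
}

const absoluteParams = structParams("text__geoauthorhome", false);
const relativeParams = structParams("text__geoauthorhome", true);
```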
Currently they are sorted by internal corpus names, making "Norstedtsromaner" come before "Bonniersromaner". This is confusing to the user. Alphabetic sorting based on the display names would be better.
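A locale-aware sort on the display names could look roughly like this. The Corpus shape, the function sortByTitle, and the corpus ids in the example are made up for illustration; only the two display names come from the report.

```typescript
// Assumed minimal shape of a corpus entry: an internal id and a display name.
interface Corpus {
    id: string;
    title: string;
}

// Hypothetical helper: return a copy sorted alphabetically by display name,
// using locale-aware comparison so that å, ä and ö sort as expected in Swedish.
function sortByTitle(corpora: Corpus[], locale: string = "sv"): Corpus[] {
    return corpora.slice().sort((a, b) => a.title.localeCompare(b.title, locale));
}

// The ids below are invented; they merely sort in the "wrong" order.
const sortedCorpora = sortByTitle([
    { id: "a-corpus", title: "Norstedtsromaner" },
    { id: "b-corpus", title: "Bonniersromaner" },
]);
```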
Remove the command arguments to the backend, since they are no longer needed.
Currently only available in simple search.
In ASU it is important to be able to get information about text lengths, since it is central to be able to compare frequencies in text units of varying length, and therefore to derive relative values based on the number of words in the selected text parts. Here the number of actual words is a more relevant measure than Korp's token count, which also includes punctuation. The transcription in ASU has many markers for various syntactic delimiters, pauses, pause fillers and code-switching markers, and these vary in number between text units. Including them in the text length can be significantly misleading, not least in comparisons across learning stages, which is often relevant in ASU. There is therefore a need to get the number of real words in the texts.
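To illustrate the difference between a token count and a word count, here is a rough sketch. It treats a token as a word if it contains a digit or a cased letter, and takes an optional predicate for transcription markers. Both function names and the example marker are hypothetical; the actual ASU marker inventory is not given here.

```typescript
// Hypothetical heuristic: a token counts as a word if it contains a digit
// or at least one cased letter (lower- and upper-case forms differ).
function isWordToken(token: string): boolean {
    return /[0-9]/.test(token) || token.toLowerCase() !== token.toUpperCase();
}

// Count real words, skipping punctuation and any tokens that the caller
// identifies as transcription markers (pauses, fillers and so on).
function countWords(tokens: string[], isMarker: (token: string) => boolean = () => false): number {
    return tokens.filter((token) => isWordToken(token) && !isMarker(token)).length;
}

// Seven tokens, of which two are punctuation and one ("eh") is treated
// as a pause filler in this made-up example.
const tokens = ["Jag", "ser", ",", "eh", "en", "hund", "."];
const wordCount = countWords(tokens, (token) => token === "eh");
```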
Result: Nothing happens.
Result: Page indicator updates, but page doesn't actually change.
Example: https://spraakbanken.gu.se/korplabb/?mode=siberian_german
Opening the corpus selector results in an error in the console.
When enough corpora are selected it becomes impossible to resize the columns of the statistics table, rendering the table useless in many cases as you can't read the content.
Example search where you can't see the whole content for the "word" column.
KWIC downloading is partially broken.
Opening the dependency tree just yields an empty popup. No errors in log.
E.g. https://spraakbanken.gu.se/korp/#?corpus=fisk should select the "Finlandssvenska texter" group, but currently results in a broken GUI.
Corpus name headers in the KWIC get cut off when all the rows are short:
Also when the hit is the first word of the sentence:
Selecting an old search from the list does nothing.
The code seems to be looking for "http://", but the world is using https://.
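The check in question is not shown here, so the following is only a sketch of a scheme test that accepts both variants. The function name isWebUrl is hypothetical.

```typescript
// Hypothetical helper: true if the value starts with http:// or https://
// (case-insensitive), rather than matching "http://" only.
function isWebUrl(value: string): boolean {
    return /^https?:\/\//i.test(value);
}
```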
corpus=xyz is supposed to select all corpora under the folder with the id xyz. This does not currently work.
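The intended behaviour can be sketched as a recursive expansion of folder ids into corpus ids. The Folder shape and both function names are assumptions rather than the actual korp-frontend data model, and this sketch only looks up top-level folder ids.

```typescript
// Assumed folder shape: corpora directly in the folder plus nested subfolders.
interface Folder {
    title: string;
    corpora: string[];
    subfolders: { [id: string]: Folder };
}

// Collect the ids of all corpora in a folder, including nested subfolders.
function corporaInFolder(folder: Folder): string[] {
    let ids = folder.corpora.slice();
    for (const key of Object.keys(folder.subfolders)) {
        const sub = folder.subfolders[key];
        if (sub) {
            ids = ids.concat(corporaInFolder(sub));
        }
    }
    return ids;
}

// Hypothetical helper: expand each comma-separated id in the corpus parameter.
// Ids that match a folder are replaced by the corpora under that folder;
// anything else is kept as a plain corpus id.
function resolveCorpusParam(value: string, folders: { [id: string]: Folder }): string[] {
    let ids: string[] = [];
    for (const id of value.split(",")) {
        const folder: Folder | undefined = folders[id];
        ids = folder ? ids.concat(corporaInFolder(folder)) : ids.concat([id]);
    }
    return ids;
}
```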
Result: "Compile based on" is empty, and the statistics tab crashes.
This applies to the current dev branch.
Result: Menu is positioned outside of the browser window.
It would be useful if you could get the KWIC of the hits for a certain period of time also from the table representation of the trend diagram, as well as from the graph representations. Each cell of the table would then need to be (or contain) a link to a search for the hits for a value (row) in a period of time (column).
This was wished for by a user of the Korp at the Language Bank of Finland. He found it difficult to choose an exact period of time in the graph representations.
ASU corpus use case:
It would be valuable if the tag of each hit word were shown in a column in the concordance (as in ITG). This is useful when the concordance includes hit words with different tags.
Opening KWIC from statistics does not work.
The corpus parameter to the backend is set to:
EUROPARL-EN,ASPACSVEN-EN
instead of the correct:
EUROPARL-EN|EUROPARL-SV,ASPACSVEN-EN|ASPACSVEN-SV
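The expected value can be produced by joining each selected corpus with its linked corpora using "|" and then joining the groups with ",". The sketch below assumes a lookup table from a corpus id to its linked ids; that table and the function name parallelCorpusParam are hypothetical.

```typescript
// Hypothetical helper: build the corpus parameter for parallel corpora.
// Each selected corpus is followed by its linked corpora, separated by "|";
// the resulting groups are separated by ",".
function parallelCorpusParam(selected: string[], linked: { [id: string]: string[] }): string {
    return selected
        .map((id) => [id].concat(linked[id] || []).join("|"))
        .join(",");
}

const corpusParam = parallelCorpusParam(
    ["EUROPARL-EN", "ASPACSVEN-EN"],
    { "EUROPARL-EN": ["EUROPARL-SV"], "ASPACSVEN-EN": ["ASPACSVEN-SV"] },
);
```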
[word = "national.*" & word != "nationaliteter" %c]
Result: You get an error. The query string can't be parsed due to the CQP query not being properly encoded.
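The query above contains "&" and "%", both of which have special meaning in a URL. A minimal sketch of encoding the value with the standard encodeURIComponent function before adding it to the query string follows; the helper name cqpParam is hypothetical, and cqp is the parameter name seen in the Korp URLs elsewhere on this page.

```typescript
// Hypothetical helper: encode a CQP query for use as a URL parameter value,
// so that characters such as "&", "%", "=" and spaces cannot break the query string.
function cqpParam(cqp: string): string {
    return "cqp=" + encodeURIComponent(cqp);
}

const rawCqp = '[word = "national.*" & word != "nationaliteter" %c]';
const encodedCqp = cqpParam(rawCqp);
```

Decoding the value with decodeURIComponent gives back the original query unchanged.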
In the trend diagram table export, the values in the selection lists for the type of frequencies and file format are not correctly localized initially: they always show Swedish texts at first, regardless of the UI language selection:
Changing the UI language while the trend diagram tab is open resets the values to correctly localized ones. In fact, even the Swedish texts shown after changing the UI language differ from the values shown at first: the texts are at first “Relativa tal” and “CSV (kommaseparerade värden)”, but after changing the UI language, they become “Relativa frekvenser” and “CSV (semikolonseparerade värden)”:
The initial, non-localized texts seem to be shown always after opening a new trend diagram tab: changing the UI language while a tab is being shown does not affect tabs opened after the language change.
This bug affects at least Korp 7.0.0 at Språkbanken and Korp 5.0.10 at the Language Bank of Finland. I haven’t tested whether it has been fixed in the development version.
(The bug was first reported by Tommi Jauhiainen of FIN-CLARIN.)