spraakbanken / korp-backend
Backend for Korp, Språkbanken's corpus search tool
Home Page: https://spraakbanken.gu.se/eng/korp
License: MIT License
If a request fails, there's (usually?) a response with information in ERROR. Example: https://ws.spraakbanken.gu.se/ws/korp/v8/query?default_context=1%20sentence&start=0&end=24&corpus=ABOUNDERRATTELSER2012&cqp=[
It would be useful to cover this in the API docs. Can this happen for any request, or only some?
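As a rough illustration of what a client has to do today, here is a minimal sketch that checks a decoded response for the ERROR key. The function name extract_error is hypothetical; the "type" and "value" keys are taken from the error responses quoted in the issues on this page, and whether every endpoint produces them is exactly the open question.

```python
from typing import Any, Dict, Optional


def extract_error(response: Dict[str, Any]) -> Optional[str]:
    """Return a one-line summary of the error, or None if there is no "ERROR" key."""
    error = response.get("ERROR")
    if error is None:
        return None
    # "type" and "value" are the keys seen in the quoted error responses;
    # the fallbacks are assumptions for responses that lack them.
    return "{}: {}".format(error.get("type", "UnknownError"), error.get("value", ""))


# Sample data shaped like the error responses quoted on this page
failed = {
    "ERROR": {"type": "CQPError", "value": "ERROR: File length of subcorpus is <= 0"},
    "time": 0.65,
}
succeeded = {"hits": 0, "time": 0.01}

print(extract_error(failed))
print(extract_error(succeeded))
```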
/struct_values can list the values of positional attributes as well as structural attributes; for example:
https://ws.spraakbanken.gu.se/ws/korp/v8/struct_values?corpus=SWEACHUM&struct=text_subject%3Epos&count=true
If that’s not accidental, I think the names of the endpoint and the parameter struct are somewhat misleading. Could the endpoint perhaps have an alias such as /attr_values and the parameter struct an alias attr? At the very least, I think the API documentation should mention this.
Responses from /query, e.g. https://ws.spraakbanken.gu.se/ws/korp/v8/query?corpus=COCTAILL&cqp=[lemma%20contains%20%22.*%3Fspade.*%22] contain corpus_order and query_data, but these are not documented on https://ws.spraakbanken.gu.se/docs/korp#tag/Concordance/paths/~1query/get
Currently we have the field group_statistics in the frontend. It is a list of attributes that can have :<something> as a suffix, but the suffix is only used to distinguish between different words in a multi-word unit, and also as a ranking of analysis results.
In the statistics, the suffix isn't meaningful and we want to group the rows that have the same value but a different suffix.
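The grouping described above could look roughly like the following sketch. It assumes that the suffix is everything after the last colon in a value and that rows are (value, count) pairs; both are assumptions made for illustration, and the sample values are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def strip_suffix(value: str) -> str:
    """Drop a trailing ":<something>" suffix, if any (assumed suffix format)."""
    head, sep, _suffix = value.rpartition(":")
    return head if sep else value


def group_rows(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts of rows whose values differ only in the suffix."""
    totals = Counter()
    for value, count in rows:
        totals[strip_suffix(value)] += count
    return dict(totals)


# Made-up statistics rows: (attribute value, frequency)
rows = [("katt..nn.1:1", 3), ("katt..nn.1:2", 2), ("hund..nn.1", 4)]
print(group_rows(rows))
```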
To fix this issue, we need to figure out a new name and add the new setting to each
attribute that has this kind of suffix. Currently:
Support omitting combined or per-corpus results in /count and /count_time, by using parameters combined=false and per_corpus=false, as in /struct_values and /timespan.
Some /relations requests on many corpora sometimes terminate in a GreenletExit error after processing some of the corpora. For example, this search first terminated after processing 6 of the 20 corpora in 41 seconds, then after 8 corpora in 152 seconds, then after 9 corpora in 74 seconds. The error with traceback was the following (pretty-printed here):
"ERROR": {
"type": "GreenletExit",
"value": "",
"traceback": [
"Traceback (most recent call last):",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 146, in incremental_json",
" for response in ff:",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 232, in decorated",
" msg = q.get(block=True, timeout=timeout)",
" File \"/home/fkkorp/korp-backend/v8/venv/lib64/python3.6/site-packages/gevent/queue.py\", line 283, in get",
" return self.__get_or_peek(self._get, block, timeout)",
" File \"/home/fkkorp/korp-backend/v8/venv/lib64/python3.6/site-packages/gevent/queue.py\", line 260, in __get_or_peek",
" result = waiter.get()",
" File \"/home/fkkorp/korp-backend/v8/venv/lib64/python3.6/site-packages/gevent/hub.py\", line 898, in get",
" return self.hub.switch()",
" File \"/home/fkkorp/korp-backend/v8/venv/lib64/python3.6/site-packages/gevent/hub.py\", line 630, in switch",
" return RawGreenlet.switch(self)",
"greenlet.GreenletExit"
]
}
Does the GreenletExit result from a timeout of some kind? Could that perhaps be avoided, or could the error message be somehow more informative? However, sometimes a similar search succeeds even if it takes much longer, so what causes a GreenletExit? Does the server load perhaps have an effect?
At least in the above case, it seemed that successive searches always got a bit further in the list of corpora before a GreenletExit. Is that effect due to caching the result by corpus?
If you make a query with a very large result (at least with 280 million hits or more), repeating the query before the cache is cleared fails with an error. For example, performing the search
https://ws.spraakbanken.gu.se/ws/korp/v8/query?default_context=1%20sentence&corpus=FAMILJELIV-ALLMANNA-SAMHALLE&cqp=[]&default_within=sentence&cache=true&debug=true
twice (or more) in a row results in the following error on the second time:
{
"ERROR": {
"type": "CQPError",
"value": "ERROR: File length of subcorpus is <= 0",
"traceback": [
"Traceback (most recent call last):",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 223, in error_catcher",
" g(*pargs, **kwargs)",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 213, in f",
" for response in generator(args, *pargs, **kwargs):",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 617, in query",
" start=hits[0], end=hits[1], **queryparams)",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 1198, in query_and_parse",
" random_seed, no_results, expand_prequeries, free_search, use_cache)",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 1006, in query_corpus",
" next(lines)",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 3218, in run_cqp",
" raise CQPError(error)",
"korp.CQPError: ERROR: File length of subcorpus is <= 0"
]
},
"time": 0.6492030620574951
}
In the Korp frontend, this means that the first such query succeeds, but for example, changing KWIC page results in an error.
The error occurs when CQP tries to read the previously saved named query results (NQR) file. It seems that CQP can write an NQR file larger than 2 GiB but it cannot read such a file if its size is between 2 and 4 GiB. When CQP implicitly converts the file size to an int on a system with 32-bit ints, such a size is converted to a negative number, and CQP checks that the size of an NQR file is positive before reading it.
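The arithmetic behind this can be sketched in Python by emulating the wrap-around of a signed 32-bit integer. The helper as_int32 is hypothetical and only models two's-complement truncation; it is not CQP's actual code.

```python
GIB = 1024 ** 3


def as_int32(value: int) -> int:
    """Emulate truncating a non-negative size to a signed 32-bit int."""
    return (value + 2 ** 31) % 2 ** 32 - 2 ** 31


# File sizes below, at and above the 2 GiB boundary
for size in (1 * GIB, 2 * GIB, 3 * GIB, 4 * GIB - 1):
    converted = as_int32(size)
    print(size, converted, "passes size check" if converted > 0 else "fails size check")
```

Every size from 2 GiB up to just below 4 GiB maps to a negative number in this model, which matches the "File length of subcorpus is <= 0" message.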
I reported this CWB bug or limitation via the CWB bug tracker: https://sourceforge.net/p/cwb/bugs/75/.
I don’t know if there is any way to avoid writing such a large NQR file, as you cannot make CQP commands conditional. Thus, as long as the limitation is present in CQP, the workaround I’d see possible in the Korp backend would be to check the size of the saved NQR file after executing cqp and to remove it if it is 2 GiB or larger. (The maximum size could also be configurable.) Unless you do that first, I’ll try to implement such a workaround at some point and make a pull request.
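A minimal sketch of the proposed workaround follows. The function name, the constant and the way the file path would be obtained are all assumptions; how the backend actually locates the saved NQR file is not shown here.

```python
import os
import tempfile

TWO_GIB = 2 * 1024 ** 3  # proposed default limit; could be made configurable


def remove_if_too_large(path, max_size=TWO_GIB):
    """Remove the file at path if it is max_size bytes or larger.

    Return True if the file was removed, False otherwise (including
    when the file does not exist).
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        return False
    if size >= max_size:
        os.remove(path)
        return True
    return False


# Demonstration with a tiny threshold instead of a real 2 GiB file
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo_nqr")
    with open(demo_path, "wb") as f:
        f.write(b"x" * 10)
    print(remove_if_too_large(demo_path, max_size=5))
    print(os.path.exists(demo_path))
```

The check would run after cqp has finished, so the first query still succeeds; only the oversized cache file is dropped, and a repeated query is computed again instead of failing.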
This issue tries to describe the plugin facility I have implemented for the Korp backend used in the Language Bank of Finland and that I’d propose as the basis for a plugin facility to be included in main Korp backend code. All feedback on the proposal is welcome.
Disclaimer: I have virtually no prior experience in plugin architectures, and also my knowledge of Flask and other Web technologies used in the Korp backend is almost solely based on the Korp backend and on what I have learned when modifying it. The features of the plugin facility reflect the modifications we’ve made to Korp for the Language Bank of Finland, so something might be implemented in too specific a way or be completely missing. Thus, I’d be glad in particular if you pointed out if something in the plugin facility should be done differently.
The code for the plugin facility is currently on top of Språkbanken’s Korp backend code at commit aad6381f of 2020-01-20, but I’ll port it on top of the current code in the near future.
The plugin facility has a readme file, which contains some more details on the plugin facility.
Also see my proposal for a plugin facility for the Korp frontend.
I’m sorry that this is rather long for a GitHub issue description.
The aim of the Korp backend plugin facility is to make it easier to tailor Korp for different sites without having to modify main Korp code. To make this possible, the main Korp code needs some support for plugins and callback hook points in appropriate places in the Python code. The plugin support code is currently in the package korppluginlib, but it will be moved under the korp package, probably named pluginlib.
The Korp backend supports two kinds of plugins:
- WSGI endpoint plugins, which implement new WSGI endpoints; and
- callback plugins, which contain callbacks called at hook points in korp.py when handling a request, to filter data or to perform an action.
Plugins are defined as Python modules or subpackages, by default within the package korpplugins (customizable via the configuration variable PACKAGES).
Both WSGI endpoint plugins and callback plugins can be defined in the same plugin module.
Korp’s config.py contains the following plugin-related variables:
- PLUGINS: A list of names of plugins (modules or subpackages) to be used, in the order they are to be loaded.
- INFO_SHOW_PLUGINS: What information on loaded plugins the response of the /info command should contain: None, "names" or "info".
korppluginlib
The configuration of korppluginlib is in the module korppluginlib.config. Currently, the following configuration variables are recognized:
- PACKAGES: A list of packages which may contain plugins.
- SEARCH_PATH: A list of directories in which to search for plugins (the packages listed in PACKAGES) in addition to default ones.
- HANDLE_NOT_FOUND: What to do when a plugin is not found: "error", "warn" or "ignore".
- LOAD_VERBOSITY: What korppluginlib outputs when loading plugins: 0 (nothing), 1 (plugin names only), 2 (plugin names, configurations, view functions, callback methods).
- HANDLE_DUPLICATE_ROUTES: What to do with duplicate endpoints for a routing rule added by plugins: "override", "override,warn", "ignore", "warn" or "error".
Alternatively, the configuration variables may be specified in the top-level module config within PLUGINLIB_CONFIG; for example:
PLUGINLIB_CONFIG = dict(
    HANDLE_NOT_FOUND = "warn",
    LOAD_VERBOSITY = 1,
)
The values specified in the top-level config override those in korppluginlib.config.
Values for the configuration variables of individual plugin modules or subpackages can be specified in three places:
1. An item in the list PLUGINS in Korp’s top-level config module can be a pair (plugin_name, config), where config is either a dictionary- or namespace-like object containing configuration variables.
2. Korp’s top-level config module can define the variable PLUGIN_CONFIG_PLUGINNAME (where PLUGINNAME stands for the plugin name), whose value is either a dictionary- or namespace-like object with configuration variables.
3. A plugin subpackage can contain a module named config within the subpackage, consisting of configuration variables.
The value for a configuration variable is taken from the first of the above in which it is set.
To get values from these sources, the plugin module needs to call korppluginlib.get_plugin_config with default values of configuration variables:
pluginconf = korppluginlib.get_plugin_config(
    CONFIG_VAR = "value",
)
The configured value of CONFIG_VAR can be then accessed as pluginconf.CONFIG_VAR.
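The "first source in which it is set" rule can be illustrated with a small stand-alone sketch. This is not the implementation of korppluginlib.get_plugin_config; the function resolve_plugin_config and its arguments are hypothetical, and it assumes that only variables with a default value are looked up.

```python
from collections import ChainMap
from types import SimpleNamespace


def resolve_plugin_config(defaults, *sources):
    """Resolve configuration values from sources given in descending priority.

    Each source is a dict, a namespace-like object or None (source absent).
    A variable missing from all sources gets its default value.
    """
    layers = [src if isinstance(src, dict) else vars(src)
              for src in sources if src is not None]
    merged = ChainMap(*layers, defaults)
    # Only variables that have a default are considered (an assumption)
    return SimpleNamespace(**{name: merged[name] for name in defaults})


# Hypothetical values from the PLUGINS pair, PLUGIN_CONFIG_PLUGINNAME and
# a config module in the plugin subpackage (absent here)
from_plugins_pair = {"CONFIG_VAR": "from PLUGINS"}
from_top_level = SimpleNamespace(CONFIG_VAR="from top level", OTHER_VAR=2)
from_subpackage = None

pluginconf = resolve_plugin_config(
    {"CONFIG_VAR": "value", "OTHER_VAR": 1, "THIRD_VAR": True},
    from_plugins_pair, from_top_level, from_subpackage)
print(pluginconf)
```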
Endpoint routes (routing rules) defined by a plugin can be renamed by setting an appropriate value to the configuration variable RENAME_ROUTES of the plugin in question. This may be needed if two plugins have endpoints with the same route, or if it is otherwise desired to change the routes specified by a plugin. The value of RENAME_ROUTES can be a format string, a dict or a function of one argument mapping the original route to a renamed route. For more information, please see the documentation.
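How the three kinds of RENAME_ROUTES values might be applied is sketched below. The helper rename_route is hypothetical, and two details are assumptions rather than documented behaviour: that a format string receives the original route as its single positional field, and that routes missing from a dict are left unchanged. The routes are written without a leading slash only to keep the example simple.

```python
def rename_route(route, rename_routes):
    """Return the renamed route according to a RENAME_ROUTES-like value."""
    if rename_routes is None:
        return route
    if callable(rename_routes):
        # Function of one argument mapping the original route to a new one
        return rename_routes(route)
    if isinstance(rename_routes, dict):
        # Assumption: routes not listed in the dict keep their name
        return rename_routes.get(route, route)
    # Assumption: the format string has a single "{}" field for the route
    return rename_routes.format(route)


print(rename_route("test", "myplugin_{}"))
print(rename_route("test", {"test": "test2"}))
print(rename_route("text", {"test": "test2"}))
print(rename_route("test", lambda route: route.upper()))
```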
A plugin module or package may define dict PLUGIN_INFO containing pieces of information on the plugin. Alternatively, a plugin package may contain a module named info and a non-package plugin module plugin may be accompanied by a module named plugin_info containing variable definitions that are added to PLUGIN_INFO with the lower-cased variable name as the key. For example:
PLUGIN_INFO = {
    "name": "korppluginlib_test_1",
    "version": "0.1",
    "date": "2020-12-10",
    "description": "korppluginlib test plugin 1",
    "author": "FIN-CLARIN",
    "author_email": "fin-clarin at helsinki dot fi",
}
The information on loaded plugins is accessible in korppluginlib.loaded_plugins.
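The conversion from variables in an info module to PLUGIN_INFO keys can be pictured with the following sketch. It assumes that only all-uppercase variable names are picked up, which the description above does not state, and it uses a SimpleNamespace as a stand-in for an imported module.

```python
from types import SimpleNamespace


def info_from_module(module):
    """Collect upper-case variables of module into a dict with lower-cased keys."""
    return {name.lower(): value
            for name, value in vars(module).items()
            if name.isupper()}


# Stand-in for a module named info (or plugin_info) with variable definitions
info_module = SimpleNamespace(
    NAME="korppluginlib_test_1",
    VERSION="0.1",
    AUTHOR_EMAIL="fin-clarin at helsinki dot fi",
    _private="not included",
)

PLUGIN_INFO = info_from_module(info_module)
print(PLUGIN_INFO)
```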
To implement a new WSGI endpoint, you first create an instance of korppluginlib.KorpEndpointPlugin (a subclass of flask.Blueprint) as follows:
test_plugin = korppluginlib.KorpEndpointPlugin()
You can also specify a name for the plugin, overriding the default that is the calling module name __name__:
test_plugin = korppluginlib.KorpEndpointPlugin("test_plugin")
You can also pass other arguments recognized by flask.Blueprint.
The actual view function is a generator function decorated with the route method of the created instance; for example:
@test_plugin.route("/test", extra_decorators=["prevent_timeout"])
def test(args):
    """Yield arguments wrapped in "args"."""
    yield {"args": args}
The decorator takes as its arguments the route of the endpoint, and optionally, an iterable of the names of possible additional decorators as the keyword argument extra_decorators, and other options of route. extra_decorators lists the names in the order in which they would be specified as decorators (topmost first), that is, in the reverse order of application. The generator function takes a single dict argument containing the parameters of the call and yields the result.
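The remark about the order of extra_decorators may be easier to follow with a small stand-alone sketch. It does not use korppluginlib; apply_named_decorators, tagging and the registry are hypothetical, and the functions are plain functions rather than generators to keep the example short.

```python
import functools


def apply_named_decorators(func, names, registry):
    """Apply the decorators named in names, listed topmost first."""
    # The topmost decorator is applied last, so iterate in reverse
    for name in reversed(names):
        func = registry[name](func)
    return func


def tagging(name):
    """Make a decorator that prepends name to the list returned by the function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [name] + func(*args, **kwargs)
        return wrapper
    return decorator


registry = {"outer": tagging("outer"), "inner": tagging("inner")}


def core():
    return ["core"]


# Same effect as writing @outer above @inner above the definition of core
decorated = apply_named_decorators(core, ["outer", "inner"], registry)
print(decorated())
```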
A single plugin module can define multiple new endpoints.
Even though Korp endpoints should in general return JSON data, it may be desirable to implement endpoints returning another type of data, for example, if the endpoint generates a file for downloading. That can be accomplished by adding use_custom_headers to extra_decorators. An endpoint using use_custom_headers should yield a dict with the following keys recognized:
- "content": the actual content;
- "mimetype" (default: "text/html"): possible MIME type; and
- "headers": possible other headers as a list of pairs (header, value).
For example, the following endpoint returns an attachment for a plain-text file listing the arguments to the endpoint, named with the value of filename (args.txt if not specified):
@test_plugin.route("/text", extra_decorators=["use_custom_headers"])
def textfile(args):
    """Make downloadable plain-text file of args."""
    yield {
        "content": "\n".join(arg + "=" + repr(args[arg]) for arg in args),
        "mimetype": "text/plain",
        "headers": [
            ("Content-Disposition", "attachment; filename=\"" + args.get("filename", "args.txt") + "\"")
        ],
    }
Neither the endpoint argument incremental=true nor the decorator prevent_timeout has any practical effect on endpoints with use_custom_headers.
By default, the endpoint decorator functions whose names can be listed in extra_decorators include only prevent_timeout and use_custom_headers, as the endpoints defined in this way are always decorated with main_handler as the topmost decorator. However, additional decorator functions can be defined by decorating them with korppluginlib.KorpEndpointPlugin.endpoint_decorator; for example:
import functools

# test_plugin is an instance of korppluginlib.KorpEndpointPlugin, so this
# is equivalent to @korppluginlib.KorpEndpointPlugin.endpoint_decorator
@test_plugin.endpoint_decorator
def test_decor(generator):
    """Add to the result an extra layer with test_decor and payload."""
    @functools.wraps(generator)
    def decorated(args=None, *pargs, **kwargs):
        for x in generator(args, *pargs, **kwargs):
            yield {"test_decor": "Endpoint decorated with test_decor",
                   "payload": x}
    return decorated
Callbacks to be called at specific plugin hook points in korp.py are defined within subclasses of korppluginlib.KorpCallbackPlugin as instance methods having the name of the hook point. The arguments and return values of a callback method are specific to each hook point.
In the first argument request, each callback method gets the actual Flask request object (not a proxy for the request) containing information on the request. For example, the endpoint name is available as request.endpoint.
korp.py contains two kinds of hook points:
- filter hook points, at which callback methods may modify a value and return it; and
- event hook points, at which callback methods are called without returning a value.
For filter hook points, the value returned by a callback method is passed as the first non-request argument to the callback method defined by the next plugin, similar to function composition or method chaining. However, a callback for a filter hook point need not modify the value: if it returns None either explicitly or implicitly, the value is ignored and the argument is passed as is to the callback method in the next plugin.
At present, filter hook points and the signatures of their callback methods are the following:
- filter_args(self, request, args): Modifies the arguments dict args to any endpoint (view function) and returns the modified value.
- filter_result(self, request, result): Modifies the result dict result returned by any endpoint (view function) and returns the modified value. Note that when the arguments (query parameters) of the endpoint contain incremental=true, filter_result is called separately for each incremental part of the result.
- filter_cqp_input(self, request, cqp): Modifies the raw CQP input string cqp, typically consisting of multiple CQP commands, already encoded as bytes, to be passed to the CQP executable, and returns the modified value.
- filter_cqp_output(self, request, (output, error)): Modifies the raw output of the CQP executable, a pair consisting of the standard output and standard error encoded as bytes, and returns the modified values as a pair.
- filter_sql(self, request, sql): Modifies the SQL statement sql to be passed to the MySQL/MariaDB database server and returns the modified value.
- filter_protected_corpora(self, request, protected_corpora): Modifies (or replaces) the list protected_corpora of ids of protected corpora, the use of which requires authentication and authorization.
- filter_auth_postdata(self, request, postdata): Modifies (or replaces) the POST request parameters in postdata, to be passed to the authorization server (config.AUTH_SERVER) in the endpoint /authenticate.
- filter_auth_response(self, request, auth_response): Modifies the response auth_response returned by the authorization server in the endpoint /authenticate.
Callback methods for event hook points do not return a value.
At present, event hook points and the signatures of their callback methods are the following:
- enter_handler(self, request, args, starttime): Called near the beginning of a view function for an endpoint. args is a dict of arguments to the endpoint and starttime is the current time as seconds since the epoch.
- exit_handler(self, request, endtime, elapsed_time, result_len): Called just before exiting a view function for an endpoint (before yielding a response). endtime is the current time as seconds since the epoch, elapsed_time is the time spent in the view function as seconds, and result_len the length of the response content.
- error(self, request, error, exc): Called after an exception has occurred. error is the dict to be returned in JSON as ERROR and exc contains exception information.
An example of a callback plugin containing a callback method to be called at the hook point filter_result:
class Test1b(korppluginlib.KorpCallbackPlugin):

    def filter_result(self, request, result):
        """Wrap the result dictionary in "wrap" and add "endpoint"."""
        return {"endpoint": request.endpoint,
                "wrap": result}
Each plugin class is instantiated only once, so the possible state stored in self is shared by all invocations (requests). However, see the next subsection for an approach of keeping request-specific state across hook points.
A single plugin class can define only one callback method for each hook point, but a module may contain multiple classes defining callback methods for the same hook point.
If multiple plugins define a callback method for a hook point, they are called in the order in which the plugin modules are listed in config.PLUGINS. If a plugin module contains multiple classes defining a callback method for a hook point, they are called in the order in which they are defined in the module.
If the callback methods of a class should be applied only to certain kinds of requests, for example, to a certain endpoint, the class can override the class method applies_to(cls, request) to return True only for requests to which the plugin is applicable.
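As an illustration of overriding applies_to, here is a sketch that runs without korppluginlib. CallbackPluginStub is a hypothetical stand-in for korppluginlib.KorpCallbackPlugin, the request objects are simple namespaces with an endpoint attribute, and the endpoint name "info" is only an example.

```python
from types import SimpleNamespace


class CallbackPluginStub:
    """Hypothetical stand-in for korppluginlib.KorpCallbackPlugin."""

    @classmethod
    def applies_to(cls, request):
        # By default, a plugin applies to all requests
        return True


class InfoOnlyPlugin(CallbackPluginStub):
    """Callback plugin restricted to one endpoint."""

    @classmethod
    def applies_to(cls, request):
        return request.endpoint == "info"

    def filter_result(self, request, result):
        return {"wrap": result}


def applicable_plugins(plugin_classes, request):
    """Return the plugin classes whose callbacks should be called for request."""
    return [cls for cls in plugin_classes if cls.applies_to(request)]


plugin_classes = [CallbackPluginStub, InfoOnlyPlugin]
info_request = SimpleNamespace(endpoint="info")
query_request = SimpleNamespace(endpoint="query")
print([cls.__name__ for cls in applicable_plugins(plugin_classes, info_request)])
print([cls.__name__ for cls in applicable_plugins(plugin_classes, query_request)])
```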
Request-specific data can be passed from one callback method to another within the same callback plugin class by using a dict attribute (or similar) indexed by request objects (or their ids). In general, the enter_handler callback method (called at the first hook point) should initialize a space for the data for a request, and exit_handler (called at the last hook point) should delete it. For example:
from types import SimpleNamespace

class StateTest(korppluginlib.KorpCallbackPlugin):

    # Request-specific data, indexed by request object
    _data = {}

    def enter_handler(self, request, args, starttime):
        self._data[request] = data = SimpleNamespace()
        data.starttime = starttime
        print("enter_handler, starttime =", starttime)

    def exit_handler(self, request, endtime, elapsed_time, result_len):
        print("exit_handler, starttime =", self._data[request].starttime, "endtime =", endtime)
        del self._data[request]
New hook points can be added to plugins (as well as to korp.py) by invoking callbacks with the name of the hook point by using the appropriate methods. For example, a logging plugin could implement a callback method log that could be called from other plugins, both callback and endpoint plugins.
Given the Flask request object (or the global request proxy) request, callbacks for the (event) hook point hook_point can be called as follows, with *args and **kwargs as the positional and keyword arguments and discarding the return value:
korppluginlib.KorpCallbackPluginCaller.raise_event_for_request("hook_point", *args, **kwargs, request=request)
or, equivalently, getting a caller object for a request and calling its instance method (typically when the same function or method contains several hook points):
plugin_caller = korppluginlib.KorpCallbackPluginCaller.get_instance(request)
plugin_caller.raise_event("hook_point", *args, **kwargs)
If request is omitted or None, the request object referred to by the global request proxy is used.
Callbacks for such additional hook points are defined in the same way as for those in korp.py. The signature corresponding to the above calls is
hook_point(self, request, *args, **kwargs)
All callback methods need to have request as the first positional argument (after self).
Three types of call methods are available in KorpCallbackPluginCaller:
- raise_event_for_request (and instance method raise_event): Call the callback methods and discard their possible return values (for event hook points).
- filter_value_for_request (and filter_value): Call the callback methods and pass the return value as the first argument of the next callback method, and return the value returned by the last callback method (for filter hook points).
- get_values_for_request (and get_values): Call the callback methods, collect their return values to a list and finally return the list.
Only the first two are currently used in korp.py.
The values of selected global variables, constants and functions in the main application module korp.py are available to plugin modules as korppluginlib.app_globals.name. In this way, for example, a plugin can access the Korp MySQL database and the Memcached cache and use assert_key to assert the format of arguments.
The current implementation has at least the following limitations and deficiencies, which might be subjects for future development, if needed. Some more information on the issues is in the documentation.
- [...] filter_args and filter_result.
- [...] config.PLUGINS. The plugins themselves cannot specify that they should be loaded before or after another plugin, or that one callback of a plugin should be called before those of other plugins (such as filter_args) and another after those of others (such as filter_result).
- [...] PLUGIN_INFO or an info module requires manual updating whenever the plugin is changed.
- [...] korp.py via korppluginlib.app_globals is somewhat cumbersome. It could be simplified by moving the helper functions to a separate library module that could be imported by plugins.
- [...] main_handler and prevent_timeout cannot decorate an instance method.
Many Python plugin frameworks or libraries exist, but they did not appear suitable for Korp plugins as such. In particular, we wished to have both callback plugins and endpoint plugins.
Using a metaclass for registering callback plugins in korppluginlib was inspired by and partially adapted from Marty Alchin’s A Simple Plugin Framework.
The terms used in conjunction with callback plugins were partially influenced by the terminology for WordPress plugins.
The Flask-Plugins Flask extension might have been a natural choice, as Korp is a Flask application, but it was not immediately obvious if it could have been used to implement new endpoints. Moreover, for callback (event) plugins, it would have had to be extended to support passing the result from one plugin callback as the input of another.
Using Flask Blueprints for endpoint plugins was hinted at by @MartinHammarstedt.
Calling /corpus_config with a non-existent mode gives the TypeError “'NoneType' object does not support item assignment”, which I don’t find descriptive of the actual error. For example: https://ws.spraakbanken.gu.se/ws/korp/dev/corpus_config?mode=test
This happens because get_mode returns None on FileNotFoundError (Lines 3414 to 3418 in 229b274) and corpus_config tries to assign to a key of the returned value (Lines 3365 to 3366 in 229b274).
I think this would be easy to fix, but I didn’t yet create a pull request as I don’t know if you have your preferences on what kind of an error message to return.
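One possible shape of the fix is sketched below with stand-in functions, since the referenced lines of korp.py are not reproduced here. The function bodies, the dict of available modes, the assigned key "corpora" and the choice of ValueError with this message are all assumptions; the point is only to check for None before assigning.

```python
def get_mode(mode, available_modes):
    """Stand-in for get_mode: return the mode configuration or None if missing."""
    conf = available_modes.get(mode)
    return dict(conf) if conf is not None else None


def corpus_config(mode, available_modes):
    """Stand-in for corpus_config with an explicit check for a missing mode."""
    mode_conf = get_mode(mode, available_modes)
    if mode_conf is None:
        # Descriptive error instead of a TypeError from item assignment
        raise ValueError("Mode {!r} does not exist".format(mode))
    mode_conf["corpora"] = {}  # hypothetical key assigned by the endpoint
    return mode_conf


MODES = {"default": {"label": "Default mode"}}
print(corpus_config("default", MODES))
try:
    corpus_config("test", MODES)
except ValueError as e:
    print("ERROR:", e)
```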
This will make the call to Karp's /autocomplete, which is currently made in the frontend, combined with /lemgram_count.
Example call:
https://ws.spraakbanken.gu.se/ws/karp/v4/autocomplete?mode=external&q=att&resource=
The /count endpoint returns an IndexError: list index out of range when trying to search certain Flashback or Familjeliv subcorpora with (certain) group_by and group_by_struct parameters. For example:
https://ws.spraakbanken.gu.se/ws/korp/v8/count?group_by=deprel&group_by_struct=thread_title&cqp=%3Cthread%3E+%5Bpos%20%3D%20%22DT%22%5D&corpus=FLASHBACK-DATOR&default_within=sentence&debug=true
results in the following:
{
"ERROR": {
"type": "IndexError",
"value": "list index out of range",
"traceback": [
"Traceback (most recent call last):",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 223, in error_catcher",
" g(*pargs, **kwargs)",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 213, in f",
" for response in generator(args, *pargs, **kwargs):",
" File \"/home/fkkorp/korp-backend/v8/korp.py\", line 1569, in count",
" if group_by[i][0] in split:",
"IndexError: list index out of range"
]
},
"time": 26.713754177093506
}
Does the corpus data perhaps contain something unexpected by /count? Anyway, I think it would be better if the code were able to handle that without such an internal-looking error.
I got the error with a number of different parameters, though I haven’t tried all combinations:
- group_by: pos, deprel, msd, word
- group_by_struct: thread_title, text_username; but not forum_title
- cqp: [], [pos="VB"], [pos="DT"], [msd=".*+.*"], but not [pos="RO"]; with or without anchoring to <text> or <thread>, but not when anchoring to <forum>
- corpus: FLASHBACK-DATOR, FLASHBACK-HEM, FLASHBACK-POLITIK, FLASHBACK-SAMHALLE, FAMILJELIV-FORALDER, FAMILJELIV-KANSLIGA; but not FLASHBACK-LIVSSTIL, FLASHBACK-EKONOMI, FLASHBACK-FORDON, FLASHBACK-DROGER, FLASHBACK-KULTUR, FAMILJELIV-ALLMANNA-KROPP, FAMILJELIV-GRAVID, TWITTER, TWITTER-2015 (with group_by_struct=user_username), WIKIPEDIA-SV (with group_by_struct=text_title)
It would seem that larger corpora are more likely to cause the error, but that’s not completely consistent, at least if you only take token count into account. And I couldn’t get the error from other than Flashback and Familjeliv subcorpora.
(I came across this issue by accident when testing different combinations of statistics attributes in the frontend.)
When making some modifications to function get_mode called by /corpus_config, I noticed two features that appear currently undocumented:
- [...] corpora, a corpus id may be prefixed by a path to the configuration file (subdirectories under CORPUS_CONFIG_DIR/corpora): Lines 3441 to 3444 in 229b274
- [...] dict) value with the value of preset specifying the name of the preset to whose content the rest of the values of the mapping are added. For example:
pos_attributes:
  - lemma:
      preset: lemma
      stringify: lemma
Lines 3485 to 3491 in 229b274
Lines 3517 to 3520 in 229b274
Are these features undocumented because of being experimental or obsolete, or has their documentation been omitted because they are (currently) not used in Språkbanken’s corpus configuration?
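My reading of the second feature can be summarized in the sketch below. It is not the code at the referenced lines; the function expand_preset and the sample preset content are hypothetical, and it assumes that the key preset itself is dropped and that values in the mapping take precedence over those in the preset.

```python
def expand_preset(attr_conf, presets):
    """Expand a mapping with a "preset" key by adding its other values to the preset."""
    if not isinstance(attr_conf, dict) or "preset" not in attr_conf:
        return attr_conf
    # Copy the preset content so that the preset itself is not modified
    expanded = dict(presets[attr_conf["preset"]])
    # Assumption: values in the mapping override those of the preset
    expanded.update((key, value) for key, value in attr_conf.items()
                    if key != "preset")
    return expanded


# Made-up preset content; the mapping corresponds to the YAML example above
presets = {"lemma": {"label": "lemma", "type": "set"}}
attr_conf = {"preset": "lemma", "stringify": "lemma"}
print(expand_preset(attr_conf, presets))
```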
Make it possible to get the number of a certain structural element that contains hits, e.g. number of sentences or texts.