Extension to handle multilingual content in MediaWiki. The content is represented in an abstract notation. Language-specific renderers translate the abstract content to natural language.
This is not an officially supported Google product.
This is a prototype. Do not use in a public installation. This prototype has severe security issues.
This prototype is meant as a technology exploration for Wikilambda. Wikilambda is described in the following paper:
The easiest intro is probably reading the walkthrough.
An alternative implementation of eneyj: graaleneyj
The simplest example for testing this is from the command line. Try it out:
> node eneyj/src/eneyj.js --lang:en 'negate(false)'
true
> node eneyj/src/eneyj.js --lang:en 'subclassification_string_from_n_n_language(n_wikipedia, n_encyclopedia, English)'
Wikipedias are encyclopedias.
> node eneyj/src/eneyj.js --lang:en 'subclassification_string_from_n_n_language(n_wikipedia, n_encyclopedia, German)'
Wikipedien sind Enzyklopädien.
The canonical and easiest way to run abstracttext is to use docker as described here: docker support
AbstractText is a lightweight wrapper that provides access to eneyj (see there). Neither AbstractText nor eneyj is very polished. eneyj is the JavaScript code that actually evaluates the functions. If you want to get a feel for the code, try eneyj from the command line first.
If using Vagrant:
Add the following line to the Vagrantfile (around line 54):
config.vm.boot_timeout = 600
Installation: Drop the files in the extensions folder. Also add the files from UniversalLanguageSelector.
Also:
vagrant roles enable codeeditor
Add to LocalSettings.php:
include_once '/vagrant/LocalSettings.php';
$wgCacheEpoch = max( $wgCacheEpoch, gmdate( 'YmdHis' ) );
wfLoadExtension( 'UniversalLanguageSelector' );
wfLoadExtension( 'AbstractText' );
To start:
cd ~/vagrant
vagrant up
vagrant ssh
To load the data that is already available:
php mediawiki/maintenance/importTextFiles.php -s "Import data" --prefix "M:" --overwrite abstracttext/eneyj/data/Z*
See logs:
tail /vagrant/logs/mediawiki-wiki-debug.log
grep AbstractText /vagrant/logs/mediawiki-wiki-debug.log | tail
Run tests (currently there are no tests for the extension):
sudo -u www-data hhvm /vagrant/mediawiki/tests/phpunit/phpunit.php --wiki wiki /vagrant/mediawiki/extensions/AbstractText/tests/phpunit/
Run specific test:
sudo -u www-data hhvm /vagrant/mediawiki/tests/phpunit/phpunit.php --wiki wiki --filter testConcatenateCallFallback /vagrant/mediawiki/extensions/AbstractText/tests/phpunit/
(see also the README in eneyj)
abstracttext's Issues
Tests - status?
I've been exploring the "test" features in functions (the lists under Z8K3). I tentatively added a section to display the tests in AbstractTextContent.php, for example. However, I'm trying to figure out how to run a test against an implementation: it looks like they are all run in eneyj/src/scripts/measure.js? I'm thinking of setting up a way to run a single test (or all tests) from the UI. Is that a good place to start?
new Dataset type (to apply functions over sets of tables)
Does it make sense to have a representation for Datasets?
Join functions against Datasets (sets of tables (Z200))?
REF: https://schema.org/Dataset
Some wiki pages have multiple tables plus summary tables derived from them (joined, aggregated, summarized, etc.). Could functions create these automatically?
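As a rough illustration of what "join" and "aggregate" functions over tables could compute, here is a minimal sketch. It assumes each table is a plain array of row objects; the actual Z200 table representation is not specified in this thread, so `innerJoin` and `sumBy` are hypothetical helpers, not existing eneyj builtins.

```javascript
// Hypothetical sketch: tables are arrays of plain row objects, not Z200 ZObjects.
// innerJoin keeps every pair of rows whose values under `key` are equal.
function innerJoin(left, right, key) {
  // Index the right table by join key so each left row is matched quickly.
  const index = new Map();
  for (const row of right) {
    const bucket = index.get(row[key]) ?? [];
    bucket.push(row);
    index.set(row[key], bucket);
  }
  const joined = [];
  for (const row of left) {
    for (const match of index.get(row[key]) ?? []) {
      joined.push({ ...row, ...match }); // right-hand columns win on name clashes
    }
  }
  return joined;
}

// Summary table: total of one numeric column per group.
function sumBy(rows, groupKey, valueKey) {
  const totals = new Map();
  for (const row of rows) {
    totals.set(row[groupKey], (totals.get(row[groupKey]) ?? 0) + row[valueKey]);
  }
  return Object.fromEntries(totals);
}
```

Rows without a partner in the other table are dropped, which is the usual inner-join behavior; outer joins would need an explicit fill value.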
editing?
So, there's still some cleanup to do on the work I've done so far, but the next major thing I was thinking of looking at was the editing UI. The JsonContentHandler we're deferring to right now doesn't seem to do much; I guess I'll look around for existing PHP JSON editors that might be usable as a starting point. Of course, what would really be nice is being able to enter ZObjects and keys by name via auto-complete etc. I don't know how far I can get on this, but any pointers on what to look for (or avoid) would be appreciated!
Infinite recursion in normal representation of type (Z1K1)?
I’m trying to parse the normal JSON serialization, as described in the specification, and realized I don’t know what the type of an object is supposed to look like in that serialization.
In the JSON files shipped with eneyj, which are in canonical representation, the value of the Z1K1 key (the type) is a string literal like "Z2", which IIUC means that it’s a reference – if it was a string value, it would have to be represented as { "Z1K1": "Z6", "Z6K1": "Z2" }.
And in the normal representation, Z1K1 is not a special key (only Z1K2 and Z6K1 are), so the reference (Z9) must be an object. Presumably, the reference ID (Z9K1) of that reference must be an object representing a string value… but what does the type of the reference look like?
{
"Z1K1": {
"Z1K1": {
"Z1K1": {
// ... infinite representation of a reference to Z9?
},
"Z9K1": {
"Z1K1": /* reference to Z6 */,
"Z6K1": "Z9"
}
},
"Z9K1": {
"Z1K1": /* reference to Z6 */,
"Z6K1": "Z2" // actual type name goes here
},
// other keys of Z2 or other type go here
}
I suspect the specification needs to be amended to break this cycle; probably make Z1K1 another key that is serialized as a string literal.
I might be misunderstanding something, though. I tried to see if eneyj has a facility to show the normal serialization of a value, but couldn’t find it.
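To make the cycle concrete, here is a minimal sketch of a canonical-to-normal expander. It assumes, as this issue does, that a canonical string shaped like a ZID is a Z9 reference whose Z9K1 is a Z6 string value, that any other string is a Z6 string value, and that values under the keys in `literalKeys` stay plain strings. `makeNormalizer` and `MAX_DEPTH` are hypothetical names, not eneyj code, and JSON arrays are simply mapped element by element.

```javascript
// Hypothetical sketch, not eneyj code. Assumptions (from the issue above):
// - a canonical string matching /^Z\d+$/ is a reference (Z9),
// - any other canonical string is a string value (Z6),
// - the Z9K1 of a reference is itself a Z6 string value,
// - values under keys in `literalKeys` stay plain strings in normal form.
// Input is canonical JSON: strings, arrays and objects only.
const ZID = /^Z\d+$/;
const MAX_DEPTH = 50; // guard so the non-terminating case fails loudly

function makeNormalizer(literalKeys) {
  // The type slot: a plain "Zn" if Z1K1 is literal, else a full reference.
  function typeSlot(zid, depth) {
    return literalKeys.has('Z1K1') ? zid : reference(zid, depth);
  }
  function stringValue(s, depth) {
    return { Z1K1: typeSlot('Z6', depth + 1), Z6K1: s };
  }
  function reference(zid, depth) {
    if (depth > MAX_DEPTH) {
      throw new RangeError(`expansion of ${zid} does not terminate`);
    }
    return { Z1K1: typeSlot('Z9', depth + 1), Z9K1: stringValue(zid, depth + 1) };
  }
  function normalize(value, depth = 0) {
    if (typeof value === 'string') {
      return ZID.test(value) ? reference(value, depth) : stringValue(value, depth);
    }
    if (Array.isArray(value)) {
      return value.map((item) => normalize(item, depth)); // simplification
    }
    const result = {};
    for (const [key, item] of Object.entries(value)) {
      result[key] = literalKeys.has(key) && typeof item === 'string'
        ? item
        : normalize(item, depth);
    }
    return result;
  }
  return normalize;
}
```

With only Z1K2 and Z6K1 as literal keys, expanding any type reference hits the depth guard, because each Z9 reference needs a Z1K1 that is itself a Z9 reference. Adding Z1K1 to `literalKeys`, as suggested above, makes every expansion terminate.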
Cannot evaluate unlinearized (JSON) version of value(project_name) call
The next implementation step for GraalEneyj is probably to implement function calls, and I figured that value(project_name) seems like a good first call to target: it uses a builtin (so no need to implement custom functions yet) and should be relatively simple overall. GraalEneyj also doesn’t support anything other than the canonical JSON representation yet, so I needed the JSON version of a function call. Eneyj can print that, fortunately:
> .evaluation off
evaluation is off
> .linearization off
linearization is off
> value(project_name)
{
"Z1K1": "Z7",
"Z7K1": "Z36",
"K1": "Z28"
}
However, it looks like Eneyj is itself unable to evaluate that function call:
> {"Z1K1": "Z7", "Z7K1": "Z36", "K1": "Z28"}
error_in_function:
by_key(error(error_in_function, error(zobject_has_no_type, "Z36", "val"), nothing), "Z5K2")
So far, I was under the impression that the Eneyj CLI also accepted the canonical JSON representation, so this should work. Am I doing something wrong, or is this a bug in Eneyj?
value(true) and value(false) produce errors
I assumed that value(true) and value(false) would return their argument, but they produce key_not_found errors instead:
$ node eneyj/src/eneyj.js
eneyj v0.1.0
language is set to English
Enter .help for help
> value(true)
{
"Z1K1": "Z15",
"Z1K2": "Z443",
"Z1K3": {
"Z1K1": "Z12",
"Z12K1": [
{
"Z1K1": "Z11",
"Z11K1": "Z251",
"Z11K2": "key_not_found"
}
]
}
}
> value(false)
{
"Z1K1": "Z15",
"Z1K2": "Z443",
"Z1K3": {
"Z1K1": "Z12",
"Z12K1": [
{
"Z1K1": "Z11",
"Z11K1": "Z251",
"Z11K2": "key_not_found"
}
]
}
}
I noticed this when I was implementing “if” in GraalEneyj and figured that if(value(true), "then", "else") would be a nice test case of a non-constant reference condition. I was surprised to find that eneyj produced "else" rather than "then" as I expected:
> {"Z1K1": "Z7", "Z7K1": "Z31", "K1": {"Z1K1": "Z7", "Z7K1": "Z36", "K1": "Z54"}, "K2": "then", "K3": "else"}
else
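The "else" result suggests that `if` treats any condition other than Z54/true, including the Z15 error object that value(true) returns, as false; that is an inference from the output above, not from reading eneyj's code. A minimal sketch of a stricter alternative that accepts only the two boolean references and propagates errors, assuming canonical references Z54/true and Z55/false; `ifStrict` and `ifLenient` are hypothetical names, not eneyj's Z31/if.

```javascript
// Hypothetical sketch, not eneyj's Z31/if. Conditions are canonical references:
// "Z54" is true and "Z55" is false; anything else is not treated as false.
const TRUE = 'Z54';
const FALSE = 'Z55';

function ifStrict(condition, consequent, alternative) {
  if (condition === TRUE) return consequent;
  if (condition === FALSE) return alternative;
  // Propagate Z15 errors instead of silently choosing the alternative.
  if (condition !== null && typeof condition === 'object' && condition.Z1K1 === 'Z15') {
    return condition;
  }
  throw new TypeError(`condition is not a boolean: ${JSON.stringify(condition)}`);
}

// A lenient variant that only checks for true; with an error as the condition
// it returns the alternative, matching the "else" seen above.
function ifLenient(condition, consequent, alternative) {
  return condition === TRUE ? consequent : alternative;
}
```

With the strict variant, if(value(true), "then", "else") would surface the key_not_found error rather than masking it as "else".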
Special pages?
Hi Denny - I'm wondering if you've had a chance to think about/look at what some of the Special pages should do? I'm particularly interested in:
- WhatLinksHere - show the Zobjects that make use of this object in any way
- Lists by type - list all functions with labels in a particular language, for example
- Other search functionality maybe?
I've taken a peek at the MediaWiki extension documentation on this, but it will take a bit more time to figure out how best to do this, I think, so if you have any advice I'd appreciate it!
JSON data table stays empty (Invalid argument supplied for foreach() in JsonContent.php)
Context: At first, building the docker container failed on my machine. The culprit was importTextFiles.php, which is broken in MediaWiki 1.35.0. Downgrading to MediaWiki 1.34.4 helped, but now I have another problem:
When looking at a Z-Object in the browser, the "JSON data" table stays empty. After a very long loading period (maybe a timeout), I get a warning:
Warning: Invalid argument supplied for foreach() in /var/www/html/includes/content/JsonContent.php on line 142
The raw JSON data table renders just fine. I guess my docker setup needs some more fine-tuning. Which MediaWiki version are you using? And is there anything else I should look into? Do I have to prepare anything specific in the local copy of AbstractText?
Document requirements for minimum implementation/kernel
It sounds like it’s supposed to be possible to write different implementations, or kernels, for eneyj. (The specification talks about implementations, the README calls the contents of src/ the kernel; I assume the two are roughly equivalent.) It would be nice to have some sort of guide, or list of needed things, to get started with a rudimentary kernel.
From my current understanding, this would have to include:
- A parser for the normal JSON serialization. (Depending on your internal representation, this can be just a JSON parser.)
- A parser for the canonical JSON serialization: not a strict requirement of the spec, but without it, you won’t be able to use the objects in eneyj/data/, which seem to use this serialization. (Again, depending on your internal representation, this can be just a JSON parser.)
- An emitter for the normal and/or canonical JSON serialization. (This too could be fairly simple depending on your internal representation.) One or the other is probably more useful in that it will allow you to reuse eneyj’s tests to some degree; I haven’t checked yet.
- Implementations for the required builtins.
  As a first approximation, these are the JS-implemented builtins whose only implementation in the data file is the builtin one:
  $ for builtin_js in eneyj/src/builtin/*.js; do
      builtin_basename=${builtin_js#eneyj/src/builtin/}
      builtin_z=${builtin_basename%.js}
      if [[ $(jq '(.Z8K4 | length) == 1 and .Z8K4[0].Z14K1.Z1K1 == "Z19"' "eneyj/data/$builtin_z.json") == true ]]; then
        printf '% 4s\n' "$builtin_z"
      fi
    done
  Z100
   Z26
   Z33
   Z36
   Z37
   Z38
   Z62
  That said, while this list excludes the JS builtins Z64 (head) and Z65 (tail) because they also have non-builtin implementations based on Z190 (by_key), Denny already mentioned that Z190 might in turn depend on builtins Z64 and Z65, so maybe the above list is too short and all (or at least more) of the JS-implemented builtins are required after all.
- Some sort of entry point… the simplest version probably accepts/reads a single Z object, evaluates it until reaching the fix point, and then returns/prints the result?
- …
- Profit? No, there’ll certainly be more things.
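The entry point in the list above could be as small as a loop that applies one evaluation step until the result stops changing. A minimal sketch, assuming the kernel supplies a `step` function; `evaluateToFixpoint` and the toy `toyStep` are hypothetical, and `toyStep` only understands Z56/negate with Z54/true and Z55/false.

```javascript
// Hypothetical sketch of a kernel entry point: apply `step` until the
// ZObject no longer changes (the fix point), with a guard against divergence.
// Comparing via JSON.stringify is a simplification: it is sensitive to key
// order, which is fine as long as `step` preserves it.
function evaluateToFixpoint(zobject, step, maxSteps = 1000) {
  let current = zobject;
  for (let i = 0; i < maxSteps; i++) {
    const next = step(current);
    if (JSON.stringify(next) === JSON.stringify(current)) {
      return current;
    }
    current = next;
  }
  throw new Error(`no fix point after ${maxSteps} steps`);
}

// Toy step that only understands Z56/negate called with Z56K1.
function toyStep(z) {
  if (z === null || typeof z !== 'object' || z.Z1K1 !== 'Z7' || z.Z7K1 !== 'Z56') {
    return z; // not a negate call: already a value for this toy
  }
  const arg = z.Z56K1;
  if (arg === 'Z54') return 'Z55'; // negate(true) = false
  if (arg === 'Z55') return 'Z54'; // negate(false) = true
  return { ...z, Z56K1: toyStep(arg) }; // reduce the argument first
}
```

For example, negate(negate(false)) takes two reducing steps and then one step that changes nothing, at which point the loop returns Z55/false.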
Question about using named arguments with named references
GraalEneyj is getting closer to calling the native Z56/negate function, but the issue I’m encountering now is a little odd. The native (non-code) implementation of Z56/negate looks like this:
{ "Z1K1": "Z7", "Z7K1": "Z104", "Z31K1": { "Z1K1": "Z18", "Z18K1": "Z56K1" }, "Z31K2": "Z55", "Z31K3": "Z54" }
That is, call Z104/if_boolean with the first argument as the condition, Z55/false as the consequent, and Z54/true as the alternative: negate(x) => x ? false : true. The function being called, Z104/if_boolean, is a Z9/reference to the Z31/if function.
I assume that in eneyj, this is done more or less by placing Z31K1...Z31K3 in some kind of “stack” of contexts, and when Z31 is ultimately called, it “walks” up this stack until it finds its arguments (Z31K1...Z31K3). (At some point in between, alpha conversion happens, see also #3.) But in GraalEneyj, function calls are (currently) parsed rather differently: we recognize a Z7/function_call at parse time, collect the function being called (Z7K1/function) and all of its arguments (either K1...Kn, or, if the Z7K1/function is a reference Zabc, ZabcK1...ZabcKn), and then have a purely positional function call with n argument nodes in the AST. This means that the above Z56/negate implementation can’t be parsed: since the parser doesn’t know the relationship between Z104/if_boolean and Z31/if, it has no idea that Z31K1...Z31K3 are arguments to the function call and not just arbitrary JSON keys, and Z104/Z31 will ultimately be called with no arguments.
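The parse-time argument collection described above can be sketched as follows. This is not GraalEneyj code: `collectArguments` and the `aliases` map are hypothetical. The alias map shows one possible way to accept Z31K1...Z31K3 under Z104: resolve the reference chain statically and try each ZID in it as a key prefix, falling back to plain K1...Kn.

```javascript
// Hypothetical sketch of parse-time argument collection for a Z7 call.
// Positional keys are K1..Kn, or ZabcK1..ZabcKn when Z7K1 is the reference Zabc.
function collectArguments(call, aliases = {}) {
  const fn = call.Z7K1;
  // Candidate key prefixes: the reference itself, then whatever it
  // (transitively) aliases, e.g. Z104 -> Z31, then "" for plain K1..Kn.
  const prefixes = [];
  if (typeof fn === 'string') {
    const seen = new Set();
    for (let id = fn; id !== undefined && !seen.has(id); id = aliases[id]) {
      seen.add(id);
      prefixes.push(id);
    }
  }
  prefixes.push('');
  for (const prefix of prefixes) {
    const args = [];
    for (let i = 1; `${prefix}K${i}` in call; i++) {
      args.push(call[`${prefix}K${i}`]);
    }
    if (args.length > 0) return { prefix, args };
  }
  return { prefix: prefixes[0], args: [] };
}
```

Without an alias map, the Z56/negate implementation above yields zero arguments, which is exactly the problem described; with `{ Z104: 'Z31' }` it yields the three Z31 arguments in order.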
I’m sure it’s possible to make GraalEneyj support this pattern, and it can probably be made efficient, too, if you know a bit more about Truffle than I do at the moment (mumble mumble frame slots mumble mumble). My question is basically, do I need to support this, or can I avoid it 😆
One curious consequence of the eneyj behavior is that a function can call another function with different arguments (probably even with a different number of arguments), depending on that function’s identity. Consider the following anonymous function:
{
  "Z1K1": "Z8",
  "Z8K1": [{"Z1K2": "K1", "Z17K1": "Z1"}],      // one argument
  "Z8K2": "Z1",
  "Z8K4": [{                                    // one implementation
    "Z1K1": "Z14",
    "Z14K1": {
      "Z1K1": "Z7",                             // call
      "Z7K1": {"Z1K1": "Z18", "Z18K1": "K1"},   // the first argument
      "Z36K1": "Z28",                           // w/ proj. name
      "Z56K1": "Z54"                            // or w/ true
    }
  }]
}
This function receives one argument, and calls it as a function. If that function is Z36/value, it will be called with Z28/project_name; if it’s Z56/negate, it will see Z54/true as the single argument; otherwise, it will see no arguments and the result will be a lambda (the unapplied function, but having lost its identity).
Observe (the three inputs differ only at the very end):
> {"Z1K1": "Z7", "Z7K1": {"Z1K1": "Z8", "Z8K1": [{"Z1K2": "K1", "Z17K1": "Z1"}], "Z8K2": "Z1", "Z8K4": [{"Z1K1": "Z14", "Z14K1": {"Z1K1": "Z7", "Z7K1": {"Z1K1": "Z18", "Z18K1": "K1"}, "Z36K1": "Z28", "Z56K1": "Z54"}}]}, "K1": "Z36"}
eneyj
> {"Z1K1": "Z7", "Z7K1": {"Z1K1": "Z8", "Z8K1": [{"Z1K2": "K1", "Z17K1": "Z1"}], "Z8K2": "Z1", "Z8K4": [{"Z1K1": "Z14", "Z14K1": {"Z1K1": "Z7", "Z7K1": {"Z1K1": "Z18", "Z18K1": "K1"}, "Z36K1": "Z28", "Z56K1": "Z54"}}]}, "K1": "Z56"}
false
> {"Z1K1": "Z7", "Z7K1": {"Z1K1": "Z8", "Z8K1": [{"Z1K2": "K1", "Z17K1": "Z1"}], "Z8K2": "Z1", "Z8K4": [{"Z1K1": "Z14", "Z14K1": {"Z1K1": "Z7", "Z7K1": {"Z1K1": "Z18", "Z18K1": "K1"}, "Z36K1": "Z28", "Z56K1": "Z54"}}]}, "K1": "Z53"}
λ(boolean Z53K1, boolean Z53K2) → boolean
Is this a feature? Is this something we actually want?
ReferenceError in measure.js
I'm not sure yet why measure.js is running into an issue on one of my systems, but it does seem like line 153 is referencing something that hasn't been defined.
Z157
Z157T1 Z157C1 1 ms (Ø 0 ms in 0 runs)
Z157T1 Z157C2 18 ms (Ø 0 ms in 0 runs)
Z157
Z157T1 Z157C2
/home/jamie/src/abstracttext/eneyj/src/scripts/measure.js:153
console.log(write(call))
                  ^
ReferenceError: call is not defined
    at Object.<anonymous> (/home/jamie/src/abstracttext/eneyj/src/scripts/measure.js:153:27)
    at Module._compile (internal/modules/cjs/loader.js:1200:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1220:10)
    at Module.load (internal/modules/cjs/loader.js:1049:32)
    at Function.Module._load (internal/modules/cjs/loader.js:937:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47
Discussion - namespace of functions and other types
Hello,
Great work. I do not know whether this is still under study, but I was wondering: why do functions and other types end up in the same namespace?
It seems to me that these two groups of objects are very different:
- ZID1 -> "type": something (where something's type is type)
- ZID2 -> "type":function
Also, this seems to be another special kind of object:
- ZID3 -> "type":type
I was thinking about how to make it clearer. One option is different namespaces for types, functions and other classes.
Another is to have an additional property/relation/key:
- ZID1 -> "primitive class": "instance of a type" ; type: something
- ZID2 -> "primitive class": "type" ; type:type
- ZID3 -> "primitive class": "instance of a function" ; type: function
The idea is to avoid mixing primitive and non-primitive types.
(I am still getting to understand the details of the project. If this does not make sense, please, disregard it).
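As an illustration of how the proposed "primitive class" could be populated, here is a minimal sketch that derives it from an object's type. It assumes canonical objects whose Z1K1 is a ZID string, with Z4 as the type of types and Z8 as the type of functions; `primitiveClass` and `groupByPrimitiveClass` are hypothetical helpers, and a stored key or separate namespaces remain the alternatives discussed above.

```javascript
// Hypothetical sketch: derive a coarse "primitive class" from Z1K1.
// Assumes canonical objects where Z1K1 is a ZID string such as "Z4" or "Z8".
const TYPE = 'Z4';
const FUNCTION = 'Z8';

function primitiveClass(zobject) {
  switch (zobject.Z1K1) {
    case TYPE:
      return 'type'; // an object that defines a type
    case FUNCTION:
      return 'function'; // e.g. Z56/negate
    default:
      return 'instance'; // an instance of some non-function type
  }
}

// Group ZIDs (taken from Z1K2) by primitive class, e.g. for separate listings.
function groupByPrimitiveClass(zobjects) {
  const groups = { type: [], function: [], instance: [] };
  for (const z of zobjects) {
    groups[primitiveClass(z)].push(z.Z1K2);
  }
  return groups;
}
```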
Document meaning of alpha/beta functions
I’m trying to understand how eneyj is implemented, and a lot of things seem straightforward enough (builtin Z33, same: stringify operands then check string equality; evaluate Z10, list: evaluate head and tail), but the “alpha” and “beta” family of functions look intimidating, and I haven’t found any documentation or comments for them yet. @vrandezo is there some way to summarize what they do or which purpose they serve, to get me started?
Representing types
I talked with @cyrus- and he recommended a few books and papers that might help with getting the eneyj system right, so I started reading "Types and Programming Languages" by @bcpierce00
The good news: everything covered in the first 100 pages was already implemented. Yay!
But then it came to typing functions, and it seems that just typing them as Z8 Function is insufficient (which did bite me a few times already). So what we need is to type them in a generic way, including the return type and the argument types.
And then, if we need generic types anyway, well, we can also use them for Z10 List etc.
So, how to do generic types? Here's the suggestion.
Turn all generic types (which includes Z2 Pair, Z10 List, and Z8 Function) into functions that return a Z4. And then the type is something like Function(Boolean, [Boolean, Boolean]) for And.
But if the signature now includes the types, what about the argument declarations? They are still useful for naming the arguments, but are they really necessary anymore?
Any thoughts?
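A minimal sketch of the proposal, with generic types written as ordinary functions that return a type value, so that And gets the type Function(Boolean, [Boolean, Boolean]). The `{ name, args }` representation, the constructor names, and `resultType` are hypothetical stand-ins for whatever the Z4 objects would actually look like.

```javascript
// Hypothetical sketch: type constructors as functions returning a type value.
// A type value is { name, args }, standing in for a Z4 ZObject.
const makeType = (name, ...args) => ({ name, args });

// Nullary types.
const BooleanType = makeType('Boolean');
const StringType = makeType('String');

// Generic types: each is a function from types to a type.
const ListType = (element) => makeType('List', element);
const PairType = (first, second) => makeType('Pair', first, second);
const FunctionType = (returnType, argumentTypes) =>
  makeType('Function', returnType, ...argumentTypes);

// Structural equality of type values.
function typeEquals(a, b) {
  return a.name === b.name &&
    a.args.length === b.args.length &&
    a.args.every((arg, i) => typeEquals(arg, b.args[i]));
}

// Check a call against a function type and return the result type.
function resultType(fnType, actualArgTypes) {
  if (fnType.name !== 'Function') throw new TypeError('not a function type');
  const [returnType, ...expected] = fnType.args;
  const matches = expected.length === actualArgTypes.length &&
    expected.every((t, i) => typeEquals(t, actualArgTypes[i]));
  if (!matches) throw new TypeError('argument types do not match');
  return returnType;
}
```

In this sketch the argument types live entirely in the function type, so argument declarations would only contribute names, which is the question raised above.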
How to handle transliteration in some languages?
Pinyin (Pin Yin, "spell sound") is a transliteration system used to romanize Mandarin Chinese.
Example: https://www.wikidata.org/wiki/Property:P1721
In each of the following examples, the transliteration is the middle form:
water -> shuǐ -> 水
liquid water -> yètài shuǐ -> 液态水
水 -> shuǐ -> water
液态水 -> yètài shuǐ -> liquid water
Perhaps it's best to read this from mappings already applied directly to Chinese Lexeme Senses, as demonstrated here:
https://www.wikidata.org/wiki/Lexeme:L8219#S1
Translations are covered by Wikidata's Sense Statements, as evidenced here:
https://www.wikidata.org/wiki/Lexeme:L3302
But transliterations (romanizations) do not seem to be documented well on Wikidata currently.
This is probably a documentation improvement needed on Wikidata's side: "How best to apply transliteration for Lexemes and Senses"?
References:
"water" en Sense https://www.wikidata.org/wiki/Lexeme:L3302
"liquid water" en Concept https://www.wikidata.org/wiki/Q29053744
"liquid" en Concept https://www.wikidata.org/wiki/Q11435