stanford-oval / genie-toolkit

The Genie open source kit for voice assistant (formerly known as Almond)

License: Apache License 2.0

JavaScript 30.93% Shell 0.55% HTML 1.46% Makefile 1.34% Python 0.27% TypeScript 65.45%
hacktoberfest natural-language semantic-parsers nlp voice-assistant

genie-toolkit's Introduction

Genie

This repository hosts Genie, a toolkit which allows you to quickly create new semantic parsers that translate from natural language to a formal language of your choice.

Genie was described in the paper:

Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
Giovanni Campagna (*), Silei Xu (*), Mehrad Moradshahi, Richard Socher, and Monica S. Lam
In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019), Phoenix, AZ, June 2019.

If you use Genie in any academic work, please cite the above paper.

Installation

Genie depends on additional libraries, including the ThingTalk library and the GenieNLP machine learning library. See doc/install.md for details and installation instructions.

License

This package is covered by the Apache 2.0 license. See LICENSE for details. Note that this package depends on several Node.js modules by third parties, each with its own license. In particular, some modules might have licensing requirements that are more restrictive than Genie's. It is your responsibility to comply with Genie's copyright license, as well as all licenses of included dependencies.

Reproducing The Results In Our Papers

To reproduce the machine learning results in Stanford papers that use Genie (including the PLDI 2019 paper and the ACL 2020 paper), please use the associated artifacts, available for download from our website. The artifact includes all the necessary datasets (including ablation and case studies), pretrained models and evaluation scripts. Please follow the instructions in the README file to reproduce individual experiments.

Using Genie

Genie Concepts

Genie is a synthesis-based tool to build dialogue agents. Genie is based on the Genie template language, which succinctly defines a space of synthesized sentences. Genie can use the template language to generate a dataset, then sample a subset of sentences to paraphrase using crowdsourcing. Commonly, the template language is paired with a skill definition, entered in a repository like Thingpedia, which defines the APIs available to the dialogue agent.

A Turnkey Solution For Genie+Almond

An all-in-one solution to use Genie to extend ThingTalk with new skills and new templates is provided by almond-cloud. Please refer to its documentation for installation instructions.

After installation, administrators can create new natural language models, trigger automated training and deploy the trained models to any Almond system.

Manual Genie Usage

If one wants to avoid the complexity of setting up a database and web server, it is possible to invoke Genie manually from the command-line, and have it manipulate datasets stored as TSV/CSV files.

A number of tutorials are included in the doc/ folder, describing common Genie usage.

NOTE: Genie assumes all files are UTF-8, and ignores the current POSIX locale (LC_CTYPE and LANG environment variables). Legacy encodings such as ISO-8859-1 or Big5 are not supported and could cause problems.

Modifying ThingTalk

If you want to also extend ThingTalk (with new syntax or new features) you will need to fork and modify the library, which lives at https://github.com/stanford-oval/thingtalk. After modifying the library, you can use npm link to point the almond-cloud installation to your library. You must make sure that only one copy of the ThingTalk library is loaded (use npm ls thingtalk to check).

genie-toolkit's People

Contributors

aashnagarg, ad31c0, anggriosutopo, dependabot-preview[bot], dependabot[bot], elvisyjlin, emilyjchang, euirim, gcampax, greenkeeper[bot], greentfrapp, hselin, hzyjerry, jack57lee, jgd5, jmhw0123, johnnychhsu, jqxue, krishjainx, maitrella, mehrad0711, nrser, rakeshr1, rickygv99, ryachen01, ryanothnielkearns, s-jse, satojk, sileix, valkjsaaa

genie-toolkit's Issues

Typos in Thingpedia Dataset

I grabbed the TT dataset by

genie download-dataset -o dataset.tt

Found a typo in it:

  • action := @com.imgur.upload(to_gallery=false, picture_url=$undefined, title=$undefined)
    "upload a orivate picture to Imgur" should be "upload a private picture to Imgur"

Heuristics for slot-filling measures & currencies

If the dialog agent asks for a currency or measure and the user provides a number, we should heuristically convert it.
Potentially, this could include a translatable annotation #_[default_unit] associated with the parameter indicating the default unit to use (which would be locale-dependent).
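
The heuristic proposed above can be sketched as follows. This is only an illustration: the type names and the fillBareNumber function are hypothetical and are not part of Genie or ThingTalk, and the default unit is passed in explicitly, standing in for whatever a #_[default_unit] annotation would supply for the current locale.

```typescript
// Hypothetical types: none of these names come from Genie or ThingTalk.
type SlotType = "Measure" | "Currency";

interface SlotValue {
    type: SlotType;
    value: number;
    unit: string;
}

// If the user's answer to a slot-filling question is a bare number, attach the
// parameter's default unit. Otherwise return null, so the caller can fall back
// to the regular parser.
function fillBareNumber(answer: string, slotType: SlotType, defaultUnit: string): SlotValue | null {
    const trimmed = answer.trim();
    // Accept plain integers and decimals only; anything else is not "bare".
    if (!/^[+-]?\d+(\.\d+)?$/.test(trimmed))
        return null;
    return { type: slotType, value: Number(trimmed), unit: defaultUnit };
}
```

For example, under these assumptions fillBareNumber("72", "Measure", "F") yields a measure of 72 F, while an answer such as "72 degrees" returns null and is left to the normal parsing path.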

An in-range update of coveralls is breaking the build 🚨

The devDependency coveralls was updated from 3.0.3 to 3.0.4.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

coveralls is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details

Commits

The new version differs by 5 commits.

  • 8ac4325 version bump
  • 9d9c227 Bump extend from 3.0.1 to 3.0.2 (#226)
  • 33119a7 Bump js-yaml from 3.11.0 to 3.13.1 (#225)
  • f5549c7 Bump handlebars from 4.1.0 to 4.1.2 (#224)
  • 4df732b Style fix (#211)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Datasets with context

Very soon, sentences will stop being interpreted in a vacuum, and will start being interpreted in context.
Hence, our dataset will grow from [id, sentence, program] to [id, context, sentence, program].

We need to:

  • Define the language of contexts; probably a combination of ThingTalk, some stuff to talk about the current result, and the ability to concatenate multiple entries
  • Extend file formats, parsers and stringifiers accordingly
  • Add data augmentation passes to take non-contextual datasets and turn them into contextual datasets (naively, by sticking random contexts in front - maybe we can do better?)
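
A minimal sketch of the second and third items, assuming the dataset stays a tab-separated file and the context becomes an extra column. The Example type and the parseLine, toLine and addContext functions are hypothetical names used for illustration; the language of contexts itself is left as an opaque string here.

```typescript
interface Example {
    id: string;
    context: string | null; // null for non-contextual examples
    sentence: string;
    program: string;
}

// Parses one TSV line in either the old 3-column or the new 4-column layout.
function parseLine(line: string): Example {
    const fields = line.split("\t");
    if (fields.length === 3) {
        const [id, sentence, program] = fields;
        return { id, context: null, sentence, program };
    }
    if (fields.length === 4) {
        const [id, context, sentence, program] = fields;
        return { id, context, sentence, program };
    }
    throw new Error(`expected 3 or 4 columns, got ${fields.length}`);
}

// Inverse of parseLine: the context column is emitted only when present.
function toLine(ex: Example): string {
    const columns = ex.context === null
        ? [ex.id, ex.sentence, ex.program]
        : [ex.id, ex.context, ex.sentence, ex.program];
    return columns.join("\t");
}

// Naive augmentation: attach one of the given contexts to a non-contextual
// example. `pick` maps the number of contexts to an index, so callers can plug
// in a random or a deterministic choice.
function addContext(ex: Example, contexts: string[], pick: (n: number) => number): Example {
    if (ex.context !== null || contexts.length === 0)
        return ex;
    return { ...ex, context: contexts[pick(contexts.length)] };
}
```

Passing (n) => Math.floor(Math.random() * n) as pick gives the "random context in front" behaviour; a smarter pass would replace pick with something that checks the context is compatible with the program.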

Can't generate synthetic sentences

I followed the README (https://github.com/stanford-oval/genie-toolkit/blob/master/README.md).
The environment: Linux, Node.js 10.4.1.
The commands I used:
yarn global add genie-toolkit
pip install genienlp
cd Our_Almond
git clone https://github.com/stanford-oval/genie-toolkit.git
cd genie-toolkit
npm install
genie download-dataset -o dataset.tt
genie download-snapshot -o thingpedia.tt --entities entities.json

genie generate --locale en --template /home/user/Our_Almond/genie-toolkit/languages/thingtalk/en/thingtalk.genie --thingpedia thingpedia.tt --entities entities.json --dataset dataset.tt -o synthesized.tsv

The error I faced:

stats: size(charts[0][out_param_Array__String]) = 4
stats: size(charts[0][both_prefix]) = 2
Error expanding rule NT[preposition_filter] -> here
{ AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(pname instanceof Ast.Value.VarRef)

    at Object.makeFilter (/home/user/Our_Almond/genie-toolkit/languages/thingtalk/utils.js:40:5)
    at Object.makeFilter (/home/user/Our_Almond/genie-toolkit/languages/thingtalk/ast_manip.js:160:18)
    at $grammar.addRule.$runtime.simpleCombine (eval at parse (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/index.js:122:25), <anonymous>:428:85)
    at Function.combine (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:209:29)
    at /home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:255:35
    at expandRule (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:985:18)
    at Grammar.generate (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:806:25)
    at SentenceGenerator.generate (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/index.js:47:23)
    at _initialization.then (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/index.js:183:29)
  generatedMessage: true,
  name: 'AssertionError [ERR_ASSERTION]',
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '==' }
/home/user/.config/yarn/global/node_modules/genie-toolkit/tool/genie.js:13
process.on('unhandledRejection', (up) => { throw up; });
                                           ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(pname instanceof Ast.Value.VarRef)

    at Object.makeFilter (/home/user/Our_Almond/genie-toolkit/languages/thingtalk/utils.js:40:5)
    at Object.makeFilter (/home/user/Our_Almond/genie-toolkit/languages/thingtalk/ast_manip.js:160:18)
    at $grammar.addRule.$runtime.simpleCombine (eval at parse (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/index.js:122:25), <anonymous>:428:85)
    at Function.combine (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:209:29)
    at /home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:255:35
    at expandRule (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:985:18)
    at Grammar.generate (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/runtime.js:806:25)
    at SentenceGenerator.generate (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/index.js:47:23)
    at _initialization.then (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/index.js:183:29)
Emitted 'error' event at:
    at _initialization.then.catch (/home/user/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/index.js:188:18)

I would appreciate it if you could take a moment to explain something to me

An in-range update of mmap-io is breaking the build 🚨

The dependency mmap-io was updated from 1.0.0 to 1.1.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

mmap-io is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details

Commits

The new version differs by 6 commits.

  • 29070c9 Rewrite LS to TS. offs_t => size_t. Rm "make" dep
  • ad5752d Merge pull request #15 from bmarkiv/master
  • 6b6c73c node::Buffer::New allows to map buffers up to 2GB (2GB - 64KB in my test)
  • 8b3fb45 In order to support file size over 4GB type off_t used for offset parameter has to be changed to size_t.
  • dde403a Speculative changes for #14
  • 62dc36c Travis CI adjustments

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Enable "extended_timers" flag in tests

Having a flag that is off in tests means we have code we never test, and therefore is broken by definition. We should turn the flag on, and fix the resulting test failures.

Add progress estimation for sentence generation

On a large Thingpedia, sentence generation can take a while if cranked up to the optimal parameters. It would be nice to have some measure of progress while generation is happening, so we can estimate ETAs and give updates on the website.
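
One possible shape for such a measure, as a sketch only. It assumes generation can be broken into a known number of work units (for example rule expansions), which is an assumption about the generator and not something stated above; ProgressEstimate and estimateProgress are hypothetical names.

```typescript
interface ProgressEstimate {
    fraction: number;     // completed share of the work, between 0 and 1
    etaMs: number | null; // estimated remaining time; null until some work is done
}

// Estimates remaining time by linear extrapolation from the work done so far.
function estimateProgress(done: number, total: number, elapsedMs: number): ProgressEstimate {
    if (total <= 0)
        throw new RangeError("total must be positive");
    const fraction = Math.min(Math.max(done / total, 0), 1);
    if (fraction === 0)
        return { fraction, etaMs: null };
    // remaining time = elapsed time * (remaining work / completed work)
    const etaMs = elapsedMs * (1 - fraction) / fraction;
    return { fraction, etaMs };
}
```

Linear extrapolation is the simplest choice and is likely to be rough if later work units are more expensive than earlier ones; weighting the units would be a natural refinement.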

Stop preprocessing locations

Empirically, the Stanford CoreNLP NER model is awful. Plus, if we get rid of the NER we can shave a lot of processing from almond-tokenizer and maybe switch to a simpler implementation of the tokenizer that does not use Java.

Example of template.genie and thingpedia.tt

Hi, in the docs, it shows how to synthesize a set of sentences using

genie generate --locale en --template template.genie
--thingpedia thingpedia.tt --entities entities.json --dataset dataset.tt
-o synthesized.tsv

However, is there an example of what template.genie and thingpedia.tt look like?

Primitive templates with aggregates are not supported

Error: NOT IMPLEMENTED: Aggregation(Filter(Invocation(Invocation(Device(org.thingpedia.cardiology.doctor, , ), readings
at betaReduceTable (/home/almond-training/almond-cloud/node_modules/genie-toolkit/lib/sentence-generator/ast_manip.js:157:11)
at program.rules.map (/home/almond-training/almond-cloud/node_modules/genie-toolkit/lib/sentence-generator/ast_manip.js:182:39)
at Array.map (<anonymous>)
at betaReduceProgram (/home/almond-training/almond-cloud/node_modules/genie-toolkit/lib/sentence-generator/ast_manip.js:177:36)

An in-range update of thingpedia is breaking the build 🚨

The dependency thingpedia was updated from 2.6.0-alpha.1 to 2.6.0-alpha.2.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

thingpedia is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details

Commits

The new version differs by 15 commits.

  • e6950d0 v2.6.0-alpha.2
  • 791e293 Update dependencies
  • e64627e Merge pull request #42 from stanford-oval/wip/import-device-factory
  • 9ff085c FileThingpediaClient: fix getExamplesByKinds
  • f069edc Rename DeviceFactoryUtils to DeviceConfigUtils
  • 592f3f2 Merge pull request #41 from stanford-oval/wip/jsdoc
  • 3b03cec Add more tests for the device factory utils module
  • 1337835 Simplify interface for getDeviceFactory
  • bc7fefc Import utilities to create device factories from almond-cloud
  • 72855be doc: use @namespace not @module
  • 8c860d9 Make documentation look better
  • bfa51db Fix test to match the documentation
  • 0d57bfd Add jsdocs to the helpers
  • 04a43f4 Add a pass of jsdocs
  • 5a65df3 Add infrastructure to generate jsdoc

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Can't generate synthetic sentences

Hi, I followed the guidelines at https://github.com/stanford-oval/genie-toolkit/blob/master/README.md,
but I got stuck at Step 1, generating the synthetic sentences.
The command I used: genie generate --locale en --template ~/genie-toolkit/languages/thingtalk/en/thingtalk.genie --thingpedia thingpedia.tt --entities entities.json --dataset dataset.tt -o synthesized.tsv
The environment: macOS Mojave version 10.14.6, Node.js v10.19.0.
The error I got:

Batched schema-meta request for org.thingpedia.builtin.thingengine.builtin

Loaded 139 devices
Loaded 997 templates
Loaded type Entity__almond.sports__am_soccer_leagues as generic entity
Loaded type Entity__almond.sports__eu_soccer_leagues as generic entity
Loaded type Entity__almond.sports__nhl_team as generic entity
Loaded type Entity__com.facebook__id as id type
Loaded type Entity__com.gmail__email_id as non-constant type
Loaded type Entity__com.gmail__thread_id as non-constant type
Loaded type Entity__com.google.contacts__contact_name as non-constant type
Loaded type Entity__com.google.drive__file_id as non-constant type
Loaded type Entity__com.icanhazdadjoke__id as id type
Loaded type Entity__com.imgur__album_id as non-constant type
Loaded type Entity__com.instagram__filter_ as generic entity
Loaded type Entity__com.live.onedrive__file_id as non-constant type
Loaded type Entity__com.live.onedrive__user_id as non-constant type
Loaded type Entity__com.steampowered__user_id as non-constant type
Loaded type Entity__com.thecatapi__image_id as non-constant type
Loaded type Entity__com.twitter__id as id type
Loaded type Entity__com.youtube__channel_id as non-constant type
Loaded type Entity__com.youtube__video_id as non-constant type
Loaded type Entity__dogapi__image_id as non-constant type
Loaded type Entity__gov.nasa__asteroid_id as non-constant type
Loaded type Entity__gov.nasa__curiosity_rover_camera as generic entity
Loaded type Entity__imgflip__meme_id as generic entity
Loaded type Entity__instagram__media_id as non-constant type
Loaded type Entity__omlet__feed_id as non-constant type
Loaded type Entity__org.arxiv__category as generic entity
Loaded type Entity__org.freedesktop__app_id as generic entity
Loaded type Entity__org.wikidata__human as generic entity
Loaded type Entity__org.wikidata__league as generic entity
Loaded type Entity__rest.kanye__id as id type
Loaded type Entity__sportradar__eu_soccer_team as generic entity
Loaded type Entity__sportradar__eu_tournament as generic entity
Loaded type Entity__sportradar__mlb_team as generic entity
Loaded type Entity__sportradar__nba_team as generic entity
Loaded type Entity__sportradar__ncaafb_team as generic entity
Loaded type Entity__sportradar__ncaambb_team as generic entity
Loaded type Entity__sportradar__nfl_team as generic entity
Loaded type Entity__sportradar__us_soccer_team as generic entity
Loaded type Entity__sportradar__us_tournament as generic entity
Loaded type Entity__tt__country as generic entity
Loaded type Entity__tt__cryptocurrency_code as generic entity
Loaded type Entity__tt__currency_code as generic entity
Loaded type Entity__tt__iso_lang_code as generic entity
Loaded type Entity__tt__mime_type as generic entity
Loaded type Entity__tt__stock_id as generic entity
Loaded type Entity__tt__contact as non-constant type
Loaded type Entity__tt__contact_name as non-constant type
Loaded type Entity__tt__device as non-constant type
Loaded type Entity__tt__email_address as non-constant type
Loaded type Entity__tt__flow_token as non-constant type
Loaded type Entity__tt__function as non-constant type
Loaded type Entity__tt__hashtag as non-constant type
Loaded type Entity__tt__path_name as non-constant type
Loaded type Entity__tt__phone_number as non-constant type
Loaded type Entity__tt__picture as non-constant type
Loaded type Entity__tt__program as non-constant type
Loaded type Entity__tt__url as non-constant type
Loaded type Entity__tt__username as non-constant type
{ SyntaxError: Expected "$", "$root", "context", "for", "if", "import", comment, end of input, end of line, identifier, or whitespace but "{" found.
    at peg$buildStructuredError (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/grammar.js:610:12)
    at Object.peg$parse [as parse] (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/grammar.js:7180:11)
    at parse (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/index.js:84:34)
  message:
   'Expected "$", "$root", "context", "for", "if", "import", comment, end of input, end of line, identifier, or whitespace but "{" found.',
  expected:
   [ { type: 'other', description: 'whitespace' },
     { type: 'other', description: 'end of line' },
     { type: 'other', description: 'comment' },
     { type: 'other', description: 'identifier' },
     { type: 'literal', text: '$root', ignoreCase: false },
     { type: 'literal', text: '$', ignoreCase: false },
     { type: 'other', description: 'identifier' },
     { type: 'literal', text: '$root', ignoreCase: false },
     { type: 'literal', text: '$', ignoreCase: false },
     { type: 'literal', text: 'context', ignoreCase: false },
     { type: 'literal', text: 'for', ignoreCase: false },
     { type: 'literal', text: 'if', ignoreCase: false },
     { type: 'literal', text: 'import', ignoreCase: false },
     { type: 'end' } ],
  found: '{',
  location:
   { start: { offset: 2014, line: 84, column: 1 },
     end: { offset: 2015, line: 84, column: 2 } },
  name: 'SyntaxError' }
{ SyntaxError: Expected "$", "$root", "context", "for", "if", "import", comment, end of input, end of line, identifier, or whitespace but "{" found.
    at peg$buildStructuredError (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/grammar.js:610:12)
    at Object.peg$parse [as parse] (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/grammar.js:7180:11)
    at parse (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/index.js:84:34)
  message:
   'Expected "$", "$root", "context", "for", "if", "import", comment, end of input, end of line, identifier, or whitespace but "{" found.',
  expected:
   [ { type: 'other', description: 'whitespace' },
     { type: 'other', description: 'end of line' },
     { type: 'other', description: 'comment' },
     { type: 'other', description: 'identifier' },
     { type: 'literal', text: '$root', ignoreCase: false },
     { type: 'literal', text: '$', ignoreCase: false },
     { type: 'other', description: 'identifier' },
     { type: 'literal', text: '$root', ignoreCase: false },
     { type: 'literal', text: '$', ignoreCase: false },
     { type: 'literal', text: 'context', ignoreCase: false },
     { type: 'literal', text: 'for', ignoreCase: false },
     { type: 'literal', text: 'if', ignoreCase: false },
     { type: 'literal', text: 'import', ignoreCase: false },
     { type: 'end' } ],
  found: '{',
  location:
   { start: { offset: 2014, line: 84, column: 1 },
     end: { offset: 2015, line: 84, column: 2 } },
  name: 'SyntaxError' }

/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/tool/genie.js:13
process.on('unhandledRejection', (up) => { throw up; });
                                           ^
SyntaxError: Expected "$", "$root", "context", "for", "if", "import", comment, end of input, end of line, identifier, or whitespace but "{" found.
    at peg$buildStructuredError (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/grammar.js:610:12)
    at Object.peg$parse [as parse] (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/grammar.js:7180:11)
    at parse (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/genie-compiler/index.js:84:34)
Emitted 'error' event at:
    at _initialization.then.catch (/Users/ThoNguyen/.config/yarn/global/node_modules/genie-toolkit/lib/sentence-generator/index.js:660:18)

Can you please point out where I went wrong and how to fix it? Thanks in advance.

Add templates for the bookkeeping language

Bookkeeping commands (train, help, cancel, stop, go back, etc.) are necessary to any virtual assistant, but so far training them has been annoyingly convoluted, requiring manual messing with the dataset from curl or mysql.
We should enable writing them quickly in template form, and then load on them.

All users of Genie will benefit, and it will be our first target language that is not ThingTalk.

Fix generation of table joins

With the fix in commit b95bcd0, we disallow joins with a filter containing an argument from the lhs that also appears in the rhs. In particular, this makes it impossible to do self joins.

gen_sentences: handle projections correctly

If a Thingpedia example is a projection, it should be included in the "projection_XXX" grammar category.

Also, the NN should be allowed to synthesize projections for functions with many fields (> 2? > 3?)

Errors while trying to train

Hi, I'm trying to train a model to reproduce the results of the PLDI 2019 paper. I followed the instructions in the "Reproducing the results the PLDI 2019 paper" page and downloaded the dataset, but when I started training with the command genie train --datadir dataset --outputdir model --workdir temp --config-file data/full.json, I got the following error:

  0%  [..............................] - ETA: 0s/Users/user/.config/yarn/global/node_modules/genie-toolkit/tool/genie.js:13
process.on('unhandledRejection', (up) => { throw up; });

Error: Command exited with code 2
    at ChildProcess.child.on (/Users/user/.config/yarn/global/node_modules/genie-toolkit/lib/training/exec-utils.js:37:28)
    at ChildProcess.emit (events.js:197:13)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:254:12)

Did I do something wrong?
Thanks!

Tokenizer protocol must change

Keeping a single long-running connection to the tokenizer means we don't get any effective load balancing between tokenizer instances, so if one is overrun with requests, a poor NLP server that is also connected to that tokenizer instance will go down, while other tokenizers sit idle.

Augmentation is failing in almond-cloud staging

TypeError: Cannot read property 'schema' of null
at ParameterReplacer._getParamListKey (/opt/almond-cloud/node_modules/genie-toolkit/lib/replace_parameters.js:224:26)
at ParameterReplacer._sampleParam (/opt/almond-cloud/node_modules/genie-toolkit/lib/replace_parameters.js:282:39)
at ParameterReplacer._replaceTokensInSentence (/opt/almond-cloud/node_modules/genie-toolkit/lib/replace_parameters.js:376:44)
at promises.push (/opt/almond-cloud/node_modules/genie-toolkit/lib/replace_parameters.js:534:54)
at ParameterReplacer.process (/opt/almond-cloud/node_modules/genie-toolkit/lib/replace_parameters.js:560:15)
at process._tickCallback (internal/process/next_tick.js:68:7)

"Take inspiration" from Mycroft skills for system-oriented skills

Mycroft supports a very limited number of skills (listed here) yet it is very popular.

Some of the skills are just a way to interact with the system, like controlling the volume, recording audio, setting alarms and reminders.
Others are easter eggs and playful things that we should have regardless, like laughing.

This issue is about looking at what skills Mycroft has, figuring out which ones are simple and make sense for Almond across the board, implementing them, and making sure the experience on Almond is as close as possible to that of a standard assistant (no leaking of when-get-do or other Almondities).

The exact matcher does not work for Entity and Location parameters

You cannot quote those parameters, because if you quote them, they become QUOTED_STRING not Entity or Location. And you cannot just recognize them using the exact matcher.

The exact matcher should, when loading from the database, recognize the use of GENERIC_ENTITY_ and LOCATION_ tokens (which the tokenizer no longer produces), replace them with QUOTED_STRING_ and renumber.

(This applies to the part of the exact matcher that comes from the synthetic set. From Train Almond, we map it to a wildcard token in the trie so we have no problem.)
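
The replace-and-renumber step described above could look roughly like this. It is a sketch under assumptions: the exact shape of the legacy tokens (GENERIC_ENTITY_ followed by a type and an index, LOCATION_ followed by an index) is assumed, and normalizeTokens is a hypothetical helper, not existing exact matcher code.

```typescript
// Rewrites legacy entity and location placeholders as QUOTED_STRING_ tokens,
// renumbering all placeholders in order of first appearance so the indices
// stay dense and unique within the sentence.
// Assumed legacy shapes: GENERIC_ENTITY_<type>_<n> and LOCATION_<n>.
function normalizeTokens(tokens: string[]): string[] {
    const placeholder = /^(GENERIC_ENTITY_.+_\d+|LOCATION_\d+|QUOTED_STRING_\d+)$/;
    // Maps each original placeholder to its replacement, so a placeholder that
    // occurs twice is rewritten consistently.
    const renumbered = new Map<string, string>();
    return tokens.map((token) => {
        if (!placeholder.test(token))
            return token;
        let replacement = renumbered.get(token);
        if (replacement === undefined) {
            replacement = `QUOTED_STRING_${renumbered.size}`;
            renumbered.set(token, replacement);
        }
        return replacement;
    });
}
```

The same mapping would also have to be applied to the placeholders in the stored program, otherwise the sentence and the program would disagree on the numbering.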

Incorrect generation for sentences with example parameters in filter

From the staging logs:

Aug 13 18:24:28 parmesan node[12097]: job 198: hey almond who got given name QUOTED_STRING_0 .
Aug 13 18:24:28 parmesan node[12097]: job 198: [object Object]
Aug 13 18:24:28 parmesan node[12097]: job 198: undefined
Aug 13 18:24:28 parmesan node[12097]: job 198: now => (@org.wikidata.person()), (id == x && P735 =~ __const_QUOTED__STRING_0) => notify;
Aug 13 18:24:28 parmesan node[12097]: job 198: TypeError: No variable x in schema, have P735,P734,P1477,P1449,P21,P27,P569,P570,P19,P20,P1196,P509,P119,P18,P109,P22,P25,P3373,P26,P40,P1971,P103,P106,P108,P39,P166,P1411,P69,P512,P551,P937,P102,P1576,P172,P140,P2048,P2067,P1853,P1050,P1429,P552,P410,P856,P737,P641,P54,P647,P1618,P286,P1344,P1830,P3828,P136,P264,P1303,P184,P185,P2002,P2003,P2013,P2847,P3265,P2397,P4265,P4411,P4013,P3984,id
Aug 13 18:24:28 parmesan node[12097]: job 198:     at ToNNConverter.valueToNN (/opt/thingengine/node_modules/thingtalk/lib/nn-syntax/tonn_converter.js:302:23)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at ToNNConverter.cnfFilterToNN (/opt/thingengine/node_modules/thingtalk/lib/nn-syntax/tonn_converter.js:430:155)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at ToNNConverter.tableToNN (/opt/thingengine/node_modules/thingtalk/lib/nn-syntax/tonn_converter.js:620:32)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at ToNNConverter.commandToNN (/opt/thingengine/node_modules/thingtalk/lib/nn-syntax/tonn_converter.js:747:46)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at ToNNConverter.programToNN (/opt/thingengine/node_modules/thingtalk/lib/nn-syntax/tonn_converter.js:801:29)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at ToNNConverter.toNN (/opt/thingengine/node_modules/thingtalk/lib/nn-syntax/tonn_converter.js:843:25)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at Object.toNN (/opt/thingengine/node_modules/thingtalk/lib/nn-syntax/index.js:34:22)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at BasicSentenceGenerator._output (/opt/thingengine/node_modules/genie-toolkit/lib/sentence-generator/index.js:527:33)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at BasicSentenceGenerator._minibatch (/opt/thingengine/node_modules/genie-toolkit/lib/sentence-generator/index.js:514:23)
Aug 13 18:24:28 parmesan node[12097]: job 198:     at _initialization.then (/opt/thingengine/node_modules/genie-toolkit/lib/sentence-generator/index.js:496:46)

Check availability of devices before executing a command

If a device is not available (e.g. it is a physical device and it is off or disconnected, or the authentication is no longer valid), we should inform the user nicely and maybe prompt the user to configure the device again (= refresh the access token), instead of trying the command and failing miserably.

This is something I've meant to do for a long time, and it is also the reason why devices have a checkAvailable method. I'm opening this issue so we don't forget again...
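
As an illustration of the intended behaviour, the sketch below turns an availability result into a decision before the command runs. The three availability states and the planCommand function are hypothetical; what checkAvailable actually returns is not specified here, so the states are a stand-in for it.

```typescript
// Hypothetical availability states; the real result of checkAvailable may differ.
type Availability = "available" | "unreachable" | "auth_expired";

type CommandPlan =
    | { kind: "execute" }
    | { kind: "inform"; message: string }
    | { kind: "reconfigure"; message: string };

// Decides what to do with a command given the device's availability, so the
// command is only attempted when it has a chance of succeeding.
function planCommand(deviceName: string, availability: Availability): CommandPlan {
    switch (availability) {
    case "available":
        return { kind: "execute" };
    case "unreachable":
        return { kind: "inform", message: `${deviceName} appears to be off or disconnected.` };
    case "auth_expired":
        return { kind: "reconfigure", message: `Please log in to ${deviceName} again.` };
    }
}
```

In a real agent the availability value would come from calling the device's checkAvailable method, most likely asynchronously, just before execution.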

Add localization info to internationalization support

Information such as default measurement units, default currency, time and date style, etc. are locale specific but are not language specific. They should be available as parameters to templates, so the same templates can be used for, say, British, Irish and American English, without rewriting from scratch.
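
A small sketch of what such parameters might look like, assuming a per-locale table that templates can consult. The LocaleInfo shape and both functions are hypothetical; the table only lists the three locales named above.

```typescript
interface LocaleInfo {
    currency: string;            // ISO 4217 currency code
    temperatureUnit: "C" | "F";
    distanceUnit: "km" | "mi";
}

// Locale-specific (not language-specific) defaults for three English locales.
const LOCALE_INFO: Record<string, LocaleInfo> = {
    "en-US": { currency: "USD", temperatureUnit: "F", distanceUnit: "mi" },
    "en-GB": { currency: "GBP", temperatureUnit: "C", distanceUnit: "mi" },
    "en-IE": { currency: "EUR", temperatureUnit: "C", distanceUnit: "km" },
};

function getLocaleInfo(locale: string): LocaleInfo {
    const info = LOCALE_INFO[locale];
    if (info === undefined)
        throw new Error(`no localization info for ${locale}`);
    return info;
}

// One English template, instantiated differently per locale.
function temperatureQuestion(locale: string): string {
    const { temperatureUnit } = getLocaleInfo(locale);
    const unitName = temperatureUnit === "C" ? "celsius" : "fahrenheit";
    return `what is the temperature in degrees ${unitName}`;
}
```

With this split, the sentence structure lives in the shared English template and only the table entry changes between British, Irish and American English.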

Add a contact manager

In Web Almond, we don't have a single primary address book to refer to.
Instead, we might have devices that can provide address book functionality, such as Google Contacts.

We should multiplex between them, like we multiplex between messaging devices.
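
A minimal sketch of the multiplexing idea, assuming each address-book device can be wrapped in a common provider interface. All names here (ContactEntry, ContactProvider, ContactManager, staticProvider) are hypothetical, and the lookup is synchronous only to keep the example short; real providers would most likely be asynchronous.

```typescript
interface ContactEntry {
    name: string;
    email: string;
    source: string; // which provider returned this entry
}

// Hypothetical common interface over devices with address-book functionality.
interface ContactProvider {
    readonly providerName: string;
    lookup(query: string): ContactEntry[];
}

class ContactManager {
    private readonly providers: ContactProvider[];

    constructor(providers: ContactProvider[]) {
        this.providers = providers;
    }

    // Queries every provider in order and merges the results, keeping the first
    // entry seen for each email address (compared case-insensitively).
    lookup(query: string): ContactEntry[] {
        const seen = new Set<string>();
        const merged: ContactEntry[] = [];
        for (const provider of this.providers) {
            for (const entry of provider.lookup(query)) {
                const key = entry.email.toLowerCase();
                if (seen.has(key))
                    continue;
                seen.add(key);
                merged.push(entry);
            }
        }
        return merged;
    }
}

// In-memory provider, useful for trying the manager out.
function staticProvider(providerName: string, entries: ContactEntry[]): ContactProvider {
    return {
        providerName,
        lookup: (query) => entries.filter((e) => e.name.toLowerCase().includes(query.toLowerCase())),
    };
}
```

Provider order doubles as a priority order here: when two providers know the same address, the entry from the earlier provider wins.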

What does --template mean?

Hi,
In your example for generating the synthetic set you passed --template template.genie to the generate command. What is the template flag? I'm finding it hard to figure out what needs to be in the template.genie file.

When I used the --help flag I got the following output:
--template TEMPLATE Path to file containing construct templates, in Genie syntax.,
but I still didn't understand what the template flag means. Do I need to create the file myself and if so what does it need to contain, or can it be generated automatically?

Thanks!

Contextual templates

While templates probably won't be the be-all and end-all solution to contextual data generation, we at least know that some things can be represented in template form: for example, filters after a policy request, follow-up actions, or turning an immediate program into a monitor.

We should let developers express contextual templates, with good sampling that does not bias the current context.

This will be a follow-up to #5.

Handle thingpedia.json from thingpedia api

The format Genie expects is different from the thingpedia.json retrieved from the Thingpedia API.
We could either:
(1) add an option to the API to return the needed format (also with proper indentation for better readability), or
(2) allow Genie to accept both formats.

Parallelize sentence generation

Sentence generation is turning into a bottleneck with the production Thingpedia, with a full generation and augmentation cycle taking several hours.

We might be able to tweak the code and make it run faster on a single processor, but it won't be easy. I think it would be a lot more beneficial to parallelize the generation across multiple CPUs.
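
A sketch of the first step of such a parallelization: splitting the work into one shard per CPU. What a "work item" is (here, a placeholder string) and the `shard` helper are assumptions; dispatching the shards to worker threads or child processes and merging their output are not shown.

```javascript
const os = require('os');

// Split `items` into at most `numShards` contiguous chunks of near-equal size.
function shard(items, numShards) {
    if (!Number.isInteger(numShards) || numShards < 1)
        throw new Error('numShards must be a positive integer');
    const shards = [];
    const size = Math.ceil(items.length / numShards);
    for (let i = 0; i < items.length; i += size)
        shards.push(items.slice(i, i + size));
    return shards;
}

// Placeholder work items; each shard would be handed to a separate worker.
const workItems = Array.from({ length: 10 }, (_, i) => `item-${i}`);
const shards = shard(workItems, os.cpus().length || 1);

console.log(`${workItems.length} items split into ${shards.length} shard(s)`);
```

Sharding is only safe if the work items can be generated independently of each other, which would need to be checked against the actual generator.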

Add "chatty" builtin queries

People like to chat with their virtual assistants.
We do not need to do anything fancy like end-to-end trained chatbots, just recognize a few common intents and provide a couple of hard-coded responses.

Examples:

  • "who are you?"
  • "how old are you?"
  • "i love you"
  • "are you smart?"
  • "are you happy?"

etc.
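
The approach above could be as simple as a lookup table keyed by the normalized utterance. In this sketch the reply texts, the `normalize` rules, and the `chattyReply` helper are placeholders rather than existing builtins.

```javascript
// Hard-coded replies keyed by normalized utterance. The reply texts are
// placeholders chosen for this example.
const CHATTY_RESPONSES = new Map([
    ['who are you', 'I am your virtual assistant.'],
    ['how old are you', 'Old enough to help you out.'],
    ['i love you', 'That is very kind of you.'],
    ['are you smart', 'I am still learning.'],
    ['are you happy', 'I am happy when I can help.'],
]);

// Lowercase, strip punctuation, and collapse whitespace.
function normalize(utterance) {
    return utterance
        .toLowerCase()
        .replace(/[^a-z0-9\s]/g, '')
        .replace(/\s+/g, ' ')
        .trim();
}

// Return a canned reply, or null if the utterance is not a known chatty
// intent and should go through the normal parser instead.
function chattyReply(utterance) {
    return CHATTY_RESPONSES.get(normalize(utterance)) || null;
}

console.log(chattyReply('Who are you?'));
console.log(chattyReply('turn on the lights'));
```

Exact matching after normalization only covers the listed phrasings; recognizing paraphrases would need the semantic parser or additional patterns.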

Review Request: Traditional Chinese Dataset

Hi all,

I've made a small Traditional Chinese dataset containing 110+ utterances, and I ran the synthetic sentence generation on it. For those who understand Chinese, please have a look here (https://goo.gl/PeoWuY).

  • dataset.zh-tw.part0.tt is the Traditional Chinese dataset.
  • synthetic.zh-tw.part0.tsv is the synthetic sentences.

I will continue translating the remaining sentences over the next few days.

An in-range update of thingtalk is breaking the build 🚨

The dependency thingtalk was updated from 1.9.0-alpha.4 to 1.9.0-beta.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

thingtalk is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details

Commits

The new version differs by 19 commits.

  • dfc678e v1.9.0-beta.1
  • 917b255 Merge pull request #162 from stanford-oval/wip/device-selector
  • 1f8cca3 program: deep clone attributes in selectors
  • d32eaa2 ArrayIndexSlot: use "." to separate name from index
  • 635777d typecheck: allow passing a "null" scope to typeCheckInputArgs
  • e9fa3a9 Unify the two copies of cleanKind
  • 883d839 Fix lint
  • 0255957 typecheck: actually enforce that we do not parameter pass into a device attribute
  • 7759ad3 Add compiler support for device attributes
  • 341a7aa Add a couple more tests
  • a7ffe83 Add describe support for device selectors
  • a9cb280 Add iteration APIs for device attributes
  • 7b5037e Add NN syntax for device selectors
  • 3aa9bd6 Add "all" device attribute
  • f9953a4 Add device attributes to device selectors

There are 19 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Allow disabling rules

@monicalam says:

"we have to give the choice of making the rule inactive. I want to able to stop it, without having to rewrite the whole rule again."

Some of the code to dynamically enable and disable rules already exists, but currently all rules are automatically enabled. We might need database changes too, to persist this information.

An in-range update of eslint is breaking the build 🚨

The devDependency eslint was updated from 6.2.0 to 6.2.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

eslint is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details

Release Notes for v6.2.1
  • 8c021b5 Upgrade: eslint-utils 1.4.2 (#12131) (Toru Nagashima)
  • e82388b Sponsors: Sync README with website (ESLint Jenkins)
  • 4aeeeed Docs: update docs for ecmaVersion 2020 (#12120) (silverwind)
  • 6886148 Docs: Add duplicate keys limitation to accessor-pairs (#12124) (Milos Djermanovic)
Commits

The new version differs by 6 commits.

  • 9cadb59 6.2.1
  • 22b7802 Build: changelog update for 6.2.1
  • 8c021b5 Upgrade: eslint-utils 1.4.2 (#12131)
  • e82388b Sponsors: Sync README with website
  • 4aeeeed Docs: update docs for ecmaVersion 2020 (#12120)
  • 6886148 Docs: Add duplicate keys limitation to accessor-pairs (#12124)

See the full diff

Recognize "preference-setting" commands

Users like to say:

  • "call me ___(name)"
  • "speak in ___(language)"

and similar

These should map to appropriate actions in @builtin to change the preferences.
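
A pattern-based sketch of recognizing these commands. The action names (`set_name`, `set_language`) and the `parsePreference` helper are hypothetical stand-ins and do not correspond to existing @builtin functions.

```javascript
// Each pattern maps a phrasing to a hypothetical preference action.
const PREFERENCE_PATTERNS = [
    { regex: /^call me (.+)$/i, action: 'set_name' },
    { regex: /^speak in (.+)$/i, action: 'set_language' },
];

// Return { action, value } for a preference-setting command, or null if the
// utterance does not match any known pattern.
function parsePreference(utterance) {
    const text = utterance.trim();
    for (const { regex, action } of PREFERENCE_PATTERNS) {
        const match = regex.exec(text);
        if (match)
            return { action, value: match[1].trim() };
    }
    return null;
}

console.log(parsePreference('Call me Bob'));
console.log(parsePreference('speak in Italian'));
console.log(parsePreference('what time is it'));
```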
