tokenflow's People

Contributors

cicorias, mikehopcroft

tokenflow's Issues

Diff algorithm gives incorrect match in long sequences containing a token.

query = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 8589934592, 11, 2, 13, 14, 15, 16];
prefix = [ 1, 2, 3, 4, 5, 6 ];
expected match = [ 1, 2, 3, 4, 5, 6 ];
observed match = [1, 2, 13, 14, 15, 16];

The problem here is that a sequence of replacements after token 8589934592, involving the terms [11, 2, 13, 14, 15, 16], can be cheaper than deleting [8589934592, 11, 2, 13, 14, 15, 16].

Remove readline-sync dependency

Remove the readline-sync dependency and use the async version for the REPL instead. This will make it easier for webpack to consume this package.

PatternRecognizer should take an Iterable, not a Map

The PatternRecognizer constructor should take an Iterable, not a Map. The Map functionality is only used by the CreateFooRecognizer functions. PatternRecognizer just needs an Iterable of items.

Might also want to come up with a less generic name for the Item interface.
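
A minimal sketch of the proposed constructor signature follows. The `Item` fields, the class name `IterablePatternRecognizer`, and the `itemNames()` helper are illustrative assumptions, not tokenflow's actual API. It shows that a Map-based caller can still pass `map.values()`, while arrays and other iterables work unchanged.

```typescript
// Hypothetical stand-in for tokenflow's Item interface; field names are assumptions.
interface Item {
    pid: number;
    name: string;
}

// Sketch of a recognizer that accepts any Iterable<Item> rather than a Map.
class IterablePatternRecognizer {
    private readonly names: string[] = [];

    constructor(items: Iterable<Item>) {
        // Only iteration is required, so arrays, Sets, and Map.values() all work.
        for (const item of items) {
            this.names.push(item.name);
        }
    }

    // Returns a copy of the names collected at construction time.
    itemNames(): string[] {
        return [...this.names];
    }
}

// A Map-based caller (for example a CreateFooRecognizer-style factory) passes map.values().
const catalog = new Map<number, Item>([
    [1, { pid: 1, name: 'red car' }],
    [2, { pid: 2, name: 'blue car' }],
]);
const fromMap = new IterablePatternRecognizer(catalog.values());
const fromArray = new IterablePatternRecognizer([{ pid: 3, name: 'truck' }]);
```

With this shape, the Map stays an implementation detail of the factory functions.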

Recognizers should transform streams of Tokens

Right now, a Recognizer splits a single UndefinedToken into an array of Tokens. Consider the following changes:

  1. Input and output are both sequences of Token. This allows Recognizers that combine tokens, replace tokens, split tokens, etc. This would help in making a more modular pipeline.
  2. Recognizers could operate on Iterable.
  3. Recognizers should be generators.
  4. Perhaps consider changing the name from Recognizer to something more appropriate like TokenStreamProcessor.
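
The changes listed above could look roughly like the sketch below. The `Token` shape, the token type strings, and the names `splitWords`, `tagNumbers`, and `pipeline` are hypothetical and chosen only for illustration. Each stage is a generator that consumes an Iterable of tokens and yields tokens, so stages can split, replace, or pass tokens through, and can be composed.

```typescript
// Hypothetical minimal token shape; tokenflow's real Token type may differ.
interface Token {
    type: string;
    text: string;
}

// A processor maps one token sequence to another.
type TokenStreamProcessor = (input: Iterable<Token>) => Iterable<Token>;

// Splits each UNKNOWN token into one WORD token per whitespace-separated word.
function* splitWords(input: Iterable<Token>): IterableIterator<Token> {
    for (const token of input) {
        if (token.type === 'UNKNOWN') {
            for (const word of token.text.split(/\s+/).filter(w => w.length > 0)) {
                yield { type: 'WORD', text: word };
            }
        } else {
            yield token;
        }
    }
}

// Replaces WORD tokens made only of digits with NUMBER tokens.
function* tagNumbers(input: Iterable<Token>): IterableIterator<Token> {
    for (const token of input) {
        if (token.type === 'WORD' && /^\d+$/.test(token.text)) {
            yield { type: 'NUMBER', text: token.text };
        } else {
            yield token;
        }
    }
}

// Composes stages left to right into a single processor.
function pipeline(...stages: TokenStreamProcessor[]): TokenStreamProcessor {
    return (input) =>
        stages.reduce<Iterable<Token>>((tokens, stage) => stage(tokens), input);
}

const run = pipeline(splitWords, tagNumbers);
const output = Array.from(run([{ type: 'UNKNOWN', text: 'get 4 cars' }]));
```

Because every stage is lazy, no intermediate arrays are built until the final `Array.from` call.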

Crash in NumberRecognizer

pipelineDemo('can I get four four cars') causes NumberRecognizer to throw because it checks to see if the output of wordsToNumbers('four four') is a number. The output is '4 4' which is a string.
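
One possible guard is sketched below. `convertWords` is a simplified stand-in written for this example; it only mimics the behavior described above (a number for a single number word, a string otherwise) and is not the real wordsToNumbers implementation. `tryParseNumber` is likewise a hypothetical helper name.

```typescript
// Stand-in converter (an assumption): returns a number when the whole phrase
// converts to a single run of digits, and a string otherwise.
const WORD_VALUES = new Map<string, number>([
    ['one', 1], ['two', 2], ['three', 3], ['four', 4],
]);

function convertWords(text: string): string | number {
    const joined = text
        .split(' ')
        .map(word => {
            const value = WORD_VALUES.get(word);
            return value === undefined ? word : String(value);
        })
        .join(' ');
    return /^\d+$/.test(joined) ? Number(joined) : joined;
}

// Returns the numeric value, or undefined when the phrase is not a single
// number, instead of throwing on a string result.
function tryParseNumber(text: string): number | undefined {
    const value = convertWords(text);
    return typeof value === 'number' ? value : undefined;
}

const single = tryParseNumber('four');
const repeated = tryParseNumber('four four');
```

A caller that receives `undefined` can then fall back to treating the words as separate tokens rather than crashing.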

Predicate for contributed terms

Currently, contributed words are provided as sets that can be unioned together. It would be nice if they could be provided as predicates that can be chained. A predicate would be better for NumberRecognizer, since its set of contributed terms is technically the set of all integers.

One issue with this approach is that the matcher is based on term hashes, not term text. The tokenizers would need to utilize the same stemming and hashing if they were to provide a hash-based predicate, instead of a text-based predicate.

Without this fix, NumberRecognizer is forced to return a small, hard-coded list of integers.
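
A text-based version of the idea might look like the following sketch. The names `TermPredicate`, `fromSet`, `isIntegerTerm`, and `anyOf` are assumptions made for illustration. It deliberately works on term text, so it does not address the hashing concern raised above.

```typescript
// A term predicate answers: is this term contributed by some recognizer?
type TermPredicate = (term: string) => boolean;

// Wraps an existing finite set of terms as a predicate.
function fromSet(terms: Set<string>): TermPredicate {
    return (term) => terms.has(term);
}

// Accepts any non-negative integer written in digits, with no hard-coded list.
const isIntegerTerm: TermPredicate = (term) => /^\d+$/.test(term);

// Chains predicates: a term is contributed if any predicate accepts it.
function anyOf(...predicates: TermPredicate[]): TermPredicate {
    return (term) => predicates.some(p => p(term));
}

const isContributed = anyOf(
    fromSet(new Set(['small', 'medium', 'large'])),
    isIntegerTerm,
);
```

Existing set-based contributions remain usable through `fromSet`, so recognizers could migrate one at a time.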

Rename badWords concept

Rename badWords concept. Consider using contributedTerms terminology.

Also rename Recognizer.terms().
