Coder Social home page Coder Social logo

dawg-set's Introduction

Dawg-Set

Javascript directed acyclic word graph (DAWG) library. DAWGs allow efficient storage of large sets of strings by compressing the storage graph.

$ npm install dawg-set --save
import Dawg from 'dawg-set';

Documentation

A DAWG stores a set of strings of letters in a graph, reusing nodes where possible. Dawg-set can be used to store regular strings where each node in the graph is a single character or, more generally, "strings" where each component of the string is another string. The latter approach is useful because it allows you to store sentences in the DAWG at a word level, instead of storing them as individual characters.

When constructing the DAWG, you must insert items in lexicographic order. For simple strings, this is just dictionary order. You can .sort() a Javascript array to put it in lexicographic order if you are not sure.


new Dawg(stringIterator, options)

Top level DAWG class.

  • stringIterator - Optional iterator of strings used to populate initial set. Strings must be in lexicographic order.
  • options - Optional object specifying behavior of the data structure. Must defined a single member, compare which, performs a lexicographic comparison of two strings.
const empty_dawg = new Dawg();

const basic_char_dawg = new Dawg(['poodle', 'puli', 'pug']);

const basic_string_dawg = new Dawg([
    ['bull', 'dog'],
    ['bull', 'terrier']]);

Dawg.from(stringIterator, options)

Same as new Dawg(stringIterator, options), but freezes the dawg before returning. See Dawg.prototype.freeze


Dawg.prototype.count()

Get the number of entries stored in the DAWG. DAWGs cannot contain duplicate entires.

Dawg.from([]).count() === 0

Dawg.from(['poodle', 'puli', 'pug']).count() === 3

new Dawg([
    ['bull', 'dog'],
    ['bull', 'terrier']]
).count() === 2

Dawg.prototype.add(string)

Add a string to the DAWG. Returns the DAWG.

add must be called in lexicographic order. If you fail to do so, an error is thrown. You cannot update a frozen DAWG.

const d = new Dawg()
d.add('collie').add('corgi')

d.count() === 2
d.has('collie') === true
d.has('lab') === false
d.has('co') === false

d.add('lab')
d.count() === 3
d.has('lab') === true

// Trying to insert the same key again has no effect
d.add('lab')
d.count() === 3

// Trying to insert out of order throws an error
d.add('afghan hound') === throws since 'Afghan Hound' comes before 'lab'

// Trying to add to a frozen DAWG throws an error
const set = Dawg.from([])
set.add('collie') === throws since `set` is frozen by `from`.

Dawg.prototype.freeze()

Locks the DAWG and prevents further mutation. Once you are done mutating the DAWG, freezing it can free up some memory.

Dawg.from automatically freezes its result.

const d = new Dawg()
d.add('collie').add('corgi').freeze()

d.count() === 2
d.has('collie') === true

d.add('lab') === throws since graph is frozen now

Dawg.prototype.has(string)

Does the set contain an entry for string?

const d = Dawg.from(['poodle', 'puli', 'pug']);

d.has('puli') === true
d.has('mastiff') === false

// Storage is sensitive
d.has('PULI') === false

Dawg.prototype.match(string)

Finds the longest string in the DAWG that is part of string. The result will be between 0 string.length long. Only matches complete words in the DAWG.

const d = Dawg.from(['a', 'abc', 'abcde']);

d.match('a') === 'a'
d.match('x') === ''
d.match('abx') === 'a'
d.match('abcd') === 'abc'
d.match('abcdx') === 'abc'
d.match('abcdefg') === 'abcde'

Dawg.prototype.longest(string)

Same behavior as match, but returns the length of the longest match instead of the match itself.

One use of this is to get the matched vs unmatched parts of the input string:

const d = Dawg.from(['a', 'abc', 'abcde'])

const path = 'abcdx'
const i = d.longest(path)
const matched = path.slice(0, i)
const unmatched = path.slice(i)

matched === 'abc'
unmatched === 'dx'

Dawg.prototype.values(joiner = '') Dawg.prototype[Symbol.iterator]

Get an iterator to all strings in the dawg.

  • joiner - Optional. String or function used to join the letters of the strings together. If a string, this is inserted between each letter. If this is a function, it is invoked with the left and right parts of the string to join and must return a string.
const d = Dawg.from(['a', 'abc', 'abcde'])

Array.from(d.values()) === Array.from(d) === ['a', 'abc', 'abcde']

// Using a custom joiner
Array.from(d.values('|')) === ['a', 'a|b|c', 'a|b|c|d|e']

// Using with word level DAWG
const wd = Dawg.from([
    ['bull', 'dog'],
    ['bull', 'terrier']]);

Array.from(wd.values()) === Array.from(wd) === ['bulldog', 'bullterrier']

Array.from(wd.values(' ')) === ['bull dog', 'bull terrier']

Dawg.prototype.paths()

Get an iterator to all letter paths in the dawg.

Unlike values(), this returns arrays of characters.

const d = Dawg.from(['a', 'abc', 'abcde'])

Array.from(d.paths()) === [['a'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd', 'e']]

// Using with word level DAWG
const wd = Dawg.from([
    ['bull', 'dog'],
    ['bull', 'terrier']]);

Array.from(wd.paths()) === [['bull', 'dog'], ['bull', 'terrier']];

Dawg.prototype.valuesStartingWith(string, joiner = '')

Same basic behavior as values() but only returns values that start with string. string itself does not have to be in the set.

const d = Dawg.from(['a', 'abc', 'abcde', 'abe', 'd'])

Array.from(d.valuesStartingWith('abc')) ===  ['abc', 'abcde']

Credits

Implementation based on https://gist.github.com/smhanov/94230b422c2100ae4218

dawg-set's People

Contributors

mattbierner avatar

Stargazers

Yaroslav Serhieiev avatar Lance Pollard avatar  avatar Jonatan Heyman avatar  avatar Claudius Teodorescu avatar Erin McKean avatar John Haugeland avatar  avatar  avatar Khoa Nguyen avatar Leonardo Seccia avatar Nikita Bazalev avatar Dmytro Budnyk avatar

Watchers

 avatar James Cloos avatar WangJunKai avatar  avatar

Forkers

sebge-1

dawg-set's Issues

Insert order

Is inserting into a DAWG inherently required to be in lexicographic order or is that specific to this implementation? If not, would there be a large performance hit?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.