Coder Social home page Coder Social logo

yomguithereal / mnemonist Goto Github PK

View Code? Open in Web Editor NEW
2.2K 22.0 92.0 2.62 MB

Curated collection of data structures for the JavaScript/TypeScript language.

Home Page: https://yomguithereal.github.io/mnemonist

License: MIT License

JavaScript 98.80% TypeScript 1.20%
data-structure

mnemonist's Introduction

Build Status

Mnemonist

Mnemonist is a curated collection of data structures for the JavaScript language.

It gathers classic data structures (think heap, trie etc.) as well as more exotic ones such as Buckhard-Keller trees etc.

It strives at being:

  • As performant as possible for a high-level language.
  • Completely modular (don't need to import the whole library just to use a simple heap).
  • Simple & straightforward to use and consistent with JavaScript standard objects' API.
  • Completely typed and comfortably usable with Typescript.

Installation

npm install --save mnemonist

Documentation

Full documentation for the library can be found here.

Classics

Low-level & structures for very specific use cases

Information retrieval & Natural language processing

Space & time indexation

Metric space indexation

Probabilistic & succinct data structures

Utility classes


Note that this list does not include a Graph data structure, whose implementation is usually far too complex for the scope of this library.

However, we advise the reader to take a look at the graphology library instead.

Don't find the data structure you need? Maybe we can work it out together.

Contribution

Contributions are obviously welcome. Be sure to lint the code & add relevant unit tests.

# Installing
git clone [email protected]:Yomguithereal/mnemonist.git
cd mnemonist
npm install

# Linting
npm run lint

# Running the unit tests
npm test

License

MIT

mnemonist's People

Contributors

atombrenner avatar chaosforfun avatar clhuang avatar dmitri-gb avatar em-ctc avatar farjasju avatar jerome-benoit avatar jfoutts avatar macil avatar mrflip avatar pbadenski avatar philippemorier avatar rubenferreira97 avatar trivikr avatar veggiesaurus avatar xamgore avatar yomguithereal avatar yoursunny avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

mnemonist's Issues

Roadmap

  • MultiIndex
  • CompositeIndex
  • MultiSet
  • MultiMap
  • InvertedIndex
  • SearchIndex
  • Fuzzy?
  • Deque
  • BloomFilter
  • DoublyLinkedList
  • SuffixTree
  • VPTree
  • Buckhard-Keller Tree
  • SymSpell
  • QuadTree, MX-CIF QuadTree
  • k d tree
  • Interval tree
  • Utils, set operators
  • Static matrix + triangular
  • Radix tree, Patricia tree
  • Index
  • M-tree
  • BitSet
  • LRUCache
  • CoverTree
  • MarisaTrie
  • BiMap
  • UnsafeBiMap
  • Table
  • Multi Vantage Point Tree
  • Binary Tree
  • RedBlack Tree
  • ArrayTrie, DoubleArrayTrie StaticArrayTrie etc.
  • Sparse Matrix, Sparse Vector
  • Cuckoo filter
  • HeapSet, HeapMap
  • Static bit array
  • Dynamic Typed arrays (1.5 factor by default)
  • SparseSet
  • Anagram Tree
  • Palindrom Tree
  • Wavelet Tree
  • t-digest
  • SparseArray
  • CircularBuffer
  • BipBuffer
  • Nested Sets
  • Perfect Hashmaps
  • Static search set
  • TrieMap
  • BoundedHeap, Min-Max Heap
  • Spatial Hashmap
  • https://github.com/pabloem/advanced_arrays
  • Elias Fano QuasiSuccint Indices
  • Sparse Table
  • Tupled Stack
  • Disjoint Set
  • dawgmap, dafsa
  • GADDAG
  • Count Min sketch
  • PH-Tree
  • BallTree
  • HAMT
  • Unrolled Linked list
  • Fenwick Tree
  • Inversion List
  • TokenMap
  • CallWheel / HashedWheelTimer
  • HAT hashed array tree, HAT Trie
  • GapBuffer
  • x fast trie critbit trie y fast z fast
  • Compressed bit vectors rrr etc.
  • Default map
  • Static stack
  • Piece Table
  • RoaringBitmap
  • Vocab (alternative to incrementalmap)
  • Hyperloglog
  • SoftHeap
  • Boundaries Trie like web entities
  • Stern Brocot tree

iterate plain object's prototype key

the code in iterate.js

// The target is a plain object
for (k in target)
  callback(target[k], k);

should add a test of target.hasOwnProperty(k)

Linked List implementation details

@dheavy, I created a benchmark here to assess what was the best storage strategy for the list's node.

Tested strategies are:

  • Raw object
  • 2-Array (tuple)
  • Node class
  • Object.create(null)

Benchmarks seems to point toward raw objects but I notice some strange v8 optimizations going on. Meaning that the order of the tests, notably between raw objects & 2-arrays change dramatically.

What do you think?

Support for types

This is a cool library. Data structure has growing importance on FE since more logic is pushed to client side. It would be nice if this library can have DefinitelyTyped πŸ˜„

Standard serialization/deserialization API across all data-structures

Currently one can serialize mnemonist data-structures with .toJSON but there does not appear to be a standard way to deserialize. I'd like to be able to cache mnemonist structures in the browser or to send pre-processed structures to the client over a network.

To achieve serialization/deserialization at the moment, one has to write custom functions which often need to re-iterate over the entire data set. Re-iterating may be prohibitively expensive for large structures i.e. this sucks most for the exact use-cases where mnemonist would be most useful.

e.g. there should be a way to do something like:

dest.fromJSON(src.toJSON())

and ideally the deserialization process would need to do minimal reprocessing, it would basically just dump the data into place, something like:

dest.root = src.toJSON()

For example, I'd hoped this exact thing would work for Trie, except that toJSON loses the size information, and if you added .size to the structure produced by toJSON, you'd potentially break any 3rd party code consuming the current toJSON format.

Therefore, you should probably should use something other than toJSON, instead create a new API pair e.g. serialize/deserialize which produces/consumes a representation whose structure users would consider opaque because:

  • the best encoding for fast serialization/deserialization may not be JSON (e.g. bloom filters) and
  • you want to be able to have the freedom to change the serialization format so you're not stuck working with some inefficient representation simply because you want to avoid breaking the public toJSON API.

Perhaps serialize would just generate JSON or a JS object for now, but you don't want to be locked into that, nor into the structure it produces.


Related to #28

Performance benchmark

#Can you please add a section comparing the performance against the alternatives in plain JavaScript??

I made some tests and actually I got better results without the library. Perhaps I'm not using it properly. Here is the code I used:

var Trie = require('mnemonist/trie');
var query = 'rom';

var words = [];
const dictionary = ['category', 'romance'];

for (var i = 0; i < 10000; i++) {
  words.push(dictionary[i % 2] + i);
}

console.time('ES5');
var result = words.filter(function(word) {
  return word.startsWith(query);
});
console.log(result.length);
console.timeEnd('ES5');

console.time('mnemonist');
// Let's create a trie from our words
var trie = Trie.from(words);

// Now let's query our trie
var wordsWithMatchingPrefix = trie.get(query);
console.log(wordsWithMatchingPrefix.length);
console.timeEnd('mnemonist');

running symspell test cannot find module damerau-levenshtein

Your test utility for symspell specifies: damerauLevenshtein = require('damerau-levenshtein');

Running the test suite using "mocha symspell.js" from the folder containing test\symspell.js produces this error:
Cannot find module 'damerau-levenshtein'

I tried to use the version on npm by installing globally: npm install --global damerau-levenshtein

This did not prevent the error message described above.

Would you please advise where the required damerau-levenshtein file is to be found. Shouldn't it be part of the mnemonist package?

Many thanks.

Structures to add

  • Bags (Multiset, Counter)
  • BloomFilter
  • Deque
  • Doubly Linked List
  • Fibonacci Heap
  • Linked List
  • Heap
  • Queue
  • Radix Tree
  • Stack
  • Suffix Tree (Ukkonen)
  • Suffix Array
  • Trie
  • Inverted Index (simple & complex)
  • Index & MultiIndex

Change import path as true name for clarity

For the time being, one requires a structure likewise:

var LinkedList = require('mnemonist/linked-list');

Would it be better to have:

var LinkedList = require('mnemonist/LinkedList');

@jacomyal what do you think of this?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.