Basically, it's a set of strings that supports membership testing, spelling correction (fuzzy matching), and autocomplete, with better memory efficiency than a regular trie.
Can you please elaborate on how this can be used for fuzzy matching and autocomplete?
As far as I can tell, you can only test for strict presence of a full string. It's not possible to check if a substring or a fuzzy match exists in the set, is it?
Thanks for making this. Can you please share the 8M strings of testing data you used for this? This will give a good point of comparison with other libs out there.
The encoding currently only supports ASCII (oops - forgot to upgrade that), since each character gets only 1 byte. This should be expanded to multiple bytes when the user wants to encode trees containing Unicode characters.
However, since this greatly increases the file size and may not be necessary, this option should be configurable.
Please correct me if I'm wrong. I don't think there's a standard interface for Encoder/Decoder types, but as far as I can see, they typically take an io.Reader to create a Decoder and an io.Writer to create an Encoder.
To shrink the file even further, the pointer size should be adjusted based on how many items are in the tree being encoded. Right now we simply use 4 bytes, but small trees don't need that many: 4 bytes can address over 4 billion items. The pointer size should be automatically adjusted or made configurable.