gonzedge / rambling-trie
A Ruby implementation of the trie data structure.
License: MIT License
Now that the gem can serialize in-memory tries to disk and back, let's add the ability for people to configure them. Someone using the gem should be able to configure the available serializers, the default serializer, etc. I'm thinking of an API like this:
require 'rambling-trie'
Rambling::Trie.config do |c|
  c.serializers.add :json, MyJsonSerializer.new
  c.serializers.default = c.serializers[:yml]
end
# Load a trie from disk and do things with it...
Also, the way Readers are configured and treated in general should be very similar to what would be done with Serializers. That way, you could also provide your own readers for getting a list of words from disk (or any IO, really):
require 'rambling-trie'
Rambling::Trie.config do |c|
  c.readers.add :html, MyHtmlReader.new
  c.readers.default = c.readers[:html]
end
# Create a trie and do things with it...
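One way the shared shape for serializers and readers could look is a small keyed registry that both sides reuse. This is only a sketch under my own assumptions: ProviderRegistry, SketchConfig, SketchTrie and LineReader are hypothetical names made up for illustration, not classes from the gem.

```ruby
# Hypothetical sketch: none of these names come from the gem itself.
class ProviderRegistry
  attr_reader :default

  def initialize
    @providers = {}
    @default = nil
  end

  # Registers a provider under a symbolic key, e.g. :json or :html.
  def add(key, provider)
    @providers[key.to_sym] = provider
  end

  def [](key)
    @providers[key.to_sym]
  end

  # Only a provider that was registered first may become the default.
  def default=(provider)
    unless @providers.value?(provider)
      raise ArgumentError, 'default must be a registered provider'
    end

    @default = provider
  end
end

# Serializers and readers share the same registry behaviour.
class SketchConfig
  attr_reader :serializers, :readers

  def initialize
    @serializers = ProviderRegistry.new
    @readers = ProviderRegistry.new
  end
end

module SketchTrie
  # Yields the single shared configuration object and returns it.
  def self.config
    @config ||= SketchConfig.new
    yield @config if block_given?
    @config
  end
end

# Stand-in reader (assumed interface): yields one word per line of input.
class LineReader
  def each_word(io)
    io.each_line { |line| yield line.chomp }
  end
end

SketchTrie.config do |c|
  c.readers.add :lines, LineReader.new
  c.readers.default = c.readers[:lines]
end
```

Rejecting an unregistered default keeps the configuration from pointing at a provider that cannot be looked up by key later.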
Implement the each, to_a and size methods to begin with.
I have a use case where fetching a full word from a partial one would be really useful. Am I missing something, or is there no such feature?
The methods has_branch_for? and is_word? do not work correctly on a compressed trie. They should be fixed.
Also, the method add_branch_from does not make sense for a compressed trie and should be removed.
As mentioned by @darkogj on Twitter, there is no current way to find out if a given string contains a valid word from a given dictionary. In @darkogj's words:
... The challenge is, if you have 'aheyb' as the target word, for the gem to recognize 'hey' is a word although it's not at the beginning of the string.
I made a short implementation of this for uncompressed tries, but it hasn't been thoroughly tested or benchmarked and it isn't compatible with compressed tries. Those need to be taken care of before integrating it into the gem.
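For reference, the general idea for the uncompressed case can be sketched as below: restart a walk from the trie root at every position of the target string. This is not the implementation mentioned above; SketchNode, SubstringTrie and words_within are hypothetical names, and the trie here is a plain hash-based one rather than the gem's classes.

```ruby
# Hypothetical sketch: a hash-based, uncompressed trie, not the gem's classes.
class SketchNode
  attr_reader :children
  attr_accessor :terminal

  def initialize
    @children = {}
    @terminal = false
  end
end

class SubstringTrie
  def initialize
    @root = SketchNode.new
  end

  # Adds a word one character at a time and marks the last node as a word end.
  def <<(word)
    node = @root
    word.each_char { |char| node = (node.children[char] ||= SketchNode.new) }
    node.terminal = true
    self
  end

  # Returns every stored word found anywhere inside +text+.
  def words_within(text)
    found = []
    text.length.times do |start|
      node = @root
      (start...text.length).each do |index|
        node = node.children[text[index]]
        break unless node # no stored word continues along this path

        found << text[start..index] if node.terminal
      end
    end
    found.uniq
  end
end

trie = SubstringTrie.new
trie << 'hey' << 'he' << 'bee'
trie.words_within('aheyb') # => ["he", "hey"]
```

Each starting position walks at most as deep as the longest stored word, so the cost is roughly the string length times that depth.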
Add the << method to the Rambling::Trie::Root class.
Having:
trie = Rambling::Trie.create
This:
trie << 'word'
Is more friendly and intuitive than:
trie.add_branch_from 'word'
This gem could use a changelog to document the changes (breaking or not) that have happened between versions.
When searching for a partial word whose path through the trie does not exist, the method "length" gets called on a nil object. I think the issue is on line 111 of branches.rb.
Thanks for your work!
When you persist a trie made from a dictionary with ~170K words, the resulting file size is ~8MB for an uncompressed trie and ~5MB for a compressed one. If you then zip those files, you get ~2MB and ~1MB respectively.
It would be nice if, as a user, I didn't need to unzip the file myself and the gem took care of that for me.
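A rough sketch of what that could look like is a thin wrapper that compresses whatever string a serializer produces. Two assumptions to flag: GzipFileStore is a hypothetical name, and it uses gzip via Zlib from Ruby's standard library instead of the .zip archive format, since the standard library has no zip archive support.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical wrapper, not part of the gem.
# Uses gzip (Zlib) rather than the .zip archive format.
class GzipFileStore
  # Writes +contents+ to +path+ as a gzip-compressed file.
  def dump(contents, path)
    Zlib::GzipWriter.open(path) { |gz| gz.write contents }
  end

  # Reads +path+ back and returns the decompressed string.
  def load(path)
    Zlib::GzipReader.open(path) { |gz| gz.read }
  end
end

store = GzipFileStore.new
path = File.join(Dir.tmpdir, 'sketch-trie.yml.gz')
store.dump("---\nwords:\n- hey\n- hello\n", path)
store.load(path) # returns the original, uncompressed string
```

The caller only ever sees plain strings, so an existing serializer could sit behind this without knowing about compression.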
If I've never seen this codebase, but have a feature that I'd like to add or a bug that I'd like to fix, how do I get started with contributing?
When doing this:
trie = Rambling::Trie.create
word = 'abc'
trie.add_branch_from word # or trie << word
The word variable goes from 'abc' to '' (empty string).
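I haven't confirmed the cause, but this is the behavior you'd see if the insertion code consumes its argument with a destructive method such as String#slice!. The sketch below reproduces the symptom and one possible fix (working on a copy); destructive_chars and safe_chars are hypothetical stand-ins, not the gem's methods.

```ruby
# Hypothetical stand-ins for the gem's insertion logic.

# Consumes the caller's string one character at a time.
def destructive_chars(word)
  chars = []
  chars << word.slice!(0) until word.empty?
  chars
end

# Works on a private copy, so the caller's string is left untouched.
def safe_chars(word)
  destructive_chars(word.dup)
end

first = String.new('abc')
destructive_chars(first)
first # => ""

second = String.new('abc')
safe_chars(second)
second # => "abc"
```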
Add the include? method to the Rambling::Trie::Root class.
Having:
trie = Rambling::Trie.create
This:
trie.include? 'word'
Is more friendly and intuitive than:
trie.is_word? 'word'
If I have a dictionary, say ['hey', 'there', 'hello'], and do trie.partial_word?('he'), it's just going to return true. It would be nice if there was a method that returned all the words that start with that prefix, so the result would be the array ['hey', 'hello']. A nice method name for this would be select_all_starting_with.
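To make the request concrete, here is a minimal sketch of the lookup on a plain hash-based trie: walk down to the node for the prefix, then collect every word below it. PrefixTrie and words_starting_with are hypothetical names for illustration; the gem's node classes and final method name may well differ.

```ruby
# Hypothetical sketch: a hash-based trie, not the gem's own node classes.
class PrefixTrie
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = new_node
  end

  # Adds a word one character at a time and marks the last node as a word end.
  def <<(word)
    node = @root
    word.each_char { |char| node = (node.children[char] ||= new_node) }
    node.terminal = true
    self
  end

  # Returns all stored words beginning with +prefix+ (depth-first order).
  def words_starting_with(prefix)
    node = @root
    prefix.each_char do |char|
      node = node.children[char]
      return [] unless node # the prefix path does not exist in the trie
    end
    collect(node, prefix)
  end

  private

  def new_node
    Node.new({}, false)
  end

  # Gathers every word in the subtree rooted at +node+.
  def collect(node, so_far)
    words = node.terminal ? [so_far] : []
    node.children.each { |char, child| words.concat collect(child, so_far + char) }
    words
  end
end

trie = PrefixTrie.new
%w[hey there hello].each { |word| trie << word }
trie.words_starting_with('he') # => ["hey", "hello"]
```

Returning an empty array when the prefix path is missing avoids calling anything on nil for prefixes that are not in the trie.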