Coder Social home page Coder Social logo

prefix-search's Introduction

Ternary Search Tree (tst)

A ternary search tree is a binary search tree (bst) with an additional node pointer.

There is a relative dearth of information available about ternary trees, and specifially, proper node-rotation when a node is deleted from the tree. The only examples for a node-delete found are simple deletes that leave the tree dirty by leaving a node, with or withoutsiblings, but having no valid middle (or equal) pointer.

With a binary search tree each node contains a left and right node-pointer so a binary choice controls traversal. Either a greater than or less than choice. As the result of a comparison between the node-data and a reference either the left or right node-pointer is used to further descend.

A ternary search tree adds a third (or middle) node. While you can still use the left-middle-right notation, ternary trees use a low-equal-high pointer verbiage. With each node holding a key ID, the node pointers for the this code uses a lokid, eqkid, and hikid pointer naming convention. If the difference between a refernence and the node key is negative (lower than), the lokid node is traversed, if they are equal, the eqkid node is followed or hikid is followed.

In addition to the node->key, the node used in this example adds a reference count (a node->refcnt) to the node data to track the number of occurrances for each word it holds. So for example, if using the tree to track the words in an editor buffer (where there may be multiple occurrences of 'the' or other common words), the node holding 'the' is not deleted, upon delete, until no other occurrences remain (i.e. the node->refcnt is zero).

Each individual node has the following form:

                              o
                              |-key
                              |-refcnt
                  ------------+------------
                  |lokid      |eqkid      |hikid
                  o           o           o

The string data (a pointer to a word, or a copy of the word itself) is stored in an additional special node following the node containing the last character (node->key) in the search path for the word, cast to type (node *) and stored as the node->eqkid pointer. Further, since this is the node after the last character, its key is the nul-character (decimal 0) just as you would expect when ending a string. Thus, the 'key's for each of the nodes that make up the search path of a word, are the letters in the word with the final node having a key 0 with either a pointer to the string (if stored in an external data structure) or an allocated copy of the string itself if the string is to be stored in the tree. (as in holding the words for an edit buffer, where the location/address for the string changes with each keypress). In either case, the traversal down nodes to the final node will have a form similar to the following for the word "cat":

                              o
                              |-c
                              |-0
                  ------------+------------
                  |lokid      |eqkid      |hikid
                  o           o           o
                              |-a
                              |-0
                          ----+----
                          |   |   |    note: any of the lokid or hikid nodes
                              o              can also have pointers to nodes
                              |-t            for words that "cat" or "ca" is
                              |-0            a partial prefix to.
                          ----+----
                          |   |   |
                              o
                              |-0
                              |-1    <== the refcnt is only relevant to the final node
                          ----+----
                          |   |   |
                        NULL  o  NULL
                            "cat"

The ternary tree has the same O(n) efficiency for insert and search as does a bst. The delete is only slightly worse due to the proper deletion of the chain of unique nodes that make a word and proper rotation to eliminate the final node containing siblings. Lookup times associated with loading the entire /usr/share/dict/words file and searching range between 0.000002 - 0.000014 sec. However, the prefix search ability offered by the ternary search tree sets it apart from virtually all other data stuctures. While Tri/Radix trees can perform as well, their memory requirements are often 20 times more than a ternary tree.

The benefit of a ternary tree for prefix searching of text lies in its ability to quickly traverse a tree of any size finding the node containing the last character in the wanted prefix. An in-order traversal of that node identifies all strings in the tree containing the prefix.

prefix-search's People

Contributors

jserv avatar brchiu avatar lihikoka avatar

Watchers

DanielChen avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.