Coder Social home page Coder Social logo

rollinghash's Introduction

Build Status Coverage Status GoDoc Reference Go 1.7+

Rolling Hashes

Philosophy

This package contains several various rolling hashes for you to play with crazy ideas. The API design philosophy is to stick as closely as possible to the interface provided by the builtin hash package (the hashes implemented here are effectively drop-in replacements for their builtin counterparts), while providing simultaneously the highest speed and simplicity.

Usage

A rollinghash.Hash is just a hash.Hash which implements the Roller interface. Here is how it is typically used:

data := []byte("here is some data to roll on")
h := buzhash64.New()
n := 16

// Initialize the rolling window
h.Write(data[:n])

for _, c := range(data[n:]) {

    // Slide the window and update the hash
    h.Roll(c)

    // Get the updated hash value
    fmt.Println(h.Sum64())
}

Gotchas

The rolling window MUST be initialized by calling Write first (which saves a copy). The byte leaving the rolling window is inferred from the internal copy of the rolling window, which is updated with every call to Roll.

If you want your code to run at the highest speed, do NOT cast the result of a New() as a rollinghash.Hash. Instead, use the native type returned by New(). This is because the go compiler cannot inline calls from an interface. When later you call Roll(), the native type call will be inlined by the compiler, but not the casted type call.

var h1 rollinghash.Hash
h1 = buzhash32.New()
h2 := buzhash32.New()

[...]

h1.Roll(b) // Not inlined (slow)
h2.Roll(b) // inlined (fast)

What's new in v4

In v4:

  • Write has become fully consistent with hash.Hash. As opposed to previous versions, where writing data would reinitialize the window, it now appends this data to the existing window. In order to reset the window, one should instead use the Reset method.

  • Calling Roll on an empty window is considered a bug, and now triggers a panic.

Brief reminder of the behaviors in previous versions:

  • From v0.x.x to v2.x.x: Roll returns an error for an empty window. Write reinitializes the rolling window.

  • v3.x.x : Roll does not return anything. Write still reinitializes the rolling window. The rolling window always has a minimum size of 1, which yields wrong results when using roll before having initialized the window.

Go versions

The RabinKarp64 rollinghash does not yield consistent results before go1.7. This is because it uses Rand.Read() from the builtin math/rand. This function was fixed in go 1.7 to produce a consistent stream of bytes that is independant of the size of the input buffer. If you depend on this hash, it is strongly recommended to stick to versions of go superior to 1.7.

License

This code is delivered to you under the terms of the MIT public license, except the rabinkarp64 subpackage, which has been adapted from restic (BSD 2-clause "Simplified").

Notable users

  • syncthing, a decentralized synchronisation solution
  • muscato, a genome analysis tool

If you are using this in production or for research, let me know and I will happily put a link here!

rollinghash's People

Contributors

calmh avatar chmduquesne avatar kshedden avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.