FuzzyHash

Deep nested hashing of Python objects.

Out of the box, Python's object freezing and hashing do not produce a consistent hash for nested iterables and mixed types. JSON responses from web servers often contain inconsistencies, so they cannot be compared reliably in their string form or as hashes. These inconsistencies can come from a service's type inference, as well as from dictionary key order or list order. For example, 1, 1.0, '1', and '1.0' may all stand for the same value, yet their serialized forms differ, so comparing the raw strings or their hashes treats them as different.
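To make the problem concrete, here is a minimal sketch using only the standard library (written for Python 3; the two payloads and the sha256_hex helper are made up for illustration and are not part of FuzzyHash). It compares two JSON responses that describe the same record but differ in key order, list order, and the inferred type of one value.

```python
import hashlib
import json

# Two hypothetical responses describing the same record.
response_a = '{"id": 1, "tags": ["a", "b"]}'
response_b = '{"tags": ["b", "a"], "id": "1.0"}'


def sha256_hex(text):
    """Return the SHA-256 hex digest of a text string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The raw strings differ, so their hashes differ as well.
strings_match = response_a == response_b
hashes_match = sha256_hex(response_a) == sha256_hex(response_b)

# Deserializing removes the key-order difference, but list order and
# value types (1 versus "1.0") still make the objects unequal.
objects_match = json.loads(response_a) == json.loads(response_b)

print(strings_match, hashes_match, objects_match)  # False False False
```

None of the three comparisons succeeds, which is the gap that order-insensitive hashing and type coercion are meant to close.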

FuzzyHash supports strict hashing that respects type and order, as well as type coercion and object manipulation prior to hashing.

1.) By default, lists and tuples are sorted before hashing. Passing strict=True preserves their order and skips the sorting.
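The difference between the two modes can be sketched as follows. This is not FuzzyHash's actual implementation, only an illustration of the idea under stated assumptions: the canonical and nested_hash helpers are hypothetical, lists and tuples are treated alike, and the code targets Python 3.

```python
import hashlib


def canonical(value, strict=False):
    """Build a deterministic text form of a nested object (illustrative)."""
    if isinstance(value, dict):
        # Dictionary key order never matters here: always sort the items.
        items = sorted(
            (canonical(k, strict), canonical(v, strict))
            for k, v in value.items()
        )
        return "{" + ",".join(k + ":" + v for k, v in items) + "}"
    if isinstance(value, (list, tuple)):
        parts = [canonical(item, strict) for item in value]
        if not strict:
            # Non-strict mode: sort so that element order is ignored.
            parts.sort()
        return "[" + ",".join(parts) + "]"
    # Scalars: include the type name so 1 and '1' stay distinct.
    return type(value).__name__ + "=" + repr(value)


def nested_hash(value, strict=False):
    """Hash the canonical text form with SHA-256."""
    text = canonical(value, strict)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = {"ids": [3, 1, 2], "name": "Bob"}
b = {"name": "Bob", "ids": [1, 2, 3]}

print(nested_hash(a) == nested_hash(b))                            # True
print(nested_hash(a, strict=True) == nested_hash(b, strict=True))  # False
```

Sorting the canonical text of each element, rather than the elements themselves, avoids errors when a list mixes types that Python 3 cannot order directly.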

2.) Type coercion prior to hashing can be applied at every level of a nested object by passing a dictionary as type_map to init. An example type map would be {int: unicode, float: unicode, str: unicode}. With this map, all integers, floats, and strings are cast to unicode strings before hashing.

Functions can also be passed as type map values.

For example:

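# FuzzyHash itself must already be imported from this package.
from types import NoneType  # available in Python 2 and again in Python 3.10+

def none_to_zero_len_str(value):
    # Replace None with an empty string.
    return ''

def to_upper(string):
    # Normalize case so 'Bob Evans' and 'BOB EVANS' compare equal.
    return string.upper()

some_object_1 = {'name': 'Bob Evans', 'title': ''}
some_object_2 = {'name': 'BOB EVANS', 'title': None}
# Each key is a type; each value is the function applied to values of that type.
type_map = {str: to_upper, NoneType: none_to_zero_len_str}
hash_object_1 = FuzzyHash(some_object_1, type_map=type_map)
hash_object_2 = FuzzyHash(some_object_2, type_map=type_map)
hash_object_1 == hash_object_2  # True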
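As a rough picture of what coercion at every level involves, the sketch below walks a nested object and applies a type map to each scalar value. It is an assumption-laden illustration, not the library's code: the coerce helper is hypothetical, dictionary keys are left untouched, types are matched exactly, and Python 3's str stands in for unicode.

```python
def coerce(value, type_map):
    """Recursively apply type_map to every value in a nested object."""
    if isinstance(value, dict):
        # Only values are coerced in this sketch; keys are left as they are.
        return {k: coerce(v, type_map) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [coerce(item, type_map) for item in value]
    # Exact type lookup; the mapped value may be a type or a function.
    converter = type_map.get(type(value))
    return converter(value) if converter is not None else value


# type(None) is used so the sketch also runs on Python 3 before 3.10.
type_map = {int: str, float: str, type(None): lambda _: ""}

record = {"id": 7, "score": 1.5, "title": None, "tags": ["x", 2]}
print(coerce(record, type_map))
# {'id': '7', 'score': '1.5', 'title': '', 'tags': ['x', '2']}
```

Note that casting alone turns 1 into '1' and 1.0 into '1.0', which are still different strings. Making them equal requires a function value in the map that normalizes the number first.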

3.) Dictionary keys can be ignored by passing a list of keys as dict_key_ignore to init. A typical use is excluding API response timestamps.

Note: Not all objects are hashable. JSON strings passed to init's data parameter are deserialized into native Python objects.

Examples:

# type_map is the one defined in the previous example.
some_object_1 = {'name': 'Bob Evans', 'title': '', 'edit_log': ['2020-05-01 12:29:25.984355', '2020-05-03 13:05:22.338301', '2020-05-29 19:01:51.871108']}
some_object_2 = {'name': 'Bob Evans', 'edit_log': []}

# Without dict_key_ignore, the differing 'title' and 'edit_log' entries change the hash.
hash_object_1 = FuzzyHash(some_object_1, type_map=type_map)
hash_object_2 = FuzzyHash(some_object_2, type_map=type_map)
hash_object_1 == hash_object_2  # False

# Ignoring both keys leaves only 'name', so the hashes match.
hash_object_1 = FuzzyHash(some_object_1, type_map=type_map, dict_key_ignore=['title', 'edit_log'])
hash_object_2 = FuzzyHash(some_object_2, type_map=type_map, dict_key_ignore=['title', 'edit_log'])
hash_object_1 == hash_object_2  # True
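The effect of ignoring keys can be approximated in plain Python by stripping them before comparison. The drop_keys helper below is hypothetical, and it assumes the ignored keys are removed at every nesting level, which the description above does not confirm for FuzzyHash.

```python
def drop_keys(value, ignore):
    """Return a copy of a nested object without the ignored dict keys."""
    if isinstance(value, dict):
        return {
            k: drop_keys(v, ignore)
            for k, v in value.items()
            if k not in ignore
        }
    if isinstance(value, (list, tuple)):
        return [drop_keys(item, ignore) for item in value]
    return value


first = {"name": "Bob Evans", "title": "", "edit_log": ["2020-05-01 12:29:25.984355"]}
second = {"name": "Bob Evans", "edit_log": []}

ignore = {"title", "edit_log"}
print(drop_keys(first, ignore) == drop_keys(second, ignore))  # True
```

Once the volatile keys are gone, both records reduce to the same dictionary, so any deterministic hash of the result will match.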
