SHA objects about sha.jl HOT 7 OPEN

juliacrypto commented on July 17, 2024

SHA objects

from sha.jl.

Comments (7)

simonbyrne commented on July 17, 2024

I think the main advantages are:

1. It adds semantic information: I know the bytes correspond to the output of a specific hash. This is helpful when variables are just named commit_hash.
2. It can be a bitstype, which has some performance advantages and makes C interop easier:
   - it can easily be stored inline in a struct, e.g. https://github.com/JuliaLang/julia/blob/2d0aeca686581276e0da87a9d9e593db1dba5516/stdlib/LibGit2/src/types.jl#L523; otherwise, you would have to use an NTuple{20,UInt8} and convert to/from a Vector{UInt8}.
   - it is easy to use when C functions return via pointer arguments, e.g. https://github.com/JuliaLang/julia/blob/2d0aeca686581276e0da87a9d9e593db1dba5516/stdlib/LibGit2/src/index.jl#L53-L56
3. There are many cases where hashes are passed as hexadecimal strings; having dedicated hash objects makes the conversions easier, e.g.
   - the various registry packages (Registrator.jl, RegistryTools.jl, RegistryCI.jl) all represent hashes as strings, as that is how they are represented in the TOML files.
   - interfacing with git: it's helpful to be able to do
     ```
     hash = SHA1("...")
     run(`git checkout $hash`)
     ```
     and have it work as expected.
4. It appears to be what people do anyway, but we end up with the same thing defined in multiple places: I didn't realize Base had an SHA1 type, but LibGit2 should use it rather than define its own GitHash type. Similarly, GitHub.jl should use this instead of String, etc.
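A minimal sketch of these points using the SHA1 type that already exists in Base. Assumptions: Base.SHA1 is internal and not exported, so its constructors (from a 20-byte Vector{UInt8} or from a hex string) and its conversion to a hex string are used here as they behave in recent Julia 1.x releases, not as a documented public API. CommitInfo is a hypothetical struct, and libc's memcpy merely stands in for a C function that writes a hash through a pointer argument. The git command is only constructed, never run.

```julia
using SHA  # stdlib; provides sha1

# Base.SHA1 is internal (not exported); relied on here as it behaves in
# recent Julia 1.x releases, not as a documented API.
h = Base.SHA1(sha1(""))          # wrap the 20-byte Vector{UInt8} digest
@show isbitstype(Base.SHA1) sizeof(Base.SHA1)

# Hex-string round trip, e.g. for hashes read from a registry TOML file
hex = string(h)
h2 = Base.SHA1(hex)

# Interpolation into a command uses the hex string form (command is not run)
cmd = `git checkout $h`

# Stored inline in a struct: the enclosing struct stays a bitstype
struct CommitInfo                # hypothetical struct for illustration
    id::Base.SHA1
    time::Int64
end

# Out-pointer pattern: a C function writes the 20 bytes straight into a Ref.
# memcpy is only a stand-in for a real C API that fills a hash argument.
out = Ref{Base.SHA1}()
src = sha1("")
ccall(:memcpy, Ptr{Cvoid}, (Ptr{Base.SHA1}, Ptr{UInt8}, Csize_t),
      out, src, sizeof(Base.SHA1))
```

Because the type is a plain 20-byte bitstype, no conversion between NTuple{20,UInt8} and Vector{UInt8} is needed at the struct or ccall boundary.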


simonbyrne commented on July 17, 2024

See GitHash object in LibGit2:
https://github.com/JuliaLang/julia/blob/972f55feedd12b6549002604fb08fde5206bfe37/stdlib/LibGit2/src/types.jl#L13-L24


inkydragon commented on July 17, 2024

A quick proof-of-concept implementation:

If we really need this, I'd like to add it to Base and have SHA, MD5, CRC32, GitHash, ... all reuse this code.

hash_obj.jl

```julia
# SPDX-License-Identifier: MIT
abstract type AbstractHash end

"""
    HashBytes{N}

A hash object identifier. It is an `N`-byte string.
"""
struct HashBytes{N} <: AbstractHash
    val::NTuple{N, UInt8}
    HashBytes(val::NTuple{N, UInt8}) where N = new{N}(val)
end

HashBytes{N}() where N = HashBytes(ntuple(i->zero(UInt8), N))
HashBytes(h::HashBytes) = h
function HashBytes{N}(u8::Vector{UInt8}) where N
    # throw rather than @assert, since assertions may be disabled
    length(u8) == N ||
        throw(ArgumentError("hash length does not match: expected $N bytes, got $(length(u8))"))
    HashBytes(ntuple(idx->u8[idx], N))
end
HashBytes(s::AbstractString) = error("not impl")


import Base.show
function show(io::IO, hash_bytes::HashBytes{N}) where N
    # repr(0xab) == "0xab"; dropping "0x" leaves two hex digits per byte
    hash = join( repr(u)[3:end] for u in hash_bytes.val )
    print(io, "HashBytes{$N}($(repr(hash)))")
end



# ==== Generate Hash Type Definitions for All SHA Types
# Examples:
#   const Sha1Hash = HashBytes{20}
#   const Sha3_512Hash = HashBytes{64}
using SHA
for (sha_prefix, sha_type) in [(:Sha1, :SHA1_CTX),
                 (:Sha224, :SHA224_CTX),
                 (:Sha256, :SHA256_CTX),
                 (:Sha384, :SHA384_CTX),
                 (:Sha512, :SHA512_CTX),
                 (:Sha2_224, :SHA2_224_CTX),
                 (:Sha2_256, :SHA2_256_CTX),
                 (:Sha2_384, :SHA2_384_CTX),
                 (:Sha2_512, :SHA2_512_CTX),
                 (:Sha3_224, :SHA3_224_CTX),
                 (:Sha3_256, :SHA3_256_CTX),
                 (:Sha3_384, :SHA3_384_CTX),
                 (:Sha3_512, :SHA3_512_CTX)]
    hashsha_type = Symbol(sha_prefix, :Hash)
    # digestlen gives the digest size in bytes, e.g. 20 for SHA1_CTX
    @eval const $(hashsha_type) = HashBytes{SHA.digestlen($sha_type)}
end


# ---- examples:
Sha1Hash(sha1(""))
Sha3_256Hash(sha3_256(""))
Sha3_512Hash(sha3_512(""))
```

Example output:

```
julia> Sha1Hash(sha1(""))
HashBytes{20}("da39a3ee5e6b4b0d3255bfef95601890afd80709")

julia> Sha3_256Hash(sha3_256(""))
HashBytes{32}("a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a")

julia> Sha3_512Hash(sha3_512(""))
HashBytes{64}("a69f73cca23a9ac5c8b567dc185a756e97c982164fe25859e0d1dcc1475c80a615b2123af1f5f94c11e3e9402c3ac558f500199d95b6d3e301758586281dcd26")
```
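In this proof of concept the aliases are keyed only on digest length, so Sha256Hash, Sha2_256Hash and Sha3_256Hash all name the same type, HashBytes{32}, and the algorithm is no longer visible to dispatch. One possible variant, sketched here as an assumption rather than a proposal from the thread, carries the SHA.jl context type as a phantom type parameter. TaggedHash and describe are hypothetical names; SHA.digestlen is used on context types the same way as in the loop above.

```julia
using SHA  # stdlib

# Hypothetical variant: C is the SHA.jl context type (a phantom tag), so
# equal-length digests from different algorithms get different types.
struct TaggedHash{C,N}
    bytes::NTuple{N,UInt8}
end

# Build from a digest vector; the expected length comes from SHA.digestlen(C)
function TaggedHash{C}(digest::Vector{UInt8}) where {C}
    N = SHA.digestlen(C)
    length(digest) == N ||
        throw(ArgumentError("expected $N bytes, got $(length(digest))"))
    return TaggedHash{C,N}(ntuple(i -> digest[i], N))
end

Base.show(io::IO, h::TaggedHash{C}) where {C} =
    print(io, "TaggedHash{", nameof(C), "}(\"", bytes2hex(collect(h.bytes)), "\")")

const Sha2_256Hash = TaggedHash{SHA.SHA2_256_CTX}
const Sha3_256Hash = TaggedHash{SHA.SHA3_256_CTX}

# Methods can now require a specific algorithm, not just a digest length
describe(::Sha2_256Hash) = "SHA2-256"
describe(::Sha3_256Hash) = "SHA3-256"

a = Sha2_256Hash(sha256(""))
b = Sha3_256Hash(sha3_256(""))
```

Both values are still 32-byte bitstypes; only the type tag differs, so a and b compare unequal and dispatch to different describe methods.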


staticfloat commented on July 17, 2024

> It would be helpful if for each hash there was an object representing a hash (e.g. SHA1, SHA256 etc), similar to UUID

Can you explain a bit more about what you want and why it would be useful? I have heard strong arguments both for hashes being objects, and for hash contexts being objects, but the hashes themselves being just arrays of bytes. I'd like to hear your argument for why it's better that they are their own objects.

If it's just for dispatch, I think a higher-level package like AbstractHashing or something similar may be a better fit for these kinds of concerns. I myself wanted something that lives at a higher level than SHA.jl (and can work with MD5 and whatnot), so I wrote this mini package to make dealing with different hashes easier. You can then constrain things to only take a certain hash type via snippets like this.


staticfloat commented on July 17, 2024

Yes, so my main point would be that we probably want an AbstractHashType that covers more than just SHA hashes, and then we have two options for implementing it:

Bottom-up: define AbstractHashType in some bare-bones package; then packages like MD5.jl can define their types as subtypes of the abstract type and get all the goodness defined in the abstract package's generic methods.

Top-down: create an AbstractHashes package that imports SHA.jl, MD5.jl, and every other hash package, then defines the shared functionality right there in terms of the things it has imported.

I think the bottom-up organization is better, but I don't think we want AbstractHashType to be tied to Julia releases as a stdlib. So perhaps the best way forward is a middle ground, where AbstractHashes.jl is meant to be a bottom-up package, but it includes functionality for SHA.jl since it knows that will always be part of your environment?
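The bottom-up option can be sketched in a few lines: a bare-bones module defines the abstract type plus generic methods written only against one required accessor, and each hash package subtypes it. Every name here (AbstractHashSketch, hashbytes, hexdigest, SHA1Digest, SHA256Digest, short) is hypothetical; SHA.jl is used only to produce example digests, and no existing package is claimed to define this interface.

```julia
using SHA  # stdlib, used only to produce example digests

# Hypothetical bare-bones package: one abstract type plus generic methods
# written only against a single required accessor, `hashbytes`.
module AbstractHashSketch
export AbstractHash, hashbytes, hexdigest

abstract type AbstractHash end

"Return the raw digest bytes of `h`; every concrete subtype must add a method."
function hashbytes end

# Generic functionality that every subtype gets for free
hexdigest(h::AbstractHash) = bytes2hex(collect(hashbytes(h)))
Base.print(io::IO, h::AbstractHash) = print(io, hexdigest(h))
end # module

using .AbstractHashSketch

# Hypothetical concrete types, as individual hash packages might define them
struct SHA1Digest <: AbstractHash
    bytes::NTuple{20,UInt8}
end
struct SHA256Digest <: AbstractHash
    bytes::NTuple{32,UInt8}
end
AbstractHashSketch.hashbytes(h::SHA1Digest) = h.bytes
AbstractHashSketch.hashbytes(h::SHA256Digest) = h.bytes

# Downstream code can accept any hash through the abstract type
short(h::AbstractHash) = hexdigest(h)[1:7]

s = SHA1Digest(Tuple(sha1("")))
d = SHA256Digest(Tuple(sha256("")))
```

A package such as MD5.jl could add its own subtype and hashbytes method in the same way, without the abstract package needing to depend on it.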


simonbyrne commented on July 17, 2024

Possibly? This is complicated somewhat by the fact that SHA1 is defined in Base: ideally that would use the same machinery, otherwise we end up with multiple implementations again.

What if we add AbstractHashType and SHA1Hash (and make Base.SHA1 an alias) in Base, adding them to Compat.jl for existing releases, and add the remaining hash objects here?


staticfloat commented on July 17, 2024

> What if we add AbstractHashType and SHA1Hash (and make Base.SHA1 an alias) in Base, adding them to Compat.jl for existing releases, and add the remaining hash objects here?

The downside to this is that it's then only available in Julia v1.12+, and if we want to change something about how hash functions work, we have to wait for a new Julia version. I think it's actually better to have an AbstractHash.jl that just implements whatever adapters are needed for the SHA that happens to be shipped with Julia, and then perhaps has package extensions for MD5 and other hash types. Truly, the only reason SHA is a stdlib is that Pkg needs to be able to hash things to verify their contents; we should not introduce more code into the stdlib if at all possible.
