Comments (7)
I think the main advantages are:
- It adds semantic information: I know the bytes correspond to the output of a specific hash. This is helpful when variables are just named commit_hash.
- It can be a bitstype, which has some performance advantages, and makes it easier for C interop.
  - it can be easily stored inline in a struct, e.g. https://github.com/JuliaLang/julia/blob/2d0aeca686581276e0da87a9d9e593db1dba5516/stdlib/LibGit2/src/types.jl#L523: otherwise, you would have to use an NTuple{20,UInt8}, and convert to/from a Vector{UInt8}.
  - it is easy to use when C functions return via pointer arguments, e.g. https://github.com/JuliaLang/julia/blob/2d0aeca686581276e0da87a9d9e593db1dba5516/stdlib/LibGit2/src/index.jl#L53-L56
- There are many cases where hashes are passed as hexadecimal strings: having dedicated hash objects makes the conversions easier, e.g.
  - the various registry packages (Registrator.jl, RegistryTools.jl, RegistryCI.jl) all represent hashes as strings, as that is how they are represented in the TOML files.
  - interfacing with git: it's helpful to be able to do the following and have it work as expected:
        hash = SHA1("...")
        run(`git checkout $hash`)
- It appears to be what people do anyway, but we end up with the same thing defined in multiple places: I didn't realize Base had an SHA1 type, but LibGit2 should use this rather than define its own GitHash type. Similarly, GitHub.jl should use this instead of String, etc.
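To make the points above concrete, here is a minimal sketch of a dedicated digest type. The names `Sha1Digest` and `CommitRecord` are hypothetical and are not the definitions used by Base or LibGit2; the sketch only assumes the standard `hex2bytes` and `bytes2hex` functions.

```julia
# Hypothetical type for illustration; not the Base or LibGit2 definition.
struct Sha1Digest
    bytes::NTuple{20, UInt8}
end

# Parse a 40-character hexadecimal string into a digest.
function Sha1Digest(s::AbstractString)
    raw = hex2bytes(s)
    length(raw) == 20 || throw(ArgumentError("expected 20 bytes, got $(length(raw))"))
    return Sha1Digest(ntuple(i -> raw[i], 20))
end

# Printing as hex means string() and command interpolation produce the hex form.
Base.print(io::IO, h::Sha1Digest) = print(io, bytes2hex(collect(h.bytes)))

# Hypothetical struct that stores the digest inline rather than via a Vector.
struct CommitRecord
    id::Sha1Digest
    parent_count::Int
end

h = Sha1Digest("da39a3ee5e6b4b0d3255bfef95601890afd80709")
cmd = `git checkout $h`   # the command is built here but not run
```

Because the only field is a tuple of bytes, `isbitstype(Sha1Digest)` holds, and so does `isbitstype(CommitRecord)`, which is the property that helps with inline storage and C interop.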
from sha.jl.
See the GitHash object in LibGit2: https://github.com/JuliaLang/julia/blob/972f55feedd12b6549002604fb08fde5206bfe37/stdlib/LibGit2/src/types.jl#L13-L24
If we really need this, I'd like to add it to Base and have SHA, MD5, CRC32, GitHash... all reuse this code.
A quick proof-of-concept implementation:
hash_obj.jl
# SPDX-License-Identifier: MIT
using SHA

abstract type AbstractHash end

"""
    HashBytes{N}

A hash object identifier, stored as an `N`-byte string.
"""
struct HashBytes{N} <: AbstractHash
    val::NTuple{N, UInt8}
    HashBytes(val::NTuple{N, UInt8}) where {N} = new{N}(val)
end

# All-zero hash of length N.
HashBytes{N}() where {N} = HashBytes(ntuple(_ -> zero(UInt8), N))
HashBytes(h::HashBytes) = h

# Build from raw digest bytes; the length must match N.
function HashBytes{N}(u8::Vector{UInt8}) where {N}
    N == length(u8) || throw(ArgumentError("hash length mismatch: expected $N bytes, got $(length(u8))"))
    return HashBytes(ntuple(idx -> u8[idx], N))
end

# Parsing from a hexadecimal string is not implemented in this proof of concept.
HashBytes(s::AbstractString) = error("not implemented")

# Show the bytes as a lowercase hexadecimal string.
function Base.show(io::IO, hash_bytes::HashBytes{N}) where {N}
    hex = bytes2hex(collect(hash_bytes.val))
    print(io, "HashBytes{$N}($(repr(hex)))")
end

# ==== Generate hash type definitions for all SHA types
# Examples:
#   const Sha1Hash = HashBytes{20}
#   const Sha3_512Hash = HashBytes{64}
for (sha_prefix, sha_type) in [(:Sha1, :SHA1_CTX),
                               (:Sha224, :SHA224_CTX),
                               (:Sha256, :SHA256_CTX),
                               (:Sha384, :SHA384_CTX),
                               (:Sha512, :SHA512_CTX),
                               (:Sha2_224, :SHA2_224_CTX),
                               (:Sha2_256, :SHA2_256_CTX),
                               (:Sha2_384, :SHA2_384_CTX),
                               (:Sha2_512, :SHA2_512_CTX),
                               (:Sha3_224, :SHA3_224_CTX),
                               (:Sha3_256, :SHA3_256_CTX),
                               (:Sha3_384, :SHA3_384_CTX),
                               (:Sha3_512, :SHA3_512_CTX)]
    hash_alias = Symbol(sha_prefix, :Hash)
    # The digest length of the context type becomes the type parameter N.
    @eval const $(hash_alias) = HashBytes{SHA.digestlen($sha_type)}
end

# ---- examples:
Sha1Hash(sha1(""))
Sha3_256Hash(sha3_256(""))
Sha3_512Hash(sha3_512(""))
Example output:
julia> Sha1Hash(sha1(""))
HashBytes{20}("da39a3ee5e6b4b0d3255bfef95601890afd80709")
julia> Sha3_256Hash(sha3_256(""))
HashBytes{32}("a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a")
julia> Sha3_512Hash(sha3_512(""))
HashBytes{64}("a69f73cca23a9ac5c8b567dc185a756e97c982164fe25859e0d1dcc1475c80a615b2123af1f5f94c11e3e9402c3ac558f500199d95b6d3e301758586281dcd26")
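The proof of concept leaves construction from a hexadecimal string unimplemented. One possible way to fill that in is sketched below, assuming the byte length can simply be taken from the length of the decoded string. `HashBytesDemo` is a simplified, hypothetical stand-in for `HashBytes` (no abstract supertype, default constructors) so that the snippet runs on its own.

```julia
# Simplified stand-in for the HashBytes type above (hypothetical name).
struct HashBytesDemo{N}
    val::NTuple{N, UInt8}
end

# Decode the hex string and use the number of decoded bytes as N.
# hex2bytes throws an ArgumentError for odd-length or non-hex input.
function HashBytesDemo(s::AbstractString)
    raw = hex2bytes(s)
    n = length(raw)
    return HashBytesDemo{n}(ntuple(i -> raw[i], n))
end

h = HashBytesDemo("da39a3ee5e6b4b0d3255bfef95601890afd80709")
```

Inferring `N` from the input makes this constructor type-unstable; a typed entry point such as `HashBytesDemo{20}(s)` that checks the length would avoid that, at the cost of one more method.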
It would be helpful if, for each hash function, there was an object representing a hash (e.g. SHA1, SHA256 etc.), similar to UUID.
Can you explain a bit more about what you want and why it would be useful? I have heard strong arguments both for hashes being objects, and for hash contexts being objects, but the hashes themselves being just arrays of bytes. I'd like to hear your argument for why it's better that they are their own objects.
If it's just for dispatch, I think a higher-level package like AbstractHashing or something similar may be a better fit for these kinds of concerns. I myself wanted something that lives at a higher level than SHA.jl (and can work with MD5 and whatnot), so I wrote this mini package to make dealing with different hashes easier. You can then constrain things to only take a certain hash type via snippets like this.
Yes, so my main point would be that we probably want an AbstractHashType that is more than just SHA hashes, and then we have two options for implementation:
- Bottom-up: define AbstractHashType in some bare-bones package, then packages like MD5.jl can define their types as inheriting from the abstract type, and get all the goodness defined in the abstract package's generic methods.
- Top-down: create an AbstractHashes package that imports SHA.jl, MD5.jl, and every other hash type, then defines the shared functionality right there in terms of the things it has imported.
I think the bottom-up organization is better, but I don't think we want AbstractHashType to be tied to Julia releases as a stdlib. So perhaps the best way forward is to have a kind of middle ground, where AbstractHashes.jl is meant to be a bottom-up package, but it includes functionality for SHA.jl since it knows that will always be a part of your environment?
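The bottom-up option could look roughly like the sketch below. All names here (`AbstractHashType`, `rawbytes`, `hexstring`, `MD5Hash`, `SHA256Hash`, `same_content`) are hypothetical and do not exist in SHA.jl, MD5.jl, or Base; the point is only to show generic methods written once against the abstract type, with concrete types supplied by other packages.

```julia
# --- what the bare-bones abstract package might contain (hypothetical) ---
abstract type AbstractHashType end

# Interface function: each concrete hash type returns its digest bytes.
function rawbytes end

# Generic functionality, written once against the abstract type.
hexstring(h::AbstractHashType) = bytes2hex(collect(rawbytes(h)))
Base.show(io::IO, h::AbstractHashType) =
    print(io, nameof(typeof(h)), "(\"", hexstring(h), "\")")

# --- what a package such as MD5.jl might add (hypothetical) ---
struct MD5Hash <: AbstractHashType
    bytes::NTuple{16, UInt8}
end
rawbytes(h::MD5Hash) = h.bytes

# --- what an adapter for SHA.jl might add (hypothetical) ---
struct SHA256Hash <: AbstractHashType
    bytes::NTuple{32, UInt8}
end
rawbytes(h::SHA256Hash) = h.bytes

# Dispatch can now restrict an API to one kind of hash.
same_content(a::SHA256Hash, b::SHA256Hash) = a == b

md5_zero = MD5Hash(ntuple(_ -> 0x00, 16))
sha_zero = SHA256Hash(ntuple(_ -> 0x00, 32))
```

With this layout, `same_content(sha_zero, md5_zero)` fails with a `MethodError` instead of silently comparing digests of different algorithms, which is the kind of constraint the dispatch argument is about.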
Possibly? This is complicated somewhat by the fact that SHA1 is defined in Base: ideally that would use the same machinery, otherwise we end up with multiple implementations again.
What if we add AbstractHashType and SHA1Hash (and make Base.SHA1 an alias) in Base, adding them to Compat.jl for existing releases, and add the remaining hash objects here?
What if we add AbstractHashType and SHA1Hash (and make Base.SHA1 an alias) in Base, adding them to Compat.jl for existing releases, and add the remaining hash objects here?
The downside to this is that it's then only available in Julia v1.12+, and if we want to change something about how hash functions work, we have to wait for a new Julia version. I think it's actually better to have an AbstractHash.jl that just implements whatever adapters are needed for the SHA that happens to be shipped with Julia, and then perhaps has package extensions for MD5 and other hash types. Truly the only reason SHA is a stdlib is that Pkg needs to be able to hash things to verify their contents; we should not introduce more code into the stdlib if at all possible.