aliastables.jl's Introduction

AliasTables

AliasTables provides the AliasTable type, an object that defines a probability distribution over 1:n for some n. Alias tables are efficient to construct and very efficient to sample from.

An alias table can be combined with a dense vector of values to create a discrete distribution over anything.
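As a minimal sketch of that combination, the indices drawn from the table can be used to index into an ordinary vector. The `outcomes` vector and its contents are made up for illustration; only `AliasTable` and `rand` come from the package and Base.

```julia
using AliasTables

# Hypothetical example data: three outcomes with relative weights 5, 10, and 1.
outcomes = ["apple", "banana", "cherry"]
at = AliasTable([5, 10, 1])

# rand(at) returns an index in 1:3; use it to look up the outcome.
one_draw = outcomes[rand(at)]

# rand(at, n) returns a vector of n indices; indexing with it gives n outcomes.
many_draws = outcomes[rand(at, 1000)]
```

Keeping the values in a separate dense vector means the same table can be reused with any element type.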

Internally, an AliasTable defines a mapping from an unsigned integer type to the sampling domain. To draw a sample according to the AliasTable's distribution, one must provide an unsigned integer chosen uniformly at random. Alternatively, one can provide a Random.AbstractRNG object, and a random unsigned integer will be generated using that RNG. The Random API (e.g. rand(at)) takes this latter approach.
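The two ways of sampling might look like the following sketch. It assumes the default table uses UInt64 as its integer type (as the printed weights in the session below suggest) and uses `AliasTables.sample`, which also appears in the UInt16 example further down. The seed and variable names are arbitrary.

```julia
using AliasTables
using Random

at = AliasTable([5, 10, 1])

# Supply the random bits yourself: the table maps a UInt64 to an index in 1:3.
x = rand(UInt64)
i = AliasTables.sample(x, at)

# Or pass an RNG and let the Random API generate the integer internally.
rng = MersenneTwister(42)
j = rand(rng, at)
```

Because the first form is a pure function of `x`, feeding it the same integer always yields the same index.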

julia> using AliasTables

julia> at = AliasTable([5,10,1])
AliasTable([0x5000000000000000, 0xa000000000000000, 0x1000000000000000])

julia> rand(at, 10)
10-element Vector{Int64}:
 2
 1
 2
 2
 2
 2
 1
 1
 3
 2

julia> using Chairmarks

julia> @b at rand
2.898 ns

julia> @b rand(UInt)
2.738 ns

julia> @b rand(1000) AliasTable
9.167 μs (2 allocs: 16.031 KiB)

julia> @b AliasTable(rand(1000)) rand(_, 1000)
1.506 μs (3 allocs: 7.875 KiB)

julia> @b AliasTable(rand(1000)), rand(1000) AliasTables.set_weights!(_...)
8.427 μs

julia> using StatsBase

julia> at = AliasTable{UInt16}([5,10,1])
AliasTable{UInt16}([0x5000, 0xa000, 0x1000])

julia> countmap(AliasTables.sample(x, at) for x in typemin(UInt16):typemax(UInt16))
Dict{Any, Int64} with 3 entries:
  2 => 40960
  3 => 4096
  1 => 20480
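
The counts in the last example can be checked by hand: a UInt16 table has 65536 possible inputs, and each index should receive a share of them proportional to its weight. A small sketch of that arithmetic (variable names are illustrative):

```julia
weights = [5, 10, 1]

# Number of distinct UInt16 inputs: 0x0000 through 0xffff.
n_inputs = Int(typemax(UInt16)) + 1

# Each index i should be hit weights[i] / sum(weights) of the time.
expected = weights .* n_inputs .÷ sum(weights)
# expected == [20480, 40960, 4096], matching the countmap output above
```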

aliastables.jl's People

Contributors

dependabot[bot], lilithhafner

aliastables.jl's Issues

Bad & inconsistent error message when length of weights is greater than typemax(T)

julia> AliasTable{UInt8}(vcat(fill(0x00, 2^8), 0x80, 0x80))
ERROR: ArgumentError: sum(weights) is too high
Stacktrace:
 [1] _alias_table(::Type{UInt8}, ::Type{Int64}, weights0::Vector{UInt8})
   @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:231
 [2] AliasTable{UInt8, Int64}(weights::Vector{UInt8}; _normalize::Bool)
   @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:85
 [3] AliasTable
   @ ~/.julia/dev/AliasTables/src/AliasTables.jl:78 [inlined]
 [4] (AliasTable{UInt8})(weights::Vector{UInt8})
   @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:77
 [5] top-level scope
   @ REPL[16]:1

julia> AliasTable{UInt8}(vcat(0x80, 0x80, fill(0x00, 2^8)))
AliasTable{UInt8}([0x80, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])

AliasTables 1.0.0 (cb7b2d6)
Julia Version 1.11.0-beta1 (though this error should manifest on all Julia versions)

This happens because points_per_cell rounds down to zero. We can special-case long (necessarily sparse) weight vectors by setting all accept probabilities to one pip (100%), which gives what is effectively a lookup table, sample(x::T) = table[x]::I, but with a bit of overhead because the types don't tell us that we're in this case.

It's somewhat unreasonable to expect an alias table to perform adequately in this extreme case, but we can support it at no extra cost (aside from some additional complexity isolated in a separate runtime branch), so we should.
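
A rough standalone sketch of the lookup-table idea, not the package's implementation: `build_lookup` and `lookup_sample` are hypothetical names. It assumes the weights are already normalized to sum to typemax(T) + 1 and that T is small enough for a table with one slot per input to fit in memory.

```julia
# Hypothetical helper, not part of AliasTables: expand normalized integer
# weights into a table with one slot per possible value of T.
function build_lookup(::Type{T}, weights::AbstractVector{<:Integer}) where {T<:Unsigned}
    n_slots = Int(typemax(T)) + 1          # only sensible for small T such as UInt8
    sum(Int, weights) == n_slots ||
        throw(ArgumentError("weights must sum to $n_slots"))
    table = Vector{Int}(undef, n_slots)
    slot = 1
    for (i, w) in enumerate(weights)
        for _ in 1:Int(w)                  # index i occupies w consecutive slots
            table[slot] = i
            slot += 1
        end
    end
    return table
end

# x ranges over 0:typemax(T), so shift by one for 1-based indexing.
lookup_sample(x::Unsigned, table) = table[Int(x) + 1]

weights = vcat(fill(0x00, 2^8), 0x80, 0x80)   # the failing input from this issue
table = build_lookup(UInt8, weights)
```

With this input, half of the 256 slots map to index 257 and the other half to index 258, so every random UInt8 selects one of the two nonzero weights directly.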

`AliasTable([0x0ffffffffffffffff000000000000000, 0x0ffffffffffffffff000000000000000])` throws

julia> AliasTable{UInt8}([0x0ffffffffffffffff000000000000000, 0x0ffffffffffffffff000000000000000])
ERROR: BoundsError: attempt to access 2-element Vector{UInt8} at index [242]
Stacktrace:
 [1] throw_boundserror(A::Vector{UInt8}, I::Tuple{Int64})
   @ Base ./essentials.jl:14
 [2] getindex
   @ ./essentials.jl:891 [inlined]
 [3] normalize_to_uint(::Type{UInt8}, v::Vector{UInt128}, sm::UInt128)
   @ AliasTables ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:488
 [4] AliasTable{UInt8, Int64}(weights::Vector{UInt128}; _normalize::Bool)
   @ AliasTables ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:87
 [5] AliasTable
   @ ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:78 [inlined]
 [6] (AliasTable{UInt8})(weights::Vector{UInt128})
   @ AliasTables ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:77
 [7] top-level scope
   @ REPL[6]:1

Downstream errors on 32bit

Downstream tests in Turing on 32bit recently started to fail due to AliasTables (or its integration in Distributions): https://github.com/TuringLang/Turing.jl/actions/runs/8868873134/job/24348938461#step:6:588

  ArgumentError: Lookup table longer than length(probablity_alias)
  Stacktrace:
    [1] _lookup_alias_table!(probability_alias::Vector{Tuple{UInt64, Int32}}, weights::AliasTables.MallocArrays.MallocArray{UInt64, 1}, mtz::Int32)
      @ AliasTables ~/.julia/packages/AliasTables/wd9Qk/src/AliasTables.jl:182
    [2] _alias_table!(probability_alias::Vector{Tuple{UInt64, Int32}}, weights::AliasTables.MallocArrays.MallocArray{UInt64, 1})
      @ AliasTables ~/.julia/packages/AliasTables/wd9Qk/src/AliasTables.jl:248
    [3] set_weights!(at::AliasTables.AliasTable{UInt64, Int32}, weights::Vector{Float64}; _normalize::Bool)
      @ AliasTables ~/.julia/packages/AliasTables/wd9Qk/src/AliasTables.jl:159
    [4] _
      @ ~/.julia/packages/AliasTables/wd9Qk/src/AliasTables.jl:115 [inlined]
    [5] #AliasTable#3
      @ ~/.julia/packages/AliasTables/wd9Qk/src/AliasTables.jl:119 [inlined]
    [6] AliasTable
      @ ~/.julia/packages/AliasTables/wd9Qk/src/AliasTables.jl:119 [inlined]
    [7] AliasTable
      @ ~/.julia/packages/Distributions/fgrZq/src/samplers/aliastable.jl:3 [inlined]
    [8] sampler
      @ ~/.julia/packages/Distributions/fgrZq/src/univariate/discrete/categorical.jl:118 [inlined]
    [9] rand(rng::StableRNGs.LehmerRNG, s::Categorical{Float64, Vector{Float64}}, dims::Tuple{Int32})
      @ Distributions ~/.julia/packages/Distributions/fgrZq/src/genericrand.jl:35
   [10] rand(::StableRNGs.LehmerRNG, ::Categorical{Float64, Vector{Float64}}, ::Int32)
      @ Distributions ~/.julia/packages/Distributions/fgrZq/src/genericrand.jl:24
...

My initial guess is that some part of AliasTables implicitly assumes that Int = Int64.
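
For context, a small sketch of how Int differs between platforms, using only Base. It illustrates the kind of assumption in question and is not a diagnosis of the actual bug:

```julia
# Int is an alias for the platform's native signed integer type.
native_int = Sys.WORD_SIZE == 64 ? Int64 : Int32
int_is_native = Int === native_int            # true on 32-bit and 64-bit Julia

# Code that hard-codes Int64 where Int is meant only agrees on 64-bit systems.
same_as_int64 = typemax(Int) == typemax(Int64)  # true on 64-bit, false on 32-bit

# Lengths and default integer literals are Int, so they are Int32 on 32-bit.
len_type = typeof(length([1, 2, 3]))
```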

AliasTable(UInt32[0x60000000, 0x40000000, 0x60000000]) fails on 32-bit

Broken on v1.0.0, v1.1.0, and main. Found while working on #52.

julia> AliasTable(UInt32[0x60000000, 0x40000000, 0x60000000])
ERROR: DivideError: integer division error
Stacktrace:
  [1] divrem(x::UInt128, y::UInt128)
    @ Base ./int.jl:828
  [2] div
    @ ./int.jl:867 [inlined]
  [3] div
    @ ./div.jl:252 [inlined]
  [4] div
    @ ./div.jl:37 [inlined]
  [5] normalize_to_uint!(res::AliasTables.MallocArrays.MallocArray{UInt64, 1}, v::Vector{UInt32}, sm::UInt32)
    @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:631
  [6] set_weights!(at::AliasTable{UInt64, Int32}, weights::Vector{UInt32}; _normalize::Bool)
    @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:158
  [7] _
    @ ~/.julia/dev/AliasTables/src/AliasTables.jl:115 [inlined]
  [8] #AliasTable#1
    @ ~/.julia/dev/AliasTables/src/AliasTables.jl:76 [inlined]
  [9] AliasTable(weights::Vector{UInt32})
    @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:119
 [10] top-level scope
    @ REPL[21]:1

Unsigned output type fails

julia> AliasTable{UInt, UInt}([1,2,3])
ERROR: InexactError: convert(UInt64, -2)
Stacktrace:
  [1] throw_inexacterror(::Symbol, ::Vararg{Any})
    @ Core ./boot.jl:748
  [2] check_sign_bit
    @ ./boot.jl:754 [inlined]
  [3] toUInt64
    @ ./boot.jl:865 [inlined]
  [4] UInt64
    @ ./boot.jl:895 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cvt1
    @ ./essentials.jl:587 [inlined]
  [7] ntuple
    @ ./ntuple.jl:49 [inlined]
  [8] convert
    @ ./essentials.jl:589 [inlined]
  [9] setindex!
    @ ./genericmemory.jl:211 [inlined]
 [10] _alias_table!(probability_alias::Memory{Tuple{UInt64, UInt64}}, weights::AliasTables.MallocArrays.MallocArray{UInt64, 1})
    @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:312
 [11] set_weights!(at::AliasTable{UInt64, UInt64}, weights::Vector{Int64}; _normalize::Bool)
    @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:159
 [12] set_weights!
    @ ~/.julia/dev/AliasTables/src/AliasTables.jl:145 [inlined]
 [13] _
    @ ~/.julia/dev/AliasTables/src/AliasTables.jl:115 [inlined]
 [14] AliasTable{UInt64, UInt64}(weights::Vector{Int64})
    @ AliasTables ~/.julia/dev/AliasTables/src/AliasTables.jl:110
 [15] top-level scope
    @ REPL[18]:1

`AliasTable{UInt8}(fill(1.0, 1000))` throws

julia> AliasTable{UInt8}(fill(1, 1000))
AliasTable{UInt8}([0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])

julia> AliasTable{UInt8}(fill(1.0, 1000))
ERROR: ArgumentError: all weights are zero
Stacktrace:
 [1] get_only_nonzero
   @ ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:124 [inlined]
 [2] normalize_to_uint(::Type{UInt8}, v::Vector{Float64}, sm::Float64)
   @ AliasTables ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:465
 [3] AliasTable{UInt8, Int64}(weights::Vector{Float64}; _normalize::Bool)
   @ AliasTables ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:87
 [4] AliasTable
   @ ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:78 [inlined]
 [5] (AliasTable{UInt8})(weights::Vector{Float64})
   @ AliasTables ~/.julia/packages/AliasTables/yt2Qj/src/AliasTables.jl:77
 [6] top-level scope
   @ REPL[5]:1

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
