
ARDESPOT.jl's People

Contributors

autonomobil, bkraske, dylan-asmar, himanshugupta1009, juliatagbot, lassepe, neroblackstone, rejuvyesh, zsunberg

ARDESPOT.jl's Issues

Could you provide documents or reference code on how to establish bound?

Sorry to bother you with this request.
I'm new to programming, and when using the ARDESPOT package I couldn't figure out how to set up bounds; I also couldn't find an explanation in "bound.jl", which left me confused. Could you provide documentation or reference code showing how to establish bounds?
Thank you very much for making this project!
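
For reference, a minimal sketch of how bounds are typically passed to DESPOTSolver in the POMDPs.jl ecosystem. The specific numeric values and the rollout policy below are illustrative assumptions, not prescribed by this issue:

using POMDPs, POMDPModels, POMDPPolicies, ARDESPOT

pomdp = TigerPOMDP()

# Simplest option: a fixed (lower, upper) tuple on the discounted return.
solver = DESPOTSolver(bounds=(-20.0, 0.0))

# Alternatively, independent bounds: a rollout policy for the lower bound and a
# constant upper bound (0.0 is assumed here; in general the constant must
# upper-bound the optimal value for the problem).
solver = DESPOTSolver(bounds=IndependentBounds(
    DefaultPolicyLB(FunctionPolicy(b -> first(actions(pomdp)))),
    0.0))

planner = solve(solver, pomdp)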

Infinite bounds

Right now, infinite bounds produce errors. I added a warning to notify users, but there may be a way to support them.

Use OrderedDict

Propose using ordered dictionaries for reproducibility/seed fixing (see branch: ordered_dictionaries).

Both expand! and branching_sim iterate over a dictionary (odict), which may not have a consistent iteration order (see "Why don't Julia dictionaries preserve order?"). As I understand it, the seeding scheme depends on these dictionaries having a consistent order. With unordered dictionaries, there are cases where lower-bound values differ under the same seed because the scenarios are evaluated in a different order. Using ordered dictionaries fixes this, since the evaluation order becomes consistent (a small sketch of the difference is included below).

Tests are currently passing, but I haven't compared benchmark times yet.
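
A small sketch of the ordering difference being addressed, assuming OrderedCollections as the package providing OrderedDict:

using OrderedCollections

d = Dict{Int,Symbol}()         # iteration order depends on the hash-table layout
od = OrderedDict{Int,Symbol}() # iteration order is insertion order

for (k, v) in [(3, :a), (1, :b), (2, :c)]
    d[k] = v
    od[k] = v
end

collect(keys(od))  # always [3, 1, 2]
collect(keys(d))   # some hash-dependent order that is not guaranteed to be stable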

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Error when using DESPOTSolver with explicit Tiger POMDP

When trying to use the DESPOTSolver with the explicit Tiger POMDP defined here, the following error occurs:

[No requirements specified]
ERROR: LoadError: MethodError: no method matching gen_rand!(::MemorizingRNG{MemorizingSource{Random._GLOBAL_RNG}}, ::Int64)
Closest candidates are:
  gen_rand!(!Matched::MemorizingRNG{Random.MersenneTwister}, ::Integer) at /home/cremer/.julia/packages/ARDESPOT/5VlKW/src/memorizing_rng.jl:36
  gen_rand!(!Matched::MemorizingRNG{MemorizingSource{Random.MersenneTwister}}, ::Integer) at /home/cremer/.julia/packages/ARDESPOT/5VlKW/src/random_2.jl:59
Stacktrace:

I guess it could be a problem with Julia 1.3 and Random.MersenneTwister.

Code:

using POMDPs
using POMDPModelTools
using ARDESPOT
using POMCPOW
using POMDPSimulators
using POMDPPolicies

struct TigerPOMDP <: POMDP{Bool, Symbol, Bool} # POMDP{State, Action, Observation}
    r_listen::Float64 # reward for listening (default -1)
    r_findtiger::Float64 # reward for finding the tiger (default -100)
    r_escapetiger::Float64 # reward for escaping (default 10)
    p_listen_correctly::Float64 # prob of correctly listening (default 0.85)
    discount_factor::Float64 # discount
end

TigerPOMDP() = TigerPOMDP(-1., -100., 10., 0.85, 0.95)

###### STATE SPACE
POMDPs.states(pomdp::TigerPOMDP) = [true, false]
POMDPs.stateindex(pomdp::TigerPOMDP, s::Bool) = s ? 1 : 2 ;


###### ACTION SPACE
POMDPs.actions(pomdp::TigerPOMDP) = [:open_left, :open_right, :listen]
function POMDPs.actionindex(pomdp::TigerPOMDP, a::Symbol)
    if a==:open_left
        return 1
    elseif a==:open_right
        return 2
    elseif a==:listen
        return 3
    end
    error("invalid TigerPOMDP action: $a")
end;

###### TRANSITION FUNCTION
function POMDPs.transition(pomdp::TigerPOMDP, s::Bool, a::Symbol)
    if a == :open_left || a == :open_right
        # problem resets
        return BoolDistribution(0.5) 
    elseif s
        # tiger on the left stays on the left 
        return BoolDistribution(1.0)
    else
        return BoolDistribution(0.0)
    end
end


###### OBSERVATION SPACE
POMDPs.observations(pomdp::TigerPOMDP) = [true, false]
POMDPs.obsindex(pomdp::TigerPOMDP, o::Bool) = o+1

###### OBSERVATION FUNCTION
function POMDPs.observation(pomdp::TigerPOMDP, a::Symbol, s::Bool)
    pc = pomdp.p_listen_correctly
    if a == :listen 
        if s 
            return BoolDistribution(pc)
        else
            return BoolDistribution(1 - pc)
        end
    else
        return BoolDistribution(0.5)
    end
end

###### REWARD FUNCTION
function POMDPs.reward(pomdp::TigerPOMDP, s::Bool, a::Symbol)
    r = 0.0
    if a == :listen
        r+=pomdp.r_listen
    elseif a == :open_left
        s ? (r += pomdp.r_findtiger) : (r += pomdp.r_escapetiger)
    elseif a == :open_right
        s ? (r += pomdp.r_escapetiger) : (r += pomdp.r_findtiger)
    end
    return r
end

# init
POMDPs.initialstate_distribution(pomdp::TigerPOMDP) = BoolDistribution(0.5)

POMDPs.discount(pomdp::TigerPOMDP) = pomdp.discount_factor

myPOMDP = TigerPOMDP()

solver = DESPOTSolver(criterion=MaxUCB(20.0)) 
policy = solve(solver, myPOMDP) # compute a pomdp policy

for (s, a, r) in stepthrough(myPOMDP, policy, "s,a,r", max_steps=10)
    @show s
    @show a
    @show r
    println()
end
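
A possible workaround (an assumption on my part, not a confirmed fix) is to pass an explicit MersenneTwister to the solver so that its internal MemorizingSource is not built on Random._GLOBAL_RNG, which is the type the missing gen_rand! method complains about. The bounds value is a placeholder and the rng keyword is assumed to exist on DESPOTSolver:

using Random

solver = DESPOTSolver(bounds=(-20.0, 0.0), rng=MersenneTwister(1))  # placeholder bounds; rng keyword assumed
policy = solve(solver, myPOMDP)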

Using Lower Bound Policy as Default Action

In cases where DefaultPolicyLB() is used for the lower bound and a default action needs to be specified because the bounds are equal, it could be helpful to automatically use the lower bound policy as the default action.

`ScenarioBelief` is missing `weight_sum` implementation

weight_sum is not defined on ScenarioBelief. I noticed this when computing mode(b::ScenarioBelief). Maybe there should be a corresponding default weight_sum(b::AbstractParticleBelief) = sum(weights(b))?

Error message:

ERROR: MethodError: no method matching weight_sum(::ScenarioBelief{HSState{HumanBoltzmannBState},ParticleCollection{HSState{HumanBoltzmannBState}},MemorizingSource{MersenneTwister}})
Closest candidates are:
  weight_sum(::ParticleCollection) at /home/lassepe/.julia/packages/ParticleFilters/LfMDC/src/beliefs.jl:99
  weight_sum(::WeightedParticleBelief) at /home/lassepe/.julia/packages/ParticleFilters/LfMDC/src/beliefs.jl:124
  weight_sum(::SharedExternalStateBelief) at /home/lassepe/worktree/pomdp_research/HumanSwitching.jl/src/particle_filter.jl:58
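
A minimal sketch of the kind of fallback proposed above. Since the scenarios in a ScenarioBelief are unweighted, the total weight is assumed here to simply be the number of scenarios (and particles(::ScenarioBelief) is assumed to be available, as for other particle beliefs); this is an illustration, not the package's implementation:

using ARDESPOT, ParticleFilters

# Generic fallback suggested in the issue text:
# ParticleFilters.weight_sum(b::AbstractParticleBelief) = sum(weights(b))

# Specific method for ScenarioBelief, assuming equally weighted scenarios:
ParticleFilters.weight_sum(b::ScenarioBelief) = length(particles(b))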

Custom rand implementation required

I get the following stack trace when I try to call ARDespot:

ERROR: LoadError: MethodError: no method matching rand(::ARDESPOT.MemorizingRNG{ARDESPOT.MemorizingSource{MersenneTwister}}, ::Type{Float64})
Closest candidates are:
  rand(::AbstractRNG, ::Type, !Matched::Tuple{Vararg{Int64,N}} where N) at random.jl:371
  rand(::AbstractRNG, ::Type, !Matched::Integer, !Matched::Integer...) at random.jl:372
  rand(!Matched::Union{MersenneTwister, RandomDevice}, ::Type{Float64}) at random.jl:304
  ...
Stacktrace:
 [1] (::##3#8{Array{Int64,1}})(::BitArray{2}, ::Array{Int64,1}, ::ARDESPOT.MemorizingRNG{ARDESPOT.MemorizingSource{MersenneTwister}}) at /Users/shushmanchoudhury/Documents/Courses/DMU-autumn17/CS238-Project/juliaCode/Sensors.jl:92
 [2] generate_sor(::UAVpomdp, ::State, ::Int64, ::ARDESPOT.MemorizingRNG{ARDESPOT.MemorizingSource{MersenneTwister}}) at /Users/shushmanchoudhury/Documents/Courses/DMU-autumn17/CS238-Project/juliaCode/UAVpomdp.jl:524
 [3] expand!(::ARDESPOT.DESPOT{State,Int64,Observation}, ::Int64, ::ARDESPOT.DESPOTPlanner{UAVpomdp,Tuple{Float64,Float64},ARDESPOT.MemorizingSource{MersenneTwister},MersenneTwister}) at /Users/shushmanchoudhury/.julia/v0.6/ARDESPOT/src/tree.jl:66
 [4] explore!(::ARDESPOT.DESPOT{State,Int64,Observation}, ::Int64, ::ARDESPOT.DESPOTPlanner{UAVpomdp,Tuple{Float64,Float64},ARDESPOT.MemorizingSource{MersenneTwister},MersenneTwister}) at /Users/shushmanchoudhury/.julia/v0.6/ARDESPOT/src/planner.jl:24
 [5] build_despot(::ARDESPOT.DESPOTPlanner{UAVpomdp,Tuple{Float64,Float64},ARDESPOT.MemorizingSource{MersenneTwister},MersenneTwister}, ::BeliefState) at /Users/shushmanchoudhury/.julia/v0.6/ARDESPOT/src/planner.jl:10
 [6] action(::ARDESPOT.DESPOTPlanner{UAVpomdp,Tuple{Float64,Float64},ARDESPOT.MemorizingSource{MersenneTwister},MersenneTwister}, ::BeliefState) at /Users/shushmanchoudhury/.julia/v0.6/ARDESPOT/src/pomdps_glue.jl:7
 [7] run_iteration_pomdp(::SimulatorState, ::Array{Sensor,1}, ::Array{Float64,1}, ::Int64, ::Bool) at /Users/shushmanchoudhury/Documents/Courses/DMU-autumn17/CS238-Project/juliaCode/RunPomdp.jl:43
 [8] macro expansion at ./util.jl:237 [inlined]
 [9] run_trials(::SimulatorState, ::Array{Sensor,1}, ::Array{Float64,1}, ::Int64, ::Bool, ::Int64) at /Users/shushmanchoudhury/Documents/Courses/DMU-autumn17/CS238-Project/juliaCode/RunPomdp.jl:125
 [10] include_from_node1(::String) at ./loading.jl:569
 [11] include(::String) at ./sysimg.jl:14
 [12] process_options(::Base.JLOptions) at ./client.jl:305
 [13] _start() at ./client.jl:371

I see that generate_sor expects a specific kind of RNG, but I thought the idea was just to use rng::AbstractRNG everywhere? What is the MemorizingRNG about? Our generate_sor is here: https://github.com/Shushman/CS238-Project/blob/DESPOT_messing/juliaCode/UAVpomdp.jl#L476
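
One possible workaround (an assumption, based on the later note that the most common rand usages on MemorizingRNG work) is to restrict the model code to plain rand(rng) draws rather than typed calls like rand(rng, Float64). A hypothetical helper in that style:

# Hypothetical sensor-noise helper written against a bare AbstractRNG,
# using only a single untyped uniform draw:
function noisy_measurement(truth::Bool, error_prob::Float64, rng::AbstractRNG)
    u = rand(rng)  # uniform draw in [0, 1)
    return u < error_prob ? !truth : truth
end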

Planner failed to choose an action

When I try to run this program, I receive the following error message:

ERROR: LoadError: Planner failed to choose an action because the following exception was thrown:
The lower and upper bounds for the root belief were both 99.59999999999911, so no tree was created.

Use the default_action solver parameter to specify behavior for this case.


To specify an action for this case, use the default_action solver parameter.

Stacktrace:
 [1] action_info(::DESPOTPlanner{Speed_Planner_POMDP,IndependentBounds{DefaultPolicyLB{FunctionPolicy{var"#13#15"},Int64,ARDESPOT.var"#19#21"},typeof(golf_cart_upper_bound)},MemorizingSource{MersenneTwister},MersenneTwister}, ::SparseCat{Array{Any,1},Array{Float64,1}}) at /home/jkwwwwow/.julia/packages/ARDESPOT/thdGA/src/pomdps_glue.jl:17
 [2] action(::DESPOTPlanner{Speed_Planner_POMDP,IndependentBounds{DefaultPolicyLB{FunctionPolicy{var"#13#15"},Int64,ARDESPOT.var"#19#21"},typeof(golf_cart_upper_bound)},MemorizingSource{MersenneTwister},MersenneTwister}, ::SparseCat{Array{Any,1},Array{Float64,1}}) at /home/jkwwwwow/.julia/packages/ARDESPOT/thdGA/src/pomdps_glue.jl:38
 [3] get_best_possible_action(::Array{Int64,1}, ::Array{Int64,1}, ::Array{Int64,1}, ::Array{Int64,1}, ::Array{Float64,1}, ::Array{Int64,1}) at /home/jkwwwwow/autonomous_golf_cart/pomdp_python_integration/speed_planner_updated.jl:426
 [4] top-level scope at /home/jkwwwwow/autonomous_golf_cart/pomdp_python_integration/speed_planner_updated.jl:443
 [5] include(::Module, ::String) at ./Base.jl:377
 [6] exec_options(::Base.JLOptions) at ./client.jl:288
 [7] _start() at ./client.jl:484
in expression starting at /home/jkwwwwow/autonomous_golf_cart/pomdp_python_integration/speed_planner_updated.jl:443

I'm not sure whether this is a problem with ARDESPOT or a bug in my program.
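
For reference, a sketch of the workaround the error message points to. The bound arguments mirror the names visible in the stack trace, and :maintain_speed is a made-up placeholder action:

# lb_policy and golf_cart_upper_bound stand in for the bound arguments shown in the trace above.
solver = DESPOTSolver(bounds=IndependentBounds(DefaultPolicyLB(lb_policy), golf_cart_upper_bound),
                      default_action=:maintain_speed)  # returned whenever no tree can be built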

Error when trying the example

I tried to use the example:

using POMDPs, POMDPModels, POMDPSimulators, ARDESPOT

pomdp = TigerPOMDP()

solver = DESPOTSolver(bounds=(-20.0, 0.0))
planner = solve(solver, pomdp)

for (s, a, o) in stepthrough(pomdp, planner, "sao", max_steps=10)
    println("State was $s,")
    println("action $a was taken,")
    println("and observation $o was received.\n")
end

But I get this error message:

┌ Warning: uncrecognized symbol sao in step iteration specification sao.
└ @ POMDPSimulators ~/.julia/packages/POMDPSimulators/nMXAP/src/stepthrough.jl:145
ERROR: LoadError: type NamedTuple has no field sao
Stacktrace:
 [1] getindex at ./namedtuple.jl:107 [inlined]
 [2] macro expansion at /home/mo/.julia/packages/POMDPSimulators/nMXAP/src/stepthrough.jl:109 [inlined]
 [3] out_tuple at /home/mo/.julia/packages/POMDPSimulators/nMXAP/src/stepthrough.jl:103 [inlined]
 [4] iterate(::POMDPSimulators.POMDPSimIterator{:sao,TigerPOMDP,DESPOTPlanner{TigerPOMDP,Tuple{Float64,Float64},MemorizingSource{Random.MersenneTwister},Random.MersenneTwister},ParticleFilters.BasicParticleFilter{TigerPOMDP,TigerPOMDP,ParticleFilters.LowVarianceResampler,Random.MersenneTwister,Array{Bool,1}},Random._GLOBAL_RNG,ParticleFilters.ParticleCollection{Bool},Bool}, ::Tuple{Int64,Bool,ParticleFilters.ParticleCollection{Bool}}) at /home/mo/.julia/packages/POMDPSimulators/nMXAP/src/stepthrough.jl:98
 [5] iterate(::POMDPSimulators.POMDPSimIterator{:sao,TigerPOMDP,DESPOTPlanner{TigerPOMDP,Tuple{Float64,Float64},MemorizingSource{Random.MersenneTwister},Random.MersenneTwister},ParticleFilters.BasicParticleFilter{TigerPOMDP,TigerPOMDP,ParticleFilters.LowVarianceResampler,Random.MersenneTwister,Array{Bool,1}},Random._GLOBAL_RNG,ParticleFilters.ParticleCollection{Bool},Bool}) at /home/mo/.julia/packages/POMDPSimulators/nMXAP/src/stepthrough.jl:86
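
A likely workaround, assuming the installed POMDPSimulators version only parses comma-separated specs (as the warning suggests), is to write the iteration specification the way the other examples in these issues do:

for (s, a, o) in stepthrough(pomdp, planner, "s,a,o", max_steps=10)
    println("State was $s, action $a was taken, and observation $o was received.")
end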

rand(::MemorizingRNG, ::Type{Int}) doesn't work

Some methods of rand don't work for the MemorizingRNG. This is because I didn't take the time to reverse-engineer all of the MersenneTwister code that I needed to.

It's not clear how crucial this is, because it seems like people would rarely want to use rand(..., Int). The most common usages I could think of (see test/memorizing_rng.jl) seem to work fine, so maybe this is not a big issue.
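
If an integer draw is ever needed, one hedged workaround (an assumption, not part of the package) is to derive it from the uniform draws that do work:

# Draw an integer in 1:n from a single uniform draw in [0, 1).
rand_int(rng::AbstractRNG, n::Integer) = 1 + floor(Int, rand(rng) * n)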
