juliareinforcementlearning / reinforcementlearningcore.jl

License: MIT License

Julia 100.00%

reinforcementlearningcore.jl's People

findmyway aterenin felixchalumeau rbange sid-bhatia-0 ilancoulon sriram13m pilgrimygy takasho777 leizhougetbetter jinraekim

reinforcementlearningcore.jl's Issues

Type definition in update function of CircularArrayBuffer structure

Hi!

I have a question about the update! function of CircularArrayBuffer. The docstring mentions AbstractArray{T}, whereas the method itself declares the data parameter as a plain AbstractArray.

So I am not sure which of the two is the mistake.

The code is here: https://github.com/JuliaReinforcementLearning/ReinforcementLearningCore.jl/blob/master/src/utils/circular_array_buffer.jl#L161

What do you think?

Ambiguity in update! for CircularArrayBuffer

A simple example reproducing the issue:

julia>sample_data = [1, 2, 3, 4]
4-element Array{Int64,1}:
 1
 2
 3
 4

julia>cb = CircularArrayBuffer{Int}(size(sample_data)...)
0-element CircularArrayBuffer{Int64,1}

julia>push!(cb, sample_data)
ERROR: MethodError: update!(::CircularArrayBuffer{Int64,1}, ::Array{Int64,1}) is ambiguous. Candidates:
  update!(cb::CircularArrayBuffer{T,N}, data::AbstractArray) where {T, N} in ReinforcementLearningCore at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:166
  update!(cb::CircularArrayBuffer{T,1}, data) where T in ReinforcementLearningCore at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:171
Possible fix, define
  update!(::CircularArrayBuffer{T,1}, ::AbstractArray) where T
Stacktrace:
 [1] push!(::CircularArrayBuffer{Int64,1}, ::Array{Int64,1}) at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:178
 [2] top-level scope at REPL[37]:1
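
The ambiguity can be reproduced outside the package with a minimal sketch. `ToyBuffer` and the return symbols below are hypothetical stand-ins, not the package's code; only the method signatures mirror the two candidates and the "Possible fix" from the error message above.

```julia
# Hypothetical stand-in for CircularArrayBuffer; only the type parameters matter here.
struct ToyBuffer{T,N} end

# Same signature shapes as the two candidates listed in the error message.
update!(::ToyBuffer{T,N}, data::AbstractArray) where {T,N} = :nd_array
update!(::ToyBuffer{T,1}, data) where {T} = :element

cb = ToyBuffer{Int,1}()

# Both methods match (ToyBuffer{Int,1}, Vector{Int}) and neither is more specific,
# so the call throws an ambiguity MethodError.
was_ambiguous = try
    update!(cb, [1, 2, 3, 4])
    false
catch err
    err isa MethodError
end

# The method suggested under "Possible fix" is more specific than both candidates.
update!(::ToyBuffer{T,1}, data::AbstractArray) where {T} = :vector_batch

resolved = update!(cb, [1, 2, 3, 4])
```

Whether the extra method should push the array element by element or treat it as one entry is a design decision for the package; the sketch only shows that adding it removes the ambiguity.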

Prefetch data from trajectory for learner

According to my test of DQNLearner, sending a non-contiguous batch of states of size (84,84,4,1) to the GPU takes about 2ms, and that doubles when we also send the next states. Meanwhile, calculating the gradients takes about 8ms. Prefetching the data asynchronously could cut the time per update considerably (by about 1/3).

Or try https://github.com/oxinabox/AutoPreallocation.jl?
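
One way to picture the asynchronous idea is a bounded `Channel` filled by a background task, so the next batch is prepared while the current one is consumed. This is only a sketch: `prefetch` and `make_batch` are hypothetical names, the batch is a plain CPU array, and real overlap with a GPU copy depends on the number of threads and on how the transfer is issued.

```julia
# Hypothetical helper: produce batches on a background task so the consumer
# can work on one batch while the next one is being prepared.
function prefetch(make_batch, n_batches; buffer_size = 2)
    Channel{Any}(buffer_size; spawn = true) do ch
        for i in 1:n_batches
            put!(ch, make_batch(i))  # stand-in for sampling + host-to-device copy
        end
    end
end

# Stand-in for "sample a batch from the trajectory and send it to the device".
make_batch(i) = fill(Float32(i), 4)

# Stand-in for the gradient step; the channel is closed once the producer finishes,
# which ends the iteration.
losses = [sum(batch) for batch in prefetch(make_batch, 3)]
```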

Is it possible to have multiple progress meters for ComposedStopCondition?

Is it possible to have multiple progress meters in the case of ComposedStopCondition?
I want to use both StopAfterStep and StopAfterEpisode and stop at whichever occurs earlier, but the progress meter only shows the step-wise progress. Is it possible to also display episode-wise progress alongside it?

Remove `extract_experience`

ref: JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl#17

Why is there no AbstractStopCondition type?

Hi,
I was just wondering why there is an AbstractHook type but no AbstractStopCondition type. It might not be important at all, as everything runs perfectly and I have not built any use case where this would be a real issue, but I am curious about it.

Reward transformations

Sometimes one wants to give a transformed reward to the learner, but keep the true reward given by the environment for evaluation purposes. For example Dopamine clamps all rewards to [-1, 1] and I believe some of our methods are unstable in the Atari domain, because we don't clip the rewards. Where would it be best to transform rewards? Should we add a POST_OBSERVE hook, or allow for applying the transformation when observations are put into buffers or just before the actual learning takes place?
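
As a rough illustration of keeping both values, the sketch below stores the raw reward next to the transformed one. `RewardRecord`, `clip_reward` and `observe_reward` are hypothetical names, not part of the package, and where such a step would live (hook, buffer insertion, or learner) is exactly the open question above.

```julia
# Hypothetical record: keep the raw reward for evaluation and a transformed one for learning.
struct RewardRecord{R}
    raw::R          # what the environment returned
    for_learner::R  # what would be stored in the buffer / used for training
end

# Clip to [-1, 1] by default, as described for Dopamine above.
clip_reward(r; lo = -one(r), hi = one(r)) = clamp(r, lo, hi)

observe_reward(r; transform = clip_reward) = RewardRecord(r, transform(r))

records = [observe_reward(r) for r in (-5.0, 0.3, 12.0)]

episode_return = sum(rec.raw for rec in records)         # evaluation uses the true rewards
training_rewards = [rec.for_learner for rec in records]  # the learner sees clipped rewards
```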

Implement `StatsBase.wsample` so that actions can be sampled from an action space with given probabilities, as in a policy

Currently only rand(action_space) is implemented. wsample would be nice.
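
For reference, weighted sampling itself is small. The sketch below uses only the standard library and inverse-CDF sampling; `weighted_sample` is a hypothetical helper written for illustration, not the `StatsBase.wsample` method requested above, and it treats the action space as a plain vector.

```julia
using Random

# Hypothetical helper: draw one action with probability proportional to its weight.
function weighted_sample(rng::AbstractRNG, actions, probs)
    length(actions) == length(probs) ||
        throw(ArgumentError("actions and probs must have the same length"))
    u = rand(rng) * sum(probs)  # uniform draw scaled to the total weight
    acc = zero(u)
    for (a, p) in zip(actions, probs)
        acc += p
        u < acc && return a     # first action whose cumulative weight exceeds u
    end
    return last(actions)        # guard against floating-point round-off
end

rng = MersenneTwister(1)
draws = [weighted_sample(rng, [:left, :stay, :right], [0.0, 1.0, 0.0]) for _ in 1:100]
```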

Bug with the length of CircularTrajectory

julia> buffer = CircularTrajectory(capacity=10, state=Float64=>(3,3), reward=Float64=>tuple())
0-element Trajectory{(:state, :reward),Tuple{Float64,Float64},NamedTuple{(:state, :reward),Tuple{CircularArrayBuffer{Float64,3},CircularArrayBuffer{Float64,1}}}}

julia> push!(buffer; state=rand(3,3), reward=1.0)

julia> get_trace(buffer, :state)
3×3×1 CircularArrayBuffer{Float64,3}:
[:, :, 1] =
 0.88554   0.547466  0.960766
 0.819505  0.977083  0.614598
 0.904878  0.249443  0.345301

julia> get_trace(buffer, :reward)
1-element CircularArrayBuffer{Float64,1}:
 1.0

julia> length(buffer)
9

The length should be 1.
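
The reported value matches what `length` gives for the state trace treated as a plain array: the product of all its dimensions. The sketch below shows the difference on ordinary arrays; it is a guess at the cause, not a reading of the package code, and `n_entries` is a hypothetical helper.

```julia
state_trace = zeros(3, 3, 1)  # one 3×3 state stored along the last dimension
reward_trace = zeros(1)       # one scalar reward

# `length` of an N-dimensional array counts every element, not every stored entry.
total_elements = length(state_trace)

# Hypothetical helper: the number of stored entries is the size of the last dimension.
n_entries(trace) = size(trace, ndims(trace))
```

With this definition both traces report one entry, which is the length the issue expects.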

Support `trainmode!` and `testmode!`

Add ElasticArray as a container in Trajectory

https://github.com/JuliaArrays/ElasticArrays.jl

Similar to CircularCompactSARTSATrajectory, we can create an ElasticCompactSARTSATrajectory for efficiency in some cases.

JuliaReinforcementLearning/ReinforcementLearningZoo.jl#80

improve `find_all_max`

Ref:

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Performance issue with models running on CPU

Problem

@norci mentioned in JuliaReinforcementLearning/ReinforcementLearningZoo.jl#87 (comment) that there may be some potential performance improvements for algorithms running on CPU only.

Currently the experience buffer uses CircularArrayBuffer to store data. When doing batch updates, we use the select_last_dim function to create a view. But according to the doc section "Copying data is not always bad", it may be faster to turn the view into an Array first before feeding it into Flux models.

Initial investigation shows that, by transforming the SubArray into an Array, the average time per step of the experiment E`JuliaRL_BasicDQN_CartPole` decreases from ~0.00128 to ~0.00107. The improvement becomes larger when the model is more complex.
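
To make the two representations concrete, the sketch below builds a batch as a view along the last dimension and then copies it into a dense array. It uses `selectdim` from Base on a plain matrix as a stand-in; that select_last_dim behaves the same way on a CircularArrayBuffer is an assumption, and no timings are claimed here.

```julia
buffer = rand(Float32, 4, 10)  # 10 stored observations of length 4
inds = [2, 5, 9]               # a sampled batch of non-adjacent entries

# A view along the last dimension: no data is copied, but memory access is strided.
batch_view = selectdim(buffer, ndims(buffer), inds)

# A dense copy: one extra allocation, contiguous memory for the model.
batch_dense = Array(batch_view)
```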

Models on GPU are not affected

Note that models on GPU will not be affected, since SubArray will be automatically converted to Array first:

https://github.com/JuliaGPU/CUDA.jl/blob/f31cbe22b4baba872a48bcb48e9f60e712f653fc/src/array.jl#L206

We also already force a SubArray of an Array to be converted into a CuArray, rather than a SubArray of a CuArray, here:

ReinforcementLearningCore.jl/src/utils/device.jl

Lines 16 to 32 in a94544c

    
    send_to_device(
        ::Val{:gpu},
        x::Union{
            SubArray{<:Any,<:Any,<:Union{CircularArrayBuffer,ElasticArray}},
            Base.ReshapedArray{<:Any,<:Any,<:SubArray{<:Any,<:Any,<:CircularArrayBuffer}},
            SubArray{
                <:Any,
                <:Any,
                <:Base.ReshapedArray{
                    <:Any,
                    <:Any,
                    <:SubArray{<:Any,<:Any,<:Union{CircularArrayBuffer,ElasticArray}},
                },
            },
            ElasticArray,
        },
    ) = CuArray(x)

Possible Solutions

Change nothing (but document it somewhere)

Users need to manually add a layer to their models that converts the SubArray into an Array first when working on CPU-only devices.

Automatically convert a SubArray of an Array into an Array in the send_to_host function.

This is the easiest way, but I think it breaks the meaning of send_to_host. After all, the SubArray of an Array is already on the CPU.
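
The first option could look roughly like the sketch below: a small conversion step placed in front of the model. `to_dense`, `toy_model` and `cpu_model` are hypothetical names, and `toy_model` is a plain function standing in for a Flux model.

```julia
# Hypothetical conversion layer: copy views into a dense Array, pass everything else through.
to_dense(x::SubArray) = Array(x)
to_dense(x::AbstractArray) = x

# Stand-in for a model; any callable that accepts an array works here.
toy_model(x) = sum(x; dims = 1)

# Compose the conversion in front of the model for CPU-only runs.
cpu_model = toy_model ∘ to_dense

batch = view(ones(2, 3), :, [1, 3])  # a view, like the batches sampled from the buffer
output = cpu_model(batch)
```

Because the conversion is selected by dispatch, arrays that are already dense pass through without an extra copy.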

Please create new issues in ReinforcementLearning.jl for better retrieval

https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/issues

Adding a render hook or a render option in run()

Hi, some environments have render functions (like MountainCar, etc.), but the main run function does not offer any native way to use this render function when it is available.

It would be great to add this as a parameter, or to create a new hook RenderEpisode (or something like that) which calls the render(env) function at each PRE_ACT_STAGE (plus special cases).
I personally find the hook more flexible and better suited to the package style!

Thanks!
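
A hook along these lines might be sketched as follows. Everything here is a self-contained toy: `PreActStage`, `PostActStage`, `ToyEnv`, `render` and `RenderHook` are defined locally for illustration and only imitate the callable-hook style; the package's actual stage types and hook signature may differ.

```julia
# Toy stage types, defined locally so the sketch runs on its own.
struct PreActStage end
struct PostActStage end
const PRE_ACT_STAGE = PreActStage()

# Toy environment that counts how often it was rendered.
mutable struct ToyEnv
    frames_rendered::Int
end
render(env::ToyEnv) = (env.frames_rendered += 1; nothing)

# Hypothetical hook: render before every action, ignore all other stages.
struct RenderHook end
(h::RenderHook)(::PreActStage, agent, env) = render(env)
(h::RenderHook)(stage, agent, env) = nothing

env = ToyEnv(0)
hook = RenderHook()
for _ in 1:3
    hook(PRE_ACT_STAGE, nothing, env)   # renders
    hook(PostActStage(), nothing, env)  # no-op
end
```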
