
reinforcementlearningcore.jl's People

Contributors

albheim, aterenin, felixchalumeau, findmyway, github-actions[bot], ilancoulon, jinraekim, juliatagbot, norci, rbange, sid-bhatia-0, sriram13m

reinforcementlearningcore.jl's Issues

Ambiguity in update! for CircularArrayBuffer

A simple example reproducing the issue:

julia> sample_data = [1, 2, 3, 4]
4-element Array{Int64,1}:
 1
 2
 3
 4

julia> cb = CircularArrayBuffer{Int}(size(sample_data)...)
0-element CircularArrayBuffer{Int64,1}

julia> push!(cb, sample_data)
ERROR: MethodError: update!(::CircularArrayBuffer{Int64,1}, ::Array{Int64,1}) is ambiguous. Candidates:
  update!(cb::CircularArrayBuffer{T,N}, data::AbstractArray) where {T, N} in ReinforcementLearningCore at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:166
  update!(cb::CircularArrayBuffer{T,1}, data) where T in ReinforcementLearningCore at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:171
Possible fix, define
  update!(::CircularArrayBuffer{T,1}, ::AbstractArray) where T
Stacktrace:
 [1] push!(::CircularArrayBuffer{Int64,1}, ::Array{Int64,1}) at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:178
 [2] top-level scope at REPL[37]:1
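
The ambiguity reported above can be reproduced without the package. The following is a minimal sketch, assuming a toy type `ToyBuffer` and a toy function `store!` (both hypothetical, standing in for `CircularArrayBuffer` and `update!`). It shows the two overlapping signatures and the extra method suggested by the error message:

```julia
# Toy stand-in for CircularArrayBuffer; not the package's real type.
struct ToyBuffer{T,N}
    data::Vector{T}
end

# Generic method: any N, but the data must be an AbstractArray.
store!(b::ToyBuffer{T,N}, data::AbstractArray) where {T,N} =
    (append!(b.data, vec(data)); :array_method)

# Specialised method: N == 1, any kind of data.
store!(b::ToyBuffer{T,1}, data) where {T} =
    (push!(b.data, data); :scalar_method)

b = ToyBuffer{Int,1}(Int[])

# Neither method is more specific for (ToyBuffer{Int,1}, Vector{Int}),
# so this call raises a MethodError reporting the ambiguity.
ambiguous = try
    store!(b, [1, 2, 3, 4])
    false
catch err
    err isa MethodError
end

# Defining the intersection of both signatures resolves the ambiguity.
store!(b::ToyBuffer{T,1}, data::AbstractArray) where {T} =
    (append!(b.data, data); :vector_method)

result = store!(b, [1, 2, 3, 4])
```

What the third method should actually do for a 1-dimensional buffer (append every element, or reject the input) is a design decision for the package; the sketch simply appends.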

Is it possible to have multiple progress meters for ComposedStopCondition?

Is it possible to have multiple progress meters in the case of ComposedStopCondition?
I want to use both StopAfterStep and StopAfterEpisode, and stop at whichever occurs earlier. But the progress meter only shows the step-wise progress. Is it possible to also display episode-wise progress along with it?
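
To make the request concrete, here is a minimal sketch of a combined condition that tracks both counters and reports them together. The types `StepLimit`, `EpisodeLimit` and `EitherLimit` are hypothetical and only mimic the idea of StopAfterStep and StopAfterEpisode; the sketch does not use the package's API or a real progress-meter library:

```julia
# Hypothetical counters, not the package's stop conditions.
mutable struct StepLimit
    limit::Int
    count::Int
end

mutable struct EpisodeLimit
    limit::Int
    count::Int
end

# Stop as soon as either limit is reached.
struct EitherLimit
    steps::StepLimit
    episodes::EpisodeLimit
end

is_done(c::EitherLimit) =
    c.steps.count >= c.steps.limit || c.episodes.count >= c.episodes.limit

# One status line covering both counters.
progress_line(c::EitherLimit) =
    "steps $(c.steps.count)/$(c.steps.limit) | episodes $(c.episodes.count)/$(c.episodes.limit)"

# Toy loop: every `steps_per_episode` steps counts as one finished episode.
function simulate!(c::EitherLimit; steps_per_episode::Int)
    while !is_done(c)
        c.steps.count += 1
        if c.steps.count % steps_per_episode == 0
            c.episodes.count += 1
        end
    end
    return progress_line(c)
end

limits = EitherLimit(StepLimit(100, 0), EpisodeLimit(3, 0))
final_line = simulate!(limits; steps_per_episode = 10)
```

Here the episode limit is hit first, after 30 steps, so a step-only meter would stop at 30% without explaining why.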

Why is there no AbstractStopCondition type?

Hi,
I was just wondering why there is an AbstractHook type but no AbstractStopCondition type. It might not be important at all, as everything runs perfectly and I have not built any use case where this would be a real issue, but I am curious about it.

Reward transformations

Sometimes one wants to give a transformed reward to the learner, but keep the true reward given by the environment for evaluation purposes. For example, Dopamine clamps all rewards to [-1, 1], and I believe some of our methods are unstable in the Atari domain because we don't clip the rewards. Where would it be best to transform rewards? Should we add a POST_OBSERVE hook, or allow the transformation to be applied when observations are put into buffers or just before the actual learning takes place?
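
One of the options, transforming when data is put into the buffer, could look roughly like this. It is a minimal sketch, assuming a hypothetical `Transition` record and `record!` helper that are not part of the package; the learner reads `reward` while evaluation reads `raw_reward`:

```julia
# Hypothetical transition record: the learner sees `reward`,
# evaluation uses `raw_reward`.
struct Transition
    state::Int
    reward::Float64
    raw_reward::Float64
end

# Clip to [-1, 1], as described for Dopamine above.
clip_reward(r) = clamp(r, -1.0, 1.0)

# Store both the transformed and the original reward.
function record!(buffer::Vector{Transition}, state, r; transform = clip_reward)
    push!(buffer, Transition(state, transform(r), r))
    return buffer
end

buffer = Transition[]
for (s, r) in enumerate([0.5, 7.0, -3.0])
    record!(buffer, s, r)
end

learner_return = sum(t.reward for t in buffer)   # 0.5 + 1.0 - 1.0
true_return = sum(t.raw_reward for t in buffer)  # 0.5 + 7.0 - 3.0
```

Storing both values costs extra memory per transition; transforming just before learning would avoid that at the price of recomputing the transformation for every sampled batch.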

Bug with the length of CircularTrajectory

julia> buffer = CircularTrajectory(capacity=10, state=Float64=>(3,3), reward=Float64=>tuple())
0-element Trajectory{(:state, :reward),Tuple{Float64,Float64},NamedTuple{(:state, :reward),Tuple{CircularArrayBuffer{Float64,3},CircularArrayBuffer{Float64,1}}}}

julia> push!(buffer; state=rand(3,3), reward=1.0)

julia> get_trace(buffer, :state)
3×3×1 CircularArrayBuffer{Float64,3}:
[:, :, 1] =
 0.88554   0.547466  0.960766
 0.819505  0.977083  0.614598
 0.904878  0.249443  0.345301

julia> get_trace(buffer, :reward)
1-element CircularArrayBuffer{Float64,1}:
 1.0

julia> length(buffer)
9

The length should be 1.
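
The reported 9 equals the element count of the 3×3×1 state trace, which suggests (an assumption, not confirmed in the report) that the length is taken from the number of elements rather than from the number of stored entries. A minimal sketch of the difference, using plain arrays and a hypothetical helper `n_entries`:

```julia
# For an array whose last dimension indexes the stored entries,
# the entry count is the size of that last dimension,
# not the total number of elements.
n_entries(trace::AbstractArray) = size(trace, ndims(trace))

state_trace = rand(3, 3, 1)   # one 3×3 state, as in the report
reward_trace = [1.0]          # one scalar reward

element_count = length(state_trace)      # counts every element
state_entries = n_entries(state_trace)   # counts stored states
reward_entries = n_entries(reward_trace) # counts stored rewards
```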

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Performance issue with models running on CPU

Problem

@norci mentioned in JuliaReinforcementLearning/ReinforcementLearningZoo.jl#87 (comment) that there may be some potential performance improvements for algorithms running on CPU only.

Currently the experience buffer uses CircularArrayBuffer to store data. When doing batch updates, we use the select_last_dim function to create a view. But according to the "Copying data is not always bad" section of the Julia performance tips, it may be faster to turn the view into an Array before feeding it into Flux models.

Initial investigation shows that, by converting the SubArray into an Array, the average time per step of the experiment E`JuliaRL_BasicDQN_CartPole` decreases from ~0.00128 to ~0.00107. The improvement becomes larger when the model is more complex.
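
The difference between the two inputs can be illustrated with Base functions only. This is a minimal sketch, assuming a plain matrix as a stand-in for the buffer's storage and `selectdim` as a stand-in for select_last_dim; it shows the types involved and makes no timing claim:

```julia
# Hypothetical stand-in for the buffer's storage: 4 features × 1000 stored steps.
storage = rand(Float32, 4, 1000)
batch_inds = [17, 3, 512, 999]   # arbitrary, non-contiguous sample indices

# A view over the last dimension, similar in spirit to select_last_dim.
batch_view = selectdim(storage, ndims(storage), batch_inds)

# Materialise the view into a dense, contiguous Array.
batch_array = Array(batch_view)

# Toy weight matrix standing in for the first layer of a model.
W = rand(Float32, 8, 4)
out_view = W * batch_view     # multiplication on the SubArray
out_array = W * batch_array   # multiplication on the dense copy
```

Both products give the same values; only the memory layout of the input differs, which is where any speed difference would come from.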

Models on GPU are not affected

Note that models on GPU will not be affected, since SubArray will be automatically converted to Array first:

https://github.com/JuliaGPU/CUDA.jl/blob/f31cbe22b4baba872a48bcb48e9f60e712f653fc/src/array.jl#L206

We already force a SubArray of an Array to be converted into a CuArray, rather than a SubArray of a CuArray, here:

send_to_device(
    ::Val{:gpu},
    x::Union{
        SubArray{<:Any,<:Any,<:Union{CircularArrayBuffer,ElasticArray}},
        Base.ReshapedArray{<:Any,<:Any,<:SubArray{<:Any,<:Any,<:CircularArrayBuffer}},
        SubArray{
            <:Any,
            <:Any,
            <:Base.ReshapedArray{
                <:Any,
                <:Any,
                <:SubArray{<:Any,<:Any,<:Union{CircularArrayBuffer,ElasticArray}},
            },
        },
        ElasticArray,
    },
) = CuArray(x)

Possible Solutions

  1. Change nothing (but document it somewhere)

Users need to manually add a layer to their models that converts the SubArray into an Array first when working on CPU-only devices.

  2. Automatically convert a SubArray of an Array into an Array in the send_to_host function.

This is the easiest way, but I think it breaks the meaning of send_to_host. After all, the SubArray of an Array is already on the CPU.
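
The first option, a manual conversion layer, could be sketched as follows. The helper `to_dense` and the toy `model` function are hypothetical; a real model would be, for example, a Flux chain with the conversion as its first layer:

```julia
# Hypothetical helper: copy views into an Array, pass other arrays through.
to_dense(x::SubArray) = Array(x)
to_dense(x::AbstractArray) = x

# Toy "model" as a plain function standing in for a real network.
W = ones(Float32, 2, 4)
model(x) = W * x

# Composing the helper in front of the model plays the role of the extra layer.
cpu_model = model ∘ to_dense

data = ones(Float32, 4, 10)
batch = view(data, :, 1:3)   # a SubArray, as produced when sampling a batch
y = cpu_model(batch)
```

Because the conversion dispatches on the argument type, dense inputs pass through without an extra copy.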

Adding a render hook or a render option in run()

Hi, some environments have render functions (like MountainCar etc.), but the main run function does not offer any native way to use them.

It would be great to add this as a parameter, or to create a new hook RenderEpisode (or something like that) which calls the render(env) function at each PRE_ACT_STAGE (plus special cases).
I personally find the hook more flexible and better suited to the package style!
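
A rough sketch of the hook idea, kept independent of the package: the stage types, `ToyEnv`, `render` and `run_episode!` below are all hypothetical stand-ins, and the real hook signature in the package may take different arguments.

```julia
# Hypothetical stage markers; the package defines its own stages.
struct PreActStage end
struct PostEpisodeStage end

# Hypothetical environment with a text-based render function.
mutable struct ToyEnv
    position::Int
end
render(env::ToyEnv, io::IO) = println(io, "position = ", env.position)

# Hook that renders at every pre-act stage and ignores all other stages.
struct RenderHook{I<:IO}
    io::I
end
(h::RenderHook)(::PreActStage, env) = render(env, h.io)
(h::RenderHook)(::Any, env) = nothing

# Toy run loop that calls the hook at the relevant stages.
function run_episode!(env::ToyEnv, hook; steps::Int)
    for _ in 1:steps
        hook(PreActStage(), env)
        env.position += 1
    end
    hook(PostEpisodeStage(), env)
    return env
end

io = IOBuffer()
run_episode!(ToyEnv(0), RenderHook(io); steps = 3)
output = String(take!(io))
```

Keeping rendering in a hook leaves the run loop unchanged, and the hook can be dropped for headless runs.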

Thanks!
