juliareinforcementlearning / reinforcementlearningcore.jl
License: MIT License
Hi!
I have a question about the `update!` function of `CircularArrayBuffer`. The docstring mentions `AbstractArray{T}`, whereas the method signature types the `data` parameter as plain `AbstractArray`, so I am not sure which one is the mistake.
The code is here: https://github.com/JuliaReinforcementLearning/ReinforcementLearningCore.jl/blob/master/src/utils/circular_array_buffer.jl#L161
What do you think?
A simple example reproducing the issue:
julia> sample_data = [1, 2, 3, 4]
4-element Array{Int64,1}:
1
2
3
4
julia> cb = CircularArrayBuffer{Int}(size(sample_data)...)
0-element CircularArrayBuffer{Int64,1}
julia> push!(cb, sample_data)
ERROR: MethodError: update!(::CircularArrayBuffer{Int64,1}, ::Array{Int64,1}) is ambiguous. Candidates:
update!(cb::CircularArrayBuffer{T,N}, data::AbstractArray) where {T, N} in ReinforcementLearningCore at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:166
update!(cb::CircularArrayBuffer{T,1}, data) where T in ReinforcementLearningCore at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:171
Possible fix, define
update!(::CircularArrayBuffer{T,1}, ::AbstractArray) where T
Stacktrace:
[1] push!(::CircularArrayBuffer{Int64,1}, ::Array{Int64,1}) at /home/oystein/.julia/packages/ReinforcementLearningCore/nMAEB/src/utils/circular_array_buffer.jl:178
[2] top-level scope at REPL[37]:1
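For reference, the ambiguity and the fix suggested by the error message can be reproduced with a toy stand-in type (this is not the package's actual code, just a minimal sketch of the dispatch problem):

```julia
# Toy reproduction of the ambiguity with a stand-in buffer type.
struct Buf{T,N} end

update!(cb::Buf{T,N}, data::AbstractArray) where {T,N} = :array_method
update!(cb::Buf{T,1}, data) where {T} = :vector_method

# For Buf{Int,1} with a Vector argument, both methods apply and neither is
# more specific, so the call is ambiguous. Defining the method at the
# intersection of the two signatures resolves it, as the error suggests:
update!(cb::Buf{T,1}, data::AbstractArray) where {T} = :disambiguated

@assert update!(Buf{Int,1}(), [1, 2, 3]) == :disambiguated
```

The intersection method is the most specific of the three, so dispatch becomes unambiguous for the `push!` case above.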
According to my test of `DQNLearner`, sending a non-contiguous batch of states of size (84, 84, 4, 1) to the GPU takes about 2 ms, and that doubles when we also send the next states. Meanwhile, calculating the gradients takes about 8 ms. Adopting an async approach here could reduce the time per step considerably (by roughly a third).
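A generic sketch of the async idea: start uploading the next states in a task while the gradient computation runs. Here `upload` is a stand-in for `CuArray(x)`; whether the copy and the compute actually overlap on GPU (e.g. via per-task streams in CUDA.jl) is an assumption worth benchmarking.

```julia
# Stand-in for a host-to-device copy; sleep simulates transfer latency.
upload(x) = (sleep(0.002); copy(x))

next_task = @async upload(rand(Float32, 84, 84, 4, 1))   # overlapped upload
loss = sum(abs2, rand(Float32, 84, 84, 4, 1))            # stand-in for the gradient step
next_states = fetch(next_task)                           # join before using next states

@assert size(next_states) == (84, 84, 4, 1)
```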
Is it possible to have multiple progress meters in the case of `ComposedStopCondition`? I want to use both `StopAfterStep` and `StopAfterEpisode`, and stop at whichever occurs first. But the progress meter only shows the step-wise progress. Is it possible to also display episode-wise progress along with it?
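The "whichever occurs first" semantics can be sketched with toy condition types; the package's `ComposedStopCondition` may differ in its details, and the names below are stand-ins:

```julia
# A toy counting stop condition: fires once it has been polled n times.
mutable struct StopAfterN
    n::Int
    count::Int
end
StopAfterN(n) = StopAfterN(n, 0)
function (s::StopAfterN)()
    s.count += 1
    s.count >= s.n
end

# OR-composition: stop as soon as any sub-condition fires.
struct EitherStop{T<:Tuple}
    conditions::T
end
(s::EitherStop)() = any(c() for c in s.conditions)

stop = EitherStop((StopAfterN(3), StopAfterN(5)))
@assert [stop() for _ in 1:3] == [false, false, true]
```

A progress meter per sub-condition would then just need each sub-condition to expose its own counter, which is what the question above is asking for.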
Hi,
I was just wondering why there is an `AbstractHook` type but no `AbstractStopCondition` type. It might not be important at all, since everything runs perfectly and I have not built any use case where this would be a real issue, but I am curious about it.
Sometimes one wants to give a transformed reward to the learner but keep the true reward given by the environment for evaluation purposes. For example, Dopamine clamps all rewards to [-1, 1], and I believe some of our methods are unstable in the Atari domain because we don't clip the rewards. Where would it be best to transform rewards? Should we add a `POST_OBSERVE` hook, allow applying the transformation when observations are put into buffers, or apply it just before the actual learning takes place?
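Wherever the transform ends up living, the Dopamine-style clipping itself is a one-liner; a minimal sketch (the function name is illustrative, not package API):

```julia
# Dopamine-style reward clipping to [-1, 1]; the true reward would be
# kept separately for evaluation.
clip_reward(r) = clamp(r, -1.0f0, 1.0f0)

@assert clip_reward(10.0f0) == 1.0f0
@assert clip_reward(-0.5f0) == -0.5f0
```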
Currently only `rand(action_space)` is implemented. `wsample` would be nice.
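Assuming the `wsample` in question is the one from StatsBase, weighted action sampling would look like this (the action space here is just a vector stand-in):

```julia
using StatsBase  # provides wsample(values, weights)

action_space = [1, 2, 3]
# A degenerate weight vector makes the result deterministic for the check;
# in practice the weights would come from a policy.
@assert wsample(action_space, [0.0, 0.0, 1.0]) == 3
```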
julia> buffer = CircularTrajectory(capacity=10, state=Float64=>(3,3), reward=Float64=>tuple())
0-element Trajectory{(:state, :reward),Tuple{Float64,Float64},NamedTuple{(:state, :reward),Tuple{CircularArrayBuffer{Float64,3},CircularArrayBuffer{Float64,1}}}}
julia> push!(buffer; state=rand(3,3), reward=1.0)
julia> get_trace(buffer, :state)
3×3×1 CircularArrayBuffer{Float64,3}:
[:, :, 1] =
0.88554 0.547466 0.960766
0.819505 0.977083 0.614598
0.904878 0.249443 0.345301
julia> get_trace(buffer, :reward)
1-element CircularArrayBuffer{Float64,1}:
1.0
julia> length(buffer)
9
The length should be 1.
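To make the expected semantics concrete, here is a toy circular buffer (not the package's implementation) where `length` reports the number of elements pushed, capped at capacity, rather than something derived from the capacity:

```julia
# Toy circular buffer: length is the count of live elements, not capacity.
mutable struct ToyCircular{T}
    data::Vector{T}
    capacity::Int
    first::Int
    len::Int
end
ToyCircular{T}(capacity) where {T} =
    ToyCircular{T}(Vector{T}(undef, capacity), capacity, 1, 0)

function Base.push!(b::ToyCircular, x)
    idx = mod1(b.first + b.len, b.capacity)
    b.data[idx] = x
    if b.len < b.capacity
        b.len += 1            # still filling up
    else
        b.first = mod1(b.first + 1, b.capacity)  # full: overwrite oldest
    end
    b
end
Base.length(b::ToyCircular) = b.len

b = ToyCircular{Float64}(10)
push!(b, 1.0)
@assert length(b) == 1   # one push into a capacity-10 buffer, not 9
```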
https://github.com/JuliaArrays/ElasticArrays.jl
Similar to `CircularCompactSARTSATrajectory`, we could create an `ElasticCompactSARTSATrajectory` for efficiency in some cases.
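The appeal of ElasticArrays.jl here is that the trajectory can grow along its last dimension without copying the whole array on each push; a small sketch, assuming the linked package:

```julia
using ElasticArrays

# 3x3 state frames, growable along the last dimension.
states = ElasticArray{Float64}(undef, 3, 3, 0)
append!(states, rand(3, 3))   # push one frame
append!(states, rand(3, 3))   # push another

@assert size(states) == (3, 3, 2)
```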
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment `TagBot fix` on this issue. I'll open a PR within a few hours, please be patient!
@norci mentioned in JuliaReinforcementLearning/ReinforcementLearningZoo.jl#87 (comment) that there may be some potential performance improvements for algorithms running on CPU only.
Currently the experience buffer uses `CircularArrayBuffer` to store data. When doing batch updating, we use the `select_last_dim` function to create a view. But according to the doc (Copying-data-is-not-always-bad), it may be faster to turn the view into an `Array` first before feeding it into Flux models.
Initial investigation shows that, by converting the `SubArray` into an `Array`, the average time per step of the experiment `JuliaRL_BasicDQN_CartPole` decreases from ~0.00128 s to ~0.00107 s. When the model is more complex, the improvement becomes larger.
Note that models on GPU are not affected, since a `SubArray` is automatically converted to an `Array` first:
https://github.com/JuliaGPU/CUDA.jl/blob/f31cbe22b4baba872a48bcb48e9f60e712f653fc/src/array.jl#L206
And we have already forced a `SubArray` of an `Array` to be converted into a `CuArray` instead of a `SubArray` of a `CuArray` here:
ReinforcementLearningCore.jl/src/utils/device.jl (lines 16 to 32 in a94544c)
There are two options. Either users manually add a layer in their models to convert the `SubArray` into an `Array` first when working on CPU-only devices, or we convert the `SubArray` of an `Array` into an `Array` in the `send_to_host` function. The latter is the easiest way, but I think it breaks the meaning of `send_to_host`; after all, a `SubArray` of an `Array` is already on the CPU.
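The effect being discussed can be sketched in a few lines: a `select_last_dim`-style view is strided, and materializing it yields a contiguous copy with the same contents (the array shape mirrors the Atari example above; the data is illustrative):

```julia
data = rand(Float32, 84, 84, 4, 32)
batch = view(data, :, :, :, 1:16)   # select_last_dim-style SubArray
batch_dense = Array(batch)          # contiguous copy, friendlier to BLAS

@assert batch_dense == batch        # same contents
@assert batch_dense isa Array       # but now a plain Array
```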
Hi, some environments have render functions (like MountainCar, etc.), but the main `run` function does not offer any native way to use them.
It would be great to add this as a parameter, or to create a new hook, `RenderEpisode` (or something like that), which calls `render(env)` at each `PRE_ACT_STAGE` (plus special cases).
I personally find the hook approach more flexible and better adapted to the package style!
Thanks!
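Such a hook could be sketched in the package's callable-struct hook style; the stage type and `render` below are stand-ins defined locally, not package API:

```julia
struct PreActStage end                     # stand-in for the real stage type
render(env) = push!(env, :rendered)        # stand-in for a real render(env)

# A hook that renders the environment whenever it is invoked at the
# pre-act stage.
struct RenderHook end
(h::RenderHook)(::PreActStage, agent, env) = render(env)

env = Symbol[]
RenderHook()(PreActStage(), nothing, env)  # would run once per step
@assert env == [:rendered]
```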