ufal / npfl122
NPFL122 repository
License: Creative Commons Attribution Share Alike 4.0 International
The slide is titled "Fixed learning rate", yet α_n appears in the convergence condition. I suppose we could keep decreasing the learning rate and still call it fixed, but maybe this could be explained better in the slides?
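For reference, the convergence condition in question is presumably the standard Robbins-Monro one, which requires a sequence of step sizes:

$$\sum_{n=1}^{\infty} \alpha_n = \infty, \qquad \sum_{n=1}^{\infty} \alpha_n^2 < \infty.$$

A truly fixed α satisfies the first condition but violates the second (the sum of squares diverges), which is exactly why the title seems at odds with the condition.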
It feels like the examples of MDPs used in the lectures and assignments encourage the idea that while an action may lead to many states, the triplet (old state, action, new state) always produces the same reward, which (if I understand it all correctly) is incorrect. Maybe it would be worth making this possibility more explicit?
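For reference, in the general MDP formulation (e.g., Sutton & Barto) the dynamics are a joint distribution over next state and reward,

$$p(s', r \mid s, a) = \Pr\{S_{t+1} = s',\, R_{t+1} = r \mid S_t = s,\, A_t = a\},$$

so the reward may be stochastic even for a fixed (s, a, s') triplet; only its expectation r(s, a, s') = E[R_{t+1} | S_t = s, A_t = a, S_{t+1} = s'] is a deterministic function of the triplet.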
Please see screenshot. Resizing doesn't seem to help as the presentation keeps a fixed aspect ratio.
The slides say that the value estimate v is normalized with respect to an unnormalized value predictor n. Isn't it actually the other way round?
The paper says: In order to normalise both baseline and policy gradient updates, we first parameterise the value estimate v as the linear transformation of a suitably normalised value prediction n.
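For reference, my reading of the quoted sentence: n is the normalised prediction and v is its unnormalised linear transformation, presumably

$$v(s) = \sigma \, n(s) + \mu,$$

which would support the "other way round" interpretation.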
In Lecture 4, slides 18 and 19, shouldn't we initialize action-value-function weights instead of just value-function weights?
The slides of Lecture 10 state that "the gradient direction is a local minimizer". Shouldn't it actually say that it's a maximizer?
This is also what is said in the paper corresponding to the slides.
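For context, the first-order fact presumably at issue: among unit-norm directions, the gradient direction maximizes the directional derivative,

$$\operatorname*{arg\,max}_{\|d\|=1} \nabla f(x)^\top d = \frac{\nabla f(x)}{\|\nabla f(x)\|},$$

so for an objective that is being maximized, "maximizer" indeed seems to be the intended word.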
At least my Python 3.9.7 on Windows won't run it without `import math`, since `math.inf` is used on line 89.
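A minimal sketch of the fix (only the import is the actual change; the variable name below is hypothetical, not the template's):

```python
import math  # missing from the template; required because math.inf is used later

# ... around line 89 of the template (hypothetical usage):
best_return = -math.inf
```

Alternatively, `float("inf")` would work without any import.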
ucb: perform UCB search with confidence level c and computing the value function using averaging. Plot results for c=1 and ε∈{1/128,1/64,1/32,1/16,1/8}
But according to the slides, the ε parameter is not used in UCB at all, since the ε-greedy strategy is replaced with an argmax. Perhaps we should be varying c instead?
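For context, a sketch of UCB action selection as I understand the slides (the function and variable names are mine, not the template's); c is the only exploration parameter, and ε does not appear:

```python
import numpy as np

def ucb_action(q, counts, t, c):
    """Pick an action by UCB: argmax_a Q(a) + c * sqrt(ln t / N(a))."""
    # Never-tried actions receive an infinite bonus, so each is tried once first.
    bonus = np.where(counts > 0,
                     c * np.sqrt(np.log(t) / np.maximum(counts, 1)),
                     np.inf)
    return int(np.argmax(q + bonus))
```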
While working on the trace_algorithms assignment, I ran into the following confusion: I think that instead of R_{t+1} there should be R_i. But when I substitute the given formula into the sums from the overview, the rewards that get taken into account come out shifted by one.
On Piazza you mention the following:
If an episode ends, only the value function of the V(next_state) “after the episode” is not used; all other value functions in TD errors are computed. So all TD errors are unchanged, except for the last one, which is just R_T - V(S_T)
(source of the quote) This does not match the slides.
I would expect R_t and V_t to be multiplied by the same power of gamma.
EDIT: I am happy to create a PR; I just wanted to verify first that it is really wrong.
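For reference, my reconstruction of the identity under discussion, in standard notation (this is not necessarily the slide's exact formula): defining the TD errors as

$$\delta_i = R_{i+1} + \gamma V(S_{i+1}) - V(S_i), \qquad V(S_T) = 0,$$

the sum telescopes to

$$\sum_{i=t}^{T-1} \gamma^{\,i-t}\, \delta_i = G_t - V(S_t),$$

where the reward R_{i+1} carries the factor γ^{i-t}; note that in this convention the reward R_{i+1} and the value V(S_i) inside one TD error deliberately differ by one index.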
There are non-matching types in policy_iteration_mc_egreedy.py: the main function is supposed to return `list[int]` but instead returns `np.ndarray`.
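A minimal sketch of one possible fix, assuming the policy is computed in a NumPy array named `policy` (the names are my guesses, not the template's):

```python
import numpy as np

def main(args) -> list[int]:
    policy = np.zeros(16, dtype=np.int64)  # placeholder for the computed policy
    # ... policy iteration as in the template ...
    # tolist() yields plain Python ints, matching the declared list[int].
    return policy.tolist()
```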
When studying the slides for my diploma thesis, I ran into a possible error or a misleading formulation in the pseudocode for REINFORCE on slide https://ufal.mff.cuni.cz/~straka/courses/npfl122/2223/slides/?06#22
There is IMO an error on the last line in that the
Consider the definition of the on-policy distribution for infinite-horizon trajectories (the one not mentioned in Sutton's book, as they only define it for finite-horizon, non-discounted tasks).
where
When I expand the recursion I get
so I get the term
Now as I'm calculating the expectation under the on-policy distribution
To be clearer, what I mean is that it is true that:
but the corresponding update rule when estimating the expectations should be
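Since the formulas above were apparently embedded as images and did not survive, here is my reconstruction of the argument from the standard derivation (the slide's exact notation may differ). With the discounted on-policy distribution

$$\mu(s) \propto \sum_{t=0}^{\infty} \gamma^t \Pr\{S_t = s\},$$

expanding the recursion attaches a factor γ^t to the state visited at time t. The gradient identity

$$\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \mu,\, a \sim \pi}\big[\, G\, \nabla_\theta \ln \pi(a \mid s; \theta)\,\big]$$

is then true, but the sample-based update that estimates this expectation along a trajectory should carry the discount explicitly:

$$\theta \leftarrow \theta + \alpha\, \gamma^t\, G_t\, \nabla_\theta \ln \pi(A_t \mid S_t; \theta).$$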
Hello, I would like to propose an enhancement to the current set of templates. I believe it would be worth the effort to start using f-strings, a feature available since Python 3.6. It would improve the readability of the code and make it shorter and more concise.
I would be willing to create a pull request implementing this transition to f-strings for all of the already published assignments.
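For illustration, a typical before/after (the exact line is hypothetical, not taken from any specific template):

```python
episode, mean_return = 100, 42.5

# Current style in the templates (assumed):
print("Episode {}: mean return {:.2f}".format(episode, mean_return))

# Proposed f-string equivalent:
print(f"Episode {episode}: mean return {mean_return:.2f}")
```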
In the code for value iteration (Lecture 2, slide 11) and Sarsa (Lecture 3, slide 13), S^+ is used but not explained. Does it stand for reachable states only? (And could that be written on the slides? :))