udlbook / udlbook
Understanding Deep Learning - Simon J.D. Prince
License: Other
Just a couple of minor issues I think I've spotted:
Figure 5.1b): the x-axis of the top-right figure should run from 0 to 10 instead of 0 to 2, as in 5.1a) (or the probabilities below should read something like P(y | x = 0.4) and P(y | x = 1.4))
Figure 5.1d): same as 5.1b)
Figure 5.11: the text in the grey rows is cut off at the bottom (makes [] look like the ceiling function ⌈⌉ )
Thanks for this great book, it's very helpful.
On pages 27 and 59 (v. 7-10-22-C-5) the word "multidimensional" is written without a hyphen, right before eq. 3.11.
Also, on page 90, you write "one dimensional", whereas in most places this is written 1D.
These are very minor of course, I'm just letting you know in case you want to normalize w.r.t. the norm throughout the text.
Equation 15.4 is a more complex loss function than we have seen before; the discriminator parameters φ are manipulated to minimize the loss function and the generative parameters φ are manipulated to maximize the loss function.
I think the second φ should be θ, if my understanding is correct.
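For reference, here is a hedged sketch of the min-max objective I believe Eq. 15.4 is meant to express (standard GAN notation, assumed rather than quoted verbatim from the book): the discriminator parameters φ minimize the loss while the generator parameters θ maximize it.

```latex
% Sketch only (standard GAN notation, assumed rather than quoted from the book):
% x_i are real examples, z_j are latent samples, sig[.] is the logistic sigmoid,
% f[., phi] is the discriminator and g[., theta] is the generator.
L[\boldsymbol{\phi},\boldsymbol{\theta}] =
  \sum_{i} -\log\bigl[\mathrm{sig}\bigl[\mathrm{f}[\mathbf{x}_{i},\boldsymbol{\phi}]\bigr]\bigr]
  \;+\; \sum_{j} -\log\bigl[1-\mathrm{sig}\bigl[\mathrm{f}[\mathrm{g}[\mathbf{z}_{j},\boldsymbol{\theta}],\boldsymbol{\phi}]\bigr]\bigr]
% The discriminator parameters minimize this loss, the generator parameters maximize it:
\hat{\boldsymbol{\phi}} = \operatorname*{argmin}_{\boldsymbol{\phi}}\, L[\boldsymbol{\phi},\boldsymbol{\theta}],
\qquad
\hat{\boldsymbol{\theta}} = \operatorname*{argmax}_{\boldsymbol{\theta}}\, L[\boldsymbol{\phi},\boldsymbol{\theta}]
```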
Under section B.3.3 it says "When the mean of a multivariate normal in x is a linear function Az + b of a second variable y," -- I think this should be Ay + b.
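For concreteness, the relation in question presumably has the form below (my own rendering with an assumed covariance Σ, not quoted from the appendix):

```latex
% Assumed form (not quoted from the book): a multivariate normal in x whose mean
% is a linear function A y + b of the conditioning variable y.
Pr(\mathbf{x}\,|\,\mathbf{y}) \;=\; \mathrm{Norm}_{\mathbf{x}}\bigl[\mathbf{A}\mathbf{y}+\mathbf{b},\;\boldsymbol{\Sigma}\bigr]
```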
fforward pass -> forward pass
Again on version 7-10-22-C5:
p.122: "ADAM" -> "Adam"
p.131:"Randomly which is inefficient": missing punctuation(??)
p.132: Missing reference to an Appendix
p.140: Missing reference to figure in figure 9.5 (probably to fig.8.1)
p.144: "this has smooths out the learned function"
p.145: "maximum-likelihoodcriterion"
p.146: Eq. 9.11, change φ to φ' on the integral in the denominator?
p.154: "is not well understood although.." (missing a comma? unsure)
"In fact, this iterative approach is not necessary for the linear regression model."
I would remove "In fact" because the text related to the note doesn't demonstrate the presence of a closed-form solution
Book looks great so far!
I was reading version 07_10_22_C and spotted the following minor typos.
Eqn 5.4 - last line
missing a "]"
Eqn 6.1
Missing bold \phi under argmin
Eqn 6.5 - top line
\sum_{i=1}^I l_i should be \sum_{i=1}^I {\ell}_i ?
Fig 7.5
Missing indentation for a few lines after the second for loop "for i, data in ..."
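For clarity, here is a minimal runnable sketch of how I read the intended indentation (the model, optimizer, loss function, and data are illustrative stand-ins, not copied from the figure):

```python
import torch

# Minimal runnable sketch of the intended structure (illustrative model and data,
# not copied from figure 7.5) - the key point is the indentation of the inner loop body.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_function = torch.nn.MSELoss()
data_loader = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]  # stand-in batches

for epoch in range(3):
    for i, data in enumerate(data_loader):
        # these lines must be indented so that they run inside the inner loop
        x_batch, y_batch = data
        optimizer.zero_grad()                      # reset gradients from the previous step
        loss = loss_function(model(x_batch), y_batch)
        loss.backward()                            # backpropagate
        optimizer.step()                           # update the parameters
```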
"For example, we might want to predict a molecules melting and boiling point (a multivariate classification problem, figure 1.2b) or the object class at every point in an image (a multivariate classification problem, figure 1.4a)"
Predicting a molecule's melting and boiling point is a regression problem, I think. Should this be replaced by "(a multivariate regression problem, figure 1.2b)"?
In fig 8.5 Sources of test error, subplot b) text says:
"b) Bias. Even with the best possible parameters, the three-region model (brown line) cannot fit the true function (cyan line) exactly. (...) "
To match the colors in the figure it should say:
"b) Bias. Even with the best possible parameters, the three-region model (cyan line) cannot fit the true function (black line) exactly."
Book version 06_02_23_C
On page 100, below equation 7.6, "We aim to compute the derivatives: .... and
On page 103, caption of Fig 7.5, should be "... the derivatives
P.S.
For the PDF version, is it possible for you to add chapter numbers to the navigation panel? There are places in the book where you say something will be discussed in Chapter XX, and I think it'll be much easier to locate the chapters for PDF readers.
Thank you very much for your great work.
The order of partial derivatives is inconsistent in the equations. While the partial derivatives themselves are correct, they are arranged in different orders. In equations 7.11 and 7.12, when we apply the chain rule, the first term for the loss with respect to the pre-activation starts on the left, and each subsequent term goes on the right, i.e., the terms are ordered from left to right. However, in equations 7.17-7.19, the order is from right to left.
This may be due to the order of matrix multiplication.
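To illustrate the point, here is a schematic of the two orderings in my own generic notation (not copied from the book); for scalar quantities the two are identical, but once the derivatives are matrices only one ordering is dimensionally consistent, which is presumably why the order changes:

```latex
% Schematic only (generic notation, not the book's equations).
% Factors accumulated left to right, starting from the loss (as in Eqs. 7.11-7.12):
\frac{\partial \ell_i}{\partial \mathbf{f}_1}
  \;=\; \frac{\partial \ell_i}{\partial \mathbf{f}_3}\,
        \frac{\partial \mathbf{f}_3}{\partial \mathbf{f}_2}\,
        \frac{\partial \mathbf{f}_2}{\partial \mathbf{f}_1}
% Factors accumulated right to left (as in Eqs. 7.17-7.19):
\frac{\partial \ell_i}{\partial \mathbf{f}_1}
  \;=\; \frac{\partial \mathbf{f}_2}{\partial \mathbf{f}_1}\,
        \frac{\partial \mathbf{f}_3}{\partial \mathbf{f}_2}\,
        \frac{\partial \ell_i}{\partial \mathbf{f}_3}
```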
Page 138, section 9.2, Implicit regularization: the first sentence talks about "An intriguing recent finding..." which calls for a citation.
Version 2023_01_16, PDF Page 46:
"Many different activation functions have been tried (figure 3.13), but the most common choice is the ReLU" -->
"Many different activation functions have been tried (figure 3.13), but the most common choice is the ReLU (figure 3.1)"
In v. 2023-03-03:
Page 49, Sec. 4.4.1:
"the remaining matrices
I think that, following the same notation as the first part of this section, it should be:
"
Or
"
I found the explanation for Figure 3.8 to be lacking. I understand that each image in Figure 3.8 a) to e) corresponds to one activation unit, but I do not understand what the gray lines mean and what the brown gradient represents.
An explanation along the following lines would benefit the reader: "For example, in Figure 3.8 a), a gray line is the intersection (?) of one of the hyperplanes parameterized by (\Phi_0, \Phi_1, \Phi_2) with the (x_1, x_2) plane. The color of the image represents the value of the resulting linear combination of parameters and inputs, with warmer colors corresponding to more positive values."
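To make the suggestion concrete, the gray line could be written out explicitly (using the \Phi notation from the comment above; the book's own symbols may differ):

```latex
% Assumed notation (the book may use different symbols): each gray line is the set of
% inputs where one hidden unit's pre-activation vanishes,
\Phi_0 + \Phi_1 x_1 + \Phi_2 x_2 = 0,
% and the colour at each point shows the pre-activation value \Phi_0 + \Phi_1 x_1 + \Phi_2 x_2,
% with warmer colours for more positive values.
```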
Version 2023-03-03
Page 306: "Consider applying a function
Equation 16.8 --> It looks like the
Equation 16.8 --> the term
Figure 16.18 (a) --> it should be
Figure 16.18 (b) --> it should be
Equation 16.19 --> assuming equation 16.18 is right, it should be
Before Equation 16.23 --> I would emphasize that computing the trace can be computationally expensive, and for this reason an estimate is needed, which can be obtained using Hutchinson's trace estimator (https://people.cs.umass.edu/~cmusco/personal_site/pdfs/hutchplusplus50.pdf).
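For context, a minimal sketch of Hutchinson's estimator (my own illustrative code, not from the book): tr(A) is approximated by averaging vᵀAv over random probe vectors v with E[vvᵀ] = I, so only matrix-vector products with A are required.

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_samples=1000, rng=None):
    """Estimate tr(A) from matrix-vector products alone (Hutchinson's estimator)."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector, E[v v^T] = I
        total += v @ matvec(v)                 # v^T A v is an unbiased estimate of tr(A)
    return total / num_samples

# Illustrative check on a small explicit matrix (in practice A, e.g. a Jacobian,
# would only be available through matrix-vector products):
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
print(hutchinson_trace(lambda v: A @ v, dim=2))  # approximately 5.0 = tr(A)
```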
Figure 16.10: Is it possible to also create the figure for the normalizing direction? The description in the text is not so clear to me. In my understanding, after the first inverse mapping, using
page 318 Footnote 2 --> Could you explain briefly what you mean by "better"? Do you mean that the quality of the samples is not as good as in the other approaches?
Paragraph 16.5.3 --> Figure 16.4 and the text in §16.5.3 do not seem consistent with each other because they use
Ver 8-3-2023
Please verify the Equation 16.19 and Figure 16.8
Figure 16.8
Figure 16.8 (a) --> it should be
Figure 16.8 (b) --> it should be
Equation 16.19
Version 06-02-2023
Eq. 19.12 --> I think it should be
§19.4 --> Just a comment on the style "The principle of fitted Q-Learning is... . This is known as fitted Q-Learning" --> The repetition of "fitted Q-Learning" looks strange IMHO.
Fig. 19.2 --> "It does not slip on the ice and moves downward" I think that it should be: "It does slip on the ice and moves downward" instead of going left.
Eq. 19.15 :
Fig 19.12 --> the same "problem" as Eq. 19.15
Eq. 19.16 --> the same "problem" as Eq. 19.15
Eq. 19.17 --> the same "problem" as Eq. 19.15
Text below Eq. 19.17 --> the same "problem" as Eq. 19.15
§19.4.1 (book page 394) : values
Eq. 19.18 --> the same "problem" as Eq. 19.15
Eq. 19.19 --> the same "problem" as Eq. 19.15
Eq. 19.20 --> the same "problem" as Eq. 19.15 but for
Eq. 19.21 --> the same "problem" as Eq. 19.15 but for
Page 396: "DQN se deep networks" --> "DQNs use deep networks"
Hello! I've made a pass through chapter 17—thanks again for the very informative write-up! Here are some small corrections and some questions (you can of course ignore these more general comments if you think they are too distracting from the main point of the book):
Zhao, Shengjia, Jiaming Song, and Stefano Ermon. "InfoVAE: Information maximizing variational autoencoders." AAAI (2017).
Page 252. There is an extra closing bracket ] in Eq. (13.4)
Page 292 (Sec 15.3) missing a full stop at the end of the sentence beginning with “Mini-batch discrimination ensures”
Page 318 (Sec 16.3.5) there is a sentence that reads “This requires a different formuation of normalizing flows that learns to from another function rather than a set of samples” – typo in formulation and “to from” should be just “from” I think.
I believe in figure 3.8, there seems to be a small typo.
g-h) The clipped planes are then weighted
should be:
g-i) The clipped planes are then weighted
In the caption of Fig. 2.3, "The three circles represent the three lines from figure 2.2b-d": the two circles related to 2.2b and 2.2d have the same color as the corresponding lines, but the circle related to 2.2c has a different color.
The word "personal" is not present in any term of the factorization shown in 12.15.
Note that the next paragraph (§12.7.2) also does not consider the word "personal" when it talks about the right context.
Version 2023-02-02
Fig. 19.7: "Blue arrow" but the arrow is not blue
Eq. 19.7 --> I think that
Eq. 19.11 --> I think it should be
Eq. 19.12 --> I think it should be
Version 06-02-2023
Eq. 19.24:
Second line of Eq. 19.25: the same problem as equation 19.24
Check the second line of eq. 19.25: it looks like there is an unnecessary ']' and the
Check eq. 19.31: it looks like there is an unnecessary ']'
v. 2023-03-03
Page 437, Appendix C
Section C.3.1:
Section C.3.2: the part about Stirling's formula is still empty.
Section C.3.3:
The second line in the second paragraph on page 167: change "thes hidden units" to "the hidden units"?
Maybe change "Problems 10.17 - 10.16" to "Problems 10.16 - 10.17" on page 176?
Your book is a fantastic exposition of deep learning, certainly the best I have ever read on the subject. Thank you so much for your work!
On page 329 in equation (16.27), exp(x^2/2) should be exp(-z^2/2).
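For reference, the standard normal base density the equation presumably builds on (my rendering, not quoted from the book; note the minus sign and the variable z):

```latex
% Standard normal density (the likely intended form):
Pr(z) \;=\; \frac{1}{\sqrt{2\pi}}\exp\!\left[-\frac{z^{2}}{2}\right]
```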
On page 423, the paragraph starting with "Pruning can be considered a form of ..." may be missing the symbol "(i)".
In the second paragraph of section 9.3.3 "Dropout", the word "kinks" occurs. I guess it might be "links".
training algorithms -> training samples
Thanks for releasing the book! I've enjoyed going through it!
Here are some typos that I could spot while reading chapter 18:
The following questions might be out of the scope of the book, but I do wonder:
The figure text '...is used to show that the Kullback-Leibler divergence is always greater than one'. The KL divergence is always greater than or equal to zero.
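To be precise, the property in question (standard definition, not quoted from the book):

```latex
% Kullback-Leibler divergence between distributions q and p; it is non-negative and
% equals zero only when q = p (Gibbs' inequality):
D_{KL}\bigl[q(x)\,\big\|\,p(x)\bigr] \;=\; \int q(x)\,\log\frac{q(x)}{p(x)}\,dx \;\ge\; 0
```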
Version 31_01_23_C
The caption of Fig 19.3 mentions "blue arrows", but in the figure the arrows don't look blue.
In Fig 19.4, policy a) doesn't lead the agent to the reward as indicated in the caption ("This policy generally steers the penguin from top-left to bottom-right where the reward lies"). The actions in states 4 and 8 should be changed to "down".
Also, policy b) doesn't reach the goal from some initial states. Is this the desired behavior?
Hey Prof. Simon, it's Roy Amoyal the undergraduate student again.
Something that I have noticed in our Deep Learning class at my university, and in the book, is that the backpropagation process is the most "mathematically challenging" part for most of the students. Unfortunately, when most lecturers explain the backpropagation math, they skip a reminder of the chain rule and a simple illustration using the neural network notation. While for some students it may be obvious, for some reason most students lose the lecturer at that point (including me).
Although you give a reference to the "Matrix calculus" Appendix C.2 next to equation 7.6, I think something like the following equations could be quite helpful and even dramatically improve the understanding of the backpropagation process before jumping into equation 7.6.
Reminder of the chain rule (composition of 3 functions):
(self-note: I think it's better to give the reminder with 3 functions instead of just 2)
If m(t) = (f∘g∘k)(t) = f(g(k(t))), then m′(t) = f′(g(k(t)))⋅g′(k(t))⋅k′(t).
Because (***self-note: here I think it is really important to explicitly write the equation with the substitutions ***):
(loss function) ℓ_i = l[f3, y_i] = l[b3 + Ω3⋅h3, y_i] = l[b3 + Ω3⋅a[f2], y_i] = l[b3 + Ω3⋅a[b2 + Ω2⋅h2], y_i]
(substituting the expressions from equation 7.5), and because h3 is the inner function here (h3(x) = a[x], and in this case x = f2, so h3 = a[f2]), just like k(t) is the innermost function inside f and g in the reminder, we get:
Because we are doing "backpropagation", we first calculate the derivative of the innermost function of the neural network, in our case h3 = a[f2], so we first calculate:
and then we keep calculating the expressions backward.
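To make the backward ordering concrete, here is a minimal numerical sketch of the same idea (my own illustrative code with made-up sizes and values, using the b/Ω names from above; it is not the book's code):

```python
import numpy as np

# Minimal sketch (illustrative, not the book's code): forward and backward pass for the
# last two layers in the notation above, h3 = a[f2], f3 = b3 + W3 @ h3, with a ReLU
# activation a[.] and a squared loss l[f3, y] = sum((f3 - y)^2).
rng = np.random.default_rng(0)
f2 = rng.standard_normal(4)            # pre-activation of the previous layer (given)
W3 = rng.standard_normal((3, 4))       # Omega_3
b3 = rng.standard_normal(3)            # b3
y = rng.standard_normal(3)             # target y_i

# Forward pass
h3 = np.maximum(f2, 0.0)               # h3 = a[f2]   (ReLU)
f3 = b3 + W3 @ h3                      # f3 = b3 + Omega_3 . h3
loss = np.sum((f3 - y) ** 2)           # l[f3, y_i]

# Backward pass: apply the chain rule starting from the loss and work inward
dl_df3 = 2.0 * (f3 - y)                        # d l / d f3
dl_dh3 = W3.T @ dl_df3                         # d l / d h3 = Omega_3^T (d l / d f3)
dl_df2 = (f2 > 0).astype(float) * dl_dh3       # d l / d f2 = a'[f2] * (d l / d h3)

# Derivatives with respect to the last layer's parameters
dl_db3 = dl_df3                                # d l / d b3
dl_dW3 = np.outer(dl_df3, h3)                  # d l / d Omega_3
```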
What do you think?
It could be really helpful to understand this topic better and faster.
Thanks, Roy.
In v. 2023-03-03
Page 56, Loss functions:
Page 57, figure 5.1:
Page 58, Sec. 5.1.1:
I'm probably wrong about these, but:
Page 60, equation 5.5:
Page 62, equation 5.12:
Page 426, B.1.5:
Version 2023_01_16, Fig 4.1 Caption:
"The first network maps inputs x ∈ [0,1] to outputs y ∈ [0,1]" --> "The first network maps inputs x ∈ [-1,1] to outputs y ∈ [-1,1]"
Version 2023_01_31
Fig 12.9: The vocabulary is indicated as
UnderstandingDeepLearning_03_03_23_C.pdf Chapter 9 Page 157
The paragraph preceding equation 9.18 contains the text "the second term on the right-hand side must equal zero" but I think it was meant to say "the third term on the right-hand side must equal zero"
Thank you for writing this book!
In v. 2023-03-03
Page 421, Appendix A, Sets:
"The notation
Since x and y are vectors and not scalars, I think it should be:
"The notation
In
Page 422, Appendix A, Sets:
In v. 2023-03-03:
Page 22, Sec. 2.2.4, Line 4: "However, it also depends on the how expressive the model is."
In the "Notes" at page 403, there may be a grammatical error in the first sentence.
Fig 15.20: "Changing course styles" -> "Changing coarse styles" ? Same for "course" noise.