
udlbook's People

Contributors

ani0075saha, dhruvpatel01, dillonplunkett, ferdiekrammer, igorrusso, krishnams0ni, pitmonticone, ritog, swaystar123, tonyjo, udlbook, yrahal


udlbook's Issues

Fig 5.1 and 5.11

Just a couple of minor issues I think I've spotted:

Figure 5.1b): the x-axis of the top-right figure should run from 0 to 10, as in 5.1a), instead of 0 to 2 (or the probabilities below should read something like P(y | x = 0.4) and P(y | x = 1.4))

Figure 5.1d): same as 5.1b)

Figure 5.11: the text in the grey rows is cut off at the bottom (makes [] look like the ceiling function ⌈⌉ )

Thanks for this great book, it's very helpful.

Missing hyphens? etc

On pages 27 and 59 (v. 7-10-22-C-5) the word "multidimensional" is written without a hyphen, right before eq. 3.11.
Also, on page 90, you write "one dimensional", whereas in most places this is written as 1D.

These are very minor of course; I'm just letting you know in case you want to normalize the usage throughout the text.

P. 279: GAN loss parameters

Equation 15.4 is a more complex loss function than we have seen before; the discriminator parameters φ are manipulated to minimize the loss function and the generative parameters φ are manipulated to maximize the loss function.
I think the second φ should be θ, if my understanding is correct.

Minor typo on page 437 (Appendix B)

Under section B.3.3 it says "When the mean of a multivariate normal in x is a linear function Az + b of a second variable y," -- I think this should be Ay + b.

Some more minor issues

Again on version 7-10-22-C5:

p.122: "ADAM" -> "Adam"
p.131:"Randomly which is inefficient": missing punctuation(??)
p.132: Missing reference to an Appendix
p.140: Missing reference to figure in figure 9.5 (probably to fig.8.1)
p.144: "this has smooths out the learned function"
p.145: "maximum-likelihoodcriterion"
p.146: Eq. 9.11, change φ to φ' on the integral in the denominator?
p.154: "is not well understood although.." (missing a comma? unsure)

Note 1 on page 16

"In fact, this iterative approach is not necessary for the linear regression model."
I would remove "In fact" because the text related to the note doesn't demonstrate the presence of a closed-form solution

Minor typos

Book looks great so far!

I was reading version 07_10_22_C and spotted the following minor typos.

Eqn 5.4 - last line
missing a "]"

Eqn 6.1
Missing bold \phi under argmin

Eqn 6.5 - top line
\sum_{i=1}^I l_i should be \sum_{i=1}^I {\ell}_i ?

Fig 7.5
Missing indentation for a few lines after the second for loop "for i, data in ..."

Section 5.6 regression vs classification

"For example, we might want to predict a molecules melting and boiling point (a multivariate classification problem, figure 1.2b) or the object class at every point in an image (a multivariate classification problem, figure 1.4a)"

Predicting a molecule's melting and boiling point is a regression problem, I think. Should this be replaced by "(a multivariate regression problem, figure 1.2b)"?

Wrong colors in text of fig 8.5b p. 119

In fig 8.5 Sources of test error, subplot b) text says:

"b) Bias. Even with the best possible parameters, the three-region model (brown line) cannot fit the true function (cyan line) exactly. (...) "

To match the colors in the figure it should say:

"b) Bias. Even with the best possible parameters, the three-region model (cyan line) cannot fit the true function (black line) exactly."

page 100 and page 103

Book version 06_02_23_C

On page 100, below equation 7.6, "We aim to compute the derivatives: .... and $\partial y/\partial \omega_4$." Is $\partial y/\partial \omega_4$ a typo?

On page 103, caption of Fig 7.5, should be "... the derivatives $\partial l_i / \partial \beta$ and $\partial l_i/ \partial \omega$", and "... by $\partial l_i/ \partial \beta_k$ or $\partial l_i/ \partial \omega_k$ as appropriate."

P.S.
For the PDF version, is it possible for you to add chapter numbers to the navigation panel? There are places in the book where you say something will be discussed in Chapter XX, and I think it'll be much easier to locate the chapters for PDF readers.

Thank you very much for your great work.

Order of partial derivatives is inconsistent in the equations

The order of the partial derivatives is inconsistent between equations. While the partial derivatives themselves are correct, they are arranged in different orders. In equations 7.11 and 7.12, when we apply the chain rule, the first term, the derivative of the loss with respect to the pre-activation, appears on the left, and each subsequent term is appended to the right, i.e., the terms are ordered from left to right. However, in equations 7.17-7.19, the order is from right to left.

Eq. 7.12: [screenshot]

Eqs. 7.18 and 7.19: [screenshot]

This may be due to the order of matrix multiplication.
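For what it's worth, here is a minimal numpy sketch (my own illustration, not the book's code) of the backward step through one layer, assuming the notation $\mathbf{h}_3 = a[\mathbf{f}_2]$ and $\mathbf{f}_3 = \boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3 \mathbf{h}_3$ with column vectors: whichever way the chain-rule factors are written on the page, the computation accumulates the derivative starting from the loss end.

```python
import numpy as np

# Minimal sketch (not the book's code): backward step through one layer,
# assuming a ReLU activation h3 = a[f2] and f3 = beta3 + Omega3 @ h3.
rng = np.random.default_rng(0)
D2, D3 = 4, 3
f2 = rng.standard_normal(D2)            # pre-activations at layer 2
Omega3 = rng.standard_normal((D3, D2))  # weights mapping h3 to f3
dl_df3 = rng.standard_normal(D3)        # upstream derivative dl/df3

# However the factors are ordered on the page, the evaluation runs from the
# loss inward: dl/df2 = diag(a'[f2]) @ Omega3.T @ dl/df3.
dl_df2 = (f2 > 0).astype(float) * (Omega3.T @ dl_df3)
print(dl_df2.shape)  # (4,), the same shape as f2
```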

Suggestion: Cite figure where ReLU is depicted

Version 2023_01_16, PDF Page 46:
"Many different activation functions have been tried (figure 3.13), but the most common choice is the ReLU" -->
"Many different activation functions have been tried (figure 3.13), but the most common choice is the ReLU (figure 3.1)"

Notation erratum, page 49 (v. 2023-03-03)

In v. 2023-03-03:
Page 49, Sec. 4.4.1:

"the remaining matrices $\Omega_k$ are $D_k \times D_{k-1}$"

I think that following the same notation as the first part of this section it should be:
" $\Omega_k$ are $D_{k+1} \times D_{k}$"
Or
" $\Omega_{k-1}$ are $D_k \times D_{k-1}$"

Figure 3.8

I found the explanation for Figure 3.8 to be lacking. I understand that each image in Figure 3.8 a) to e) corresponds to one activation unit, but I do not understand what the gray lines mean and what the brown gradient represents.

An explanation along the following lines would benefit the reader: "For example, in Figure 3.8 a), a gray line is the intersection (?) of one of the hyperplanes parameterized by $(\Phi_0, \Phi_1, \Phi_2)$ with the $(x_1, x_2)$ plane. The color of the image represents the value of the resulting linear combination of parameters and inputs, with warmer colors corresponding to more positive values."

Some suggestions and doubts for Chapter 16

Version 2023-03-03
Page 306: "Consider applying a function $x = f[z, \phi]$ to a base density $Pr(z)$, where $z \in \mathbb{R}^D$ and $f[z,\phi]$ is a deep network": I suggest rephrasing this as: "Consider applying a function $x = f[z, \phi]$ to a random variable $z \in \mathbb{R}^D$ with a known density $Pr(z)$, where $f[z,\phi]$ is a neural network"

Equation 16.8 --> It looks like the $-\log [Pr(z_i)]$ should be inside the closing square bracket "]"
Equation 16.8 --> the $\phi$ under the argmax and argmin is missing
Figure 16.18 (a) --> it should be $f_2[h_1', \phi_2]$
Figure 16.18 (b) --> it should be $f_2[h_1', \phi_2]$
Equation 16.19 --> assuming equation 16.18 is right, it should be $h_2 = f_2[h_1', \phi_2] - h_2'$
Equation 16.19 --> assuming equation 16.18 is right, it should be $h_1 = f_1[h_2, \phi_1] - h_1'$

Before Equation 16.23 --> I would emphasize that computing the trace can be computationally expensive, and for this reason we need an estimate, which can be obtained using Hutchinson's trace estimator (https://people.cs.umass.edu/~cmusco/personal_site/pdfs/hutchplusplus50.pdf).
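For reference, a minimal sketch of Hutchinson's estimator (my own illustration, not code from the book): the trace is approximated by averaging $v^\top A v$ over random probe vectors, so only matrix-vector products with $A$ are needed.

```python
import numpy as np

# Hutchinson's trace estimator: tr(A) ~ (1/M) sum_m v_m^T A v_m,
# with v_m i.i.d. Rademacher probe vectors. Only matvecs with A are needed,
# so A can be a Jacobian accessed implicitly through vector-Jacobian products.
def hutchinson_trace(matvec, dim, num_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += v @ matvec(v)
    return total / num_samples

# Usage: estimate the trace of an explicit matrix with tr(A) = 6.
A = np.diag([1.0, 2.0, 3.0])
print(hutchinson_trace(lambda v: A @ v, dim=3))  # approximately 6.0
```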

Figure 16.10: Is it possible to also create the figure for the normalizing direction? The description in the text is not so clear to me. In my understanding, after the first inverse mapping using $f_4^{-1}$, I can remove the last part, which is already $z^4$, and continue with the remaining part. Am I wrong?

Page 318, Footnote 2 --> Could you explain briefly what you mean by "better"? Do you mean that the quality of the samples is not as good as in the other approaches?

Paragraph 16.5.3 --> Figure 16.4 and the text in §16.5.3 do not look consistent, because they use $z$ and $x$ differently. Moreover, the figure doesn't use $q$. Is this intentional?

Equation 16.19 and Figure 16.8

Ver 8-3-2023

Please verify the Equation 16.19 and Figure 16.8

Figure 16.8
Figure 16.8 (a) --> it should be $f_2[h_1', \phi_2]$
Figure 16.8 (b) --> it should be $f_2[h_1', \phi_2]$

Equation 16.19
$h_2 = h_2' - f_2[h_1',\phi_2]$
$h_1 = h_1' - f_1[h_2,\phi_1]$
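To sanity-check the proposed equations, here is a minimal numpy sketch (my own, with arbitrary stand-in functions for $f_1$ and $f_2$) of a reversible block whose inverse takes exactly this form:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
f1 = lambda h: np.tanh(W1 @ h)  # stand-ins for f1[., phi1] and f2[., phi2];
f2 = lambda h: np.tanh(W2 @ h)  # they need not be invertible themselves

def forward(h1, h2):
    h1p = h1 + f1(h2)   # h1' = h1 + f1[h2, phi1]
    h2p = h2 + f2(h1p)  # h2' = h2 + f2[h1', phi2]
    return h1p, h2p

def inverse(h1p, h2p):
    h2 = h2p - f2(h1p)  # h2 = h2' - f2[h1', phi2]
    h1 = h1p - f1(h2)   # h1 = h1' - f1[h2, phi1]
    return h1, h2

h1, h2 = rng.standard_normal(4), rng.standard_normal(4)
h1r, h2r = inverse(*forward(h1, h2))
assert np.allclose(h1, h1r) and np.allclose(h2, h2r)
```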

Some suggestions on Chapter 19

Version 06-02-2023

Eq. 19.12 --> I think it should be $r[s_t,a_t]$ and not $r[s,a]$

§19.4 --> Just a comment on the style "The principle of fitted Q-Learning is... . This is known as fitted Q-Learning" --> The repetition of "fitted Q-Learning" looks strange IMHO.

Fig. 19.2 --> "It does not slip on the ice and moves downward": I think it should be "It does slip on the ice and moves downward" instead of moving left.

Eq. 19.15: $\max_a [ q[s_{t+1},a_{t+1}] ]$ --> $\max_{a_{t+1}} [ q[s_{t+1},a_{t+1}] ]$. Please note that some authors, e.g. Sutton-Barto, use another formalism: $\max_a [ q[s_{t+1},a] ]$, where $a$ indicates a generic action. It is up to you to decide which formalism to choose (the standard tabular update in that convention is sketched at the end of this list).

Fig 19.12 --> the same "problem" as in Eq. 19.15

Eq. 19.16 --> the same "problem" as in Eq. 19.15

Eq. 19.17 --> the same "problem" as in Eq. 19.15

Text below Eq. 19.17 --> the same "problem" as in Eq. 19.15

§19.4.1 (book page 394): values $\phi^-)$ --> values $\phi^-$

Eq. 19.18 --> the same "problem" as in Eq. 19.15

Eq. 19.19 --> the same "problem" as in Eq. 19.15

Eq. 19.20 --> the same "problem" as in Eq. 19.15, but for $\arg \max$

Eq. 19.21 --> the same "problem" as in Eq. 19.15, but for $\arg \max$

Page 396: "DQN se deep networks" --> "DQNs use deep networks"
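For reference, here is the sketch mentioned above (my own note, following the Sutton-Barto convention where $a$ denotes a generic action; the book's exact indexing may differ). The standard tabular Q-learning update is

$$q[s_t, a_t] \leftarrow q[s_t, a_t] + \alpha \Big( r[s_t, a_t] + \gamma \max_{a} q[s_{t+1}, a] - q[s_t, a_t] \Big),$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, and the maximum runs over all actions available in state $s_{t+1}$.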

A few corrections and comments on chapter 17, version 06-02-23

Hello! I've made a pass through chapter 17—thanks again for the very informative write-up! Here are some small corrections and some questions (you can of course ignore these more general comments if you think they are too distracting from the main point of the book):

  • equation 17.2: $x$ should be bold, that is, $\mathbf{x}$
  • equation 17.7, first line: missing a vertical bar to denote conditioning on the parameters, that is, $Pr(\mathbf{x}, \mathbf{z} | \mathbf{Φ})$
  • figure 17.6, caption: "by either a) improving the ELBO" should be "by improving the ELBO either a) with respect to ... or b) with respect to ..."
  • figure 17.6: the caption says that "we get closer to the true log likelihood by improving the ELBO [...] with respect to the original parameters φ", but the figure on the right indicates that the gap increases. I would assume that the statement is correct (since, for example, diffusion models close the gap even if they use a fixed variational distribution), but that means that the figure is somewhat deceiving?!
  • figure 17.12: contains a self reference (that is, the caption mentions "figure17.12", which is incidentally also missing a space).
  • subsection 17.8.3 mentions "the product $Pr(\mathbf{z}|\mathbf{x})Pr(\mathbf{z})$": I was wondering whether there is a principled justification for multiplying the prior $Pr(\mathbf{z})$ with the posterior $Pr(\mathbf{z}|\mathbf{x})$? It seems a bit unnatural to me from a probabilistic point of view, but maybe this is just motivated from a more pragmatic perspective?!
  • subsection 17.8.4, equation (17.29):
    • the variable $\mathbf{z}$ seems to be unbound—where does it come from?
    • are there any particular properties that the functions $r_1$ and $r_2$ should have? Defining the regularization terms as arbitrary functions seemed a bit too general to me (e.g., they could negate their input argument or ignore it completely).
    • any intuition on why these two regularization terms encourage disentanglement? Are they related to beta-VAE and total correlation VAE? (The connection between those concepts was not immediately clear to me.)
    • (minor) if $L_\mathrm{new}$ denotes a loss, maybe it would be a bit more precise to use a negative ELBO and positive regularizers?
  • page 352: Maybe the wording can be improved in the following: "[...] then autoencoder is just [...] PCA. Hence, the autoencoder is a generalization of PCA."; for example, by saying "Hence, a nonlinear autoencoder is a generalization of PCA," or something along these lines?
  • page 352, second paragraph on "Latent space, prior and posterior": there is an undefined reference
  • page 353, paragraph on "Posterior collapse", last sentence: there is an undefined reference
  • page 353, paragraph on "Other problems":
    • From the description the "information preference" problem seems very similar to posterior collapse. What is the difference between the two?
    • The paragraph cites Chen et al (2017) for InfoVAE, but I wonder whether the correct reference isn't

Zhao, Shengjia, Jiaming Song, and Stefano Ermon. "InfoVAE: Information maximizing variational autoencoders." AAAI (2017).

  • page 354, paragraph on "Disentangling latent representation": "e.g.," should probably be "etc."?
  • equation 17.35:
    • is the subscript $z$ correct in the expectation? There is no $z$ defined elsewhere.
    • on the left-hand side of the equation "f" is written in roman typeface, while on the right it is written in italics; is this correct? I'm not yet sure what the convention is for each of the two variants.

Miscellaneous minor typos

Page 252. There is an extra closing bracket ] in Eq. (13.4)

Page 292 (Sec 15.3) missing a full stop at the end of the sentence beginning with “Mini-batch discrimination ensures”

Page 318 (Sec 16.3.5) there is a sentence that reads "This requires a different formuation of normalizing flows that learns to from another function rather than a set of samples" – typo in "formulation", and "to from" should be just "from", I think.

Typo in figure 3.8

I believe in figure 3.8, there seems to be a small typo.

g-h) The clipped planes are then weighted

should be:

g-i) The clipped planes are then weighted

Minor issue on color of circles in Fig. 2.3

In the caption of Fig. 2.3, "The three circles represent the three lines from figure 2.2b-d", the two circles corresponding to 2.2b and 2.2d have the same color as their respective lines, but the circle corresponding to 2.2c has a different color.

Missing "personal" in equation 12.15

The word "personal" is not present in any term of the factorization shown in 12.15.

Note that the next paragraph (§12.7.2) also does not include the word "personal" when it talks about the right context.

Possible typo in equation 5.31 (dx instead of dy)

Hey, it's Roy the undergraduate student from your previous LinkedIn post.

I think there is a typo (dx instead of dy) in equation 5.31 but I may be wrong.


And I would like to ask a small question, how did we get from the first expression in the equation to the second expression?

Thanks!

Other suggestions on chapter 19

Version 2023-02-02
Fig. 19.7: "Blue arrow" but the arrow is not blue
Eq. 19.7 --> I think that $\pi[a_t,s_t]$ should be substituted by $\pi[a_t|s_t]$
Eq. 19.11 --> I think it should be $\sum_{a_t}$ and not $\sum_{a}$
Eq. 19.12 --> I think it should be $\arg \max_{a_t}$ and not $\arg \max_{a}$; $r[s_t, a_t]$ should replace $r[s, a]$

Suggestions on paragraph 19.5

Version 06-02-2023
Eq. 19.24: $\frac{\partial Pr(\tau_i | \theta)}{\partial \theta}$ --> $\frac{\partial Pr(\tau| \theta)}{\partial \theta}$

Second line of Eq. 19.25: The same problem of equation 19.24

Check the second line of eq. 19.25: it looks like there is an unnecessary ']', and the $\pi$ should be bold

Check eq. 19.31: it looks like there is an unnecessary ']'
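As context for equations 19.24-19.25 (my own note, not a correction): they presumably rely on the log-derivative identity

$$\frac{\partial Pr(\tau | \theta)}{\partial \theta} = Pr(\tau | \theta) \, \frac{\partial \log Pr(\tau | \theta)}{\partial \theta},$$

which is why only the derivative of $\log Pr(\tau | \theta)$ (and hence of the policy terms) needs to be evaluated.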

Page 112 code indentation

There is a small indentation issue in the latest version: the "for i, data in enumerate(data_loader):" statement should have the same indentation as "epoch_loss = 0.0".
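For clarity, here is a minimal sketch of the intended structure (with a hypothetical model and data standing in for the book's code): the batch loop must be nested inside the epoch loop, at the same indentation level as epoch_loss = 0.0.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the book's model and data (not the original code).
model = nn.Sequential(nn.Linear(1, 10), nn.ReLU(), nn.Linear(10, 1))
data_loader = DataLoader(TensorDataset(torch.randn(32, 1), torch.randn(32, 1)),
                         batch_size=8)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(4):
    epoch_loss = 0.0
    # This loop is indented the same as `epoch_loss = 0.0`, so it runs once
    # per epoch and iterates over all batches.
    for i, data in enumerate(data_loader):
        x_batch, y_batch = data
        optimizer.zero_grad()
        loss = loss_fn(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
```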

Figure reference error & other minor errata, page 437 (v. 2023-03-03)

v. 2023-03-03
Page 437, Appendix C

Section C.3.1:

  • problem with the exponential function figure reference.
  • problem with the logarithm function figure reference.

Section C.3.2: the part about Stirling's formula is still empty.

Section C.3.3:

  • Equation C.3 : Two different font styles for the function f without explanation (I assume the second one is the complex conjugate).
  • Missing space in "(i.e.the time lag)." should be "(i.e. the time lag)."

Minor typos in 10.2.5 on page 167 and the side column on page 176

The second line in the second paragraph on page 167: change "thes hidden units" to "the hidden units"?
Maybe change "Problems 10.17 - 10.16" to "Problems 10.16 - 10.17" on page 176?

Your book is a fantastic exposition of deep learning, certainly the best I have ever read on the subject. Thank you so much for your work!

errata_2

On page 423, the paragraph starting with "Pruning can be considered a form of ..." may be missing the symbol "(i)".

Minor corrections on chapter 18, version 31-01-23-C

Thanks for releasing the book! I've enjoyed going through it!

Here are some typos that I could spot while reading chapter 18:

  • page 361, line 1: missing a comma in the formula; should be $q(z_{1:t}, x)$
  • page 361, second paragraph: $x$ has an extra subscript, should probably be $q(z_t|x)$ instead of $q(z_t | x_t)$
  • equation 18.14: missing a left bracket
  • equations 18.16 (last line) and 18.26: $σ^2_t$ should probably be $σ^2_1$
  • equation 18.19: $Φ_{1...t}$ should probably be $Φ_{1...T}$
  • section 18.4 onwards: $z_{1:t}$ should probably be $z_{1:T}$
  • section 18.4 onwards: inconsistent indexing for $z$: previous material uses $z_{1:T}$, but here $z_{1...T}$ is used interchangeably
  • section 18.4.1, last paragraph: by "modifying itself" should we also understand "optimize the parameters" (as mentioned in the second item), or is there a subtle difference?
  • section 18.4.2, first sentence: "minimize" should probably be "maximize"?
  • algorithm 18.1: what is $q$? I assume it has nothing to do with the variational distribution, right?
  • algorithm 18.1: I'm somewhat confused by the notation: $i$ and $\mathcal{B}$. I assume the former is the batched dataset? But then $i$ should range over batches?
  • algorithm 18.2, last line: $g_1$ should probably be $f_1$
  • section 18.6.1, last paragraph: shouldn't it say that $q(z_{t-1} | z_t)$ becomes closer to normally distributed with larger time steps?
  • section 18.6.1, last paragraph (minor): I don't know if T should be enclosed in a math environment, i.e., $T$
  • section 18.6.2, first sentence (minor): I don't know if you want to use the $t$ subscripts for the $α$'s (to match $z_t$)

The following questions might be out of the scope of the book, but I do wonder:

  • How are the fixed parameters (β, T, σ) chosen in practice?
  • Can we (efficiently and accurately) estimate $p(x)$ for a given sample? This could potentially be useful for out-of-distribution detection. For example, I know that normalizing flows, while in principle they could be used for estimating the model's probability, usually perform poorly at this in practice [1]. For diffusion models, there is a discussion of this in section 2.3 of [2], but I'm not sure I follow their argument.

References:

  • [1] Kirichenko, P., Izmailov, P., & Wilson, A. G. (2020). Why normalizing flows fail to detect out-of-distribution data. NeurIPS, 33, 20578-20589.
  • [2] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015, June). Deep unsupervised learning using nonequilibrium thermodynamics. In ICML (pp. 2256-2265). PMLR.

Page 434 Figure B.3

The figure text says '...is used to show that the Kullback-Leibler divergence is always greater than one'. The KL divergence is always greater than or equal to zero.
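For reference, the standard statement is

$$D_{KL}\big[ q(x) \,\|\, p(x) \big] = \mathbb{E}_{q(x)}\!\left[ \log \frac{q(x)}{p(x)} \right] \geq 0,$$

with equality if and only if $q(x) = p(x)$.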

Typo in Fig 19.3 and doubt on Fig 19.4

Version 31_01_23_C
The caption of Fig 19.3 mentions "blue arrows", but in the figure the arrows don't look blue.

In Fig 19.4, policy a) doesn't lead the agent to the reward as indicated in the caption ("This policy generally steers the penguin from top-left to bottom-right where the reward lies"). The actions in states 4 and 8 should be changed to "down".

Also, policy b) doesn't reach the goal for several initial states. Is this the desired behavior?

Expanding the explanation of backpropagation equations

Hey Prof. Simon, it's Roy Amoyal the undergraduate student again.

Something that I have noticed in our Deep Learning class at my university, and in the book, is that the backpropagation process is the most "mathematically challenging" part for most of the students. Unfortunately, when most lecturers explain the backpropagation math, they skip a reminder of the chain rule and a simple illustration using the neural network notation. While for some students it may be obvious, most students lose the lecturer at that point (including me).

Although you give a reference to the "Matrix calculus" Appendix C.2 next to equation 7.6, I think something like the following equations could be quite helpful and even dramatically improve understanding of the backpropagation process before jumping into equation 7.6.

For example:
Equation 7.5: [screenshot]

Reminder of the chain rule (composition of 3 functions):
(self-note: I think it's better to give the reminder with 3 functions instead of just 2)
If $m(t) = (f \circ g \circ k)(t) = f(g(k(t)))$, then $m'(t) = f'(g(k(t))) \cdot g'(k(t)) \cdot k'(t)$.

Because (self-note: here I think it is really important to explicitly write the equation with the substitutions):
(loss function) $\ell_i = \mathrm{l}[\mathbf{f}_3, y_i] = \mathrm{l}[\boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3 \mathbf{h}_3, y_i] = \mathrm{l}[\boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3 \, a[\mathbf{f}_2], y_i] = \mathrm{l}[\boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3 \, a[\boldsymbol{\beta}_2 + \boldsymbol{\Omega}_2 \mathbf{h}_2], y_i]$
(substituting the expressions from equation 7.5). Because $\mathbf{h}_3 = a[\mathbf{f}_2]$ is an inner function of this composition, just like $k(t)$ is the inner function inside $f$ and $g$ in the reminder, we get:

[screenshot]

Because we are doing the "backpropagation", we first calculate the derivative of the most "inner" function of the neural network, in our case $\mathbf{h}_3 = a[\mathbf{f}_2]$, so we first calculate:
[screenshot]
and then we keep calculating the expressions backward.
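For example (my own sketch using the notation of equation 7.5; the book may group or order the factors differently), the derivative with respect to $\boldsymbol{\beta}_2$ picks up one factor per substitution:

$$\frac{\partial \ell_i}{\partial \boldsymbol{\beta}_2} = \frac{\partial \ell_i}{\partial \mathbf{f}_3} \, \frac{\partial \mathbf{f}_3}{\partial \mathbf{h}_3} \, \frac{\partial \mathbf{h}_3}{\partial \mathbf{f}_2} \, \frac{\partial \mathbf{f}_2}{\partial \boldsymbol{\beta}_2},$$

where, for vector-valued quantities, the factors must be arranged (or transposed) so that the matrix dimensions agree.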

What do you think?
It could be really helpful to understand this topic better and faster.

Thanks, Roy.

Minor errata, pp. 56-60 & 426(v. 2023-03-03)

In v. 2023-03-03
Page 56, Loss functions:

  • For binary classification, I think " $y \in [0, 1]$ " should be " $y \in \{0, 1\}$ " because it's a set of two integers (or classes) not an interval of all possible real values between 0 and 1.
  • For multiclass classification, I think " $y \in [1, 2, ..., K]$ " should be " $y \in \{1, 2, ..., K\}$ " for the same reason

Page 57, figure 5.1 :

  • " $y \in \mathcal{R}$ " should be " $y \in \mathbb{R}$ "

Page 58, Sec. 5.1.1 :
I'm probably wrong about these, but :

  • "on the output domain $\boldsymbol{\mathrm{y}}$ ", "the prediction domain is $y \in \mathbb{R}$ " and " which is defined on $y \in \mathbb{R}$ " seem a little odd to me, because the definition domain is $\mathbb{R}$ (or $\mathbb{R}^D$ in the first one) not $y$ (or $\boldsymbol{\mathrm{y}}$). The only case where I would put $y$ is if I wrote it as a set $\{y \in \mathbb{R}\}$.
  • "The machine learning model $\boldsymbol{\mathrm{f}}[\boldsymbol{\mathrm{x}}, \boldsymbol{\phi}]$ ", are you always talking about a univariate problem ? If it's the case then it should be " $\mathrm{f}[\boldsymbol{\mathrm{x}}, \boldsymbol{\phi}]$ ".
  • In the footnote "As a function of $\phi$ " should be "As a function of $\psi$ "

Page 60, equation 5.5 :

  • I believe $\phi$ needs a hat $\hat{\phi}$ because it was estimated by minimizing the negative log likelihood (like in equation 5.12).

Page 62, equation 5.12 :

  • I believe $\mathrm{f}$ shouldn't be bold because it returns a scalar: $\mu$ or as explained later $\hat{y}$

Page 426, B.1.5 :

  • "If the value of the random variable variable $y$ "

Typo in Caption of Fig. 4.1

Version 2023_01_16, Fig 4.1 Caption:
"The first network maps inputs x ∈ [0,1] to outputs y ∈ [0,1]" --> "The first network maps inputs x ∈ [-1,1] to outputs y ∈ [-1,1]"

Typo in fig. 12.9

Version 2023_01_31

Fig 12.9: The vocabulary is indicated as $\Omega_v$ in the figure, but in the text and in the caption $\Omega_e$ is used to refer to it.

Minor typo Ch9 Notes

UnderstandingDeepLearning_03_03_23_C.pdf Chapter 9 Page 157

The paragraph preceding equation 9.18 contains the text "the second term on the right-hand side must equal zero" but I think it was meant to say "the third term on the right-hand side must equal zero"

Thank you for writing this book!

Minor notation errata, pp. 421-422 (v. 2023-03-03)

In v. 2023-03-03
Page 421, Appendix A, Sets:

  • "The notation $\{\boldsymbol{\mathrm{x}}_i, \boldsymbol{\mathrm{y}}_i\} _{i=1}^I$ denotes the set of $I$ pairs $x_i, y_i$ ."
    Since x and y are vectors and not scalars I think it should be:
    "The notation $\{\boldsymbol{\mathrm{x}}_i, \boldsymbol{\mathrm{y}}_i\}^I _{i=1}$ denotes the set of $I$ pairs $\boldsymbol{\mathrm{x}}_i, \boldsymbol{\mathrm{y}}_i$."

  • In $\{1, 2, 3, ...,\}$ I would remove the last comma: $\{1, 2, 3, ...\}$

Page 422, Appendix A, Sets:

  • In $\{1, ... K\}$ I would add a comma just before $K$: $\{1, ..., K\}$

errata_1

In the "Notes" at page 403, there may be a grammatical error in the first sentence.

A typo

Fig 15.20: "Changing course styles" -> "Changing coarse styles" ? Same for "course" noise.
