mml-book / mml-book.github.io
Companion webpage to the book "Mathematics For Machine Learning"
There are m rows in A (not k).
Eq. 2.12 uses different dimensions from those in line 844, which does not seem necessary to me. Shouldn't the dimensions be consistent here?
Describe the mistake
The way this line is worded seems to suggest that the given Moore-Penrose pseudo-inverse can work for non-invertible square matrices.
Location
Proposed solution
Perhaps give the mild assumptions required for 2.46 to hold. (The columns of the matrix must be linearly independent).
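As a minimal numerical sketch of the proposed assumption (the matrix `A` here is a hypothetical example, not one from the book): when the columns of a tall matrix are linearly independent, the left pseudo-inverse formula agrees with the Moore-Penrose pseudo-inverse.

```python
import numpy as np

# Hypothetical example: a tall matrix A whose columns are linearly
# independent, so A^T A is invertible and Eq. 2.46 applies.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])

left_pinv = np.linalg.inv(A.T @ A) @ A.T  # (A^T A)^{-1} A^T
print(np.allclose(left_pinv, np.linalg.pinv(A)))  # True
```

For a matrix with linearly dependent columns, `A.T @ A` is singular and the formula breaks down, which is exactly the assumption being requested.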
end of sentence "use a bold letter to them," --> "use a bold letter to denote them,"
Describe the mistake
While the line "If A is invertible (A^{-1})^T = (A^T)^{-1}" is certainly true, it would be useful to state somewhere that we know that (A^T)^{-1} exists.
Location
Proposed solution
Replace the line with: "If A is invertible then so is A^T and (A^{-1})^T = (A^T)^{-1}"
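The proposed statement can be sketched numerically (the matrix below is an assumed example, not taken from the book):

```python
import numpy as np

# Assumed example: A is invertible (det = 1), so A^T is invertible too,
# and transposition and inversion commute.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

lhs = np.linalg.inv(A).T   # (A^{-1})^T
rhs = np.linalg.inv(A.T)   # (A^T)^{-1}
print(np.allclose(lhs, rhs))  # True
```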
Why aren't anti-parallel vectors the most dissimilar?
In chapter 5, page 118, line 2732: univariate is spelled incorrectly as "Univeriate"
Version 30/1/2018
Line 8383
There is a typo in the word "misspelled".
(Demo entry)
Hi,
Firstly, I love the style of this book - very clear and precise. I'm a big fan of Strang's lectures too.
On page 175 you mention some hardware-oriented constraints that encourage the use of large batch sizes to optimise performance in SGD, but these are predicated upon the use of GPUs, which will not remain the dominant form of processor for ML for much longer. New devices are being produced right now that don't have the memory bandwidth or data-path width issues of GPUs, and papers such as this one: https://arxiv.org/abs/1804.07612 show that small batches are better for a number of reasons.
I assume you don't want your book to be out of date, or bound to existing and dated hardware practices.
Will definitely be buying this book as a reference resource!
Cheers,
Chris
Add "represent" or a similar word to the following phrase in line 751 "use a bold letter to them"
Missing line numbers and words in chapter 10
Version number: Draft chapter (May 28, 2018)
Many lines in chapter 10 PDF have line numbers missing. Examples:
-See line number 4442-4443 (page 255)
-See line number 4501-4502 (page 257)
-See line number 4518-4519 (page 257-258)
Also, in some places words seem to be missing or contain minor typos.
Thanks.
Describe the mistake
In Equation 5.46 the (2x+1) term should have the 3 exponent in the last two steps.
Location
The definite article "The" implies that there are no other definitions for an inner product of functions. But there is, for instance, a more general form with a weighting function corresponding to a metric depending on x. Maybe "An inner product of two functions can be defined as ..." would be more appropriate.
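The more general weighted form being referred to could, for instance, be written as (a sketch; the interval and weight are assumptions, not the book's notation):

```latex
\langle f, g \rangle = \int_a^b f(x)\, g(x)\, w(x)\, \mathrm{d}x,
\qquad w(x) > 0,
```

with the unweighted definition recovered for $w(x) \equiv 1$.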
I appreciate that the authors are being careful about the scope of the book, but in my own studies of ML, I've noticed that basic information theory would go a long way in helping understand some important concepts. I'm thinking of KL-Divergence, information gain, AIC, cross entropy and other concepts which show up even in basic ML.
Some ML books that do introduce information theory start by talking about coding theory and communication channels, which may have been what motivated information theory but seems like the wrong approach for teaching it to data scientists or ML practitioners.
This paper takes an interesting approach: it starts with KL divergence before even introducing entropy.
Divergence, Entropy, Information
https://arxiv.org/abs/1708.07459
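To illustrate how little machinery the core concept needs, here is a minimal sketch of KL divergence between two discrete distributions (the distributions `p` and `q` are made-up examples):

```python
import numpy as np

# D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.
def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))      # 0.0: a distribution diverges from itself by zero
print(kl_divergence(p, q) > 0)  # True: KL divergence is non-negative
```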
I don't see the purpose of the additional complexity by mentioning subspaces here.
The London-Munich example seems to imply that orthogonal vectors are independent while a third (non-orthogonal) one is not. Of course, any pair is independent here, and a third one makes the set dependent. Maybe use East and Southeast as the first pair and South as the third. (A more natural non-orthogonal coordinate system would be even better, but none comes to mind.)
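The suggested replacement example can be checked with a quick rank computation (the coordinate choices below are my own assumption for the compass directions):

```python
import numpy as np

# Assumed coordinates: East = (1, 0), Southeast = (1, -1), South = (0, -1).
east = np.array([1.0, 0.0])
southeast = np.array([1.0, -1.0])
south = np.array([0.0, -1.0])

# The non-orthogonal pair is independent; adding South makes the set dependent,
# since South = Southeast - East.
print(np.linalg.matrix_rank(np.column_stack([east, southeast])))         # 2
print(np.linalg.matrix_rank(np.column_stack([east, southeast, south])))  # 2
```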
It should be (8, 2, -1, 0) instead of (8, 12, -1, 0).
From Hacker News:
Not sure if the authors will read this or not, but I beg of you, please put a table of notation in the foreword. The Sutton and Barto Reinforcement Learning book did that for basically every notation that wasn't basic algebra and it's been extremely helpful.
Just labeling things I had never seen before, like indicator functions, was extremely valuable. Especially for this kind of book that is introducing mathematics to people from a broad background, I think it's important to understand how much of an impediment it is to not know notation by sight. Trying to Google or search for notation is a nightmare.
Describe the mistake
The label used in between the Linear/Affine Mapping boxes overlaps with the box.
Location
Proposed solution
Maybe positioning it at the center of the arrow would help.
Describe the mistake
The line: "Adding the first two equations yields (1) + (2) = 2x_1 + 3x_3 = 5"
feels a little off because the two uses of = here are fundamentally different.
Really, (1) + (2) = (2x_1 + 3x_3 = 5), but this bracketing could be confusing.
Location
Proposed solution
Reword the line to be: Adding the first two equations ((1) + (2)) yields 2x_1 + 3x_3 = 5
or: Adding the first two equations yields 2x_1 + 3x_3 = 5
Describe the mistake
The caption of Figure 10.11 is not visible
Location
Proposed solution
There must be a bug in the figure environment.
Verification by plugging in is actually incorrect. The reason this is a unique solution to the system of linear equations stems from the theorem: a non-homogeneous system of linear equations has a unique solution if and only if the determinant is non-zero; otherwise it has either no solutions or infinitely many solutions. It is also possible I misunderstood the "verify by plugging in" statement: is it indicative of both uniqueness and existence, or only of existence? This may just be my lack of understanding of the statement, but maybe it can be made clearer.
Location
Proposed solution
What about something along the lines of: "The existence of the solution is verified by plugging in the vector, and the uniqueness is a result of determinants" (and provide a link/extra reading on the matter for those interested)?
Additional context
Determinants are a fairly useful concept when it comes to the understanding of basics of matrices, would it be valuable to have a mention of it in the chapter apart from the additional exercises presented at the end?
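The determinant criterion described above can be sketched numerically (the system `A`, `b` below is an assumed example, not the one from the book):

```python
import numpy as np

# Assumed example system Ax = b with a non-zero determinant,
# so the solution exists and is unique.
A = np.array([[2.0, 3.0],
              [1.0, -1.0]])
b = np.array([5.0, 0.0])

assert abs(np.linalg.det(A)) > 1e-12  # non-zero det => unique solution
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))  # True: existence verified by plugging in
```

Plugging in confirms existence; the determinant check is what establishes uniqueness.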
Would you mind using
What about
Use
Also, isn't the gradient (column vector) the transpose of the Jacobian (row vector)?
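The shape relationship in question can be sketched for a scalar-valued function (the example f(x) = x^T x is my own choice, not the book's):

```python
import numpy as np

# For f(x) = x^T x: the Jacobian is the 1 x n row vector 2 x^T,
# and the gradient is its transpose, the n x 1 column vector 2 x.
x = np.array([[1.0], [2.0], [3.0]])  # column vector, shape (3, 1)

jacobian = 2 * x.T   # shape (1, 3): row vector
gradient = 2 * x     # shape (3, 1): column vector
print(np.array_equal(gradient, jacobian.T))  # True
```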
What about having the differential operator d upright, rather than italic like a variable?
Figure 5.6 uses bold upright font, instead of the italic one.
Coloured equations (like 5.189) have bad spacing when the colour is changed. Have a look at the "+" spacing.
Is it possible to have a look at the TikZ source? Do you use any GUI, or you code them up from scratch? Say, Figure 5.9, for example.
Unless it is for educational purposes (to force the reader to really parse the equations carefully), the use of Greek letters for different purposes may be confusing. Can one use \Psi for instance?
For which space is A in eq 2.67 not a generating set? Eq 2.65 and 2.66 deal with R^3.
There is no precise explanation that the vector space operations + and \cdot are inherited from space V.
Furthermore, the term "R-vector space" seems not to be defined.
Both the explanation of figure 3.3 and the text after eq. 3.3 refer to norm 1. Adding axis units would convey the concept much clearer.
"The intersection of all subspaces U_i ⊆ V is called linear hull of V."
As {0} is a subspace of V, the intersection of all subspaces is always {0}.
The linear hull (linear span) is defined for a set of vectors.
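For reference, the definition for a set of vectors could be stated as (a sketch in generic notation, not necessarily the book's):

```latex
\operatorname{span}(\{\boldsymbol{x}_1, \dots, \boldsymbol{x}_k\})
= \Bigl\{ \textstyle\sum_{i=1}^{k} \lambda_i \boldsymbol{x}_i
\;\Bigm|\; \lambda_1, \dots, \lambda_k \in \mathbb{R} \Bigr\}.
```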
Describe the mistake
The wording in lines 920, 921 directly after the definition of a group seems to suggest that R, N, Z are groups with \otimes = + or \cdot, and that P(B) is a group under \cap or \cup.
Of those pairs, only (R, +) and (Z, +) actually form groups.
Location
Please provide the
Proposed solution
Just delete these lines. You give examples of these things directly below anyway.
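To make the counterexample concrete, here is a small enumeration (the two-element base set `B` is my own assumption) showing that P(B) under intersection has no inverses, so it cannot be a group:

```python
from itertools import combinations

# Powerset of B = {1, 2}. Under intersection, the identity would have to
# be B itself, since S & B == S for every subset S.
B = frozenset({1, 2})
powerset = [frozenset(c) for r in range(3) for c in combinations(B, r)]

# {1} has no inverse: {1} & X is always a subset of {1}, never all of B.
identity = B
has_inverse = any(frozenset({1}) & X == identity for X in powerset)
print(has_inverse)  # False: no inverse exists, so (P(B), ∩) is not a group
```

A symmetric argument (with the empty set as identity) shows (P(B), ∪) fails too.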
Doesn't the M-step maximize the expected joint likelihood p(x, z), where the expectation is taken under the posterior distribution? Eq. 12.82 and the following lines seem to suggest that the expected p(x | \theta) is maximized.
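The M-step objective being referred to could be written as the expected complete-data log-likelihood (a sketch in standard EM notation, which may differ from the book's):

```latex
\theta^{\text{new}} = \arg\max_{\theta}\;
\mathbb{E}_{p(\boldsymbol{z} \mid \boldsymbol{x}, \theta^{\text{old}})}
\bigl[ \log p(\boldsymbol{x}, \boldsymbol{z} \mid \theta) \bigr],
```

where the expectation is over the posterior computed in the E-step with the old parameters.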
three lines above:
homomorphism
Please find below a list of what I think are errors, and what I think could require modification.
Errata:
Line 2716: "where we look" instead of "where look"
Line 2855 (+5): "the gradient, we compute" instead of "the gradient compute"
Eq. (5.86): Index over the sum should be N instead of D
Eq. (5.103): Index over the sum should be m instead of n
Line 2895: ", and every" instead of ", end every"
Line 2906: "a taste" instead of "a taster"
Eq. (5.117), (5.139), (5.146): Error of sign in the derivative of the square root function
Line 2920: "f_{i}(x_{i-1})" instead of "f_{i}(x)"
Line 2925: "j" instead of "i"
Eq. (5.142): The second element of the second term of the RHS should be the partial derivative of e with respect to c instead of the partial derivative of d with respect to c.
Eq. (5.184), (5.185): Not being very knowledgeable about it, this felt very counter-intuitive. I would have derived equation (5.184) by applying the operator d/dx on H, but it seems the equation was derived by applying it from the inside. Though it does not modify the final result, it feels odd.
Feedback:
Eq. (5.98): Would have found it more intuitive to have a transpose of the zero vector right before the transpose of x in the RHS of the equation.
Line 2927: Notation seems ambiguous. Clarifying to make sure that people understand that \theta is the set of all A_{.} and b_{.}, in contrast with \theta_{j} which contains only the associated A_{j} and b_{j}, would be nice.
I did truly appreciate that chapter.
When am I going to be able to buy and review this book?
Also, the link https://www.mml-book.com/ gives me a "site cannot be reached" error.
Rationale: Scaling is still a binary relation (as opposed to the more general checks starting from line 1260) and is almost as easy to check as identity.
According to eq. 3.11 the inner product <x, y> (LHS) is defined by both the co-ordinates of x and y (\hat{x} and \hat{y}) and A (RHS).
line 761: "There is a 1:1 correspondence between any kind of vector and R^n." Does kind here mean any dimension? Or any field? What about infinite dimensional vectors?
"Tuples" is written as "tupels".
Version: Draft (2018-02-25)
Chapter 2, Page 46, Line 1399
"Inversion" is misspelled.