
mml-book.github.io's People

Contributors

analogaldo, chengsoonong, gliptak, karsumit94, madsjensen, mpd37, sanket-kamthe, zoenolan

mml-book.github.io's Issues

Being more precise when giving an example of the Moore-Penrose pseudo-inverse.

Describe the mistake
The way this line is worded seems to suggest that the given Moore-Penrose pseudo-inverse can work for non-invertible square matrices.

Location
Please provide the

  1. version: Draft 2018-05-28
  2. chapter: 2
  3. page: 29
  4. line number/equation number: 870/2.46

Proposed solution
Perhaps give the mild assumptions required for 2.46 to hold (the columns of the matrix must be linearly independent).
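
As a quick illustration of why the assumption matters (my own numpy sketch, not taken from the book): the explicit formula (A^T A)^{-1} A^T only applies when A^T A is invertible, i.e. when the columns of A are linearly independent, while the Moore-Penrose pseudo-inverse itself always exists.

```python
import numpy as np

# Full column rank: the explicit formula agrees with np.linalg.pinv.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
assert np.allclose(np.linalg.inv(A.T @ A) @ A.T, np.linalg.pinv(A))

# Linearly dependent columns: A^T A is singular, so the formula breaks down.
B = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
try:
    np.linalg.inv(B.T @ B)
except np.linalg.LinAlgError:
    print("A^T A is singular: the explicit formula does not apply")
print(np.linalg.pinv(B))  # the pseudo-inverse is still well-defined
```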

751 missing "denote"

End of sentence: "use a bold letter to them," --> "use a bold letter to denote them,"

Writing (A^T)^{-1} without stating that A^T is invertible.

Describe the mistake
While the line "If A is invertible (A^{-1})^T = (A^T)^{-1}" is certainly true, it would be useful to state somewhere that we know that (A^T)^{-1} exists.

Location

  1. version: 2018-05-28
  2. Chapter: 2
  3. page : 20
  4. line number/equation number: 763

Proposed solution
Replace the line with: "If A is invertible then so is A^T and (A^{-1})^T = (A^T)^{-1}"
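
In support of the proposed wording, here is a one-line argument of my own (not a quote from the draft) for why A^T is invertible whenever A is:

```latex
A^\top (A^{-1})^\top = (A^{-1} A)^\top = I^\top = I
\quad\Longrightarrow\quad
(A^\top)^{-1} = (A^{-1})^\top .
```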

V300118 L8383

Version 30/1/2018
Line 8383
There is a typo in the word misspelled

(Demo entry)

Batch sizes should not be tied to hardware constraints

Hi,

Firstly, I love the style of this book - very clear and precise. I'm a big fan of Strang's lectures too.

On page 175 you mention some hardware-oriented constraints that encourage the use of large batch sizes to optimise performance in SGD, but these are predicated upon the use of GPUs, which will not remain the dominant form of processor for ML for much longer. New devices are being produced right now that don't have the memory-bandwidth or data-path-width issues of GPUs, and papers such as this one: https://arxiv.org/abs/1804.07612 show that small batches are better for a number of reasons.

I assume you don't want your book to be out of date, or bound to existing and dated hardware practices.

Will definitely be buying this book as a reference resource!

Cheers,

Chris

Line751: Word missing

Add "represent" or a similar word to the following phrase in line 751 "use a bold letter to them"

Missing line numbers, words and typos (pages 255-259, chapter 10)

Missing line numbers and words in chapter 10
Version number: Draft chapter (May 28, 2018)

  1. Many lines in the chapter 10 PDF have line numbers missing. Examples:
    -See line number 4442-4443 (page 255)
    -See line number 4501-4502 (page 257)
    -See line number 4518-4519 (page 257-258)

  2. Also, in some places words seem to be missing or there are minor typos.

  • See line number 4480, page 256: the sentence "[...] corresponding to the training" seems to be missing the word "data".
  • See line number 4481, page 256: in the sentence "Intuitively we imagine nice data for binary classification [...]", did you mean "nicely separable" here? Otherwise it would be useful to let the reader know what "nice" means in this context.
    I haven't read the earlier chapters (came across this just today), so I apologize in advance if this has been explained before. In that case, referring back to it could still be useful.
  • See line number 4483, page 256: "[...] arranged in such as way as to allow [...]" --> "[...] arranged in such a way as to allow [...]"
  • See line number 4527: "[...] have a many possible classifiers." --> "[...] have many possible classifiers."

Thanks.

Missing exponent Equation 5.46

Describe the mistake
In Equation 5.46, the (2x+1) term should carry the exponent 3 in the last two steps.

Location
Please provide the

  1. Version: Draft (2018-02-14)
  2. Chapter: 5
  3. Page: 122
  4. Equation: 5.46

before eq. 3.27 : The inner product of two functions ...

The definite article "The" implies that there is no other definition of an inner product of functions. But there is, for instance, a more general form with a weighting function corresponding to a metric depending on x. Maybe "An inner product of two functions can be defined as ..." would be more appropriate.
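
For concreteness, the weighted form alluded to above is the standard definition (stated from general knowledge, not quoted from the book):

```latex
\langle f, g \rangle_w = \int_a^b f(x)\, g(x)\, w(x) \,\mathrm{d}x ,
\qquad w(x) > 0 ,
```

which reduces to the inner product given before Eq. 3.27 when w(x) ≡ 1.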

Is it possible to get an information theory chapter?

I appreciate that the authors are being careful about the scope of the book, but in my own studies of ML, I've noticed that basic information theory would go a long way in helping understand some important concepts. I'm thinking of KL-Divergence, information gain, AIC, cross entropy and other concepts which show up even in basic ML.

Some books that do introduce information theory start by talking about coding theory and communication channels, which may have been what motivated information theory but seems like the wrong approach for teaching it to data scientists or ML practitioners.

This paper takes an interesting approach: it starts with KL divergence before even entropy.
Divergence, Entropy, Information
https://arxiv.org/abs/1708.07459
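
For readers new to these terms, the quantities mentioned above are related by a standard identity (general knowledge, not a quote from the paper):

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \;\ge\; 0 ,
\qquad
H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) ,
```

where H(p, q) is the cross entropy and H(p) the entropy.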

line 1218: example seems to wrongly imply that independence is orthogonality

The London-Munich example seems to imply that orthogonal vectors are independent while a third (non-orthogonal) one is not. Of course, any pair here is linearly independent, and adding a third vector makes the set dependent. Maybe use East and Southeast as the first pair and South as the third. (A more natural non-orthogonal coordinate system would be even better, but none comes to mind.)
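
A minimal numpy check (my own illustration, not from the book) that a non-orthogonal pair is still linearly independent, while any three vectors in R^2 are dependent:

```python
import numpy as np

east = np.array([1.0, 0.0])
southeast = np.array([1.0, -1.0])   # not orthogonal to east
south = np.array([0.0, -1.0])

# east and southeast are not orthogonal (dot product != 0) ...
print(east @ southeast)  # 1.0

# ... yet they are linearly independent: the stacked matrix has rank 2.
print(np.linalg.matrix_rank(np.column_stack([east, southeast])))  # 2

# Any three vectors in R^2 are linearly dependent: rank stays at 2 < 3.
print(np.linalg.matrix_rank(np.column_stack([east, southeast, south])))  # 2
```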

Add a table of notations

From Hacker News:

Not sure if the authors will read this or not, but I beg of you: please put a table of notation in the foreword. The Sutton and Barto Reinforcement Learning book did that for basically every notation that wasn't basic algebra, and it's been extremely helpful.
Just labeling things I had never seen before, like indicator functions, was extremely valuable.

Especially for this kind of book, which introduces mathematics to people from a broad range of backgrounds, I think it's important to understand how much of an impediment it is to not know notation by sight. Trying to Google or search for notation is a nightmare.

Cosmetic fix in Chapter 2: Figure 2.2

Describe the mistake
The label used in between the Linear/Affine Mapping boxes overlaps with the box.

Location
Please provide the

  1. version (bottom of page)
  2. Chapter 2
  3. Page 17
  4. Figure 2.2

Proposed solution
Maybe positioning it at the center of the arrow would help.

Minor wording issue in the second example in section 2.1.

Describe the mistake
The line: "Adding the first two equations yields (1) + (2) = 2x_1 + 3x_3 = 5"
feels a little off because the two uses of "=" here are fundamentally different.

Really, (1) + (2) = (2x_1 + 3x_3 = 5), but this bracketing could be confusing.

Location
Please provide the

  1. version: (2018-05-28)
  2. Chapter: 2
  3. page: 16
  4. line number/equation number : Below 701/2.4

Proposed solution
Reword the line to be: Adding the first two equations ((1) + (2)) yields 2x_1 + 3x_3 = 5
or: Adding the first two equations yields 2x_1 + 3x_3 = 5

Figure 10.11

Describe the mistake
The caption of Figure 10.11 is not visible.

Location
Please provide the

  1. version (bottom of page): 2018-05-28
  2. Chapter: 10
  3. page: 275
  4. line number/equation number: 4762

Proposed solution
There must be a bug in the figure environment

Possible Misinterpretation in Uniqueness of solutions to Systems of Linear Equations

Verification by plugging in is actually incorrect, at least as far as uniqueness is concerned. The reason this is the unique solution to the system of linear equations stems from the theorem: a non-homogeneous system of linear equations with a square coefficient matrix has a unique solution if and only if the determinant is non-zero; otherwise it has either no solutions or infinitely many. It is also possible I misunderstood the "verify by plugging in" statement: is it indicative of both uniqueness and existence, or only of existence? This may just be my lack of understanding of the statement, but maybe it can be made clearer.

Location

  1. (2018-02-25)
  2. Chapter 2
  3. Page 18
  4. Line Number 834

Proposed solution
What about something along the lines of: "The existence of the solution is verified by plugging in the vector, and the uniqueness is a result of the determinant" (and providing a link/extra reading on the matter for those interested)?

Additional context
Determinants are a fairly useful concept for understanding the basics of matrices. Would it be valuable to mention them in the chapter, apart from the additional exercises presented at the end?
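
As a small sketch of the determinant test being proposed (my own numpy example with hypothetical numbers, not the book's system):

```python
import numpy as np

# Square system A x = b.
A = np.array([[2.0, 3.0],
              [1.0, -1.0]])
b = np.array([5.0, 0.0])

# Non-zero determinant <=> exactly one solution exists.
if abs(np.linalg.det(A)) > 1e-12:
    x = np.linalg.solve(A, b)
    print("unique solution:", x)
    # "Plugging in" only confirms that x is a solution (existence) ...
    assert np.allclose(A @ x, b)
    # ... while det(A) != 0 is what guarantees uniqueness.
else:
    print("det(A) = 0: no solution or infinitely many")
```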

Cumulative feedback on first view

Would you mind using $\mathbb{R}$ for the blackboard R?
What about $\mathbb{I}$ for the identity matrix?
Use $\varnothing$ instead of $\emptyset$?
Also, isn't the gradient (column vector) the transpose of the Jacobian (row vector)?
What about having the differential operator d upright, and not like a variable?
Figure 5.6 uses bold upright font, instead of the italic one.
Coloured equations (like 5.189) have bad spacing when the colour is changed. Have a look at the "+" spacing.
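
For the notation points above, a minimal LaTeX sketch of the suggested conventions (standard amssymb macros; this is my illustration, not the book's actual preamble):

```latex
\usepackage{amssymb}            % provides \mathbb and \varnothing
\newcommand{\R}{\mathbb{R}}     % blackboard R
\newcommand{\eye}{\mathbb{I}}   % identity matrix
\newcommand{\dd}{\mathrm{d}}    % upright differential operator
% usage: \int_{\R} f(x) \,\dd x , \qquad S \neq \varnothing
```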

Is it possible to have a look at the TikZ source? Do you use any GUI, or do you code them up from scratch? Say, Figure 5.9, for example.

Figure 3.3: missing axis units

Both the explanation of Figure 3.3 and the text after Eq. 3.3 refer to norm 1. Adding axis units would convey the concept much more clearly.

Wording suggests you're giving examples of groups as opposed to simply operations.

Describe the mistake

The wording in lines 920-921, directly after the definition of a group, seems to suggest that R, N, Z are groups with \otimes = + or \cdot, and that P(B) is a group under \cap and \cup.

Of those pairs, only (R, +) and (Z, +) actually form groups: (N, +) lacks inverses, (R, \cdot) fails because 0 has no multiplicative inverse, and (P(B), \cap) and (P(B), \cup) lack inverses as well.

Location
Please provide the

  1. version (bottom of page): 2018-05-28
  2. Chapter: 2
  3. page: 31
  4. line number/equation number: 920 - 921

Proposed solution

Just delete these lines. You give examples of these things directly below anyway.

EM revisited - Chapter 12

Doesn't the M-step maximize the expected joint likelihood p(x, z), where the expectation is taken under the posterior distribution? Eq. 12.82 and the following lines seem to suggest that the expected p(x | \theta) is maximized instead.
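
For reference, the textbook M-step objective the comment appeals to (standard EM, written from general knowledge rather than quoted from Chapter 12):

```latex
Q\bigl(\theta \mid \theta^{(t)}\bigr)
= \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}
  \bigl[\log p(x, z \mid \theta)\bigr] ,
\qquad
\theta^{(t+1)} = \arg\max_{\theta} Q\bigl(\theta \mid \theta^{(t)}\bigr) .
```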

[Chapter 5] Errata & feedback

Would you please find here below a list of what I think to be errors, and what I think could require modifications.

Errata:
Line 2716: "where we look" instead of "where look"
Line 2855 (+5): "the gradient, we compute" instead of "the gradient compute"
Eq. (5.86): Index over the sum should be N instead of D
Eq. (5.103): Index over the sum should be m instead of n
Line 2895: ", and every" instead of ", end every"
Line 2906: "a taste" instead of "a taster"
Eq. (5.117), (5.139), (5.146): Error of sign in the derivative of the square root function
Line 2920: "f_{i}(x_{i-1})" instead of "f_{i}(x)"
Line 2925: "j" instead of "i"
Eq. (5.142): The second element of the second term of the RHS should be the partial derivative of e with respect to c instead of the partial derivative of d with respect to c.
Eq. (5.184), (5.185): Not being very knowledgeable about it, I found this very counter-intuitive. I would have derived Eq. (5.184) by applying the operator d/dx to H, but it seems the equation was derived by applying it from the inside. Though it does not modify the final result, it feels odd.

Feedback:
Eq. (5.98): Would have found it more intuitive to have a transpose of the zero vector right before the transpose of x in the RHS of the equation.
Line 2927: The notation seems ambiguous. It would be nice to clarify that \theta is the set of all A_{.} and b_{.}, in contrast with \theta_{j}, which contains only the associated A_{j} and b_{j}, to make sure that people understand.

I did truly appreciate that chapter.

1:1 between any kind of vector and R^n

line 761: "There is a 1:1 correspondence between any kind of vector and R^n." Does "kind" here mean any dimension? Or any field? What about infinite-dimensional vectors?
