
prml-solution-manual's People

Contributors

zhengqigao


prml-solution-manual's Issues

Some issue about Problem 4.26 solution

Hi, Zhengqi. As usual, I learnt a lot from your solutions, such as the one to Exercise 4.26, which takes a different approach that is simple and elegant compared with differentiating and completing the square as hinted. But there are a few problems in the exposition, which I would amend as follows.

First, a small suggestion about $P(X\le x)=\Phi\left(\frac{x-\mu}{\sigma}\right)$ for $X\sim\mathcal N(\mu,\sigma^2)$. We can arrive at this statement directly from basic properties of random variables, instead of using integrals and a change of variables. This follows by noting that $\frac{X-\mu}{\sigma}$ has a standard Gaussian distribution. As a result, $P(X\le x)=P\left(\frac{X-\mu}{\sigma}\le\frac{x-\mu}{\sigma}\right)$, which equals $\Phi\left(\frac{x-\mu}{\sigma}\right)$ by the definition of $\Phi$.

Then comes the major problem of the solution, where Bayes' formula is applied. After defining two auxiliary random variables $X\sim\mathcal N(0,\lambda^{-2})$ and $Y\sim\mathcal N(\mu,\sigma^2)$, we need to further assume that these two random variables are independent. Otherwise, the equation $$P(X\le Y|Y=a)=P(X\le a)$$ would be dubious. This is because the left-hand side is a probability based on the conditional pdf $p_{X|Y}(x|y)$, while the right-hand side is a probability based on the marginal pdf $p_X(x)$; they are in general not equal. In other words, if $X$ and $Y$ are not independent, then given $Y=a$ the variable $X$ follows another (Gaussian) distribution whose parameters (mean and variance) are not necessarily still $0$ and $\lambda^{-2}$, respectively. Consequently, the above equation does not hold in general; only the latter part of the line, $P(X\le a)=\Phi(\lambda a)$, holds. Such an independence assumption is valid because, if $X$ and $Y$ were not independent, we could always make them so by a linear transformation.

Next, we use a double integral to write $$P(X\le Y)=\int_{-\infty}^{+\infty}\left(\int_{-\infty}^yp(x,y)dx\right)dy=\int_{-\infty}^{+\infty}\left(\int_{-\infty}^yp(x|y)p(y)dx\right)dy=\int_{-\infty}^{+\infty}\left(\int_{-\infty}^yp(x|y)dx\right)p(y)dy.$$

Since $X$ and $Y$ are independent, we have $p(x|y)=p(x)$. So the inner integral reduces to $$\int_{-\infty}^yp(x|y)dx=\int_{-\infty}^yp(x)dx=P(X\le y)=\Phi(\lambda y),$$ and in turn the outer integral becomes $$P(X\le Y)=\int_{-\infty}^{+\infty}\Phi(\lambda y)p(y)dy=\int_{-\infty}^{+\infty}\Phi(\lambda y)\mathcal N(y|\mu,\sigma^2)dy.$$ Changing the integration variable from $y$ to $a$, we obtain the left side of equation (4.152).

For the right side of equation (4.152), it should be noted in particular that $X-Y$ is guaranteed to be Gaussian when $X$ and $Y$ are independent (marginal Gaussianity alone is not enough); see, e.g., this StackExchange thread. This is the second place that shows the importance of the independence assumption. Under this assumption, the argument of the original solution goes through and we get $P(X-Y\le0)=\Phi\left(\frac{\mu}{(\lambda^{-2}+\sigma^2)^{1/2}}\right)$, which concludes the whole proof.
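In case it helps future readers, here is a small numerical sketch (my own addition, not part of the original solution) that compares both sides of (4.152) for arbitrary test values of $\lambda$, $\mu$ and $\sigma$:

```python
# Numerical sanity check of (4.152):
#   integral of Phi(lambda*a) * N(a | mu, sigma^2) da  ==  Phi(mu / sqrt(lambda^-2 + sigma^2))
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

lam, mu, sigma = 0.7, 0.3, 1.2   # arbitrary test values

lhs, _ = quad(lambda a: norm.cdf(lam * a) * norm.pdf(a, loc=mu, scale=sigma),
              -np.inf, np.inf)
rhs = norm.cdf(mu / np.sqrt(lam**-2 + sigma**2))

print(lhs, rhs)   # the two values agree to numerical precision
```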

Problem 4.1

In the first part of the solution of Problem 4.1, you showed that "they are not linearly separable if their convex hulls intersect," i.e., (intersection) -> (not linearly separable).
By the contrapositive, that is the same as (linearly separable) -> (no intersection).
Then in the second part, you say "let's assume they are linearly separable and try to prove their convex hulls don't intersect," which is exactly what was already established in the first part.
So the part (no intersection) -> (linearly separable) is actually missing.

Issues about Exercise 7.19

Hi, Zhengqi. In Solution 7.19, the derivative of the first term of the marginal likelihood with respect to $\alpha$ does not seem to be zero (even though the term's value is zero at $\mathbf w^*$), so this term should also be taken into account.

Problem 2.2

Hi, this is great work.

I think the variance in Problem 2.2 is incorrect, though: it should be $1 - \mu^2$ instead of $(1 - \mu)^2$.
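For reference, a tiny sketch (my own, assuming the Problem 2.2 distribution over $x\in\{-1,1\}$ with $p(x=1|\mu)=(1+\mu)/2$) showing that the variance comes out as $1-\mu^2$:

```python
# Variance of the two-point distribution p(x=1|mu) = (1+mu)/2, p(x=-1|mu) = (1-mu)/2.
import numpy as np

for mu in np.linspace(-0.9, 0.9, 7):
    p_pos, p_neg = (1 + mu) / 2, (1 - mu) / 2
    mean = (+1) * p_pos + (-1) * p_neg                       # equals mu
    var = (1 - mean) ** 2 * p_pos + (-1 - mean) ** 2 * p_neg
    print(round(var, 6), round(1 - mu**2, 6), round((1 - mu) ** 2, 6))  # var matches 1 - mu^2
```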

Minor issue in Solution 10.26

Thank you for sharing your solutions! I think there's a minor issue in your solution to Exercise 10.26.
Shouldn't the result for $-\mathbb{E}[\ln q^*(\beta)]$ be negated? Essentially, it is analogous to Equation (10.112) in the book.

Error in Problem 3.13

The last line of $p(t|\mathbf X, \mathbf T)$ is inconsistent with (2.160): the parameters of the Gamma distribution should match (2.160) exactly.
Therefore the correct solution for $\lambda$ should be $\dfrac{a_N}{b_N\,(1+\boldsymbol\phi^T\mathbf S_N\boldsymbol\phi)}$.

Minor issue in 2.56 part b

For the Gamma distribution, the function $h(x)$ should depend only on $x$, but in the solution it also depends on $\eta_1$.
Alternative solution: writing $x^{\eta_1}$ as $\exp(\eta_1\ln x)$, we get
$\boldsymbol\eta$ = same as before,
$h(x) = x^{-1}$,
$g(\boldsymbol\eta)$ = same as before,
${\bf u}(x) = [\ln x,\; x]^T$.
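A quick numerical check of this alternative decomposition (my own sketch, assuming the parameterisation $\mathrm{Gam}(x|a,b)=\frac{b^a}{\Gamma(a)}x^{a-1}e^{-bx}$ with $\boldsymbol\eta=(a,-b)^T$ and ${\bf u}(x)=(\ln x,\,x)^T$):

```python
# Check that h(x) * g(eta) * exp(eta . u(x)) reproduces the Gamma pdf
# with h(x) = 1/x, eta = (a, -b), u(x) = (ln x, x), g(eta) = b^a / Gamma(a).
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

a, b = 2.5, 1.7
x = np.linspace(0.1, 5.0, 50)

eta = np.array([a, -b])
u = np.stack([np.log(x), x])              # u(x) stacked for all x values
h = 1.0 / x
g = b**a / gamma_fn(a)

pdf_expfam = h * g * np.exp(eta @ u)
pdf_ref = gamma.pdf(x, a, scale=1.0 / b)  # SciPy's Gamma with shape a and rate b
print(np.allclose(pdf_expfam, pdf_ref))   # True
```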

Minor Issue in Problem 10.27 Solution

In the second term of the lower bound, $\mathbb{E}[\ln p(\boldsymbol{w}|\alpha)]_{\boldsymbol{w},\alpha}$, the last term should be

$$ \mathbb{E}[\boldsymbol{w}^{\text{T}} \boldsymbol{w}]_{\boldsymbol{w}} $$


Error in Problem 1.7

On page 4, second line, the integrand should be $\exp\left(-\frac{r^2}{2\sigma^2}\right)r\,dr$ instead of $\exp\left(-\frac{1}{2\sigma^2}\right)r\,dr$; the $r^2$ term is missing.
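For completeness, a quick symbolic check (my own sketch, using SymPy) of the corrected radial integral that appears in that step:

```python
# The radial integral from the polar-coordinate step of Exercise 1.7:
#   integral_0^inf exp(-r^2 / (2*sigma^2)) * r dr = sigma^2,
# so the double integral equals 2*pi*sigma^2, the square of the Gaussian
# normalisation constant (2*pi*sigma^2)^(1/2).
import sympy as sp

r, sigma = sp.symbols('r sigma', positive=True)
radial = sp.integrate(sp.exp(-r**2 / (2 * sigma**2)) * r, (r, 0, sp.oo))
print(radial)              # sigma**2
print(2 * sp.pi * radial)  # 2*pi*sigma**2
```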

Problem 1.20

My solution to 1.20 is the same as yours. I believe the author made a mistake when expanding the Taylor series. Also, I think he should use $o(\cdot)$ notation instead of $O(\cdot)$ for the Taylor expansion.

Problem 6.1

[screenshot]

About Problem 6.1: is this really a linear combination? I do not really understand what is happening here. It seems you are using variables in the denominators of the combination coefficients.

Clarification in solution 2.36

The first term in the solution will not simplify so easily, because we are computing $\sum(x_n-\mu_N)^2$ and not $\sum(x_n-\mu_{N-1})^2$. Using the sequential update $\mu_N = \mu_{N-1} + (x_N - \mu_{N-1})/N$, I am getting the first term as $\frac{(N-1)^3}{N^3}\,\sigma^2_{N-1}$. Can you please verify this?
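Not a check of the $(N-1)^3/N^3$ claim itself, but for reference, a minimal sketch (my own addition) of the sequential mean update quoted above, which the expansion relies on:

```python
# Sequential update mu_N = mu_{N-1} + (x_N - mu_{N-1}) / N versus the batch ML mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)

mu = 0.0
for n, x_n in enumerate(x, start=1):
    mu += (x_n - mu) / n      # sequential update

print(mu, x.mean())           # identical up to floating-point error
```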

Normalization constant in Problem 11.06

Hey,

I was somewhat confused reading your solution, so I referenced Bishop's solutions; it turns out it was the normalization constant that threw me off.

Bishop uses
[screenshots]

You put $Z_p$ into the denominator, whereas it appears in the numerator of the fraction in Bishop's usual notation:
[screenshot]

In Bishop's solution, it then follows:
[screenshots]

I just wanted to leave this here for future reference.
Awesome collection of solutions by the way, thanks a lot!

Problem 1.2

The final equation does not seem correct. The case $j=i$ will happen $N$ times, so the equation will become

A typo in problem 1.26

[screenshot]
I think the equation at the bottom should be:
[screenshot]
Alternatively, it can also be derived as:
[screenshot]
which coincides with the errata:
[screenshot]

Exc. 1.1

Hi! This is more of a question (albeit perhaps a stupid one), but I do not understand how the derivative of $y(x_n,\mathbf w)$ in Exercise 1.1 is simply $(x_n)^i$. As far as I can see, taking the derivative of $y(x_n,\mathbf w)$ with respect to the weights $\{w_j\}$ gives $\sum_{j=1}^{M} (x_n)^j$? How does that turn into $(x_n)^i$? If you could elaborate on that step, it would be much appreciated!

Thank you!
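A possible way to see this numerically (my own sketch, not part of the original question): differentiate $y(x,\mathbf w)=\sum_j w_j x^j$ with respect to a single weight $w_i$ by finite differences and compare against $x^i$.

```python
# Finite-difference check that d y(x, w) / d w_i = x**i for y(x, w) = sum_j w_j * x**j:
# each weight w_i multiplies only its own power of x, so only x**i survives.
import numpy as np

def y(x, w):
    return sum(w_j * x**j for j, w_j in enumerate(w))

rng = np.random.default_rng(1)
w = rng.normal(size=4)                     # polynomial of order M = 3
x_n, eps = 0.7, 1e-6

for i in range(len(w)):
    w_plus = w.copy()
    w_plus[i] += eps
    grad_i = (y(x_n, w_plus) - y(x_n, w)) / eps
    print(i, grad_i, x_n**i)               # the last two columns agree
```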

Small detail on problem 9.6

I believe that in the denominator of the final expression for the gradient, the two summations simplify to $N$.

Some issue about Problem 4.6 solution

Hi, Zhengqi. I found an issue in the Problem 4.6 solution.

At the bottom of page 93, the left side of the equation we need to prove is
$\sum\limits_{n=1}^N({\bf x}_n{\bf x}_n^T)-N{\bf mm}^T.$

For the second term, it is a product of a column vector and a row vector, resulting in a $D\times D$ square matrix. As a result, we cannot write it simply as a scalar $(\ldots)^2$. Instead, we should expand it as follows:
$-N{\bf mm}^T= -N\bigl(\frac{1}{N}(N_1{\bf m}_1+N_2{\bf m}_2)\frac{1}{N}(N_1{\bf m}_1^T+N_2{\bf m}_2^T)\bigr)$
$=-\frac{1}{N}(N_1^2{\bf m}_1{\bf m}_1^T+N_1N_2{\bf m}_1{\bf m}_2^T+N_1N_2{\bf m}_2{\bf m}_1^T+N_2^2{\bf m}_2{\bf m}_2^T).\quad \quad \quad \quad\quad \quad \quad \quad(1)$
Note that the middle two terms cannot be merged because they are not equal.

Next let us go from the known result ${\bf S}_ W+\frac{N_1N_2}{N}{\bf S}_ B$. If we plug in definition (4.28) and expand it, we have
${\bf S}_ W=\sum\limits_{n\in\mathcal C_1}({\bf x}_ n{\bf x}_ n^T-{\bf x}_ n{\bf m}_ 1^T-{\bf m}_ 1{\bf x}_ n^T+{\bf m}_ 1{\bf m}_ 1^T)+\sum\limits_{n\in\mathcal C_2}({\bf x}_ n{\bf x}_ n^T-{\bf x}_ n{\bf m}_ 2^T-{\bf m}_ 2{\bf x}_ n^T+{\bf m}_ 2{\bf m}_ 2^T)$
$=\sum\limits_{n=1}^N{\bf x}_ n{\bf x}_ n^T -\left(\sum\limits_{n\in\mathcal C_1}{\bf x}_ n\right){\bf m}_ 1^T-{\bf m}_ 1\left(\sum\limits_{n\in\mathcal C_1}{\bf x}_ n^T\right)-\left(\sum\limits_{n\in\mathcal C_2}{\bf x}_ n\right){\bf m}_ 2^T-{\bf m}_ 2\left(\sum\limits_{n\in\mathcal C_2}{\bf x}_ n^T\right)+N_1{\bf m}_1{\bf m}_1^T+N_2{\bf m}_2{\bf m}_2^T.$

Since equation (1) contains only ${\bf m}_ 1$ and ${\bf m}_ 2$, we substitute $\sum\limits_{n\in\mathcal C_1} {\bf x}_ n= N_1{\bf m}_ 1$ and $\sum\limits_{n\in\mathcal C_2}{\bf x}_ n=N_2{\bf m}_ 2$ for the corresponding sums in the last expression, and we can cancel some terms to get
$\sum\limits_{n=1}^N{\bf x}_n{\bf x}_n^T-N_1{\bf m}_1{\bf m}_1^T-N_2{\bf m}_2{\bf m}_2^T.\quad \quad \quad \quad\quad \quad \quad \quad\quad \quad \quad \quad\quad \quad \quad \quad(2)$

Treating $\frac{N_1N_2}{N}{\bf S}_ B$ similarly, we get
$\frac{N_1N_2}{N}{\bf S}_ B=\frac{N_1N_2}{N}({\bf m}_2{\bf m}_2^T-{\bf m}_2{\bf m}_1^T-{\bf m}_1{\bf m}_2^T+{\bf m}_1{\bf m}_1^T).\quad \quad \quad \quad\quad \quad \quad \quad(3)$
Likewise, the middle two terms cannot be merged.

Adding (2) and (3) and merging like terms, we get exactly the same expression as (1), plus the additional $\sum\limits_{n=1}^N({\bf x}_n{\bf x}_n^T)$, which is exactly the first term on the left-hand side of the equation at the bottom of page 93 that we set out to prove.
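For anyone who wants a quick sanity check of the identity being proved, here is a small NumPy sketch (my own addition) with random two-class data:

```python
# Check: sum_n x_n x_n^T - N * m m^T  ==  S_W + (N1 * N2 / N) * S_B  on random data.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(30, 3))    # class C1
X2 = rng.normal(2.0, 1.5, size=(50, 3))    # class C2
X = np.vstack([X1, X2])
N1, N2, N = len(X1), len(X2), len(X)

m1, m2, m = X1.mean(axis=0), X2.mean(axis=0), X.mean(axis=0)

S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
S_B = np.outer(m2 - m1, m2 - m1)

lhs = X.T @ X - N * np.outer(m, m)
rhs = S_W + (N1 * N2 / N) * S_B
print(np.allclose(lhs, rhs))               # True
```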

Problem 3.2

It seems that the proof of non-singularity is invalid. In fact, the matrix $\Phi^T\Phi$ can be singular.

Quoting from Section 3.1.2:

"In practice, a direct solution of the normal equations can lead to numerical difficulties when $\Phi^T\Phi$ is close to singular."

Your proof of non-singularity does not hold, because you assume that $\varphi_1,\ldots,\varphi_M$ together form a basis of the subspace spanned by $\Phi$'s columns. However, there is no reason to assume this: they could all be collinear, for instance, in which case any single one of them would form a basis of that subspace.

There is no need to prove that $\Phi^T\Phi$ is invertible anyway, since its inverse is part of the matrix that is given as an existing object in the problem statement (which implies that $(\Phi^T\Phi)^{-1}$ exists).
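A concrete illustration of the point above (my own sketch): when two basis-function columns of $\Phi$ are collinear, $\Phi^T\Phi$ is singular.

```python
# Phi^T Phi is singular when the columns of Phi (the basis functions evaluated
# at the data points) are linearly dependent, e.g. collinear.
import numpy as np

phi1 = np.array([1.0, 2.0, 3.0])
Phi = np.column_stack([phi1, 2.0 * phi1])  # second column collinear with the first

A = Phi.T @ Phi
print(np.linalg.matrix_rank(A))            # 1  (a 2x2 matrix of rank 1 is singular)
print(np.linalg.det(A))                    # ~0
```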

Problem 2.58

I think the mistake in 2.58 is caused by the line under
"If we multiply both sides by $-1/g(\boldsymbol\eta)$, we can obtain".

The left-hand side should expand to
$-\nabla\nabla\ln g(\boldsymbol\eta)-\mathbb E[{\bf u}(x)]\,\mathbb E[{\bf u}(x)]^T$

instead of
$-\nabla\nabla\ln g(\boldsymbol\eta)$.
I've demonstrated this here

This makes the final solution match the answer given in the book.

Thanks for doing this by the way! Such a great resource
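For future readers, a tiny numerical illustration (my own, using the Bernoulli distribution written in exponential-family form with $u(x)=x$ and $g(\eta)=1/(1+e^{\eta})$) of the identity the exercise is after, $-\nabla\nabla\ln g(\boldsymbol\eta)=\mathrm{cov}[{\bf u}(x)]$:

```python
# -d^2/d eta^2 ln g(eta) = var[x] for the Bernoulli written as p(x|eta) = g(eta) exp(eta*x),
# x in {0, 1}, u(x) = x, g(eta) = 1 / (1 + exp(eta)).
import numpy as np

eta, h = 0.4, 1e-4
ln_g = lambda e: -np.log1p(np.exp(e))

second_deriv = (ln_g(eta + h) - 2 * ln_g(eta) + ln_g(eta - h)) / h**2
mu = np.exp(eta) / (1 + np.exp(eta))        # E[x] for this eta

print(-second_deriv, mu * (1 - mu))         # both equal var[x] = mu * (1 - mu)
```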

Minor Issue in Problem 1.12 Solution

Hello Zhengqi,
first of all, thanks for publishing the solution manual!

Starting after the paragraph

"For $\mathbb E[\sigma^2_{\rm ML}]$, we need to take advantage of (1.56) and what has been given in the problem:"

I believe that in the solution to Problem 1.12 the $1/N$ factor went missing in the fifth and sixth lines of the last block of equations, the one involving the expectation of $\mu^2_{\rm ML}$.

The seventh line is correct again.
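As a cross-check of the result this derivation leads to, $\mathbb E[\sigma^2_{\rm ML}]=\frac{N-1}{N}\sigma^2$, here is a quick Monte Carlo sketch (my own addition):

```python
# Monte Carlo check that E[sigma^2_ML] = (N - 1) / N * sigma^2 for Gaussian data.
import numpy as np

rng = np.random.default_rng(0)
N, sigma, trials = 5, 2.0, 200_000

samples = rng.normal(0.0, sigma, size=(trials, N))
sigma2_ml = samples.var(axis=1, ddof=0)    # ML estimator uses 1/N, i.e. ddof=0

print(sigma2_ml.mean())                    # close to 3.2
print((N - 1) / N * sigma**2)              # 3.2
```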

Keep up the great work!

Greetings
Markus
