
prml-solution-manual's People

Contributors

zhengqigao


prml-solution-manual's Issues

Some issue about Problem 4.26 solution

Hi, Zhengqi. As usual, I learnt a lot from your solutions, such as the one to Exercise 4.26, which takes a different approach that is simple and elegant compared with differentiating and completing the square as hinted. But there are a few problems in the exposition, which I would amend as follows.

First, a small suggestion about $P(X\le x)=\Phi\left(\frac{x-\mu}{\sigma}\right)$ for $X\sim\mathcal N(\mu,\sigma^2)$. We can arrive at this statement directly from basic properties of random variables, instead of using integrals and a change of variables. This follows by noting that $\frac{X-\mu}{\sigma}$ has a standard Gaussian distribution. As a result, $P(X\le x)=P\left(\frac{X-\mu}{\sigma}\le\frac{x-\mu}{\sigma}\right)$, which equals $\Phi\left(\frac{x-\mu}{\sigma}\right)$ by the definition of $\Phi$.

Then comes the major problem of the solution, where Bayes' formula is applied. After defining two auxiliary random variables $X\sim\mathcal N(0,\lambda^{-2})$ and $Y\sim\mathcal N(\mu,\sigma^2)$, we need to further assume that these two random variables are independent. Otherwise, the equation $$P(X\le Y|Y=a)=P(X\le a)$$ would be dubious. This is because the left-hand side is a probability based on the conditional pdf $p_{X|Y}(x|y)$, while the right-hand side is a probability based on the marginal pdf $p_X(x)$; they are in general not equal. In other words, if $X$ and $Y$ are not independent, then given $Y=a$ the variable $X$ follows another (Gaussian) distribution whose parameters (mean and variance) are not necessarily still $0$ and $\lambda^{-2}$, respectively. Consequently, the above equation does not hold in general; only the latter part of the line, $P(X\le a)=\Phi(\lambda a)$, holds. Such an independence assumption is valid because, if $X$ and $Y$ were not independent, we could always make them so by a linear transformation.

Next, we use a double integral to write $$P(X\le Y)=\int_{-\infty}^{+\infty}\left(\int_{-\infty}^yp(x,y)dx\right)dy=\int_{-\infty}^{+\infty}\left(\int_{-\infty}^yp(x|y)p(y)dx\right)dy=\int_{-\infty}^{+\infty}\left(\int_{-\infty}^yp(x|y)dx\right)p(y)dy.$$

Since $X$ and $Y$ are independent, we have $p(x|y)=p(x)$. So the inner integral reduces to $$\int_{-\infty}^yp(x|y)dx=\int_{-\infty}^yp(x)dx=P(X\le y)=\Phi(\lambda y),$$ and in turn the outer integral becomes $$P(X\le Y)=\int_{-\infty}^{+\infty}\Phi(\lambda y)p(y)dy=\int_{-\infty}^{+\infty}\Phi(\lambda y)\mathcal N(y|\mu,\sigma^2)dy.$$ Changing the integration variable from $y$ to $a$, we obtain the left side of equation (4.152).

For the right side of equation (4.152), it should be noted in particular that $X-Y$ is guaranteed to be Gaussian when $X$ and $Y$ are independent (marginal Gaussianity alone is not enough); see, e.g., this StackExchange thread. This is the second place that shows the importance of the independence assumption. Under this assumption, the argument of the original solution goes through and we get $P(X-Y\le0)=\Phi\left(\frac{\mu}{(\lambda^{-2}+\sigma^2)^{1/2}}\right)$, which concludes the whole proof.
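In case it helps future readers, here is a small numerical sketch (my own addition, not part of the original solution) that compares both sides of (4.152) for arbitrary test values of $\lambda$, $\mu$ and $\sigma$:

```python
# Numerical sanity check of (4.152):
#   integral of Phi(lambda*a) * N(a | mu, sigma^2) da  ==  Phi(mu / sqrt(lambda^-2 + sigma^2))
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

lam, mu, sigma = 0.7, 0.3, 1.2   # arbitrary test values

lhs, _ = quad(lambda a: norm.cdf(lam * a) * norm.pdf(a, loc=mu, scale=sigma),
              -np.inf, np.inf)
rhs = norm.cdf(mu / np.sqrt(lam**-2 + sigma**2))

print(lhs, rhs)   # the two values agree to numerical precision
```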

Problem 4.1

In the first part of the solution of Problem 4.1, you showed that "they are not linearly separable if their convex hulls intersect," i.e., (intersection) -> (not linearly separable).
By the contrapositive, that is the same as (linearly separable) -> (no intersection).
Then in the second part, you say "let's assume they are linearly separable and try to prove their convex hulls don't intersect," which is exactly what was already established in the first part.
So the part (no intersection) -> (linearly separable) is actually missing.

Issues about Exercise 7.19

Hi, Zhengqi. In Solution 7.19, the derivative of the first term of the marginal likelihood with respect to $\alpha$ does not seem to be zero (even though the term's value is zero at $\mathbf w^*$), so this term should also be taken into account.

Problem 2.2

Hi, this is great work.

I think the variance in Problem 2.2 is incorrect, though: it should be $1 - \mu^2$ instead of $(1 - \mu)^2$.
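For reference, a tiny sketch (my own, assuming the Problem 2.2 distribution over $x\in\{-1,1\}$ with $p(x=1|\mu)=(1+\mu)/2$) showing that the variance comes out as $1-\mu^2$:

```python
# Variance of the two-point distribution p(x=1|mu) = (1+mu)/2, p(x=-1|mu) = (1-mu)/2.
import numpy as np

for mu in np.linspace(-0.9, 0.9, 7):
    p_pos, p_neg = (1 + mu) / 2, (1 - mu) / 2
    mean = (+1) * p_pos + (-1) * p_neg                       # equals mu
    var = (1 - mean) ** 2 * p_pos + (-1 - mean) ** 2 * p_neg
    print(round(var, 6), round(1 - mu**2, 6), round((1 - mu) ** 2, 6))  # var matches 1 - mu^2
```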

Minor issue in Solution 10.26

Thank you for sharing your solutions! I think there's a minor issue in your solution to Exercise 10.26.
Shouldn't the result for $-\mathbb{E}[\ln q^*(\beta)]$ be negated? Essentially, it is analogous to Equation (10.112) in the book.

Error in Problem 3.13

The last line of $p(t|\mathbf X, \mathbf T)$ is inconsistent with (2.160): the parameters of the Gamma distribution should match (2.160) exactly.
Therefore the correct solution for $\lambda$ should be $\dfrac{a_N}{b_N\,(1+\boldsymbol\phi^T\mathbf S_N\boldsymbol\phi)}$.

Minor issue in 2.56 part b

For the Gamma distribution, the function $h(x)$ should depend only on $x$, but in the solution it also depends on $\eta_1$.
Alternative solution: writing $x^{\eta_1}$ as $\exp(\eta_1\ln x)$, we get
$\boldsymbol\eta$ = same as before,
$h(x) = x^{-1}$,
$g(\boldsymbol\eta)$ = same as before,
${\bf u}(x) = [\ln x,\; x]^T$.
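A quick numerical check of this alternative decomposition (my own sketch, assuming the parameterisation $\mathrm{Gam}(x|a,b)=\frac{b^a}{\Gamma(a)}x^{a-1}e^{-bx}$ with $\boldsymbol\eta=(a,-b)^T$ and ${\bf u}(x)=(\ln x,\,x)^T$):

```python
# Check that h(x) * g(eta) * exp(eta . u(x)) reproduces the Gamma pdf
# with h(x) = 1/x, eta = (a, -b), u(x) = (ln x, x), g(eta) = b^a / Gamma(a).
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

a, b = 2.5, 1.7
x = np.linspace(0.1, 5.0, 50)

eta = np.array([a, -b])
u = np.stack([np.log(x), x])              # u(x) stacked for all x values
h = 1.0 / x
g = b**a / gamma_fn(a)

pdf_expfam = h * g * np.exp(eta @ u)
pdf_ref = gamma.pdf(x, a, scale=1.0 / b)  # SciPy's Gamma with shape a and rate b
print(np.allclose(pdf_expfam, pdf_ref))   # True
```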

Minor Issue in Problem 10.27 Solution

In the second term of the lower bound, $\mathbb{E}[\ln p(\boldsymbol{w}|\alpha)]_{\boldsymbol{w},\alpha}$, the last term should be

$$ \mathbb{E}[\boldsymbol{w}^{\text{T}} \boldsymbol{w}]_{\boldsymbol{w}} $$


Error in Problem 1.7

On page 4, second line, the integrand should be $\exp\left(-\frac{r^2}{2\sigma^2}\right)r\,dr$ instead of $\exp\left(-\frac{1}{2\sigma^2}\right)r\,dr$; the $r^2$ term is missing.
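For completeness, a quick symbolic check (my own sketch, using SymPy) of the corrected radial integral that appears in that step:

```python
# The radial integral from the polar-coordinate step of Exercise 1.7:
#   integral_0^inf exp(-r^2 / (2*sigma^2)) * r dr = sigma^2,
# so the double integral equals 2*pi*sigma^2, the square of the Gaussian
# normalisation constant (2*pi*sigma^2)^(1/2).
import sympy as sp

r, sigma = sp.symbols('r sigma', positive=True)
radial = sp.integrate(sp.exp(-r**2 / (2 * sigma**2)) * r, (r, 0, sp.oo))
print(radial)              # sigma**2
print(2 * sp.pi * radial)  # 2*pi*sigma**2
```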

Problem 1.20

My solution to 1.20 is the same as yours. I believe the author made a mistake when expanding the Taylor series. Also, I think he should use $o(\cdot)$ notation instead of $O(\cdot)$ for the Taylor expansion.

Problem 6.1

[screenshot]

About Problem 6.1: is this really a linear combination? I do not really understand what is happening here. It seems you are using variables in the denominators of the combination coefficients.

Clarification in solution 2.36

The first term in the solution will not simplify so easily, because we are computing $\sum(x_n-\mu_N)^2$ and not $\sum(x_n-\mu_{N-1})^2$. Using the sequential update $\mu_N = \mu_{N-1} + (x_N - \mu_{N-1})/N$, I am getting the first term as $\frac{(N-1)^3}{N^3}\,\sigma^2_{N-1}$. Can you please verify this?
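Not a check of the $(N-1)^3/N^3$ claim itself, but for reference, a minimal sketch (my own addition) of the sequential mean update quoted above, which the expansion relies on:

```python
# Sequential update mu_N = mu_{N-1} + (x_N - mu_{N-1}) / N versus the batch ML mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)

mu = 0.0
for n, x_n in enumerate(x, start=1):
    mu += (x_n - mu) / n      # sequential update

print(mu, x.mean())           # identical up to floating-point error
```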

Normalization constant in Problem 11.06

Hey,

I was somewhat confused reading your solution, so I referenced Bishop's solutions; it turns out it was the normalization constant that threw me off.

Bishop uses
[screenshots]

You put $Z_p$ into the denominator, whereas it appears in the numerator of the fraction in Bishop's usual notation:
[screenshot]

In Bishop's solution, it then follows:
[screenshots]

I just wanted to leave this here for future reference.
Awesome collection of solutions by the way, thanks a lot!

Problem 1.2

The final equation does not seem correct. The case $j=i$ will happen $N$ times, so the equation will become

A typo in problem 1.26

[screenshot]
I think the equation at the bottom should be:
[screenshot]
Alternatively, it can also be derived as:
[screenshot]
which coincides with the errata:
[screenshot]

Exc. 1.1

Hi! This is more of a question (albeit perhaps a stupid one), but I do not understand how the derivative of $y(x_n,\mathbf w)$ in Exercise 1.1 is simply $(x_n)^i$. As far as I can see, taking the derivative of $y(x_n,\mathbf w)$ with respect to the weights $\{w_j\}$ gives $\sum_{j=1}^{M} (x_n)^j$? How does that turn into $(x_n)^i$? If you could elaborate on that step, it would be much appreciated!

Thank you!
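A possible way to see this numerically (my own sketch, not part of the original question): differentiate $y(x,\mathbf w)=\sum_j w_j x^j$ with respect to a single weight $w_i$ by finite differences and compare against $x^i$.

```python
# Finite-difference check that d y(x, w) / d w_i = x**i for y(x, w) = sum_j w_j * x**j:
# each weight w_i multiplies only its own power of x, so only x**i survives.
import numpy as np

def y(x, w):
    return sum(w_j * x**j for j, w_j in enumerate(w))

rng = np.random.default_rng(1)
w = rng.normal(size=4)                     # polynomial of order M = 3
x_n, eps = 0.7, 1e-6

for i in range(len(w)):
    w_plus = w.copy()
    w_plus[i] += eps
    grad_i = (y(x_n, w_plus) - y(x_n, w)) / eps
    print(i, grad_i, x_n**i)               # the last two columns agree
```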

Small detail on problem 9.6

I believe that in the denominator of the final expression for the gradient, the two summations simplify to $N$.

Some issue about Problem 4.6 solution

Hi, Zhengqi. I found an issue in the Problem 4.6 solution.

At the bottom of page 93, the left side of the equation we need to prove is
$\sum\limits_{n=1}^N({\bf x}_n{\bf x}_n^T)-N{\bf mm}^T.$

For the second term, it is a product of a column vector and a row vector, resulting in a $D\times D$ square matrix. As a result, we cannot write it simply as a scalar $(\ldots)^2$. Instead, we should expand it as follows:
$-N{\bf mm}^T= -N\bigl(\frac{1}{N}(N_1{\bf m}_1+N_2{\bf m}_2)\frac{1}{N}(N_1{\bf m}_1^T+N_2{\bf m}_2^T)\bigr)$
$=-\frac{1}{N}(N_1^2{\bf m}_1{\bf m}_1^T+N_1N_2{\bf m}_1{\bf m}_2^T+N_1N_2{\bf m}_2{\bf m}_1^T+N_2^2{\bf m}_2{\bf m}_2^T).\quad \quad \quad \quad\quad \quad \quad \quad(1)$
Note that the middle two terms cannot be merged because they are not equal.

Next let us go from the known result ${\bf S}_ W+\frac{N_1N_2}{N}{\bf S}_ B$. If we plug in definition (4.28) and expand it, we have
${\bf S}_ W=\sum\limits_{n\in\mathcal C_1}({\bf x}_ n{\bf x}_ n^T-{\bf x}_ n{\bf m}_ 1^T-{\bf m}_ 1{\bf x}_ n^T+{\bf m}_ 1{\bf m}_ 1^T)+\sum\limits_{n\in\mathcal C_2}({\bf x}_ n{\bf x}_ n^T-{\bf x}_ n{\bf m}_ 2^T-{\bf m}_ 2{\bf x}_ n^T+{\bf m}_ 2{\bf m}_ 2^T)$
$=\sum\limits_{n=1}^N{\bf x}_ n{\bf x}_ n^T -\left(\sum\limits_{n\in\mathcal C_1}{\bf x}_ n\right){\bf m}_ 1^T-{\bf m}_ 1\left(\sum\limits_{n\in\mathcal C_1}{\bf x}_ n^T\right)-\left(\sum\limits_{n\in\mathcal C_2}{\bf x}_ n\right){\bf m}_ 2^T-{\bf m}_ 2\left(\sum\limits_{n\in\mathcal C_2}{\bf x}_ n^T\right)+N_1{\bf m}_1{\bf m}_1^T+N_2{\bf m}_2{\bf m}_2^T.$

Since equation (1) contains only ${\bf m}_ 1$ and ${\bf m}_ 2$, we substitute $\sum\limits_{n\in\mathcal C_1} {\bf x}_ n= N_1{\bf m}_ 1$ and $\sum\limits_{n\in\mathcal C_2}{\bf x}_ n=N_2{\bf m}_ 2$ for the corresponding sums in the last expression, and we can cancel some terms to get
$\sum\limits_{n=1}^N{\bf x}_n{\bf x}_n^T-N_1{\bf m}_1{\bf m}_1^T-N_2{\bf m}_2{\bf m}_2^T.\quad \quad \quad \quad\quad \quad \quad \quad\quad \quad \quad \quad\quad \quad \quad \quad(2)$

Treating $\frac{N_1N_2}{N}{\bf S}_ B$ similarly, we get
$\frac{N_1N_2}{N}{\bf S}_ B=\frac{N_1N_2}{N}({\bf m}_2{\bf m}_2^T-{\bf m}_2{\bf m}_1^T-{\bf m}_1{\bf m}_2^T+{\bf m}_1{\bf m}_1^T).\quad \quad \quad \quad\quad \quad \quad \quad(3)$
Likewise, the middle two terms cannot be merged.

Adding (2) and (3) and merging like terms, we get exactly the same expression as (1), plus the additional $\sum\limits_{n=1}^N({\bf x}_n{\bf x}_n^T)$, which is exactly the first term on the left-hand side of the equation at the bottom of page 93 that we set out to prove.
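For anyone who wants a quick sanity check of the identity being proved, here is a small NumPy sketch (my own addition) with random two-class data:

```python
# Check: sum_n x_n x_n^T - N * m m^T  ==  S_W + (N1 * N2 / N) * S_B  on random data.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(30, 3))    # class C1
X2 = rng.normal(2.0, 1.5, size=(50, 3))    # class C2
X = np.vstack([X1, X2])
N1, N2, N = len(X1), len(X2), len(X)

m1, m2, m = X1.mean(axis=0), X2.mean(axis=0), X.mean(axis=0)

S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
S_B = np.outer(m2 - m1, m2 - m1)

lhs = X.T @ X - N * np.outer(m, m)
rhs = S_W + (N1 * N2 / N) * S_B
print(np.allclose(lhs, rhs))               # True
```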

Problem 3.2

It seems that the proof of non-singularity is invalid. In fact, the matrix $\Phi^T\Phi$ can be singular.

Quoting from Section 3.1.2:

"In practice, a direct solution of the normal equations can lead to numerical difficulties when $\Phi^T\Phi$ is close to singular."

Your proof of non-singularity does not hold, because you assume that $\varphi_1,\ldots,\varphi_M$ together form a basis of the subspace spanned by $\Phi$'s columns. However, there is no reason to assume this: they could all be collinear, for instance, in which case any single one of them would form a basis of that subspace.

There is no need to prove that $\Phi^T\Phi$ is invertible anyway, since its inverse is part of the matrix that is given as an existing object in the problem statement (which implies that $(\Phi^T\Phi)^{-1}$ exists).
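A concrete illustration of the point above (my own sketch): when two basis-function columns of $\Phi$ are collinear, $\Phi^T\Phi$ is singular.

```python
# Phi^T Phi is singular when the columns of Phi (the basis functions evaluated
# at the data points) are linearly dependent, e.g. collinear.
import numpy as np

phi1 = np.array([1.0, 2.0, 3.0])
Phi = np.column_stack([phi1, 2.0 * phi1])  # second column collinear with the first

A = Phi.T @ Phi
print(np.linalg.matrix_rank(A))            # 1  (a 2x2 matrix of rank 1 is singular)
print(np.linalg.det(A))                    # ~0
```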

Problem 2.58

I think the mistake in 2.58 is caused by the line under
"If we multiply both sides by $-1/g(\boldsymbol\eta)$, we can obtain".

The left-hand side should expand to
$-\nabla\nabla\ln g(\boldsymbol\eta)-\mathbb E[{\bf u}(x)]\,\mathbb E[{\bf u}(x)]^T$

instead of
$-\nabla\nabla\ln g(\boldsymbol\eta)$.
I've demonstrated this here

This makes the final solution match the answer given in the book.

Thanks for doing this by the way! Such a great resource
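For future readers, a tiny numerical illustration (my own, using the Bernoulli distribution written in exponential-family form with $u(x)=x$ and $g(\eta)=1/(1+e^{\eta})$) of the identity the exercise is after, $-\nabla\nabla\ln g(\boldsymbol\eta)=\mathrm{cov}[{\bf u}(x)]$:

```python
# -d^2/d eta^2 ln g(eta) = var[x] for the Bernoulli written as p(x|eta) = g(eta) exp(eta*x),
# x in {0, 1}, u(x) = x, g(eta) = 1 / (1 + exp(eta)).
import numpy as np

eta, h = 0.4, 1e-4
ln_g = lambda e: -np.log1p(np.exp(e))

second_deriv = (ln_g(eta + h) - 2 * ln_g(eta) + ln_g(eta - h)) / h**2
mu = np.exp(eta) / (1 + np.exp(eta))        # E[x] for this eta

print(-second_deriv, mu * (1 - mu))         # both equal var[x] = mu * (1 - mu)
```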

Minor Issue in Problem 1.12 Solution

Hello Zhengqi,
first of all, thanks for publishing the solution manual!

Starting after the paragraph

"For $\mathbb E[\sigma^2_{\rm ML}]$, we need to take advantage of (1.56) and what has been given in the problem:"

I believe that in the solution to Problem 1.12 the $1/N$ factor went missing in the fifth and sixth lines of the last block of equations, the one involving the expectation of $\mu^2_{\rm ML}$.

The seventh line is correct again.
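As a cross-check of the result this derivation leads to, $\mathbb E[\sigma^2_{\rm ML}]=\frac{N-1}{N}\sigma^2$, here is a quick Monte Carlo sketch (my own addition):

```python
# Monte Carlo check that E[sigma^2_ML] = (N - 1) / N * sigma^2 for Gaussian data.
import numpy as np

rng = np.random.default_rng(0)
N, sigma, trials = 5, 2.0, 200_000

samples = rng.normal(0.0, sigma, size=(trials, N))
sigma2_ml = samples.var(axis=1, ddof=0)    # ML estimator uses 1/N, i.e. ddof=0

print(sigma2_ml.mean())                    # close to 3.2
print((N - 1) / N * sigma**2)              # 3.2
```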

Keep up the great work!

Greetings
Markus
