I've been reading Elements of Statistical Learning by Hastie et al., and I'm pretty sure the following statement is just wrong: "Note that the actual maximizer of the likelihood occurs when we put a spike of infinite height at any one data point..." (p. 274, paragraph 4). They are talking about Gaussian mixtures. But if you interpret this literally, then the likelihood is 0, since all the other points have likelihood 0. If you interpret it in the limit, the same occurs: the limit goes to zero, since one point in the product goes to 1 and the rest go to zero. Am I missing something? The book's been good so far.
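For reference, the model being discussed in that section is, as best I can tell (I'm paraphrasing, not quoting, so the notation may differ slightly from the book), the two-component Gaussian mixture with observed-data log-likelihood:

```latex
% Two-component Gaussian mixture (my paraphrase of the ESL Section 8.5 setup)
% Model: Y = (1 - \Delta) Y_1 + \Delta Y_2, with
%   Y_1 \sim N(\mu_1, \sigma_1^2), \quad Y_2 \sim N(\mu_2, \sigma_2^2), \quad \Pr(\Delta = 1) = \pi
\ell(\theta; \mathbf{y}) = \sum_{i=1}^{N} \log\!\left[ (1 - \pi)\,\phi_{\mu_1,\sigma_1}(y_i) + \pi\,\phi_{\mu_2,\sigma_2}(y_i) \right],
\qquad
\phi_{\mu,\sigma}(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right).
```

As I read it, the "spike" claim is about what happens to this sum when you set μ̂₁ equal to a single observation and let σ̂₁ → 0.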


@mandlebro In the context of the specific problem (mixture of two Gaussian distributions) he's discussing, the likelihood is nonzero. He's set only μ̂₁ = yₓ and σ̂₁ = 0; μ̂₂ and σ̂₂ can be anything, so he's effectively mixing a finite Gaussian with a Dirac delta function as an edge case. If you say Δₓ = 0 and all other Δ = 1, you're using the Dirac delta to explain yₓ and the finite Gaussian to explain all other y. The Dirac delta doesn't have to explain the other points, so the fact that it'd be zero at those values of y doesn't force the likelihood to zero.

That said, I don't see how he concludes this gives *infinite* likelihood (or even necessarily the maximum). It seems to me it just collapses to the likelihood of the finite Gaussian - one point goes to certainty (i.e. 1) and the others go to their likelihood under the finite Gaussian, which is between zero and one. But this isn't the sort of math I do much of, and intuition isn't always reliable when terms are going to infinity.
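For what it's worth, this is easy to poke at numerically rather than trusting intuition. Here's a rough sketch (made-up data; the mixing weight of 0.5 and the plain Gaussian fit for the second component are arbitrary choices of mine, not anything from the book) that evaluates the two-component mixture log-likelihood with μ̂₁ pinned at one data point while σ̂₁ shrinks:

```python
# Rough numerical check (my own toy setup, not from the book): evaluate the
# two-component mixture log-likelihood with component 1 pinned at mu1 = y[0]
# while sigma1 shrinks. The mixing weight pi_mix = 0.5 and the plain Gaussian
# fit for component 2 are arbitrary assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.5, size=20)   # made-up data

pi_mix = 0.5                                  # assumed mixing proportion
mu2, sigma2 = y.mean(), y.std()               # component 2: ordinary Gaussian fit
mu1 = y[0]                                    # component 1 centred on a single data point

for sigma1 in [1.0, 0.1, 1e-2, 1e-4, 1e-8]:
    dens = (1 - pi_mix) * norm.pdf(y, mu1, sigma1) + pi_mix * norm.pdf(y, mu2, sigma2)
    print(f"sigma1 = {sigma1:<8g}  log-likelihood = {np.log(dens).sum():.3f}")
```

Since the near-spike component's contribution to every other point's density vanishes as σ̂₁ shrinks, the printout shows directly whether the total levels off, as I'd expect, or keeps growing.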

Hopefully my reasoning makes sense as to why the likelihood isn't zero in general. If it becomes clear to you why it goes to infinity and/or represents a maximum, please share your insights - you've got me curious! Alternatively, if you disagree with my reasoning, and you still think it should be zero, I'm happy to reconsider my position on it.

@khird A likelihood needs to be between 0 and 1 since it's a probability, so when he says it goes to infinity I assume it's just a typo and he means the negative log-likelihood. You are, however, correct that if the mixture parameter does not entirely favor the Dirac, then the likelihood is nonzero. In fact, if you let the mixture parameter vary and hold one side at a Dirac and the other at a Gaussian, you would expect the maximum likelihood solution to entirely favor the Gaussian. It just seems like whoever wrote that section was mistaken. It is true that mixture models tend to have bad local minima and can have multiple MLE solutions, so I think they just made a mistake while expressing that point.
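That expectation about the mixture parameter can be checked numerically too. Here's a rough sketch along the same lines (toy data; the near-spike width of 1e-6 and the plain Gaussian fit for the second component are arbitrary choices of mine) that sweeps the mixing weight and reports which value maximizes the log-likelihood:

```python
# Rough sketch (toy data and arbitrary choices of mine): hold component 1 at a
# near-spike (mu1 = y[0], tiny sigma1) and component 2 at a plain Gaussian fit,
# then sweep the mixing weight w to see which value maximizes the log-likelihood.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.5, size=20)

mu1, sigma1 = y[0], 1e-6          # near-Dirac component
mu2, sigma2 = y.mean(), y.std()   # ordinary Gaussian component

def loglik(w):
    dens = w * norm.pdf(y, mu1, sigma1) + (1 - w) * norm.pdf(y, mu2, sigma2)
    return np.log(dens).sum()

weights = np.linspace(0.001, 0.999, 999)
values = [loglik(w) for w in weights]
best = weights[int(np.argmax(values))]
print(f"best w = {best:.3f}, log-likelihood at best w = {max(values):.3f}")
```

Keeping σ̂₁ small but nonzero sidesteps the literal Dirac limit while still letting the sweep show where the maximizing weight lands.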
