I've been reading Elements of Statistical Learning by Hastie et al., and I'm pretty sure the following statement is just wrong: "Note that the actual maximizer of the likelihood occurs when we put a spike of infinite height at any one data point..." (p. 274, par. 4). They are talking about Gaussian mixtures. But if you interpret this literally, then the likelihood is 0, since all the other points have likelihood 0. If you interpret it as a limit, the same thing occurs: the limit goes to zero, since one point in the product goes to 1 and the rest go to zero. Am I missing something? The book's been good so far.
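For concreteness, here's a small numerical sketch I put together (the data and parameter values are made up, not from the book): it evaluates the two-component mixture log-likelihood with one component's mean pinned at a data point while that component's standard deviation shrinks, which is the limit I'm trying to interpret.

```python
# Toy check of the two-component Gaussian mixture log-likelihood from the
# quoted passage. The data and parameter values below are invented for
# illustration only.
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_log_likelihood(data, pi, mu1, sigma1, mu2, sigma2):
    """Log-likelihood of a two-component mixture: each point's density is the
    weighted sum of the two component densities, and the log-likelihood is the
    sum of the logs of those per-point densities."""
    per_point = (pi * normal_pdf(data, mu1, sigma1)
                 + (1.0 - pi) * normal_pdf(data, mu2, sigma2))
    return np.sum(np.log(per_point))

data = np.array([-1.2, 0.3, 0.9, 2.5, 3.1])  # arbitrary toy sample

# Pin component 1's mean at the first data point and shrink its standard
# deviation, keeping component 2 broad, then print the log-likelihood so the
# limiting behaviour can be seen directly.
for sigma1 in [1.0, 0.1, 0.01, 0.001]:
    ll = mixture_log_likelihood(data, pi=0.5, mu1=data[0], sigma1=sigma1,
                                mu2=np.mean(data), sigma2=np.std(data))
    print(f"sigma1 = {sigma1:7.3f}  ->  log-likelihood = {ll:.3f}")
```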
@khird a likelihood needs to be between 0 and 1 since it's a probability, so when the book says it goes to infinity I assume that's just a typo and they mean the negative log-likelihood. You are, however, correct in that if the mixture parameter does not entirely favor the Dirac, then the log-likelihood is nonzero. In fact, if you let the mixture parameter vary and hold one component at a Dirac and the other at a Gaussian, you would expect the maximum likelihood solution to entirely favor the Gaussian. It just seems like whoever wrote that section was mistaken. It is true that mixture models tend to have bad local minima and can have multiple MLE solutions, so I think they just made a mistake while expressing that point.