Cross Entropy and Negative Log Likelihood are the same thing, but I would like to explain why they are called that way to make them easier to understand.

Entropy measures the degree of randomness, and it's given by the formula below:

H(X) = -Σ p(x) log p(x)

(All logs in this post are natural logs, so entropy is measured in nats.)

For example, consider the roll of a six-sided dice.

Case 1: For a fair dice, the distribution can be represented as (1/6, 1/6, 1/6, 1/6, 1/6, 1/6), so the entropy is

-1/6*log(1/6) - 1/6*log(1/6) - ... - 1/6*log(1/6) = 1.791

Case 2: Consider the roll of an unfair dice that is more biased to rolling a 6, with distribution (0.1, 0.1, 0.1, 0.1, 0.1, 0.5):

-0.1*log(0.1) - 0.1*log(0.1) - ... - 0.5*log(0.5) = 1.498

Case 3: Consider an extremely unfair dice that is very heavily biased to rolling a 6, with distribution (0.01, 0.01, 0.01, 0.01, 0.01, 0.95):

-0.01*log(0.01) - 0.01*log(0.01) - ... - 0.95*log(0.95) = 0.279

As you can see, Case 1 with the most "randomness" has the highest entropy, while Case 3 with the least "randomness" has the lowest entropy. Hopefully you are convinced that the formula for entropy H(X) measures randomness.

"Cross" refers to the fact that cross entropy relates two distributions. It's called the cross entropy of a distribution q relative to a distribution p, and it is defined between two probability distributions p and q, where p is the true distribution and q is the estimated distribution:

H(p, q) = -Σ p(x) log q(x)

What has changed from the formula for entropy H(X) is that the argument, the random variable X, is replaced by p and q. It might be confusing at first, but the reason we cannot use X in this case is that both p and q are distributions of X:

- p is the true distribution of X (this is the label, or the y value, in a ML problem).
- q is the estimated (observed) distribution of X (this is the predicted value, or the y-hat value, in a ML problem).

For example, consider another dice example. Case 1: Let's say we have a belief that the dice is fair, so that its true distribution should be p = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6). It's observed in the data set that the outcomes follow some distribution q over the possible outcomes of X.
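To make the numbers above concrete, here is a minimal Python sketch that computes the entropy of each dice distribution and the cross entropy between the fair belief p and an observed distribution q. The helper names (`entropy`, `cross_entropy`) and the observed distribution `q_observed` are illustrative choices, not from the original post (whose observed values are not recoverable); natural log is assumed, matching the values above.

```python
import math

def entropy(p):
    # H(p) = -sum_x p(x) * ln p(x), in nats; skip zero-probability outcomes
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * ln q(x); p is the true distribution, q the estimate
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

# The three dice from the entropy examples above
fair        = [1/6] * 6                             # Case 1
biased      = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]        # Case 2
very_biased = [0.01, 0.01, 0.01, 0.01, 0.01, 0.95]  # Case 3

print(entropy(fair))         # ≈ 1.7918 -- most randomness, highest entropy
print(entropy(biased))       # ≈ 1.4979
print(entropy(very_biased))  # ≈ 0.2790 -- least randomness, lowest entropy

# Cross entropy for the second example: p is the belief that the dice is fair;
# q_observed is a hypothetical observed distribution, made up for illustration.
p = [1/6] * 6
q_observed = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]
print(cross_entropy(p, q_observed))  # ≈ 2.0343 -- always >= entropy(p)
```

Note that H(p, q) is smallest exactly when q equals p, which is why minimizing cross entropy pushes the predicted distribution toward the true one.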