So in reading group today Chernoff bounds came up. The bound basically says that if we sample independent random variables and sum them, the sum is very likely to be close to its expectation. Estimating partition functions is a central challenge in many machine learning applications, so couldn't we just sample, estimate the sum, and have the bound guarantee that we're not far off?
To wit, let $X_1, \ldots, X_n$ be independent random variables bounded between $0$ and $b$. Let $S_n = X_1 + \cdots + X_n$. Then

$$P\left(\,|S_n - E[S_n]| \geq t\,\right) \leq 2 \exp\left(-\frac{2t^2}{n b^2}\right).$$
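As a quick sanity check of the inequality, here is a small simulation (the particular setup — uniform $X_i$ on $[0, b]$ and these values of $n$ and $t$ — is my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

n, b, t = 1000, 1.0, 30.0
trials = 10_000

# Draw `trials` independent sums S_n of n variables bounded in [0, b].
X = rng.uniform(0.0, b, size=(trials, n))
S = X.sum(axis=1)

# Empirical estimate of P(|S_n - E[S_n]| >= t) vs. the Hoeffding bound.
empirical = np.mean(np.abs(S - S.mean()) >= t)
hoeffding = 2.0 * np.exp(-2.0 * t**2 / (n * b**2))

print(f"empirical tail: {empirical:.4f}, Hoeffding bound: {hoeffding:.4f}")
```

The empirical tail probability comes out far below the bound, as it should — the bound holds for any distribution on $[0, b]$, so it is loose for any particular one.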
Now let us adapt this to estimating the partition function $Z = \sum_{x \in \mathcal{X}} \mu(x)$. Let $X_i = \mu(x_i)$, where $x_i$ is drawn uniformly at random from $\mathcal{X}$ and $\mu$ is some measure bounded above by $b$. Note that $X_i$ is a random variable that satisfies the aforementioned conditions. Also note that $E[X_i] = Z / |\mathcal{X}|$ and that $E[S_n] = n Z / |\mathcal{X}|$. This means that we can write the condition in the inequality as $|\bar{X} - Z/|\mathcal{X}|| \geq t/n$, where $\bar{X} = S_n / n$ is the empirically estimated expected value of $X_i$. This seems pretty good, doesn't it?
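The estimator is just the sample mean of $\mu$ over uniform states, rescaled by $|\mathcal{X}|$. A minimal sketch, using a toy measure on $\{0,1\}^{10}$ small enough to check against the exact $Z$ by enumeration (the measure and dimension here are my own choices, not the Ising model from the experiment below):

```python
import numpy as np

rng = np.random.default_rng(1)

d = 10                       # number of binary variables
theta = rng.normal(size=d)   # parameters of the toy measure

def mu(x):
    """Unnormalized measure mu(x) = exp(theta . x) for a batch of states x."""
    return np.exp(x @ theta)

# Exact partition function by brute-force enumeration (feasible for d = 10).
states = np.array([[(s >> i) & 1 for i in range(d)] for s in range(2**d)],
                  dtype=float)
Z_exact = mu(states).sum()

# Monte Carlo: average mu over uniform samples, then rescale by |X| = 2^d.
n = 100_000
samples = rng.integers(0, 2, size=(n, d)).astype(float)
Z_hat = 2**d * mu(samples).mean()

print(f"Z exact: {Z_exact:.2f}, Z estimated: {Z_hat:.2f}")
```

For this easy measure the estimate lands close to the truth; the point of the post is what happens when the measure is peaky.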
So let's see what happens when we do things empirically. I randomly generated Ising models by sampling canonical parameters (using the standard overspecified representation) from a N(0, 1) distribution for both individual and pair parameters. In total, there were 40 nodes. I estimated the partition function (or rather, the partition function divided by 2^40) using the above technique and plotted the estimate as a function of iteration over 300 iterations. I restarted the estimation 5 times to see whether the different runs would converge to similar values. Attached are said plots for 2 different models (10 runs of 300 iterations total).
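A sketch of the experiment, using a simplified minimal Ising parameterization (node terms $\theta_i$ and pair terms $\theta_{ij}$, rather than the overspecified representation in the post) with N(0, 1) parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

d = 40
theta = rng.normal(size=d)        # node parameters
W = np.triu(rng.normal(size=(d, d)), k=1)   # pair parameters, i < j

def log_mu(x):
    """log mu(x) = theta . x + sum_{i<j} W_ij x_i x_j for a batch of states x."""
    return x @ theta + np.einsum('bi,ij,bj->b', x, W, x)

# 5 restarts of 300 iterations each; `running` is the running estimate
# of Z / 2^40 (the sample mean of mu over uniform states).
runs, iters = 5, 300
for r in range(runs):
    x = rng.integers(0, 2, size=(iters, d)).astype(float)
    vals = np.exp(log_mu(x))
    running = np.cumsum(vals) / np.arange(1, iters + 1)
    print(f"run {r}: final estimate {running[-1]:.3e}")
```

Plotting `running` against iteration for each restart should reproduce the qualitative behavior in the figures: long flat stretches punctuated by huge jumps whenever a sample lands near a peak of $\mu$.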
Monte carlo estimates of an Ising model partition function (1/2)
Monte carlo estimates of an Ising model partition function (2/2)
It seems like this estimate does horrendously. You might as well use a random number generator =). The key problem is that there are sudden jumps in our estimates: when there's enough coupling in the model, the modes become very peaky, so a single rare sample can dominate the running average. Another way of looking at it is by examining the right-hand side of the bound. Recall that this term depends on the bound $b$ on the measure. Unfortunately, this bound can be quite large; for the parameters I've chosen it is astronomically large, and it grows exponentially in the number of nodes in the graph.
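To see how badly $b$ behaves, note that for the sketch model above $\log \mu(x)$ is at most the sum of the positive node parameters plus the sum of the positive pair parameters (a crude upper bound of my own, not a claim from the post), and that sum grows with the number of parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

# Crude upper bound on log b = log max_x mu(x): set x_i = 1 exactly for
# the terms with positive parameters. Parameters drawn N(0,1) as in the post.
for d in (10, 20, 40):
    theta = rng.normal(size=d)
    W = np.triu(rng.normal(size=(d, d)), k=1)
    log_b = np.maximum(theta, 0).sum() + np.maximum(W, 0).sum()
    print(f"d = {d:2d}: log b <= {log_b:8.1f}")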
So there we have it: Hoeffding's inequality only gives useful guarantees when there isn't much coupling and the bound on the measure is small. As the bound increases, Hoeffding's inequality gives us exponentially worse guarantees; unfortunately, the strongly coupled case is precisely the interesting case!