23 | January | 2010 | Please Scoop Me!

I once heard it on good authority that Gelman says you usually don’t need more than 12 samples. Well, as a result of a discussion with Sam Gershman (sorry Sam for not answering the actual question you asked!), I wondered if that was true; that is, if under reasonable assumptions it might be better to take a small number of samples. Caveat: there’s probably lots of work on this already, but where would the fun be in that?

Ok, let’s assume that your goal is to estimate $\mathbb{E}_{z \sim p(z | x)}[f(z)]$ , where $p(z | x)$ represents some distribution on hidden variables over which you are trying to compute a function, $f$ . For the usual reasons, it’s intractable to compute this exactly, so you’re going to use a sampler. Let’s assume

that your sampler has mixed and that you’re getting independent samples (that condition alone should give you fair warning that what I’m about to say is of little practical value);
$f$ is bounded (say between 0 and 1);
to obtain $n$ samples from the sampler costs some amount, say $R(n)$ .

More samples are usually better, because they’ll give you a better representation of the true distribution, i.e. $\mathbb{E}_{z \sim \hat{p_n}(z | x)}[f(z)] \rightarrow \mathbb{E}_{z \sim p(z | x)}[f(z)]$ , where $\hat{p_n}$ is the distribution obtained by using $n$ samples. Unfortunately, more samples come at a cost here, so you don’t want too many. How should you tradeoff then?

We can define a loss by $\ell = R(n) + |\mathbb{E}_{z \sim \hat{p_n}(z | x)}[f(z)] - \mathbb{E}_{z \sim p(z | x)}[f(z)]|$ , that is, how far off our sampled estimate is from the truth, plus the cost of obtaining those samples. Using Hoeffding, we can bound the loss $\ell < R(n) + \epsilon$ with probability $1 - 2 \exp( -2 n \epsilon^2)$ . This expression gives you something to think about when you're trying to decide how many samples to take — more samples loosen the bound but increase its probability.

If your cost is linear, $R(n) = a n$ , you might want to choose
something like $n = \frac{\epsilon}{a}$ , which gives you a loss of $\ell < 2 \epsilon$ with probability $1 - 2 \exp(-2 \epsilon^3 / a)$ .

The plot below shows what might happen if you make such a choice. Here, I've let the posterior be an equiprobable binomial distribution. The function I'm computing is the identity $f(z) = z$ . The curves show the loss, $\ell$ for various choices of the cost parameter $a$ as a function of the number of samples. The dots show the chosen values of $n$ for each value of $a$ ; the horizontal lines show the 80% loss bound for these choices.