I once heard it on good authority that Gelman says you usually don’t need more than 12 samples. Well, as a result of a discussion with Sam Gershman (sorry Sam for not answering the actual question you asked!), I wondered if that was true; that is, if under reasonable assumptions it might be better to take a small number of samples. Caveat: there’s probably lots of work on this already, but where would the fun be in that?

Ok, let’s assume that your goal is to estimate , where represents some distribution on hidden variables over which you are trying to compute a function, . For the usual reasons, it’s intractable to compute this exactly, so you’re going to use a sampler. Let’s assume

**that your sampler has mixed and that you’re getting independent samples**(that condition alone should give you fair warning that what I’m about to say is of little practical value);- is bounded (say between 0 and 1);
- to obtain samples from the sampler costs some amount, say .

More samples are usually better, because they’ll give you a better representation of the true distribution, i.e. , where is the distribution obtained by using samples. Unfortunately, more samples come at a cost here, so you don’t want too many. How should you tradeoff then?

We can define a loss by , that is, how far off our sampled estimate is from the truth, plus the cost of obtaining those samples. Using Hoeffding, we can bound the loss with probability . This expression gives you something to think about when you're trying to decide how many samples to take — more samples loosen the bound but increase its probability.

If your cost is linear, , you might want to choose

something like , which gives you a loss of with probability .

The plot below shows what might happen if you make such a choice. Here, I've let the posterior be an equiprobable binomial distribution. The function I'm computing is the identity . The curves show the loss, for various choices of the cost parameter as a function of the number of samples. The dots show the chosen values of for each value of ; the horizontal lines show the 80% loss bound for these choices.

Turns out for some reasonable values, you really should stick to about 12 samples.