# Daily Archives: January 23, 2010

## The cost of a sample

I once heard it on good authority that Gelman says you usually don’t need more than 12 samples. Well, as a result of a discussion with Sam Gershman (sorry Sam for not answering the actual question you asked!), I wondered if that was true; that is, if under reasonable assumptions it might be better to take a small number of samples. Caveat: there’s probably lots of work on this already, but where would the fun be in that?

Ok, let’s assume that your goal is to estimate $\mathbb{E}_{z \sim p(z | x)}[f(z)]$, where $p(z | x)$ represents some distribution on hidden variables over which you are trying to compute a function, $f$. For the usual reasons, it’s intractable to compute this exactly, so you’re going to use a sampler. Let’s assume

• that your sampler has mixed and that you’re getting independent samples (that condition alone should give you fair warning that what I’m about to say is of little practical value);
• $f$ is bounded (say between 0 and 1);
• to obtain $n$ samples from the sampler costs some amount, say $R(n)$.

More samples are usually better, because they’ll give you a better representation of the true distribution, i.e. $\mathbb{E}_{z \sim \hat{p_n}(z | x)}[f(z)] \rightarrow \mathbb{E}_{z \sim p(z | x)}[f(z)]$, where $\hat{p_n}$ is the distribution obtained by using $n$ samples. Unfortunately, more samples come at a cost here, so you don’t want too many. How should you tradeoff then?

We can define a loss by $\ell = R(n) + |\mathbb{E}_{z \sim \hat{p_n}(z | x)}[f(z)] - \mathbb{E}_{z \sim p(z | x)}[f(z)]|$, that is, how far off our sampled estimate is from the truth, plus the cost of obtaining those samples. Using Hoeffding, we can bound the loss $\ell < R(n) + \epsilon$ with probability $1 - 2 \exp( -2 n \epsilon^2)$. This expression gives you something to think about when you're trying to decide how many samples to take — more samples loosen the bound but increase its probability.

If your cost is linear, $R(n) = a n$, you might want to choose
something like $n = \frac{\epsilon}{a}$, which gives you a loss of $\ell < 2 \epsilon$ with probability $1 - 2 \exp(-2 \epsilon^3 / a)$.

The plot below shows what might happen if you make such a choice. Here, I've let the posterior be an equiprobable binomial distribution. The function I'm computing is the identity $f(z) = z$. The curves show the loss, $\ell$ for various choices of the cost parameter $a$ as a function of the number of samples. The dots show the chosen values of $n$ for each value of $a$; the horizontal lines show the 80% loss bound for these choices.

Turns out for some reasonable values, you really should stick to about 12 samples.