Daily Archives: January 23, 2010

The cost of a sample

I once heard it on good authority that Gelman says you usually don’t need more than 12 samples. Well, as a result of a discussion with Sam Gershman (sorry Sam for not answering the actual question you asked!), I wondered if that was true; that is, if under reasonable assumptions it might be better to take a small number of samples. Caveat: there’s probably lots of work on this already, but where would the fun be in that?

Ok, let’s assume that your goal is to estimate \mathbb{E}_{z \sim p(z | x)}[f(z)], where p(z | x) represents some distribution on hidden variables over which you are trying to compute a function, f. For the usual reasons, it’s intractable to compute this exactly, so you’re going to use a sampler. Let’s assume

  • that your sampler has mixed and that you’re getting independent samples (that condition alone should give you fair warning that what I’m about to say is of little practical value);
  • f is bounded (say between 0 and 1);
  • to obtain n samples from the sampler costs some amount, say R(n).

More samples are usually better, because they’ll give you a better representation of the true distribution, i.e. \mathbb{E}_{z \sim \hat{p_n}(z | x)}[f(z)] \rightarrow \mathbb{E}_{z \sim p(z | x)}[f(z)], where \hat{p_n} is the distribution obtained by using n samples. Unfortunately, more samples come at a cost here, so you don’t want too many. How should you tradeoff then?

We can define a loss by \ell = R(n) + |\mathbb{E}_{z \sim \hat{p_n}(z | x)}[f(z)] - \mathbb{E}_{z \sim p(z | x)}[f(z)]|, that is, how far off our sampled estimate is from the truth, plus the cost of obtaining those samples. Using Hoeffding, we can bound the loss \ell < R(n) + \epsilon with probability 1 - 2 \exp( -2 n \epsilon^2). This expression gives you something to think about when you're trying to decide how many samples to take — more samples loosen the bound but increase its probability.

If your cost is linear, R(n) = a n, you might want to choose
something like n = \frac{\epsilon}{a}, which gives you a loss of \ell < 2 \epsilon with probability 1 - 2 \exp(-2 \epsilon^3 / a).

The plot below shows what might happen if you make such a choice. Here, I've let the posterior be an equiprobable binomial distribution. The function I'm computing is the identity f(z) = z. The curves show the loss, \ell for various choices of the cost parameter a as a function of the number of samples. The dots show the chosen values of n for each value of a; the horizontal lines show the 80% loss bound for these choices.

Turns out for some reasonable values, you really should stick to about 12 samples.



Filed under Uncategorized