Approximating the logistic response

The central challenge of variational methods is usually computing expectations of log probabilities.  In the case of the RTM, this is \mathbb{E}[\log p(y | z, z')] = y \mathbb{E}[x] - \mathbb{E}[\log(1 + \exp(x))], where x = \eta^T (z \circ z') + \nu.

The first term is linear and so is easy enough; the second is the problematic one.  One approach is to use a Taylor approximation.  The issue then becomes choosing the point around which to center the approximation.  The log-partition function above really has two regimes: for very negative x, \log(1 + \exp(x)) \approx 0, but for large positive x, \log(1 + \exp(x)) \approx x.  The solution that the delta method uses is to center it at the mean \mu = \mathbb{E}[x].  But does this give us any real guarantee that we wouldn’t be better off centering it elsewhere?
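To make the choice of center concrete, here is a sketch of the expansion, writing f(x) = \log(1 + \exp(x)) and \sigma(x) = 1 / (1 + \exp(-x)); both names are shorthand introduced here, not notation from the RTM derivation.  Expanding f to second order around an arbitrary center a and taking expectations gives:

```latex
\mathbb{E}[f(x)] \approx f(a) + \sigma(a)\,(\mu - a)
    + \tfrac{1}{2}\,\sigma(a)\bigl(1 - \sigma(a)\bigr)\,\mathbb{E}\bigl[(x - a)^2\bigr],
\qquad f'(a) = \sigma(a), \quad f''(a) = \sigma(a)\bigl(1 - \sigma(a)\bigr).
```

With a = \mu the linear term vanishes and \mathbb{E}[(x - \mu)^2] is just the variance of x, so the first-order approximation is f(\mu) and the leading correction is \tfrac{1}{2} \sigma(\mu)(1 - \sigma(\mu)) \mathrm{Var}(x).  Since f'' is at most 1/4, that correction is small whenever the variance of x is small.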

I couldn’t really answer this question analytically, so I decided to experiment.  I sampled x using settings typical of the corpora I look at.  It turns out that the first-order approximation at the mean is really good, because the variance of z is really low when a document has enough words.
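A minimal sketch of this kind of experiment is below.  It is not the original code: the number of topics, `THETA`, `ETA`, `NU`, and the choice to give both documents the same topic proportions are all assumptions made for illustration, and z is simulated as the empirical topic proportions of a document whose words are drawn independently from `THETA`.  It compares a Monte Carlo estimate of \mathbb{E}[\log(1 + \exp(x))] against the first-order approximation centered at the mean and, for contrast, one centered at 0.

```python
import numpy as np

# Hypothetical settings, chosen only for illustration.
THETA = np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # per-word topic probabilities
ETA = np.full(5, 5.0)                        # regression weights
NU = -1.0                                    # intercept


def softplus(x):
    """log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def compare(n_words, theta=THETA, eta=ETA, nu=NU, n_samples=100_000, seed=0):
    """Compare approximations of E[log(1 + exp(x))] for documents of n_words words."""
    rng = np.random.default_rng(seed)
    # Empirical topic proportions for the two documents (assumed independent).
    z = rng.multinomial(n_words, theta, size=n_samples) / n_words
    z_prime = rng.multinomial(n_words, theta, size=n_samples) / n_words
    x = (z * z_prime) @ eta + nu

    # E[x] is exact here: E[z_k z'_k] = theta_k^2 by independence.
    mu = (theta * theta) @ eta + nu

    monte_carlo = softplus(x).mean()
    at_mean = softplus(mu)                          # first order, centered at mu
    at_zero = softplus(0.0) + sigmoid(0.0) * mu     # first order, centered at 0
    return {
        "var_x": x.var(),
        "monte_carlo": monte_carlo,
        "err_at_mean": abs(monte_carlo - at_mean),
        "err_at_zero": abs(monte_carlo - at_zero),
    }


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, compare(n))
```

Under these assumed settings, the variance of x shrinks roughly in proportion to 1 / n_words, and the error of the approximation centered at the mean shrinks with it, which is consistent with the explanation above.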

That of course brings up another question.  Why does doing the “correct” (\psi_\sigma) thing not work as well as the “incorrect” (\psi_e) approximation?
