# Approximating the logistic response

The central challenge of variational methods is usually computing expectations of log probabilities.  In the case of the RTM, this is $\mathbb{E}[\log p(y | z, z')] = y \mathbb{E}[x] - \mathbb{E}[\log(1 + \exp(x))]$, where $x = \eta^T (z \circ z') + \nu$.

The first term is linear and so is easy enough; the second is the problematic one.  One approach is to use a Taylor approximation.  The issue then becomes choosing the point around which to center the approximation.  The log-partition function above really has two regimes: for large negative $x$, $\log(1 + \exp(x)) \approx 0$, but for large positive $x$, $\log(1 + \exp(x)) \approx x$.  The solution that the delta method uses is to center it at the mean $\mu = \mathbb{E}[x]$. But does this give us any real guarantee that we wouldn’t be better off centering it elsewhere?
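To make the delta-method step concrete, here is the expansion written out. The names $f(x) = \log(1 + \exp(x))$ and $s(x) = 1 / (1 + \exp(-x))$ (the logistic function) are shorthand introduced just for this note. Expanding $f$ to second order around $\mu$ gives

$$f(x) \approx f(\mu) + s(\mu)(x - \mu) + \tfrac{1}{2} s(\mu)\bigl(1 - s(\mu)\bigr)(x - \mu)^2 .$$

Taking expectations kills the linear term, since $\mathbb{E}[x - \mu] = 0$:

$$\mathbb{E}[f(x)] \approx f(\mu) + \tfrac{1}{2} s(\mu)\bigl(1 - s(\mu)\bigr)\,\mathrm{Var}[x] .$$

The first-order approximation keeps only $f(\mu)$. Because $f$ is convex and $f''(x) = s(x)(1 - s(x)) \le 1/4$ everywhere, the error of that approximation satisfies $0 \le \mathbb{E}[f(x)] - f(\mu) \le \mathrm{Var}[x] / 8$. This doesn’t tell us whether some other center would do better, but it does show that the error at the mean is controlled by the variance of $x$.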

I couldn’t really answer this question analytically, so I decided to experiment.  I sampled $x$ using settings typical of the corpora I look at.  It turns out that the first-order approximation at the mean is really good, because the variance of $z$ is really low when you have enough words.
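A minimal sketch of this kind of experiment is below. It is not the original code: the function name `approximation_gap`, the topic proportions, and the values of `eta` and `nu` are all made-up placeholders, and it assumes each $z$ is the empirical topic proportion of `n_words` independent topic assignments drawn from a fixed distribution. It compares a Monte Carlo estimate of $\mathbb{E}[\log(1 + \exp(x))]$ with the first-order approximation at the mean.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def approximation_gap(n_words, eta, nu, phi, phi_prime, n_samples=20000, seed=0):
    """Compare a Monte Carlo estimate of E[log(1 + exp(x))] with softplus(E[x]).

    Assumption (not from the original post): z and z' are empirical topic
    proportions over n_words assignments drawn from phi and phi_prime.
    """
    rng = np.random.default_rng(seed)
    # Each row holds topic counts for one simulated document, scaled to proportions.
    z = rng.multinomial(n_words, phi, size=n_samples) / n_words
    z_prime = rng.multinomial(n_words, phi_prime, size=n_samples) / n_words
    # x = eta^T (z o z') + nu, one value per simulated document pair.
    x = (z * z_prime) @ eta + nu
    exact = softplus(x).mean()        # Monte Carlo estimate of the expectation
    first_order = softplus(x.mean())  # first-order Taylor approximation at the mean
    return exact, first_order, x.var()


if __name__ == "__main__":
    # Hypothetical settings, chosen only for illustration.
    eta = np.linspace(1.0, 5.0, 5)
    nu = -1.0
    phi = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
    phi_prime = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
    for n_words in (10, 100, 1000):
        exact, first_order, var_x = approximation_gap(n_words, eta, nu, phi, phi_prime)
        print(n_words, exact - first_order, var_x)
```

Under these assumptions the gap between the two estimates should shrink as `n_words` grows, since the variance of the topic proportions falls roughly like one over the number of words.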

That of course brings up another question.  Why does doing the “correct” ($\psi_\sigma$) thing not work as well as the “incorrect” ($\psi_e$) approximation?