# What’s the deal with the logistic response?

Below, I asked why an approximation that is more accurate seems to perform worse than one that is more haphazard.  The two methods differ in two principal ways:

• The approximate gradient computed during the E-step.
• The optimization of the parameters in the M-step.

In the E-step they differ mainly in how they push a node’s topic distribution toward its neighbors’.  With the simple but incorrect method, the gradient is proportional to $\eta^\top z'$, while the correct method gives $\eta^\top z' \, \bigl(1 - \sigma(\eta^\top (z \circ z') + \nu)\bigr)$.  Intuitively, the latter method tells us that the closer we are to the right answer, the less we need to push.
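To make the intuition concrete, here is a small NumPy sketch of the two gradients (the variable names, dimensions, and parameter values are illustrative, not taken from the actual code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
K = 10                            # number of topics (illustrative)
z = rng.dirichlet(np.ones(K))     # topic proportions for one node
zp = rng.dirichlet(np.ones(K))    # topic proportions for a neighbor
eta = rng.normal(size=K)          # hypothetical link coefficients
nu = -1.0                         # hypothetical intercept

# Simple but incorrect: pushes with the same strength regardless of
# how well the link is already predicted.
grad_simple = eta * zp

# Correct: the same direction, damped by 1 - sigma(.), which shrinks
# toward zero as the predicted link probability approaches one.
damp = 1.0 - sigmoid(eta @ (z * zp) + nu)
grad_correct = eta * zp * damp
```

The two gradients point in the same direction; they differ only by the scalar damping factor, which lies strictly between 0 and 1.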

I futzed with the code so that we push more like the incorrect method, but this didn’t seem to affect the results.  I suspect that $\eta^\top (z \circ z') + \nu$ is always fairly small, so the damping factor doesn’t have much impact.

Then I tried changing the M-step.  In particular, I tried removing the M-step fit with $\psi_\sigma$.  It turns out that this “fit” performs about as well as $\psi_e$.  Examining the fits shows that $\eta$ under $\psi_\sigma$ is consistently smaller than under $\psi_e$.  Why is this?  I hypothesize that the L2 regularization penalty is causing the fit to fail.  Having to determine the optimal penalty is a pain; I originally added the term because the fits were diverging.  Perhaps the right thing to do is to figure out why they were diverging in the first place.
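Both hypotheses are easy to probe numerically.  The sketch below (with entirely made-up data and a 1-D logistic fit standing in for the real M-step) shows first that when the sigmoid’s argument stays near zero the damping factor hovers near a constant 0.5, which would explain why mimicking the simpler gradient changed little, and second that an L2 penalty shrinks the fitted $\eta$ toward zero:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothesis 1: if eta . (z o z') + nu stays near zero, the damping
# factor 1 - sigma(.) stays close to 0.5, i.e. a near-constant rescaling.
damps = [1.0 - sigmoid(x) for x in (-0.2, -0.1, 0.0, 0.1, 0.2)]

# Hypothesis 2: an L2 penalty shrinks the fitted coefficient.
# Minimal 1-D regularized logistic fit by gradient descent on toy data.
rng = np.random.default_rng(1)
s = rng.uniform(0.0, 1.0, size=500)                    # similarity scores
y = (rng.uniform(size=500) < sigmoid(4.0 * s - 2.0)).astype(float)

def fit_eta(lam, steps=5000, lr=0.1):
    """Fit eta, nu by gradient descent with penalty 0.5 * lam * eta**2."""
    eta, nu = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(eta * s + nu)
        eta -= lr * (np.mean((p - y) * s) + lam * eta)
        nu -= lr * np.mean(p - y)
    return eta

eta_unpenalized = fit_eta(0.0)
eta_penalized = fit_eta(1.0)   # noticeably smaller in magnitude
```

If the real fits behave like this toy, a consistently smaller $\eta$ under $\psi_\sigma$ is exactly what an over-aggressive penalty would produce.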