What’s the deal with the logistic response?

Below I take up the question of why an approximation that is more accurate seems to perform worse than one that is more haphazard. The two methods differ in two principal ways:

  • The approximate gradient computed during the E-step.  
  • The optimization of the parameters in the M-step.

In the E-step they differ mainly in how they push a node's topic distribution toward its neighbors'. With the simple but incorrect method, the gradient is proportional to \eta^t z', while the correct method gives \eta^t z' (1 - \sigma(\eta^t (z \circ z') + \nu)). Intuitively, the latter tells us that the closer we are to the right answer, the less we need to push.
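A minimal sketch of the two pushes, under my own reading of the notation: I take the gradient with respect to z to act elementwise, so \eta^t z' becomes \eta \circ z' as a vector; the function and variable names are mine, not from the post.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def push_simple(eta, z_prime):
        # Simple but incorrect method: push with the same strength
        # no matter how well the link is already explained.
        return eta * z_prime

    def push_exact(eta, nu, z, z_prime):
        # Correct method: scale the push by 1 - sigma(eta^t (z o z') + nu).
        # When the link probability is already near 1, the residual is
        # near 0 and we barely push at all.
        residual = 1.0 - sigmoid(eta @ (z * z_prime) + nu)
        return eta * z_prime * residual

If \eta^t (z \circ z') + \nu stays close to zero, the residual factor stays close to 1/2 for every pair, and the two pushes differ only by a constant scale, which is consistent with the experiment described below.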

I futzed with the code so that we push more like the incorrect method, but this didn't seem to affect the results. I suspect that \eta^t (z \circ z') + \nu is always pretty small, so the scaling factor doesn't have much of an impact.

Then I tried changing the M-step. In particular, I tried removing the M-step fitting with \psi_\sigma. It turns out that this "fit" performs about as well as \psi_e. Examining the fits shows that \eta under \psi_\sigma is consistently smaller than under \psi_e. Why is this? I hypothesize that the L2 regularization penalty is causing the fit to fail. Having to determine the optimal penalty is a pain; I originally added the term because the fits were diverging. Perhaps the right thing to do is figure out why they were diverging in the first place.
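To make the regularization's effect concrete, here is a rough sketch of what an L2-penalized M-step fit for (\eta, \nu) might look like. This is an assumption-laden stand-in, not the actual \psi_\sigma objective: I assume link labels y in {0, 1} regressed on elementwise products z \circ z', and the penalty weight lam is a name I made up.

    import numpy as np
    from scipy.optimize import minimize

    def fit_eta_nu(Z, y, lam=1.0):
        """Hypothetical M-step: logistic regression of link labels y on
        rows Z[i] = z_i * z'_i, with an L2 penalty lam * ||eta||^2.
        Larger lam shrinks eta toward zero, which would explain why the
        fitted eta comes out consistently smaller under the penalized fit."""
        n, k = Z.shape

        def objective(params):
            eta, nu = params[:k], params[k]
            logits = Z @ eta + nu
            # Bernoulli log-likelihood, with log(1 + e^x) computed stably.
            ll = y * logits - np.logaddexp(0.0, logits)
            return -ll.sum() + lam * eta @ eta

        res = minimize(objective, np.zeros(k + 1), method="L-BFGS-B")
        return res.x[:k], res.x[k]

With lam = 0 this objective can indeed diverge when the links are separable (||\eta|| grows without bound), which may be the original source of the divergence; the penalty papers over that at the cost of shrinking \eta.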
