Please Scoop Me!
20 November 2008

In the post below, I asked why an approximation that is more accurate seems to perform worse than a more haphazard one. The two methods differ in two principal ways:

The approximate gradient computed during the E-step.
The optimization of the parameters in the M-step.

In the E-step, the two methods differ mainly in how they push a node’s topic distribution toward its neighbors’. With the simple but incorrect method, the gradient is proportional to $\eta^t z'$, while the correct method gives $\eta^t z'(1 - \sigma(\eta^t z \circ z' + \nu))$. Intuitively, the latter tells us that the closer we are to the right answer, the less we need to push: as $\sigma(\eta^t z \circ z' + \nu)$ approaches 1, the factor $1 - \sigma(\cdot)$ goes to zero.
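To make the damping concrete, here is a minimal Python sketch of the two E-step gradients. It assumes that $\eta^t z'$ denotes the elementwise product of $\eta$ and $z'$ (which is the derivative of $\eta^t (z \circ z') + \nu$ with respect to $z$) and that topic vectors are plain lists of floats. The function names (`link_score`, `simple_gradient`, `damping`, `logistic_gradient`) are hypothetical and are not taken from the original code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def link_score(eta, z, z_prime, nu):
    """eta^t (z o z') + nu, where o is the elementwise product."""
    return sum(e * a * b for e, a, b in zip(eta, z, z_prime)) + nu


def simple_gradient(eta, z_prime):
    """Undamped push toward the neighbor: eta o z'."""
    return [e * b for e, b in zip(eta, z_prime)]


def damping(score):
    """1 - sigma(score): near 1 for very negative scores, near 0 for large ones."""
    return 1.0 - sigmoid(score)


def logistic_gradient(eta, z, z_prime, nu):
    """Damped push: (eta o z') * (1 - sigma(eta^t (z o z') + nu))."""
    factor = damping(link_score(eta, z, z_prime, nu))
    return [g * factor for g in simple_gradient(eta, z_prime)]


# Hypothetical toy values chosen so the score is exactly 0, which halves the push.
eta, z, z_prime, nu = [2.0, 2.0], [0.5, 0.5], [0.5, 0.5], -1.0
print(simple_gradient(eta, z_prime))           # [1.0, 1.0]
print(logistic_gradient(eta, z, z_prime, nu))  # [0.5, 0.5]
```

When the score stays in a narrow range, $1 - \sigma(\cdot)$ barely changes (it is above 0.98 for any score at or below $-4$, for example), so the damped gradient is then close to a constant rescaling of the simple one.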

I futzed with the code so that it pushes more like the incorrect method, but this didn’t seem to affect the results. I suspect that $\eta^t z \circ z' + \nu$ is always pretty small, so the damping factor doesn’t have much of an impact.

Then I tried changing the M-step. In particular, I tried removing the M-step fitting under $\psi_\sigma$. It turns out that this “fit” performs about as well as $\psi_e$. Examining the fits shows that $\eta$ under $\psi_\sigma$ is consistently smaller than under $\psi_e$. Why is this? I hypothesize that the L2 regularization penalty is causing the fit to fail. Having to determine the optimal penalty is a pain; I originally added this term because the fits were diverging. Perhaps the right thing to do is to figure out why they were diverging in the first place.
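Both the shrinkage hypothesis and the divergence that motivated the penalty can be seen in a one-feature logistic fit. This is a rough stand-in for the real M-step, built on several assumptions: a single scalar feature `x` plays the role of one coordinate of $z \circ z'$, `fit_eta` is a hypothetical helper that runs plain gradient ascent on the L2-penalized log-likelihood (with $\nu$ left unpenalized), and `XS`/`YS` are made-up toy data, not data from these experiments.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_eta(xs, ys, l2, steps=10000, lr=0.5):
    """Gradient ascent on (1/n) * [sum_i log p(y_i | x_i) - (l2 / 2) * eta**2],
    where p(y = 1 | x) = sigmoid(eta * x + nu). Only eta is penalized."""
    eta, nu = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g_eta, g_nu = -l2 * eta, 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(eta * x + nu)
            g_eta += residual * x
            g_nu += residual
        eta += lr * g_eta / n
        nu += lr * g_nu / n
    return eta, nu


# Made-up, non-separable toy data: the labels overlap in x.
XS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
YS = [0, 0, 1, 0, 1, 0, 1, 1]

# Larger penalties give a smaller fitted eta.
for l2 in (0.0, 1.0, 10.0):
    print(l2, round(fit_eta(XS, YS, l2)[0], 3))

# With only positive examples and no penalty, eta keeps growing with more steps.
ones = [1] * len(XS)
print(fit_eta(XS, ones, 0.0, steps=1000)[0] < fit_eta(XS, ones, 0.0, steps=4000)[0])
```

The last line illustrates one textbook way a logistic fit can diverge: if the training examples are separable, or in the extreme are all positives (for instance, if only observed links enter the fit), the unpenalized likelihood keeps increasing with $\eta$ and has no finite maximizer. Whether that is what happened here is only an assumption. An L2 penalty suppresses the divergence, but it also pulls $\eta$ toward zero on data where the unpenalized fit is finite.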


What’s the deal with the logistic response?
