Daily Archives: December 3, 2008

Deriving and evaluating the second order approximation for links

In the previous post I argued that the second order approximation is useful for prediction.  Let’s apply that to a model with links and see what happens. The random variable over which we take the expectation is now \bar{z}_1 \circ \bar{z}_2 and the second order term is then \frac{1}{2}\mathrm{Cov}(\bar{z}_1 \circ \bar{z}_2) H(\mathbb{E}[\bar{z}_1 \circ \bar{z}_2]), where H is the Hessian matrix.   We address each of these terms in turn.  

Let x = \eta^T (\bar{z}_1 \circ \bar{z}_2) + \nu.  Then f = \sigma(x) and \partial_{\bar{z}_{1k} \bar{z}_{2k}} f(x) = \eta_k \sigma'(x).  Consequently, the entries of the Hessian are H_{ij} = \eta_i \eta_j \sigma''(x).

 Now to compute the covariance.  Observe that \mathbb{E}[\bar{z}_{1k} \bar{z}_{2k} \bar{z}_{1j} \bar{z}_{2j}] = \mathbb{E}[\bar{z}_{1k} \bar{z}_{1j}] \mathbb{E}[\bar{z}_{2k} \bar{z}_{2j}] by independence.  This can be rewritten as C^1_{kj} C^2_{kj} + C^1_{kj} \mu^2_{kj} + C^2_{kj} \mu^1_{kj}, where C^i_{kj} = \mathrm{Cov}(\bar{z}_{ik}, \bar{z}_{ij}) and \mu^i_{kj} = \mathbb{E}[\bar{z}_{ik}]\mathbb{E}[\bar{z}_{ij}].

I implemented this new approximation and here are the results.  

Inference method Parameter estimation Prediction method Log likelihood (Higher is better)
\psi_\sigma Regularized optimization \psi_\sigma first order -11.755
\psi_e Regularized optimization \psi_e -11.6019
\psi_\sigma Fixed at 4,-3 \psi_\sigma first order -11.5183
\psi_e Fixed at 4,-3 \psi_e -11.4354
\psi_\sigma Fixed at 4,-3 \psi_\sigma second order -11.5192
\psi_\sigma Fixed at 4,-3 \psi_e -11.4352

Suckitude.  The only thing that really matters is

  • Fix parameters.
  • Use \psi_e predictive model.

The rest doesn’t matter.  Why?  It might help to look at the plots of the link probability functions.  These are parameterized by \eta and \nu.  We use a simplified model here where \eta is assumed to be the same across all components.  In that case, the response is entirely a function of scalar, \vec{1}^T (\bar{\phi}_1 \bar{\phi}_2).  I plot the values of the response against different values of this scalar.

Comparison of different link probability functions

Comparison of different link prediction functions (log)

The endpoints are the same but there’s a dip in the middle for \psi_\sigma.  I suppose that although this function seems more “correct,” and is better able to lower the joint likelihood, the dip means that it is less aggressive in proposing positive links and is therefore worse at precision/recall type measures.  

So why not directly optimize the objective we care about?


Leave a comment

Filed under Uncategorized