# Deriving and evaluating the second order approximation for links

In the previous post I argued that the second order approximation is useful for prediction.  Let's apply it to a model with links and see what happens.  The random variable over which we take the expectation is now $\bar{z}_1 \circ \bar{z}_2$, and the second order term is then $\frac{1}{2}\operatorname{tr}\!\left(\mathrm{Cov}(\bar{z}_1 \circ \bar{z}_2)\, H\right),$ where $H$ is the Hessian of the link function evaluated at $\mathbb{E}[\bar{z}_1 \circ \bar{z}_2].$  We address each of these pieces in turn.
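As a minimal sketch, the second order (delta-method) correction above can be written generically; the callables `f` and `hessian` here are hypothetical stand-ins for the link function and its Hessian:

```python
import numpy as np

def second_order_expectation(f, hessian, mu, cov):
    """Second order approximation to E[f(X)]:
    E[f(X)] ~= f(mu) + (1/2) tr(Cov(X) @ H(f)(mu))."""
    return f(mu) + 0.5 * np.trace(cov @ hessian(mu))
```

For a quadratic $f$ the approximation is exact, which gives a quick sanity check on the trace term.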

Let $x = \eta^T (\bar{z}_1 \circ \bar{z}_2) + \nu,$ and write $y_k = \bar{z}_{1k} \bar{z}_{2k}$ for the $k$th component of the product.  Then $f = \sigma(x)$ and $\partial_{y_k} f = \eta_k \sigma'(x).$  Consequently, the entries of the Hessian with respect to $y$ are $H_{ij} = \eta_i \eta_j \sigma''(x).$
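This Hessian is cheap to form, since it is just an outer product of $\eta$ scaled by $\sigma''(x) = \sigma(x)(1 - \sigma(x))(1 - 2\sigma(x))$.  A sketch, checked against finite differences:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def link_hessian(eta, nu, y):
    """Hessian of f = sigmoid(eta @ y + nu) with respect to y,
    where y stands for the elementwise product z1 * z2.
    Uses sigma''(x) = sigma(x) (1 - sigma(x)) (1 - 2 sigma(x)),
    so H_ij = eta_i * eta_j * sigma''(x)."""
    x = eta @ y + nu
    s = sigmoid(x)
    sigma_pp = s * (1.0 - s) * (1.0 - 2.0 * s)
    return np.outer(eta, eta) * sigma_pp
```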

Now to compute the covariance.  Observe that $\mathbb{E}[\bar{z}_{1k} \bar{z}_{2k} \bar{z}_{1j} \bar{z}_{2j}] = \mathbb{E}[\bar{z}_{1k} \bar{z}_{1j}] \mathbb{E}[\bar{z}_{2k} \bar{z}_{2j}]$ by independence.  Writing $C^i_{kj} = \mathrm{Cov}(\bar{z}_{ik}, \bar{z}_{ij})$ and $\mu^i_{kj} = \mathbb{E}[\bar{z}_{ik}]\mathbb{E}[\bar{z}_{ij}],$ each factor is $C^i_{kj} + \mu^i_{kj},$ so the product expands to $C^1_{kj} C^2_{kj} + C^1_{kj} \mu^2_{kj} + C^2_{kj} \mu^1_{kj} + \mu^1_{kj} \mu^2_{kj}.$  Subtracting the product of the means, $\mu^1_{kj} \mu^2_{kj},$ leaves $\mathrm{Cov}(\bar{z}_1 \circ \bar{z}_2)_{kj} = C^1_{kj} C^2_{kj} + C^1_{kj} \mu^2_{kj} + C^2_{kj} \mu^1_{kj}.$
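The identity only needs independence of the two vectors, not any particular distribution, so it is easy to verify by Monte Carlo.  The means and covariances below are made up for illustration (Gaussians are just convenient to sample):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) means and covariances for two independent vectors.
mu1, mu2 = np.array([0.5, 1.0]), np.array([-0.3, 0.8])
C1 = np.array([[0.20, 0.05], [0.05, 0.10]])
C2 = np.array([[0.15, -0.02], [-0.02, 0.25]])

# Closed form: Cov(z1 ∘ z2)_{kj} = C1_{kj} C2_{kj} + C1_{kj} mu2_k mu2_j + C2_{kj} mu1_k mu1_j
closed = C1 * C2 + C1 * np.outer(mu2, mu2) + C2 * np.outer(mu1, mu1)

# Monte Carlo estimate of the same covariance.
z1 = rng.multivariate_normal(mu1, C1, size=1_000_000)
z2 = rng.multivariate_normal(mu2, C2, size=1_000_000)
mc = np.cov((z1 * z2).T)
```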

I implemented this new approximation and here are the results.

| Inference method | Parameter estimation | Prediction method | Log likelihood (higher is better) |
|---|---|---|---|
| $\psi_\sigma$ | Regularized optimization | $\psi_\sigma$ first order | -11.755 |
| $\psi_e$ | Regularized optimization | $\psi_e$ | -11.6019 |
| $\psi_\sigma$ | Fixed at 4, -3 | $\psi_\sigma$ first order | -11.5183 |
| $\psi_e$ | Fixed at 4, -3 | $\psi_e$ | -11.4354 |
| $\psi_\sigma$ | Fixed at 4, -3 | $\psi_\sigma$ second order | -11.5192 |
| $\psi_\sigma$ | Fixed at 4, -3 | $\psi_e$ | -11.4352 |

Suckitude.  The only choices that really matter are:

• Fix parameters.
• Use $\psi_e$ predictive model.

The rest doesn’t matter.  Why?  It might help to look at the plots of the link probability functions.  These are parameterized by $\eta$ and $\nu.$  We use a simplified model here where $\eta$ is assumed to be the same across all components.  In that case, the response is entirely a function of the scalar $\vec{1}^T (\bar{\phi}_1 \circ \bar{\phi}_2).$  I plot the values of the response against different values of this scalar.
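As a sketch, the $\psi_\sigma$ curve against this scalar can be generated as below, with the shared coefficient and intercept fixed at $(\eta, \nu) = (4, -3)$ as in the table above (the exact form of $\psi_e$ is not reproduced here):

```python
import numpy as np

# psi_sigma response as a function of the scalar s = 1^T (phi1 ∘ phi2),
# with (eta, nu) fixed at (4, -3) as in the results table.
eta, nu = 4.0, -3.0
s = np.linspace(0.0, 1.0, 101)
psi_sigma = 1.0 / (1.0 + np.exp(-(eta * s + nu)))
```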

*Figure: comparison of different link probability functions.*

*Figure: comparison of different link probability functions (log scale).*

The endpoints are the same, but there’s a dip in the middle for $\psi_\sigma.$  I suppose that although this function seems more “correct,” and better fits the joint likelihood, the dip makes it less aggressive in proposing positive links, and therefore worse on precision/recall-type measures.

So why not directly optimize the objective we care about?