In the previous post I argued that the second-order approximation is useful for prediction. Let’s apply that to a model with links and see what happens. The expectation is now taken over a different random variable, and the second-order term combines the Hessian matrix of the function being approximated with the covariance of that variable. We address each of these terms in turn.
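For reference, the standard second-order expansion of an expectation has the form below. The symbols $f$, $x$, $\mu$, $\Sigma$ and $H$ are generic placeholders assumed for this note, and may not match the notation used elsewhere in this series.

```latex
\mathbb{E}[f(x)] \;\approx\; f(\mu) + \tfrac{1}{2}\operatorname{tr}\!\bigl(H(\mu)\,\Sigma\bigr),
\qquad
\mu = \mathbb{E}[x],\quad
\Sigma = \operatorname{Cov}(x),\quad
H_{ij}(\mu) = \left.\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right|_{x=\mu}
```

The first-order term vanishes because $\mathbb{E}[x - \mu] = 0$, so the correction to $f(\mu)$ depends only on the Hessian and the covariance.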
Let Then and Consequently, the entries of the Hessian are
Now to compute the covariance. Observe that by independence. This can be rewritten as where and
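As a rough numerical companion to the Hessian and covariance terms, here is a minimal sketch. It rests on assumptions of mine: the random variable is taken to be the elementwise product of two independent vectors, and the function is a sigmoid of a linear score. The names `product_covariance` and `expectation_approximations`, the Dirichlet draws, and the weights are all hypothetical choices for illustration, not the implementation behind the results in this post.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def product_covariance(mean_a, cov_a, mean_b, cov_b):
    """Covariance of the elementwise product a * b for independent a and b.

    Uses Cov(a_i b_i, a_j b_j) = E[a_i a_j] E[b_i b_j] - E[a_i b_i] E[a_j b_j].
    """
    second_a = cov_a + np.outer(mean_a, mean_a)  # E[a a^T]
    second_b = cov_b + np.outer(mean_b, mean_b)  # E[b b^T]
    mean_prod = mean_a * mean_b                  # E[a * b], by independence
    return second_a * second_b - np.outer(mean_prod, mean_prod)


def expectation_approximations(w, offset, mean, cov):
    """First- and second-order approximations of E[sigmoid(w @ x + offset)]."""
    s = sigmoid(w @ mean + offset)
    # Second derivative of the sigmoid is s(1 - s)(1 - 2s); the chain rule
    # through the linear score gives a Hessian proportional to outer(w, w).
    hessian = s * (1.0 - s) * (1.0 - 2.0 * s) * np.outer(w, w)
    first_order = s
    second_order = s + 0.5 * np.trace(hessian @ cov)
    return first_order, second_order


# Hypothetical demo: compare both approximations with a Monte Carlo estimate.
rng = np.random.default_rng(0)
a = rng.dirichlet(np.ones(3), size=100_000)
b = rng.dirichlet(np.ones(3), size=100_000)
w, offset = np.full(3, 4.0), -3.0  # illustrative values only

mean = a.mean(axis=0) * b.mean(axis=0)
cov = product_covariance(
    a.mean(axis=0), np.cov(a, rowvar=False),
    b.mean(axis=0), np.cov(b, rowvar=False),
)
first, second = expectation_approximations(w, offset, mean, cov)
monte_carlo = sigmoid((a * b) @ w + offset).mean()
print("first order:", first, "second order:", second, "Monte Carlo:", monte_carlo)
```

The Monte Carlo estimate is only a sanity check. How much the second-order term helps depends on how curved the function is around the mean.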
I implemented this new approximation and here are the results.
| Inference method | Parameter estimation | Prediction method | Log likelihood (higher is better) |
|---|---|---|---|
| | Regularized optimization | first order | -11.755 |
| | Fixed at 4,-3 | first order | -11.5183 |
| | Fixed at 4,-3 | | -11.4354 |
| | Fixed at 4,-3 | second order | -11.5192 |
| | Fixed at 4,-3 | | -11.4352 |
Suckitude. Only two things really matter:
- Fix parameters.
- Use predictive model.
The rest doesn’t matter. Why? It might help to look at the plots of the link probability functions. These are parameterized by and We use a simplified model here where is assumed to be the same across all components. In that case, the response is entirely a function of scalar, I plot the values of the response against different values of this scalar.
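The reduction to a single scalar can be checked with a small sketch. It makes assumptions of mine: the response is a sigmoid of a weighted sum over the elementwise product of two vectors plus an offset, and the values 4 and -3 are borrowed from the table purely as illustrative numbers. The function names are hypothetical, and this is not the code that produced the plot.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def link_response(weights, offset, a, b):
    """Response with one weight per component (assumed sigmoid form)."""
    return sigmoid(weights @ (a * b) + offset)


def shared_weight_response(weight, offset, a, b):
    """Same response when every component shares a single weight.

    The weighted sum collapses to weight * (a @ b), so only the scalar
    inner product a @ b matters.
    """
    return sigmoid(weight * (a @ b) + offset)


# Hypothetical vectors, chosen only to exercise the identity.
a = np.array([0.2, 0.3, 0.5])
b = np.array([0.6, 0.1, 0.3])
print(link_response(np.full(3, 4.0), -3.0, a, b))
print(shared_weight_response(4.0, -3.0, a, b))

# Response as a function of the scalar alone, over a grid of values.
grid = np.linspace(0.0, 1.0, 5)
curve = sigmoid(4.0 * grid - 3.0)
print(curve)
```

The two printed responses agree, so in this simplified setting a one-dimensional curve is enough to describe the whole response.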
The endpoints are the same but there’s a dip in the middle for I suppose that although this function seems more “correct,” and is better able to lower the joint likelihood, the dip means that it is less aggressive in proposing positive links and is therefore worse at precision/recall type measures.
So why not directly optimize the objective we care about?