# Another perspective on link probability functions

When deciding when a link between two documents should exist, we need to define a function of the covariates which we call a link probability function.  We have looked at many candidate functions but concentrated on two such functions:

• $\psi_\sigma = \sigma(\eta^T z_1 \circ z_2 + \nu)$
• $\psi_e = \exp(\eta^T z_1 \circ z_2 + \nu)$

Recently, I saw a new connections between the two functions.  This connection exists in the context of variational inference.  One of the terms we must compute in variational inference is $\mathbb{E}_q[\log \psi(z_1, z_2)]$ where the expectation is taken over $z_1, z_2$ and $\mathbb{E}_q[z_i] = \phi_i$.   If we expand the equation using $\psi_\sigma$, this amounts to computing $\eta^T \phi_1 \circ \phi_2 + \nu - \mathbb{E}_q[\log (1 + \exp(\eta^T z_1 \circ z_2 + \nu))]$.  This latter term is intractable to compute exactly, so we turn to approximation.

In previous posts, I primarily explored using the approximation $\log(1 + \exp(\eta^T \phi_1 \circ \phi_2 + \nu))$.   This is equivalent to approximating the function inside the expectation with a first order Taylor approximation centered around the mean.  Because of convexity, this approximation is a tangent line which forms a lower bound on the function (see graph below).   Unfortunately, this is the wrong bound for variational inference; there the quantity we are computing is supposed to be a lower bound which means that the bound on the partition function of the link probability function needs to be an upper bound.   That was a mouthful, and things seem grim but as it turns out this approximation is rather good since most of the points lie near the mean.

But why not construct an upper bound?  This is possible because the function is convex and because the covariates’ range is bounded.  The hyperplane which intersects the function at the covariates’ bounds is an upper bound for the function.  If we apply this approximation, the expectation expands into $(\phi_1 \circ \phi_2)^T (\log(1 + \exp(\eta +\ nu)) - \log(1 + \exp(\nu)))$ $+\log(1 + \exp(\nu))$.   This upper bound is also depicted in the figure below.  But also note that if we set $\nu' = \log(1 + \exp(\nu))$ and $\eta' = \log(1 + \exp(\eta + \nu)) - \log(1 + \exp(\nu))$ then this expression becomes $\eta'^T (\phi_1 \circ \phi_2) + \nu'$ which has the exact same functional form as $\mathbb{E}_q[\psi_e(z_1, z_2)]$.  Thus, using this upper bound reduces to using $\psi_e$!

Two bounds on the link probability function

Thus, rather than considering two link probability functions, it is perhaps better to think of two approximations to a single link probability function.  This way of thinking also sheds some light on why perhaps $\psi_e$ tends to perform better than $\psi_\sigma$.  Even though the lower-bound approximation is “better” in the sense that it is closer to the true value of the expectation, the bound is in the wrong direction for variational inference.  Maybe “cowboy” variational doesn’t work all that well after all.