Principal Hessian Directions

Principal Hessian Directions (pHd) is a dimension reduction method proposed by Ker-Chau Li.

Ker-Chau Li, "On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma", JASA 1992

Book chapter by Ker-Chau Li on pHd

Consider a regression model for (x,y) in which the regression function f(x) = E[y | x] depends only on a low dimensional projection of x: f(x) = g(B'x) for some matrix B (say, with fewer columns than rows). The goal is to find the range of B.

The Hessian of f at x is H(x) = B M(B'x) B', where M(B'x) is the Hessian of g at B'x. So the range of H(x) is contained in the range of B. The same is then true for the expected value of H(x). However, it is not obvious how to directly obtain an estimate of E[H(x)] using only an iid sample of (x,y).

The pHd method exploits Stein's identities to obtain an estimate of E[H(x)]. Suppose, for instance, that x is standard normal. Then E[y(xx' - I)] = E[f(x)(xx' - I)] = ... = E[H(x)]. The "..." part follows using integration-by-parts twice. This suggests estimating E[y(xx' - I)], say, by replacing the expectation with an empirical average over an iid sample, and then extracting the range of the best low rank approximation.

Note: It is not guaranteed that the entire range of B is captured by the expected Hessian of f. For instance, it may be that H(x) has mean zero (e.g., homogeneous cubic polynomials).

Learning convex concepts

Santosh Vempala considered the case where f is the 0-1 indicator function for a convex cylinder. That is, there is a convex body in some subspace such that f(x) = 1 if and only if the orthogonal projection of x to the subspace is contained in the convex body.

Santosh Vempala, "Learning Convex Concepts from Gaussian Distributions with PCA", FOCS 2010

Vempala's method turns out to be a special case of pHd. Let p be the marginal probability that y = 1. Then E[H(x)] = E[f(x)(xx' - I)] = E[f(x)xx'] - pI = p(E[xx' | y=1] - I). Vempala proves that E[(x'v)^2 | y=1] is at most 1 for any unit vector v, with equality if and only if v is orthogonal to the aforementioned subspace. So one can estimate the second moment matrix of just the positive examples, and extract the subspace spanned by its eigenvectors corresponding to non-unit eigenvalues.