August 2006; updated January 2008
An underlying premise in the analysis and modeling of high-dimensional physical and biological systems is that data generated by measuring thousands of variables lie on or near a low-dimensional manifold. This premise has led to a variety of estimation and learning problems grouped under the heading of "manifold learning." It is natural to formulate the problem of feature selection -- finding salient variables (or linear combinations of salient variables) and estimating how they covary -- in the manifold setting. For regression and classification, the idea of selecting features via estimates of the gradient of the regression or classification function has been developed in the Euclidean setting. In this paper we extend this approach to the manifold setting. This results in: a method for estimating gradients on a manifold, generalization bounds for this gradient estimate, and a novel variable selection or dimensionality reduction procedure. The utility of our approach is illustrated on simulated and real data.
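To make the gradient-based feature selection idea concrete, here is a minimal sketch in the Euclidean setting that the paper generalizes: estimate the gradient of the regression function at each sample by weighted local linear regression, then rank variables by the typical magnitude of their gradient component. All function names, the Gaussian-kernel weighting, and the bandwidth value are illustrative assumptions, not the paper's estimator (which is kernel-based with generalization guarantees).

```python
import numpy as np

def local_linear_gradients(X, y, bandwidth=0.5):
    """Estimate the gradient of the regression function at each sample
    by a Gaussian-kernel-weighted linear fit around that sample.
    (Illustrative sketch; not the paper's RKHS gradient estimator.)
    """
    n, d = X.shape
    grads = np.zeros((n, d))
    for i in range(n):
        diff = X - X[i]  # displacements from the i-th sample, shape (n, d)
        w = np.exp(-np.sum(diff**2, axis=1) / (2.0 * bandwidth**2))
        A = np.hstack([np.ones((n, 1)), diff])  # intercept + linear terms
        sw = np.sqrt(w)
        # weighted least squares: minimize sum_j w_j (y_j - beta^T A_j)^2
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        grads[i] = beta[1:]  # slope coefficients approximate the gradient
    return grads

def rank_variables(X, y, bandwidth=0.5):
    """Score each coordinate by the root-mean-square of its estimated
    partial derivative across samples; salient variables score high."""
    grads = local_linear_gradients(X, y, bandwidth)
    return np.sqrt(np.mean(grads**2, axis=0))

# Toy example: only the first two of five coordinates matter.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))
y = 3.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + 0.01 * rng.normal(size=200)
scores = rank_variables(X, y)
# scores for coordinates 0 and 1 should dominate the irrelevant ones
```

The manifold extension developed in the paper replaces these ambient-space linear fits with gradient estimates that respect the (unknown) manifold geometry, so the bandwidth and neighborhoods adapt to the intrinsic rather than the ambient dimension.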
Keywords: gradient estimation, Riemannian geometry, manifold learning, variable selection, feature selection, dimensionality reduction.
The manuscript is available in PDF format.