In recent years, machine learning techniques have breathed new life into the
classical regression framework. The primary focus of these techniques has often been the
predictive performance of the estimated models, and the models themselves have developed
into sophisticated non-linear predictive machines. In this development, the ubiquitous
“kernel trick” has played a very important role by providing a means to compute
inner products in unwieldy high-dimensional spaces via simple and easily computable
functions on the low-dimensional covariate domain, called kernels. Domain
knowledge about the data dictates the collection of kernels suitable for the
specific application. In the “learning the kernel” paradigm, the current state of the art is
to use some optimization method to select, from this collection, the best kernel for the
data at hand.
The work in this dissertation assumes the existence of a “true” underlying process, a
Gaussian Process (defined by a fully specified covariance kernel), for the given data. The
Gaussian Process itself is treated as a prior on the reproducing kernel Hilbert space
of functions characterized by the associated kernel. The goal is to make suggestions
towards developing diagnostic tools that can be used to hasten the kernel learning
process. In particular, the setup for computational experimentation is restricted to a
Gaussian Process Regression framework with some “mild stationarity” and “closure”
type assumptions on the possible family of kernels. Tools are developed based on
generalized cross validation and the functional norm of the estimated functions. The
sign-change behaviors of these tools are exploited for diagnostic purposes. For the tool based on
generalized cross validation, a result that attempts to justify the observed sign-change
patterns is conjectured on the basis of computational evidence and partially proved. Complete
proofs of this result are given for some special classes of kernels. These
sign-change behaviors are intended to be a “guiding stick” for reducing the
computational effort and search space for “learning the kernel.”
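
As a rough illustration of the two quantities named above, the following Python sketch computes a generalized cross validation score and the squared reproducing kernel Hilbert space norm of the fitted function for Gaussian Process regression, evaluated over a few candidate length-scales. The squared-exponential kernel, the noise variance, the synthetic data, and the function names rbf_kernel and gcv_and_rkhs_norm are all assumptions made for illustration. They are not the dissertation's actual tools, and the sign-change diagnostics themselves are not reproduced here.

```python
import numpy as np


def rbf_kernel(x, z, length_scale):
    """Squared-exponential kernel on 1-D inputs (an assumed kernel family)."""
    d2 = (x[:, None] - z[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)


def gcv_and_rkhs_norm(x, y, length_scale, noise_var):
    """Return (GCV score, squared RKHS norm of the fitted function).

    The fit is the GP posterior mean f = sum_i alpha_i k(x_i, .),
    with alpha = (K + noise_var * I)^{-1} y.
    """
    n = len(x)
    K = rbf_kernel(x, x, length_scale)
    M = K + noise_var * np.eye(n)

    alpha = np.linalg.solve(M, y)

    # Smoother ("hat") matrix A = K (K + noise_var I)^{-1}.
    # K and M are symmetric and commute, so solve(M, K) gives the same matrix.
    A = np.linalg.solve(M, K)

    resid = y - A @ y
    # GCV = (1/n) ||(I - A) y||^2 / [ (1/n) trace(I - A) ]^2
    gcv = (resid @ resid / n) / ((n - np.trace(A)) / n) ** 2

    # Squared RKHS norm of the fitted function: alpha^T K alpha.
    rkhs_norm_sq = alpha @ K @ alpha
    return gcv, rkhs_norm_sq


# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Evaluate both quantities over a small grid of candidate length-scales.
for ell in (0.05, 0.1, 0.2, 0.5, 1.0):
    gcv, norm_sq = gcv_and_rkhs_norm(x, y, length_scale=ell, noise_var=0.01)
    print(f"length_scale={ell:4.2f}  GCV={gcv:.5f}  ||f||^2={norm_sq:.3f}")
```

Scanning a grid in this way only shows how the two quantities vary across candidate kernels; any diagnostic built on how they change would need the specific constructions developed in the dissertation.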