Kernels#

Developed by Elias Anderssen Dalan ✉️, supported by Centre for Computing in Science Education and Hylleraas Centre for Quantum Molecular Sciences.

Things you might need to know before tackling this notebook:#

A crucial ingredient of Gaussian Processes (GPs) is the kernel function (also called the covariance function). The kernel quantifies the similarity between two data points, under the assumption that similar data points share similar output values. The kernel function, often written as \(k(\mathbf{x_i},\mathbf{x_j})\), can take on different forms.

For instance, there is the constant kernel, \begin{equation} k(\mathbf{x_i},\mathbf{x_j}) = c, \end{equation}

the rational quadratic (RQ) kernel, \begin{equation} k(\mathbf{x_i},\mathbf{x_j}) = (1 + d(\mathbf{x_i},\mathbf{x_j}))^{-\alpha}, \end{equation}

and the radial basis function (RBF) kernel: \begin{equation} k(\mathbf{x_i},\mathbf{x_j}) = e^{-l\cdot d(\mathbf{x_i}, \mathbf{x_j})^2}, \end{equation} where \(d(\mathbf{x_i}, \mathbf{x_j}) = |\mathbf{x_i} - \mathbf{x_j}|\) is the Euclidean distance.
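These kernels can be written as small Python functions. The following is a minimal sketch; the function names and the default values for \(c\), \(\alpha\) and \(l\) are illustrative choices, not taken from any library:

```python
import numpy as np

def constant_kernel(xi, xj, c=1.0):
    # Constant kernel: the same value c for every pair of points.
    return c

def rq_kernel(xi, xj, alpha=1.0):
    # Rational quadratic kernel, using the Euclidean distance d.
    d = np.linalg.norm(np.asarray(xi) - np.asarray(xj))
    return (1 + d) ** (-alpha)

def rbf_kernel(xi, xj, l=1.0):
    # Radial basis function kernel: exp(-l * d^2).
    d = np.linalg.norm(np.asarray(xi) - np.asarray(xj))
    return np.exp(-l * d ** 2)

print(rbf_kernel([1, 2, 3], [1, 2, 3]))  # identical points give 1.0
```

Note that all three kernels reach their maximum when the two inputs are identical, which fits the idea of the kernel as a similarity measure.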

Why is this important?#

When we want to use Gaussian process regression to create a model from some data \(X_1\), and then use this model to predict outputs for some new data \(X_2\), we need a way to express the similarity between the data in \(X_1\) and the data in \(X_2\). This is where the kernel comes in. We construct a covariance matrix just like in the notebook on covariance, but we let the kernel \(k\) determine the covariance. Given \(V = (\mathbf{v_{1}},\mathbf{v_{2}}, \dots, \mathbf{v_{n}})\) and \(W = (\mathbf{w_{1}},\mathbf{w_{2}}, \dots, \mathbf{w_{m}})\) we can construct an \(n \times m\) covariance matrix which will look like this:

\begin{equation} \label{eqn:covmatrix} K(V, W) = \begin{bmatrix} k(\mathbf{v_1},\mathbf{w_1}) & k(\mathbf{v_1},\mathbf{w_2}) & k(\mathbf{v_1},\mathbf{w_3}) & \dots & k(\mathbf{v_1},\mathbf{w_m}) \\ k(\mathbf{v_2},\mathbf{w_1}) & k(\mathbf{v_2},\mathbf{w_2}) & k(\mathbf{v_2},\mathbf{w_3}) & \dots & k(\mathbf{v_2},\mathbf{w_m}) \\ k(\mathbf{v_3},\mathbf{w_1}) & k(\mathbf{v_3},\mathbf{w_2}) & k(\mathbf{v_3},\mathbf{w_3}) & \dots & k(\mathbf{v_3},\mathbf{w_m}) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ k(\mathbf{v_n},\mathbf{w_1}) & k(\mathbf{v_n},\mathbf{w_2}) & k(\mathbf{v_n},\mathbf{w_3}) & \dots & k(\mathbf{v_n},\mathbf{w_m}) \end{bmatrix} \end{equation}
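In code, this construction is just a double loop over the two datasets. Here is a sketch using a hypothetical helper `covariance_matrix` that accepts any kernel function; the RBF used here assumes the convention \(e^{-l\cdot d^2}\) with \(l = 1\):

```python
import numpy as np

def rbf(xi, xj, l=1.0):
    # RBF kernel, assuming the convention exp(-l * d^2) with l = 1.
    return np.exp(-l * np.sum((xi - xj) ** 2))

def covariance_matrix(V, W, kernel):
    # K[i, j] = kernel(V[i], W[j]), matching the matrix above.
    K = np.zeros((len(V), len(W)))
    for i, v in enumerate(V):
        for j, w in enumerate(W):
            K[i, j] = kernel(v, w)
    return K

V = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([[1.0, 2.0], [0.0, 0.0]])
print(covariance_matrix(V, W, rbf))  # K[0, 0] = 1 since V[0] = W[0]
```

The helper is kernel-agnostic: passing a different kernel function changes the entries of \(K\) but not its shape, which is always (number of rows in \(V\)) × (number of rows in \(W\)).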

The package btjenesten (which will be used for the rest of the GPR series) has a module named kernels, where some kernels are implemented. The package can be installed with pip install btjenesten. Below is an example of how a covariance matrix is constructed from two datasets.

from btjenesten import kernels
import numpy as np

v1 = [1, 2, 3]
v2 = [3, 2, 1]

w1 = [4, 3, 2]
w2 = [1, 2, 3]

V = np.array([v1, v2])
W = np.array([w1, w2])

# Covariance matrix with entries K[i, j] = RBF(V[i], W[j])
cov_VW = kernels.RBF(V, W)

print(cov_VW)
[[1.67017008e-05 1.00000000e+00]
 [4.97870684e-02 3.35462628e-04]]

As you can see, the top-right element equals one, since \(\mathbf{v_1} = \mathbf{w_2}\)!
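The printed matrix can also be reproduced with plain NumPy broadcasting, assuming the RBF convention \(e^{-l\cdot d^2}\) with \(l = 1\) (which is consistent with the output above):

```python
import numpy as np

V = np.array([[1, 2, 3], [3, 2, 1]])
W = np.array([[4, 3, 2], [1, 2, 3]])

# Squared Euclidean distance between every row of V and every row of W,
# via broadcasting: (2, 1, 3) - (1, 2, 3) -> (2, 2, 3), summed over the last axis.
d2 = ((V[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)

cov_manual = np.exp(-d2)  # assumes l = 1
print(cov_manual)
```

For instance, \(d(\mathbf{v_1}, \mathbf{w_1})^2 = 9 + 1 + 1 = 11\), so the top-left entry is \(e^{-11} \approx 1.67 \cdot 10^{-5}\), matching the library output.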

You now have a grasp of the kernel used in Gaussian process regression, and of how it enters the covariance matrix.