Numerical integration#

There are many applications of integration in chemistry. Integration is used for everything from calculating molecular geometry in quantum chemistry to finding the area of a peak in an NMR spectrum or a chromatogram to find the concentration of analytes. In addition, we need to integrate to solve differential equations, which are really important in physical chemistry and quantum chemistry. You should therefore have a good grasp at what the integral means, what we get by integrating, as well as methods of calculating integrals. Let’s prepare you for exactly that!

What is the integral, really?#

Historically, integration is a way of calculating areas and volumes. Today, we use integrals for a lot of different things, but in many cases, it is useful to think of the integral as a sum of small pieces. The integral sign ∫ is in fact an elongated S, which means sum. In ancient Greek, over 1500 years before calculus was invented by Newton and Leibniz, the greek mathematicians calculated areas of geometric figures by partitioning them into smaller areas, often rectangles, and summing up the areas of all the smaller areas. This was an inefficient way of computing areas and volumes, since we need a lot of small areas to get a good approximation. Then came calculus, the study of infinitesimals and continous change.

With calculus (about 1700) came rigorous mathematical theorems and integration rules which you have learned to use in high school. This lets us compute integrals quickly (sometimes!) and exact. We call this analytical integration. Perhaps the most powerful result in calculus is the fundamental theorem of calculus (sounds important, right?!). This states that integration is the reverse of differentiating (taking the derivative):

Fundamental theorem of calculus

Let f be a continous function in the interval [a,b], and F be a function defined for x in [a,b]. That is

\[F(x) = \int_a^x f(t) dt\]

then we have

\[F'(x) = f(x)\]

Essentially, this means that the integral of a function is equal to the antiderivative to the function.

Calculus is a very important tool, but there are especially two caveats to using a set of rules to compute integrals:

  • Not all integrals can be solved analytically.

  • When using rules, you might forget what you are actually doing, that is, what the integral really means.

Do you sometimes feel that you get in a “plug-and-chug” mode, where you apply integration rules and do a lot of algebra, get the right result, but do not really understand what is going on? You are not alone. When the ancient greeks used sums of small parts, they used what we now know as the definition of the integral to approximate areas and volumes. They couldn’t have forgot what the integral meant in the process, since they kind of worked with the very definition of it. Couldn’t we use the same technique today? Yes, of course. But wasn’t it quite tedious to do all these calculations? Well, now we have another tool at our disposal: the computer. Summing thousands of areas is now done in a fraction of a second, greatly outperforming the ancient greeks. So let us get into the mindset of an ancient g(r)eek and start integrating!

Exercise

By using the concept of the integral as a sum of very small parts, interpret what the following integral, which describes the total distance (s) traveled by a particle following the velocity described by the function \(v(t)\) from the start time \(t_0\) to the end time \(t_n\).

\[s = \int_{t_0}^{t_n} v(t) dt\]

Rectangle methods#

Let us take a look at how we can approximate an integral as a sum of geometric figures, like the ancient greeks did. The simplest way of doing this, is by using rectangles:

Here, we use \(N = 10\) rectangles to approximate the integral \(f(x) = \cos{(x)} + 2\) for \(x\in[2,12]\). The width of all rectangles is then \((b-a)/N = (12-2)/10 = 1\). We can also see that the height of each rectangle is \(f(x_n)\), with \(n\in[2,11]\), that is, we only let the left side of the rectangle reach the function graph. If we calculate the sum of the areas of every rectangle, we get A = 18.046675645664006. This is a crude approximation to the analytical value (\((\sin{(12)} + 2\cdot 12)-(\sin{(2)} + 2\cdot 2) \approx 18.554129655173885\). But if we increase the number of rectangles, for example to 50, we get a better approximation:

It is obvious, especially in the first figure, that several areas of the rectangles lie outside the graph. They are both too far above and below the graph in several places. But the error is not as big as it might seem because we just have areas both above and below the graph. The relative error here is approximately 2.7 % with 10 rectangles and 0.65 % with 50 rectangles.

Let us illustrate the error with a linear function \(f(x) = 10x\):

In these models, we have measured the rectangle height on the left outer edge of the rectangle. In this instance, this gives us an underestimation of the exact area for each rectangle. We can also measure the height of the rectangles on the right edge. We then get a corresponding overestimation with this function:

We see here that we get a large underestimation with the left edge approximation and a large overestimation with the right edge approximation, with a relative error of approx. 7.1 %. The same happens for all functions in intervals that are either increasing or decreasing throughout the whole interval. One way to compensate for this is to use the height of the rectangles in the middle instead of the end point on the left or right:

The best approximation using rectangles is therefore attained with the midpoint method. For linear functions, we will actually get an exact value (even with one rectangle!) because the error of the rectangles are the same size above and below the function. But the midpoint approximation gives great results with other functions as well. Methods like these were used by the ancient Greeks, but they did not have a computer at their disposal! So, let’s take a look at how we can implement these methods in Python.

Implementing the rectangle methods#

Rectangle method (left corner approach)

The definite integral of a function \(f(x)\) from \(x = a\) to \(x = b\) can be approximated by the area of \(n\) rectangles with width \(h = \frac{b-a}{n}\):

\[\int_a^b f(x) \ \mathrm{d}x \approx h \sum_{k=1}^{n} f(x_k)\]

Exercise

The program below shows a function which uses the left corner approach of the rectangle rule to approximate the definite integral of a function f between a and b. Fill inn the lines that are missing in the code.

Similarly, we can describe the right-hand approximation as follows:

Rectangle method (right corner approach)

The definite integral of a function \(f(x)\) from \(x = a\) to \(x = b\) can be approximated by the area of \(n\) rectangles with width \(h = \frac{b-a}{n}\):

\[\int_a^b f(x) \ \mathrm{d}x \approx h \sum_{k=1}^{n} f(x_{k+1})\]

Exercise

Implement the algorithm for the right corner approach as a Python function. Test and compare with the left corner approach on the integral \(\int_2^8 f(x) = x^2 - 2x + 4 \ dx\).

We get the best approximation by combining the left and right corner approach, which gives us the midpoint approximation:

Rectangle method (midpoint approach)

The definite integral of a function \(f(x)\) from \(x = a\) to \(x = b\) can be approximated by the area of \(n\) rectangles with width \(h = \frac{b-a}{n}\):

\[\int_a^b f(x) \ \mathrm{d}x \approx h \sum_{k=1}^{n} f\left(\frac{1}{2}(x_k + x_{k+1})\right)\]

Exercise

Implement the algorithm for the right corner approach as a Python function. Test and compare with the left and right corner approach on the integral \(\int_2^8 f(x) = x^2 - 2x + 4 \ dx\).

If we need to integrate functions with very high or low slopes (\(|f'(x) >> 0|\)), we will need many rectangles to get a satisfactory result. And even with many rectangles, the approximation is not as good as with other methods. Let us therefore take a look at more advanced and improved methods of calculating an integral.

The Trapezoidal rule#

No, it’s not a trap. We can do better than rectangles. We know that the the top part of a rectangle is a straight, horizontal line. Such a line can be represented as a polynomial of degree zero, \(f(x) = ax^0 = a\), where \(a\) is a real number. Let’s look at the possibility of replacing this top part with a polynomial of the first degree, \(f(x) = ax^1 = ax\). Then we get trapezoids instead of rectangles. An algorithm for this is a bit less intuitive and a bit more strenous to derive, but we’ll give it a go. Let’s start with the trapezoidal method illustrated with one trapezoid in the interval \([a, b] = [2, 12]\) with \(f(x) = \cos(x) + 2\):

Let us derive an algorithm for the trapezoidal rule. Let’s start with the are of a trapezoid:

\[A_{trapezoid} = \frac{x_1 + x_2}{2}\cdot h\]

The two sides of the trapezoid, \(x_1\) and \(x_2\), can here be expressed as \(f(a)\) and \(f(b)\) respectively. The height of the trapezoid is \(b-a\) (flip the screen to the left if you don’t like that the height is where we said it is). The area can then be expressed as:

\[A = \frac{f(a) + f(b)}{2}\cdot(b-a)\]

Let us now expand this to \(n\) trapezoids. The heigh of every trapezoid is then \(h = (x_1+x_2)/n\), which gives us this expression for the area of every \(i\)-th trapezoid:

\[A_i = \frac{f(x_i) + f(x_i)}{2}\cdot h\]

If we sum the are of all \(n\) trapezoids, we get:

\[\frac{f(a)+f(a+h)}{2}h + \frac{f(a+h)+f(a+2h)}{2}h + \frac{f(a+2h)+f(a+3h)}{2}h + ... + \frac{f(a+ih)+f(b)}{2}h\]

This seems a bit too long and complicated. Let’s simplify it. First of all, we can multiplicate every term with \(h\) and divide them by 2:

\[\frac{h}{2}\cdot (f(a)+f(a+h)+f(a+h)+f(a+2h)+f(a+2h)+ ... + f(a+ih) + f(b))\]

If we add terms that are the same, we get:

\[\frac{h}{2}\cdot (f(a)+2f(a+h)+2f(a+2h)+2f(a+3h)+ ... + 2f(a+ih) + f(b))\]

Since \(f(a)\) and \(f(b)\) are the only terms not multiplied by 2, we can simplify further:

\[h\left(\frac{f(a)+f(b)}{2} + (f(a+h)+f(a+2h)+f(a+3h)+ ... + f(a+ih))\right)\]

The last collection of terms we can write as a sum. Then we get the trapezoidal rule:

Trapezoidal rule

The definite integral of a function \(f(x)\) from \(x = a\) to \(x = b\) can be approximated by the sum of the area of \(n\) trapezoids with the width \(h = \frac{b-a}{n}\):

\[\int_a^b f(x) dx \approx h\left( \frac{f(a)+f(b)}{2} + \sum_{i=1}^{n-1} f(x_i) \right)\]

Exercise

Implement the trapezoidal rule and test it on the functions \(f(x) = x^3 + 2x\) and \(f(x) = \sqrt(x)\) in the interval \([2,4]\). Compare the result with the rectangle rule(s) with the same \(n\).

Naturally, as we increase the number of trapezoids, we get an even better approximation:

Even though the trapezoidal rule is a general improvement of the rectangle rule, in most cases, we get more precise results with the midpoint rectangle rule than with the trapezoidal rule! Nevertheless, we used the trapezoidal rule to illustrate that we can increase the degree of the polynomial representing the “top part” of the geometric figures we use to split up and calculate the area. This can be generalized such that we can use any polynomial as the top part, improving our approximations further. We can use interpolation to derive such methods, but we will no

Increasing the polynomial degree comes at a cost, though. When we use a polynomial of a degree higher than 1, we need to find the function values of that polynomial in more than 2 points. This implies increased computing time. Therefore, we need to balance computational cost against how precise the method is. One great method in this regard, is Simpson’s method.

Simpson’s method#

Simpson’s method is an integration method using polynomials of degree 2 to approximate integrals. We will not derive the method here, but we will implement it and test it out. The main thing you need to understand is that the differe

Simpsons metode

The definite integral of a function \(f(x)\) from \(x = a\) to \(x = b\) can be approximated by the sum of the area of \(n\) geometric figures with the width \(h = \frac{b-a}{n}\):

\[\int_a^b f(x) dx \approx \frac{h}{3} \left( f(a) + f(b) + 2\sum_{k = 1}^{\frac{n}{2} - 1} f(x_{2k}) + 4\sum_{k=1}^{\frac{n}{2}} f(x_{2k-1}) \right)\]

We can implement the method like this:

def simpsons(f, a, b, n):
    h = (b-a)/n
    total = f(a) + f(b)
    for k in range(1,n,2):
        A += 4 * f(a + k*h)
    for k in range(2,n,2):
        A += 2*f(a + k*h)
    return A*h/3

Exercise

Study the code above and try to identify the terms in the definition of Simpson’s method.

The methods we now have looked at are based on the same concept, namely approximation of the area under a graph using geometric shapes with a rectangular base and a polynomial of degree \(n\) as the top part. Since the principle is the same, we call them a family of methods (nice, right?). This family is called Newton-Cotes. This means that, for example, the trapezoidal rule is called a Newton-Cotes method of the first degree, while Simpson’s method is a Newton-Cotes method of the second degree. There are many other methods and families within numerical integration, but we will leave those for now.

Using libraries to compute integrals#

“Have no one else implemented these methods”, you might have thought when sweating to implement the trapezoidal rule previously. The answer is “yes, they have!”. And we can use them by importing the library scipy (scientific python). Then why did you need to do it? There are two main reasons:

  1. Pedagogical reasons: By implementing numerical methods, you get a better grasp of how the methods work. Intepreting and writing code gives you insight into the mathematical principles behind each method, which might improve your understanding of what it means to integrate something.

  2. Creative and technical reasons: Self-implemented code can be altered in every way you see fit. Do you want an error message if your algorithm uses too many iterations? Or do you want your method to be a combination of different algorithms? When making a code of your own, you have total control (at least in principle) of what you make. You can be creative and solve problems in a way tailored for your specific needs.

Despite the benefits of making our own code, in our day to day lives it is useful to be able to utilize code that we know works. Therefore, we show you how to integrate a function using the trapezoidal rule, simpson’s rule and Gaussian quadrature. We haven’t introduced the last method, but you might learn to use it nonetheless (it’ great, we promise!).

from scipy import integrate
import numpy as np

def f(x):
    return x**3 - 1

n = 1000
x = np.linspace(0,5,n)
y = f(x)

# Integration
trapezoid = integrate.trapz(y,x) # Needs arrays as parameter
simpsons = integrate.simps(y,x)  # Needs arrays as parameter
gaussian_quadrature = integrate.quad(f,0,5) # Needs a function as parameter
print("Trapezoidal rule:",trapes)
print("Simpson's method:",simpsons)
print("Gaussian quadrature:",gaussian_quadrature) # Prints both the answer and the absolute error
Trapesmetoden: 151.2501565629694
Simpsons metode: 151.25000015671972
Gauss kvadratur: (151.25, 1.6959623319719519e-12)

Multiple integration#

We can also evaluate double and triple integrals with functions from the scipy library. Let’s solve the following integrals with the functions dblquad (for double integrals) and tplquad (for triple integrals):

\[\int_0^{\frac{\pi}{2}} \int_{-1}^1 x \sin(y) - y e^x \ dx dy\]
\[\int_0^{3} \int_{0}^2 \int_{0}^1 xyz \ dx dy dz\]
from scipy import integrate
import numpy as np

def f(y,x):
    return x*np.sin(y) - y*np.exp(x)

def g(z, y, x):
    return x*y*z

numerical_double = integrate.dblquad(f, -1, 1, 0, np.pi/2)
numerical_triple = integrate.tplquad(g, 0, 1, 0, 2, 0, 3)

analytical_double = np.pi**2/8 * (1/np.exp(1)-np.exp(1))
analytical_triple = 9/2
print(f'Numerical value of the double integral: {numerical_double[0]}')
print(f'Analytical value of the double integral:: {analytical_double}')

print(f'Numerical value of the triple integral:: {numerical_triple[0]}')
print(f'Analytical value of the triple integral:: {analytical_triple}')
Numerical value of the double integral: -2.899692718238082
Analytical value of the double integral:: -2.8996927182380823
Numerical value of the triple integral:: 4.5
Analytical value of the triple integral:: 4.5

Look at how precise the methods are! Quite great, right?

Symbolic integration#

In some instances, the integrals we need to compute might be solved symbolically. We can do this by hand, or we can do it by using the library sympy (symbolic python). This is a very “black box” solution, and you don’t get any insight into what is happening, but it is a useful tool for checking your math or comparing numerical methods on difficult integrals. Below is an example of indefinite integration followed by an example of definite integration.

from sympy import *

x = symbols("x") # Alternative to these two lines: x, y = symbols("x y")
y = symbols("y")

f = 4*x**3 + sin(x)
integrate(f, x)
\[\displaystyle x^{4} - \cos{\left(x \right)}\]
integrate(cos(x**2), (x, -oo, oo)) # oo means infinity, and -oo to oo are the integration limits
\[\displaystyle \frac{\sqrt{2} \sqrt{\pi}}{2}\]

Concluding remarks#

We hope you had some fun with numerical integration, and that you learned something! Here are some pointers on what you should know after working with this material:

Videos#

You can use the videos below (in Norwegian) to learn or repeat what you learned about numerical integration: