
Section 4 Root finding

In elementary and middle school, we learn how to solve algebraic equations: equations built from familiar operations such as addition, multiplication, and exponentiation. However, there are very simple equations that cannot be solved with algebra. Consider

\begin{equation*} \cos x = x. \end{equation*}

There is no way to extract the \(x\) from inside the cosine to isolate it on one side of the equation. Even algebraic equations may be impossible to solve. Consider

\begin{equation*} x^6 - x - 1 = 0. \end{equation*}

There is no formula (like the quadratic formula) that can be used to factor this equation. If we're not fortunate enough to get a polynomial that has obvious “nice” factors, the only way to proceed is to approximate the solutions. Questions like this fall under the general category of root finding problems. A root is a solution to the equation \(f(x) = 0\text{.}\)

Subsection 4.1 Bisection method

We're going to start with a straightforward equation to illustrate our first method. Let's find solutions to the equation

\begin{equation*} x^2 - 2 = 0. \end{equation*}

Now obviously, the answers are \(\pm \sqrt{2}\text{,}\) but how useful is that for computation or application? Where do the values for \(\sqrt{2}\) even come from? (Hint: \(\sqrt{2}\) is defined to be the positive solution to this equation.)
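To get a feel for where the roots sit, we can graph the function first. Here is a minimal plotting sketch, assuming numpy and matplotlib are available; the ranges and styling are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot f(x) = x^2 - 2 to see where it crosses the x-axis.
x = np.linspace(-3, 3, 400)
plt.plot(x, x**2 - 2, label=r"$f(x) = x^2 - 2$")
plt.axhline(0, color="black", linewidth=0.8)   # the x-axis
plt.legend()
plt.show()
```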

It's clear from the plot generated by the code above that the solutions to \(x^2 - 2 = 0\) are the \(x\)-intercepts of the function \(f(x) = x^2 - 2\text{.}\) If you were asked to guess what the roots were, you might say “between -2 and -1 and also between 1 and 2”. That observation is the first step in a set of methods called bracketing methods. At this point, we can take advantage of one of the core theorems of calculus: the Intermediate Value Theorem.

A more familiar formulation might state “if a continuous \(f\) is positive at a point \(a\) and negative at a point \(b\) then it must have gone through 0 somewhere”. This can be restated as a condition that is easy to check in an algorithm: if \(f(a)f(b) \lt 0\text{,}\) then \(f\) has a root somewhere between \(a\) and \(b\text{.}\)

So we arrive at the main idea of the bisection method, which you can think of as a narrowing process. In short, we'll bracket the root with an interval, cut it in half, then use the condition to figure out which half contains the root. Repeating this process will produce an arbitrarily small interval containing the root, which we'll denote \(\alpha\text{.}\) We can treat the midpoint of this tiny interval as an approximation for the root \(\alpha\text{.}\)

Consider our example, which is to find a root of the function \(f(x) = x^2 - 2\text{.}\)

  1. From the graph, we can bracket one of the roots with \(a = 1, b = 2\text{.}\)
  2. We can check to see that we've chosen good values by looking at \(f(a)f(b) = (-1)(2) = -2 \lt 0\text{,}\) and so the IVT guarantees a root lies in \([1,2]\text{.}\)
  3. Now bisect the interval. Compute the midpoint \(m = \frac{a+b}{2} = 1.5\text{.}\)
  4. We'll check the interval \([1, 1.5]\) for the root. Since \(f(1)f(1.5) = (-1)(0.25) = -0.25 \lt 0\text{,}\) this interval contains the root.
  5. We can repeat the process on the new interval \([1, 1.5]\) with midpoint \(m = 1.25\text{,}\) and so on until we're satisfied with the accuracy of our calculation (see the short check below).
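Here is that check carried out directly; a throwaway sketch rather than the full algorithm, with variable names of my own choosing.

```python
# First two bisection steps for f(x) = x^2 - 2 on [1, 2], done by hand.
f = lambda x: x**2 - 2

a, b = 1.0, 2.0
m = (a + b) / 2             # 1.5
print(f(a) * f(m))          # -0.25 < 0, so the root lies in [1, 1.5]

b = m                       # keep the half that brackets the root
m = (a + b) / 2             # 1.25, the next approximation
print(a, b, m)
```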

Subsection 4.2 Error

So how do we decide when to stop the process listed above? Since we don't know the exact answer, we can't use “distance from the correct answer” as a measure of error. Instead, we'll have to measure something about the process of iteration itself. To be useful, that measurement should be relative to the size of the approximation, not just absolute; this will be discussed in more detail when we introduce Newton's method.

Definition 4.3.

Given an iterative process that produces an approximation \(m_k\) at each step \(k\text{,}\) the absolute relative approximate error at step \(k\) is

\begin{equation*} \abs{\epsilon_k} = \abs{\frac{m_k - m_{k-1}}{m_k}}. \end{equation*}

If an iterative approximation converges, then the error will decrease from step to step. Thus, we can use the error to decide when to stop. Before we begin the process of approximation, we choose a tolerance: a small number that is the maximum error we're willing to allow. Then, we stop the iteration as soon as \(\abs{\epsilon_k} \lt \text{tol}\text{.}\)
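In code, the stopping test is a one-line comparison. A small sketch using the first two midpoints from the bisection example above (the names rel_error and tol are my own):

```python
# Absolute relative approximate error between consecutive approximations.
def rel_error(m_new, m_old):
    return abs((m_new - m_old) / m_new)

tol = 1e-6
m_prev, m_curr = 1.5, 1.25            # first two bisection midpoints
print(rel_error(m_curr, m_prev))      # 0.2, still much larger than tol
print(rel_error(m_curr, m_prev) < tol)
```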

Subsection 4.3 Bisection algorithm

The bisection method always converges as long as the starting interval brackets a root, so we do not strictly need a maximum iteration counter. We include one anyway, since later methods may not converge and can potentially loop forever.

We'll now present an example of the bisection method used to compute the value of \(\sqrt{2}\text{.}\)
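A minimal sketch of such a routine follows; the function name, signature, and default tolerances are illustrative choices rather than anything fixed by the text.

```python
def bisection(f, a, b, tol=1e-10, max_iter=100):
    """Approximate a root of f in [a, b] by repeated halving.

    Assumes f(a) * f(b) < 0, so the IVT guarantees a root in [a, b].
    """
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")

    m_old = a
    for _ in range(max_iter):
        m = (a + b) / 2
        # Keep whichever half still brackets the root.
        if f(a) * f(m) < 0:
            b = m
        else:
            a = m
        # Absolute relative approximate error (Definition 4.3).
        if m != 0 and abs((m - m_old) / m) < tol:
            return m
        m_old = m
    return m

print(bisection(lambda x: x**2 - 2, 1, 2))   # 1.41421356...
```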

Subsection 4.4 False position (Regula Falsi)

The method of false position is one of the oldest methods for guessing the solution to an equation. We're going to use a version of it that is specifically built for root finding. This method can converge significantly faster than the bisection method, because it takes the behavior of the function into consideration.

Consider the equation \(x^2 - 2 = 0\text{,}\) the solutions of which are the roots of the function \(f(x) = x^2 - 2\text{.}\)

As in the bisection method, we notice that the interval \([1,2]\) must contain a root, since the function crosses the \(x\)-axis between the endpoints. (That is, \(f(1)f(2) \lt 0\text{.}\)) To find an approximation of the zero, we construct the line between \((a, f(a))\) and \((b, f(b))\text{.}\) We can approximate the root of \(f(x)\) by the \(x\)-intercept of the line that we've drawn.

Already from the picture above, we can see that we've got a reasonable approximation of the zero. The slope of the line is \(\frac{f(b) - f(a)}{b - a}\text{,}\) and some quick algebra with the point-slope form of the line gives that the \(x\)-intercept of the line segment is

\begin{equation*} m = \frac{a f(b) - b f(a)}{f(b) - f(a)}. \end{equation*}

If we iterate this method, we should see the intercepts of the line segments “march” towards the actual zero. From here, the method is identical to the bisection method: we identify which of the two subintervals created by the new approximation is the bracketing subinterval, we replace the relevant endpoint, and we repeat the line trick.
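A sketch of the resulting routine, structured like the bisection sketch above (again, the name and defaults are my own choices):

```python
def false_position(f, a, b, tol=1e-10, max_iter=100):
    """Approximate a root of f in [a, b] using secant-line x-intercepts.

    Assumes f(a) * f(b) < 0, so a root is bracketed.
    """
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")

    m_old = a
    for _ in range(max_iter):
        # x-intercept of the line through (a, f(a)) and (b, f(b)).
        m = (a * f(b) - b * f(a)) / (f(b) - f(a))
        if f(a) * f(m) < 0:
            b = m
        else:
            a = m
        if m != 0 and abs((m - m_old) / m) < tol:
            return m
        m_old = m
    return m

print(false_position(lambda x: x**2 - 2, 1, 2))   # 1.41421356...
```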

Subsection 4.5 Classes of differentiability

Our next set of techniques do not involve bracketing. In exchange for losing a bracket that is guaranteed to contain a root of a function, we get methods that converge significantly faster when certain hypotheses are met. Speed of convergence will be discussed in detail in a later section.

The main hypothesis of derivative-based methods is that the functions being worked with are “nice enough”. Nice, as a mathematical term, is an amusing catchall for “whatever is needed to make this theorem run”, but it usually refers to smoothness of some kind. That is, the slope of the function is well behaved as we move around the \(x\)-values.

In Calculus 1, we learn that

\begin{equation*} f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{(x + h) - x} \end{equation*}

where we're using a form that emphasizes the fact that the derivative is the limit of secant lines. A function for which \(f'\) exists for every \(x \in (a,b)\) is called differentiable on \((a,b)\text{.}\) Be careful. It's easy to be lulled into a false sense of security about differentiable functions.

Let's consider a family of functions called the topologist's sine curves. Here is the first member of the family.

\begin{align*} f(x) \amp= \sin \frac{1}{x} \amp\text{ if } x \amp\neq 0 \\ \amp = 0 \amp\text{ if } x \amp= 0 \end{align*}

Notably, this function does not have a limit as \(x \to 0\) from the left or the right. We can try to fill the singularity in the function with the point \((0,0)\text{,}\) but that doesn't make the function continuous. The reason why should be clear from the graph below: as the function approaches \(x = 0\) it oscillates increasingly quickly, essentially filling the band between \(y = -1\) and \(y = 1\) near the \(y\)-axis. Since the function has a value of 0 at 0, but no limit there, \(f\) is discontinuous at 0.

Let's look at the next member of the family:

\begin{align*} f(x) \amp= x \sin \frac{1}{x} \amp\text{ if } x \amp\neq 0 \\ \amp = 0 \amp\text{ if } x \amp= 0 \end{align*}

The \(x\) in front forces the function to go to 0, capturing it between the lines \(y = x\) and \(y = -x\text{.}\)
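The graphs discussed here came from plots of these functions; the sketch below is a rough stand-in that draws the two members side by side (the plotting ranges and styling are my own choices, assuming numpy and matplotlib).

```python
import numpy as np
import matplotlib.pyplot as plt

# The first two members of the topologist's sine family near x = 0.
x = np.linspace(-0.5, 0.5, 20001)
x = x[x != 0]                       # avoid dividing by zero

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(x, np.sin(1 / x))
axes[0].set_title(r"$\sin(1/x)$: no limit at 0")
axes[1].plot(x, x * np.sin(1 / x))
axes[1].plot(x, x, "--")            # envelope lines y = x and y = -x
axes[1].plot(x, -x, "--")
axes[1].set_title(r"$x\sin(1/x)$: squeezed to 0")
plt.show()
```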

The graph above should make it obvious that now, the point \((0,0)\) fills the hole in the function and makes \(f\) continuous at \(0\text{.}\) What about the derivative?

The green function above is the derivative. Away from 0, the usual rules give \(f'(x) = \sin \frac{1}{x} - \frac{1}{x}\cos \frac{1}{x}\text{,}\) a formula that makes sense everywhere but at 0. So we'll use the definition to see if the derivative is defined there.

\begin{align*} f'(0) \amp= \lim_{h \to 0} \frac{f(0+h) - f(0)}{h}\\ \amp= \lim_{h \to 0} \frac{f(h)}{h}\\ \amp= \lim_{h \to 0} \frac{h \sin(1/h)}{h} \\ \amp= \lim_{h \to 0} \sin \frac{1}{h} \end{align*}

which does not exist. So \(f\) is continuous at 0, but not differentiable there. Another member of the family:

\begin{align*} f(x) \amp= x^2 \sin \frac{1}{x} \amp\text{ if } x \amp\neq 0 \\ \amp = 0 \amp\text{ if } x \amp= 0 \end{align*}

Again, the derivative formula away from 0 is \(f'(x) = 2x \sin \frac{1}{x} - \cos \frac{1}{x}\text{,}\) which makes sense everywhere but at 0. At 0, we can use the definition:

\begin{align*} f'(0) \amp= \lim_{h \to 0} \frac{f(0+h) - f(0)}{h}\\ \amp= \lim_{h \to 0} \frac{f(h)}{h}\\ \amp= \lim_{h \to 0} \frac{h^2 \sin(1/h)}{h} \\ \amp= \lim_{h \to 0} h \sin \frac{1}{h} = 0. \end{align*}

Here's the complete graph of the derivative, given that \(f'(0) = 0\text{.}\) It should be obvious that even though the derivative exists everywhere, it is not continuous.
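A quick numerical check of that discontinuity (an illustration of my own, not taken from the text): evaluate the derivative formula at points marching toward 0 and watch the values refuse to settle near \(f'(0) = 0\text{.}\)

```python
import math

# For f(x) = x^2 sin(1/x) with f(0) = 0, the derivative away from 0 is
#   f'(x) = 2x sin(1/x) - cos(1/x),
# while the limit definition gives f'(0) = 0.
def fprime(x):
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

for k in range(1, 6):
    x = 10.0 ** (-k)
    print(x, fprime(x))   # the values oscillate and do not shrink toward 0
```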

As we keep increasing the power of \(x\text{,}\) we get functions with better properties. What happens if we increase the power one more time?

\begin{align*} f(x) \amp= x^3 \sin \frac{1}{x} \amp\text{ if } x \amp\neq 0 \\ \amp = 0 \amp\text{ if } x \amp= 0 \end{align*}

As before, the green function is the derivative, \(f'(x) = 3x^2 \sin \frac{1}{x} - x \cos \frac{1}{x}\text{.}\) Notice that now the derivative also converges to 0 as \(x \to 0\text{.}\) That is, \(f\) is continuous, differentiable everywhere, AND the derivative is continuous.

This is what we mean by “nice” in the sense of approximation. In order for derivative methods to be well-behaved, the derivatives should exist AND be continuous.

Definition 4.6.
Let \(f\) be a function on an interval \((a,b)\text{.}\) The function \(f\) is said to be of class \(C^k\) if \(f\) can be differentiated \(k\) times and \(f^{(k)}\) is continuous on \((a,b)\text{.}\)

\(C^1\) functions are also called continuously differentiable. Thus, by filling in the point at \((0,0)\text{,}\) we can say

  • \(\sin(1/x)\) is discontinuous on \((-a,a)\text{;}\)
  • \(x \sin(1/x)\) is continuous on \((-a,a)\text{;}\)
  • \(x^2 \sin(1/x)\) is differentiable on \((-a,a)\text{;}\)
  • \(x^3 \sin(1/x)\) is \(C^1\) on \((-a,a)\text{.}\)

This can be continued. Generally, \(C^1\) functions are the bare minimum we require to call a function “nice”. You should also note that this is a hierarchy of regularity: every subsequent level has all the properties of the level that came prior.

Subsection 4.6 Newton-Raphson method

We come to a very powerful method that illustrates perhaps the central idea of numerical analysis (and indeed most of continuous mathematics):

Every function is a line.

This might seem like an absurd statement, but with some slight adjustments, maybe we can be convinced.

Every (nice enough) function is a line (if you look closely enough).

In fact, this is the essential point of working with tangent lines. Indeed, the same thinking applies in more variables as well (where lines get replaced by planes).

It is difficult to find the \(x\)-intercept of a general function, but easy if that function is a line. So what we'll do is just pretend the function is a line, and find its intercept. In more familiar terms, we'll replace the function with its tangent line approximation.

As an easy example, consider \(f(x) = x^2 - 2\text{,}\) our old friend that lets us find the value of \(\sqrt{2}\text{.}\) Suppose that we guess that the root is close to \(x = 2\text{.}\)
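The picture described next can be reproduced with a short plotting sketch like the one below (assuming numpy and matplotlib; the colors simply match the description that follows).

```python
import numpy as np
import matplotlib.pyplot as plt

# f(x) = x^2 - 2 and its tangent line at the initial guess x0 = 2.
f = lambda t: t**2 - 2
x0 = 2.0
slope = 2 * x0                       # f'(x) = 2x

x = np.linspace(0.5, 2.5, 200)
plt.plot(x, f(x), "r", label=r"$f(x) = x^2 - 2$")
plt.plot(x, f(x0) + slope * (x - x0), "b", label="tangent at $x_0 = 2$")
plt.axhline(0, color="black", linewidth=0.8)
plt.legend()
plt.show()
```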

Indeed, the tangent line (in blue) lands very close to the root in question (on the red curve). We'd like to be able to iterate this procedure. Suppose that we're given a point \((x_0, f(x_0))\text{.}\) The tangent line to \(f\) at that point is

\begin{equation*} y - f(x_0) = f'(x_0)(x - x_0). \end{equation*}

Solving for \(x\) when \(y = 0\) gives the equation

\begin{equation*} x = x_0 - \frac{f(x_0)}{f'(x_0)}. \end{equation*}

This simple equation is the heart of Newton's method. To iterate the approximation, let \(x_1 = x\) and solve the equation again:

\begin{equation*} x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}. \end{equation*}

The hope is that, given a good enough guess, this sequence of approximations will approach the root. Let's see how it behaves in this case.
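A minimal sketch of the iteration for \(f(x) = x^2 - 2\) starting from \(x_0 = 2\) (the variable names and printed columns are my own choices):

```python
# Newton's method for f(x) = x^2 - 2, starting from the guess x0 = 2.
f = lambda x: x**2 - 2
fprime = lambda x: 2 * x

x = 2.0
for k in range(1, 5):
    x_new = x - f(x) / fprime(x)
    rel_err = abs((x_new - x) / x_new)
    print(k, x_new, rel_err)
    x = x_new
```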

What the output above shows is that starting from \(x = 2\text{,}\) Newton's method converges to within .0000000001% of the true value of \(\sqrt{2}\) in just four steps.

At this point, excited, we decide to test another case.

That is one flat function. But that means the tangent line is going to have a slope very close to 0 if we guess to the left of the root...

That is, a guess of \(x_0 = .7\text{,}\) which is pretty close to the root displayed on the graph, gives a second approximation of 88.3929, which means computing the second tangent line at the point \((88.3928, 8.47884 \times 10^{38})\text{.}\) Not good.

Subsection 4.7 Newton's method requirements and problems

So we've discovered the first problem in the last example. Newton's method does not work well near places where functions are very flat. In particular, this means that we'll want to avoid local extrema: hitting near one with an approximation could send the tangent line far beyond the region we care about.
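To see the failure mode concretely, here is a small experiment with a stand-in function of my own choosing (not the one plotted above): \(f(x) = x^{10} - 1\) is very flat near \(x = 0.5\text{,}\) and a guess there launches the first Newton step far past the root at \(x = 1\text{.}\)

```python
# Newton's method started in a flat region of f(x) = x^10 - 1.
# (An illustrative stand-in function, not the text's example.)
f = lambda x: x**10 - 1
fprime = lambda x: 10 * x**9

x = 0.5
for k in range(1, 6):
    x = x - f(x) / fprime(x)
    print(k, x)   # the first step jumps from 0.5 to about 51.65
```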

What other problems might arise?