Series expansion of a function with several variables¶
The starting point is the familiar Taylor series expansion for a single variable: \begin{equation} f(x+dx) = f(x) + \frac{df}{dx}dx + \frac{1}{2!}\frac{d^2 f}{dx^2}dx^2 + \frac{1}{3!}\frac{d^3 f}{dx^3}dx^3 + \cdots \end{equation}
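Before moving to several variables, it may help to see this single-variable expansion verified numerically. Here is a minimal sketch, assuming sympy is available; the choice $f(x)=e^x$ and the expansion point are purely illustrative.

```python
import sympy as sp

x, dx = sp.symbols('x dx')
f = sp.exp(x)                       # illustrative choice of f

# f(x) + f'(x) dx + f''(x)/2! dx^2 + f'''(x)/3! dx^3
taylor = sum(sp.diff(f, x, n) / sp.factorial(n) * dx**n for n in range(4))

exact = f.subs(x, x + dx)           # the exact value f(x + dx)

# At x = 0 with a small step dx = 0.1 the difference is of order dx^4
print((exact - taylor).subs({x: 0, dx: 0.1}).evalf())
```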
For a function of multiple variables (which we shall take to be $x$, $y$ and $z$ in this case), we want to find the series expansion in terms of $dx$, $dy$ and $dz$. That is we want to develop a series expansion for $f(x+dx,y+dy,z+dz)$
First we keep $y+dy$ and $z+dz$ constant and expand only in terms of $x$ using the familiar series expansion for a single variable:
\begin{eqnarray} f(x+dx,y+dy,z+dz)=f(x,y+dy,z+dz)+\frac{\partial f(x,y+dy,z+dz)}{\partial x}dx + \frac{1}{2}\frac{\partial^2 f(x,y+dy,z+dz)}{\partial x^2}dx^2+... \end{eqnarray}
In the next step, expand the first term on the right hand side as a series in $y$
\begin{eqnarray} &=& f(x,y,z+dz) + \frac{\partial }{\partial y}f(x,y,z+dz)dy + \frac{1}{2}\frac{\partial^2 }{\partial y^2}f(x,y,z+dz)dy^2\nonumber \\ &+&\frac{\partial }{\partial x}f(x,y+dy,z+dz)dx + \frac{1}{2}\frac{\partial^2 }{\partial x^2}f(x,y+dy,z+dz)dx^2...\nonumber \\ \end{eqnarray}
And now expand the first term as a series in $z$.
\begin{eqnarray} && = f(x,y,z) + \frac{\partial}{\partial z}f(x,y,z)dz + \frac{1}{2}\frac{\partial^2}{\partial z^2}f(x,y,z)dz^2 \nonumber \\ &+& \frac{\partial }{\partial y}f(x,y,z+dz)dy + \frac{1}{2}\frac{\partial^2 }{\partial y^2}f(x,y,z+dz)dy^2\nonumber \\ &+&\frac{\partial }{\partial x}f(x,y+dy,z+dz)dx + \frac{1}{2}\frac{\partial^2 }{\partial x^2}f(x,y+dy,z+dz)dx^2... \end{eqnarray}
Expanding the fourth term as a series in $z$ (in the fifth term we simply replace $z+dz$ by $z$, since expanding it further would only produce terms beyond second order in the differentials):
\begin{eqnarray} && f(x,y,z) + \frac{\partial}{\partial z}f(x,y,z)dz + \frac{1}{2}\frac{\partial^2}{\partial z^2}f(x,y,z)dz^2 \nonumber \\ &+& \frac{\partial}{\partial y}f(x,y,z)dy+\frac{\partial^2}{\partial y \partial z}f(x,y,z)dydz+\frac{1}{2}\frac{\partial^2}{\partial y^2}f(x,y,z)dy^2 \nonumber \\ &+& \frac{\partial }{\partial x}f(x,y+dy,z+dz)dx \nonumber \\ &+& \frac{1}{2}\frac{\partial^2 }{\partial x^2}f(x,y+dy,z+dz)dx^2 \end{eqnarray}
Expanding the seventh term as a series in $y$:
\begin{eqnarray} && f(x,y,z) + \frac{\partial}{\partial z}f(x,y,z)dz + \frac{1}{2}\frac{\partial^2}{\partial z^2}f(x,y,z)dz^2 \nonumber \\ &+& \frac{\partial}{\partial y}f(x,y,z)dy+\frac{\partial^2}{\partial y \partial z}f(x,y,z)dydz+\frac{1}{2}\frac{\partial^2}{\partial y^2}f(x,y,z)dy^2 \nonumber \\ &+& \frac{\partial }{\partial x}f(x,y,z+dz)dx + \frac{\partial^2 }{\partial x\partial y}f(x,y,z+dz)dxdy\nonumber \\ &+& \frac{1}{2}\frac{\partial^2 }{\partial x^2}f(x,y+dy,z+dz)dx^2 \end{eqnarray}
Expanding the seventh term as a series in $z$:
\begin{eqnarray} && f(x,y,z) + \frac{\partial}{\partial z}f(x,y,z)dz + \frac{1}{2}\frac{\partial^2}{\partial z^2}f(x,y,z)dz^2 \nonumber \\ &+& \frac{\partial}{\partial y}f(x,y,z)dy+\frac{\partial^2}{\partial y \partial z}f(x,y,z)dydz+\frac{1}{2}\frac{\partial^2}{\partial y^2}f(x,y,z)dy^2 \nonumber \\ &+& \frac{\partial }{\partial x}f(x,y,z)dx + \frac{\partial^2 }{\partial x\partial z}f(x,y,z)dxdz + \frac{\partial^2 }{\partial x\partial y}f(x,y,z+dz)dxdy\nonumber \\ &+& \frac{1}{2}\frac{\partial^2 }{\partial x^2}f(x,y+dy,z+dz)dx^2 \end{eqnarray}
Any further expansions would only produce terms beyond second order, so in the remaining terms we may simply replace $y+dy$ and $z+dz$ by $y$ and $z$. Up to second order we therefore have:
\begin{eqnarray} && f(x,y,z) + \frac{\partial}{\partial z}f(x,y,z)dz + \frac{1}{2}\frac{\partial^2}{\partial z^2}f(x,y,z)dz^2 \nonumber \\ &+& \frac{\partial}{\partial y}f(x,y,z)dy+\frac{\partial^2}{\partial y \partial z}f(x,y,z)dydz+\frac{1}{2}\frac{\partial^2}{\partial y^2}f(x,y,z)dy^2 \nonumber \\ &+& \frac{\partial }{\partial x}f(x,y,z)dx + \frac{\partial^2 }{\partial x\partial z}f(x,y,z)dxdz + \frac{\partial^2 }{\partial x\partial y}f(x,y,z)dxdy\nonumber \\ &+& \frac{1}{2}\frac{\partial^2 }{\partial x^2}f(x,y,z)dx^2 \end{eqnarray}
The whole thing can be written as \begin{eqnarray} f(x+dx,y+dy,z+dz) &=& f(x,y,z) \nonumber \\ &+& \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz \nonumber \\ &+& \frac{1}{2}\frac{\partial^2 f}{\partial x^2}dx^2+\frac{1}{2}\frac{\partial^2 f}{\partial y^2}dy^2+ \frac{1}{2}\frac{\partial^2 f}{\partial z^2}dz^2 \nonumber \\ &+& \frac{\partial^2f}{\partial x\partial y} dxdy + \frac{\partial^2f}{\partial y\partial z} dydz + \frac{\partial^2f}{\partial z\partial x} dzdx \end{eqnarray}
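As a sanity check of this result, here is a small numerical sketch, assuming sympy is available; the function $f$, the base point and the step sizes below are arbitrary illustrative choices.

```python
import sympy as sp

x, y, z, dx, dy, dz = sp.symbols('x y z dx dy dz')
f = sp.sin(x) * sp.exp(y) + x * z**2          # illustrative choice of f

# Second-order expansion from the formula above
expansion = (f
             + sp.diff(f, x)*dx + sp.diff(f, y)*dy + sp.diff(f, z)*dz
             + sp.Rational(1, 2)*(sp.diff(f, x, 2)*dx**2
                                  + sp.diff(f, y, 2)*dy**2
                                  + sp.diff(f, z, 2)*dz**2)
             + sp.diff(f, x, y)*dx*dy
             + sp.diff(f, y, z)*dy*dz
             + sp.diff(f, z, x)*dz*dx)

exact = f.subs({x: x + dx, y: y + dy, z: z + dz}, simultaneous=True)

# With small steps the difference is third order in the differentials
vals = {x: 0.3, y: 0.5, z: 0.7, dx: 1e-2, dy: 1e-2, dz: 1e-2}
print((exact - expansion).subs(vals).evalf())   # of order 1e-6
```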
The first order terms (terms 2, 3, 4 on the right hand side) can be written as:
\begin{eqnarray} \left(\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y} , \frac{\partial f}{\partial z}\right) \cdot \left( \begin{array}{c} dx \\ dy \\ dz \end{array} \right) \end{eqnarray}
The above is the first-order change in $f$ due to changes in $x$, $y$ and $z$.
At an extremal point (maximum or minimum) or a saddle point, the above is zero. Since $dx$, $dy$ and $dz$ are arbitrary, it follows that $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$ must be separately zero.
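In other words, the critical points of $f$ are found by setting each partial derivative to zero simultaneously. A minimal sketch, assuming sympy; the quadratic $f$ below is just an example with an obvious minimum at $(1,-2,0)$.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = (x - 1)**2 + (y + 2)**2 + z**2            # illustrative choice of f

grad = [sp.diff(f, v) for v in (x, y, z)]     # each component must vanish
print(sp.solve(grad, [x, y, z], dict=True))   # [{x: 1, y: -2, z: 0}]
```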
The second order terms (terms 5 to 10) can be written as
\begin{eqnarray} \left(dx , dy , dz\right) \cdot \frac{1}{2} \cdot \left( \begin{array}{rrr} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \\ \end{array} \right) \cdot \left( \begin{array}{c} dx \\ dy \\ dz \end{array} \right) \end{eqnarray}
The above is the second-order change in $f$ due to changes in $x$, $y$ and $z$.
For a minimum the above is positive, and for a maximum it is negative, no matter what $(dx, dy, dz)$ are.
The matrix of second derivatives above, together with the factor of $\frac{1}{2}$, is what we shall denote by $H$ and call the Hessian matrix. (The Hessian is often defined without the factor of $\frac{1}{2}$; including it here is a matter of convenience and does not affect any of the sign arguments below.) To be explicit:
\begin{eqnarray} H = \frac{1}{2} \left( \begin{array}{rrr} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \\ \end{array} \right) \end{eqnarray}
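For concreteness, here is a small sketch (assuming sympy) that builds exactly this matrix, including the factor of $\frac{1}{2}$ used in this text, for an arbitrarily chosen example function.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + x * sp.cos(z)                  # illustrative choice of f
vars_ = (x, y, z)

# Matrix of second partial derivatives, with the 1/2 convention used above
H = sp.Rational(1, 2) * sp.Matrix([[sp.diff(f, a, b) for b in vars_]
                                   for a in vars_])
sp.pprint(H)
# sympy's built-in sp.hessian(f, vars_) gives the same matrix without the 1/2
```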
Conditions for maxima or minima¶
The fact that at a minimum (respectively, maximum) the above is positive (respectively, negative), no matter what the values of $dx$, $dy$ and $dz$ are, imposes certain conditions on $H$. Let's look at these conditions:
\begin{eqnarray} && f(x+dx,y+dy,z+dz) \nonumber \\ &=& f(x,y,z) + \left(\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y} , \frac{\partial f}{\partial z}\right) \cdot \left( \begin{array}{c} dx \\ dy \\ dz \end{array} \right) \nonumber \\ &+& \left(dx , dy , dz \right)\cdot \frac{1}{2} \cdot \left( \begin{array}{rrr} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \\ \end{array} \right) \cdot \left( \begin{array}{c} dx \\ dy \\ dz \end{array} \right) + \cdots \end{eqnarray}
At an extremal point or a saddle point, the second term on the right hand side is zero. That leaves us with the third term, which is the second order change in $f$, and which can be written as
\begin{equation} d^2f = (\Delta {\bf x})^T \cdot H \cdot (\Delta {\bf x}) \end{equation}
which actually is valid for any number of dimensions, not just three. In our present case of three dimensions we have
$\Delta {\bf x} = \left(\begin{array}{c} dx \\ dy \\ dz \end{array}\right)$
and $H = \frac{1}{2} \cdot \left( \begin{array}{rrr} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \\ \end{array} \right)$.
At this stage, I'm going to do a trick. It's actually quite well known in the world of physics, and I've used it in this video on waves and oscillations (4:55 onwards). The trick involves inserting the identity matrix in the form $S^{-1}S$, where $S$ is any invertible matrix with the same dimensions as $H$. With this, we have
\begin{equation} d^2f = (\Delta {\bf x})^T (S^{-1}S) \cdot H \cdot (S^{-1}S) (\Delta {\bf x}) \end{equation}
Since matrix multiplication is associative, the above is the same as
\begin{equation} d^2f = ((\Delta {\bf x})^T S^{-1} )\cdot (S H S^{-1})\cdot (S \Delta {\bf x}) \end{equation}
Now - this is where it gets really interesting - if we choose the columns of $S^{-1}$ to be the eigenvectors of $H$, then $SHS^{-1}$ is a diagonal matrix, with the eigenvalues of $H$ on the diagonal. (This is quite easy to prove using the rules of matrix multiplication.) But that is not all. Notice that the Hessian matrix $H$ is symmetric, that is, $H_{ij}= H_{ji}$ (i.e. $\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$). There is a very useful theorem here: the eigenvectors of a symmetric matrix can be chosen to be orthonormal, and a matrix whose columns are orthonormal vectors is an orthogonal matrix, i.e. its inverse equals its transpose. So if we take the columns of $S^{-1}$ to be the orthonormal eigenvectors of $H$, then $S^{-1}$ (and hence $S$) is orthogonal, and $S^{-1}=S^T$. Since $H$ is symmetric, this allows us to replace $S^{-1}$ by $S^T$ in the above equation:
\begin{equation} d^2f = ((\Delta {\bf x})^T S^T )\cdot (S H S^T)\cdot (S \Delta {\bf x}) \end{equation}
where $SHS^T$ is a diagonal matrix consisting of the eigenvalues of $H$. We also say that $S$ defines an orthogonal transformation that diagonalizes the Hessian.
Further, from a basic identity of matrix algebra, $(\Delta {\bf x})^TS^T = (S \Delta {\bf x})^T$, which allows us to write
\begin{equation} d^2f = (S \Delta {\bf x})^T\cdot (S H S^T)(S \Delta {\bf x}) \end{equation}
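To see this diagonalization in action, here is a numerical sketch assuming numpy; the symmetric matrix below is an arbitrary stand-in for $H$.

```python
import numpy as np

H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])       # illustrative symmetric matrix

eigenvalues, V = np.linalg.eigh(H)    # columns of V are orthonormal eigenvectors
S = V.T                               # V plays the role of S^{-1}, so S = V^T

print(np.round(S @ H @ S.T, 10))      # diagonal, with the eigenvalues of H
print(eigenvalues)
```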
The case of two dimensions¶
In the case of two dimensions, $S\Delta {\bf x}$ is a two-dimensional column vector, $H$ is a $2\times 2$ matrix, and $SHS^T$ is a $2\times 2$ diagonal matrix consisting of the eigenvalues of $H$. Let $\lambda_1$ and $\lambda_2$ be the two eigenvalues of $H$, and let $S\Delta {\bf x} = \left(\begin{array}{c} \alpha \\ \beta \end{array}\right)$, where the exact values of $\alpha$ and $\beta$ are not important. This allows us to write for $d^2f$:
\begin{eqnarray} d^2 f = (\alpha , \beta) \left( \begin{array}{rr} \lambda_1 & 0 \\ 0 & \lambda_2 \\ \end{array} \right) \left(\begin{array}{c} \alpha \\ \beta \end{array}\right) \end{eqnarray}
Carrying out the matrix multiplication immediately yields: \begin{equation} d^2f = \alpha^2 \lambda_1 + \beta^2 \lambda_2 \end{equation}
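A quick numerical check of this identity, assuming numpy; the matrix and the displacement are arbitrary illustrative choices.

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # illustrative symmetric 2x2 matrix
d = np.array([0.02, -0.01])           # illustrative displacement (dx, dy)

lam, V = np.linalg.eigh(H)
alpha, beta = V.T @ d                 # (alpha, beta) = S (dx, dy)^T

print(d @ H @ d)                               # the quadratic form directly
print(alpha**2 * lam[0] + beta**2 * lam[1])    # same number
```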
For a local minimum, $d^2f$ is always positive; for a local maximum, it is always negative. We now use the fact that $\alpha^2$ and $\beta^2$ are never negative to find conditions that $\lambda_1$ and $\lambda_2$ must satisfy for a point to be a local maximum or minimum.
Local Minimum. For this case we must always have $\alpha^2 \lambda_1 + \beta^2 \lambda_2 >0$, regardless of what $dx$ and $dy$, and consequently $\alpha$ and $\beta$, are. Since $\alpha^2$ and $\beta^2$ are never negative, this is possible only if both $\lambda_1 > 0$ and $\lambda_2 > 0$ (setting $\beta=0$ forces $\lambda_1 > 0$, and setting $\alpha=0$ forces $\lambda_2 > 0$). In this case $\lambda_1\lambda_2 > 0$.
Local Maximum. For this case we must always have $\alpha^2 \lambda_1 + \beta^2 \lambda_2 < 0$, regardless of what $dx$ and $dy$, and consequently $\alpha$ and $\beta$, are. Since $\alpha^2$ and $\beta^2$ are never negative, this is possible only if both $\lambda_1 < 0$ and $\lambda_2 < 0$. In this case also $\lambda_1\lambda_2 > 0$.
Hence if $\lambda_1\lambda_2 > 0$ at a point where all the first derivatives of $f$ are zero, then we are definitely dealing with either a local maximum or a local minimum, and not a saddle point. But it is not sufficient to tell us which of the two it is (maximum or minimum).
It is trivial to see that $\lambda_1\lambda_2= \det(SHS^T)$. But $\det(SHS^T) = \det(S)\det(H)\det(S^T) = \det(SS^T)\det(H) = \det(H)$, since $SS^T$ is the identity, and hence $\lambda_1\lambda_2$ is also the determinant of $H$.
If $\lambda_1\lambda_2 < 0$, i.e. if $\det(H) < 0$ at a point where all the first derivatives of $f$ are zero, then we are dealing with a saddle point, i.e. along some directions the point is a maximum and along other directions the point is a minimum - like the saddle of a horse. A simple example is $f(x,y)=x^2-y^2$, whose first derivatives vanish at the point $(0,0)$, and whose Hessian at this point is $\left( \begin{array}{rr} 1 & 0 \\ 0 & -1 \\ \end{array} \right)$, whose determinant is consequently negative.
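This example is easy to verify directly; a minimal sketch assuming sympy (the factor of $\frac{1}{2}$ is included to match the convention of this text, though it does not affect the sign of the determinant):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 - y**2

H = sp.Rational(1, 2) * sp.hessian(f, (x, y))   # 1/2 convention used in this text
H0 = H.subs({x: 0, y: 0})                       # Hessian at the critical point (0, 0)
print(H0)         # Matrix([[1, 0], [0, -1]])
print(H0.det())   # -1, so (0, 0) is a saddle point
```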
Further condition to determine if the extremum is a minimum or a maximum¶
If $\lambda_1\lambda_2 > 0$, i.e. if $\det(H) > 0$, then $\lambda_1$ and $\lambda_2$ are either both negative or both positive. The former corresponds to a maximum, and the latter to a minimum, as the Local Minimum and Local Maximum cases above show. But to find out which of the two it is, we need one more condition. What is it?

Well, to be concrete, let the extremal point be a minimum, and let $(a,b)$ be the point. Now choose a line in the $xy$ plane which goes through this point and whose direction is parallel to the $x$-axis. Since a local minimum means that the point is a minimum no matter which direction we approach it from, it is a minimum along this line also. Along this line $y$ has the constant value $b$, and hence $f(x,y)=f(x,b)$ is just a function of $x$ ($y$ being constant). We know from the theory of functions of one variable that at a minimum the second derivative is positive. Keeping $y$ constant at the value $b$, the second derivative of $f$ with respect to $x$ at $x=a$ is therefore positive at this minimum. But this second derivative is nothing other than $\frac{\partial^2 f}{\partial x^2}$ evaluated at the point $(a,b)$, which must thus be positive at a local minimum. And $\frac{\partial^2 f}{\partial x^2}$ is the element in the first row and first column of the Hessian: $H_{11}=\frac{\partial^2 f}{\partial x^2}$.

Hence, if we have found a point where the first derivatives vanish and established that the determinant of $H$ at this point is positive, then if $H_{11}$ at this point is also positive, we have established that we have a local minimum. By the way, similar reasoning shows that $H_{22}$ must also be positive, because the point $(a,b)$ is a local minimum even when we take a path parallel to the $y$-axis and passing through $x=a$, for in this case we again have a function of one variable, the variable now being $y$. Hence it suffices to show that either $H_{11}$ or $H_{22}$ is positive - given $\det(H)>0$, the positivity of one ensures the positivity of the other.
In the same way, if $\det(H) > 0$ and $H_{11}<0$, then we are dealing with a local maximum.
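Putting the whole two-dimensional test together: a small sketch assuming sympy. The helper name `classify` is mine, not from the text, and the usual Hessian without the factor of $\frac{1}{2}$ is used since the factor does not change any signs.

```python
import sympy as sp

x, y = sp.symbols('x y')

def classify(f, point):
    """Second-derivative test for a critical point of f(x, y)."""
    H = sp.hessian(f, (x, y)).subs({x: point[0], y: point[1]})
    det_H = H.det()
    if det_H < 0:
        return "saddle point"
    if det_H > 0:
        return "local minimum" if H[0, 0] > 0 else "local maximum"
    return "inconclusive (det H = 0)"

print(classify(x**2 + y**2, (0, 0)))    # local minimum
print(classify(-x**2 - y**2, (0, 0)))   # local maximum
print(classify(x**2 - y**2, (0, 0)))    # saddle point
```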