In this post, we learn how to construct the Hessian matrix of a function and find out how the Hessian helps us determine minima and maxima.
What is a Hessian Matrix?
The Jacobian matrix helps us find the local gradient of a non-linear function. In many applications, we are interested in optimizing a function. If our function were modeling a production system, we would want to get the largest possible output from the smallest possible inputs (the function's variables).
The Jacobian only points us in the direction of steepest ascent; it doesn't tell us whether the stationary point we reach is a maximum, a minimum, or merely a saddle point. This problem is commonly described via the hill-walking analogy:
Suppose you are walking around in the hills at night and you would like to find the highest peak. Because it is dark, you can't see further than a few meters. If you kept following the direction of steepest ascent, you'd eventually end up on a hilltop or on a saddle, but it might not be the highest point. The Hessian gives you a way to determine whether the point you are standing on is in fact a peak, a valley, or just a saddle (it cannot, however, tell you whether it is the highest peak in the whole range).
How to Find the Hessian Matrix?
The Hessian matrix is a matrix of the second-order partial derivatives of a function.
The easiest way to get to a Hessian is to first calculate the Jacobian and then take the derivative of each entry of the Jacobian with respect to each variable. This implies that for a scalar-valued function of n variables, the Jacobian is a 1 \times n row vector, and the Hessian is an n \times n matrix.
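The "differentiate the Jacobian entries again" idea can be sketched numerically with central finite differences. This is a minimal illustration, not the post's own code, and the example function f(x, y) = x² + 3xy + y² is my own choice (its Hessian is the constant matrix [[2, 3], [3, 2]]):

```python
def hessian(f, x, h=1e-5):
    """Approximate the n x n Hessian of a scalar function f at point x
    (a list of n floats) using central finite differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Evaluate f at the four corners needed for d^2 f / dx_i dx_j
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# Example: f(x, y) = x^2 + 3xy + y^2 has the constant Hessian [[2, 3], [3, 2]]
f = lambda v: v[0]**2 + 3*v[0]*v[1] + v[1]**2
H = hessian(f, [1.0, 2.0])
```

Because the example function is quadratic, the finite-difference estimate matches the exact Hessian up to rounding error; for general functions the step size h trades truncation error against floating-point noise.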
If you have a vector-valued function with n variables and m output components, the Jacobian will be an m \times n matrix, while the Hessian will be an m \times n \times n tensor: one n \times n Hessian per output component.
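The m \times n \times n shape can be checked with a small sketch. The example function here is my own (f: R² → R³, so m = 3 and n = 2), and the second derivatives are again approximated with central finite differences:

```python
def second_partial(f, x, i, j, h=1e-4):
    """Central-difference estimate of d^2 f / dx_i dx_j at point x."""
    xpp = list(x); xpp[i] += h; xpp[j] += h
    xpm = list(x); xpm[i] += h; xpm[j] -= h
    xmp = list(x); xmp[i] -= h; xmp[j] += h
    xmm = list(x); xmm[i] -= h; xmm[j] -= h
    return (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)

def hessian_stack(f, x):
    """For a vector-valued f with m output components and n variables,
    return an m x n x n nested list: one n x n Hessian per component."""
    m, n = len(f(x)), len(x)
    return [[[second_partial(lambda v, k=k: f(v)[k], x, i, j)
              for j in range(n)]
             for i in range(n)]
            for k in range(m)]

# Example f: R^2 -> R^3 (my own choice): f(x, y) = (x^2, xy, y^3)
f = lambda v: [v[0]**2, v[0]*v[1], v[1]**3]
H = hessian_stack(f, [1.0, 2.0])
# len(H) == 3 (m components); each H[k] is a 2 x 2 (n x n) Hessian
```

Component 0 (x²) has Hessian [[2, 0], [0, 0]], component 1 (xy) has [[0, 1], [1, 0]], and component 2 (y³) has [[0, 0], [0, 6y]], which at y = 2 is [[0, 0], [0, 12]].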
Let’s work through an example to clarify this, starting with the following function.