Simple and Steepest Ascent Hill Climbing
Introduction
Hill climbing is one of the simplest metaheuristic optimization methods that, given a state space and an objective function to maximize (or minimize), tries to find a sufficiently good solution. Often the solution found is not the global optimum, but it is a good solution reachable in a reasonable amount of time.
The Jupyter Notebook can be found HERE.
The state space
The state space is a graphical representation of the set of all possible configurations (or nodes, or states) of a system and the value of the objective function. A one-dimensional state space diagram is depicted below.
Figure 1. State space diagram
The state space is characterized by:
Starting state: the state where the search starts.
Global maximum: the best solution, the best possible state. At this state the objective function reaches its maximum.
Local maximum: a state that is better than its neighboring states, but not as good as the global maximum. However, given a reasonable amount of time, it can be an acceptable solution.
Plateau: a particular type of local maximum. It is a flat region where the neighboring states all have the same value of the objective function.
Ridge: another particular type of local maximum, located between a better maximum and a minimum of the state space diagram.
The operator
In simple words, the operator is the function that identifies or generates the neighbors of the current state. The operator can be very complex or very simple. In this tutorial I will leverage two operators: one that simply adds or subtracts a constant from the current state, and one that alters the order of a sequence. More details below.
The hill climbing algorithms
In this article we will code two versions of the hill climbing algorithm: simple hill climbing and steepest ascent hill climbing. Both are greedy algorithms, in contrast to stochastic hill climbing, which performs some exploration.
Before jumping into the details and pseudo code of both algorithms, we need to talk about states (or nodes) and neighboring states (or nodes). A node is an identifiable, often discrete, state of the state space where we can calculate the value of the objective function. What a state looks like strongly depends on the problem we are solving.
A neighbor state is a state that is algorithmically close and similar to the current state. What a neighbor state is and how it is generated also depends on the use case.
In this article we will solve two problems and each has its own definition of state and neighboring state. More details below.
1 - Simple hill climbing
Pseudocode:
STEP1: Select a current state
STEP2: Generate one neighbor of the current state
STEP3: Evaluate the objective function at the proposed neighbor
STEP4: If the value of the objective function at the proposed neighbor
is better than at the current state, the proposed neighboring state
becomes the current state
STEP5: Repeat STEP2-STEP4 for n iterations or until the value of the
objective at the current state is higher than all neighbors'.
STEP6: Return current state and its objective function's value
Simple hill climbing is the least computationally expensive variant, but it does not necessarily find the best solution.
2 - Steepest ascent hill climbing
Steepest ascent hill climbing is a variation of the simple hill climbing algorithm. Instead of evaluating one single neighboring state, steepest ascent hill climbing evaluates all the neighboring states and selects the one that improves the objective function the most (the steepest ascent).
STEP1: Select a current state
STEP2: Generate all neighboring states of the current state
STEP3: Evaluate the objective function at all neighbors
STEP4: If the objective function at the current state has a higher value
than at all neighboring states, then the current state is already a
maximum: TERMINATE THE SEARCH and jump to STEP6.
Otherwise the neighbor with the highest improvement of the objective
function becomes the current state
STEP5: Repeat STEP2-STEP4 for n iterations
STEP6: Return current state and its objective function's value
This algorithm is more computationally expensive because it evaluates all neighbors at each iteration.
Implementation of simple hill climbing
I will use simple hill climbing to solve two problems: a classic maximization problem of an objective function with known shape, and the Traveling Salesman Problem (TSP).
Example 1: Maximization of an objective function
The simple hill climbing algorithm will find the maximum (hopefully the global maximum) of the following objective function:
for x in the interval [-8, 20]. The shape of the objective function in the given interval is:
Figure 2. Objective function state space diagram
Note the presence of one global maximum, two local maxima and one plateau. A discretized representation of x is shown at the bottom of the graph (blue dots): each dot is a state (or node) of the state space. Each state (except the first and last one) has two neighbors, one on each side. The distance between two consecutive states is 0.2. Thus from each node we can move left or right by 0.2: that's the operator function for this use case.
Representing x as a discrete variable helps build an intuition of what a state can be, facilitates the coding of the algorithm, and makes it easier to apply hill climbing to more complicated problems.
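As a rough sketch of this setup in Python (the exact objective function is defined in the notebook, so the one below is just a stand-in with several maxima; all names are my own):

import numpy as np

STEP_SIZE = 0.2
# Discrete states (the blue dots in Figure 2): nodes 0.2 apart on [-8, 20]
states = np.arange(-8, 20 + STEP_SIZE, STEP_SIZE)

# Stand-in objective with several maxima; the article's exact formula
# lives in the notebook, so this is only a placeholder.
def objective(x):
    return np.sin(x) + 0.1 * x

def move(x, step=STEP_SIZE):
    """Operator: from a state we can move left or right by one step."""
    return x - step, x + step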
The simple hill climbing algorithm is enclosed inside a single function that expects as inputs: the objective function, the list of all states, the step size and the number of iterations. A boolean flag specifies whether to display the states the algorithm walked through.
The first step is to randomly pick a starting state among all the nodes (discrete Xs). The empty list named 'path' stores all the states the algorithm steps through:
Figure 3. Simple hill climbing - part 1
Inside the for loop, both the left and right neighboring states are evaluated. A neighbor with a higher value of the objective function than the current state becomes the new current state. If both the right and left neighbors are less optimal than the current state, we have reached a maximum and the loop is interrupted:
Figure 4. Simple hill climbing - part 2
Finally, the steps taken by the algorithm during the search are plotted, and the optimal state and the value of the objective function at that state are returned:
Figure 5. Simple hill climbing - part 3
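Since the notebook code appears here only as images, below is a minimal sketch of what the function described in Figures 3-5 might look like (the function name and signature are assumptions, and the plotting of the walked path is omitted for brevity):

import random

def simple_hill_climbing(objective, states, step_size, n_iterations,
                         verbose=False):
    # Part 1: random starting state and the list of visited states
    current = random.choice(list(states))
    path = [current]

    # Part 2: evaluate the left and right neighbors; stop at a maximum
    for _ in range(n_iterations):
        candidates = [s for s in (current - step_size, current + step_size)
                      if min(states) <= s <= max(states)]
        better = [s for s in candidates if objective(s) > objective(current)]
        if not better:
            break  # both neighbors are worse: a (local) maximum was reached
        current = max(better, key=objective)  # if both improve, take the best
        path.append(current)

    # Part 3: report the walk and return the optimum found
    if verbose:
        print(f"visited {len(path)} states, stopped at x = {current:.2f}")
    return current, objective(current)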
The results of two independent runs are shown below:
Figure 6. Simple hill climbing - result
In both runs the algorithm stopped at a local maximum, which, as stated before, can be satisfactory for most optimization problems. All hill climbing algorithms have this limitation, but there is a strategy that increases the chances of finding the global maximum: multiple restarts. As the name suggests, we run the algorithm several times and keep the best state found, presumably the global maximum. Running simple hill climbing 30 times was enough to find the global maximum:
Figure 7. Multiple restarts - global maximum found
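A sketch of the restart loop, reusing the simple_hill_climbing sketch from above (the number of iterations per run is an arbitrary choice):

# Multiple restarts: repeat the search and keep the best final state
results = [simple_hill_climbing(objective, states, STEP_SIZE, 1000)
           for _ in range(30)]
best_state, best_value = max(results, key=lambda r: r[1])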
Next, let's check how many of the 30 runs led to the global maximum:
Figure 8. Multiple restarts - maximum frequency
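One quick way to count this, assuming results holds the (state, value) pairs from the restart sketch above:

# Count the runs whose final value matches the best value found
n_best = sum(1 for _, value in results if abs(value - best_value) < 1e-9)
print(f"{n_best} of {len(results)} runs found the global maximum")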
Only 2 out of the 30 runs led to the discovery of the global maximum. This should not be surprising, because only a small portion of the randomly initialized starting states (those with X > 17) can climb to the global maximum.
Example 2: The Traveling Salesman Problem (TSP)
This is a very famous and important problem for which hill climbing can quickly find good approximate solutions. The problem goes like this:
- A salesman has to visit each city of his itinerary and return to the original city
- Each city has to be visited only once
- The goal of the algorithm is to find the shortest path connecting all cities.
As before, to run hill climbing we must define state, neighboring states and the operator. In this problem a state is a path that connects all cities: at the beginning of the algorithm this path, assuming we have no information on the best itinerary, is randomly initialized. A neighboring state is a path that is similar to the current path: swapping two cities in the current path would create a neighboring state. The function that swaps two cities is the operator. That's all we need to solve the TSP.
First let's create a random virtual map of 20 cities:
Figure 9. Cities' location a 2D map
The x and y coordinates of the 20 cities are stored inside a pandas data frame.
Figure 10. Pandas data frame with cities' information
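A possible way to build such a data frame (the column names are inferred from Figure 10; the seed and the map size are arbitrary choices of mine):

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_cities = 20

# One row per city, with random (x, y) coordinates on a square map
cities = pd.DataFrame({
    "City": [f"city_{i}" for i in range(1, n_cities + 1)],
    "x": rng.uniform(0, 100, n_cities),
    "y": rng.uniform(0, 100, n_cities),
})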
The column 'City' of the data frame specifies the path the salesman will follow: using Figure 9 as an example, the salesman starts from city_1, visits city_2, then city_3, city_4 ... you get the point.
Next, three key functions are defined: one that generates a random path given the data frame of Figure 10; one that calculates the total distance traveled by the salesman based on the data frame's path; and the function that swaps 2 cities:
Figure 11. Functions to create random path, calculate distance, swap cities
Note that the swap_cities function is a generator that yields one "adjacent" swap at each iteration. This neighbors-generating function is the operator.
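Since the notebook code is not reproduced here, the three functions could be sketched as follows (the exact swap scheme in the notebook may differ; this version swaps two consecutive cities):

def random_path(cities):
    """Starting state: the rows (cities) shuffled into a random order."""
    return cities.sample(frac=1).reset_index(drop=True)

def total_distance(path):
    """Tour length following the data frame's order, returning to the start."""
    x, y = path["x"].to_numpy(), path["y"].to_numpy()
    dx = np.diff(x, append=x[0])  # wrap-around: close the loop
    dy = np.diff(y, append=y[0])
    return float(np.sqrt(dx**2 + dy**2).sum())

def swap_cities(path):
    """Operator: a generator yielding one neighbor per swap."""
    for i in range(len(path) - 1):
        neighbor = path.copy()
        # swap city i with city i + 1 (an "adjacent" swap)
        neighbor.iloc[[i, i + 1]] = neighbor.iloc[[i + 1, i]].to_numpy()
        yield neighbor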
The function below displays the data frame's path on the cities' map:
Figure 12. Function that displays the path on the map
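A rough matplotlib sketch of such a display function (show_path is a hypothetical name, and it reuses total_distance from the previous sketch):

import matplotlib.pyplot as plt

def show_path(path):
    """Draw the tour on the map, closing the loop back to the first city."""
    x = path["x"].tolist() + [path["x"].iloc[0]]
    y = path["y"].tolist() + [path["y"].iloc[0]]
    plt.plot(x, y, "o-")
    for _, row in path.iterrows():
        plt.annotate(row["City"], (row["x"], row["y"]))
    plt.title(f"Total distance: {total_distance(path):.0f}")
    plt.show()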
It is time for the main function of the hill climbing algorithm: first a random path is generated and its total length calculated. Next, inside a nested for loop, 2-city swaps (neighbors) are repeatedly generated and evaluated: if a swap shortens the distance traveled, the swapped path becomes the new current tour and the inner loop is interrupted, initiating another iteration of the algorithm.
Figure 13. Main function of simple hill climbing algorithm
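A sketch of that main function, reusing the helpers defined above (the function name is my own):

def tsp_simple_hill_climbing(cities, n_iterations):
    # Random starting tour and its length
    current = random_path(cities)
    current_dist = total_distance(current)

    for _ in range(n_iterations):
        # Inner loop: accept the first swap that shortens the tour
        for neighbor in swap_cities(current):
            neighbor_dist = total_distance(neighbor)
            if neighbor_dist < current_dist:
                current, current_dist = neighbor, neighbor_dist
                break  # restart the search from the new current tour
    return current, current_dist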
Running the simple hill climbing algorithm for 10 and 100 iterations found paths that are 644 and 375 units long, respectively. By comparison the path shown in Figure 10 is 703 units long. The optimal paths found after 10 and 100 iterations are depicted below:
Figure 14. TSP solved with simple hill climbing
While 10 iterations are not sufficient to generate a good enough solution, with 100 iterations simple hill climbing found a reasonably short path. From the graph it is hard to tell whether it is also the best solution, but if it is not, it is surely the second best. Multiple restarts could be implemented to maximize the chance of finding the shortest path, but I will skip implementing that.
Implementation of steepest ascent hill climbing
Example 1: The traveling salesman problem (TSP)
The difference between simple and steepest ascent hill climbing is that the latter evaluates all the neighboring states at once and picks the best one: for the TSP, that is the one that shortens the path the most. The code for steepest ascent hill climbing is very similar to that of simple hill climbing.
The functions that generate the starting state and calculate the total distance are the same. The operator function is modified to return all the neighboring states at once:
Figure 15. Function that generates all neighbors of the current path
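A possible sketch of this modified operator, returning the list of all tours reachable with a single 2-city swap:

def all_swaps(path):
    """Return the full list of tours obtained by swapping any two cities."""
    neighbors = []
    for i in range(len(path) - 1):
        for j in range(i + 1, len(path)):
            neighbor = path.copy()
            neighbor.iloc[[i, j]] = neighbor.iloc[[j, i]].to_numpy()
            neighbors.append(neighbor)
    return neighbors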
The main function was modified to evaluate all the neighbors at once and pick the best one (shortest path):
Figure 16. Main function of the steepest ascent hill climbing algorithm
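A sketch of the steepest ascent main loop under the same assumptions (it also stops early when no neighbor improves the current tour, as in STEP4 of the pseudocode):

def tsp_steepest_ascent(cities, n_iterations):
    current = random_path(cities)
    current_dist = total_distance(current)

    for _ in range(n_iterations):
        # Evaluate every neighbor and keep the shortest candidate tour
        best = min(all_swaps(current), key=total_distance)
        best_dist = total_distance(best)
        if best_dist >= current_dist:
            break  # no neighbor improves the tour: local optimum reached
        current, current_dist = best, best_dist
    return current, current_dist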
With just 10 iterations the algorithm was able to find a path that is 389 units long, just a little bit longer than what the simple hill climbing algorithm found with 100 iterations.
Figure 17. TSP solved with steepest ascent hill climbing
It goes without saying that we could perform multiple restarts with steepest ascent hill climbing as well.
Concluding remarks
We implemented both simple hill climbing and steepest ascent hill climbing algorithms from scratch. We familiarized ourselves with the concepts of state, neighboring state and operator. To reiterate, both algorithms are prone to getting stuck at a local maximum, but that's not a deal breaker for most problems we are trying to solve: a good solution in a reasonable amount of time is often good enough.
In a future article I will introduce simulated annealing, another hill climbing-like algorithm that balances exploration and exploitation over time to increase the probability of finding the global maximum.
I hope you enjoyed this article. Feel free to drop comments, questions and criticisms.