Growing with the Web

A* pathfinding algorithm

Published , updated
Tags:

Game development introduced me to programming when I was around 10 years old, and I’ve loved it ever since. One of the first formal algorithms I learned before entering university was A* (pronounced “A star”), and I really had a great time learning about it. It’s one of the most widely used pathfinding algorithms and is one that you would likely be introduced to first when approaching the subject of pathfinding. A pathfinding algorithm takes a start point (also known as a node) and a goal and attempts to make the shortest path between the two given possible obstacles blocking the way.

I’ve always thought the simplest example of pathfinding is a 2D grid in a game, it can be used to find a path from A to B on any type of graph. For example a graph where vertices are airports and edges are flights, A* could be used to get the shortest trip between one airport and another.

I’ve always thought the simplest example of pathfinding is a 2D grid in a game, but it can be used to find a path from A to B on any type of graph. For example a graph where vertices are airports and edges are flights, A* could be used to get the shortest trip between one airport and another.

Properties

A* has the following properties:

  • It is complete; it will always find a solution if it exists.
  • It can use a heuristic to significantly speed up the process.
  • It can have variable node to node movement costs. This enables things like certain nodes or paths being more difficult to traverse, for example an adventurer in a game moves slower across rocky terrain or an airplane takes longer going from one destination to another.
  • It can search in many different directions if desired.

How it works

It works by maintaining two lists; the open list and the closed list. The open list’s purpose is to hold potential best path nodes that have not yet been considered, starting with the start node. If the open list becomes empty then there is no possible path. The closed list starts out empty and contains all nodes that have already been considered/visited.

The core loop of the algorithm selects the node from the open list with the lowest estimated cost to reach the goal. If the selected node is not the goal it puts all valid neighboring nodes into the open list and repeats the process.

Part of the magic in the algorithm is that all created nodes keep a reference to their parent. This means that it’s possible to backtrack back to the start node from any node created by the algorithm.

Node

A node has a positioning value (eg. x, y), a reference to its parent and three ‘scores’ associated with it. These scores are how A* determines which nodes to consider first.

G score

The g score is the base score of the node and is simply the incremental cost of moving from the start node to this node.

g(n) = g(n.parent) + cost(n.parent, n)
cost(n_1, n_2) = \text{the movement cost from }n_1\text{ to }n_2

H score - the heuristic

The heuristic is a computationally easy estimate of the distance between each node and the goal. This being computationally easy is very important as the H score will be calculated at least once for every node considered before reaching the goal. The implementation of the H score can vary depending on the properties of the graph being searched, here are the most common heuristics.

Manhattan distance

This is the simplest heuristic and is ideal for grids that allow 4-way movement (up, down, left, right).

h(n) = |n.x - goal.x| + |n.y - goal.y|

Diagonal distance (uniform cost)

This heuristic is used for 8-way movement (diagonals) when the cost of diagonal movement is the same as the cost of non-diagonal. Note that this is very inaccurate if the costs for diagonal and non-diagonal are not the same.

h(n) = c\cdot max(|n.x - goal.x|, |n.y - goal.y|)
c = \text{cost of movement}

Diagonal distance

This heuristic is used for 8-way movement when the cost of diagonal movement differs from the non-diagonal cost. Remember that the cost of diagonal distance doesn’t need to be exact and is usually worth it to use a constant multiplier rather than the square root as the square root operation is quite expensive.

h(n) = c\_d d\_{min} + c\_n ( d\_{max} - d\_{min})
d\_{max} = max(|n.x - goal.x|, |n.y - goal.y|)
d\_{min} = min(|n.x - goal.x|, |n.y - goal.y|)
c\_n = \text{cost of non-diagonal movement}
c_d = \text{cost of diagonal movement} = c_n \sqrt{2} \approx c_n \cdot 1.414

Euclidean distance

This heuristic can be used when movement is possible at any angle, it can also be quite expensive and may be worth considering a less expensive function.

h(n) = \sqrt{(n.x - goal.x)^2 + (n.y - goal.y)^2}

You should avoid using the square root all together and calculate h(n)^2. Then make sure that g(n)^2 is used to ensure both are relative.

h(n)^2 = (n.x - goal.x)^2 + (n.y - goal.y)^2

F score

The f score is simply the addition of g and h scores and represents the total cost of the path via the current node.

f(n) = g(n) + h(n)

Pseudocode

This pseudocode allows for 8-way directional movement.

function A*(start, goal)
  open_list = set containing start
  closed_list = empty set
  start.g = 0
  start.f = start.g + heuristic(start, goal)
  while open_list is not empty
    current = open_list element with lowest f cost
    if current = goal
      return construct_path(goal) // path found
    remove current from open_list
    add current to closed_list
    for each neighbor in neighbors(current)
      if neighbor not in closed_list
        neighbor.f = neighbor.g + heuristic(neighbor, goal)
        if neighbor is not in open_list
          add neighbor to open_list
        else
          openneighbor = neighbor in open_list
          if neighbor.g < openneighbor.g
            openneighbor.g = neighbor.g
            openneighbor.parent = neighbor.parent
  return false // no path exists

function neighbors(node)
  neighbors = set of valid neighbors to node // check for obstacles here
  for each neighbor in neighbors
    if neighbor is diagonal
      neighbor.g = node.g + diagonal_cost // eg. 1.414 (pythagoras)
    else
      neighbor.g = node.g + normal_cost // eg. 1
    neighbor.parent = node
  return neighbors

function construct_path(node)
  path = set containing node
  while node.parent exists
    node = node.parent
    add node to path
  return path

Optimisation

The primary opportunity for optimisation is in selecting the right data structure for the open list. While it’s possible to implement the open list with any list data structure, the best choice is a list that has both a fast insert and extract minimum operation. The performance of the insert operation is more important than extract minimum since the algorithm typically doesn’t extract all nodes inserted into the open list, with the exception being when no path exists).

These properties are typical of priority queues, such as a binary heap which features an Θ(\log n) time complexity for both its insert and extract minimum operations. The Fibonacci heap can optimise this even further with its Θ(1) insert and O(\log n) extract minimum.

Another less frequent operation that occurs is decrease key, when the g cost of a node in the open list needs updating. The Fibonacci heap again comes out on top in this regard with a Θ(1) decrease key time complexity.

Data structure Insert Extract minimum Decrease key
Array Θ(1)* O(n) Θ(1)
Sorted array O(\log n) Θ(1) O(n)
Binary heap Θ(\log n) Θ(\log n) Θ(\log n)
Fibonacci heap Θ(1) O(\log n)* Θ(1)*

* Amortised

Demo

See the algorithm in action along with Dijkstra’s algorithm in the pathfinding visualiser web app I created. The source is available on GitHub.

Textbooks

Here are two CS textbooks I personally recommend; the Algorithm Design Manual (Steven S. Skiena) is a fantastic introduction to data structures and algorithms without getting to deep into the maths side of things, and Introduction to Algorithms (CLRS) which provides a much deeper, math heavy look.

 

Like this article?
Subscribe for more!