Dijkstra’s Algorithm with Adjacency Lists
How to implement a standard Graph algorithm in Python utilizing various data structures.
Background
Dijkstra’s algorithm is a method for finding the shortest path between nodes in a graph. In the instance below, we will be using Dijkstra’s algorithm to find the minimum distance between Node 1 and Node 5.
You can find the python code in the link below for following along!
Before jumping into the pseudo-code and implementations of the algorithm, let’s break down some of the means of representing our data.
Adjacency Matrix
The first option, an adjacency matrix, is perhaps the most intuitive representation of our graph. Each node, or vertex, is represented as a column and row entry. For a graph of V vertices, the matrix representation has a dimension of V x V. The matrix values correspond to the weights, or cost, of traversal to another vertex. The following adjacency matrix represents our example problem:
[[0, 7, 9, 0, 0, 14],
[7, 0, 10, 15, 0, 0],
[9, 10, 0, 11, 0, 2],
[0, 15, 11, 0, 6, 0],
[0, 0, 0, 6, 0, 9],
[14, 0, 2, 0, 9, 0]]
Note that because the graph is undirected, the weight at the [i][j] entry is equal to the weight at the [j][i] entry. If there is no direct path from [i] to [j], then the weight of [i][j] = 0. Although an Adjacency Matrix is more intuitive, it poses a significant storage issue for large and sparse graphs. Namely, we have O(n²) additional storage requirements.
Several online tutorials exist to guide the heap, O(V + E log(V)), and the array, O(V²), implementations of Dijkstra’s algorithm. Please refer to the links below to explore this design method!
Adjacency List
The option we will explore in this article, an adjacency list, will cut down on storage requirements by only storing weights of connected vertices! To do so, we will implement a default dictionary, keeping track of connections and weights. Below is the default dictionary implementation for our graph.
{1: [[2, 7], [3, 9], [6, 14]],
2: [[1, 7], [3, 10], [4, 15]],
3: [[1, 9], [2, 10], [6, 2], [4, 11]],
4: [[3, 11], [2, 15], [5, 6]],
5: [[6, 9], [4, 6]],
6: [[1, 14], [3, 2], [5, 9]]}
The key values are the source nodes, and the list values are all connected [destination, weight] pairs. We have eliminated the need to store the sparse 0’s from a matrix representation of our graph!
Now that we have worked through some of the prerequisites before implementing our algorithm let’s jump right in!
Dijkstra’s Algorithm
- Create a seen set. The set will keep track of all previously visited nodes.
- Initialize a distance array, representing the total distance/cost to visit a given node by distance[node]. The initial distances will be set to infinity for each node.
- Set the distance for the source node equal to 0.
distance[source] = 0 - Set source as the current node.
- Iterate through all connected nodes to current. If distance[current] + weight from current to connected < distance[connected], then update distance[connected] = distance[current] + weight. Remember, we are storing the connected nodes and weights as [destination, weight] pairs in our adjacency list!
- Add current to seen. We will now preclude revisiting the current node.
- Select the node with the smallest distance, distance[node], that is not in seen. Mark this node as current and repeat from step 5.
- Once all nodes are in seen, there are no remaining nodes connected to current, or there are no reductions in distance, then the algorithm is finished.
Let’s explore this with our graph above! We will start with a slower, O(n²), implementation before exploring the advantages of the heap data structure.
Implementation
Distance array, dist, is instantiated with initial values of infinity for all nodes.
Node 1 is marked as current.
dist[1] = 0
Iterate through all connected nodes and update distances.
Node 1 is added to seen.
Node 2 has the smallest distance for any node not in seen, so it is marked as current.
Repeat the algorithm starting from step 5!
Node 2 is added to seen.
Node 3 has the next smallest distance and is not in seen.
Because the distance from (Node 1 → Node 3 → Node 6) < the distance from (Node 1 → Node 6), we update dist[6] = 11 as opposed to the previous value of 14!
Node 6 is now updated to be the current node, updating all possible connected nodes.
We have now updated Node 5 to a value of 20. Does this mean our algorithm is complete?
NO!
Although it is easy enough to look at this simple example and know that 20 is the smallest distance value possible for Node 5, our algorithm will continue one more step. The algorithm can only terminate once the conditions of Step 8 are satisfied. There are two more omitted iterations before algorithm completion. The process for these last two steps remains the same.
Improvements?
Of course there are! The bottleneck operation in our current algorithm implementation is locating the node with the next smallest distance. If only we had a data structure that focused on O(1) time to find the minimum, which could be sorted in O(log(n)) time. Oh, wait… THE HEAP!
Without going into complete detail on the heap data structure, it allows us the exact location and sort times detailed above. Using a heap will drastically improve our algorithm performance. I opted to utilize Python’s built-in heapq module vice constructing my own. A previous link contains the information needed to build the functions and methods for implementing your Heap data structure. Instead, below is how I adopted a heap for implementing an improved Dijkstra Algorithm.
Closing
As we have seen, there are many decisions when it comes to modeling a graph and implementing Dijkstra’s algorithm. In this article, we primarily explored storing our graph in an Adjacency List by using a dictionary. An adjacency list allows us to reduce storage costs while still upgrading our algorithm through heap application.
Please feel free to explore and use the linked code for any personal projects!