Union Find: Introduction
Let’s understand the Union Find pattern, its real-world applications, and some problems we can solve with it.
We'll cover the following
Overview#
The union find pattern is used to group elements into sets based on a specified property. Each set is non-overlapping, that is, it contains unique elements that are not present in any other set. The pattern uses a disjoint set data structure such as an array, to keep track of which set each element belongs to.
Each set forms a tree data structure and has a representative element that resides at the root of the tree. Every element in this tree maintains a pointer to its parent. The representative’s parent pointer points to itself. If we pick any element in a set and follow its parent pointers, we’ll always reach the set representative.
The pattern is composed of two operations performed on the disjoint data structure:
find(x): Finds the representative of the set that contains x.
union(x, y): Merges the sets of x and y into one.
Here is how the union find pattern uses these operations to form and extract information from sets of elements:
1 of 7
2 of 7
3 of 7
4 of 7
5 of 7
6 of 7
7 of 7
Note: In the above illustration, we skipped the union(3,4) function to demonstrate how trees of heights greater than can be merged.
For now, the worst-case time complexity of this approach is , since we might have to traverse the entire tree to find the representative of an element.
We can improve the union find pattern through an optimization called union by rank. We maintain a rank for each tree. The larger the size of the tree, the greater the rank. The idea is that when merging two trees with the union method, we always attach a tree of a lower rank to one with a higher rank. This ensures that when two trees are merged, each element in this merged tree has the shortest possible path to the root.
1 of 7
2 of 7
3 of 7
4 of 7
5 of 7
6 of 7
7 of 7
This brings the time complexity down to in the worst case.
Another optimization to the union find pattern is path compression, where on each find operation on a node of a tree, we update the parent of that node to point directly to the root. This reduces the length of the path of that node to the root, ensuring we don’t have to travel all the intermediate nodes on future find operations.
The following illustration shows how path compression works:
1 of 2
2 of 2
Both these optimizations used together bring down the worst-case time complexity to lower than
Note: The time complexities explained above were for a single Union Find operation. When we perform
Union Find operations, the time complexity becomes .
Examples#
The following examples illustrate some problems that can be solved with this approach:
1 of 4
2 of 4
3 of 4
4 of 4
Does my problem match this pattern?#
-
Yes, if any of these conditions is fulfilled:
- The problem requires arranging elements with a certain property into groups, or, to use graph terminology, into connected components.
- We have been given a problem that contains elements represented as separate sets initially where we have to combine pairs of sets or find whether two elements belong to the same set or not.
- The problem data is best organized in the form of a graph, yet the data has not been provided in the form of an adjacency list/matrix.
-
No, if either of these conditions is fulfilled:
- Solving the problem does not require considering the input data as a graph.
- A graph has already been provided in the form of an adjacency list/matrix, and no new edges need to be added, so it would be more efficient to use either breadth-first or depth-first search.
Real-world problems#
Many problems in the real world share the union find pattern. Let’s look at some examples.
Image segmentation through region agglomeration: Divide a digital image into regions of similar colors. Initialize each pixel to be its own region and then merge the two adjacent regions that have the most similar color. Union find lets us know which region each pixel belongs to and updates that information when two regions are merged.
Image manipulation: Image editing applications often use union find to locate different objects within the image. This allows for several functionalities, such as to grab different objects in an image, select regions by color, differentiate between the background and the image etc.
Network connectivity: Connect devices to each other and find out if there is a path connecting one device to another.
Percolation: Identify the percolation threshold of a liquid through a filter. We randomize an x grid for the filter, which contains blocked and open sites. Union find is used to connect open sites together. If a connected component of open sites is present from the top of the grid to the bottom, the liquid can percolate.
Hex (game): Make connections between pieces of the same color to identify whether a player has won.
Strategy time!#
Match the problems that can be solved using the union find pattern.
Note: Select a problem in the left-hand column by clicking it, and then click one of the two options in the right-hand column.
Check if a graph is bipartite.
Union Find
Detect a cycle in an undirected graph.
Some other pattern
Find the longest subarray of 1’s after deleting one element.
Identify the missing elements in a given range.
Solution: Longest Substring without Repeating Characters
Number of Islands