Disjoint-Set Forest
Introduction
A Disjoint-Set Forest (Union-Find Forest) is a data structure that keeps track of a partition of a set into disjoint subsets. It supports three main operations:
- Add: Add a new tree to the forest.
- Union: Merge two trees into one.
- Find: Determine which tree a particular element belongs to.
This data structure is particularly useful in applications such as network connectivity, clustering, and Kruskal’s algorithm for finding the Minimum Spanning Tree (MST).
Explanation
The Disjoint-Set Forest is implemented using a collection of trees, where each tree represents a disjoint set.
Let’s denote the union operation as union(x, y)
which merges the sets containing elements x
and y
,
and the find operation as find(x)
which returns the representative (or root) of the set containing x
.
And also let’s denote the node as the form value: parent
, where value
is the element and parent
is the parent node in the tree.
Now we will seek the most efficient way to implement these operations. Consider the forest below:
graph TD
A((1:1)):::root;B((2:2)):::root;C((3:3)):::root;
D((4:4)):::root;E((5:5)):::root;F((6:6)):::root;G((7:7)):::root;
classDef root fill:#aaa,stroke:#444,stroke-width:2px;
Root nodes of each tree are indicated with a darker color.
After performing union(1, 2)
, union(3, 4)
and union(5, 6)
, the forest becomes:
graph TD
A((1:1)):::root;B((2:1));C((3:3)):::root;
D((4:3));E((5:5)):::root;F((6:5));G((7:7)):::root;
A---B; C---D; E---F;
classDef root fill:#aaa,stroke:#444,stroke-width:2px;
After performing union(1, 4)
, the forest becomes:
graph TD
A((1:1)):::root;B((2:1));C((3:1));
D((4:3));E((5:5)):::root;F((6:5));G((7:7)):::root;
A---B; A---C---D; E---F;
classDef root fill:#aaa,stroke:#444,stroke-width:2px;
After performing union(4, 5)
, the forest becomes:
graph TD
A((1:1)):::root;B((2:1));C((3:1));
D((4:3));E((5:3));F((6:5));G((7:7)):::root;
A---B; A---C---D; E---F; C---E;
classDef root fill:#aaa,stroke:#444,stroke-width:2px;
After performing find(6)
, we can see that 6
belongs to the tree with root 1
,
so we can update the parent of nodes on the path from 6
to 1
to point directly to 1
.
graph TD
A((1:1)):::root;B((2:1));C((3:1));
D((4:3));E((5:1));F((6:1));G((7:7)):::root;
A---B & E & F; A---C---D;
classDef root fill:#aaa,stroke:#444,stroke-width:2px;
This process is known as path compression and helps to flatten the structure of the tree, making future find
operations faster.
Complexity
The time complexity of the find
operation is nearly constant, specifically $O(α(n))$, where α is the inverse Ackermann function.
The union
operation also has a time complexity of $O(α(n))$ when using path compression and union by size or rank,
since it involves two find
operations followed by a constant-time merge.
Code
Let’s see the sample code.
const int MAX;
int par[MAX];
void init(){
for(int i=0; i<MAX; i++) par[i]=i;
}
int Find(int x){
if(par[x]==x) return x;
return par[x]=Find(par[x]);
}
void Union(int u,int v){
u=Find(u), v=Find(v);
par[v]=u;
}
We can optimize the Union
operation further by using union by size or union by rank.
This means that when we merge two trees, we always attach the smaller tree under the root of the larger tree.
- Union by size
int size[MAX];
void init(){
for(int i=0; i<MAX; i++) par[i]=i, size[i]=1;
}
void Union(int u,int v){
u=Find(u), v=Find(v);
if(u==v) return;
if(size[u]<size[v]) swap(u,v);
par[v]=u;
size[u]+=size[v];
}
- Union by rank
int rank[MAX];
void init(){
for(int i=0; i<MAX; i++) par[i]=i, rank[i]=0;
}
void Union(int u,int v){
u=Find(u), v=Find(v);
if(u==v) return;
if(rank[u]<rank[v]) swap(u,v);
par[v]=u;
if(rank[u]==rank[v]) rank[u]++;
}
Applications
Disjoint-Set Forests are widely used in various applications, including:
- Network Connectivity: To determine if two nodes are in the same connected component.
- Kruskal’s Algorithm: To find the Minimum Spanning Tree of a graph.
- Image Processing: To segment images into connected components.