set union time complexity

stackoverflow.com › questions › 313422 › running-time-of-set-union-operation

The complexity of add/find(collision), would depend on the implementation of union.

If you are using some hashtable based datastructure then your collision operation will indeed be constant assuming a good hash function.

Otherwise, add will probably be O(Log(N)) for a sorted list/tree datastructure.

Answer from Akusete on Stack Overflow

CP-Algorithms

cp-algorithms.com › data_structures › disjoint_set_union.html

Disjoint Set Union - Algorithms for Competitive Programming

The simple implementation does ... to the representative. This simple modification of the operation already achieves the time complexity $O(\log n)$ per call on average (here without proof). There is a second modification, that will make it even faster. In this optimization ...

Stack Overflow

stackoverflow.com › questions › 313422 › running-time-of-set-union-operation

algorithm - Running time of set union operation - Stack Overflow

Top answer

1 of 4

The complexity of add/find(collision), would depend on the implementation of union.

If you are using some hashtable based datastructure then your collision operation will indeed be constant assuming a good hash function.

Otherwise, add will probably be O(Log(N)) for a sorted list/tree datastructure.

2 of 4

First answer: If you are dealing with sets of numbers, you could implement a set as a sorted vector of distinct elements. Then you could implement union(S1, S2) simply as a merge operation (checking for duplicates), which takes O(n) time, where n = sum of cardinalities.

Now, my first answer is a bit naive. And Akusete is right: You can, and you should, implement a set as a hash table (a set should be a generic container, and not all objects can be sorted!). Then, both search and insertion are O(1) and, as you guessed, the union takes O(n) time.

(Looking at your Python code) Python sets are implemented with hash tables. Read through this interesting thread. See also this implementation which uses sorted vectors instead.

Running time of set union operation

stackoverflow.com › questions › 313422 › running-time-of-set-union-operation

The complexity of add/find(collision), would depend on the implementation of union.

If you are using some hashtable based datastructure then your collision operation will indeed be constant assuming a good hash function.

Otherwise, add will probably be O(Log(N)) for a sorted list/tree datastructure.

Answer from Akusete on Stack Overflow

Discussions

algorithm - Why is the time complexity of performing n union find (union by size) operations O(n log n)? - Stack Overflow

Each time we follow a pointer, we are going to a subtree of size at most double the size of the previous subtree. Thus, we will follow at most O(log n) pointers for any find. I do not understand how for each union operation, Find operation is always O(log n). Can someone please explain how the worst case complexity ... More on stackoverflow.com

stackoverflow.com

data structures - Why time complexity of union-find is $O(lgN)$ with only "Union by Rank"? - Computer Science Stack Exchange

I saw time complexity of union and find function depends on some conditions. ... I understood $O(N)$ because of skewed tree of disjoint set. But, I cannot understand $O(\log N)$. I looked on Wikipedia and Stack Overflow, but I couldn't find it. More on cs.stackexchange.com

cs.stackexchange.com

August 20, 2018

time complexity - Union of multiple overlapping sets efficiently? - Computer Science Stack Exchange

I have $n$ sets, each of which overlaps heavily with the other sets, and I want the union of all of them. The obvious solution is to take the union of each set, one by one, which results in $O(n^2)$ More on cs.stackexchange.com

cs.stackexchange.com

December 12, 2021

c++ - What is the time complexity of this code Union Of two Arrays using set_union? - Stack Overflow

set_union assumes sorted arrays - which is not guaranteed in your example. with this assumption it's linear complexity O(N+M) where N and M are the sizes of the arrays. ... @StPiere It is actually guaranteed that these are sorted arrays as this is an unordered set, so it gets sorted automatically when inserted(i dont know how, if u can explain this too) . OK so time ... More on stackoverflow.com

stackoverflow.com

Wikipedia

en.wikipedia.org › wiki › Disjoint-set_data_structure

Disjoint-set data structure - Wikipedia

2 weeks ago - They do not perform a complexity analysis. Variants of disjoint-set data structures with better performance on a restricted class of problems have also been considered. Gabow and Tarjan showed that if the possible unions are restricted in certain ways, then a truly linear time algorithm is possible.

History Representation Operations Time complexity Other structures Backtracking Applications

Finxter

blog.finxter.com › home › learn python blog › python set union()

Python Set union() - Be on the Right Side of Change

May 1, 2021 - The runtime complexity of the set.union() method on a set with n elements and a set argument with m elements is O(n + m) because you need to create an empty set and insert all n elements, and then insert all m elements into the newly created set.

Princeton CS

cs.princeton.edu › ~wayne › kleinberg-tardos › pdf › UnionFind.pdf pdf

January 15, 2020 - Theorem. Using link-by-size, any UNION or FIND operation takes O(log n) time

Stack Overflow

stackoverflow.com › questions › 53149097 › why-is-the-time-complexity-of-performing-n-union-find-union-by-size-operations

algorithm - Why is the time complexity of performing n union find (union by size) operations O(n log n)? - Stack Overflow

Top answer

1 of 3

Let's assume for the moment, that each tree of height h contains at least 2^h nodes. What happens, if you join two such trees?

If they are of different height, the height of the combined tree is the same as the height of the higher one, thus the new tree still has more than 2^h nodes (same height but more nodes).

Now if they are the same height, the resulting tree will increase its height by one, and will contain at least 2^h + 2^h = 2^(h+1) nodes. So the condition will still hold.

The most basic trees (1 node, height 0) also fulfill the condition. It follows, that all trees that can be constructed by joining two trees together fulfill it as well.

Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h (n >= 2^h) this gives immediately log2(n) >= h >= steps.

2 of 3

You can do n union find (union by rank or size) operations with complexity O(n lg* n) where lg* n is the inverse Ackermann function using path compression optimization.

Note that O(n lg* n) is better than O(n log n)

In the question Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets? you can find details about this relation.

Stack Exchange

cs.stackexchange.com › questions › 96401 › why-time-complexity-of-union-find-is-olgn-with-only-union-by-rank

data structures - Why time complexity of union-find is $O(lgN)$ with only "Union by Rank"? - Computer Science Stack Exchange

Top answer

1 of 1

Firstly, here is the precise statement of the case 2 in OP's question.

A sequence of m MAKE-SET, UNION, and FIND operations, n of which are MAKE-SET operations, can be performed on a disjoint-set forest with union by rank in worst-case time $\text{[math]}$ .

In fact, it is trivial to see that the MAKE-SET and UNION operations run in $\text{[math]}$ time. So the question is really about the time-complexity of the FIND operation.

Here are the properties of the disjoint-set forest you can prove.

Given any node $\text{[math]}$ that has been added to the set of union-find data structure, the ranks of the nodes in a path from $\text{[math]}$ to its root is strictly increasing.
A node $\text{[math]}$ which is root of a sub-tree with rank $\text{[math]}$ has at least $\text{[math]}$ nodes. (This fact can be proved by induction on $\text{[math]}$ ).
Every node has rank at most $\text{[math]}$ .

With properties 1 and 3, it should be easy to prove the desired time complexity.

The above explanation follows roughly the section 21.4 in Introduction To Algorithms, third edition by CLRS. Readers are encouraged to read the chapter 21 of that book.

Find elsewhere

Google Bing Mojeek

Medium

medium.com › @yusufaksoyeng › optimized-disjoint-set-union-find-data-structure-12af9493f874

Disjoint-Set (Union-Find) Data Structure with Optimizations | by Yusuf Aksoy | Medium

May 30, 2022 - Even though time complexity is approximately constant, it is not exactly constant. It is O(m α(N)) where m is a number of Find or Union operations and α is Inverse Ackermann Function.

Medium

tarunjain07.medium.com › union-find-disjoint-set-union-dsu-notes-24f3e228858d

Union Find | Disjoint Set Union (DSU) — [Notes] | by Tarun Jain | Medium

February 15, 2023 - Assume that n is the number of make set operations and m is the total number of make set, union, and find operations combined[n-1 union operations]. The amortized asymptotic complexity works out to O(mα(n)) Where α is the inverse Ackerman function.

Stack Exchange

cs.stackexchange.com › questions › 146595 › union-of-multiple-overlapping-sets-efficiently

time complexity - Union of multiple overlapping sets efficiently? - Computer Science Stack Exchange

Top answer

1 of 1

The answer to your question depends heavily on the representation of the sets. For simplicity, I am assuming that we have $n$ sets of integers, where all the integers are between $0$ and $n-1$ (conforming to your statement that the $n$ sets each have up to $n$ elements).

If the sets are represented in a generic set data structure such as a hash table, as pointed out in the comments, that means even reading the input takes $\Omega(n^2)$ time in the worst case. This implies that it is completely impossible to avoid the $n^2$ algorithm of unioning all the sets together.
One common compact representation of sets for practical results is using bit-vectors. If the length of a machine integer is 8 bytes (64 bits), then we represent a set of up to $n$ elements using a bit-vector of length $n$ which is represented as an array of $\frac{n}{64}$ integers. As bitwise xor is an $O(1)$ hardware operation, unioning two of these sets only takes $\frac{n}{64}$ time, and doing this $n$ times we have $O(\frac{n^2}{64})$ for the whole union. Still quadratic, but much better in practice.
There are other interesting representations of sets. One representation is using logical predicates: a set here is a formula $\varphi(n)$ which is true exactly when $n$ is in the set. For example, the set of all integers between $1$ and $10000$ would be represented as the predicate "$1 < n < 10000$" (expressed as a syntax tree). Unioning this kind of predicate together can be much faster than unioning the sets, if the sets are of a simple structure, like ranges ("$1 < n < 10000$"), unions of a few different cases ("$n < 5 \text{ or } n = 10$"), or specific elements ("$n = 5$"). When unioning, we apply simplifications to the resulting predicates (e.g., "$x \text{ or } x$" simplifies to "$x$").
Finally, extending this, one way of representing predicates that happens to be especially effective is using a binary decision diagram, which is kind of like a decision tree. These structures lead to a pretty compact encoding of common sets like ranges and predicates. They can be unioned together efficiently in $O(m)$ where $m$ is the size of the BDD structure, so the total time will be $O(m \cdot n)$. For both this and the formula representation, there is a necessary assumption that the sets we are unioning together are "not too complex" and have a more compact representation than just listing out the elements. But, even in the worst case, it only takes $O(n \log n)$ space to store a set of $n$ elements explicitly as a BDD, so we aren't paying much of a cost, and in practice, the sets will often overlap significantly and we will get a more compact representation.

Stack Overflow

stackoverflow.com › questions › 64926723 › what-is-the-time-complexity-of-this-code-union-of-two-arrays-using-set-union

c++ - What is the time complexity of this code Union Of two Arrays using set_union? - Stack Overflow

Top answer

1 of 1

In your code, you are using std::set's. In the C++ standard library, unfortunately, std::set's are ordered (and we have std::unordered_set). Thus, most of the "hard work" in your code is actually converting the vectors into ordered sets; that takes O(n log(n) + m log(m)) time. The union is - almost certainly - linear, as @StPiere suggests, so an additional O(n+m) time.

Betterdatascience

betterdatascience.com › python-set-union

Python Set union() - A Complete Guide in 5 Minutes | Better Data Science

January 24, 2022 - Inserting an element into a set has a complexity of O(1). Here’s the Python code that calculates and displays a figure of set size on the X-axis and runtime in seconds on the Y-axis.

GeeksforGeeks

geeksforgeeks.org › dsa › union-by-rank-and-path-compression-in-union-find-algorithm

Union By Rank and Path Compression in Union-Find Algorithm - GeeksforGeeks

10:17

It turns out, that the final amortized time complexity is O(α(n)), where α(n) is the inverse Ackermann function, which grows very steadily (it does not even exceed for n<10600 approximately).

Published January 17, 2026

Codeforces

codeforces.com › blog › entry › 117038

Sets union in O(N) - Codeforces

June 5, 2023 - The ordinary solution using ... x2 ∩ x3| Obviously the time complexity to compute union of N sets by this approach is O(2 ^ N)....

Linux Kernel

docs.kernel.org › core-api › union_find.html

Union-Find in Linux — The Linux Kernel documentation

June 21, 2024 - These optimizations reduce the average time complexity of each find and union operation to O(α(n)), where α(n) is the inverse Ackermann function.

Stack Overflow

stackoverflow.com › questions › 34460492 › time-complexity-of-array-based-disjoint-set-data-structure

algorithm - Time Complexity of Array based Disjoint-Set data structure - Stack Overflow

Top answer

1 of 2

Without path compression : When we use linked list representation of disjoint sets and the weighted-union heuristic, a sequence of m MAKE-SET, UNION by rank , FIND-SET operations takes place where n of which are MAKE-SET operations. So , it takes O(m+ nlogn).

With only path compression : The running time is theta( n + f * ( 1 + (log (base( 2 + f/n)) n ) ) ) where f is no of find sets operations and n is no of make set operations

With both union by rank and path compression: O( m*p(n )) where p(n) is less than equal to 4

2 of 2

The time complexity of both union and find would be linear if you use neither ranks nor path compression, because in the worst case, it would be necessary to iterate through the entire tree in every query.

If you use only union by ranks, without path compression, the complexity would be logarithmic.
The detailed solution is quite difficult to understand, but basically you wouldn't traverse the entire tree, because the depth of the tree would only increase if the ranks of the two sets are equal. So the iteration would be O(log*n) per query.

If you use the path compression optimization, the complexity would be even lower, because it "flattens" the tree, thus reducing the traversal. Its amortized time per operation is even faster than O(n), as you can read here.

reddit.com › r/learnprogramming › confused about time and space complexity of union find by rank with path compression

r/learnprogramming on Reddit: Confused about time and space complexity of union find by rank with path compression

September 1, 2022 -

Does path compression eventually make operations O(1)? But if it starts at log(n), is that the overall time complexity? If we have to loop over every edge, shouldn't that be accounted for?

Top answer

1 of 2

Technically, the Union Find algorithm has an amortized runtime of α(n) where α is the inverse ackermann function. This means that if you invoke the function N times for large values of N, the total time it takes is N * α(n). Obviously, if you stop doing union operations at some point and then just do nothing but find operations, then the find operations will eventually take constant time. But a sequence of operations involving both union and find operations means that the amortized runtime will never reach O(1). This is only a distinction that academics will care about since the inverse ackermann function never grows above a value of 5 even for incredibly large values of N. So it is not technically constant in theory but may as well be in practice.

2 of 2

Good question. Yes, the first call is potentially O(N), but since we're modifying the data structure so that the answer will be known immediately on the next call, the second call and on will be O(1). Explaining the complexity when it changes over multiple calls can be somewhat complex. To answer that question, we need to be careful about exactly what we're describing. If we assume that we have a pre-existing data set and we're just calling find() on the same thing over and over again, then yeah, as we keep calling it, the complexity averages out to O(1) as we call it infinite times. But that may not be super realistic if we have to first measure the cost of building the data structure and then also measure the cost of calling Find(). A common way of measuring a sequence of calls is called "amortized time complexity." We can use that to describe a constant time MakeSet method that makes the next call to Find() more expensive each time it's called, but it's somewhat more mathy and confusing than measuring basic time complexity.

Stack Exchange

cs.stackexchange.com › questions › 48649 › complexity-of-union-find-with-path-compression-without-rank

Complexity of union-find with path-compression, without rank - Computer Science Stack Exchange

Top answer

1 of 2

Seidel and Sharir proved in 2005 [1] that using path compression with arbitrary linking roughly on $m$ operations has a complexity of roughly $O((m+n)\log(n))$.

See [1], Section 3 (Arbitrary Linking): Let $f(m,n)$ denote the runtime of union-find with $m$ operations and $n$ elements. They proved the following:

Claim 3.1. For any integer $k>1$ we have $f(m, n)\leq (m+(k−1)n)\lceil \log_k(n) \rceil$.

According to [1], setting $k = \lceil m/n \rceil + 1$ gives $$f(m, n)\leq (2m+n) \log_{\lceil m/n\rceil +1}n$$.

A similar bound was given using a more complex method by Tarjan and van Leeuwen in [2], Section 3:

Lemma 7 of [2]. Suppose $m \geq n$. In any sequence of set operations implemented using any form of compaction and naive linking, the total number of nodes on find paths is at most $(4m + n) \lceil \log_{\lfloor 1 + m/n \rfloor}n \rceil$ With halving and naive linking, the total number of nodes on find paths is at most $ (8m+2n)\lceil \log_{\lfloor 1 + m/n \rfloor} (n) \rceil $.

Lemma 9 of [2]. Suppose $m < n$. In any sequence of set operations implemented using compression and naive linking, the total number of nodes on find paths is at most $ n + 2m \lceil \log n\rceil + m$.

[1]: R. Seidel and M. Sharir. Top-Down Analysis of Path Compression. Siam J. Computing, 2005, Vol. 34, No. 3, pp. 515-525.

[2]: R. Tarjan and J. van Leeuwen. Worst-case Analysis of Set Union Algorithms. J. ACM, Vol. 31, No. 2, April 1984, pp. 245-281.

2 of 2

I don't know what the amortized running time is, but I can cite one possible reason why in some situations you might want to use both rather than just path compression: the worst-case running time per operation is $\Theta(n)$ if you use just path compression, which is much larger than if you use both union by rank and path compression.

Consider a sequence of $n$ Union operations maliciously chosen to yield a tree of depth $n-1$ (it is just a sequential path of nodes, where each node is the child of the previous node). Then performing a single Find operation on the deepest node takes $\Theta(n)$ time. Thus, the worst-case running time per operation is $\Theta(n)$.

In contrast, with the union-by-rank optimization, the worst-case running time per operation is $O(\log n)$: no single operation can ever take longer than $O(\log n)$. For many applications, this won't matter: only the total running time of all operations (i.e., the amortized running time) will matter, not the worst-case time for a single operation. However, in some cases the worst-case time per operation might matter: for instance, reducing the worst-case time per operation to $O(\log n)$ might be useful in an interactive application where you want to make sure that no single operation can cause a long delay (e.g., you want a guarantee that no single operation can cause the application to freeze for a long time) or in a real-time application where you want to ensure that you will always meet the real-time guarantees.

Python-bloggers

python-bloggers.com › 2022 › 01 › python-set-union-a-complete-guide-in-5-minutes

Python Set union() – A Complete Guide in 5 Minutes | Python-bloggers

January 24, 2022 - Inserting an element into a set has a complexity of O(1). Here's the Python code that calculates and displays a figure of set size on the X-axis and runtime in seconds on the Y-axis.

University of Washington

courses.cs.washington.edu › courses › cse332 › 16sp › lectures › Lecture25 › 25_ho.pdf pdf

1 CSE 332: Data Abstractions Union/Find II Richard Anderson Spring 2016

Roots are the names of each set. Observation: we will only traverse these trees upward ... Find(x) follow x to the root and return the root. ... Union(i, j) - assuming i and j roots, point j to i. ... Find(1) n steps!! ... Can we do better? Yes! ... Find takes amortized time of almost Θ(1). ... Negative up-values correspond to sizes of roots. ... Amortized complexity is then Q(a(m, n)) .