The complexity of add/find (the collision check) will depend on the implementation of union.
If you are using some hashtable-based data structure, then your collision operation will indeed be constant, assuming a good hash function.
Otherwise, add will probably be O(log N) for a sorted list/tree data structure.
(Answer from Akusete on Stack Overflow)
First answer: If you are dealing with sets of numbers, you could implement a set as a sorted vector of distinct elements. Then you could implement union(S1, S2) simply as a merge operation (checking for duplicates), which takes O(n) time, where n = sum of cardinalities.
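The merge described above could look like the following sketch (the function name and signature are my own, not from the answer):

```python
def union_sorted(s1, s2):
    """Merge two sorted lists of distinct elements into one sorted list
    of distinct elements in O(n) time, where n = len(s1) + len(s2)."""
    out = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        if s1[i] < s2[j]:
            out.append(s1[i]); i += 1
        elif s1[i] > s2[j]:
            out.append(s2[j]); j += 1
        else:  # element present in both sets: keep one copy
            out.append(s1[i]); i += 1; j += 1
    out.extend(s1[i:])  # at most one of these two tails is non-empty
    out.extend(s2[j:])
    return out
```

Each element of either input is visited exactly once, which is where the O(n) bound comes from.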
Now, my first answer is a bit naive. And Akusete is right: You can, and you should, implement a set as a hash table (a set should be a generic container, and not all objects can be sorted!). Then, both search and insertion are O(1) and, as you guessed, the union takes O(n) time.
(Looking at your Python code) Python sets are implemented with hash tables. Read through this interesting thread. See also this implementation which uses sorted vectors instead.
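Since Python sets are hash tables, membership tests and insertions are average-case O(1) and a union of two sets is O(n), for example:

```python
a = {1, 2, 3}
b = {3, 4, 5}

# Union is O(len(a) + len(b)): each element is hashed into the result once.
print(a | b)   # {1, 2, 3, 4, 5}

# Membership test is average-case O(1) thanks to hashing.
print(2 in a)  # True
```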
Let's assume for the moment that each tree of height h contains at least 2^h nodes. What happens if you join two such trees?
If they are of different height, the height of the combined tree is the same as the height of the higher one, thus the new tree still has more than 2^h nodes (same height but more nodes).
Now if they are the same height, the resulting tree will increase its height by one, and will contain at least 2^h + 2^h = 2^(h+1) nodes. So the condition will still hold.
The most basic trees (1 node, height 0) also fulfill the condition. It follows, that all trees that can be constructed by joining two trees together fulfill it as well.
Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h, then n >= 2^h, which immediately gives steps <= h <= log2(n).
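The argument above can be sketched as a minimal union-by-size forest, without path compression (names and structure are my own illustration, not from the answer):

```python
# Minimal union-by-size disjoint-set sketch (no path compression).
parent = {}
size = {}

def make_set(x):
    parent[x] = x
    size[x] = 1

def find(x):
    # Walks up at most height(tree) <= log2(n) steps, by the argument above.
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if size[rx] < size[ry]:   # always attach the smaller tree under the larger
        rx, ry = ry, rx
    parent[ry] = rx
    size[rx] += size[ry]
```

Attaching the smaller tree under the larger is exactly what keeps every tree of height h at or above 2^h nodes.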
You can do n union-find operations (union by rank or size) with complexity O(n lg* n), where lg* n is the iterated logarithm, by also using the path compression optimization. (The even tighter bound O(n α(n)), where α is the inverse Ackermann function, holds as well.)
Note that O(n lg* n) is better than O(n log n).
In the question Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets? you can find details about this relation.
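Path compression itself is a small change to find: after locating the root, repoint every node on the search path directly at it. A sketch (iterative, on a plain parent dict; the shape of the code is my own):

```python
def find(parent, x):
    # First pass: locate the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass (path compression): point every node on the
    # path directly at the root, flattening the tree.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root
```

Subsequent finds on any node along that path then reach the root in one step.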
Does path compression eventually make operations O(1)? And if a find starts out at O(log n), is that the overall time complexity? If we have to loop over every edge, shouldn't that be accounted for?
Without path compression: when we use the linked-list representation of disjoint sets and the weighted-union heuristic, a sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations, takes O(m + n log n) time.
With only path compression: the running time is Θ(n + f · (1 + log_{2 + f/n} n)), where f is the number of FIND-SET operations and n is the number of MAKE-SET operations.
With both union by rank and path compression: O(m · α(n)), where α(n) is the inverse Ackermann function, which is at most 4 for any practical input size.
The time complexity of both union and find would be linear if you use neither ranks nor path compression, because in the worst case the tree degenerates into a chain and a query has to traverse all of it.
If you use only union by rank, without path compression, the complexity would be logarithmic.
The detailed analysis is quite involved, but the key point is that the depth of a tree only increases when the ranks of the two sets are equal, so the height stays bounded by log n and each query takes O(log n).
If you also use the path compression optimization, the complexity would be even lower, because it "flattens" the tree, thus reducing the traversal. The amortized time per operation then becomes nearly constant, O(α(n)), as you can read here.
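Putting both optimizations together gives the standard disjoint-set structure with near-constant amortized operations. A self-contained sketch (class and method names are my own):

```python
class DSU:
    """Disjoint-set union with union by rank and path compression;
    amortized O(alpha(n)) per operation, alpha = inverse Ackermann."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            # Path compression: hang x directly off the root.
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False          # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx       # union by rank: lower tree goes under
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1    # height grows only on equal ranks
        return True
```

The rank increment only on equal ranks is exactly the height argument from the proof earlier in this thread; path compression then makes repeated finds nearly free.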