The identity value is a value such that x op identity = x. This concept is not unique to Java Streams; see, for example, Wikipedia.
It lists some examples of identity elements, some of which can be expressed directly in Java code, e.g.
reduce("", String::concat)
reduce(true, (a,b) -> a&&b)
reduce(false, (a,b) -> a||b)
reduce(Collections.emptySet(), (a,b)->{ Set<X> s=new HashSet<>(a); s.addAll(b); return s; })
reduce(Double.POSITIVE_INFINITY, Math::min)
reduce(Double.NEGATIVE_INFINITY, Math::max)
It should be clear that the expression x + y == x can only hold for arbitrary x when y == 0, so 0 is the identity element for addition. Similarly, 1 is the identity element for multiplication.
More complex examples are:
Reducing a stream of predicates
reduce(x->true, Predicate::and)
reduce(x->false, Predicate::or)
Reducing a stream of functions
reduce(Function.identity(), Function::andThen)
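To make a few of the identities above concrete, here is a minimal sketch. The class name, method names and sample inputs are only illustrative assumptions; the reduce calls themselves are the ones listed above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class IdentityExamples {
    // "" is the identity for concatenation, so an empty list yields "".
    static String concat(List<String> parts) {
        return parts.stream().reduce("", String::concat);
    }

    // POSITIVE_INFINITY is the identity for min: it never beats a real element.
    static double min(List<Double> values) {
        return values.stream().reduce(Double.POSITIVE_INFINITY, Math::min);
    }

    // The always-true predicate is the identity for Predicate::and.
    static Predicate<Integer> allOf(List<Predicate<Integer>> tests) {
        return tests.stream().reduce(x -> true, Predicate::and);
    }

    // Function.identity() is the identity for Function::andThen.
    static Function<Integer, Integer> chain(List<Function<Integer, Integer>> steps) {
        return steps.stream().reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        System.out.println(concat(Arrays.asList("a", "b", "c")));    // abc
        System.out.println(min(Arrays.asList(3.0, 1.5, 2.0)));       // 1.5
        System.out.println(min(Collections.<Double>emptyList()));    // Infinity

        Predicate<Integer> positive = n -> n > 0;
        Predicate<Integer> even = n -> n % 2 == 0;
        System.out.println(allOf(Arrays.asList(positive, even)).test(4)); // true

        Function<Integer, Integer> addOne = n -> n + 1;
        Function<Integer, Integer> timesTwo = n -> n * 2;
        System.out.println(chain(Arrays.asList(addOne, timesTwo)).apply(5)); // 12
    }
}
```

In each case an empty input simply returns the identity, which is what you would expect from a neutral element.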
@holger's answer explains very well what the identity is for different functions, but it doesn't explain why we need an identity and why you get different results between parallel and sequential streams.
Your problem can be reduced to summing a list of elements, knowing how to sum two elements.
So let's take a list L = {12,32,10,18} and a summing function (a,b) -> a + b
As you learned at school, you will do:
(12,32) -> 12 + 32 -> 44
(44,10) -> 44 + 10 -> 54
(54,18) -> 54 + 18 -> 72
Now imagine our list becomes L = {12}. How do we sum this list? This is where the identity (x op identity = x) comes in.
(0,12) -> 12
So now you can understand why you get +1 in your sum if you use 1 instead of 0: you initialize with the wrong value.
(1,12) -> 1 + 12 -> 13
(13,32) -> 13 + 32 -> 45
(45,10) -> 45 + 10 -> 55
(55,18) -> 55 + 18 -> 73
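The sequential walk-through above can be reproduced with a small sketch. The class and method names are made up for illustration; the list and the accumulator are the ones used in this answer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SequentialSum {
    // Sequential reduce: the start value is folded in exactly once.
    static int sum(List<Integer> values, int start) {
        return values.stream().reduce(start, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        List<Integer> l = Arrays.asList(12, 32, 10, 18);
        System.out.println(sum(l, 0));                                // 72
        System.out.println(sum(l, 1));                                // 73, off by one
        System.out.println(sum(Collections.singletonList(12), 0));    // 12
        System.out.println(sum(Collections.<Integer>emptyList(), 0)); // 0
    }
}
```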
So now, how can we improve speed? By parallelizing.
What if we split our list, give the sublists to 4 different threads (assuming a 4-core CPU), and then combine the results? This gives us L1 = {12}, L2 = {32}, L3 = {10}, L4 = {18}
So with identity = 1:
- thread1: (1,12) -> 1+12 -> 13
- thread2: (1,32) -> 1+32 -> 33
- thread3: (1,10) -> 1+10 -> 11
- thread4: (1,18) -> 1+18 -> 19
and then combine: 13 + 33 + 11 + 19, which equals 76. This explains why the error is propagated 4 times.
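Since the real splitting of a parallel stream depends on the machine, the following sketch does not use parallel() at all. It only models the four-way split described above by hand, under the assumption that each chunk is reduced from the same start value and the partial results are then added together. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ChunkedSum {
    // Models a parallel reduce: every chunk starts from the same "identity",
    // then the partial results are combined by plain addition.
    static int sumInChunks(List<List<Integer>> chunks, int identity) {
        return chunks.stream()
                .mapToInt(chunk -> chunk.stream().reduce(identity, Integer::sum))
                .sum();
    }

    public static void main(String[] args) {
        List<List<Integer>> fourChunks = Arrays.asList(
                Arrays.asList(12), Arrays.asList(32), Arrays.asList(10), Arrays.asList(18));
        List<List<Integer>> oneChunk = Arrays.asList(Arrays.asList(12, 32, 10, 18));

        System.out.println(sumInChunks(fourChunks, 1)); // 76: the extra 1 is added four times
        System.out.println(sumInChunks(oneChunk, 1));   // 73: the extra 1 is added once
        System.out.println(sumInChunks(fourChunks, 0)); // 72: a true identity is harmless
    }
}
```

With a real identity the number of chunks does not matter, which is exactly the property a parallel reduction relies on.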
In this case, parallel can be less efficient.
But this result depends on your machine and input list. Java won't create 1000 threads for 1000 elements, so the error will propagate more slowly as the input grows.
Try running this code, which sums one thousand 1s; the result is quite close to 1000:
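import java.util.stream.IntStream;

public class StreamReduce {
    public static void main(String[] args) {
        // 1 is not an identity for addition, so each parallel chunk adds an extra 1.
        int sum = IntStream.range(0, 1000).map(i -> 1).parallel().reduce(1, (r, e) -> r + e);
        System.out.println("sum: " + sum);
    }
}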
Now you should understand why you get different results between parallel and sequential streams if you break the identity contract.
See the Oracle documentation for the proper way to write your sum.
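For comparison, here is a sketch of the same thousand-ones sum with a correct identity. The class and method names are illustrative; `IntStream.reduce(int, IntBinaryOperator)` and `IntStream.sum()` are standard library methods.

```java
import java.util.stream.IntStream;

class CorrectParallelSum {
    // 0 is the identity for addition, so the split points no longer matter.
    static int sumWithReduce(int count) {
        return IntStream.range(0, count).map(i -> 1).parallel().reduce(0, Integer::sum);
    }

    // The built-in sum() performs the same reduction.
    static int sumBuiltIn(int count) {
        return IntStream.range(0, count).map(i -> 1).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithReduce(1000)); // 1000
        System.out.println(sumBuiltIn(1000));    // 1000
    }
}
```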
Look in particular at the third method: here the accumulation (wrong word, really; this is a conclist more than a conslist, but I hope you follow the meaning here) has a different type than the elements of the stream.
For example, you have a stream of strings, and you reduce it with an operation that adds up the string lengths; the accumulation is done as an integer.
In this case, the obvious 'initial / default value' is 0, and you can't go with the 'first value from the stream'; that is a string and not an int.
Note how this matches up with the types: in the third signature, the type of the initial/default value is 'U', whereas the stream is a stream of T objects.
Even for the first signature, and even if the stream isn't empty, you very occasionally don't want to 'just' start with an arbitrary value from the stream, but with some known value. (Streams are not necessarily ordered, so 'first' doesn't make sense for various streams; 'some value' is a better way to think about it than 'the first value'.)
So, let's recap:
If you want to go with 'hey, an arbitrary element from the stream can serve as initial value just fine, BUT, I want to specify a specific default to avoid optional', then that's simple:
stream.reduce(accumulatorFunction).orElse(defaultValue)
will get you this.
If you want to go with 'I have a specific value that serves as initial value', then I'm having a real tough time imagining how that wouldn't also be the default value if there are zero elements in the stream, so you just use the second form.
If you use a different type for the accumulation value than the type of the stream's elements, you must use the third form and you must specify an initial value, because an arbitrary item from the stream cannot serve as one; it would have the wrong type. As with the previous case: if you specify an explicit initial value, that's pretty much always also the proper default value in case of an empty stream.
Thus they all make sense.
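The three forms from the recap can be put side by side in a short sketch. The class name, method names and sample words are assumptions made for illustration; the three `Stream.reduce` overloads are the standard ones.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceForms {
    // Form 1: no initial value, so the result is an Optional.
    static Optional<String> longest(List<String> words) {
        return words.stream().reduce((a, b) -> a.length() >= b.length() ? a : b);
    }

    // Form 2: the initial value doubles as the result for an empty stream.
    static String joined(List<String> words) {
        return words.stream().reduce("", String::concat);
    }

    // Form 3: the result type (Integer) differs from the element type (String),
    // so an initial value and a combiner are both required.
    static int totalLength(List<String> words) {
        return words.stream().reduce(0, (len, word) -> len + word.length(), Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bbb", "cc");
        List<String> none = Collections.emptyList();

        System.out.println(longest(words).orElse("none")); // bbb
        System.out.println(longest(none).orElse("none"));  // none
        System.out.println(joined(words));                 // abbbcc
        System.out.println(totalLength(words));            // 6
        System.out.println(totalLength(none));             // 0
    }
}
```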
The second method is used when you want/need an identity value to start the sequence.
The third method is used when the reduced value is of a different type than the values being accumulated. E.g. it could be used to add values to a collection-type object, or to aggregate more complex values such as an average, which needs both a sum and a count.
If you want a default value only, use the first one:
reduce(...).orElse(defaultValue)
As you can see, there is a method for a default value. If you changed the second method to have a default value but not an initial value, then you'd be out of luck if you actually needed an initial value. Those 3 methods give you more flexibility than what you're proposing.
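As one possible illustration of the average case mentioned above, here is a sketch using the third form. `SumCount` is a hypothetical immutable helper invented for this example, and returning 0.0 for an empty list is an arbitrary choice, not something the API prescribes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AverageByReduce {
    // Hypothetical immutable holder for a running sum and an element count.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Accumulator step: fold one element into the partial result.
        SumCount add(int value) {
            return new SumCount(sum + value, count + 1);
        }

        // Combiner step: merge two partial results.
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        // Assumption: an empty input averages to 0.0.
        double average() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    static double average(List<Integer> values) {
        // new SumCount(0, 0) is the identity: merging it changes nothing.
        return values.stream()
                .reduce(new SumCount(0, 0), SumCount::add, SumCount::merge)
                .average();
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(12, 32, 10, 18)));   // 18.0
        System.out.println(average(Collections.<Integer>emptyList())); // 0.0
    }
}
```

The holder is kept immutable on purpose: reduce expects the accumulator to return a new value rather than modify the identity object.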
I am trying to learn the reduce method of the Stream API. For the example below, I wanted to print the sum of the squares of the list.
List<Integer> numbers = Arrays.asList(2,2);
Integer squareSum= numbers.stream().reduce(0,(a,b)-> {//identity Accumulator, combiner
//identity = 0; initial value of reduction operation, 0 for empty stream
//Accumulator: function that takes two parameters
//1. partial result of the reduction operation =a
// 2.next element of the stream=b
// (a,b)-> a*a + b*b ;
//Combiner: a function used to combine the partial result of the reduction operation
int i = a*a + b*b ;//2*2+(2^2+2^2) (Please help me understand this line)
return i;
});
System.out.println(squareSum);//20
So, if I pass the list (3,3), why is it not 3*3+(3^2+3^2)? I have probably misunderstood what happened when I passed (2,2).
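A step-by-step trace may help here. In the accumulator, a is the running result, not a list element. For (2,2) the steps are (0,2) -> 0*0 + 2*2 -> 4 and then (4,2) -> 4*4 + 2*2 -> 20; for (3,3) they are (0,3) -> 9 and then (9,3) -> 9*9 + 3*3 -> 90. The sketch below reproduces this and shows one way to get a plain sum of squares; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class SquareSum {
    // The accumulator from the question: it also squares the running result a.
    static int questionVersion(List<Integer> numbers) {
        return numbers.stream().reduce(0, (a, b) -> a * a + b * b);
    }

    // Square each element first, then sum with the identity 0.
    static int sumOfSquares(List<Integer> numbers) {
        return numbers.stream().map(n -> n * n).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(questionVersion(Arrays.asList(2, 2))); // 20
        System.out.println(questionVersion(Arrays.asList(3, 3))); // 90
        System.out.println(sumOfSquares(Arrays.asList(2, 2)));    // 8
        System.out.println(sumOfSquares(Arrays.asList(3, 3)));    // 18
    }
}
```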