Interesting that the interview question asks about the advantages without asking about the disadvantages, for there are both.
Streams are a more declarative style. Or a more expressive style. It may be considered better to declare your intent in code, than to describe how it's done:
return people.stream()
        .filter(p -> p.age() < 19)
        .collect(toList());
... says quite clearly that you're filtering matching elements from a list, whereas:
List<Person> filtered = new ArrayList<>();
for (Person p : people) {
    if (p.age() < 19) {
        filtered.add(p);
    }
}
return filtered;
Says "I'm doing a loop". The purpose of the loop is buried deeper in the logic.
Streams are often terser. The same example shows this. Terser isn't always better, but if you can be terse and expressive at the same time, so much the better.
Streams have a strong affinity with functions. Java 8 introduced lambdas and functional interfaces, which opened up a whole toybox of powerful techniques. Streams provide the most convenient and natural way to apply functions to sequences of objects.
Streams encourage less mutability. This is sort of related to the functional programming aspect -- the kind of programs you write using streams tend to be the kind of programs where you don't modify objects.
Streams encourage looser coupling. Your stream-handling code doesn't need to know the source of the stream, or its eventual terminating method.
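To make the coupling point concrete, here is a minimal sketch: a hypothetical `shout` method that accepts any `Stream<String>`, so it works unchanged whether the elements come from a `List` or from `Stream.of`, and whether the caller collects to a list or joins to a string. The class and method names are invented for illustration, and `List.of` assumes Java 9 or later.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class: shout() sees only a Stream<String>; it does not
// know where the elements come from or which terminal operation will run.
final class Coupling {
    static Stream<String> shout(Stream<String> words) {
        return words.filter(w -> !w.isEmpty())   // drop empty strings
                    .map(String::toUpperCase);   // transform the rest
    }

    public static void main(String[] args) {
        // Source: a List; terminal operation: collect to a List
        List<String> fromList = shout(List.of("a", "", "b").stream())
                .collect(Collectors.toList());
        // Source: Stream.of; terminal operation: joining into a String
        String joined = shout(Stream.of("x", "y"))
                .collect(Collectors.joining(","));
        System.out.println(fromList + " " + joined); // [A, B] X,Y
    }
}
```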
Streams can succinctly express quite sophisticated behaviour. For example:
stream.filter(myfilter).findFirst();
Might look at first glance as if it filters the whole stream, then returns the first element. But in fact findFirst() drives the whole operation, so it efficiently stops after finding one item.
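The short-circuiting of findFirst() can be observed directly. In this sketch (hypothetical class name), a counter inside the predicate records how many elements filter actually examines before the first match. The side effect in the predicate is there only to make evaluation visible; it is not something you would normally put in a filter.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class demonstrating lazy, short-circuiting evaluation.
final class ShortCircuit {
    // Returns how many elements the filter predicate was called on.
    static int examinedBeforeFirstMatch(List<Integer> values, int threshold) {
        AtomicInteger examined = new AtomicInteger();
        Optional<Integer> first = values.stream()
                .filter(v -> {
                    examined.incrementAndGet(); // demo only: count predicate calls
                    return v > threshold;
                })
                .findFirst();
        System.out.println("first match: " + first.orElse(null));
        return examined.get();
    }

    public static void main(String[] args) {
        // Only 1, 2 and 3 are examined; 4..10 are never touched.
        System.out.println(examinedBeforeFirstMatch(
                List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 2)); // 3
    }
}
```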
Streams provide scope for future efficiency gains. Some people have benchmarked and found that single-threaded streams from in-memory Lists or arrays can be slower than the equivalent loop. This is plausible because there are more objects and overheads in play.
But streams scale. As well as Java's built-in support for parallel stream operations, there are a few libraries for distributed map-reduce using Streams as the API, because the model fits.
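As a minimal illustration of the built-in parallel support, the same pipeline can be run sequentially or in parallel by switching a single call. The class and method names are hypothetical, and whether the parallel version is actually faster depends on data size and hardware; the result is the same either way.

```java
import java.util.stream.LongStream;

// Hypothetical example class: one pipeline, sequential or parallel.
final class ParallelSum {
    static long sumOfSquares(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel(); // split the work across the common ForkJoinPool
        }
        return s.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, false)); // 333833500
        System.out.println(sumOfSquares(1_000, true));  // 333833500
    }
}
```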
Disadvantages?
Performance: A for loop through an array is extremely lightweight, both in terms of heap and CPU usage. If raw speed and memory thriftiness are a priority, using a stream is worse.
Familiarity. The world is full of experienced procedural programmers, from many language backgrounds, for whom loops are familiar and streams are novel. In some environments, you want to write code that's familiar to that kind of person.
Cognitive overhead. Because of its declarative nature, and its increased abstraction from what's happening underneath, you may need to build a new mental model of how code relates to execution. Actually, you only need to do this when things go wrong, or if you need to analyse performance or subtle bugs in depth. When it "just works", it just works.
Debuggers are improving, but even now, when you're stepping through stream code in a debugger, it can be harder work than the equivalent loop, because a simple loop is very close to the variables and code locations that a traditional debugger works with.
Answer from slim on Stack Overflow
Syntactic fun aside, Streams are designed to work with potentially infinitely large data sets, whereas arrays, Collections, and nearly every Java SE class which implements Iterable are entirely in memory.
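A small sketch of a conceptually infinite stream: Stream.iterate produces an unbounded sequence, and because streams are lazy, limit() ensures only the requested elements are ever computed. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class: an unbounded source made finite by limit().
final class InfiniteStream {
    static List<Long> firstPowersOfTwo(int count) {
        return Stream.iterate(1L, x -> x * 2) // 1, 2, 4, 8, ... without end
                .limit(count)                 // only `count` elements are pulled
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // [1, 2, 4, 8, 16]
    }
}
```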
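A disadvantage of a Stream is that the lambdas passed to filter, map, etc. cannot throw checked exceptions, because the functional interfaces they implement (Predicate, Function, ...) do not declare any. This makes a Stream a poor choice for, say, intermediate I/O operations.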
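A common workaround, sketched here with a hypothetical lookup method that throws IOException, is to catch the checked exception inside the lambda and rethrow it wrapped in UncheckedIOException. Writing the step as a plain method reference would not compile, because Function.apply declares no checked exceptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; lookup() stands in for an I/O call.
final class CheckedInLambda {
    static int lookup(String key) throws IOException {
        if (key.isEmpty()) {
            throw new IOException("empty key");
        }
        return key.length();
    }

    static List<Integer> lookupAll(List<String> keys) {
        return keys.stream()
                // .map(CheckedInLambda::lookup) would not compile here
                .map(k -> {
                    try {
                        return lookup(k);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e); // rethrow as unchecked
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lookupAll(List.of("a", "bcd"))); // [1, 3]
        try {
            lookupAll(List.of("ok", ""));
        } catch (UncheckedIOException e) {
            System.out.println("failed: " + e.getCause().getMessage()); // failed: empty key
        }
    }
}
```

The caller can still recover the original IOException through getCause(), at the price of losing compile-time checking.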
Alternative 2 looks best to me in this case
1. Readability
- Alternative 2 has fewer lines.
- Alternative 2 reads closer to "return a list containing the numbers divisible by 6 from numList", while the forEach approach means "add the numbers divisible by 6 from numList to secondInts".
- filter(i -> i % 6 == 0) is straightforward, while if(i % 6 != 0) { return; } requires some time for the human brain to process.
2. Performance
From Stream.toList()
Implementation Note: Most instances of Stream will override this method and provide an implementation that is highly optimized compared to the implementation in this interface.
We benefit from JDK optimizations by using the Stream API.
In this case, using forEach and adding elements one by one will be slower, especially when the list is large, because ArrayList needs to grow its capacity whenever the list is full, while the stream implementation (ImmutableCollections.listFromTrustedArrayNullsAllowed) just stores the result array in a ListN.
One more point to note about parallelism:
From Stream#forEach
The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism. For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization.
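numbersList.stream().parallel()
        .forEach(i -> {
            if (i % 6 != 0) {
                return;
            }
            secondInts.add(i); // called from several threads at once
        });
will produce unexpected results, since secondInts (presumably a plain ArrayList) is modified from several threads without synchronization, while
List<Integer> third = numbersList.stream().parallel()
        .filter(i -> i % 6 == 0)
        .sorted()
        .toList();
is totally fine.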
3. Flexibility
Imagine you want the filtered list to be sorted. In the forEach approach, you could do it like this:
numbersList.stream().sorted()
        .forEach(i -> {
            if (i % 6 != 0) {
                return;
            }
            secondInts.add(i);
        });
Which is much slower compared to
numbersList.stream()
.filter(i -> i % 6 == 0)
.sorted()
.toList();
As we need to sort the whole numbersList instead of filtered.
Or, if you want to limit your result to 10 elements, that is not straightforward with forEach, but it is as simple as adding limit(10) when using a stream.
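The filter-then-sort-then-limit pipeline described above could look like this; the sample data and names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class: filter first, then sort only the matches.
final class FilterSortLimit {
    static List<Integer> firstDivisibleBySix(List<Integer> nums, int max) {
        return nums.stream()
                .filter(i -> i % 6 == 0) // shrink the data first...
                .sorted()                // ...so only the matches get sorted
                .limit(max)              // then keep at most `max` results
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(60, 7, 12, 36, 5, 6, 18, 30);
        System.out.println(firstDivisibleBySix(nums, 3)); // [6, 12, 18]
    }
}
```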
4. Less error prone
Stream.toList() (added in Java 16) returns an unmodifiable list.
From Stream.toList()
Implementation Requirements: The implementation in this interface returns a List produced as if by the following: Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray())))
Meaning that the returned list is unmodifiable. Some advantages of immutability are:
- You can safely pass the list around to different methods without worrying that it will be modified.
- Immutable lists can be shared between threads safely.
Read "Pros/Cons of Immutability vs. Mutability" for further discussion.
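A quick check of the unmodifiable result, assuming Java 16+ (where Stream.toList() was added): calling add on the returned list throws UnsupportedOperationException. The class and helper names are hypothetical.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical example class; requires Java 16+ for Stream.toList().
final class UnmodifiableResult {
    // Returns true if the list rejects modification.
    static boolean rejectsAdd(List<Integer> list) {
        try {
            list.add(42);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> result = Stream.of(1, 2, 3).toList();
        System.out.println(rejectsAdd(result)); // true
    }
}
```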
I don’t think it’s an issue of advantages. Each mechanism has a specific purpose.
.forEach() returns void so it doesn’t have an output. The intent is that the elements that the forEach iterates through are not modified. The data in the elements are used for some sort of calculation. I find that forEach is used much less than map. It’s a terminal point in a pipeline.
.filter() takes a stream as input and emits a filtered stream as output. It is for filtering.
.map() is like forEach, but it emits a stream of transformed objects. It applies the same transformation to each element so that the results can be saved, filtered or manipulated further.
.toList() is a handy shortcut to turn a stream into a list. Using forEach(list::add) where toList() would do the work is a terrible idea: you're preventing Java from doing the work in bulk.
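A sketch of those roles with made-up data: filter keeps elements, map transforms them, and a terminal collector gathers the results. The forEach(list::add) variant is shown only for contrast; it relies on a side effect and would be unsafe on a parallel stream. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class illustrating filter / map / terminal operation.
final class PipelineRoles {
    static List<Integer> squaresOfOdds(List<Integer> nums) {
        return nums.stream()
                .filter(n -> n % 2 != 0)       // keeps some elements
                .map(n -> n * n)               // transforms each kept element
                .collect(Collectors.toList()); // terminal: gathers the results
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4, 5);
        System.out.println(squaresOfOdds(nums)); // [1, 9, 25]

        // Anti-pattern for contrast: the result is built by side effect.
        List<Integer> viaForEach = new ArrayList<>();
        nums.stream().filter(n -> n % 2 != 0).map(n -> n * n).forEach(viaForEach::add);
        System.out.println(viaForEach);
    }
}
```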
For simple cases such as the one illustrated, they are mostly the same. However, there are a number of subtle differences that might be significant.
One issue is ordering. With Stream.forEach, the order is undefined. Out-of-order execution is unlikely with sequential streams; still, it's within the specification for Stream.forEach to execute in some arbitrary order, and this does occur frequently with parallel streams. By contrast, Iterable.forEach is always executed in the iteration order of the Iterable, if one is specified.
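The ordering difference can be sketched as follows (hypothetical class; synchronized lists are used so the parallel variants are at least thread-safe). Iterable.forEach and Stream.forEachOrdered preserve encounter order, while parallel Stream.forEach may not, so its result is printed but no particular order is claimed for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class comparing iteration-order guarantees.
final class EncounterOrder {
    static List<Integer> source(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    // Iterable.forEach: runs in the list's iteration order.
    static List<Integer> viaIterableForEach(List<Integer> src) {
        List<Integer> out = new ArrayList<>();
        src.forEach(out::add);
        return out;
    }

    // Stream.forEachOrdered: respects encounter order even when parallel.
    static List<Integer> viaForEachOrdered(List<Integer> src) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        src.parallelStream().forEachOrdered(out::add);
        return out;
    }

    // Stream.forEach on a parallel stream: order is unspecified.
    static List<Integer> viaParallelForEach(List<Integer> src) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        src.parallelStream().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> src = source(1_000);
        System.out.println(viaIterableForEach(src).equals(src)); // true
        System.out.println(viaForEachOrdered(src).equals(src));  // true
        // May be true or false; the specification makes no promise either way.
        System.out.println(viaParallelForEach(src).equals(src));
    }
}
```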
Another issue is with side effects. The action specified in Stream.forEach is required to be non-interfering. (See the java.util.stream package doc.) Iterable.forEach potentially has fewer restrictions. For the collections in java.util, Iterable.forEach will generally use that collection's Iterator, most of which are designed to be fail-fast and which will throw ConcurrentModificationException if the collection is structurally modified during the iteration. However, modifications that aren't structural are allowed during iteration. For example, the ArrayList class documentation says "merely setting the value of an element is not a structural modification." Thus, the action for ArrayList.forEach is allowed to set values in the underlying ArrayList without problems.
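To illustrate that distinction for ArrayList.forEach: setting elements (a non-structural change) is allowed, while adding (a structural change) trips the fail-fast check and throws ConcurrentModificationException. The one-element array is just a common workaround for mutating a counter from inside a lambda; class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical example class: structural vs non-structural changes in forEach.
final class NonStructuralUpdate {
    // Setting an element is not a structural modification, so this is allowed.
    static List<Integer> timesTenInPlace(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        int[] index = {0}; // mutable counter captured by the lambda
        list.forEach(x -> list.set(index[0]++, x * 10));
        return list;
    }

    // Adding is structural, so ArrayList's fail-fast check throws.
    static boolean addDuringForEachFails(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        try {
            list.forEach(list::add);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(timesTenInPlace(List.of(1, 2, 3)));       // [10, 20, 30]
        System.out.println(addDuringForEachFails(List.of(1, 2, 3))); // true
    }
}
```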
The concurrent collections are yet again different. Instead of fail-fast, they are designed to be weakly consistent. The full definition is in the java.util.concurrent package documentation. Briefly, though, consider ConcurrentLinkedDeque. The action passed to its forEach method is allowed to modify the underlying deque, even structurally, and ConcurrentModificationException is never thrown. However, the modification that occurs might or might not be visible in this iteration. (Hence the "weak" consistency.)
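A single-threaded sketch of weak consistency with ConcurrentLinkedDeque: the action structurally modifies the deque and no exception is thrown. As an assumption of this sketch, elements are prepended at the head, behind the forward traversal, so they are not visited in this run; appending at the tail instead could keep the traversal chasing new elements. Names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical example class: structural modification during forEach.
final class WeaklyConsistent {
    static ConcurrentLinkedDeque<Integer> prependDuringForEach(List<Integer> input) {
        ConcurrentLinkedDeque<Integer> deque = new ConcurrentLinkedDeque<>(input);
        // No ConcurrentModificationException: the iteration is weakly consistent.
        // Prepending keeps new elements behind the traversal, which moves tailward.
        deque.forEach(deque::addFirst);
        return deque;
    }

    public static void main(String[] args) {
        ConcurrentLinkedDeque<Integer> d = prependDuringForEach(List.of(1, 2, 3));
        System.out.println(d.size()); // 6
    }
}
```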
Still another difference is visible if Iterable.forEach is iterating over a synchronized collection. On such a collection, Iterable.forEach takes the collection's lock once and holds it across all the calls to the action method. The Stream.forEach call uses the collection's spliterator, which does not lock, and which relies on the prevailing rule of non-interference. The collection backing the stream could be modified during iteration, and if it is, a ConcurrentModificationException or inconsistent behavior could result.
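For synchronized wrappers, the Collections.synchronizedList documentation requires callers to synchronize manually on the list when traversing it via an Iterator, Spliterator or Stream, whereas the wrapper's own forEach takes the lock itself. A minimal sketch, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class: locking around stream traversal.
final class SynchronizedTraversal {
    static int sumUnderLock(List<Integer> syncList) {
        // The stream's spliterator does not lock, so hold the list's lock here.
        synchronized (syncList) {
            return syncList.stream().mapToInt(Integer::intValue).sum();
        }
    }

    public static void main(String[] args) {
        List<Integer> syncList =
                Collections.synchronizedList(new ArrayList<>(List.of(1, 2, 3, 4)));
        int[] total = {0};
        // Iterable.forEach on the wrapper holds the lock for the whole traversal.
        syncList.forEach(x -> total[0] += x);
        System.out.println(total[0] + " " + sumUnderLock(syncList)); // 10 10
    }
}
```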
This answer concerns itself with the performance of the various loop implementations. It's only marginally relevant, and only for loops that are called VERY OFTEN (millions of calls). In most cases, the body of the loop will be by far the most expensive part. For situations where you loop really often, this might still be of interest.
You should repeat these tests on the target system, as the results are implementation-specific (full source code).
I ran OpenJDK version 1.8.0_111 on a fast Linux machine.
I wrote a test that loops 10^6 times over a List using this code, with varying sizes for integers (10^0 -> 10^5 entries).
The results are below; the fastest method varies depending on the number of entries in the list.
But even in the worst situations, looping over 10^5 entries 10^6 times took 100 seconds for the worst performer, so other considerations are more important in virtually all situations.
public int outside = 0;

private void iteratorForEach(List<Integer> integers) {
    integers.forEach((ii) -> {
        outside = ii * ii;
    });
}

private void forEach(List<Integer> integers) {
    for (Integer next : integers) {
        outside = next * next;
    }
}

private void forCounter(List<Integer> integers) {
    for (int ii = 0; ii < integers.size(); ii++) {
        Integer next = integers.get(ii);
        outside = next * next;
    }
}

private void iteratorStream(List<Integer> integers) {
    integers.stream().forEach((ii) -> {
        outside = ii * ii;
    });
}
Here are my timings in milliseconds, per function (rows) and number of entries in the list (columns). Each run is 10^6 loops.
method / entries               1      10     100    1000   10000
iterator.forEach              27     116     959    8832   88958
for:each                      53     171    1262   11164  111005
for with index                39     112     920    8577   89212
iterable.stream.forEach      255     324    1030    8519   88419
If you want to repeat the experiment, the full source code is posted. Please do edit this answer and add your results, noting the system you tested on.
Using a MacBook Pro, 2.5 GHz Intel Core i7, 16 GB, macOS 10.12.6:
method / entries               1      10     100    1000   10000
iterator.forEach              27     106    1047    8516   88044
for:each                      46     143    1182   10548  101925
for with index                49     145     887    7614   81130
iterable.stream.forEach      393     397    1108    8908   88361
Java 8 Hotspot VM - 3.4GHz Intel Xeon, 8 GB, Windows 10 Pro
method / entries               1      10     100    1000   10000
iterator.forEach              30     115     928    8384   85911
for:each                      40     125    1166   10804  108006
for with index                30     120     956    8247   81116
iterable.stream.forEach      260     237    1020    8401   84883
Java 11 Hotspot VM - 3.4GHz Intel Xeon, 8 GB, Windows 10 Pro
(same machine as above, different JDK version)
method / entries               1      10     100    1000   10000
iterator.forEach              20     104     940    8350   88918
for:each                      50     140     991    8497   89873
for with index                37     140     945    8646   90402
iterable.stream.forEach      200     270    1054    8558   87449
Java 11 OpenJ9 VM - 3.4GHz Intel Xeon, 8 GB, Windows 10 Pro
(same machine and JDK version as above, different VM)
method / entries               1      10     100    1000   10000
iterator.forEach             211     475    3499   33631  336108
for:each                     200     375    2793   27249  272590
for with index               384     467    2718   26036  261408
iterable.stream.forEach      515     714    3096   26320  262786
Java 8 Hotspot VM - 2.8GHz AMD, 64 GB, Windows Server 2016
method / entries               1      10     100    1000   10000
iterator.forEach              95     192    2076   19269  198519
for:each                     157     224    2492   25466  248494
for with index               140     368    2084   22294  207092
iterable.stream.forEach      946     687    2206   21697  238457
Java 11 Hotspot VM - 2.8GHz AMD, 64 GB, Windows Server 2016
(same machine as above, different JDK version)
method / entries               1      10     100    1000   10000
iterator.forEach              72     269    1972   23157  229445
for:each                     192     376    2114   24389  233544
for with index               165     424    2123   20853  220356
iterable.stream.forEach      921     660    2194   23840  204817
Java 11 OpenJ9 VM - 2.8GHz AMD, 64 GB, Windows Server 2016
(same machine and JDK version as above, different VM)
method / entries               1      10     100    1000   10000
iterator.forEach             592     914    7232   59062  529497
for:each                     477    1576   14706  129724 1190001
for with index               893     838    7265   74045  842927
iterable.stream.forEach     1359    1782   11869  104427  958584
The VM implementation you choose (HotSpot, OpenJ9, etc.) also makes a difference.
I read that .forEach() on a list can modify the underlying list, so I tried this:
List<Integer> nums = Arrays.asList(1, 2, 3);
nums.forEach(num -> num++);
System.out.println(nums);
The printed numbers are 1, 2, 3, so nothing changed! Why not?
I understand that someStream.forEach() is based on functional programming, which doesn't allow side effects, but I still don't really grasp the difference.
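One way to see what's happening: the lambda parameter num is a local variable holding a reference to an immutable Integer, so num++ only rebinds that local variable and never writes anything back into the list. List.replaceAll, which lists returned by Arrays.asList support, stores the lambda's result in each slot. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class contrasting forEach with replaceAll.
final class IncrementElements {
    static void incrementAll(List<Integer> nums) {
        nums.replaceAll(num -> num + 1); // writes each new value back into the list
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3);
        nums.forEach(num -> num++); // only changes the lambda's own parameter
        System.out.println(nums);   // [1, 2, 3]
        incrementAll(nums);
        System.out.println(nums);   // [2, 3, 4]
    }
}
```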
Stop using LinkedList for anything but heavy removal from the middle of the list using an iterator.
Stop writing benchmarking code by hand; use JMH.
Proper benchmarks:
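import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(StreamVsVanilla.N)
public class StreamVsVanilla {
    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>();
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    // Plain for-each loop filling a presized result list
    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0) {
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    // Equivalent stream pipeline collecting into a presized ArrayList
    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(Collectors.toCollection(
                        () -> new ArrayList<>(sourceList.size() / 2 + 1)));
    }
}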
Result:
Benchmark                 Mode  Samples    Mean  Mean error  Units
StreamVsVanilla.stream    avgt       10  17.588       0.230  ns/op
StreamVsVanilla.vanilla   avgt       10  10.796       0.063  ns/op
Just as I expected, the stream implementation is noticeably slower. The JIT is able to inline all the lambda code, but doesn't produce code as perfectly concise as the vanilla version.
Generally, Java 8 streams are not magic. They can't speed up things that are already well implemented with plain iteration or Java 5 for-each statements (possibly replaced with Iterable.forEach() and Collection.removeIf() calls). Streams are more about coding convenience and safety. The usual convenience-versus-speed trade-off applies here.
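For comparison with the Collection.removeIf() call mentioned above, this sketch (hypothetical names, made-up data) contrasts in-place removal with a stream that builds a new list and leaves its source untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class: in-place removal vs. a filtered copy.
final class RemoveIfVsStream {
    // Collection.removeIf mutates the existing list in place.
    static void dropOddInPlace(List<Integer> list) {
        list.removeIf(n -> n % 2 != 0);
    }

    // The stream version leaves the source alone and builds a new list.
    static List<Integer> evensCopy(List<Integer> list) {
        return list.stream().filter(n -> n % 2 == 0).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6));
        System.out.println(evensCopy(data)); // [2, 4, 6]
        System.out.println(data);            // [1, 2, 3, 4, 5, 6]
        dropOddInPlace(data);
        System.out.println(data);            // [2, 4, 6]
    }
}
```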
1) Your benchmark runs in less than one second, which means incidental effects can strongly influence the results. So I increased your task size 10 times:
int max = 10_000_000;
and ran your benchmark. My results:
Collections: Elapsed time: 8592999350 ns (8.592999 seconds)
Streams: Elapsed time: 2068208058 ns (2.068208 seconds)
Parallel streams: Elapsed time: 7186967071 ns (7.186967 seconds)
Without the edit (int max = 1_000_000), the results were:
Collections: Elapsed time: 113373057 ns (0.113373 seconds)
Streams: Elapsed time: 135570440 ns (0.135570 seconds)
Parallel streams: Elapsed time: 104091980 ns (0.104092 seconds)
These match your results: the stream is slower than the collection. Conclusion: much of the time was spent on stream initialization and on passing values through the stream.
2) After increasing the task size, the stream became faster (that's OK), but the parallel stream remained too slow. What's wrong? Note: you have collect(Collectors.toList()) in your pipeline. Collecting into a single collection essentially introduces a performance bottleneck and overhead in the case of concurrent execution. It is possible to estimate the relative cost of that overhead by replacing collecting into a collection with counting the elements. For streams, this can be done with collect(Collectors.counting()). I got these results:
Collections: Elapsed time: 41856183 ns (0.041856 seconds)
Streams: Elapsed time: 546590322 ns (0.546590 seconds)
Parallel streams: Elapsed time: 1540051478 ns (1.540051 seconds)
That's for a big task (int max = 10000000)! Conclusion: collecting items into the collection took the majority of the time. The slowest part is adding to the list. By the way, Collectors.toList() uses a plain ArrayList.
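The counting substitution can be sketched like this. The original benchmark pipeline is not shown here, so the filter(i -> i % 3 == 0) step is a hypothetical stand-in; the point is only that Collectors.counting() (or, equivalently, Stream.count()) avoids materializing an intermediate list. Class and method names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class; the filter stands in for the original
// benchmark's pipeline, which is not shown here.
final class CountInsteadOfCollect {
    // Counts matches without building any intermediate collection.
    static long countViaCollector(int max) {
        return IntStream.range(0, max).boxed()
                .filter(i -> i % 3 == 0)
                .collect(Collectors.counting());
    }

    // Materializes every match into a list first, then takes its size.
    static long countViaCollectedList(int max) {
        List<Integer> matches = IntStream.range(0, max).boxed()
                .filter(i -> i % 3 == 0)
                .collect(Collectors.toList());
        return matches.size();
    }

    public static void main(String[] args) {
        int max = 10_000_000;
        System.out.println(countViaCollector(max));     // 3333334
        System.out.println(countViaCollectedList(max)); // 3333334
    }
}
```

Timing the two methods (ideally with JMH rather than by hand) gives a rough idea of how much of the cost is the collection itself.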