One of the prime motivations for the introduction of Java streams was to allow parallel operations. This led to a requirement that operations on Java streams such as map and filter be independent of the position of the item in the stream or the items around it. This has the advantage of making it easy to split streams for parallel processing. It has the disadvantage of making certain operations more complex.
So the simple answer is that there is no easy way to do things such as take every nth item or map each item to the sum of all previous items.
The most straightforward way to implement your requirement is to use the index of the list you are streaming from:
List<String> list = ...;
return IntStream.range(0, list.size())
.filter(n -> n % 3 == 0)
.mapToObj(list::get)
.toList();
A more complicated solution would be to create a custom collector that collects every nth item into a list.
class EveryNth<C> {
private final int nth;
private final List<List<C>> lists = new ArrayList<>();
private int next = 0;
private EveryNth(int nth) {
this.nth = nth;
IntStream.range(0, nth).forEach(i -> lists.add(new ArrayList<>()));
}
private void accept(C item) {
lists.get(next++ % nth).add(item);
}
private EveryNth<C> combine(EveryNth<C> other) {
other.lists.forEach(l -> lists.get(next++ % nth).addAll(l));
next += other.next;
return this;
}
private List<C> getResult() {
return lists.get(0);
}
public static <C> Collector<C, ?, List<C>> collector(int nth) {
return Collector.of(() -> new EveryNth<C>(nth),
EveryNth::accept, EveryNth::combine, EveryNth::getResult);
}
}
This could be used as follows:
Stream.of("Anne", "Bill", "Chris", "Dean", "Eve", "Fred", "George")
.parallel().collect(EveryNth.collector(3));
Which returns the result ["Anne", "Dean", "George"] as you would expect.
This is a very inefficient algorithm even with parallel processing. It splits all items it accepts into n lists and then just returns the first. Unfortunately it has to keep all items through the accumulation process because it's not until they are combined that it knows which list is the nth one.
Given the complexity and inefficiency of the collector solution, I would definitely recommend sticking with the index-based solution above if you can. If you aren't using a collection that supports get (e.g. you are passed a Stream rather than a List), then you will need to either collect the stream using Collectors.toList or use the EveryNth solution above.
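For the case where you are handed a Stream rather than a List, here is a minimal sketch of the "collect first, then index" route. The class and method names (EveryNthFromStream, everyNth) are made up for illustration, and it assumes the stream is finite and small enough to buffer in memory.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class EveryNthFromStream {

    // Hypothetical helper (not part of the answer above): buffers the stream
    // into a list, then keeps the elements at indices 0, nth, 2*nth, ...
    static <T> List<T> everyNth(Stream<T> stream, int nth) {
        List<T> buffered = stream.collect(Collectors.toList());
        return IntStream.range(0, buffered.size())
                .filter(i -> i % nth == 0)
                .mapToObj(buffered::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> names =
                Stream.of("Anne", "Bill", "Chris", "Dean", "Eve", "Fred", "George");
        System.out.println(everyNth(names, 3)); // [Anne, Dean, George]
    }
}
```

Buffering costs one extra list, but it keeps the selection logic independent of how the stream is split.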
EDIT - Nov 28, 2017
As user @Emiel suggests in the comments, the best way to do this would be to use Stream.iterate to drive the list through a sequence of indices:
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
int skip = 3;
int size = list.size();
// Limit to carefully avoid IndexOutOfBoundsException
int limit = size / skip + Math.min(size % skip, 1);
List<Integer> result = Stream.iterate(0, i -> i + skip)
.limit(limit)
.map(list::get)
.collect(Collectors.toList());
System.out.println(result); // [1, 4, 7, 10]
This approach doesn't have the drawbacks of my previous answer, which comes below (I've decided to keep it for historical reasons).
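As a side note on the index-driven version above: if you are on Java 9 or later, the three-argument IntStream.iterate takes a hasNext predicate, so the limit arithmetic is not needed. This is only a sketch under that version assumption; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class EveryNthIterate {

    // Hypothetical helper: visits indices 0, skip, 2*skip, ... for as long as
    // they are inside the list, so no separate limit has to be computed.
    static <T> List<T> everyNth(List<T> list, int skip) {
        return IntStream.iterate(0, i -> i < list.size(), i -> i + skip)
                .mapToObj(list::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(everyNth(list, 3)); // [1, 4, 7, 10]
    }
}
```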
Another approach would be to use Stream.iterate() the following way:
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
int skip = 3;
int size = list.size();
// Limit to carefully avoid IndexOutOfBoundsException
int limit = size / skip + Math.min(size % skip, 1);
List<Integer> result = Stream.iterate(list, l -> l.subList(skip, l.size()))
.limit(limit)
.map(l -> l.get(0))
.collect(Collectors.toList());
System.out.println(result); // [1, 4, 7, 10]
The idea is to create a stream of sublists, each one skipping the first N elements of the previous one (N=3 in the example).
We have to limit the number of iterations so that we don't try to get a sublist whose bounds are out of range.
Then, we map our sublists to their first element and collect our results. Keeping the first element of every sublist works as expected because every sublist's begin index is shifted N elements to the right, relative to the previous sublist.
This is also efficient, because the List.subList() method returns a view of the original list, meaning that it doesn't copy the elements into a new List for each iteration.
EDIT: After a while, I've learnt that it's much better to take either one of @sprinter's approaches, since subList() creates a wrapper around the original list. This means that the second list of the stream would be a wrapper of the first list, the third list of the stream would be a wrapper of the second list (which is already a wrapper!), and so on...
While this might work for small to medium-sized lists, it should be noted that for a very large source list, many nested wrappers would be created. This might end up being expensive, or even cause a StackOverflowError.
You can build a custom Collector for this task.
Map<String, String> map =
Stream.of("a", "b", "err1", "c", "d", "err2", "e", "f", "g", "h", "err3", "i", "j")
.collect(MappingErrors.collector());
with:
private static final class MappingErrors {
private Map<String, String> map = new HashMap<>();
private String first, second;
public void accept(String str) {
first = second;
second = str;
if (first != null && first.startsWith("err")) {
map.put(first, second);
}
}
public MappingErrors combine(MappingErrors other) {
throw new UnsupportedOperationException("Parallel Stream not supported");
}
public Map<String, String> finish() {
return map;
}
public static Collector<String, ?, Map<String, String>> collector() {
return Collector.of(MappingErrors::new, MappingErrors::accept, MappingErrors::combine, MappingErrors::finish);
}
}
In this collector, two running elements are kept. Each time a String is accepted, they are updated and if the first starts with "err", the two elements are added to a map.
Another solution is to use the StreamEx library, which provides a pairMap method that applies a given function to every adjacent pair of elements of the stream. In the following code, the operation returns a String array consisting of the first and second element of the pair if the first element starts with "err", and null otherwise. The null elements are then filtered out and the Stream is collected into a map.
Map<String, String> map =
StreamEx.of("a", "b", "err1", "c", "d", "err2", "e", "f", "g", "h", "err3", "i", "j")
.pairMap((s1, s2) -> s1.startsWith("err") ? new String[] { s1, s2 } : null)
.nonNull()
.toMap(a -> a[0], a -> a[1]);
System.out.println(map);
You can write a custom collector, or use the much simpler approach of streaming over the list's indexes:
Map<String, String> result = IntStream.range(0, data.size() - 1)
.filter(i -> data.get(i).startsWith("err"))
.boxed()
.collect(toMap(data::get, i -> data.get(i+1)));
This assumes that your data is in a random access friendly list or that you can temporarily dump it into one.
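For reference, here is a self-contained sketch of the index approach, using the same sample data as the earlier answers. The class and method names (ErrPairsByIndex, errorPairs) are made up for illustration, and it assumes the "err" keys are unique, since Collectors.toMap throws on duplicate keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ErrPairsByIndex {

    // Hypothetical helper: maps every element starting with "err" to the
    // element that directly follows it in the list.
    static Map<String, String> errorPairs(List<String> data) {
        // The range stops at size() - 1 so that data.get(i + 1) is always valid;
        // an "err" element in the last position has no successor and is skipped.
        return IntStream.range(0, data.size() - 1)
                .filter(i -> data.get(i).startsWith("err"))
                .boxed()
                .collect(Collectors.toMap(data::get, i -> data.get(i + 1)));
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
                "a", "b", "err1", "c", "d", "err2", "e", "f", "g", "h", "err3", "i", "j");
        // Contains err1 -> c, err2 -> e, err3 -> i (iteration order is unspecified).
        System.out.println(errorPairs(data));
    }
}
```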
If you cannot randomly access the data or load it into a list or array for processing, you can always make a custom pairing collector so you can write
Map<String, String> result = data.stream()
.collect(pairing(
(a, b) -> a.startsWith("err"),
AbstractMap.SimpleImmutableEntry::new,
toMap(Map.Entry::getKey, Map.Entry::getValue)
));
Here's the source for the collector. It's parallel-friendly and might come in handy in other situations:
public static <T, V, A, R> Collector<T, ?, R> pairing(BiPredicate<T, T> filter, BiFunction<T, T, V> map, Collector<? super V, A, R> downstream) {
class Pairing {
T left, right;
A middle = downstream.supplier().get();
boolean empty = true;
void add(T t) {
if (empty) {
left = t;
empty = false;
} else if (filter.test(right, t)) {
downstream.accumulator().accept(middle, map.apply(right, t));
}
right = t;
}
Pairing combine(Pairing other) {
if (!other.empty) {
this.add(other.left);
this.middle = downstream.combiner().apply(this.middle, other.middle);
this.right = other.right;
}
return this;
}
R finish() {
return downstream.finisher().apply(middle);
}
}
return Collector.of(Pairing::new, Pairing::add, Pairing::combine, Pairing::finish);
}
You can actually use an IntStream to simulate your list's pagination.
List<String> list = Arrays.asList("a","b","c","d","e","f","g","h","i","j");
int pageSize = 3;
IntStream.range(0, (list.size() + pageSize - 1) / pageSize)
.mapToObj(i -> list.subList(i * pageSize, Math.min(pageSize * (i + 1), list.size())))
.forEach(System.out::println);
which outputs:
[a, b, c]
[d, e, f]
[g, h, i]
[j]
If you want to generate Strings, you can use String.join since you are dealing with a List<String> directly:
.mapToObj(i -> String.join("", list.subList(i * pageSize, Math.min(pageSize * (i + 1), list.size()))))
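Putting that fragment into a complete pipeline might look like the following sketch. The class and method names (PageJoin, joinPages) are hypothetical; the paging arithmetic is the same as in the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PageJoin {

    // Hypothetical helper: splits the list into pages of pageSize elements
    // and concatenates each page into a single String.
    static List<String> joinPages(List<String> list, int pageSize) {
        int pageCount = (list.size() + pageSize - 1) / pageSize; // ceiling division
        return IntStream.range(0, pageCount)
                .mapToObj(i -> String.join("",
                        list.subList(i * pageSize, Math.min(pageSize * (i + 1), list.size()))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
        System.out.println(joinPages(list, 3)); // [abc, def, ghi, j]
    }
}
```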
You can create your own Collector. The easiest way is to call Collector.of().
Since your use case requires values to be processed in order, here is an implementation that simply doesn't support parallel processing.
public static Collector<String, List<List<String>>, List<List<String>>> blockCollector(int blockSize) {
return Collector.of(
ArrayList<List<String>>::new,
(list, value) -> {
List<String> block = (list.isEmpty() ? null : list.get(list.size() - 1));
if (block == null || block.size() == blockSize)
list.add(block = new ArrayList<>(blockSize));
block.add(value);
},
(r1, r2) -> { throw new UnsupportedOperationException("Parallel processing not supported"); }
);
}
Test
List<String> input = Arrays.asList("a","b","c","d","e","f","g","h","i","j");
List<List<String>> output = input.stream().collect(blockCollector(3));
output.forEach(System.out::println);
Output
[a, b, c]
[d, e, f]
[g, h, i]
[j]
One of the things to keep in mind is that Stream was primarily designed to be a way of taking advantage of parallel processing. An implication of this is that streams have a number of conditions associated with them that are aimed at giving the VM a lot of freedom to process the elements in any convenient order. An example of this is insisting that reduction functions are associative. Another is that local variables used in lambdas must be final or effectively final. These types of conditions mean the stream items can be evaluated and collected in any order.
A natural consequence of this is that the best use cases for Stream involve no dependencies between the values of the stream. Things such as mapping a stream of integers to their cumulative values are trivial in languages like LISP but a pretty unnatural fit for Java streams (see this question).
There are clever ways of getting around some of these restrictions by using sequential to force the Stream to not be parallel but my experience has been that these are more trouble than they are worth. If your problem involves an essentially sequential series of items in which state is required to process the values then I recommend using traditional collections and iteration. The code will end up being clearer and will perform no worse given the stream cannot be parallelised anyway.
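To illustrate the "traditional iteration" recommendation with the cumulative-values case mentioned above, here is a minimal sketch. The class and method names (RunningSum, cumulative) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunningSum {

    // Hypothetical helper: each output element is the sum of all input
    // elements up to and including that position.
    static List<Integer> cumulative(List<Integer> values) {
        List<Integer> result = new ArrayList<>(values.size());
        int sum = 0; // the state that makes this awkward as a stream operation
        for (int value : values) {
            sum += value;
            result.add(sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(cumulative(Arrays.asList(1, 2, 3, 4, 5))); // [1, 3, 6, 10, 15]
    }
}
```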
Having said all that, if you really want to do this then the most straightforward way is to have a collector that stores every third item then sends them out as a stream again:
class EveryThird {
private final List<Integer> list = new ArrayList<>();
private int count = 0;
public void accept(Integer i) {
if (count++ % 3 == 0)
list.add(i);
}
public EveryThird combine(EveryThird other) {
list.addAll(other.list);
count += other.count;
return this;
}
public Stream<Integer> stream() {
return list.stream();
}
}
This can then be used like:
IntStream.range(0, 10000)
.collect(EveryThird::new, EveryThird::accept, EveryThird::combine)
.stream()
But that's not really what collectors are designed for and this is pretty inefficient as it's unnecessarily collecting the stream. As stated above my recommendation is to use traditional iteration for this sort of situation.
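For comparison, a plain-loop version of "every third item" could look like this sketch. The class and method names (EveryThirdLoop, everyThird) are hypothetical; it accepts any Iterable, so it does not need random access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EveryThirdLoop {

    // Hypothetical helper: keeps the 1st, 4th, 7th, ... element of the input.
    static <T> List<T> everyThird(Iterable<T> items) {
        List<T> result = new ArrayList<>();
        int count = 0;
        for (T item : items) {
            if (count++ % 3 == 0) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(everyThird(input)); // [0, 3, 6, 9]
    }
}
```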
My StreamEx library enhances the standard Stream API. In particular, it adds the headTail method, which allows recursive definition of custom operations. It takes a function which receives the stream head (the first element) and tail (a stream of the remaining elements) and should return the resulting stream, which will be used instead of the original one. For example, you can define an every3 operation as follows:
public static <T> StreamEx<T> every3(StreamEx<T> input) {
return input.headTail(
(first, tail1) -> tail1.<T>headTail(
(second, tail2) -> tail2.headTail(
(third, tail3) -> every3(tail3))).prepend(first));
}
Here prepend is also used, which just prepends the given element to the stream (this operation is just a best friend of headTail).
In general using headTail you can define almost any intermediate operation you want, including existing ones and new ones. You may find some samples here.
Note that I implemented some mechanism which optimizes tails in such recursive operation definition, so properly defined operation will not eat the whole stack when processing the long stream.