I had a Stream of data coming in from an input source and had to do some heavily CPU-bound work on each item. The code looked like this:
stream.map(item => cpuIntensiveWork(item))
Behind the scenes, the input stream is enumerated lazily: items are processed one by one, and only as the resulting list is consumed.
Wanting to parallelize this, I added par:
stream.par.map(item => cpuIntensiveWork(item))
I imagined this would load the lines into a buffer and have workers read from that buffer and execute the map while the buffer was still being filled. It turns out that’s not what happens.
What really happens is that the whole stream is read into memory first, and only then is the work parallelized. Hardly what I was aiming for.
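One way to get closer to the buffered behaviour I had in mind is to pull the input in fixed-size batches and parallelize only within each batch. The following is a minimal sketch under a few assumptions, not a drop-in fix: it works on an Iterator rather than a Stream, it uses standard-library Futures instead of par, and parMapBatched, batchSize and the toy cpuIntensiveWork body are hypothetical names introduced for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object BatchedParallelMap {
  // Hypothetical stand-in for the real CPU-bound function.
  def cpuIntensiveWork(item: Int): Int = item * item

  // Pulls items from the source in groups of batchSize, runs f on every item
  // of a group concurrently, and waits for that group to finish before the
  // next group is read. Input order is preserved in the output.
  def parMapBatched[A, B](source: Iterator[A], batchSize: Int)(f: A => B): Iterator[B] =
    source.grouped(batchSize).flatMap { batch =>
      val futures = batch.map(item => Future(f(item)))
      Await.result(Future.sequence(futures), Duration.Inf)
    }

  def main(args: Array[String]): Unit = {
    val results = parMapBatched(Iterator.from(1).take(10), batchSize = 4)(cpuIntensiveWork)
    results.foreach(println)
  }
}
```

The trade-off is that each batch waits for its slowest item, so workers can sit idle at the end of a batch. A larger batchSize smooths that out at the cost of holding more items in memory at once.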