Async programming models part II
As promised, this is the second part of my pocket-book introduction to async programming. Let’s continue!
If this article does not make sense, please check part I before shouting at me!
Map-reduce models
The map-reduce approach is widely used in big data analysis. The model is a variation on the split-apply-combine approach, which is pretty much self-explanatory: you split the data, you send the chunks to multiple threads to tinker with, you combine the partial results, and you get the final result.
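To make split-apply-combine a bit more concrete, here is a minimal sketch using Java parallel streams. The class name `MapReduceSketch` and the method `sumOfSquares` are made up for illustration; only `parallelStream`, `mapToLong` and `sum` come from the standard library.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MapReduceSketch {

    // Hypothetical helper: squares every number and adds the squares together.
    static long sumOfSquares(List<Integer> numbers) {
        return numbers.parallelStream()        // split: the framework partitions the list
                .mapToLong(n -> (long) n * n)  // apply: the same operation on every element
                .sum();                        // combine: partial sums are merged into one
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(sumOfSquares(numbers)); // prints 333833500
    }
}
```

Notice that the code never mentions threads or chunk sizes; how the list is split and how many threads are used is left to the stream framework.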
What are they good for
Map-reduce models are good at applying the same operation to large sets of data. Need to analyze numerous files? Do it with a map-reduce model. Any kind of data that you can safely split into chunks can be processed with map-reduce.
Problems with map-reduce
With this model, you don’t have that much control over the threads. You just tell the thread what to do, and, generally speaking, it’s the framework’s job to split the data. Additionally, all of your threads are applying the same operation to the data set.
Map-reduce is not that good at handling interactions from the outside. If you need to handle user input, maybe this is not the way to go.
Map-reduce usually implements the fork-join model, i.e. before the results can be combined, you need to wait for all the threads to finish. If you have long tasks, you’re going to keep lots of resources busy. Moreover, if you have network calls inside the map thread and one (or more) of those requests hangs, you’re going to be in some trouble.
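The hanging-request problem can be sketched like this. Everything here is an assumption made for illustration: `fetchLength` is a stand-in for a real network call, the item named "slow" simulates a request that hangs, and the timeout is just one possible guard (it relies on `completeOnTimeout`, available since Java 9), not something the model itself prescribes.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class ForkJoinTimeoutSketch {

    // Hypothetical stand-in for a network call; "slow" simulates a hanging request.
    static int fetchLength(String item) {
        if (item.equals("slow")) {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return item.length();
    }

    static int totalLength(List<String> items, long timeoutMillis) {
        // Fork: one asynchronous task per item. If a task takes longer than
        // the timeout, its future is completed with the marker value -1.
        List<CompletableFuture<Integer>> tasks = items.stream()
                .map(item -> CompletableFuture
                        .supplyAsync(() -> fetchLength(item))
                        .completeOnTimeout(-1, timeoutMillis, TimeUnit.MILLISECONDS))
                .collect(Collectors.toList());

        // Join: wait for every future, then combine, skipping timed-out results.
        return tasks.stream()
                .mapToInt(CompletableFuture::join)
                .filter(length -> length >= 0)
                .sum();
    }

    public static void main(String[] args) {
        // "slow" exceeds the 200 ms budget, so only "ab" and "cde" are counted.
        System.out.println(totalLength(List.of("ab", "slow", "cde"), 200)); // prints 5
    }
}
```

Without the timeout, the join step would sit there for the full two seconds, and forever if the request never returned. Note that the timeout only stops the waiting; the slow task itself keeps running in the background.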
Where can I find this
Java parallel streams
Scala parallel collections
C++ OpenMP (albeit OpenMP offers more options than just map-reduce)
Promises
Oh boy! Promises have been my favorite toys for a while now. With this model you define a chain of actions that need to be performed on the data. Each action generates data that will be used by the next action, and so on. Once the chain is defined, each action will be completed eventually (presumably on a thread that is not doing anything important), then its result is passed to the next action and the cycle repeats.
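A chain of actions like the one described above could look as follows with Java's `CompletableFuture`. This is only a sketch: the class `PromiseChainSketch` and the method `describeLength` are hypothetical names, and the three actions are deliberately trivial.

```java
import java.util.concurrent.CompletableFuture;

public class PromiseChainSketch {

    // Hypothetical flow: trim a text, measure it, then format the result.
    static CompletableFuture<String> describeLength(String text) {
        return CompletableFuture
                .supplyAsync(() -> text.trim())         // action 1: produces a String
                .thenApply(String::length)              // action 2: consumes it, produces an Integer
                .thenApply(len -> "length=" + len);     // action 3: consumes it, produces a String
    }

    public static void main(String[] args) {
        // join() blocks until the whole chain has completed.
        System.out.println(describeLength("  hello  ").join()); // prints length=5
    }
}
```

The first action runs on some pool thread, and each following action only starts once the previous one has delivered its result.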
What are they good for
Promises are a good way to parallelize a wide variety of independent flows. You have precise control over the actions in the flow and the order in which they are executed (inside the same flow at least). They can be returned from functions, which helps you keep your code clean and organized. Even more, you can bundle them together into bigger and more complex promises. You can pass any data to each of the actions in the flow and, at the same time, still do type checking on that data.
To me, the promises model is the most flexible model out there.
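Returning promises from functions and bundling them into a bigger one might look like this in Java. The functions `priceOf`, `discountFor` and `finalPrice` are invented for this sketch, and so is the pricing logic; the only real API used is `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;

public class PromiseBundleSketch {

    // Hypothetical lookup: the price is derived from the item name's length.
    static CompletableFuture<Integer> priceOf(String item) {
        return CompletableFuture.supplyAsync(() -> item.length() * 10);
    }

    // Hypothetical lookup: customers whose name starts with "vip" pay half.
    static CompletableFuture<Double> discountFor(String customer) {
        return CompletableFuture.supplyAsync(() -> customer.startsWith("vip") ? 0.5 : 1.0);
    }

    // Bundles the two independent promises into one bigger promise.
    // The compiler checks that price is an Integer and factor is a Double.
    static CompletableFuture<Double> finalPrice(String item, String customer) {
        return priceOf(item).thenCombine(discountFor(customer),
                (price, factor) -> price * factor);
    }

    public static void main(String[] args) {
        System.out.println(finalPrice("book", "vip-anna").join()); // prints 20.0
    }
}
```

The two lookups are separate flows that may run in parallel, and the combining action only fires once both have completed.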
Problems with promises
They are hard to keep. Well, maybe not, it depends! However, the promises model is a bit hard to get the hang of.
Even though actions in a flow are executed in sequence, you have no control over the order in which separate flows are executed. This can be a problem.
Before you can use the data from a promise in the main thread, you need to wait for the promise to complete. However, if you do lots of waits (and in the wrong order), you may in fact hurt the overall performance.
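Here is a small sketch of how the order of the waits can matter. `slowSquare` is a made-up task that sleeps for 200 ms to imitate real work, and the timings mentioned afterwards are rough expectations, assuming at least two threads are available, not measurements.

```java
import java.util.concurrent.CompletableFuture;

public class WaitOrderSketch {

    // Hypothetical slow task: sleeps 200 ms, then returns n squared.
    static int slowSquare(int n) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n * n;
    }

    // Waits right after starting each promise, so the second task
    // is not even started until the first one has finished.
    static int waitTooEarly(int a, int b) {
        int first = CompletableFuture.supplyAsync(() -> slowSquare(a)).join();
        int second = CompletableFuture.supplyAsync(() -> slowSquare(b)).join();
        return first + second;
    }

    // Starts both promises first and waits only when the results are needed.
    static int waitAtTheEnd(int a, int b) {
        CompletableFuture<Integer> first = CompletableFuture.supplyAsync(() -> slowSquare(a));
        CompletableFuture<Integer> second = CompletableFuture.supplyAsync(() -> slowSquare(b));
        return first.join() + second.join();
    }

    public static void main(String[] args) {
        System.out.println(waitTooEarly(2, 3)); // prints 13
        System.out.println(waitAtTheEnd(2, 3)); // prints 13
    }
}
```

Both methods return the same result, but `waitTooEarly` should take roughly twice as long, because its two tasks end up running one after the other.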
Where can I find this
JavaScript promises
Scala futures
Java futures
Part I here