Async programming models part II

As promised, this is the second part of my pocket-book introduction to async programming. Let’s continue!

If this article does not make sense, please check part I before shouting at me!

Map-reduce models

The map-reduce approach is widely used in big data analysis. The model itself is a variation on the split-apply-combine approach, which is pretty much self-explanatory: you split the data, you send the pieces to multiple threads to tinker with, and you combine the partial results into the final result.
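Here is a minimal sketch of split-apply-combine in Python, assuming a simple word count as the job. The names split, map_chunk and word_count are made up for illustration; a real map-reduce framework would normally do the splitting for you. I used ThreadPoolExecutor to keep it short; for CPU-bound work in CPython, ProcessPoolExecutor is the usual swap because of the GIL.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(text: str, n_chunks: int) -> list[list[str]]:
    """Split step: cut the input into roughly equal chunks of words."""
    words = text.split()
    size = max(1, -(-len(words) // n_chunks))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def map_chunk(chunk: list[str]) -> Counter:
    """Apply step: every worker runs the same operation on its own chunk."""
    return Counter(word.lower() for word in chunk)


def word_count(text: str, workers: int = 4) -> Counter:
    chunks = split(text, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Combine step: merge the partial counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


counts = word_count("the cat and the hat and the bat")
print(counts.most_common(2))
```

Note that the combine step can only run once every chunk has been mapped, which is exactly the fork-join behaviour discussed below.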

What are they good for

Map-reduce models are good at doing the same operations on large sets of data. Need to analyze numerous files? Do it with a map-reduce model. Any kind of data that you can safely split into independent chunks can be processed with map-reduce.

Problems with map-reduce

With this model, you don’t have that much control over the threads. You just tell the threads what to do and, generally speaking, it’s the framework’s job to split the data. Additionally, all of your threads apply the same operation to the data set.

Map-reduce is not that good at handling interactions from the outside. If you need to handle user input, maybe this is not the way to go.

Map-reduce usually implements the fork-join model: before you can combine the results, you need to wait for all the threads to finish. So if some tasks take a long time, you’re going to keep lots of resources busy waiting for them. Moreover, if you have network calls inside the map step and one (or more) of those requests hangs, you’re going to be in some trouble.
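A rough Python sketch of the hanging-request problem, assuming fake_request (a hypothetical stand-in for a network call) and an arbitrary 0.5 second limit. concurrent.futures.wait lets the join step give up after a timeout instead of waiting forever, but it cannot stop the slow thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_request(delay: float) -> float:
    """Stands in for a network call made inside the map step."""
    time.sleep(delay)
    return delay


pool = ThreadPoolExecutor(max_workers=3)
futures = [pool.submit(fake_request, d) for d in (0.05, 0.05, 1.0)]

# The "join": wait for every task, but give up after 0.5 s instead of hanging.
done, not_done = wait(futures, timeout=0.5)
print(f"{len(done)} finished, {len(not_done)} still running")

# A running thread cannot be killed from the outside; the slow call keeps going
# in the background until it returns. shutdown(wait=False) only stops us waiting.
pool.shutdown(wait=False)
```

You still have to decide what to do with the partial result, which is where the trouble starts.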

Where can I find this

Java parallel streams

Scala parallel collections

C++ OpenMP (albeit OpenMP offers more options than just map-reduce)


Promises

Oh boy! Promises have been my favorite new toys for a while now. With this model, you define a chain of actions that need to be performed on the data. Each action generates data that will be used by the next action, and so on. Once the chain is defined, each action is completed eventually (presumably on a thread that is not doing anything important), its result is passed to the next action, and the cycle repeats.
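To make the chain idea concrete, here is a rough Python sketch using asyncio, where awaiting a coroutine plays the role of a promise’s “then”. The steps fetch_user and enrich are hypothetical stand-ins, with asyncio.sleep faking the I/O. asyncio interleaves the flows on a single event-loop thread rather than a thread pool, but the chaining model is the same.

```python
import asyncio


async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # stands in for a database or network call
    return {"id": user_id, "name": f"user{user_id}"}


async def enrich(record: dict) -> dict:
    await asyncio.sleep(0.01)
    return {**record, "display": record["name"].upper()}


async def flow(user_id: int) -> str:
    record = await fetch_user(user_id)  # action 1
    enriched = await enrich(record)     # action 2 gets action 1's output
    return enriched["display"]          # the flow's final result


async def main() -> list[str]:
    # Bundle several independent flows into one bigger awaitable.
    return await asyncio.gather(*(flow(i) for i in range(3)))


results = asyncio.run(main())
print(results)
```

asyncio.gather returns the results in the order the flows were passed in, regardless of which one finished first.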

What are they good for

Promises are a good way to run a wide variety of independent flows in parallel. You have precise control over the actions in the flow and the order in which they are executed (at least inside the same flow). They can be returned from functions, which helps keep your code clean and organized. Even more, you can bundle them together into bigger and more complex promises. You can pass any data to each of the actions in the flow and, at the same time, you can still type-check that data.

To me, the promises model is the most flexible model out there.

Problems with promises

They are hard to keep. Well, maybe not, it depends! However, the promises model is a bit hard to get the hang of.

Even though the actions in a flow are executed in sequence, you have no control over the order of separate flows. This can be a problem.
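A small Python sketch of that, assuming three hypothetical flows that just sleep for different amounts of time. They are started in the order A, B, C, but asyncio.as_completed hands them back in the order they finish.

```python
import asyncio


async def flow(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for a whole chain of awaited actions
    return name


async def completion_order() -> list[str]:
    tasks = [
        asyncio.create_task(flow("A", 0.3)),
        asyncio.create_task(flow("B", 0.2)),
        asyncio.create_task(flow("C", 0.1)),
    ]
    order = []
    for next_done in asyncio.as_completed(tasks):
        order.append(await next_done)  # whichever flow finishes next
    return order


order = asyncio.run(completion_order())
print(order)
```

The finishing order is decided by how long each flow takes, not by when you started it.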

Before you can use a promise’s data in the main thread, you need to wait for the promise to complete. However, if you do lots of waits (and in the wrong order), you may actually hurt the overall performance.
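One common way waits hurt performance is waiting on each piece of work before even starting the next one. This Python sketch, assuming a hypothetical step that just sleeps for 0.1 s, compares awaiting two steps one at a time with awaiting them together.

```python
import asyncio
import time


async def step(delay: float) -> float:
    await asyncio.sleep(delay)  # stands in for I/O-bound work
    return delay


async def one_at_a_time() -> float:
    start = time.perf_counter()
    await step(0.1)  # the second step does not start until this one is done
    await step(0.1)
    return time.perf_counter() - start


async def all_together() -> float:
    start = time.perf_counter()
    await asyncio.gather(step(0.1), step(0.1))  # both sleeps overlap
    return time.perf_counter() - start


sequential = asyncio.run(one_at_a_time())
overlapped = asyncio.run(all_together())
print(f"one at a time: {sequential:.2f}s, together: {overlapped:.2f}s")
```

The rough rule: start everything you can first, then wait, and wait as late as possible.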

Where can I find this

JavaScript promises

Scala futures

Java futures


Async programming models part I

Asynchronous programming is everywhere around us and it’s here to stay. You might have noticed that CPUs keep getting more cores as time goes by. Therefore, it would be a shame to let all those cores go unused in your app. In this article I shall be your guide through the various async programming patterns that I have used.

Plain old threads

These are the most powerful tools at your disposal: a kind of all-purpose Swiss Army knife that you can do everything with. The basic pattern is that you have a function that is going to run on a separate thread. You pass in any data that the thread needs and you are good to go. Every pattern we will cover here can be reduced to this model (because this is what computers understand).
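The basic pattern in Python, as a minimal sketch: the function sum_range and the shared results dictionary are made up for illustration. Data goes in through args, results come back through a structure the main thread can read after join().

```python
import threading


def sum_range(start: int, end: int, results: dict, key: str) -> None:
    # Each thread writes under its own key, so they don't overwrite each other.
    results[key] = sum(range(start, end))


results: dict = {}
threads = [
    threading.Thread(target=sum_range, args=(0, 500, results, "low")),
    threading.Thread(target=sum_range, args=(500, 1000, results, "high")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads before reading their results

total = results["low"] + results["high"]
print(total)
```

Even in this tiny example you already decide how to pass data in, how to get it back, and when it is safe to read it.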

What are they good for

Plain threads are good for long, custom tasks that you need to run in the background. Do you want to build a server that handles multiple requests at the same time? Then this is the way to go.

The problem with plain old threads

Although they are very powerful tools, threads can be a bit hard to use. You need to make sure you are passing the right data. Then you need to make sure you are getting the right data back from the thread. And you need to do all of this while making sure you are not causing any synchronization issues. That is not as easy as it seems, and all the patterns covered here share these synchronization issues.
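The classic synchronization issue is several threads updating the same value. Here is a minimal Python sketch, with SafeCounter and worker as made-up names: “read, add one, write back” is not atomic, so a threading.Lock guards it.

```python
import threading


class SafeCounter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may read, add and write back.
        with self._lock:
            self.value += 1


def worker(counter: SafeCounter, times: int) -> None:
    for _ in range(times):
        counter.increment()


counter = SafeCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

Without the lock, updates from different threads can interleave and some increments may be lost, so the final count is no longer guaranteed.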

Additionally, you need to decide on the number of threads to use and this is not a trivial task.

Where can I find this

All programming languages worth a damn provide some sort of API you can use for this.

Queue of events

In this async programming model, you begin with a bunch of threads that idly wait for something to happen. You also have a queue of events that keeps track of what happens. When something happens, one of the threads (usually the first idle one) picks up the work. Then it does the work. Then it goes back to twiddling its thumbs until something new happens.
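A minimal Python sketch of this model using queue.Queue and a few worker threads. The event names, the handled list and the STOP sentinel are all made up for illustration; a real server would pull events from the network instead of a hard-coded tuple.

```python
import queue
import threading

events: queue.Queue = queue.Queue()
handled: list[str] = []
handled_lock = threading.Lock()
STOP = object()  # sentinel telling a worker to exit


def worker() -> None:
    while True:
        event = events.get()  # idle workers block here until an event arrives
        try:
            if event is STOP:
                return
            with handled_lock:
                handled.append(f"handled {event}")
        finally:
            events.task_done()


workers = [threading.Thread(target=worker, daemon=True) for _ in range(3)]
for w in workers:
    w.start()

for name in ("click", "keypress", "http-request"):
    events.put(name)  # "when this happens..."

events.join()  # wait until every queued event has been processed
for _ in workers:
    events.put(STOP)
for w in workers:
    w.join()

print(sorted(handled))
```

Notice that everything a worker needs has to travel inside the event itself, which is one of the drawbacks listed below.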

What are they good for

You should use this pattern when you need to respond to something happening in the world. These events can be HTTP requests if you’re building a server, user actions, or work handed over by other threads.

Threads can be added to or removed from the pool as needs demand. Most of the time this can be done automatically, so this is one less thing you need to worry about.

In simpler terms: if your app can be described as “When this happens, then this should happen!”, you can use this pattern.

Problems with a queue of events

You have significantly less control over the threads than you would normally have. Additionally, you need to put all the info the thread requires into the event data. From time to time, this can be tricky.

The queue itself can be a problem. Events can keep piling up during busy times or if the worker threads are deadlocked. If the queue grows too big, events can either be dropped (causing loss of data) or they can block the system (causing much worse damage).
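Both outcomes can be sketched with a bounded queue.Queue in Python. The maximum size of 2 and the 0.1 second timeout are arbitrary values for illustration, and there is deliberately no consumer running, so the queue stays full.

```python
import queue

events: queue.Queue = queue.Queue(maxsize=2)  # bounded: at most 2 pending events

# Option 1: drop events when the queue is full (data loss, but producers keep going).
dropped = 0
for event_id in range(5):
    try:
        events.put_nowait(event_id)
    except queue.Full:
        dropped += 1

# Option 2: make the producer wait for room. The timeout keeps it from blocking
# forever; with no consumer running here, it gives up.
gave_up = False
try:
    events.put("late event", timeout=0.1)
except queue.Full:
    gave_up = True

print(f"queued={events.qsize()} dropped={dropped} gave_up={gave_up}")
```

Which option is less bad depends on whether losing an event or stalling the producer hurts your app more.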

Where can I find this

This is all for today. Map-reduce and promises coming soon. Are there any async patterns you want me to cover?