Greenfield project – how to get it right from the get-go

You lucky programmer you. You get to work on a new project. A shiny new toy for you. All the possibilities that lie ahead. Simply limitless! Now the only question is: how do we do this right? How do we make sure our little project does not end up a big ol’ mess that is impossible to handle?

Decide on the spec

Before you can get your hands dirty, you have to have a plan. Right? You should have a very, very good idea what you have to do. Here are a couple of questions that you should have the answer to before you start any work:

  1. What are you trying to solve with the project?
  2. What is the workflow of the project?
  3. What is the expected load (now and in the future)?
  4. What is the most important feature? If you have to settle for three features, what would they be?
  5. Are there any risks you need to look out for?

Start reading

Now that you know what you’re doing, start reading! Honestly, there is a lot of reading involved before starting a new project. You should know what the available frameworks are, what their limitations are, and what they are good or bad at. Additionally, you should have a good idea what add-ons you should use for what tasks and how well documented they are. Another important thing you should look into is known issues. There is a lot of less-than-stellar code out there, and you should make sure your awesome project is not relying on any of it. So, start reading now and don’t stop until you know everything. Well, maybe not everything, but enough to have a good idea what adding a particular framework means. Little word of advice here, if I may: just because you have experience with something, it does not make it the right tool for the job. Don’t skip over this stage.

Decide on the tech stack

Decisions, decisions! There are a lot of choices out there. Some are more mature (like Spring), while others are quickly changing, offering new shiny toys at every turn (like React). You need to make a decision now that will affect the project for its whole life. No pressure, right? This is also the stage where you decide the infrastructure: what database will you use? Will it be hosted on your server, or somewhere in the cloud? What OS will the server be running on? How will you do the deployment? All of these are important questions, so take your time thinking about them now. You’ll thank yourself later. This is also the stage where you are in a safe position to make promises about timelines for your project, set deadlines, milestones, you know, all those things.

Set some ground rules

The beginning is an important time for your project! The rules and conventions you set here are going to stick around for a long, long time. Be careful what patterns you add to your project here: they can easily save you or hurt you later down the line. Read about your tech stack’s best practices and make them part of your project. There is going to be a lot of chaos in the early days of the project. You should be vigilant, so that you spot the changes that go against your rules and correct them.

Good job bearing with me this far. Now it’s time to go out there and make your dream project happen! Cheers!

8 git commands I use every day

If you are already using git, then you can skip this paragraph. If you are first hearing this word, then I can give you the basics for now. Git is the most used version control system. It basically is a way you can track all the versions of the program without having to get creative with folder names (version1, version1final, version1finalfinal etc.). Git also provides a way for multiple people to collaborate on the same project without stepping on each other’s toes (most of the time). You can get started with this tutorial. If you don’t know git you should start learning it now. I can’t stress enough how important and widely used it is.

If you already know about git, then you should have realized by now that keeping the git history clean is really helpful. It makes it easier to see what is merged and what needs work. You see when the changes were made and you can search through them for something that you want. It is worth the effort to keep the history nice and organized for everyone. Without further ado, here are the 8 git commands I use every day.

git log

This is a nice and easy one. It shows the whole history of the project: commit hashes, commit times and commit messages. By default the output opens in a pager, so you can search through it by typing “/” followed by whatever you are looking for.

git checkout -b branch_name

This creates a new branch off my current branch and checks the new branch out. Nice and easy way to get the task going.

git diff --cached

Once you add the changes you want for your task, you can review the staged changes to make sure everything is as expected. This command, together with git status, should save you from any surprises when doing a commit.

git commit -m "first message" -m "Second message"

The first message is a short description of what the changes are in this commit. I provide more details on the changes in the second message, which usually spans multiple lines.

git rebase -i hash

Generally, when I work on a task I end up with a lot of commits on my branch. You know the ones: attempt at something, first iteration on something else, some minor change, oh now there is a typo. I like to have my tickets in a single commit when merging, so I do an interactive rebase onto the commit just before my work started and squash everything together. This way, in the unlikely event that I need to revert a ticket, I can do that with just one commit.

git rebase master

I rebase a lot. Anytime I want to merge something I do a rebase on master first. This way you neatly stack all the commits for a ticket together. Makes following the git history much easier and reverting a change is easy peasy lemon squeezy.

git merge branch --no-ff

I very much prefer merging without fast forward. You get a nice commit message to celebrate and an easy-to-find point in the past to check out if the need arises.

git push origin --delete branch

Now it’s time to remove your work in progress and start anew. Great job! I prefer the more verbose way of deleting a branch. It’s harder to mess it up.

What other git commands do you use every day? Let me know!

Async programming models part II

As promised, this is the second part of my pocket-book introduction to async programming. Let’s continue!

If this article does not make sense, please check part I before shouting at me!

Map-reduce models

The map-reduce approach is widely used in big data analysis. The model itself is a variation on the split-apply-combine approach, which is pretty much self-explanatory. You split the data, you send the pieces to multiple threads to tinker with, you combine the partial results and you get the final result.
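To make split-apply-combine concrete, here is a minimal Python sketch. The helper names (split, sum_of_squares, map_reduce) and the chunk size are made up for illustration, and a thread pool stands in for whatever your framework of choice uses under the hood.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(data, chunk_size):
    # Split: cut the data into independent chunks.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def sum_of_squares(chunk):
    # Apply (the "map" step): every worker runs the same operation on its chunk.
    return sum(x * x for x in chunk)


def map_reduce(data, chunk_size=3, workers=4):
    chunks = split(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the partial results in the same order as the chunks.
        partials = list(pool.map(sum_of_squares, chunks))
    # Combine (the "reduce" step): fold the partial results into one value.
    return reduce(lambda a, b: a + b, partials, 0)


if __name__ == "__main__":
    print(map_reduce(list(range(10))))
```

One caveat: in CPython, threads mostly help with I/O-bound work. For number crunching like this you would more likely reach for a process pool, but the shape of the code stays the same.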

What are they good for

Map-reduce models are good at doing the same operations on large sets of data. If you need to analyze numerous files, do it with a map-reduce model. Any kind of data which you can safely split into chunks, you can process with map-reduce.

Problems with map-reduce

With this model, you don’t have that much control over the threads. You just tell the thread what to do, and, generally speaking, it’s the framework’s job to split the data. Additionally, all of your threads are applying the same operation to the data set.

Map-reduce is not that good at handling interactions from the outside. If you need to handle user input, maybe this is not the way to go.

Map-reduce usually implements the fork-join model, i.e. before the results can be combined, you need to wait for all the threads to finish. So if you have long tasks, you’re going to keep lots of resources busy. Moreover, if you have network calls inside the map thread and one (or more) of those requests hangs, you’re going to be in some trouble.
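One common way to soften the hanging-request problem is to put a time limit on each wait. The sketch below is just that, a sketch: slow_call is a hypothetical stand-in for a network request, and the delays and timeout are arbitrary numbers picked for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_call(seconds):
    # Hypothetical stand-in for a network request that answers after `seconds`.
    time.sleep(seconds)
    return seconds


def collect(delays, timeout):
    """Run one task per delay and wait at most `timeout` seconds for each result."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(delays)))
    futures = [pool.submit(slow_call, d) for d in delays]
    results = []
    for future in futures:
        try:
            results.append(future.result(timeout=timeout))
        except TimeoutError:
            # We stopped waiting, but the task itself keeps running in its thread.
            results.append(None)
    pool.shutdown(wait=False)  # do not block the caller on the stragglers
    return results


if __name__ == "__main__":
    print(collect([0.0, 1.5], timeout=0.5))
```

Note that the timeout only frees the waiting side. The stuck thread stays busy until the call inside it returns, so the request itself should have a timeout too.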

Where can I find this

Java parallel streams

Scala parallel collections

C++ OpenMP (albeit OpenMP offers more options than just map-reduce)

Promises

Oh boy! Promises have been my favorite toys for a while now. With this model you need to define a chain of actions that need to be performed on the data. Each action generates data that will be used by the next action and so on. After you do that, each action will be completed eventually (presumably on a thread that is not doing anything important), then the data is going to be passed to the next action and the cycle repeats.
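Here is a rough Python sketch of such a chain. Python’s concurrent.futures has no built-in “then”, so the then helper below is my own hypothetical glue built on add_done_callback, and the three steps are toy actions picked only for the demo.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def then(future, action):
    """Return a new future that will hold action(<result of `future`>)."""
    chained = Future()

    def on_done(done):
        try:
            # Runs on the thread that completed `done`, or right away on the
            # calling thread if `done` had already finished.
            chained.set_result(action(done.result()))
        except Exception as error:
            # A failure in an earlier step (or in this action) travels down the chain.
            chained.set_exception(error)

    future.add_done_callback(on_done)
    return chained


def run_flow(text):
    with ThreadPoolExecutor(max_workers=2) as pool:
        stripped = pool.submit(str.strip, text)       # first action
        shouted = then(stripped, str.upper)           # uses the first result
        excited = then(shouted, lambda s: s + "!")    # uses the second result
        return excited.result()                       # wait for the whole chain


if __name__ == "__main__":
    print(run_flow("  hello "))
```

The chain is declared up front and each step fires when its input is ready. JavaScript’s then and Java’s CompletableFuture.thenApply give you the same thing out of the box.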

What are they good for

Promises are a good way to parallelize a wide variety of independent flows. You have precise control over the actions in the flow and the order in which they are executed (inside the same flow at least). They can be returned from functions and that helps you keep your code clean and organized. Even more, you can bundle them together into bigger and more complex promises. You can pass any data to each of the actions in the flow and, at the same time, you can still do type checking on that data.

To me, the promises model is the most flexible model out there.

Problems with promises

They are hard to keep. Well, maybe not, it depends! However, the promises model is a bit hard to get the hang of.

Even though actions in a flow are executed in sequence, you have no control over the order of separate flows. This can be a problem.

Before you can use the data from the promise in the main thread you need to wait for the promises to complete. However, if you do lots of waits (and in the wrong order), you may in fact hurt the overall performance.
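One way around the wait-order problem is to not pick an order at all and handle results as they arrive. A small sketch, assuming Python’s concurrent.futures; fetch and the job names are hypothetical, and the sleeps stand in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(name, seconds):
    # Hypothetical job: pretend to work for `seconds`, then return its name.
    time.sleep(seconds)
    return name


def completion_order(jobs):
    """Return job names in the order they finished, not the order they were submitted."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(fetch, name, seconds) for name, seconds in jobs]
        # as_completed yields each future as soon as it is done, so a slow job
        # submitted first does not hold up the handling of the fast ones.
        return [future.result() for future in as_completed(futures)]


if __name__ == "__main__":
    print(completion_order([("slow", 0.5), ("fast", 0.0)]))
```

Had we called result() on the futures in submission order instead, the main thread would have sat on “slow” while “fast” was already done.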

Where can I find this

JavaScript promises

Scala futures

Java futures

 

Async programming models part I

Asynchronous programming is everywhere around us and it’s here to stay. You might have noticed that CPUs keep getting more cores as time goes by. Therefore, it would be a shame to let all those cores go unused in your app. In this article I shall be your guide through the various async programming patterns that I have used.

Plain old threads

These are the most powerful tools at your disposal. A kind of all-purpose Swiss Army knife that you can do everything with. The basic pattern is that you have a function that is going to run on a separate thread. You pass in any data that the thread needs and you are good to go. Every pattern we will cover here can be reduced to this model (because this is what computers understand).
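The basic pattern looks roughly like this in Python. Treat it as a sketch: count_words and count_all are made-up names, and the “results list with one slot per thread” is just one simple way to get data back out.

```python
import threading


def count_words(text, results, index):
    # Runs on a separate thread. Each thread writes only to its own slot,
    # so the threads never touch the same element.
    results[index] = len(text.split())


def count_all(texts):
    results = [None] * len(texts)
    threads = [
        threading.Thread(target=count_words, args=(text, results, i))
        for i, text in enumerate(texts)
    ]
    for thread in threads:
        thread.start()  # pass the data in and let the thread go
    for thread in threads:
        thread.join()   # wait for every thread before reading the results
    return results


if __name__ == "__main__":
    print(count_all(["one two", "three four five", ""]))
```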

What are they good for

Plain threads are good for long, custom tasks that you need to run in the background. If you want to build a server that handles multiple requests at the same time, then this is the way to go.

The problem with plain old threads

Although they are very powerful tools, threads can be a bit hard to use. You need to make sure you are passing the right data. Then you need to make sure you are getting the right data from the thread. And you need to do all of this while making sure you are not causing any synchronization issues. That is not as easy as it seems. All the patterns share the synchronization issues.
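The classic synchronization issue is several threads updating the same value. A minimal sketch of the usual fix, a lock around the shared state, could look like this (Counter and run are hypothetical names for the demo):

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "Read, add one, write back" is several steps; the lock makes sure
        # only one thread at a time performs them.
        with self._lock:
            self.value += 1


def run(num_threads=4, increments=10_000):
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

Without the lock, two threads can read the same old value and one of the updates gets lost.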

Additionally, you need to decide on the number of threads to use and this is not a trivial task.

Where can I find this

All programming languages worth a damn provide some sort of API you can use for this.

Queue of events

In this async programming model you begin with a bunch of threads that idly wait for something to happen. You also have a queue of events that keeps track of what happens. When something happens, one of the threads picks up the work (usually the first idle thread). Then it does the work. Then it goes back to twiddling its thumbs until something new happens.
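A bare-bones version of this model, sketched in Python with queue.Queue. The event names, the “upper-case it” handler and the STOP marker are all made up for the demo; a real app would do actual work there.

```python
import queue
import threading

STOP = object()  # hypothetical marker telling a worker to shut down


def worker(events, handled, lock):
    while True:
        event = events.get()  # idle here until something happens
        if event is STOP:
            break
        result = event.upper()  # stand-in for the real handling of the event
        with lock:
            handled.append(result)


def process(event_names, num_workers=3):
    events = queue.Queue()
    handled = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(events, handled, lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for name in event_names:
        events.put(name)  # something happened: queue it up
    for _ in workers:
        events.put(STOP)  # one stop marker per worker
    for w in workers:
        w.join()
    return sorted(handled)  # sorted, because the finishing order is not fixed


if __name__ == "__main__":
    print(process(["click", "request", "timer"]))
```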

What are they good for

You should use this pattern when you need to respond to something happening in the world. These events can be HTTP requests if you’re building a server, user actions or other threads’ work.

Threads can be added or removed from the pool as needs demand. Most of the time this can be done automatically, so this is one less thing you need to worry about.

In simpler terms: if your app can be described as “When this happens, then this should happen!”, you can use this pattern.

Problems with a queue of events

You have significantly less control over the threads than you normally would. Additionally, you need to add all the info required by the thread in the event data. From time to time, this can be tricky.

The queue itself can be a problem. Events can keep piling up during busy times or if there is a deadlock with the threads. If the queue grows too big, events can either be dropped (causing loss of data) or they can block the system (causing much worse damage).
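The drop-or-block choice shows up directly in the code. Here is a small sketch with Python’s queue.Queue and a deliberately tiny maxsize; offer and demo are hypothetical helpers, not part of any library.

```python
import queue


def offer(events, event):
    """Try to enqueue without blocking; return False if the event was dropped."""
    try:
        events.put_nowait(event)
        return True
    except queue.Full:
        # The queue is at capacity: we drop the event (data loss) rather than
        # call events.put(event), which would block the producer instead.
        return False


def demo():
    events = queue.Queue(maxsize=2)  # tiny limit so the queue fills up quickly
    # Nobody is consuming, so only the first two events fit.
    return [offer(events, n) for n in range(4)]


if __name__ == "__main__":
    print(demo())
```

Which of the two is the lesser evil depends on the app: a dropped click is usually fine, a dropped payment is not.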

Where can I find this

This is all for today. Map-reduce and promises coming soon. Are there any async patterns you want me to cover?