Databases are everywhere and they’re here to stay. If you’re a bit familiar with relational databases you are probably familiar with transactions. Transactions are pretty powerful tools when working with databases. They allow multiple users to play nicely with each other while working on the same database. However, with more and more users connecting to the same database, you’re bound to run into performance issues sooner or later.
Powerful as they are, transactions are a double edge sword. Used incorrectly they can, and will eat up your database’s resources. Transaction isolation levels, are a way to fine-tune the I (Isolation) bit of ACID (Atomicity, Consistency, Isolation, Durability). You can increase concurrency if you are willing to make a compromise about what Isolated actually means.
Table of Contents
First things first, what can possibly go wrong ? Assuming that each statement abides by the ACID principles (which they are), when multiple statements are run by different users at the same time, you can run into issues such as:
- Dirty reads – Data is being read while in the process of being updated. For example: Bob changes a row a part of a transaction. Alice reads that row and gets the data that is saved in the database. Bob then rolls back the transaction. In this case Alice has data that technically never existed.
- Non-repeatable reads – Subsequent reads don’t return consistent results. For example: Alice starts a transaction that reads data on a row. Bob changes that row and commits the changes. Alice, while in the same transaction, reads the same row again and gets a new value for the data.
- Phantom reads – Subsequent select statements may return different results. For example: Alice runs a select statement with a where clause that returns multiple rows. Bob inserts a row that fits with the where clause of the statement that Alice ran previously and commits the result. Alice runs the same select statement that she previously ran, and she now gets new rows as a result.
We saw what can go wrong, now let’s see how we can defend against it. Firstly, the tools: database locks. Once a transaction acquires a lock on a resource, they can limit the access other transactions have on that same resource by preventing other transactions from acquiring their locks. This, combined with the fact that each transaction needs to acquire the appropriate lock before they make operations, gives us a pretty comprehensive and mechanism that we can play with.
Different databases have different models for locks, but they share common patterns:
- Row level: Locks are placed on a single table row
- Table level: Locks are placed on the table
- Page level: This is an intermediary lock (between row and table) and applies to a page – a subset of the table rows, typically, the amount of data that can be read from the disk in a single disk operation.
- Database level: As you can imagine, a lock that applies to the whole database.
- Shared locks, also known as read locks. A shared lock on a resource is usually requested when a transaction wants to read data (i.e. select operations). Multiple transactions can acquire shared locks on a resource, meaning that multiple transactions can read data at the same time.
- Exclusive locks, also known as write-locks. An exclusive lock can only be acquired by a single transaction at a time. If there is another transaction who has a lock on the resource (shared or exclusive), the transaction will wait for the other transaction to release the locks. While a transaction holds an exclusive lock, no other transactions can acquire any locks on the resource (shared or exclusive)
- Update locks, are a more relaxed version of the exclusive lock. An update lock can be acquired on an object that has a shared lock on it. When the transaction is ready to update the object, it will convert the update lock into an exclusive lock. Please note that update locks can be acquired an object that has a shared lock on it, but a shared lock cannot be acquired on an object that has an update lock.
Transaction isolation level
Transactions make use of these locks to implement different isolation levels. At a really high level, isolation levels are a way to balance concurrency with isolation. The higher the isolation level, the more restrictive locks are going to be, thus the more transactions will have to wait to acquire said locks, which will result in a lower total number of transaction. As they say, there’s no such a thing as a free lunch. These are the transaction isolation levels:
This is the most relaxed isolation level. It allows the current transaction to read data that hasn’t been committed by other transactions yet. Transactions don’t acquire any locks for resources. This can potentially result in reading that that is in the middle of modified by other transactions.
The transaction can only read data that has been committed. The transaction acquires shared locks when reading resources and it releases them as soon as the read statement is complete. This means that we are only read data that has completed updating, but since we’re releasing the lock after each instruction, we can get different results on subsequent reads.
The transaction acquires a shared lock when it wants to read a resource and keeps the locks to the end of the transaction. As we keep shared locks throughout the transaction, we ensure that all reads return the same data, since we are keeping other transactions from modifying the data until the transaction ends.
* This is available in SQL Server.
All the reads inside the transaction return the data as it was available at beginning of the transaction. It’s as if the transaction takes a snapshot of the data at the beginning and uses that throughout the transaction. It’s functionally similar to the Serializable level described below, but it makes use of a slightly different mechanism to achieve that.
This is the highest isolation level available. The transaction acquires read locks for resources that need to be read and keeps them for the duration of the transaction, thus ensuring that data, once read, it stays the same for the duration of the transaction. It also acquires exclusive locks when updating the data and releases them at the end of the transaction. Additionally, the transaction places row range locks on of any rows matching conditions in the current transaction, so that we avoid phantom reads.
To summarize, every time you use a transaction in your code, you should take a moment and consider the appropriate isolation level that is needed. You can just put everything as serializable and call it a day, but in most cases, such a strong restriction is not needed and you would only be introducing unnecessary delays into your database. For most cases, a choice between Read Committed and Repeatable Reads would be enough. Keep this table in your mind as it should help you decide. Always use the most relaxed isolation level your app can afford.
|Dirty Reads||Non-Repeatable Reads||Phantom Reads|