My Two Cents on Postgres Isolation Levels
tldr
- PostgreSQL is awesome
- read committed takes a new snapshot for every SQL statement
- repeatable read only has one snapshot
- repeatable read fails when the row it tries to modify was changed by other transactions
- repeatable read hangs -> wakes up -> dies when the row whose lock it’s waiting on is updated by another transaction
- serializable == repeatable read + serializable snapshot isolation
- serializable handles read/write conflict cases
- transactions are nodes in precedence graph
- t1 reads row1 and t2 writes row1 means t1 -> t2 in the precedence graph
- PostgreSQL serializable is optimistic
Overview
I always had a vague understanding of database isolation levels, and recently I discovered that I had a major misunderstanding of them. I talked to Gemini for hours and “I think” I got it. This post is to verify that I actually got it, with examples.
This post is specific for PostgreSQL.
When you go to the official PostgreSQL Transaction Isolation documentation, it throws a lot of concepts at you, and (at least for me) it was very confusing.
For SQL there is a standard, and isolation levels are defined by that standard. The standard defines each isolation level by which “phenomena” it allows. This is like C++: there is a standard, and each compiler implements it differently.
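The table from the official docs looks roughly like this:

| Isolation Level | Dirty Read | Nonrepeatable Read | Phantom Read | Serialization Anomaly |
|---|---|---|---|---|
| Read uncommitted | Allowed, but not in PG | Possible | Possible | Possible |
| Read committed | Not possible | Possible | Possible | Possible |
| Repeatable read | Not possible | Not possible | Allowed, but not in PG | Possible |
| Serializable | Not possible | Not possible | Not possible | Not possible |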
Honestly I sometimes still get confused by this table. I think the main reason is I never truly internalized the “phenomena”, and it’s also very hard to relate them to my day to day problems. In this post I’m going to try to explain things the way I wish someone had explained them to me.
For now the things to understand are
- There is a SQL Standard about isolation levels. It is defined by what phenomena is allowed at each level.
- Each database might implement it differently. Some might guarantee more than what the standard asks for. Also, the standard doesn’t say whether the implementation should use optimistic concurrency control or pessimistic concurrency control.
Optimistic Concurrency Control vs Pessimistic Concurrency Control
Before we go further I think it’s nice to understand Optimistic Concurrency Control (OCC) vs Pessimistic Concurrency Control (PCC). In my understanding, OCC is where you are optimistic that there won’t be many conflicts, and if a conflict happens, you simply die and retry. PCC is where you are pessimistic and believe that conflicts are going to happen a lot, so you use locks to sort things out. When there is a conflict, you just wait for your turn.
The mistake I made was that I thought PostgreSQL’s Serializable was pessimistic (it’s optimistic).
Example
Assume there is a shared bathroom in a building. An optimistic concurrency control scenario would be: there aren’t many people in the building, so we don’t even put a lock on the bathroom door, and anyone can just walk in. Most of the time there won’t be anyone in there, so you just walk in and use the bathroom. If you get unlucky and open the door while someone else is in there, you come out (abort) and retry later.
Now assume that there are a lot of people in the building, so it is highly likely that while you are using the bathroom someone else wants to use it as well. We would add a lock to the bathroom door and keep the key at the front desk; anyone who wants to use the bathroom has to wait at the front desk for the key, then go to the bathroom. This is pessimistic concurrency control.
Now that we have an understanding of optimistic and pessimistic concurrency control, let’s have a look at PostgreSQL’s isolation levels.
Let’s set up the environment
Right now postgres 18.3 is the latest version, so let’s just use that.
Run docker pull postgres:18.3
Then let’s spin up the database with
docker run --name pg_isolation_test \
-e POSTGRES_USER=testuser \
-e POSTGRES_PASSWORD=testpass \
-e POSTGRES_DB=isolation_db \
-p 5432:5432 \
-d postgres:18.3
You can use docker ps to confirm that postgres is up and running
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
883e64ddfcbc postgres:18.3 "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5432->5432/tcp, [::]:5432->5432/tcp pg_isolation_test
Now open two terminals. In both run docker exec -it pg_isolation_test psql -U testuser -d isolation_db
In any of them let’s use the following schema and test data
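Something like this works (a minimal sketch; the exact types don’t matter, as long as there is an account table where Alice starts with a balance of 100):

```sql
-- Test table and data for all the examples below
CREATE TABLE account (
    name    TEXT PRIMARY KEY,
    balance INT NOT NULL
);

INSERT INTO account (name, balance) VALUES
    ('Alice', 100),
    ('Bob',   100);
```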
Now we should have the test table, test data, and two connections to try out things.
How to reason about transactions on paper
I found the easiest way to reason about transactions on paper is to use a two-column table, where each column represents a single transaction and time goes from top to bottom. You can use Excel or Google Sheets to try this out.
Given our test table, assume we are running the following SQL.
Two transactions trying to withdraw 100 from Alice’s account.
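Both transactions run the same withdrawal, along these lines (a sketch against the schema above):

```sql
BEGIN;
-- Withdraw 100, but only if the balance covers it
UPDATE account SET balance = balance - 100
WHERE name = 'Alice' AND balance >= 100;
COMMIT;
```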
Would the SQL above drop the balance in the database to below 0? Before we go and work it out on paper, I’d like to touch on one more thing.
Postgres uses MVCC, and the golden rule of MVCC is
Readers do not block Writers, and Writers do not block Readers. But Writers ALWAYS block Writers
When you run UPDATE, DELETE, or SELECT ... FOR UPDATE, they all take an exclusive row lock.
This is independent of the isolation level.
Also, the lock is only released when the transaction ends (COMMIT; or ROLLBACK;).
Read Committed Scenario On Paper: Tx1 and Tx2 both update before commit
See the table below and let’s walk through it
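| Time | Tx1 | Tx2 |
|------|-----|-----|
| T1 | BEGIN; | |
| T2 | | BEGIN; |
| T3 | UPDATE account SET balance = balance - 100 WHERE name = 'Alice' AND balance >= 100; | |
| T4 | | UPDATE account SET balance = balance - 100 WHERE name = 'Alice' AND balance >= 100; (blocked) |
| T5 | COMMIT; | |
| T6 | | COMMIT; |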

- At T3, Tx1 UPDATEs the balance to 0, and Tx1 locks Alice’s row because it is doing an UPDATE
- At T4, Tx2 is blocked by the row lock
- At T5, Tx1 commits and Alice’s balance is set to 0. Tx1 also releases the lock. This unblocks Tx2, which re-evaluates the WHERE clause; since the balance is now 0 the WHERE clause fails, so it updates 0 rows
- At T6, Tx2 commits; nothing interesting happens since it updated 0 rows
So after this Alice’s balance is 0, and the records in the database are still correct.
Read Committed Scenario On Paper: Tx1 commits before Tx2’s update
Let’s let Tx1 commit before Tx2’s statement runs
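| Time | Tx1 | Tx2 |
|------|-----|-----|
| T1 | BEGIN; | |
| T2 | | BEGIN; |
| T3 | UPDATE account SET balance = balance - 100 WHERE name = 'Alice' AND balance >= 100; | |
| T4 | COMMIT; | |
| T5 | | UPDATE account SET balance = balance - 100 WHERE name = 'Alice' AND balance >= 100; (matches 0 rows) |
| T6 | | COMMIT; |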

- At T3, Tx1 sets the balance to 0 and locks Alice’s row
- At T4, Tx1 commits and releases Alice’s row lock
- At T5, Tx2 sees the balance is 0 (since the isolation level is READ COMMITTED), so the WHERE clause fails. Note that since the WHERE clause fails, Tx2 does not lock any rows
- At T6, Tx2 commits but nothing changed
So after this Alice’s balance is 0, and the records in the database are still correct.
How to try out transactions with a test database
No matter how much you convince yourself on paper, without trying it out there is still a chance that what you believe is wrong. I can’t tell you how many times I was dead sure about something and it turned out to be wrong.
So let’s try to verify the above in our testing environment.
The Real Deal: Read Committed, Tx1 and Tx2 both update before commit
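A sketch of what the two psql sessions look like (the comments show what you should see):

```sql
-- Terminal 1 (Tx1)
BEGIN;
UPDATE account SET balance = balance - 100
WHERE name = 'Alice' AND balance >= 100;   -- UPDATE 1

-- Terminal 2 (Tx2)
BEGIN;
UPDATE account SET balance = balance - 100
WHERE name = 'Alice' AND balance >= 100;   -- hangs on Tx1's row lock

-- Terminal 1 (Tx1)
COMMIT;   -- Tx2 wakes up, re-evaluates the WHERE clause, and reports UPDATE 0

-- Terminal 2 (Tx2)
COMMIT;
```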
The Real Deal: Read Committed, Tx1 commits before Tx2’s update
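Same idea, but Tx1 commits before Tx2 runs its UPDATE (a sketch; reset Alice’s balance to 100 between runs):

```sql
-- Either terminal: reset from the previous run
UPDATE account SET balance = 100 WHERE name = 'Alice';

-- Terminal 1 (Tx1)
BEGIN;
UPDATE account SET balance = balance - 100
WHERE name = 'Alice' AND balance >= 100;   -- UPDATE 1
COMMIT;

-- Terminal 2 (Tx2)
BEGIN;
UPDATE account SET balance = balance - 100
WHERE name = 'Alice' AND balance >= 100;   -- UPDATE 0, sees the committed balance of 0
COMMIT;
```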
The Four Isolation Levels in Postgres
Read Uncommitted
This is in the SQL standard, but PostgreSQL does not really implement it: even if you set the transaction isolation level to Read Uncommitted, it behaves as Read Committed due to MVCC. Honestly I don’t have enough knowledge of MVCC yet, but I could probably write something about it later when I learn more.
For now I’ll put the “golden rule of MVCC” which I have above here again for emphasis.
Readers do not block Writers, and Writers do not block Readers. But Writers ALWAYS block Writers
Read Committed
Say you have a transaction with multiple SELECTs and the isolation level is READ COMMITTED. If anything is committed by another transaction between your SELECTs, it will show up in your upcoming SELECTs.
Every statement uses a new snapshot of the database.
[Gemini Review] Nuance: Query vs. Statement. While we say “every statement”, technically Postgres takes a new snapshot at the start of every query. For simple commands they are the same. But if you run a complex query with CTEs (Common Table Expressions), or a query that calls a VOLATILE function multiple times, that entire query shares a single snapshot.
Within your own transaction you can always see your own uncommitted changes.
Single SQL statements all default to READ COMMITTED. Say you run UPDATE account SET balance=100 WHERE name='Alice'; by default it runs as READ COMMITTED. To change the session default to, say, SERIALIZABLE, you would run SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
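For example:

```sql
SHOW default_transaction_isolation;   -- read committed
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SHOW default_transaction_isolation;   -- serializable
```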
The actual tests also show how the UPDATE in our transaction got the locks
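If you want to peek at the blocking yourself, run something like this from a third session while Tx2 is hanging (a sketch; pg_blocking_pids shows which backend holds the lock we wait on):

```sql
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE datname = 'isolation_db' AND pid <> pg_backend_pid();
```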
When would using READ COMMITTED isolation level in postgres cause issues?
READ COMMITTED is a noice default, but there are two cases where it falls short.
Application relying on stale data
When the application logic stores information in memory, and instead of using SQL WHERE clauses it uses a “stale” in-memory value to decide whether it can update or not.
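A sketch of that buggy pattern in Go (withdraw100 is the name used below, and tx.Exec matches the snippet in the fix list; the rest, including the use of pgx, is an assumption for illustration):

```go
package bank

import (
	"context"
	"errors"

	"github.com/jackc/pgx/v5"
)

// withdraw100 reads the balance into application memory, checks the
// stale value, then writes a balance computed from that stale value.
func withdraw100(ctx context.Context, tx pgx.Tx) error {
	var balance int
	if err := tx.QueryRow(ctx,
		"SELECT balance FROM account WHERE name = 'Alice'").Scan(&balance); err != nil {
		return err
	}
	if balance < 100 {
		return errors.New("insufficient funds")
	}
	// By the time this runs, another transaction may have already
	// withdrawn using the same stale balance of 100.
	_, err := tx.Exec(ctx,
		"UPDATE account SET balance = $1 WHERE name = 'Alice'", balance-100)
	return err
}
```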
Here imagine two withdraw100() calls were going on at the same time before either committed; both of them would see a balance of 100, withdraw the money, and set the new balance to 0. We allowed 200 to be withdrawn, but the system shows the balance only went from 100 to 0.
To fix the above we could
- Do SELECT ... FOR UPDATE and lock the row: pessimistic concurrency control (see the sketch after this list).
- Add a balance check to the UPDATE’s WHERE clause to make sure the balance is still 100: tx.Exec(ctx, "UPDATE account SET balance = $1 WHERE name = 'Alice' AND balance=$2", newBalance, currentBalance). If 0 rows were updated, we retry. This is optimistic concurrency control.
- Use REPEATABLE READ. We will go through the details later, but the gist is that the transaction fails if any of the following is true:
  - Rows our transaction is writing to differ from the snapshot taken at the beginning. (Technically in REPEATABLE READ the snapshot is taken when the first non-transaction-control statement executes, i.e. the first SELECT/UPDATE/DELETE etc.; BEGIN doesn’t take the snapshot.)
  - Rows our transaction is writing to are locked by another transaction that then commits a change.
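The pessimistic fix from the first bullet looks like this (a sketch):

```sql
BEGIN;
-- Lock Alice's row; concurrent withdrawals queue up behind this lock
SELECT balance FROM account WHERE name = 'Alice' FOR UPDATE;
-- ... the application checks the balance it just read ...
UPDATE account SET balance = balance - 100 WHERE name = 'Alice';
COMMIT;
```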
When SQL is doing statistics/aggregation calculation
Say in our example we want to calculate the SUM and AVG of the balance in the account table. Depending on how you write the SQL you might run into issues.
If you run ONE SQL statement to get the statistics, it will always be internally consistent, even if there are 1 million rows: in postgres a single SELECT statement runs against a snapshot taken when the query starts, so even under READ COMMITTED you won’t see changes other transactions commit mid-statement.
However, if you run two SELECTs under READ COMMITTED, you can run into issues.
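For example:

```sql
-- One statement: internally consistent even under READ COMMITTED
SELECT SUM(balance), AVG(balance) FROM account;

-- Two statements: another transaction can commit between them, so the
-- SUM and the AVG may describe two different states of the table
SELECT SUM(balance) FROM account;
SELECT AVG(balance) FROM account;
```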
Repeatable Read
Repeatable Read in postgres falls into optimistic concurrency control, meaning if you use this isolation level your transaction might fail and you would need to be ready to rerun it.
When transactions run in REPEATABLE READ postgres takes a snapshot at the beginning of the transaction.
A REPEATABLE READ transaction will fail if any of the following is true
- Rows our transaction is writing to differ from the snapshot taken at the beginning. (Technically in REPEATABLE READ the snapshot is taken when the first non-transaction-control statement executes, i.e. the first SELECT/UPDATE/DELETE etc.; BEGIN doesn’t take the snapshot.)
- Our transaction is waiting for a lock on a row; the moment the other transaction commits its change, our transaction wakes up and dies. Note that even if the other transaction committed the same value we have in our snapshot, we still die.
If multiple transactions collide on the exact same row under REPEATABLE READ, it becomes a “One Winner, All Others Lose” scenario.
It is important to remember that a Repeatable Read transaction fails when it tries to modify a row and that row was changed since the snapshot.
Below I’m listing a few properties of repeatable read.
- The snapshot is taken when the first non-transaction-control statement executes
- The snapshot covers the entire database, even if you only selected 1 row (note that this is an illusion, but you can think of it as the entire database in your mind)
- Reads in a repeatable read transaction won’t be affected by other transactions’ commits
- Postgres repeatable read won’t see rows inserted after the snapshot is taken
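A quick demo of these properties with our account table (a sketch; assume Alice starts at 100):

```sql
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM account WHERE name = 'Alice';   -- 100, snapshot taken now

-- Session 2 (autocommit)
UPDATE account SET balance = 50 WHERE name = 'Alice';

-- Session 1
SELECT balance FROM account WHERE name = 'Alice';   -- still 100
COMMIT;
SELECT balance FROM account WHERE name = 'Alice';   -- now 50
```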
After the properties, let’s check out the cases where your Repeatable Read transaction would abort.
- In the repeatable read transaction, you write to a row, but that row was already changed and committed by another transaction after your snapshot.
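A sketch of that abort:

```sql
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM account WHERE name = 'Alice';   -- snapshot taken

-- Session 2 (autocommit): changes and commits the same row
UPDATE account SET balance = 50 WHERE name = 'Alice';

-- Session 1: writes to a row that changed since its snapshot
UPDATE account SET balance = 0 WHERE name = 'Alice';
-- ERROR:  could not serialize access due to concurrent update
```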
- In the repeatable read transaction, you are waiting for the lock on a row, and that row is changed by the other transaction. As long as the other transaction updated the row (the value doesn’t even need to differ from the snapshot; it could be updated to the same value), it kills the waiting transaction. In short: it hangs -> wakes up -> dies.
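A sketch of hang -> wakeup -> dies (assume Alice’s balance is 100 going in):

```sql
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM account WHERE name = 'Alice';   -- snapshot taken, 100

-- Session 2: takes the row lock, writing the SAME value
BEGIN;
UPDATE account SET balance = 100 WHERE name = 'Alice';

-- Session 1: blocks on Session 2's row lock
UPDATE account SET balance = 0 WHERE name = 'Alice';

-- Session 2
COMMIT;
-- Session 1 wakes up and immediately dies:
-- ERROR:  could not serialize access due to concurrent update
```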
Where Repeatable Read passes but it’s wrong
Let’s have a look at a case where repeatable read passes but the result isn’t correct. The famous example is the on-call doctor example. Note that we are not using database constraints. Given the following table and data
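Something like this (a sketch; the column names are my choice):

```sql
CREATE TABLE doctors (
    name    TEXT PRIMARY KEY,
    on_call BOOLEAN NOT NULL
);

INSERT INTO doctors (name, on_call) VALUES
    ('Alice', true),
    ('Bob',   true);
```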
Right now both Alice and Bob are on call. The business rule is: at any given time there must be at least one doctor on call.
Now let’s see the following scenario
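Roughly this (the T numbers match the precedence graph walkthrough in the serializable section below):

| Time | Tx1 | Tx2 |
|------|-----|-----|
| T1 | BEGIN ISOLATION LEVEL REPEATABLE READ; | |
| T2 | | BEGIN ISOLATION LEVEL REPEATABLE READ; |
| T3 | SELECT count(*) FROM doctors WHERE on_call; -- 2 | |
| T4 | | SELECT count(*) FROM doctors WHERE on_call; -- 2 |
| T5 | -- 2 on call, so Alice may leave | |
| T6 | | -- 2 on call, so Bob may leave |
| T7 | UPDATE doctors SET on_call = false WHERE name = 'Alice'; | |
| T8 | | UPDATE doctors SET on_call = false WHERE name = 'Bob'; |
| T9 | COMMIT; | |
| T10 | | COMMIT; |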
This would pass under repeatable read, but the final result is that no doctor is on call.
The reason it passes under repeatable read is that the two transactions wrote to different rows, and neither of those rows changed after the snapshots were taken.
- Tx1 wrote to Alice’s row which passes the snapshot check
- Tx2 wrote to Bob’s row which passes the snapshot check
We’ll see how serializable can help to solve this.
Serializable
I’ve always been confused about the word “Serializable” and why/how is it related to database isolation levels.
The strict definition is
A schedule of concurrent transactions is “Serializable” if its final outcome (the state of the data) is mathematically identical to the outcome that would have been produced if those exact same transactions had been run strictly one at a time, in some sequential order.
In other words: as long as there exists one sequential order of the transactions in play that could produce the actual final result, the transactions are serializable.
This is very important. Assume we have two transactions, Tx1 and Tx2. When they run concurrently there might be two possible final results, resultA and resultB. Assume resultA can be achieved by running Tx1 and Tx2 in some specific order, but resultB cannot be achieved by any sequential order of Tx1 and Tx2. Then the transactions pass when the final result is resultA and fail when the final result is resultB. Meaning the same transactions running under SERIALIZABLE may or may not trigger a serialization error, depending on how they actually interleave.
Now let’s (1) look at a real example and (2) think about how this confusing definition can help us in practice, e.g. how would I know when to mark my isolation level serializable?
For (1), let’s check the doctor example again
There are two transactions
- Tx1, which takes Alice off call only if at least one other doctor remains on call.
- Tx2, which takes Bob off call only if at least one other doctor remains on call.
The final result is
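| name | on_call |
|------|---------|
| Alice | false |
| Bob | false |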
Had we run these transactions one by one can we achieve this result?
The answer is no
- If we run Tx1 first and commit, Tx2 won’t let Bob go off call because he is the last doctor on call.
- If we run Tx2 first and commit, Tx1 won’t let Alice go off call because she is the last doctor on call.
So here we see that the actual final result is a state that no possible sequential execution of Tx1 and Tx2 could achieve, hence these two transactions are not serializable.
When these two transactions are run under serializable, the first one to commit passes and the other one won’t be able to commit.
[Gemini Review] Nuance: Who actually gets killed? Saying “the first one to commit wins” is a simplification. Under the hood, Postgres’s SSI attempts to abort the transaction that causes the least amount of wasted work. It specifically looks to kill the “pivot” transaction in a conflict. Depending on the timing of locks and reads, Postgres might abort the first transaction that tries to commit, or it might kill a transaction mid-flight before a commit is even attempted.
Let’s see this in action so you know I’m not capping.
The transaction that tries to remove Alice from on call
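A sketch, assuming the doctors table above:

```sql
-- Session 1 (Tx1)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;   -- 2, so Alice may leave
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;   -- commits first, so it wins
```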
The transaction that tries to remove Bob from on call
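Interleaved with the Alice transaction before its COMMIT (again a sketch):

```sql
-- Session 2 (Tx2): run BEGIN/SELECT/UPDATE before Session 1's COMMIT
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;   -- 2, so Bob may leave
UPDATE doctors SET on_call = false WHERE name = 'Bob';

-- After Session 1 has committed:
COMMIT;
-- ERROR:  could not serialize access due to read/write dependencies among transactions
```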
This feels like “some sort of cycle happened” and postgres caught it. The following is a way for you to look at a few transactions and determine in your mind (or on paper) whether the transactions are serializable in postgres. Just bear with me and I promise I’ll explain it later.
- For every ongoing transaction, create a node in the graph
- If a transaction Tx1 reads from a row, and another transaction Tx2 later writes to the same row, draw a directed edge from Tx1 to Tx2
- If a cycle forms in the graph, then these transactions are not serializable
Let’s walk through the doctor on call example on paper.
- At T2, there exist two serializable transactions in the system, so we draw two nodes in the graph. No edges yet
- At T3, Tx1 reads two rows, the Alice row and the Bob row. No edges yet
- At T4, Tx2 reads two rows, the Alice row and the Bob row. No edges yet
- At T7, Tx1 writes to the Alice row. We know Tx2 read the Alice row, so we draw a directed edge from Tx2 to Tx1. We have an edge, but no cycle yet
- At T8, Tx2 writes to the Bob row. We know Tx1 read the Bob row, so we draw a directed edge from Tx1 to Tx2. Now we have a cycle (Tx1 <-> Tx2), hence the transactions are not serializable
The graph we created above is called a Precedence Graph in database theory. The actual technique postgres uses to implement serializable is Serializable Snapshot Isolation, also known as SSI.
[Gemini Review] Deep Dive: Dangerous Structures and False Positives. While thinking about “cycles” is the mathematically correct way to understand serializability, actually traversing a massive graph of concurrent transactions to find cycles is too slow for a database. Instead, Postgres looks for a specific heuristic called a “dangerous structure” (specifically, two consecutive read-write conflicts: T1 -rw-> T2 -rw-> T3). Because this is a heuristic, Postgres will sometimes abort a transaction that matches this pattern even if a full cycle never actually formed. SSI trades perfect precision for performance, meaning it occasionally yields “false positives” and kills perfectly safe transactions.
One other way to think about serializable in postgres is
In postgres
Serializable == REPEATABLE READ + Serializable Snapshot Isolation
Hope you haven’t forgotten about (2), “when do I actually need serializable?”.
The answer: whenever you need to read and then write based on the result of that read, you should consider using serializable. I say consider because a lot of problems can be solved with locks and constraints; you don’t always need to slap serializable on to achieve your goal.
SSI only works on transactions that run under the SERIALIZABLE isolation level
One thing to be careful about: the graph cycle check we just went through only applies when all transactions involved are running under the serializable isolation level. In the doctor on-call example, if one transaction runs in serializable and the other runs in repeatable read, we still end up in the no-one-on-call state.
The reason is that the read/write tracking and edge drawing only covers transactions running in the serializable isolation level. If a transaction is not running in serializable, it is invisible to SSI. Let’s see this in action.
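A sketch of the mixed-isolation run (reset both doctors to on_call = true first):

```sql
-- Session 1: SERIALIZABLE
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;   -- 2
UPDATE doctors SET on_call = false WHERE name = 'Alice';

-- Session 2: only REPEATABLE READ, so it is invisible to SSI
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call;   -- 2
UPDATE doctors SET on_call = false WHERE name = 'Bob';
COMMIT;

-- Session 1
COMMIT;   -- succeeds; now no doctor is on call
```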
ww, wr, rw
If you look at the actual error that was thrown when I was trying to commit the serializable transaction you would see
ERROR: could not serialize access due to read/write dependencies among transactions
Why does it say “read/write”? Why isn’t this “write/read”? Are there any implications of this?
Turns out there actually are. We just mentioned that the underlying theory of SSI is the Precedence Graph. In a Precedence Graph there are three cases where we draw an edge from one transaction to the other. The three cases are
- write/write: Tx1 writes a row first and then Tx2 writes the same row
- write/read: Tx1 writes a row first and then Tx2 reads the same row
- read/write: Tx1 reads a row first and then Tx2 writes the same row
Now there are three questions
1. What is the difference between read/write and write/read again?
2. Why isn’t there a read/read conflict?
3. Why does the algorithm above only draw an edge when a read/write happens?
For (1), the difference between read/write and write/read is the order. read/write means for the same row Tx1 read it first and then Tx2 wrote to it, whereas write/read means for the same row Tx1 wrote first and then Tx2 read from it.
The answer to (2) is kinda trivial: read/read does happen, but it won’t affect the final result, hence the transactions are by definition serializable.
Things are more interesting when we look at (3).
Postgres’s SSI only tracks read/write conflicts, because write/write and write/read are handled by other mechanisms.
Remember this one liner?
In postgres serializable == repeatable read + SSI
When you run a transaction under serializable, it has all the properties of repeatable read. And it turns out that repeatable read + locks can handle both write/write and write/read. This is the reason why SSI only cares about read/write.
Let’s look at write/write.
In postgres write/write is handled by locks; we’ve seen this in repeatable read’s hang -> wakeup -> dies:
In the repeatable read transaction, if you are waiting for the lock on a row and that row is changed by the other transaction, then as long as the other transaction updated the row (even to the same value as in the snapshot), it kills the waiting transaction. In short: it hangs -> wakes up -> dies.
Now let’s look at write/read.
A write/read happens when Tx1 first writes to a row and then Tx2 reads from the same row. But if you scroll up a bit, you’ll see that in repeatable read postgres takes one snapshot for the entire transaction. When Tx2 reads after Tx1 wrote, it actually reads from its snapshot, so whatever Tx1 wrote doesn’t matter to Tx2.
With that sorted out, all that is left for SSI to solve is read/write.
Hence we only draw a directed edge when a read/write happens.
Further things to think about
At this point we can reason about transactions with abstractions like “snapshot” and “precedence graph”, but all of these are hard to actually implement. There also exist many other questions worth learning about. I’ll probably try to learn them at some point, but for now I’m just listing them as bait for your curiosity.
- What exactly is MVCC? How exactly does it work in postgres?
- How does postgres actually get the entire snapshot for a single repeatable read transaction?
- How does postgres keep track of the precedence graph effectively?
- How does postgres do cycle checks of the precedence graph so fast?
- How do other databases handle repeatable read and serialization?
- For databases that do have Read Uncommitted (e.g. Microsoft SQL Server), why is that the case?
- What does MySQL or Microsoft SQL Server guarantee for repeatable read and serialization?
- How does MySQL or Microsoft SQL Server handle repeatable read and serialization? Can you see why/how the implementation relates to the guarantee they provide?