MHordecki
today at 1:44 PM
I’ve found this article lacking. Like some other articles in this space, it introduces isolation levels through the lens of the phenomena described in the SQL standard, but I find that there’s a different, more intuitive approach.
I think it’s more tractable to define this problem space starting from the concept of (strict) serializability, which is really a generalization of the concept of thread safety. Every software engineer has an intuitive understanding of it. Lack of serializability can lead to execution-dependent behavior, which usually results in hard-to-diagnose bugs. Thus, all systems should strive towards serializability, and the database can be a tool in achieving it.
Various non-serializable levels of database transaction isolation are relaxations of the serializability guarantee, where the database no longer enforces the guarantee and it’s up to the database user to ensure it through other means.
The isolation phenomena are a useful tool for visualizing various corner cases of non-serializability, but they are not inherently tied to it. It's possible to achieve serializability while observing all of the SQL phenomena. For example, a Kubernetes cluster with carefully-written controllers can be serializable.
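To make the "generalization of thread safety" framing concrete, here is a minimal sketch of a lost update, checked against the definition of serializability: an outcome is acceptable only if some one-at-a-time ordering of the same transactions could have produced it. Everything here is an assumption made for illustration (the `run_schedule` and `serial_outcomes` helpers, the `balance` key, the two transactions); it is a toy in-memory model, not how any particular database implements isolation.

```python
from itertools import permutations

def run_schedule(schedule, initial=100):
    """Execute (txn, op, amount) steps against a toy single-key store."""
    db = {"balance": initial}
    local = {}  # value each hypothetical transaction saw when it read
    for txn, op, amount in schedule:
        if op == "read":
            local[txn] = db["balance"]
        elif op == "write":
            # Write is based on the value read earlier, not the current one.
            db["balance"] = local[txn] + amount
    return db["balance"]

def serial_outcomes(txns, initial=100):
    """Final balances reachable by running the transactions one after another."""
    outcomes = set()
    for order in permutations(txns):
        schedule = []
        for name, amount in order:
            schedule += [(name, "read", 0), (name, "write", amount)]
        outcomes.add(run_schedule(schedule, initial))
    return outcomes

# Two hypothetical transactions, each adding to the same balance.
txns = [("T1", 10), ("T2", 20)]

# Both read before either writes: the classic lost update.
interleaved = [
    ("T1", "read", 0), ("T2", "read", 0),
    ("T1", "write", 10), ("T2", "write", 20),
]

lost_update = run_schedule(interleaved)  # T1's increment is overwritten
allowed = serial_outcomes(txns)          # what any serial order could yield
print(lost_update, allowed, lost_update in allowed)
```

The interleaved schedule ends on a balance that no serial order produces, which is the same shape of bug as two threads doing an unsynchronized read-modify-write.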
Author here. This is good feedback.
The combination of transactions, isolation levels, and MVCC is such a huge undertaking to cover all at once, especially when comparing how it's done across multiple DBs, which I attempted here. It's always a balance between technical depth, accessibility to people with less experience, and not letting it turn into an hour-long read.
libraryofbabel
today at 5:05 PM
I actually like this article a lot. I do a bit of teaching, and I imagined the ideal audience for this as a smart junior engineer who knows SQL and has encountered transactions but maybe doesn’t really understand them yet. I think introducing things via examples of isolation anomalies (which most engineers will have seen examples of in bugs, even if they didn’t fully understand them) gives the explanation a lot more concreteness than starting with serializability as a theoretical concept as GP is proposing. Sure, strict serializability is a powerful idea that ties all this together and is more satisfying for an expert who already knows this stuff. But for someone who is just learning, you have to motivate it first.
If anything, I’d say it might be better to start with the lower isolation levels first, highlight the concurrency problems that can arise with them, and gradually introduce higher isolation levels until you get to serializability. That feels a bit more intuitive than the downward progression from serializability to read uncommitted as presented here.
It also might be nice to see a quick discussion of why people choose particular isolation levels in practice, e.g. why you might make a tradeoff under high concurrency and give up serializability to avoid waits and deadlocks.
But excellent article overall, and great visualizations.
I love the work planetscale does on keeping this type of content accurate yet accessible. Keep it up!
https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster
More notation, more citations, more better.
Notation is useful. Citations are nice for further reading. But I don't agree that more of this makes for a better article!
peterclary
today at 2:33 PM
Looks like the author is geoblocking in protest of the UK Online Safety Act (and fair enough).
lateforwork
today at 3:08 PM
Most RDBMSs offer serializable isolation if you need it. Often you don't need it. The downside of using serializable isolation unnecessarily is reduced concurrency and throughput due to increased coordination between transactions.
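A rough sketch of where that coordination cost can come from, assuming an optimistic scheme in which a commit is rejected if anything else committed since the transaction read. `VersionedCell` and `run_round` are hypothetical names for a toy in-memory model, not any RDBMS's API, and real engines may use locking or other techniques instead.

```python
class VersionedCell:
    """Hypothetical in-memory value with a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, expected_version):
        # Reject the write if another transaction committed since our read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True

def run_round(cell, amounts):
    """Contending transactions read the same snapshot, then commit in turn.

    Transactions whose commit is rejected are retried in the next pass.
    Returns the total number of rejected commits.
    """
    retries = 0
    pending = list(amounts)
    while pending:
        snapshots = [(amt, *cell.read()) for amt in pending]
        pending = []
        for amt, value, version in snapshots:
            if not cell.commit(value + amt, version):
                retries += 1
                pending.append(amt)
    return retries

cell = VersionedCell(100)
retries = run_round(cell, [10, 20, 30])
print(cell.value, retries)
```

Every increment is applied, so the result matches a serial execution, but three transactions contending on one value needed three extra attempts. That repeated work is the throughput you give up.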
Yep. It's a wonderful capability to have for some situations, but for 90% of applications SERIALIZABLE isolation is overkill.
ignoramous
today at 7:49 PM
> concept of (strict) serializability [("S")], which is really a generalization of the concept of thread safety
Unsure why "strict" (L + S) is in parentheses: isn't linearizability ("L") what most resembles thread safety in SMP systems?
Then recommend a better explanation?