ACID: My Personal “C” Change
How could I miss such a simple thing?
For decades, I thought the “C” in transactional ACID was the weakest property. This just shows that I can be a dummy for 37 years.
When I look up “Consistency (database systems)” on Wikipedia, I find a fairly typical explanation of the “C” in ACID:
Consistency in database systems refers to the requirement that any given database transaction must change affected data only in allowed ways. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors cannot result in the violation of any defined database constraints.
This didn’t seem very concrete, so I’ve always been somewhat dismissive of the idea that it is equal in stature to Atomicity, Isolation, and Durability. In general, I figured the “C” characteristics described above were really just a restatement of isolation.
A couple of days ago, I had a chance to chat with my old friend, Andreas Reuter, the inventor of ACID. He and his PhD advisor, Theo Härder, coined the term in their famous 1983 paper, “Principles of Transaction-Oriented Database Recovery.”
Because transactions, in both theory and practice, had been one of my biggest passions since 1978, I vividly remembered the publication of this paper and the creation of the ACID concept. When I got the chance to meet Andreas in person in 1985, we became friends.
Lately, I’ve been thinking more about consistency and what it means across our many computer science communities. Since Andreas and I had planned to chat this past week, I thought I’d ask about his intentions in including the “C” in ACID. Beforehand, I pulled up the paper that introduced the “C” and read it again. Nothing new jumped out at me.
As Andreas and I caught up on our lives and on our thoughts about technology, I asked him about adding “Consistency” to the ACID test. Wasn’t it just warmed-over “Isolation”? He said he felt the application needed to control what was included in the transaction to ensure that the data stayed consistent with the application’s rules.
The “C” meant the application decided when the set of changes was complete. The database couldn’t stop taking changes until the app said so.
Hence, as the application wrote the set of changes, it could enforce constraints, cascades, triggers, and so forth. It could also enforce application-specific rules meaningful only within that application (for example, an airline’s special treatment of frequent flyers).
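To make that concrete, here is a minimal sketch, assuming Python’s built-in sqlite3 module and an invented airline booking schema (the table names, `make_db`, `book`, and the “last seat goes to a frequent flyer” rule are all hypothetical, not from Andreas or the paper). The CHECK constraint is a database-defined rule of the kind the Wikipedia entry describes; the frequent-flyer rule is one only the application knows about. The application opens the transaction, applies its changes, and decides for itself whether the set is complete and legal before committing.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None disables the module's implicit transactions,
    # so the application issues BEGIN/COMMIT/ROLLBACK itself.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE flight (
            id INTEGER PRIMARY KEY,
            -- Database-defined rule: the engine rejects any write that breaks it.
            seats_left INTEGER NOT NULL CHECK (seats_left >= 0)
        );
        CREATE TABLE booking (
            flight_id INTEGER NOT NULL,
            passenger TEXT NOT NULL,
            frequent_flyer INTEGER NOT NULL
        );
        INSERT INTO flight (id, seats_left) VALUES (1, 2);
        """
    )
    return conn


def book(conn: sqlite3.Connection, flight_id: int, passenger: str, frequent_flyer: bool) -> bool:
    """Book one seat; commit only if every rule holds, otherwise roll back."""
    conn.execute("BEGIN")
    try:
        (seats_left,) = conn.execute(
            "SELECT seats_left FROM flight WHERE id = ?", (flight_id,)
        ).fetchone()
        # Hypothetical application-specific rule the database knows nothing about:
        # the last seat is held for frequent flyers.
        if seats_left == 1 and not frequent_flyer:
            raise ValueError("last seat is reserved for frequent flyers")
        conn.execute(
            "UPDATE flight SET seats_left = seats_left - 1 WHERE id = ?", (flight_id,)
        )
        conn.execute(
            "INSERT INTO booking VALUES (?, ?, ?)",
            (flight_id, passenger, int(frequent_flyer)),
        )
    except (ValueError, sqlite3.IntegrityError):
        # Either rule failed: discard the whole set of changes.
        conn.execute("ROLLBACK")
        return False
    # The application, not the database, declares the set of changes complete.
    conn.execute("COMMIT")
    return True


if __name__ == "__main__":
    conn = make_db()
    print(book(conn, 1, "alice", frequent_flyer=False))  # True
    print(book(conn, 1, "bob", frequent_flyer=False))    # False: application rule
    print(book(conn, 1, "carol", frequent_flyer=True))   # True
    print(book(conn, 1, "dave", frequent_flyer=True))    # False: CHECK constraint
```

Note that the database alone would have happily accepted Bob’s booking; only the application knew it broke a rule, and only because the application controlled when the transaction ended could it refuse to commit.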
I was gobsmacked! For 37 years, I’d had it wrong! “C” is a powerful, simple, and important member of ACID. That simple additional rule, side by side with Atomic, Isolated, and Durable, DID allow for more cohesive semantics enforced by an application as it changed the database.
It was the definitions I’d seen for so many years, as typified by the Wikipedia entry cited above, that cast the “C” in a narrower light by focusing on the consequences of the actual rule and not on the rule itself.
But what of the original paper that introduced ACID in the first place? I had read it just minutes before chatting with Andreas! Looking at it (yet again) after our phone call, I clearly saw:
Consistency. A transaction reaching its normal end (EOT, end of transaction), thereby committing its results, preserves the consistency of the database. In other words, each successful transaction by definition commits only legal results…
So, even as I read the definition before asking Andreas, I had blinders on, shaped by almost four decades of viewing “C” through my assumptions. I couldn’t see the simplicity until Andreas said it so bluntly.
One big lesson for me: work hard to ALWAYS question your assumptions. Surround yourself with curious and passionate people, both young and old, who will challenge you and try to dislodge your blinders.
Thanks to my old friend Andreas, too!
I don’t get it. Doesn’t Atomicity guarantee that a transaction is executed to its natural end? And if the transaction is buggy, it would break application semantics regardless of what the database does. So what does C really do here?
I would like to add that if you look at the combination of an application and a database managed by a database engine as a single database system, then it all becomes obvious. The information system itself has a transaction, and that transaction should be consistent: consistent with the activity performed by the user, and with the business rules of the organization(s) that own(s) and operate(s) the information system (aka database system).
This terminology perspective could also be extended to distributed systems (which could just be looked at as systems of systems).
Having to deal almost daily with the effects of eventual consistency (e.g., when someone extracts some data into Excel and then compares it with something else in another Excel file extracted at a different point in time, based on potentially different rules), I dream about the letter C. Unfortunately, reality is not always what we want. I would also like people to be able to freely walk and talk on the streets, or play together, but that is not the current state.
Or maybe that is because we are in transition, or in other words, in a long-running transaction :-)