I don’t get it. Doesn’t Atomicity guarantee that a transaction is executed to its natural end? And if the transaction is buggy, it would break application semantics regardless of what the database does. So what does C really do here?
Fair comment. I honestly expected that Atomicity would cover the completion of the requests issued by the database as a part of the transaction. This "C" statement is more explicit.
Your comment is why I tacitly assumed there wasn't much to "C".
I would like to add that if you look at the combination of an application and a database managed by a database engine as one database system, then it all becomes obvious. The information system itself has a transaction, and that should be consistent. It should be consistent with the activity performed by the user and with the business rules of the organization(s) that own(s) and operate(s) the information system (aka the database system).
This terminology perspective could also be extended to distributed systems (which could just be looked at as systems of systems).
Having to deal almost daily with the effects of eventual consistency (i.e. when someone extracts some data into Excel and then compares it with something else in another Excel extracted at a different point in time, based on potentially different rules), I dream about the C letter. Unfortunately, reality is not always what we want. I would also like people to be able to freely walk and talk on the streets, or play together - but that is not the current state.
Or maybe that is because we are in transition, or in other words - in a long running transaction :-) .
Nice comments!
I'm a big believer that the app and the database should be viewed as a combination. In fact, I think we should NOT view the world as updatable things but rather more like an accountants ledger through which immutable journal entries can be projected as the result of all the immutable appends to the journal.
Unfortunately, even when SANE apps do this, they need to project their journal-style state onto update-in-place database semantics.
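To make the ledger framing concrete, here's a minimal sketch (Python, with invented names like `Entry` and `balances` - a toy, not anyone's production design): the journal only ever grows, and "current state" is nothing more than a projection folded from the immutable entries.

```python
from dataclasses import dataclass

# Hypothetical journal entry; frozen=True keeps each entry immutable.
@dataclass(frozen=True)
class Entry:
    account: str
    delta: int

journal: list[Entry] = []  # append-only; never updated in place

def append(account: str, delta: int) -> None:
    journal.append(Entry(account, delta))  # the only mutation is the append itself

def balances() -> dict[str, int]:
    """Project the current state by folding over the whole journal."""
    state: dict[str, int] = {}
    for e in journal:
        state[e.account] = state.get(e.account, 0) + e.delta
    return state

append("alice", 100)
append("alice", -30)
append("bob", 30)
assert balances() == {"alice": 70, "bob": 30}
```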
I think the crux of my concern about the current C in ACID is that I believe the transaction semantic should be AS SEEN BY THE TRANSACTION SYSTEM. Hence, C for Consistency means we are conflating layers. Just my personal belief.
This is my take.
I have always thought that ACID is the desired set of properties of a transactional _system_ from the point of view of the user. I think of consistency as "consistent with the user's expectations", tacitly implying strong expectations. This is the sense in which we call Babe Ruth and Leo Messi consistent players.
To that end, I think of ACID as the _combined_ responsibility of the app and db, not that of the db alone, just as reliability is a whole-system property; the user is not impressed if just the power supply is robust, for example. Sure, with a more powerful db, the app can let the db handle the generic parts of state management (A/I/D), and some parts of C (foreign key constraints, cascading deletes), but that is an internal matter; the user couldn't care less who handles which part, as long as the money transfers as expected.
C comes across as a weak property only because we keep referring to the properties of a db taken by itself. When we take the system as a whole -- as we must -- C is a first class property, since it encodes the concept of what it means for the user's transaction to be valid -- the state change must involve all the updates expected in a transaction, in accordance with business rules.
Using an ACID database does not automatically bestow ACID on the transaction. Many apps regularly screw up atomicity by using Redis in the scope of a db transaction, for example.
Let us not speak of ACID dbs. Only of ACID transactional systems.
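A minimal sketch of that atomicity screw-up, using SQLite plus a plain dict standing in for Redis (toy names, purely illustrative). The cache write sits outside the db transaction, so when the transaction rolls back, the two stores disagree:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE users(id INTEGER PRIMARY KEY, email TEXT)")

cache: dict[int, str] = {}  # stand-in for Redis; its writes are not covered by the db transaction

def create_user(uid: int, email: str) -> None:
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", (uid, email))
        cache[uid] = email  # side effect escapes the transaction...
        conn.execute("INSERT INTO users VALUES (?, ?)", (uid, email))  # duplicate key -> IntegrityError
        conn.execute("COMMIT")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # ...the db undoes its insert; the cache does not

create_user(1, "a@example.com")
assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0  # db rolled back
assert cache == {1: "a@example.com"}  # stale cache entry: no atomicity at the system level
```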
Thoughts?
I do agree that this is dominantly a property of the transactional portion of the system. My blog post is about the fact that STILL we are debating it and, after 35+ years, I am still learning.
The really interesting thing (to me) is that the transactional part of the system and its Atomicity did not formally include a notion of "I'm complete". That was what Andreas's original paper said and I missed it. He defined C as "the app says it's got everything". Now, it's true that the portion of the database implementing foreign key constraints can leverage this to ensure its constraints. Similarly, app-specific non-db constraints can be bundled with this.
I wish Andreas had named this exact property "Complete" giving Atomic, Complete, Isolated, and Durable. All of these "consistency" things we are discussing are usage patterns of "Complete".
The DBMS provides durability and tools for atomicity and independence. Consistency is largely outside of the realm of DBMS developers.
The application defines consistency, and implements it using the DBMS's tools. The app chooses the transaction boundaries and locking models necessary to achieve its notion of consistency.
If the database starts out in a consistent state (again, from the point of view of the app), and each transaction is correctly consistent, then an inductive proof shows that the database remains consistent. Of course, this proof is not possible without the atomicity and independence provided by the DBMS.
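That induction step is easy to see in a sketch (Python with SQLite; the schema and the conservation-of-money rule are made up for illustration). The app states its own consistency rule and checks it before declaring the transaction complete; the DBMS's atomicity is the tool that restores the last consistent state when the check fails:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; explicit BEGIN/COMMIT below
conn.execute("CREATE TABLE accounts(name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

TOTAL = 150  # the app's consistency rule: money is conserved

def transfer(src: str, dst: str, amount: int) -> None:
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # Only the app knows this rule; it verifies it before declaring the work complete.
        (total,) = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()
        if total != TOTAL:
            raise ValueError("invariant violated")
        conn.execute("COMMIT")  # consistent state in, consistent state out: the induction step
    except Exception:
        conn.execute("ROLLBACK")  # atomicity restores the last consistent state
        raise

transfer("alice", "bob", 30)
assert dict(conn.execute("SELECT name, balance FROM accounts")) == {"alice": 70, "bob": 80}
```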
In short, consistency seemed unimportant because you spent most of your career as a DBMS developer instead of an app developer. From the app's perspective, consistency and durability are everything, while atomicity and independence are merely tools for achieving consistency.
First of all, HOWDY JIMBO!!!
For the folks that don't know us, Jim and I worked together first in 1978. In 1980, he got the hiring bonus when I went to Tandem. When Jim quit Tandem and rejoined, I got the hiring bonus for Jim (making us recursive). Later, in the 1990s, Jim came to Microsoft and we worked together for many years. So... Howdy, Jim!!!
I agree that consistency is super important for the app (and for me). What I missed is the subtlety that all consistency means to the transactional system is that the app can say when enough's enough. When it is complete.
You are correct about the paramount importance of consistency to the app. Andreas's initial definition of consistency says:
"Consistency. A transaction reaching its normal end (EOT, end of transaction), thereby committing its results, preserves the consistency of the database. In other words, each successful transaction by definition commits only legal results…"
HIS definition is that the app's consistency is derived from "I'm done now".
That was my epiphany. Hence, there is a very tangible meaning (even though I personally would have called it "Complete" in hindsight).
It's good that you have come to realise that Consistency is something a DBMS needs the application to take some responsibility for; foreign-key constraints etc. can only get you so far, and the DBMS can't model and enforce all the data integrity rules the application may need.
I think it's important to understand that Consistency is a state at the *end* of a transaction. During the transaction the database can and will be inconsistent. For example, an order in a database is not consistent when only half of its items have been inserted; it is consistent only after all its items have been inserted. C.J. Date has done a lot of damage by insisting (as he does, for instance, on p. 415 of his book Database Design and Relational Theory) that the DBMS should enforce Consistency rules after each statement, not each commit.
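For what it's worth, some engines let you express exactly this end-of-transaction timing. A small sketch using SQLite's `DEFERRABLE INITIALLY DEFERRED` foreign keys (toy schema, purely illustrative): mid-transaction the item rows reference an order that doesn't exist yet, and the constraint is checked only at COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; explicit BEGIN/COMMIT below
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE orders(id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE order_items(
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id) DEFERRABLE INITIALLY DEFERRED)""")

conn.execute("BEGIN")
# Mid-transaction the database is inconsistent: these items point at an order
# that does not exist yet. A statement-time check would reject this ordering.
conn.execute("INSERT INTO order_items VALUES (1, 42)")
conn.execute("INSERT INTO order_items VALUES (2, 42)")
conn.execute("INSERT INTO orders VALUES (42)")  # the parent row arrives last
conn.execute("COMMIT")  # the deferred constraint is checked here -- and passes
```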
One more comment. In a past life, we spent a bunch of time working on trigger firing at end of statement (for app consistency) versus trigger enforcement at end of transaction.
Managing the implementation of end-of-transaction trigger firing is more challenging (as you need to track triggers and their ordering until end-of-transaction).
The application consistency semantic is more challenging at end of statement since the partial work performed by the statement may leave the database in an interim state that is not compliant with the consistency constraints. MAYBE you can make this OK for the DBMS constraints like foreign key or cascading deletes. It's hard. It's even harder to make that work with application constraints that are unbounded until end-of-transaction.
I always knew the application owned those semantics. One funky aspect of explaining this comes from the fact that triggers/referential-integrity and the like are (in my mind) imposed ABOVE the ACID layer in the upper parts of the typical database.
I tend to focus on the transactional/access-method side of the database. My colleagues at work often hear me say "I don't understand the optimizer very well. I specialize in the pessimizer."
In my mind, I always focused on that ACID perspective of how the APP can define the set of changes and OWN the consistency. From that perspective, I bundled referential-integrity and similar concepts into the App when I considered consistency.
It has always bugged me that the "C" seemed amorphous. I, too, assumed atomicity included the notion that the app defined the end of the atomic set of changes.
Frankly, seeing the "C" as a crisper concept of "the end" makes the whole thing bounded in a way I see as simpler and more tractable.
I'm just astounded I've been befuddled about this for so long and LOVE telling people what got me tangled up in the hope that I'm not the only one wandering the desert of transactional semantics.
I agree with you about the challenges with Date's characterization (and, indeed, the one in Wikipedia, too). I wasted a lot of time in the early 1990s noodling over intra-transaction semantic enforcement. I might be willing to speculate how much that impacted my hair loss.
Please keep arguing if I'm setting off your bullshit-o-meter.
I'm a little confused about what you're trying to say. It's clear that you used to think that "C" wasn't very useful or important but now you do think "C" is useful and important; what's not clear is why. I thought you were saying that when you realised the app owned the Consistency semantics and not the DBMS, that's when you realised "C" was important - but maybe that isn't what you're saying?
In other news one topic I've been thinking deeply about recently is "nested transactions". I decided in the end that I won't use such things because if I "COMMIT" I want to know that my changes have been persisted durably into the database, not "committed" into a parent transaction which may itself be rolled back. So if you're looking for a future topic I'd love to hear your thoughts on nested transactions. :)
Hey, John!
Thanks for keeping the dialog going!
IMHO, there's a blurry boundary between the upper half of the DBMS's constraints (e.g. cascading enforcement) and the application's constraints (e.g. ensuring all the parts of an application's order are complete). Why are the DBMS's features part of consistency while the application's features are not?
When I read summaries of the "C" in ACID, it always bundled the features provided with the DBMS and the features needed in the application. This seemed wonky to me as one is enforced by the DBMS and one is not.
Now, with my new recognition that the creators of ACID simply meant that the application (tacitly the union of the upper half of the DBMS and the application proper) can crisply delineate the set of changes, each of these can enforce their respective concerns (and consistency) by being in control of the end-of-work.
This seems to me to be a cleaner feature of the interface between the PROVIDER OF THE DESIRED UPDATES and the GUARANTOR OF THEIR INTEGRITY.
This is muddied by the DBMS including these higher level abstractions on top of the part of the database doing the ACID work. There's no good word for "the union of the upper half of the database and the app atop it" and there's no good word for "the stuff ensuring the ACID changes in the bottom of the DBMS".
I am MUCH happier with "C" describing the cleanliness of the set of changes being applied rather than an amorphous "consistency".
In hindsight, I'd name the "C" in ACID to be Complete. Just my opinion.
W.r.t. nested transactions, they can be really (REALLY) useful. Many DBMSs provide statement rollback. This is a form of nested transaction that works well and is not hard to implement if there's no multi-statement parallelism necessitating locking and lock inheritance across statements.
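Statement rollback is typically built on savepoints, which you can also drive by hand. A minimal sketch with SQLite (illustrative only): the inner scope's writes are undone without aborting the enclosing transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; explicit BEGIN/COMMIT below
conn.execute("CREATE TABLE log(msg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('outer work')")

conn.execute("SAVEPOINT step")    # open the "nested" scope
conn.execute("INSERT INTO log VALUES ('inner work')")
conn.execute("ROLLBACK TO step")  # undo only the inner scope's writes
conn.execute("RELEASE step")      # close the scope; the outer transaction continues

conn.execute("COMMIT")
assert [r[0] for r in conn.execute("SELECT msg FROM log")] == ["outer work"]
```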
Back when I was architecting Microsoft Transaction Server, we had a graph of components, each of which was (or was not) designated as transactional. We knew many components would sometimes want to return an error and undo their portion of the transactional updates before returning the error. Isolation around this was hard when parallel components were executing and there were sometimes multiple backend databases (using two-phase commit). Just aborting the overarching transaction was correct but somewhat reduced the convenience of returning component errors.
No one (to my knowledge) has done a production grade implementation of nested transactions. Still, I think they have a solid place in our taxonomy of behaviors.
I am noodling over how to explain the complex world of this and more when it intersects with parallelism, distribution, and replicas (including partitioning). It will take me a while to draw it all together. It is, however, keeping me thinking.
Thanks for your interest!!
Pat
Hey Pat. Happy to keep the dialog going! :) ...it is traditional, after all. I'm not sure if you remember but I'm the same John from 15 years ago who responded to your SOA Is Like The Night Sky blog post and kept you busy typing out replies to me for a few days. I've made a few unsuccessful attempts over the past 15 years to get back in touch. I am very pleased that we're in contact again. :)
Awesome!! I hadn't put you together in my mind with that fun back-and-forth 15 years ago. I remember the interaction but hadn't associated it with your name.
That was fun!
Thanks Pat. I'm pretty excited that you're blogging again and look forward to our future chats! :)
The world doesn't need faster transactions (except for blockchain); it needs consistent transactions. The problem is: distributed applications do not have access to the minimum and necessary 'information' to recover over a network that can drop, reorder, or duplicate packets.
While I don't disagree with you AT ALL, I don't think that's part of an ACID transaction.
ACID was created when database-ey people didn't think much about distribution or the semantics of multi-master update. Distributed-systems people were beginning to think that way, but based on separate objects being independently reconciled.
I absolutely DO think formal notions of correctness across distributed and replicated "things" make a ton of sense. IMHO, they will have extended notions of correctness, of which ACID is a proper subset.
This is much like the way Newtonian physics is an excellent approximation of a subset of Einstein's Special Relativity (and indeed General Relativity) -- an excellent approximation at somewhat small scales. Newtonian physics is VERY useful close to home. So are databases and ACID semantics.