The Myths about Transactions (ACID) and NoSQL

One of the major distinctions between NoSQL systems and traditional DBMSs has been widely characterized by saying that the former don't care for ACID semantics, or that transactions aren't needed. This is an oversimplification, to say the least. As long as a NoSQL system supports incremental updates by a concurrent set of users (as opposed to only single-threaded bulk or batch updates), some notion of transaction is essential at least within the internals of such a system, even if multi-API-call transactions are not supported, in order to retain a certain level of sanity in the internal design and keep things consistent. This is even more important if the system supports replication and/or the updating of multiple data structures within the system even in a single API call (e.g., if there are multiple access paths which have to be updated), as the sketch below illustrates. Similar points apply to locking and recovery semantics and functionality.
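To make the access-path point concrete, here is a minimal sketch in Java (hypothetical class and method names, not modeled on any particular system) of a key-value store whose put() also maintains a secondary index. Without some internal log-protected atomic unit, a crash between the two structure updates leaves the index pointing at stale data:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: even a "transactionless" key-value store whose put()
// maintains a secondary access path internally needs an atomic unit of work.
public class MiniStore {
    private final Map<String, String> primary = new HashMap<>();        // key -> value
    private final Map<String, List<String>> byValue = new HashMap<>();  // value -> keys (secondary access path)
    private final List<String> redoLog = new ArrayList<>();             // stand-in for a persistent log

    // One external API call, two internal structure updates. The log record is
    // written first so that recovery can redo (or undo) the pair as a unit.
    public synchronized void put(String key, String value) {
        redoLog.add("PUT " + key + " " + value);   // 1. log the intent (would be forced to disk)
        String old = primary.put(key, value);      // 2. update the primary structure
        if (old != null) {
            byValue.get(old).remove(key);          // 3. unhook the stale index entry
        }
        byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(key); // 4. update the access path
        // Without the log, a crash between steps 2 and 4 silently corrupts the index.
    }

    public synchronized List<String> keysWithValue(String value) {
        return byValue.getOrDefault(value, List.of());
    }
}
```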

The above sorts of issues are real and were quite tricky to handle in Lotus Notes, which used very ad hoc ways of dealing with the associated complications until log-based recovery and transaction support were added in R5 (http://bit.ly/LNotes). From Day 1 in 1989, Notes has supported replication and disconnected operations, with the consequent issues of potentially conflicting parallel updates having to be dealt with. Even RDBMSs were late in delivering that kind of functionality.

Even if high concurrency at the individual object level isn't important given the nature of a NoSQL application, it might still be important for the internal data structures of the NoSQL system to support high concurrency or fine-granularity locking/latching (e.g., for dealing with concurrent accesses to the space-management-related data structures; see http://bit.ly/CMSpMg). The sketch below shows the idea.
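As a rough illustration of fine-granularity latching (a hypothetical free-page bitmap, not how any particular system implements its space management), each segment of the bitmap gets its own latch so that concurrent page allocations rarely serialize on a single global lock:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of fine-granularity latching on a space-management
// structure: a free-page bitmap split into segments, each guarded by its own latch.
public class FreeSpaceMap {
    private static final int SEGMENT_BITS = 1024;
    private final boolean[] free;              // free[p] == true -> page p is free
    private final ReentrantLock[] latches;     // one latch per bitmap segment

    public FreeSpaceMap(int pages) {
        free = new boolean[pages];
        java.util.Arrays.fill(free, true);
        latches = new ReentrantLock[(pages + SEGMENT_BITS - 1) / SEGMENT_BITS];
        for (int i = 0; i < latches.length; i++) latches[i] = new ReentrantLock();
    }

    // Allocate one page, latching only the segment currently being scanned.
    public int allocate() {
        for (int seg = 0; seg < latches.length; seg++) {
            latches[seg].lock();
            try {
                int end = Math.min((seg + 1) * SEGMENT_BITS, free.length);
                for (int p = seg * SEGMENT_BITS; p < end; p++) {
                    if (free[p]) { free[p] = false; return p; }
                }
            } finally {
                latches[seg].unlock();
            }
        }
        return -1; // no free page
    }

    public void release(int page) {
        ReentrantLock latch = latches[page / SEGMENT_BITS];
        latch.lock();
        try { free[page] = true; } finally { latch.unlock(); }
    }
}
```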

Vague discussions about NoSQL systems and ACID semantics lead many people to think that RDBMSs enforce strong ACID semantics all the time. This is completely wrong if by that they mean serializability as the correctness property for handling concurrent execution of transactions. From the very beginning, RDBMSs (System R and the products that came from it) have supported different degrees of isolation, in some cases even the option of being able to read uncommitted data, and different granularities of locking (http://bit.ly/CMQuCC). Even with respect to durability, in-memory RDBMSs like TimesTen and solidDB, which came much later, allowed soft commits and the like, trading off durability guarantees for improved performance.
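For instance, the standard JDBC API has always exposed the isolation degree as a per-connection knob rather than a fixed "strong ACID" constant (the connection URL and credentials below are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Standard JDBC: isolation is a per-connection setting, not always serializability.
public class IsolationDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:somedb://localhost/demo", "user", "pass")) {
            conn.setAutoCommit(false);
            // Deliberately accept dirty reads -- weaker than serializability, in the
            // spirit of the System R "degrees of consistency".
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            // ... run queries that may observe uncommitted data ...
            conn.commit();
        }
    }
}
```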

In my last two posts on NoSQL (http://bit.ly/NoSQLt and http://bit.ly/NoSQL2), I gave a lot of information on my background to make it clear to readers that this whole space of data management is a tricky business. The devil is in the details and it isn't for the faint-hearted :-) I wanted to make it clear that I don't believe in quick-and-dirty approaches to handling intrinsically complicated issues, and that I am not somebody who takes frequent elevator rides with VCs :-) At the same time, I am not an ivory-tower researcher either! When I hear many presentations on "my kind of topics" at various conferences and meetings like the Hadoop User Group (HUG), I have a tough time making sense of what is going on, given the high-level nature of what is being presented and the absence of any serious attempt to compare what is proposed with what has been done before and about which more is known.

Of course, NoSQL systems aren't the only context in which such things have happened. A great number of people have talked about optimistic concurrency control and recovery without much of the detail really being worked out (see my discussions of this topic in http://bit.ly/CMOpCC). Even now, some of the NewSQL people make tall claims about how traditional recovery isn't needed and how they can get away without logging while still supporting SQL. One has to quiz them quite a bit to discover that they do in fact do some bookkeeping that they choose not to describe as logging, and/or that they don't support statement-level atomicity even though they support SQL and SQL requires it!
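To see why statement-level atomicity forces such bookkeeping, consider a minimal sketch (an in-memory stand-in for a real engine, with hypothetical names): a multi-row update that hits a constraint violation partway through has to restore the rows it already changed, and the before-images it keeps for that purpose are logging in all but name:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a multi-row UPDATE must be all-or-nothing even as a
// single auto-committed statement; before-images are the bookkeeping that makes it so.
public class StatementAtomicity {
    private final Map<String, Integer> table = new HashMap<>();

    // Simulates "UPDATE table SET v = v + delta" with a constraint that v stay non-negative.
    public void updateAll(int delta) {
        Deque<Map.Entry<String, Integer>> undo = new ArrayDeque<>(); // before-images
        try {
            for (Map.Entry<String, Integer> row : table.entrySet()) {
                int newV = row.getValue() + delta;
                if (newV < 0) throw new IllegalStateException("constraint violated at " + row.getKey());
                undo.push(Map.entry(row.getKey(), row.getValue())); // record before-image first
                row.setValue(newV);
            }
        } catch (RuntimeException e) {
            // Statement-level rollback: reapply before-images in reverse order.
            while (!undo.isEmpty()) {
                Map.Entry<String, Integer> img = undo.pop();
                table.put(img.getKey(), img.getValue());
            }
            throw e;
        }
    }
}
```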

For some people, it might be very tempting to think that NoSQL applications are so different from traditional database applications that simple things are sufficient ("good enough" being the phrase often used to describe such things) and that overnight mastery of the relevant material is possible. Even in the Web 2.0 space, if the application programmers are not to go crazy, more of the burden has to be taken up by the designers of the NoSQL systems. A case in point is how the Facebook messaging system designers decided that eventual-consistency semantics were too painful to deal with. To begin with, if NoSQL systems have vague semantics for what they support, and if, as they evolve, such things keep changing, users will be in big trouble! Also, with no standards in place for these systems, if users want to change systems for any number of reasons, applications might require significant rewriting to keep end-user semantics consistent over time.
