I’m Probably Less Deterministic than I Used to Be
In my youth, I thought everything had cause and effect like a big clock. In this light, computing made sense. Now, I see that both life and computing can be a crapshoot and I've found a new peace.
I’ve Always Been Determined
In high school, I learned about classical Newtonian physics. Every day, I am grateful that I studied physics BEFORE getting my driver’s license as a teenager. As I looked around at the world, read history, and observed things, everything seemed to have a cause. Philosophically, I was a believer in determinism.
As I started programming in the early 1970s, bugs in my software caused momentary lapses in my belief in determinism as I swore at the computer. These dissipated as I found the underlying bug was due to my screw-up. Everything in the complex system had a cause and it was possible to reconstruct the causal chain behind its correct or incorrect behavior.
Names and identities for things in the system were always derived from some overarching taxonomy. For many years, this was within the centralized single server. Over time, I started working on a small gaggle of cooperating servers. This led to directory services within enterprises and eventually to Internet-wide names based on DNS (Domain Name System). Everything lived in its own place in the order of the ticking big clock. I could reason about the origin of each name based on a global pre-ordained hierarchy.
Similarly, as I started working on distributed systems, the bounded local networks were very tightly controlled. Dedicated hardware ensured messages seemed to be synchronously delivered. While they were not ALWAYS delivered on time, we pretended they were and built solutions based on time-out and restart. This worked well enough that we simply assumed they were always delivered on time or would never arrive.
I could map the world into predictable behavior, causes, consequences, and composed relationships. I was truly a database person, albeit one with an interest in distributed systems, too. I was at peace with the universe.
An Existential Midlife Crisis
In 1994, I started working at Microsoft to help the company move from desktop PCs to running enterprises. My experience in the 1980s at Tandem Computers had left me with a deep interest in providing the perfectly correct transactional answer for enterprise customers. This required super high availability even when things break. Microsoft was aggressive and bold and so was I.
By the time I turned 40 years old in 1996, I was middle aged. Not only had my middle aged, my view of the world was very Newtonian and modeled on actions and reactions within a closed universe, including tight control over naming. This abstract “perfectly correct transactional answer” was built on a notion of identity based on some centralized authority within the customer’s company. Names and identities were hierarchical and local to the customer. Talking to outsiders and their world was an “application problem”, not part of our system solution.
In contrast, Microsoft lived in a world where software was created at many independent sources and came together in an ecosystem. At first, this was rampantly sharing floppy disks. By the 1990s, it meant squirting bits around the nascent Internet.
It was then that I was introduced to UUIDs (Universally Unique Identifiers). At Microsoft, these are called GUIDs (Globally Unique Identifiers). UUIDs are independently allocated without a hierarchical namespace and are assumed to be unique. My first exposure to UUIDs was their use to identify software interfaces that were coded at independent companies yet remained unique when installed anywhere in the world.
How can these be unique AND independently allocated? I immediately and viscerally rejected such nonsense and went home with my head in a spin.
UUIDs are 128-bit identifiers with 122 bits allocated to hold a unique value. 2 to the 122nd power is, indeed, a big number! While I don’t consider myself to be a math expert, my colleagues at Microsoft convinced me that the probability of a UUID collision was an example of the Birthday Problem with 2^122 possible birthdays. The probability of collision in 122 random bits is driven by the number of unique identifiers in the pool. According to the math at Wiki:Universally_Unique_Identifiers, if you used 1.6 PB to store 103 trillion UUIDs in a single place, you’d have a 1 in a billion chance of one duplicate.
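For readers who, like me, want to poke at the arithmetic, here is a minimal sketch of the Birthday Problem estimate. It assumes the 122 bits are uniformly random and uses the standard approximation p ≈ 1 − exp(−n(n−1)/(2N)); the function and constant names are my own illustration and do not come from the Wiki page.

```python
import math

UNIQUE_BITS = 122   # random bits in a UUID, per the text above
UUID_BYTES = 16     # a 128-bit UUID occupies 16 bytes


def collision_probability(n, bits=UNIQUE_BITS):
    """Approximate chance of at least one duplicate among n random IDs.

    Birthday Problem approximation: p ~= 1 - exp(-n(n-1) / (2N)),
    where N = 2**bits is the number of possible "birthdays".
    expm1 keeps precision when the probability is tiny.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))


n = 103 * 10 ** 12                      # 103 trillion UUIDs
p = collision_probability(n)
storage_pb = n * UUID_BYTES / 10 ** 15  # decimal petabytes

print(f"storage: {storage_pb:.2f} PB, collision probability: {p:.2e}")
```

Under those assumptions, the result lands at roughly one in a billion for about 1.6 PB of stored identifiers, which matches the figures quoted above.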
That is, however, assuming you have random numbers. To understand randomness for this article, I read a white paper called The Intel Random Number Generator. I understood most of it even though I flunked out of college math because programming was more fun. This paper explains the notion of entropy or the randomness of a random bit. Quoting this white paper:
In the case of a random number generator that produces a k-bit binary result, p_i is the probability that an output will equal i, where 0 ≤ i < 2^k. Thus, for a perfect random number generator, p_i = 2^(-k) and the entropy of the output is equal to k bits. This means that all possible outcomes are equally (un)likely, and on average the information present in the output cannot be represented in a sequence shorter than k bits.
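The quoted claim can be checked with a few lines of code. This is a small sketch assuming the usual Shannon definition of entropy, H = −Σ p_i · log2(p_i); the helper name and the sample distributions are mine and are not taken from the white paper.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


k = 8
# A perfect k-bit generator: every one of the 2**k outputs is equally likely.
uniform = [2 ** -k] * (2 ** k)
perfect = shannon_entropy_bits(uniform)

# A hypothetical biased 1-bit generator that emits 0 ninety percent of the time.
biased = shannon_entropy_bits([0.9, 0.1])

print(f"perfect {k}-bit generator: {perfect} bits")
print(f"biased 1-bit generator: {biased:.3f} bits")
```

The uniform case comes out to k bits, as the quote says, while the biased bit carries noticeably less than one bit of entropy.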
The paper explores the use of TRNGs (True Random Number Generators) that use a non-deterministic source (such as heat sensitive oscillators) as well as PRNGs (Pseudo Random Number Generators). Intel exposes these in hardware as the RDRAND and RDSEED instructions. It is common practice to take two different sources for random numbers and pass them through a one-way cryptographic function to increase the entropy of the result. For example, feeding in 256 bits (as two different 128-bit inputs) can produce a 128-bit output with better entropy. While this discusses Intel’s solutions, I believe similar results are available from all other major chip vendors. I left my shallow investigation into randomness believing that UUIDs are, indeed, unique as far as I’m concerned.
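To make the mixing idea concrete, here is a rough sketch of passing two 128-bit inputs through a one-way function to get a 128-bit output. The choice of SHA-256 and the truncation to 16 bytes are my assumptions for illustration, not the construction described in the white paper, and the two calls below both draw from the operating system's generator as stand-ins for two truly independent sources.

```python
import hashlib
import os
import secrets


def mix_sources(source_a: bytes, source_b: bytes) -> bytes:
    """Combine two 128-bit inputs into one 128-bit output via a one-way hash.

    Illustrative only: SHA-256 truncated to 16 bytes is an assumed choice.
    """
    if len(source_a) != 16 or len(source_b) != 16:
        raise ValueError("each source must supply exactly 128 bits (16 bytes)")
    return hashlib.sha256(source_a + source_b).digest()[:16]


# Stand-ins for two independent entropy sources (hypothetical setup).
first = os.urandom(16)
second = secrets.token_bytes(16)
mixed = mix_sources(first, second)

print(mixed.hex())
```

The point of the design is that a weakness in either input alone is harder to exploit once both have been folded through the hash, so long as at least one of them is unpredictable.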
As I said before, I consider myself to be aggressive and bold. I can live on the wild side and tolerate this probability. Emotionally, I started to see probabilities as a part of my life. My midlife existential crisis launched me towards a more Zen-like acceptance of things being less than perfect, some of the time.
Karma, dogma, and determinism are only a subset of life.
Around Here, Looks Like Newton Had It Right
By the early 2000s, I’d spent about 25 years hanging out with people who build databases. They ARE my peeps! Still, I couldn’t help feeling that they saw the world through a Newtonian prism. It struck me that there WAS data OUTSIDE of a database and it had different properties than database data. One of the major reasons was that time (as perceived by the OUTSIDE data) was disconnected from the transactional NOW inside the database. This led to my CIDR 2005 paper Data on the Outside Versus Data on the Inside.
In spite of The Singular Success of SQL, I had truly renounced SQL as the ONLY model of the universe. Like Newtonian physics, it’s extremely practical in small domains. SQL depends on transactions to suspend time for correctness. It only works in the “now”. “Now”, if it exists, is a local frame of reference.
Personally, I have no trouble living in both the frame of reference needed to support transactional correctness AND understanding that there’s a bigger world of distributed systems. Like particles of light, databases exist in their own world oblivious to interactions across their boundaries. Like waves, databases also live in a world where they have interactions with the outside sometimes leading to surprising results.
The Joy of Sects
The dictionary defines a SECT as a group of individuals with somewhat different religious beliefs from those of the larger group to which they belong. That sounds like computer scientists and engineers to me. Let’s discuss just three of these sectarian groups:
Networking folks: Most networking people have personalities combining the best of hippies and bikers. When you talk to them, they want both to ride into town like a storm and to talk existentially about the randomness of life. I’ve never met a networking engineer who thinks ANYTHING is deterministic. Everything happens with a probability and a cumulative distribution function over its latency. Yet, they are as happy as can be, ready to jump into the next fracas, and continuously show a zest for life.
Distributed systems folks: These people vacillate between philosophers and lawyers. No one else can talk so fluently about total order without discussing the scope of totality. Availability is always couched in terms of assumptions of server loss without veering into what happens when an operator logs into the system. Integral to their psyche is the belief in deterministic outcomes in the face of nondeterministic environments. I absolutely love a good debate with them (with or without alcohol)!
Database systems folks: My foundational community and my home people! These professionals bring out the best qualities of bankers, architects, and builders. Everything must be a combination of business critical, provably correct, and overengineered for reliability. Database folks assume a pre-existing deterministic world and build the coolest complex systems on top of that deterministic foundation. This works great while their assumptions hold true. Arm in arm with them, I’ll gleefully work designing an outhouse expected to last for 100 years, despite the poor hygienics.
Nothing is more fun than sliding between these groups of people and messing with their brains.
Zen and the Art of System Design
In the famous philosophy book Zen and the Art of Motorcycle Maintenance, friends ride motorcycles on a 17-day journey from Minnesota to California. During the ride, it emerges that there are a couple of different philosophies about how to maintain their motorcycles. The book contrasts the “romantic” approach to life, in which John chooses to avoid maintaining his expensive new bike, with the narrator’s “classical” approach of methodical care and diligence in maintaining every part.
It emerges that John just wants to enjoy the gestalt and live in the moment. The narrator seeks to understand every detail and impose a rational analysis on all things.
Similarly, one perspective on computer systems focuses on the causality of every component’s behavior and attempts to control and manage the overarching system with the precision of a fine Swiss watch. Another perspective recognizes that a commute home on the freeway is frequently, but not always, fast.
Climbing into a Fast-Moving Tube of Aluminum
My first trip on an airplane was as a senior in high school. While I had a high-level understanding of how it stayed in the air, it really seemed counterintuitive that I would trust this aluminum tube. My fears were completely assuaged because I had read about the extremely low chance of problems when flying on a commercial airline. Apparently, it’s much riskier to be on the freeway than in the air.
Today, we more and more interconnect things from varied and disparate sources. At the same time, we’re tossing our systems into massively shared cloud deployments. Each of these two trends undermines a completely deterministic model of work in a closed and predictable environment. Together, they become the nondeterministic Wild West. This has caused me to relax and go Zen with probabilities. No longer can I completely explain the causal dependencies of the systems.
Still, I work side-by-side with folks that squawk about collisions of UUIDs and look to find perfect confidence in unique identifiers. This happens while blithely ignoring the deep dependencies our systems take on cryptography for security and its foundational dependence on the probabilities of random key allocation. They continue to expect servers to fail cleanly and fast even though Fail-Fast Is Failing… Fast! I know they fly on airplanes, too.
Now, I can ride in an airplane with confidence, be a tiny bit curious about avionics, discuss probabilistic SLAs for responses in the cloud, and work to define the perfectly correct database recovery algorithm with complete comfort. Maybe it’s because I like hippies, bikers, philosophers, lawyers, bankers, architects, AND builders. I especially like provoking debates when they’re all in the room at the same time.