Baleen is a substance found in the mouths of the largest whales in the ocean. Similar to fingernails in its construction, baleen is used as a giant filter that allows these whales to scoop up vast amounts of seawater, while keeping the nutritious krill and other sea life as food. Krill are small crustaceans similar to shrimp. I think of them as “shrimpy” shrimp, yet they provide a large portion of the food consumed by the largest animals on earth.
Like these whales, data analytics is increasingly hoovering up anything it can find without regard to shape, form, or schema. By ingesting anything and everything, oblivious to its provenance or hygiene, we are finding patterns and insights that weren’t available before. Keeping the data in JSON, XML, Avro, or other semi-structured forms makes this much easier.
This article explores the implications of this “late-bound” schema both for data analytics and for messaging between services and micro-services. It seems that a pretty good understanding among many different sources allows more flexibility and interconnectivity. Increasingly, flexibility dominates perfection.
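To make “late-bound” schema a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the sample events, the field names, and the `read_with_schema` helper are all hypothetical and do not come from the article. The idea is that raw JSON is stored as it arrives, and a schema is applied only when the data is read, accepting whatever shape each source happened to use.

```python
import json

# Hypothetical raw events from three sources; nothing was validated on ingest.
RAW_EVENTS = [
    '{"user": "ann", "amount": 12.5, "currency": "USD"}',
    '{"user_name": "bob", "amount": "7"}',
    '{"customer": {"name": "cy"}, "total": 3}',
]


def read_with_schema(raw):
    """Bind a schema at read time: extract a user and an amount if we can."""
    doc = json.loads(raw)
    # Each source names the user differently; try the variants we know about.
    user = (
        doc.get("user")
        or doc.get("user_name")
        or (doc.get("customer") or {}).get("name")
    )
    # The amount may be called "amount" or "total", and may be a string.
    amount = doc.get("amount", doc.get("total"))
    try:
        amount = float(amount)
    except (TypeError, ValueError):
        amount = None  # "good enough": keep the row, mark the gap
    return {"user": user, "amount": amount}


rows = [read_with_schema(raw) for raw in RAW_EVENTS]
total = sum(row["amount"] for row in rows if row["amount"] is not None)
```

A new source with yet another shape would not break ingestion. It would only mean some rows come back with `None` until the read-time mapping learns the new field names, which is the flexibility-over-perfection trade described above.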
ACM Queue has just posted my latest article in my regular column Escaping the Singularity: It’s Not Your Grandmother’s Database Anymore. It is available free of charge in two ways:
I want to thank the wonderful folks at ACM Queue for all of their help!
- Pat
Baleen Analytics
Excellent, succinct descriptions. I'm curious what your thoughts are on SOX compliance, or any other "distinct record" compliance requirements, in these Baleen architectures. More and more, my colleagues and I are required to meet these data quality and compliance standards in ELT products.
There is an inherent difficulty in meeting these standards as you alluded to when writing,
"Like late binding on procedure calls across services, ELT is much more adaptive and accepting of differences and evolution. Also, like late binding, it may not perfectly match the semantics but rather give “good enough” answers."
Hi Pat. Some minor notes:
REST stands for "Representational State Transfer", not "Representational State Transformation".
https://en.wikipedia.org/wiki/Representational_state_transfer
The term "shredding" seems to be overloaded. The term is also used to indicate "secure delete":
https://en.wikipedia.org/wiki/Shredding