Baleen Analytics

Large-scale filtering of data provides serendipitous surprises

Baleen is a substance found in the mouths of the largest whales in the ocean. Similar to fingernails in its construction, baleen is used as a giant filter that allows these whales to scoop up vast amounts of seawater, while keeping the nutritious krill and other sea life as food. Krill are small crustaceans similar to shrimp. I think of them as “shrimpy” shrimp, yet they provide a large portion of the food consumed by the largest animals on earth.

Like these whales, data analytics is increasingly hoovering up anything it can find without regard to shape, form, or schema. By ingesting anything and everything, oblivious to its provenance or hygiene, we are finding patterns and insights that weren’t available before. Keeping the data in JSON, XML, Avro, or other semi-structured forms makes this much easier.

This article explores the implications of this “late-bound” schema for both data analytics and messaging between services and micro-services. A pretty good understanding shared across many different sources seems to allow more flexibility and interconnectivity than insisting on a perfect one. Increasingly, flexibility dominates perfection.
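To make “late-bound” a little more concrete, here is a minimal sketch in Python. It assumes three made-up JSON records from different sources that never agreed on a schema; the field names (user, user_name, customer, amount) and the read_purchase helper are hypothetical and exist only for illustration. The reader decides what shape it wants at the moment it reads, and tolerates whatever else is there.

```python
import json

# Hypothetical records from three sources; none of them share a schema.
RAW_EVENTS = [
    '{"user": "ann", "amount": 12.5, "currency": "USD"}',
    '{"user_name": "bob", "amount": "7"}',
    '{"customer": {"name": "cy"}, "note": "no amount recorded"}',
]


def read_purchase(raw):
    """Bind a schema at read time: take what this query needs, ignore the rest."""
    doc = json.loads(raw)

    # Accept any of the spellings the sources happen to use for the user.
    user = (
        doc.get("user")
        or doc.get("user_name")
        or doc.get("customer", {}).get("name")
    )

    # Amounts may be numbers, numeric strings, or missing entirely.
    try:
        amount = float(doc["amount"])
    except (KeyError, TypeError, ValueError):
        amount = None

    return {"user": user, "amount": amount}


purchases = [read_purchase(raw) for raw in RAW_EVENTS]

# A "pretty good" answer: sum what we could interpret, skip what we could not.
total = sum(p["amount"] for p in purchases if p["amount"] is not None)
```

Nothing was rejected at ingest time. The third record has no amount, so it simply doesn't contribute to the total, and a different reader could later bind a different schema to the very same raw records.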


ACM Queue has just posted my latest article in my regular column, Escaping the Singularity: It’s Not Your Grandmother’s Database Anymore. It is available free of charge in two ways:

I want to thank the wonderful folks at ACM Queue for all of their help!

- Pat