As promised, let’s go into a little detail regarding the Eightfold Way, my personal roadmap for Big Data. CEOs are kindly invited to read on: chances are this is a very faithful portrait of their organisation, but nobody ever tells them.

When it comes to Big Data, there are only two admissible positions: skeptics and suckers. A skeptic is somebody who knows how bad we humans are at judging probability, at telling correlation from causation, and how easily we can be fooled by randomness and by our own expectations (or by someone exploiting them). A sucker is someone who will fall for a scientific-sounding statistical lie. Note how anyone can be a sucker on occasion.

Just a quick context-setting post, then we’ll see each step in detail. Here are the steps, briefly explained.

  1. Right Identification —what constitutes data worth collecting? And how do these worthy data look? In short, what is their format?
  2. Right Collection —where do these data come from? Are you taking into account all possible sources and occasions?
  3. Right Validation —make sure a name, a phone number, a zip code are what they should be (see #1).
  4. Right Governance —who has reading and writing (!) rights on data, hence the ability to work and mess with them.
  5. Right Interrogation —42 teaches us the problem is the question, not the answer. I’m being deadly serious here.
  6. Right Interpretation —what data really mean, as opposed to what we want to hear or get away with. Also, there’s what I call the Taleb problem: typically at least 90% of data is just noise, 9% is stuff we already know, 1% is a genuine insight. And we can’t tell which is which. So you should be painfully aware that the largest part of “findings” will be just noise. This makes interpretation THE problem with Big Data.
  7. Right Communication —make sure the right people get the right data at the right time.
  8. Right Purge —not all data is meaningful, not all meaningful data is useful, not all data need (or can) live forever, see also number 6, Right Interpretation.
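
To make steps 1 and 3 a little less abstract, here is a minimal sketch in Python of what "a phone number is what it should be" might look like once somebody has actually decided on a format. Everything in it is an assumption made up for illustration: the field names, the 5-digit zip code, the 7-to-15-digit phone number and the `validate_record` helper are hypothetical, not a standard and not anybody's production rules.

```python
import re

# Hypothetical formats, decided at step 1 (Right Identification).
ZIP_RE = re.compile(r"\d{5}")          # assumption: 5-digit zip code
PHONE_RE = re.compile(r"\+?\d{7,15}")  # assumption: 7 to 15 digits, optional leading +


def validate_record(record):
    """Return the list of field names that fail their format check."""
    errors = []
    # A name only has to be non-blank in this sketch.
    if not record.get("name", "").strip():
        errors.append("name")
    # Drop common separators before checking the phone number.
    phone = re.sub(r"[\s().-]", "", record.get("phone", ""))
    if not PHONE_RE.fullmatch(phone):
        errors.append("phone")
    if not ZIP_RE.fullmatch(record.get("zip", "").strip()):
        errors.append("zip")
    return errors


good = {"name": "Ada Lovelace", "phone": "+44 20 7946 0958", "zip": "90210"}
bad = {"name": " ", "phone": "call me maybe", "zip": "9021"}
print(validate_record(good))  # []
print(validate_record(bad))   # ['name', 'phone', 'zip']
```

The point is not the regular expressions, which any real organisation would want to argue about, but that the check cannot be written until step 1 has been answered.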

The eight steps above should be just plain common sense. But if we look at how things really work in an organization, where Big Data is collected by IT and “handled” by Marketing, this is what happens to the roadmap:

  • Identification and collection
    • IT: are requirements to be provided by the internal client
    • Mktg: will be decided at a later moment
  • Validation
    • IT: is a frontend function and may be introduced (if needed) when the
      relative RFM hits “panic” status
    • Mktg: a phone number is a phone number, can those dumbos in IT sort it out?
  • Governance
    • IT: “whaddaya mean ‘writing’ rights? No lusers mess around my db” [actual DBA email]
    • Mktg: Marketing data rightfully belong to Marketing, and we’ll produce detailed reports for the CEO.
  • Interrogation
    • IT: it’s an internal client problem, I make sure there’s a DB up to be queried, man
    • Mktg: data speaks by itself. [Translation: we’ll see what we want to see]
  • Interpretation
    • IT: their data, their problem. If they can’t handle technology, they shouldn’t be using it.
    • Mktg: we will interpret [translation: frame] the data
    • Or else the interpretation will be decided by managerial mud wrestling, and appropriate cherry-picking will post-dict whatever decision was actually taken (known as the Cover-Your-Arse approach to data science).
  • Communication
    • IT: come again? Reports are available in the shared folder
    • Mktg: the CEO and Board have privileged access to our reports
  • Purge
    • IT: maybe we’ll file an RFP as we run out of storage space
    • Mktg: data is a precious asset we cannot afford to lose. It’s up to IT to store it.
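
For skeptics who want to see why the cherry-picking under Interpretation always finds something, here is a small simulation, again in Python and again only a sketch. All the numbers are invented: a made-up "sales" series of 50 rows and 1000 made-up "metrics", every one of them independent random noise, with 0.3 as an arbitrary threshold for a correlation that would look respectable on a slide.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(42)       # fixed seed so the run is repeatable
N_ROWS, N_METRICS = 50, 1000  # hypothetical sizes

# A made-up "sales" series and 1000 made-up "metrics", all independent noise.
sales = [rng.gauss(0, 1) for _ in range(N_ROWS)]
correlations = [
    pearson(sales, [rng.gauss(0, 1) for _ in range(N_ROWS)])
    for _ in range(N_METRICS)
]

strongest = max(abs(r) for r in correlations)
hits = sum(1 for r in correlations if abs(r) > 0.3)
print(f"strongest |r| among pure-noise metrics: {strongest:.2f}")
print(f"metrics with |r| > 0.3: {hits} of {N_METRICS}")
```

None of these metrics has anything to do with sales, by construction, yet a handful will clear the threshold. Whoever gets to pick which ones go in the report gets to pick the story.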

Maybe none of the above happens in your organisation, in which case you won’t need to read about step 1 on Wednesday morning.
