Posts in February 2016

If a tree falls in the digital forest, is there anybody to see it?

Thursday, February 04, 2016

One very common issue with software systems is the proliferation of data that is both gathered and shown.

In most cases, the data is there just because it is so easy to pick up in mass quantities from servers, workstations or – really – any other device in this increasingly IoT-infused world of microprocessors and software running in every other electronic device. This is a cardinal sin of systems designed by engineers and, mostly, for engineers. The thought process behind it is, “hey, because this piece of irrelevant minutiae could become handy in some scenario, we should get it, store it and display it”. And, before you know it, there are suddenly 1000 other data points like that which “could come in handy” at some point, for somebody. But the truth is that for most people, an excessive amount of data is just a distraction on the path to a true understanding of the bigger trends or the relevant information that actually matters.

In my own experience, nowhere is this as apparent as in systems designed for IT operations, usually run and used by the staff responsible for keeping our digital domain in good shape. Be it monitoring the servers that fuel the organization’s Internet presence and commerce, or managing the workstations and other devices that employees use directly to perform their daily work.

These systems – called inventory tools, asset management, systems management or technical monitoring, depending on the system’s primary role – are notorious for gathering a staggering amount of irrelevant data that their users are then left to make sense of on their own.
You get detailed CPU graphs, the make, model and voltage of the memory chips in your device, or maybe a single-second-precision data dump of every independent process running on those systems. Yes, of course, sometimes even the small things matter, but in the vast majority of cases exposing an unnecessary level of precision is actually counter-productive and harmful rather than helpful.

And yes, these things can be pretty aesthetically pleasing to look at when projected onto big screens in various forms of graphs, but do we really understand what’s going on in the big picture behind those individual data points? The trends behind a mere snapshot in time? If my process crashes or peaks in CPU usage, is that really meaningful information on its own, or is it just an isolated incident, relatively harmless and irrelevant, never to happen again for that process and device?
I pity the fool who has to wade through a large body of such data and try to understand what’s going on. This holds equally true for trying to make sense of data such as events from a monitoring system firing off alerts separately for every small thing found in logfiles, or endless lists of individual applications run on a typical PC, multiplied by the number of PCs used in a typical organization.

Likewise we, especially as experts in computer systems, tend to focus too much on isolated details without understanding how things relate to each other or what context frames those details. We concentrate on putting out fires one by one rather than checking whether there’s an arsonist who should be found and stopped instead. The arsonist here is, of course, a systematic cause behind the fires – a faulty software package delivered to all our computers, a misconfiguration in a configuration policy, or some such thing – rather than an actual person doing mischievous acts out of malice.
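Finding that “arsonist” can often be done mechanically. As a minimal sketch – with hypothetical alert records and field names, not any particular monitoring product’s API – grouping alerts by a shared error signature rather than handling each machine’s alert in isolation immediately shows whether many fires share one cause:

```python
from collections import defaultdict

# Hypothetical alert records: (machine, error signature)
alerts = [
    ("pc-001", "pkg install failed: officesuite 2.1"),
    ("pc-002", "pkg install failed: officesuite 2.1"),
    ("pc-003", "disk full"),
    ("pc-004", "pkg install failed: officesuite 2.1"),
]

def find_arsonists(alerts, threshold=3):
    """Group individual 'fires' by their signature; a signature seen on
    many machines points to one systematic cause, not isolated incidents."""
    by_signature = defaultdict(set)
    for machine, signature in alerts:
        by_signature[signature].add(machine)
    return {sig: sorted(machines)
            for sig, machines in by_signature.items()
            if len(machines) >= threshold}

print(find_arsonists(alerts))
# The faulty package delivery surfaces as one cause behind three fires.
```

Instead of three separate tickets for three machines, the operator sees one systematic cause worth stopping.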

Like the title suggests, if there’s some issue present in the data, do we even notice it when we have to go through a large body of digital noise to get to the signal?
Why do we still not put the software to work to surface the issues, trends and bigger picture for us and weed out the irrelevant details? That’s what software is good at, and we humans are not, as we have to work serially and at the speed of our “wetware”.
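Putting the software to work on weeding can start very simply. A minimal sketch, with a hypothetical raw event stream assumed: collapse the stream into counts and surface only recurring events, so isolated one-off incidents never reach a human at all:

```python
from collections import Counter

# Hypothetical raw monitoring events: one entry per alert fired
events = ["cpu spike: proc-a", "cpu spike: proc-a", "crash: proc-b",
          "cpu spike: proc-a", "timeout: proc-c"]

def surface_trends(events, min_occurrences=2):
    """Collapse the raw event stream into counts and keep only events
    that recur; isolated incidents are filtered out as noise."""
    counts = Counter(events)
    return {event: n for event, n in counts.items() if n >= min_occurrences}

print(surface_trends(events))  # only the recurring CPU spike survives
```

Real systems would add time windows and baselines, but even this trivial filter turns five raw alerts into one trend worth a human’s attention.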

Kalle has been working in IT in various roles and with a wide range of technologies for over 15 years, with companies such as Helsinki Stock Exchange, CDG Europe and Gridmetric.