In Security Operations, Less Is More. Incident Response Teams Should Focus on Data Value Instead of Data Volume.
While not always the case, there are instances in life in which less is more. I believe that in the world of security operations and incident response, data is one of them. It may sound like a radical statement, but it’s really not. Allow me to explain.
There is much hype around big data these days. There seems to be no shortage of conferences, seminars, and white papers where big data is either a main focus or at least discussed in passing. We seem to live with an implicit assumption that big data is a foregone conclusion, and that we should prepare for an onslaught of data the likes of which we’ve never seen before.
That may very well be what happens in the future, but I’m not convinced that it is inevitable. Or, more precisely, I’m not convinced that all of the data involved in the “big data” hysteria adds value to security operations and incident response. One concept I’ve spoken about in the past is that of data value vs. data volume. They are two different things, and it is important to remember that.
In security operations and incident response, we collect a wide variety of data sources. Each source obviously produces a different volume of data. But more importantly, I would argue, each data source also provides a different value to security operations and incident response. The two measures are not necessarily related. A data source can have high value to security operations but be relatively low in volume. Conversely, a data source can have limited value to security operations but be relatively high in volume.
As network sizes and the number of endpoints within an organization grow, we as a community tend to focus overwhelmingly on the additional data volume this produces, rather than the value and the relevance of the data to security operations and incident response. Much of today’s big data discussion is dominated by the concepts of volume, velocity, and variety, but unfortunately, I haven’t seen much discussion around the concept of value.
Let’s examine the historical reasons behind this thinking, some of the ramifications it has on modern security operations, and some steps we can take to approach the challenge from a different perspective.
When security organizations tackle the big data challenge, they primarily focus on two things:
• Gaining access to every data source that might be relevant to security operations
• Warehousing the data from all of those data sources
In my experience, organizations take this approach for two primary reasons:
• Historically, access to log data was scarce, creating a “let’s take everything we can get our hands on” culture
• There is not a great understanding of the value of each different data source to security operations, creating a “let’s collect everything so that we don’t miss anything” philosophy
Unfortunately, this creates new challenges that are particularly acute in the era of big data:
• The variety of data sources creates confusion, uncertainty, and inefficiency: the first question during incident response is often “Where do I go to get the data I need?” rather than “What question do I need to ask of the data?”
• The volume and velocity of the data deluge the collection/warehouse system, resulting in an inability to retrieve the data in a timely manner when required
• Storage is consumed more quickly, thus reducing the retention period (this can have a detrimental effect on security operations when performing incident response, particularly around intrusions that have been present on the network for quite some time)
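To make the retention point concrete, a back-of-the-envelope calculation illustrates how quickly indiscriminate collection erodes the retention window. The figures below are hypothetical, chosen only to show the shape of the trade-off:

```python
# Hypothetical figures for illustration only -- substitute your own.
storage_capacity_tb = 100             # total log storage available
high_value_sources_tb_per_day = 0.8   # e.g. auth, DNS, proxy, flow data
low_value_sources_tb_per_day = 4.2    # e.g. verbose debug and application logs

def retention_days(capacity_tb, daily_volume_tb):
    """Days of history the store can hold at a given ingest rate."""
    return capacity_tb / daily_volume_tb

focused = retention_days(storage_capacity_tb, high_value_sources_tb_per_day)
everything = retention_days(storage_capacity_tb,
                            high_value_sources_tb_per_day + low_value_sources_tb_per_day)

print(f"High-value sources only: {focused:.0f} days of retention")
print(f"Collect everything:      {everything:.0f} days of retention")
```

With these illustrative numbers, collecting everything cuts retention from roughly four months to under three weeks, which matters most for exactly the long-dwell intrusions mentioned above.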
While it is true that a conservative, “collect everything” approach is better than nothing in the absence of anything better, I would suggest an alternative process for facing the challenges of collection and analysis head-on:
• Determine logging/visibility needs scientifically based on business needs, policy requirements, incident response process, and other guidelines
• Review the network architecture to identify the most efficient collection points
• Instrument the network appropriately wherever visibility is lacking
• Identify the smallest subset of data sources that provide the required visibility and value with the least amount of volume
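One way to operationalize that last step is a simple greedy selection: score each candidate source by how much required visibility it adds per unit of daily volume, and stop once the requirements are covered. The sources, visibility categories, and volumes below are hypothetical, and greedy set cover is a heuristic rather than a guaranteed-optimal answer, but it captures the value-per-volume idea:

```python
# Hypothetical data sources: the visibility categories each covers,
# and a rough daily volume in TB -- all illustrative, not prescriptive.
sources = {
    "dns_logs":   ({"name_resolution"}, 0.2),
    "proxy_logs": ({"web_activity", "name_resolution"}, 0.6),
    "netflow":    ({"lateral_movement", "web_activity"}, 0.4),
    "full_pcap":  ({"web_activity", "lateral_movement", "name_resolution"}, 9.0),
    "auth_logs":  ({"authentication"}, 0.1),
}
required = {"name_resolution", "web_activity", "lateral_movement", "authentication"}

def select_sources(sources, required):
    """Greedily pick sources with the best new-coverage-per-volume ratio."""
    chosen, covered = [], set()
    while covered < required:
        best = max(
            (s for s in sources if s not in chosen),
            key=lambda s: len((sources[s][0] & required) - covered) / sources[s][1],
        )
        gained = (sources[best][0] & required) - covered
        if not gained:
            break  # remaining requirements can't be met by any source
        chosen.append(best)
        covered |= gained
    return chosen, covered

chosen, covered = select_sources(sources, required)
print(chosen)
```

Note that with these numbers the high-volume full packet capture never gets picked: everything it would tell us is already covered by far leaner sources.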
This approach may seem radical at first glance, but those of us who have worked with log data in an incident response setting will see that this is really the only way security operations programs can keep pace with the big data deluge. After all, if you can’t get a timely answer from the very data you insisted on collecting (due to volume-based performance degradation), was there really any value in collecting it? What goes in must come out easily, efficiently, and rapidly. Otherwise, there is simply no point in collecting it.
Those who disagree with me will argue: "I can't articulate what it is, but I know that when it comes time to perform incident response, I will need something from those other data sources." To those people, I would ask this question: If you're collecting so much data, irrespective of its value to security operations, that your retention period is cut to less than 30 days and your queries take hours or days to run, are you really able to use that data you've collected for incident response? I would think not.
Visibility into the organization is essential today. But proper visibility doesn’t have to mean a deluge of uncoordinated data sources. It makes sense to think about the value of each data source to security operations and incident response, and to select only the data sources that produce the required visibility with minimal overlap, redundancy, and extraneous volume. The buzz and hype these days should be about “big value”, not “big data”.