On the March 9, 2016, Virtualization and Cloud Security Podcast, we spoke with Sridhar Karnam, director of product marketing for Arctic Wolf, a Security Operations Center (SOC) as a Service provider. In our ongoing series on scale within IT security, a SOC is the next logical stop. The scale of data in today’s environments far exceeds people’s ability to view the data, make sense of it, and say there is a problem in a timely fashion. For this, we need automation, but we also need human intelligence!
In a SOC, the intelligence is delivered by humans and the data by automation, query systems, search systems, and many forms of aggregation systems, from standard SIEM to machine learning. In all cases, a SOC provides visibility into the world of security and compliance: visibility into change events, events that violate policy (such as logging in as an administrator), and many other actions. A SOC may also show fishy events, those events that could be the result or start of an attack. Yet, with all the streams of data, it still takes a human to identify those fishy events. That is where tools and other sources of data come into play.
The current batch of security tools augments your normal streams of security data with other data sources such as threat feeds, performance and operational feeds, even business data. This augmentation is designed to add flavor to the data so that bad actors stand out more and are not lost in the background noise of a typical data center. Even a small SOC works with millions of lines of log files and text files. Each of these millions of lines could translate not only into 100 or so events but also into terabytes of data a day. Now, how much data would this be if you had to look at 5 billion events a day? This is why we use tools.
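To put 5 billion events a day in perspective, here is a rough back-of-the-envelope sketch. The average event size of 500 bytes is purely an assumption for illustration; real log lines vary widely, so plug in your own number.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(events_per_day: int) -> float:
    """Average sustained event rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


def daily_volume_terabytes(events_per_day: int, avg_event_bytes: int) -> float:
    """Raw daily volume in decimal terabytes (1 TB = 10**12 bytes)."""
    return events_per_day * avg_event_bytes / 1e12


if __name__ == "__main__":
    daily_events = 5_000_000_000   # the figure from the text
    assumed_event_bytes = 500      # ASSUMPTION: illustrative average event size

    rate = events_per_second(daily_events)
    volume = daily_volume_terabytes(daily_events, assumed_event_bytes)
    print(f"{rate:,.0f} events per second on average")
    print(f"{volume:.1f} TB per day at {assumed_event_bytes} bytes per event")
```

Under that assumption, this works out to tens of thousands of events every second, around the clock, which is well beyond what any team of people can read.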
It is impossible for people to handle 5 billion events a day. That is why we need ways to obtain corroborating evidence that an event is really something important and not a false positive. We also may not get certain events because the data is just not gathered. How we deal with that is another question. Missing data could make things just as bad as too much data! More in this case is not always better: the best data needs to be used, not all the data. Collecting all data is a mistake many folks make. We just do not have the systems in place to handle everything and still know what we are missing.
When we talk about data protection, we often talk about recovery time objective (RTO) and recovery point objective (RPO): the time it takes to recover and the maximum age of data we can afford to lose. Often, RTO is in minutes, as is RPO. Sometimes, RPO is seconds or even real-time, depending on the system and how data is protected. If we had equivalents in the security world, they would be mean time to detect (MTTD) and mean time to respond (MTTR). Actually, I prefer “remediate” over “respond,” but sometimes the response is all you need, and there is no remediation. In 2016 (for the 2015 year), the Mandiant M-Trends report stated that the median time to discovery had dropped to 146 days, from 205 days in 2014. The discovery time could have been as low as 56 days, however. Still, even in that best case, we are talking roughly two months before knowing there was an incident. Call MTTD 60 days, then. Measured from the start of the incident, the MTTR is greater than 60 days by definition.
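If you want to track these two numbers for your own environment, a minimal sketch might look like the following. The `Incident` record and its field names are hypothetical, and both metrics are measured from the moment of compromise, which is one convention among several (some teams measure MTTR from detection instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative."""
    compromised_at: datetime  # when the attacker first got in
    detected_at: datetime     # when someone (or something) noticed
    responded_at: datetime    # when the response or remediation finished


def mean_time_to_detect(incidents: list) -> timedelta:
    """MTTD: average time from compromise to detection."""
    seconds = [(i.detected_at - i.compromised_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def mean_time_to_respond(incidents: list) -> timedelta:
    """MTTR: average time from compromise to completed response.

    ASSUMPTION: measured from compromise, so MTTR >= MTTD whenever
    every response follows its detection.
    """
    seconds = [(i.responded_at - i.compromised_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


if __name__ == "__main__":
    start = datetime(2015, 1, 1)
    # Made-up incidents, for illustration only.
    incidents = [
        Incident(start, start + timedelta(days=50), start + timedelta(days=55)),
        Incident(start, start + timedelta(days=70), start + timedelta(days=75)),
    ]
    print("MTTD:", mean_time_to_detect(incidents))
    print("MTTR:", mean_time_to_respond(incidents))
```

Note that the M-Trends figure is a median, not a mean; swapping `mean` for `statistics.median` gives the comparable number.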
There are many tools that can lower MTTD and MTTR to thirty to sixty seconds, yet only if they are in use and for very specific forms of attacks. It is the nonspecific forms of attack that worry me. This is why we need a ready pair of eyes on our security data. It is not important whether those eyes are human or robot. We need someone or something looking into our data in depth, correlating likely events and finding corroborating details. Removing false positives is an ongoing struggle.
How can we proceed? First, get visibility into your environment. Second, gain knowledge of external sources of data, such as threat feeds, and of how the applications work. Get involved and know your build, deployment, and other business items. A security person must be a businessperson as well these days. There is a distinct marriage between the two. More importantly, these external sources of data will help determine whether an event is a false positive or a real event. Security and network operations must overlap and share data: not replace each other, but help each other. They are not competitors. They look at different things, after all.
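As a rough illustration of how external data can corroborate an event, consider the sketch below. Everything in it is hypothetical: the `Event` fields, the action names, the three signals, and the two-signal threshold are assumptions chosen for the example, not how any particular SOC tool works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized log event."""
    source_ip: str
    user: str
    action: str  # e.g. "admin_login" or "config_change" (illustrative values)


def corroborating_signals(
    event: Event,
    threat_feed_ips: set[str],
    deployment_in_progress: bool,
) -> list[str]:
    """Collect independent reasons to believe the event matters."""
    signals = []
    if event.source_ip in threat_feed_ips:
        signals.append("source IP appears on a threat feed")
    if event.action == "admin_login":
        signals.append("policy violation: administrator login")
    # Business context: a change during a known deployment is expected.
    if event.action == "config_change" and not deployment_in_progress:
        signals.append("change outside a known deployment")
    return signals


def triage(
    event: Event,
    threat_feed_ips: set[str],
    deployment_in_progress: bool,
) -> str:
    """ASSUMPTION: two or more signals are enough to warrant a human look."""
    count = len(corroborating_signals(event, threat_feed_ips, deployment_in_progress))
    if count >= 2:
        return "investigate"
    if count == 1:
        return "needs corroboration"
    return "likely false positive"


if __name__ == "__main__":
    feed = {"203.0.113.7"}  # documentation-range address, not a real indicator
    suspicious = Event("203.0.113.7", "root", "admin_login")
    routine = Event("198.51.100.4", "deploy", "config_change")
    print(triage(suspicious, feed, deployment_in_progress=False))
    print(triage(routine, feed, deployment_in_progress=True))
```

The point of the deployment flag is the marriage of security and business knowledge: the same configuration change is routine during a release and worth a second look at any other time.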
Have yourself a listen to the podcast and let me know your thoughts. Also, check out this simple SOC for VMware vSphere, built on VMware vRealize Log Insight, as a way to start gaining some visibility within VMware vSphere environments. With enough interest, the SOC could be expanded to other vendors.