Logging within the Secure Hybrid Cloud

When we think of logging within the secure hybrid cloud, we tend to think of analytics, but there is more to logging than just reviewing the data. There are also discussions on what to collect, from where, and why. For security purposes we may start with collecting access data and work out from there, but complex systems such as a secure hybrid cloud produce many different forms of log data and, in some cases, not enough. What log data you can retrieve may even be a deciding point when choosing hybrid cloud services, as logs are used not only for audit purposes but also for troubleshooting and forensics. What log data do you collect within your secure hybrid cloud?
When we look at our high-level architecture, we notice that logging is in almost every component of a secure hybrid cloud, but it is greyed out. Logging exists as an aspect of those components, but we may not get all the data we desire out of a particular service: the data may not exist, it may require time-consuming requests to cloud providers, and even in our own data center it may be out of our immediate control. In some SaaS clouds, log data may not even contain crucial details such as who accessed the data last, when and how they accessed it, and from where.

Figure 1: Secure Hybrid Cloud
There are three parts to our secure hybrid cloud that are of interest:

  • Cloud – The cloud includes all those places outside our immediate control where data could end up or be taken from. In some cases, it is even used to further our transitional goals. This is where APIs tend to live.
  • Data Center – The data center is generally within our control and could be a private cloud or just a collection of virtual and physical machines. The data center may transfer data between multiple data centers or back and forth to the cloud.
  • Transition – The transition layer includes end-user computing, identity, APIs and gateways, as well as data protection. It is where we enter and generally manipulate our data.

When we talk about logging within the secure hybrid cloud, just like our discussions on analytics, we have to look at the entire secure hybrid cloud and not just one part of it. We really want all the data from each layer to get the entire picture.

Logging: Transition Layer

Logging in the transition layer of the hybrid cloud, which includes end-user computing, identity, APIs and gateways, as well as data protection, is a crucial component of any logging scheme for a hybrid cloud. This is the layer that tells us WHO accessed our data, WHERE it was accessed from, WHAT data was accessed, HOW it was accessed, WHEN it was accessed, and most likely WHY. To this end we need logging that captures all six: who, what, where, when, how, and why. We also need to send this log data to a central log repository that allows us to query it at any time. Generally, logs of this nature are sent to a SIEM in our data center, but there is a new class of cloud logging services, such as those from Splunk, that are also good repositories.
Since we enter and generally manipulate our data through the transition components of the secure hybrid cloud, this layer can tell us quite a bit that we normally would not be able to get elsewhere, and we may need to add more logging to it to retrieve that data. It is very hard to determine which user ultimately initiated a data request after it has jumped through multiple layers of security. For example:
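As a rough illustration of a log record that carries who, what, where, when, how, and why, here is a minimal Python sketch. The AccessEvent class, its field names, and the helper functions are all hypothetical and not taken from any particular product; the sketch only shows one way to emit a single JSON line per access that a central repository could later be queried on.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AccessEvent:
    """One access record. Every field name here is illustrative."""
    who: str      # user identifier as seen by the service doing the logging
    what: str     # data object that was accessed
    where: str    # source of the request (address, region, ...)
    when: str     # ISO 8601 timestamp in UTC
    how: str      # access path: tunnel, virtual desktop, mobile app, API, ...
    why: str      # stated purpose, if the service captures one
    service: str  # which component emitted the record


def make_event(who, what, where, how, why, service, now=None):
    """Build an event, stamping it with the current UTC time by default."""
    now = now or datetime.now(timezone.utc)
    return AccessEvent(who, what, where, now.isoformat(), how, why, service)


def to_log_line(event):
    """Serialize the event as one JSON line for shipping to a central store."""
    return json.dumps(asdict(event), sort_keys=True)


if __name__ == "__main__":
    event = make_event(
        who="jdoe-vpn",
        what="quarterly-report.xlsx",
        where="203.0.113.7",
        how="virtual desktop",
        why="quarterly review",
        service="tunnel",
    )
    print(to_log_line(event))
```

Keeping the same field names across every service is the design choice that matters here: it is what makes the records comparable once they land in one repository.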

We connect to our secure hybrid cloud over a secure tunnel to a virtual desktop. We log into the tunnel with one ID, and from there we access a virtual desktop as another user. Within the desktop we bring up a tool like VMware Horizon Application Manager to access a service such as Box.net within the cloud as yet another user. We retrieve an encrypted file and, using a virtual hardware security module, retrieve the key as another user. The key decrypts the data, which can now be read from within the virtual desktop within the view session.

Alternatively, we could directly access Box.net as a different user than the above, retrieve the file from Box.net using another username, decrypt it on our mobile device using Vormetric’s encryption technology, and display the data on our mobile device.

Mapping User Identifiers

In the first example we have used at least four different users and several different services, one of which is in the cloud. In the second example we have stayed exclusively outside our data center. Both scenarios are quite possible and valuable to allow. But we need to know which user accessed the data, so how can we do that?
In the first example, we would need to map users in some fashion, which requires us to have all the identifiers used by that one person: tunnel, desktop, Box.net, and virtual hardware security module. In the second, we need to map the mobile device to the user and to the user within Box.net. Neither is all that easy to do. At most we have logs from each of the services that could give us user identifiers, at least in a tokenized form, and some idea of the data accessed. The key is to get that data into a log so we can analyze it later.
But once we have the data, how do we correlate the records with each other? User access management and user detection are key features of next-generation firewalls, but we need more than that. We need to determine the user identity across the entire secure hybrid cloud, and a centralized identity manager could do that for us regardless of how we access the data. In either case we need to pick tools that log the critical bits of data for later or even real-time analysis.
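To make the mapping problem concrete, here is a minimal Python sketch of correlating log records from several services back to one person. The IDENTITY_MAP table, the service names, and the identifiers in it are invented for illustration; in practice such a mapping would have to come from a centralized identity manager, and this is only an assumption about what its output might look like.

```python
from collections import defaultdict

# Hypothetical mapping from (service, service-local identifier) to one person.
# A centralized identity manager would be the real source of this table.
IDENTITY_MAP = {
    ("tunnel", "jdoe-vpn"): "jane.doe",
    ("desktop", "CORP\\jdoe"): "jane.doe",
    ("box", "jane@example.com"): "jane.doe",
    ("hsm", "key-user-17"): "jane.doe",
}


def resolve(service, local_id):
    """Return the person behind a service-local identifier, or None."""
    return IDENTITY_MAP.get((service, local_id))


def correlate(events):
    """Group events by resolved person, in time order.

    Events whose identifier cannot be mapped are collected under None,
    which makes unmapped access easy to spot.
    """
    by_person = defaultdict(list)
    # ISO 8601 timestamps in the same zone sort correctly as strings.
    for event in sorted(events, key=lambda e: e["when"]):
        person = resolve(event["service"], event["who"])
        by_person[person].append(event)
    return dict(by_person)


if __name__ == "__main__":
    events = [
        {"service": "box", "who": "jane@example.com",
         "when": "2024-01-01T10:05:00+00:00", "what": "plan.docx"},
        {"service": "tunnel", "who": "jdoe-vpn",
         "when": "2024-01-01T10:00:00+00:00", "what": "session"},
        {"service": "box", "who": "stranger",
         "when": "2024-01-01T11:00:00+00:00", "what": "plan.docx"},
    ]
    for person, trail in correlate(events).items():
        print(person, [e["service"] for e in trail])
```

The sketch only works if every service logs some user identifier at all, which is exactly why the choice of tools matters.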

Closing Thoughts

The need to gather adequate logs for forensic, audit, and other needs is crucial within a secure hybrid cloud. What data we can get from those logs is even more important. If we cannot find out who did what, when, where, and how, then we are missing crucial data that could help us uncover a breach. Ideally we want to deny access to crucial data based on not just who, but from where, and maybe even from what device. For example, if the data is being accessed from outside a nation, we may want to redirect the user to a different location. If the data is being accessed from a well-known mobile device (one we control), we can grant certain accesses, but if it is a device we do not know, we grant a more public access to the data, which leaves out such things as PII.
We can get most if not all of this information from well-formed logs, which can then be analyzed in real time to provide better rules for data handling, but we need the log data first! So pick tools for your secure hybrid cloud that give you the necessary information and log to a central repository.
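The location and device rules described above could be sketched roughly as follows. This is only an illustration: the AccessRequest class, the HOME_COUNTRY value, the MANAGED_DEVICES set, and the three outcome names are assumptions made for the example, not features of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """The facts a policy needs, all of which must come from log data."""
    user: str
    country: str    # where the request originates
    device_id: str  # identifier of the requesting device


# Assumed values for the example only.
HOME_COUNTRY = "US"
MANAGED_DEVICES = {"laptop-001", "phone-042"}


def decide(request):
    """Return 'redirect', 'full', or 'public' for a request.

    - outside the home nation: redirect the user to a different location
    - known, controlled device: grant the fuller level of access
    - unknown device: grant a more public view that leaves out PII
    """
    if request.country != HOME_COUNTRY:
        return "redirect"
    if request.device_id in MANAGED_DEVICES:
        return "full"
    return "public"


if __name__ == "__main__":
    print(decide(AccessRequest("jane.doe", "US", "phone-042")))
    print(decide(AccessRequest("jane.doe", "US", "unknown-tablet")))
    print(decide(AccessRequest("jane.doe", "FR", "phone-042")))
```

Each branch depends on a field that has to be logged somewhere first, so a rule like this is only as good as the who, where, and device data behind it.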
What data is crucial within your logs? Do you log from your cloud services into a central repository?