By far, the lowest hanging fruit of virtualization and cloud environment security is the segregation of your management control from your workloads. Separation of data and control planes has been recommended for everything from storage (EMC ViPR) up to the workloads running within virtual machines. The same holds true for cloud and virtual environment management tools, tasks, and functions. Until now there have been very few ways to implement such segregation: properly placed firewalls or some form of proxy, and the only proxy available was HyTrust. But this has changed. Other tools now help with this segregation of data from control, but do they give the level of auditing we require to solve the delegate user problem?

Delegate User Problem

What is the delegate user problem? This is a security, compliance, and auditing problem caused by using automation tools to manage the cloud and virtual environments. Let us look at Figure 1 to help explain this problem, which is caused by layering automation tools upon other tools upon other tools. In Figure 1 we have minimally four levels and sometimes up to five levels of automation and therefore delegation of control.

Figure 1: Delegate User Problem
A user can log in directly to any one of the elements in Figure 1, but the further that element is from the hypervisor, the more often the account that actually does the work changes on the way down, regardless of who logged in. Let me give an example using the diagram above. In a VMware vSphere environment (and the same is true for all other hypervisor environments), a user logs into the Cloud Management tool (vCloud Director) as themselves; let us use the username Bob. Bob removes a vApp for replacement. vCloud Director then signals the Central Management Server (CMS, here vCenter) via a delegate user that was configured when vCloud Director was first connected to vCenter; in our case we will call this user vCloudService. vCloudService logs into vCenter to delete the vApp, at which point vCenter signals the hypervisor (ESXi) as vpxuser to delete the vApp and all associated VMs.
With just these three levels, we end up with the following chain of users:

Bob -> vCloudService -> vpxuser

When we look at the hypervisor logs, where the action actually happens, the user we see is vpxuser. If this is the only log we gather, we have no idea that Bob logged in to vCloud Director. Adding the vCenter logs shows only that vCloudService performed the action, not Bob, so we also need the vCloud Director logs. Then we must correlate all of these logs by time to determine who initiated the action seen at the hypervisor. Remember, we have at least three levels of delegate users, and perhaps five in Figure 1, and each user can also log in directly at any of these levels. So if 10 users log in to delete the same virtual machine, you may not know who actually did the deed, that 10 requests were made, or whether each requester had the proper privileges, thanks to the delegate users.
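
To make the time-based correlation concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tuple-shaped log records, the second user Alice, the timestamps, the vApp name, and the five-second window. Real vCloud Director, vCenter, and ESXi logs are far messier than this, so treat it as a picture of the idea rather than a working log parser.

```python
from datetime import datetime, timedelta

# Hypothetical, heavily simplified log records: (timestamp, user, action, target).
vcd_log = [
    (datetime(2013, 1, 1, 10, 0, 0), "Bob", "delete", "vApp-42"),
    (datetime(2013, 1, 1, 10, 0, 1), "Alice", "delete", "vApp-42"),
]
vcenter_log = [
    (datetime(2013, 1, 1, 10, 0, 2), "vCloudService", "delete", "vApp-42"),
]
esxi_log = [
    (datetime(2013, 1, 1, 10, 0, 3), "vpxuser", "delete", "vApp-42"),
]

# Assumed correlation window; a real value depends on the environment.
WINDOW = timedelta(seconds=5)


def candidates(event, upstream_log, window=WINDOW):
    """Return upstream events that could have triggered `event`.

    A candidate has the same action and target and happened at or
    before the event, no more than `window` earlier.
    """
    ts, _user, action, target = event
    return [
        e for e in upstream_log
        if e[2] == action
        and e[3] == target
        and timedelta(0) <= ts - e[0] <= window
    ]


def trace(esxi_event):
    """List every user chain that time correlation cannot rule out."""
    chains = []
    for vc_event in candidates(esxi_event, vcenter_log):
        for vcd_event in candidates(vc_event, vcd_log):
            chains.append((vcd_event[1], vc_event[1], esxi_event[1]))
    return chains


for chain in trace(esxi_log[0]):
    print(" -> ".join(chain))
```

With two vCloud Director users asking for the same deletion a second apart, the trace reports both Bob and Alice as possible originators of the single vpxuser action. Time alone cannot tell them apart, which is the heart of the delegate user problem.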

Possible Solutions

There are several solutions that may work to avoid the delegate user problem.
The first depends on the hypervisor vendors updating their logging to add a user id of some sort at each step of the way, so that when you look at the hypervisor logfile you can see the trail of delegate users all the way up, perhaps with a common set of tags to identify each calling service within the call stack. This would remove the need to correlate all the delegate user service logs based on time. The hypervisor log would then be authoritative about who called it. Such a log entry could include a user id string in the form of:

SERVICE:UID:SERVICE:UID:SERVICE:UID … SERVICE:UID
Here, SERVICE:UID is repeated for each step of delegation, and SERVICE comes from a well-defined set of names, one for each service in the delegate user stack. I feel a two- or three-character service id would be sufficient.
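
As a rough illustration of how such a string might be built and read back, here is a short Python sketch. No hypervisor vendor emits this format today; the service ids CD, VC, and HV, the ordering (originating user first), and both function names are hypothetical choices made only for this example.

```python
def format_chain(steps):
    """Join (service, uid) pairs into SERVICE:UID:SERVICE:UID...

    ':' is reserved as the separator, so it may not appear inside
    a service id or a user id in this simple sketch.
    """
    for service, uid in steps:
        if not service or not uid:
            raise ValueError("service and uid must be non-empty")
        if ":" in service or ":" in uid:
            raise ValueError("':' is reserved as the separator")
    return ":".join(f"{service}:{uid}" for service, uid in steps)


def parse_chain(text):
    """Split a SERVICE:UID:... string back into (service, uid) pairs."""
    fields = text.split(":")
    if len(fields) % 2 != 0 or not all(fields):
        raise ValueError(f"malformed delegation chain: {text!r}")
    return list(zip(fields[0::2], fields[1::2]))


# Hypothetical two-character service ids:
# CD = vCloud Director, VC = vCenter, HV = hypervisor.
entry = format_chain([
    ("CD", "Bob"),
    ("VC", "vCloudService"),
    ("HV", "vpxuser"),
])
print(entry)

# The first pair in the chain names the user who started the request.
origin_service, origin_user = parse_chain(entry)[0]
print(origin_service, origin_user)
```

With a field like this in the hypervisor log entry, finding the originating user becomes a string split rather than a multi-log correlation exercise.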

The second is to put a proxy at each level of the calling chain so that the proxy can capture the login information for each action as it travels up and down the stack. The proxy would need some out-of-band synchronization to handle all the different management tools, APIs, and calling mechanisms. This form of enhanced proxy does not exist today. It is shown in Figure 2, where the proxy sits between every pair of layers, correlating and maintaining user access control at all levels regardless of the delegate users in use. This also implies that the enhanced proxy would handle the initial login at the top level and would be able to inspect the API to ensure user requests are made and audited down the stack. This approach is much harder to implement than the previous approach.

Figure 2: Enhanced Proxy

Proxies Today

The proxies available today are shown in Figure 3. Two are of interest, HyTrust and Xceedium, and they behave quite differently. HyTrust inspects the API of the CMS and the hypervisor to enhance authentication and authorization, and therefore role-based access controls, while Xceedium works best as a proxy or reverse proxy for access to specific applications such as the CMS client, Cloud Management, a PowerShell script, etc.

Figure 3: Standard Proxy
Today these two proxies, Xceedium and HyTrust, can work together as seen in Figure 3. One gates the application and the other inspects the API calls sent, but even working together they do not solve the delegate user problem well enough for large-scale cloud forensics. However, both can be used to meet compliance requirements. Because the proxies are placed in front of critical systems, compliance is met as long as all access to the CMS and underlying layers goes through the proxy, regardless of delegate users. Using both together gives greater insight into authentication and authorization within your virtual and cloud management trust zone, and therefore greater regulatory compliance auditing capabilities.
But while these proxies meet compliance requirements, they do not yet meet all security audit requirements for knowing who did what, when, where, and how throughout the management stack, not just at a given management layer; for that we need to solve the delegate user problem. Use of one or both of these tools definitely gets us closer to the ultimate knowledge we seek. Tie this to a good SIEM service, and the correlation of logins and actions based on time becomes a powerful combination, the best we can do today.