Part of any IT management platform is the handling of events, whether that means aggregating external events, creating its own events, or passing those events on to others. There seems to be a common set of criteria for handling those events, so let us look into these common criteria and compare some of the vendors in the space.

What is so special about events? Events tell us what is happening within our environment. An event could be the boot-up of more resources, the movement of data from one cloud to another, a security issue, or even a login. An event is an action that has taken place. How a tool handles events, whether by letting us register them with the tool, by augmenting its streams of data with them, or by picking them up automatically, shapes the analysis we can do. Here is a comparison of some of the vendors:
[table id=6 filter='Events' filter_columns='A-B,SIOS,Turbonomic,Zenoss,Virtual Instruments' /]

Our definitions are:
Patterns of Events implies the tools can detect recurring events over time. This is often the case with events that occur once every so often, such as an accounting server increasing its load, migrating between systems, or being upgraded.
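As a sketch of what pattern detection might look like, here is a minimal Python example (all names and thresholds are hypothetical, not any vendor's algorithm) that flags an event type as recurring when its inter-arrival times are roughly regular:

```python
from statistics import mean, pstdev

def is_recurring(timestamps, tolerance=0.1):
    """Flag an event type as recurring when its inter-arrival
    times are roughly regular (low relative deviation)."""
    if len(timestamps) < 3:
        return False
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg < tolerance

# e.g., an accounting server's load spike seen at ~30-day intervals
monthly = [0, 30 * 86400, 60 * 86400, 90 * 86400]
print(is_recurring(monthly))  # True
```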
Flap Detection is necessary when an event happens, then stops, and then starts again, all in rapid succession. The event is flapping, and what we really need to know is the rate of the flap, when it started, and when it ended. An overheat situation in which the temperature sits right on the cusp of the alarm threshold is a good use case for flap detection.
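A minimal Python sketch of the idea (the window and threshold are arbitrary assumptions): count state changes for one alarm within a sliding window and report it as flapping when the change rate crosses the threshold.

```python
from collections import deque
import time

class FlapDetector:
    """Track state changes for one alarm; report flapping when
    the change rate within a sliding window exceeds a threshold."""
    def __init__(self, window_seconds=300, max_changes=4):
        self.window = window_seconds
        self.max_changes = max_changes
        self.changes = deque()   # timestamps of state changes
        self.state = None

    def observe(self, state, now=None):
        now = time.time() if now is None else now
        if self.state is not None and state != self.state:
            self.changes.append(now)
        self.state = state
        # discard changes that have slid out of the window
        while self.changes and now - self.changes[0] > self.window:
            self.changes.popleft()
        return len(self.changes) >= self.max_changes  # flapping?
```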
Event Deduplication implies the collapsing of alerts for the same event into just one event with a duration, instead of the hundreds, if not thousands, of instances of the same event that come in.
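A minimal deduplication sketch in Python, assuming each incoming alert reduces to a (key, timestamp) pair: repeats collapse into one record carrying a first/last timestamp and an occurrence count.

```python
from dataclasses import dataclass

@dataclass
class DedupedEvent:
    key: str           # e.g., (source, event type), flattened
    first_seen: float
    last_seen: float
    count: int = 1

def deduplicate(alerts, index):
    """Collapse repeats of the same event into one record with
    a duration (first/last seen) and a count."""
    for key, ts in alerts:            # (key, timestamp) pairs
        if key in index:
            rec = index[key]
            rec.last_seen = max(rec.last_seen, ts)
            rec.count += 1
        else:
            index[key] = DedupedEvent(key, ts, ts)
    return index
```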
User Defined Filters are used to refine events, filtering them according to a user's requirements, not the vendor's. They can also be used to search for specific events and to see the activity surrounding them.
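As a sketch, a user-defined filter can be as simple as a predicate built from field/value pairs supplied by the user (the field names here are hypothetical):

```python
def make_filter(**criteria):
    """Build a predicate from user-supplied field/value pairs."""
    def predicate(event):
        return all(event.get(field) == value
                   for field, value in criteria.items())
    return predicate

events = [{"source": "db01", "severity": "critical"},
          {"source": "web02", "severity": "info"}]
critical_only = make_filter(severity="critical")
print([e for e in events if critical_only(e)])
```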
Determination of an event is the ingest of not just single elements, but multiple elements that form a single event. This could be data from a log file, or data that arrives from the start of the event through to its end. It is usually multiple lines or constructs that together make up one event.
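A minimal Python sketch, assuming log events begin with a date stamp (the start pattern is an assumption; a real tool would let you configure it): continuation lines, such as a stack trace, attach to the event opened by the last matching line.

```python
import re

def assemble_events(lines, start=re.compile(r"^\d{4}-\d{2}-\d{2}")):
    """Group raw log lines into events: a line matching the start
    pattern opens a new event; continuation lines attach to it."""
    event = []
    for line in lines:
        if start.match(line) and event:
            yield "\n".join(event)
            event = []
        event.append(line.rstrip())
    if event:
        yield "\n".join(event)
```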
Correlation provides a way to see the relationship between an event and time, between events themselves, and between events and the data the tool typically shows, such as capacity or performance data.
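As a sketch of time-based correlation in Python (the data shapes are hypothetical): pair each event with the metric samples recorded within a window around it.

```python
def correlate(events, samples, window=60):
    """Pair each event with metric samples (timestamp, value)
    recorded within `window` seconds of the event."""
    pairs = []
    for ev_time, ev_name in events:
        nearby = [(t, v) for t, v in samples
                  if abs(t - ev_time) <= window]
        pairs.append((ev_name, nearby))
    return pairs

events = [(1000, "vmotion"), (5000, "backup start")]
cpu = [(990, 85.0), (1010, 92.5), (4000, 20.0)]
print(correlate(events, cpu))  # vmotion lines up with the CPU spike
```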
Abstract IDs are required in the hybrid cloud, as workloads move around the cloud. If a workload is destroyed and re-created, it gets a new ID; if it is merely moved, it should keep the same ID. Usually, however, movement also entails creating a new ID, and with a new ID we often lose the past data that is invaluable to root cause analysis. An abstract ID is an ID that moves with the object regardless of its location within the hybrid cloud.
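A minimal sketch of the bookkeeping in Python (the API is hypothetical): a registry maps volatile, provider-assigned IDs to one stable abstract ID, which is carried over on a move so the workload's history survives.

```python
import uuid

class AbstractIdRegistry:
    """Map volatile provider-assigned IDs to one stable abstract
    ID that follows a workload as it moves around the cloud."""
    def __init__(self):
        self._by_provider_id = {}

    def register(self, provider_id):
        """New workload: mint a fresh abstract ID."""
        abstract_id = str(uuid.uuid4())
        self._by_provider_id[provider_id] = abstract_id
        return abstract_id

    def moved(self, old_provider_id, new_provider_id):
        """Same workload, new location: carry the abstract ID
        over so its history survives the move."""
        abstract_id = self._by_provider_id.pop(old_provider_id)
        self._by_provider_id[new_provider_id] = abstract_id
        return abstract_id
```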
Automatic Event Clearing is required, as events can pile up and misdirect viewers. If the event comes from an alarm, then when the alarming element is fixed, the event should automatically clear and be archived, thereby allowing ITaaS tools to respond to automated remediation.
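A minimal Python sketch of automatic clearing (the data shapes are assumptions): when an alarm is reported resolved, its event leaves the active view and lands in the archive rather than being deleted.

```python
def clear_resolved(active_events, archive, resolved_alarm_ids):
    """When an alarm resolves, clear its event from the active
    view and archive it rather than deleting it."""
    still_active = {}
    for alarm_id, event in active_events.items():
        if alarm_id in resolved_alarm_ids:
            event["cleared"] = True
            archive.append(event)
        else:
            still_active[alarm_id] = event
    return still_active
```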
Event Aging moves past events off the current view. We want to archive older events instead of removing them outright, as we need to know the history of events. If a device is no longer part of the hybrid cloud, archiving its data is preferable to forgetting it. The goal is to keep systems running while leaving older data usable as necessary, without cluttering the views with history that is months old (unless it is requested).
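As a sketch in Python (the 30-day cutoff is an arbitrary assumption): events older than the cutoff move from the live view into the archive; nothing is deleted.

```python
import time

def age_events(view, archive, max_age_days=30, now=None):
    """Move events older than the cutoff out of the live view
    and into the archive; nothing is deleted outright."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    archive.extend(e for e in view if e["timestamp"] < cutoff)
    return [e for e in view if e["timestamp"] >= cutoff]
```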
Feed (F)rom/(T)o External Source: Can the tool feed data to an external source or take data in from an external source? The goal is to add third-party events or business-specific events into the system and have them be correlated, graphed, etc. F implies from an external source, T implies to an external source, and F/T implies both.
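A minimal Python sketch of both directions (the endpoint URL and payload field names are assumptions, not any vendor's API): push our events out as JSON, and normalize inbound external events into our internal shape so they can be correlated and graphed.

```python
import json
import urllib.request

def feed_to(event, url):
    """T: push one of our events to an external system as JSON."""
    req = urllib.request.Request(
        url, data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)  # needs a live endpoint

def feed_from(payload):
    """F: accept an external event (hypothetical field names)
    and normalize it into our internal shape."""
    ext = json.loads(payload)
    return {"timestamp": ext["time"], "name": ext["event"],
            "source": "external"}
```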
Rollout of Logs to a Group, rolling individual lines up into an event, is unlike determination. It is more about understanding what makes up an event within a log file or other source of interleaved data. When data is interleaved, the line next to an item may be part of a different event. Rolling out logs in effect de-interleaves the data, leaving just a set of events.
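A minimal Python sketch of de-interleaving, assuming each line carries some correlation key such as a request or thread ID (the key function here is hypothetical):

```python
from collections import defaultdict

def deinterleave(lines, key):
    """Partition interleaved log lines into per-event groups
    using a key function, so adjacent lines from different
    events end up separated."""
    groups = defaultdict(list)
    for line in lines:
        groups[key(line)].append(line)
    return dict(groups)

log = ["req-1 open", "req-2 open", "req-1 write",
       "req-2 close", "req-1 close"]
print(deinterleave(log, key=lambda l: l.split()[0]))
```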
Data Augmentation is a way to enrich the raw data with insights, conclusions, markers, and other events from around the system. Data augmentation allows more interesting conclusions and correlations to be made.
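As a sketch in Python (the annotation names are hypothetical): augmentation attaches insights and markers to a raw event without altering the original data.

```python
def augment(event, **annotations):
    """Attach insights or markers (a conclusion, a weather note,
    a business context tag) to a raw event without altering it."""
    enriched = dict(event)
    enriched.setdefault("annotations", {}).update(annotations)
    return enriched

raw = {"timestamp": 1700000000, "name": "latency spike"}
print(augment(raw, cause="suspected storage contention",
              marker="maintenance window"))
```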

Closing Thoughts

As we enter the hybrid cloud, it will be important for IT as a Service tools to handle multiple sources of events, feeding both into and from other systems. Our ITaaS tool becomes the hub in which we live our operational lives. As such, we need the ability to search our data and to correlate business and technology events with security, weather, etc. While tools can draw many conclusions, we need to augment those with our own conclusions and even put those conclusions directly into the tools.
Every tool in our list is changing rapidly to make managing and monitoring a hybrid cloud a reality. However, these tools do not know our business; they need to be told about it so that the proper emphasis is placed and the proper conclusions are drawn. Events augment all aspects of analysis within tools. If we do not know an event took place (perhaps a storm), how do we know why our performance was impacted, or how do we plan for the future?
How we handle business, security, performance, capacity, and applications events within our hybrid cloud is a part of our journey to the cloud. Do we even handle them properly within our own data centers?