When working with VMware ESX, there are some tips I can share that can help you manage your environment. These tips are not anything really new or exciting, but rather a reinforcement of some best practices to live by in order to improve auditing for compliance and troubleshooting. Using the following in conjunction with remote logging functionality will improve your compliance stance and your ability to troubleshoot over a period of time.
How, you may ask? By using a tool that logs all local administrator actions to a remote logging host. There are two ways to do this today for ESX (SUDO and the HyTrust Appliance) and only one mechanism for ESXi and vCenter (the HyTrust Appliance).
With virtualization technology we, the system administrators, have a lot of tools available that make the day-to-day operation and administration of our environments easier and speed up many administration tasks. Take, for example, the ability to add resources to a virtual machine. You can add processors, add memory, or increase disk space within a matter of minutes and with very little downtime. On a physical host you would need to purchase the hardware first, wait for it to arrive, and then schedule the downtime to add the resources to the machine. This speed and power can be both a blessing and a curse. Once application owners understand how easy it is to add resources to virtual machines, the requests for additional resources start coming any time they think there is the slightest need.
One thing I have learned in the time I have spent working in IT is that no software product, out of the box, will do everything that you want it to do. This especially goes for VMware’s vCenter Server. It is a great product, yet it still has its shortcomings. vCenter performs a lot of the tasks we need and can report on a lot of the information we need to know about our virtual environments, but unfortunately not everything we need to know can be easily found in bulk across multiple servers.
Security baselines and security health checks are an important part of any modern-day infrastructure. These checks are done periodically throughout the year, usually every quarter. In my opinion it is a good thing to check and make sure your security settings follow the guidelines that the company has set out to achieve. Here is where I do have a problem. When setting up the guidelines for the different technologies in your infrastructure, it makes the most sense for the people establishing the guidelines to fully understand the technology they are working with. After all, would you really want the midrange or mainframe group to write the policies and guidelines for the Microsoft Windows Servers in your environment?
I recently got called to examine some performance issues affecting a VMware VDI cluster. I was told all the hosts in the cluster would run at 100% CPU utilization for an extended period of time, and the client wanted an explanation and a recommendation. I had a pretty good idea what the problem was before I ever started looking at the hosts. I know this topic has been covered many times before, but it does not seem like it has been covered enough.
One of my favorite movie lines is, “Life is like a box of chocolates, you never know what you are going to get” from the movie Forrest Gump. So for my observation of the day, I declare that “Life in IT is like a day at the amusement park.” It is a life where you hurry up and wait.