When you hear the term “host” in a discussion about virtual environments, what is the first thing you think of? For me, the answer is simple: a host is an appliance. For years now I have been standing on my soapbox and preaching the power and fundamentals of automation in building and configuring your virtual environment.
I came across a thread on the VMware VMTN Community Forum where a concerned individual was facing a rebuild of his host from scratch. He had run a hardening script on the host, and the host ended up broken and unusable. He was concerned that he did not have a backup of the host and was looking for a way to roll back. That is not the most interesting part of this post, since it is a common question that has been asked many times in the forums over the years. Before I get to what I found interesting, let me address a couple of his concerns.
First off, let’s start with backups for the VMware ESX host itself. I personally do not think backups of VMware ESX hosts are necessary, and now that VMware is making its final push with ESXi the backup question will become moot. Until that happens, I find myself explaining my reasoning about backups over and over again to different clients. Seeing the look on some of the project managers’ faces when I recommend that the host not be backed up is priceless in itself. Think about it: there should be no important files on the host to begin with, and if you ever need to restore or rebuild a host you would have to build the host from scratch before you can install the backup agent and start restoring files, right? And if the password of the vpxuser account has changed while the host was down, you are still going to have issues once the restore is done. This is just one example of how something can still go wrong after the install.
A little trick: one file that is well worth keeping is anaconda-ks.cfg. When you install a VMware ESX host, this file keeps track of all the answers you gave during the install. You will find things like the hostname, IP address, and partitioning information, just to mention a few. Below is an example of the file:
# Kickstart file automatically generated by anaconda.
install
lang en_US.UTF-8
langsupport --default en_US.UTF-8
keyboard us
mouse generic3ps/2 --device psaux
skipx
network --device eth0 --bootproto static --ip x.x.x.x --netmask x.x.x.x --gateway x.x.x.x --nameserver x.x.x.x,x.x.x.x --hostname servername01 --addvmportgroup=0 --vlanid=0
rootpw --iscrypted $1abcdefghijklmnopqrstuv/
firewall --enabled
authconfig --enableshadow --enablemd5
timezone America/Somewhere
bootloader --location=mbr
# The following is the partition information you requested
# Note that any partitions you deleted are not expressed
# here so unless you clear all partitions first, this is
# not guaranteed to work
#clearpart --exceptvmfs
#part /boot --fstype ext3 --noformat --onpart cciss/c0d0p1
#part /var/log --fstype ext3 --noformat --onpart cciss/c0d0p2
#part / --fstype ext3 --onpart cciss/c0d0p3
#part /home --fstype ext3 --noformat --onpart cciss/c0d0p5
#part /tmp --fstype ext3 --noformat --onpart cciss/c0d0p6
#part /opt --fstype ext3 --noformat --onpart cciss/c0d0p7
#part swap --size=1600 --ondisk=cciss/c0d0
%packages
grub
%post
/usr/sbin/useradd username1
/usr/sbin/usermod -p '$1abcdefghijklmnopqrstuv/' username1
What you have is an answer file to help you rebuild the host quickly and efficiently. Once you are done with the install, you will need to finish the configuration by putting back the virtual network, time server information, and anything else that is custom to that host. With the release of vSphere we have Host Profiles to help finish off the host configuration, so really anything after the initial build is taken care of for us. But in case you are not on vSphere yet, there is really nothing you cannot configure with a script. If you have a complex setup or a lot of hosts, I really recommend building out a build process that can also be used in your disaster recovery exercises.
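
To give an idea of what such a post-install script might look like on a classic ESX service console, here is a minimal sketch that puts back a vSwitch, its uplink, and a port group with a VLAN tag. The function names and the `DRY_RUN` switch are my own for illustration, and the vSwitch, NIC, port group, and VLAN values are placeholders for whatever your hosts actually use. It defaults to a dry run that only prints the commands, so you can review them before letting it touch a host.

```shell
#!/bin/sh
# Post-install network sketch for a classic ESX service console.
# Assumptions: run, configure_network, and DRY_RUN are illustrative
# names, and every value passed in is a placeholder for your own.

# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Note: the printed form does not re-quote arguments with spaces.
        echo "$*"
    else
        "$@"
    fi
}

# configure_network VSWITCH UPLINK PORTGROUP VLAN
configure_network() {
    vswitch="$1"; uplink="$2"; portgroup="$3"; vlan="$4"
    run esxcfg-vswitch -a "$vswitch"                          # create the vSwitch
    run esxcfg-vswitch -L "$uplink" "$vswitch"                # attach the uplink NIC
    run esxcfg-vswitch -A "$portgroup" "$vswitch"             # add the port group
    run esxcfg-vswitch -v "$vlan" -p "$portgroup" "$vswitch"  # set the VLAN ID
}
```

Calling `configure_network vSwitch1 vmnic1 Production 100` prints the four esxcfg-vswitch commands; running the script with `DRY_RUN=0` on the host executes them instead. The same pattern extends to time server settings or anything else that is custom to the host.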
Now to the part of the VMTN post that I really found interesting. One of the responses from a community member was to check out VMware’s backup and restore white paper, which comes with a warning right at the top of the page:
Warning: This procedure is an unsupported workaround. This may lead to corruption if done incorrectly.
If you are going to play around with this in your environment, make sure you try it on a test box first. The backup/restore solution from the white paper is:
Backing up Procedure
Create backups of these items:
- The /etc/passwd file
- The /etc/shadow file
- The contents of /home directory
- The contents of /root directory
- The contents of the /etc/vmware directory, excluding:
  - Any soft links
  - /etc/vmware/patchdb
  - /etc/vmware/ssl
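
As a rough sketch, the backup list above could be collected into a single archive like this. The function name `backup_esx_config` and its arguments are my own and not part of the white paper; it assumes GNU tar and find, and it only picks up files, so empty directories are not recorded. Passing the root as an argument lets you rehearse against a copy of the file system instead of the live host.

```shell
#!/bin/sh
# backup_esx_config ROOT ARCHIVE
# ROOT is normally "/"; ARCHIVE must be an absolute path because the
# function changes directory into ROOT before calling tar.
# The function name and arguments are illustrative, not from VMware.
backup_esx_config() {
    root="$1"; archive="$2"
    (
        cd "$root" || exit 1
        {
            # The passwd and shadow files
            for f in etc/passwd etc/shadow; do
                [ -e "$f" ] && echo "$f"
            done
            # The contents of /home and /root
            find home root ! -type d 2>/dev/null
            # /etc/vmware without soft links, patchdb, or ssl
            find etc/vmware \
                \( -path etc/vmware/patchdb -o -path etc/vmware/ssl \) -prune \
                -o ! -type l ! -type d -print
        } | tar -czf "$archive" -T -
    )
}
```

On a host this would be called as something like `backup_esx_config / /vmfs/volumes/somedatastore/hostname-config.tgz`, where the destination path is only an example. Wherever you put it, keep the archive off the host itself.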
Restoring Procedure
To restore configuration:
1. Reinstall ESX to the same patch level as the failed one.
2. Get the information on the currently configured core dump partition and copy and paste the output into a text editor:
   esxcfg-dumppart -l
3. Get the information on the currently configured cos core file and copy and paste the output into a text editor:
   cat /etc/vmware/esx.conf |grep CosCorefile
4. Restore /etc/vmware from a previous backup.
5. Update the new configuration file with core dump partition information:
   esxcfg-dumppart -s vmhbaX:X:X:X
   Where vmhbaX:X:X:X is the dump partition name noted from step 2.
6. Edit /etc/vmware/esx.conf and update the CosCorefile information to match the path copied in step 3.
7. Get the new UUID for the root partition:
   cat /boot/grub/menu.lst |grep UUID
   This generates at least 3 lines with root=UUID=xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx where x is a hexadecimal number.
8. Update the configuration with the new root device UUID by executing the following command:
   esxcfg-boot -d "UUID=xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"
9. Reboot the ESX host. The ESX host reboots with the old profile.
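
If you do try the procedure, copying the root UUID around by hand is an easy place to make a typo. Here is a small sketch for pulling it out of menu.lst instead. The helper name `root_uuid_from_grub` is my own, and it assumes the `root=UUID=...` value appears on a kernel line the way the white paper describes.

```shell
#!/bin/sh
# root_uuid_from_grub FILE
# Prints the first root=UUID=... value found in a grub menu.lst,
# without the "root=UUID=" prefix. Prints nothing if none is found.
# The helper name is illustrative and not part of the white paper.
root_uuid_from_grub() {
    sed -n 's/.*root=UUID=\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}
```

On the freshly reinstalled host the result can be fed straight into the esxcfg-boot step, for example `esxcfg-boot -d "UUID=$(root_uuid_from_grub /boot/grub/menu.lst)"`. Check the value it prints before you run that, since this is still the unsupported workaround the warning talks about.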
This looks interesting, but I still wonder what you really gain from it. Will it really save you that much time, and what about the warning before you begin? It is good to have options available to help in those moments of crisis, but I still feel the host is an appliance that should be able to be rebuilt at will. That is how I present and control my environments, but what about you?