Windows boot IO and storage performance impact on VDI

By Greg Schulz, Server and StorageIO @storageio

With adoption of Virtual Desktop Infrastructure (VDI) initiatives being a popular theme in cloud and dynamic infrastructure environments, a related discussion point is the impact on networks, servers and storage during boot or startup activity, and how to avoid bottlenecks. VDI solution vendors include Citrix, Microsoft and VMware along with various server, storage, networking and management tools vendors.

A common storage and network related topic involving VDI is boot storms, when many workstations or desktops all start up at the same time. However, any discussion of VDI and its impact on networks, servers and storage should also be expanded from read centric boots to write intensive shutdown or maintenance activity as well.

VDI boot and shutdown storm clouds
Boot and shutdown storms are the result of many virtual desktops either starting up (boot) or shutting down at once, causing many IO operations to be performed in a short amount of time. The reason they are referred to as boot (more reads) or shutdown (more writes) storms is that one or two workstations doing IO to their local hard disk drive (HDD), hybrid HDD (HHDD) or solid state device (SSD) have no significant impact on networks or servers and their storage. Other than making requests of servers for network resources, most of the IO associated with a boot or shutdown is localized to the workstation or laptop HDD, HHDD or SSD.

When the functionality supported by the local HDD, HHDD or SSD is moved into a VDI, the IOs that would otherwise have been resolved locally now must come from a server and its storage accessed over a network. For example, a lightly loaded or configured Windows workstation, laptop or desktop could generate as few as 30 IOPS during a boot, which does not seem like a lot for a typical hard disk drive. After all, an average 7,200 (7.2K) revolutions per minute (RPM) HDD should be good for about 125 to 150 small IOPS, depending of course on the type of adapter, controller and other configuration options. A fast 15K SAS or Fibre Channel HDD should be good for around 200 to 250 IOPS, again depending upon the configuration, what it is attached to, workload and other factors.

A few workstations, desktops or laptops booting or shutting down at the same time will put some extra traffic on the network, servers and shared storage; however, the exact impact will vary based on a couple of factors. The challenge with a VDI environment is that if you take 10 or 20, perhaps 100 or 200 or more desktops and move their disk data to a server, the IOs are redirected from a local HDD over a network to storage attached to a server via internal DAS, shared SAS DAS, SAN or NAS. What this means is more traffic on the network, plus consolidated disk IOs on the server and its storage.
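To put the aggregation point in rough numbers, here is a minimal sizing sketch. It uses the illustrative figures from this article (roughly 30 IOPS per booting desktop, 125 IOPS for a 7.2K RPM HDD and 200 IOPS for a 15K RPM HDD); treat them as placeholders, since your workloads and configurations will differ.

```python
# Back-of-envelope boot storm sizing. The per-desktop and per-drive IOPS
# figures are the illustrative numbers from this article, not measurements
# of any particular environment.

def spindles_needed(desktops: int, iops_per_desktop: int, iops_per_spindle: int):
    """Return (aggregate IOPS, HDD spindles needed) for a simultaneous boot storm."""
    total_iops = desktops * iops_per_desktop
    # Round up: a fraction of a spindle still means buying a whole drive.
    spindles = -(-total_iops // iops_per_spindle)
    return total_iops, spindles

if __name__ == "__main__":
    for desktops in (10, 100, 200):
        total, hdd_72k = spindles_needed(desktops, 30, 125)  # 7.2K RPM HDDs
        _, hdd_15k = spindles_needed(desktops, 30, 200)      # 15K RPM HDDs
        print(f"{desktops:4d} desktops -> {total:5d} IOPS "
              f"(~{hdd_72k} x 7.2K HDDs or ~{hdd_15k} x 15K HDDs)")
```

At 200 desktops the aggregate works out to 6,000 IOPS, which is why shared storage behind a VDI deployment needs far more spindles (or SSD and cache) to absorb a boot storm than the same desktops needed individually.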

Workstation or desktop impact factors on physical and VDI environments include:

  • The number and type of workstations, desktops or laptops that exist
  • What their operating system is and its configuration
  • How many and what types of applications are installed
  • Existence of any protocol, bandwidth or network optimization
  • Type, speed and configuration of networks that are in use

hIOmon: how many IOPS does your workstation or desktop do during a boot?
There are different rules of thumb (RUTs) and metrics floating around as to how many IOPS a typical Windows desktop requires when booting. The reason for the focus on Windows is that the majority of commonly virtualized desktops (as well as servers) are Microsoft based products running on Citrix, Microsoft Hyper-V or VMware vSphere among other hypervisors. Having heard different RUTs being tossed around, ranging from 30 IOPS, which seemed rather low, to a more conservative 50 to 100 IOPS (your activity will vary), I decided it was time for some actual measurements and metrics.

Realizing that standard Windows performance tools along with clocks for timing were not going to be easy or give me what I wanted to see, I decided to download a tool I knew of to do just this activity. That tool is called hIOmon from Tom West and hyperIO, which once installed provides some very detailed and interesting measurements of Windows environments, down to granular individual file access activity, process and other information. If you are a current or former performance and capacity planning analyst and have been told about how great mainframes are or were because of their SMF and RMF information, you can get related information out of Windows with tools such as hIOmon, or even out of Unix and other systems for that matter (go check out www.cmg.org if you are not aware of them).

By the numbers: IO, IO, it's off to hIOmon and work I go
I installed a copy of hIOmon on four different desktops including a mix of laptops and tower based workstations, all running Windows 7 Ultimate 32 bit. The two laptops each have an HHDD, which includes 4GB of SLC (single level cell) flash (e.g. SSD) integrated inside a 7,200 RPM 500GB HDD, and support many different common productivity applications and tools. One of the laptops is a stripped down Windows 7 Ultimate system using an older and slower traditional HDD with no applications other than iTunes installed on it. The mini tower workstation (results are not shown) also had an older HDD with a similar applications and tools configuration as the two laptops. In the course of testing the mini tower, which was recently upgraded from Windows XP SP3 to Windows 7, I uncovered some things that will result in upgrades being applied in the next few days, including more DRAM and replacing the HDD with an HHDD.

With hIOmon installed, and after doing a couple of shutdowns and restarts to make sure everything was normal on the systems and that no Windows or other software updates were in progress or scheduled for installation, the testing began. The test was to simply shut down the workstations to a cold state, then with a stopwatch start timing when the boot began, noting the time at which I was able to login and the time at which Windows appeared to be stable. As an indicator, I used the small circle icon (or whatever you may have changed it to) on Windows 7 to show that most if not all of the startup activity had stabilized.

Using the hIOmon reporting tool (see screen shots below), I then displayed the performance data for the time interval up to the noted Windows stabilization time. These metrics for the boot device included read and write IOs, bytes read and written, along with read and write latency (ms), from which I was able to compute IOs per second, throughput or bandwidth per second along with other statistics. I also recorded for each test run what the metrics were about four minutes after the start of the boot process to determine how much background activity continued after the initial startup.
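The arithmetic behind those computed metrics is simple: take two snapshots of cumulative IO counters and divide the deltas by the interval. The sketch below illustrates the method; the counter names and sample values are invented for illustration (they are not hIOmon's actual field names), though the sample numbers are chosen to be in the same ballpark as the results shown later.

```python
# Interval metrics from two snapshots of cumulative IO counters.
# Field names and values are illustrative only, not hIOmon's schema.

def interval_metrics(start: dict, end: dict, seconds: float) -> dict:
    reads = end["read_ops"] - start["read_ops"]
    writes = end["write_ops"] - start["write_ops"]
    bytes_moved = (end["bytes_read"] - start["bytes_read"]
                   + end["bytes_written"] - start["bytes_written"])
    return {
        "iops": (reads + writes) / seconds,                 # IOs per second
        "mb_per_sec": bytes_moved / seconds / 1_000_000,    # throughput
        "pct_read_iops": 100.0 * reads / (reads + writes),  # read share of IOs
    }

t0 = {"read_ops": 0, "write_ops": 0, "bytes_read": 0, "bytes_written": 0}
t1 = {"read_ops": 37_675, "write_ops": 2_405,
      "bytes_read": 530_000_000, "bytes_written": 128_000_000}
print(interval_metrics(t0, t1, 120))
```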

hIOmon WMI browser selecting device performance metrics
hIOmon device performance metrics from a 120 second duration

The following table shows a summary of my informal test results, confirming that your desktop boot (as well as shutdown) performance will vary. I saw a range of different IOPS, with the HHDDs having higher numbers, which should not be a surprise given that they do more consolidated work in a shorter period. Likewise, looking at data collected during different time intervals will yield different results, such as measuring during the peak of the boot process vs. waiting a minute or two for background tasks to quiet down.

Metric                HHDD desktop 1   HDD desktop 1    HHDD desktop 2
Avg. IOPS             334              69 to 113        186 to 353
Avg. MBytes/sec       5.36             1.58 to 2.13     2.76 to 5.2
Percent IOPS read     94%              80 to 88%        92%
Percent MBytes read   87%              63 to 77%        84%
MBytes read           530              201 to 245       504
MBytes written        128              60 to 141        100
Avg. read latency     2.24ms           8.2 to 9.5ms     1.3ms
Avg. write latency    10.41ms          14.96 to 20.5ms  8.6ms
Boot duration         120 seconds      120 to 240 sec   120 seconds

Note that for the HHDD desktop 2 results shown above, metrics were collected at both 120 and 240 seconds from the start of boot. The lower numbers are based on the results at 240 seconds, after much of the heavy lifting was done, with the larger metrics at 120 seconds. What the quick testing showed is that the type of desktop, its configuration, and the number and types of applications and data that are loaded during boot (or saved during shutdown) all have an impact on performance.
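As a rough sanity check, the averages and totals in the table reconcile with each other. Taking the HHDD desktop 1 column (small differences are expected, since the measurement windows never line up perfectly):

```python
# Rough reconciliation of the HHDD desktop 1 column from the table above.
avg_iops, duration_sec = 334, 120
mb_read, mb_written = 530, 128

total_ios = avg_iops * duration_sec                 # ~40,000 IOs for the boot
avg_mb_per_sec = (mb_read + mb_written) / duration_sec
avg_io_size_kb = avg_mb_per_sec * 1000 / avg_iops   # implied average transfer size

print(f"~{total_ios:,} IOs, {avg_mb_per_sec:.2f} MB/s, ~{avg_io_size_kb:.1f} KB per IO")
```

The derived figure of roughly 5.5 MBytes/sec is close to the 5.36 reported, and the implied average IO size of about 16 KB is consistent with the small block activity typical of a boot.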

The above table shows that having insight into your environment's applications and desktops as a baseline is important for making informed decisions on resource planning, including what network enhancements, servers and storage will be needed for VDI or other improvements. Another benefit of having a performance baseline across different classes of applications and desktops is being able to compare before and after, or, when there is a problem, to determine how the resources are performing compared to normal. The table also shows the benefit of a faster local device in the form of an HHDD or SSD, which means that for a move to a VDI environment, a fast low latency network able to move the required amount of data is needed, along with fast servers and back end storage. In other words, fast desktops need fast networks, servers and storage.

Not surprisingly, the desktop with fewer applications and a smaller software footprint generated fewer IOs (along with fewer bytes read and written) during the boot process, not to mention booting in a relatively short amount of time. The two desktops with HHDDs and many applications installed accomplished their boots in a shorter amount of time; in fact, each successive cold boot (from the machine turned off) tended to be faster, given the built in caching of the integrated SLC (single level cell) flash (SSD) that is part of the HHDD. However, looking at the number of IOs and bytes read and written during the interval, of course there were more. What this illustrated was that the HHDDs (SSDs should give similar capabilities) were able to handle the increased activity in a shorter amount of time, while also giving an indicator of how many IOs and how many bytes read and written, along with how much elapsed wall time or boot duration, would be placed onto a network in a VDI environment.

General comments
Since I use Hybrid Hard Disk Drives (HHDDs) such as the Seagate Momentus XT in some of my workstations, the benefit of integrated SLC flash memory as a cache combined with a 7.2K 500GB SATA HDD became clear over successive cold boot (e.g. complete power off to start up) tests. Each successive boot was faster, improving on average by several seconds. Other impacts and variables include IO optimization techniques for your physical and VDI environments, including file system and application tuning, network, server and storage load balancing, and bandwidth and protocol optimization among others.

Your number of IOPS may be lower if your boot (or shutdown) time is longer on an HDD vs. an HHDD or SSD, where more work is done in a shorter period. Likewise, your IOPS may be higher if using an HHDD or SSD with a shorter elapsed time. Reads tend to be easier for a device or server to handle as they are more cache friendly. For boots there will be some writes; however, for shutdowns that write activity will be amplified. Measure networks, servers and storage in the context of performance (IOPS, throughput or bandwidth, and latency), capacity (space), availability (HA, RAID, BC, backup) and economics, along with energy effectiveness.
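The elapsed time effect is simple arithmetic: a slower device that stretches the same amount of boot IO over a longer window reports a lower average IOPS figure. A quick illustration (40,000 IOs is roughly what a full boot generated in the tests above):

```python
# Same total boot IO measured over different durations yields different
# average IOPS; the 40,000 figure is approximate, taken from the tests above.
total_ios = 40_000

for duration in (120, 180, 240):
    print(f"{total_ios} IOs over {duration:3d}s -> {total_ios / duration:6.1f} avg IOPS")
```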

Additional tips, comments and recommendations

  • Gain situational awareness with metrics and measurement for End to End (E2E) management
  • Avoid making apples to oranges comparison with regards to metrics and desktop configuration
  • Boot storms are a focus, however also make sure your VDI is optimized for shutdown as well
  • There are many variables and thus your mileage or IOPS will vary
  • How you measure the workstation performance will have an impact as will how you collect metrics
  • The time interval you measure over matters, such as deciding when the boot is done
  • Look beyond the number of IOPS of a desktop; also consider the total number of bytes read and written
  • Keep an eye on latency as well as IOPS, packets and frames per second on the network
  • Avoid simply moving problems, find and fix them, then load balance or move as applicable
  • Reduce your data footprint impact with data footprint reduction (DFR) techniques and tools
  • Look into application virtualization and presentation services to offload desktops
  • The above will have an impact on networks, servers and their shared (or dedicated) storage
  • The best benchmark is your real applications being measured by applicable metrics
  • Aggregation can cause aggravations, not all applications or functionality can be consolidated, however most can be virtualized
  • Look at virtualization in the context of life beyond consolidation
  • Similar to clouds, with VDI don't be scared; however, look before you leap and be prepared

What is your take, what are you seeing for Windows boot and shutdown IO performance impact or characteristics?

I invite you to try hIOmon and post what you are seeing given your workstation and/or server configuration profiles.

Also check out the new hIOmon Disk IO ranger tools that Tom West has posted on the hyperIO site.

12 replies on “Windows boot IO and storage performance impact on VDI”

  1. Greg has made some great observations/points above – and yes I’m the fellow from hyperI/O that Greg mentioned. 🙂

    For instance, I would certainly agree (but maybe I’m biased) that “gaining situational awareness with metrics and measurements” for E2E management is an important realization/requirement for starters. There’s that old adage that “you can’t manage what you don’t measure”.

    Moreover, it seems that there often is a predominant focus simply (and sometimes only) upon the “speeds and feeds” of the storage devices themselves when talking about storage I/O performance.

    But it’s the actual performance impact upon the applications “at the other end of the cable” that ultimately matters (at least for most folks, and particularly in terms of benefits versus cost).

    So gaining a “situational awareness” of how applications actually experience storage I/O performance is a key consideration, goal, and (dare I say) necessity – and especially during normal, everyday usage and in production environments.

    And that’s one of the primary goals/benefits of hIOmon: enabling users to readily capture and analyze how their particular applications/systems actually make use of and experience storage I/O performance in everyday usage.

    “Empirical metrics matter here” 🙂

  2. Greg, thanks for the article – some good info in there. Can I assume that the ‘Avg. IOPS’ row in the table is the average IOPS for the first 120 seconds of booting, meaning that in total you measured about 40k IO operations for system HHDD1 to boot up?

    Also, you mentioned logoff storms in the intro of your post, but you don’t include measurement numbers. Can you provide those as well?

    Thanks!

    — Mark Nijmeijer
    Citrix XenDesktop Product Management

  3. Hello Mark and thanks for your comments, appreciate.

    Your assumption about "Avg. IOPS" is correct; that is the average of the IOPS for the first 120 seconds, e.g. the total IO operations for the period divided by time. For example, the IOPS were more intense during the first 60 to 90 seconds on the HHDD (Hybrid HDD) systems as more work was being done on those faster devices. Then over time, say at 120 seconds, 180 seconds, 240 seconds etc., the intensity of the IOPS tapers off as Windows stabilizes and actual work begins.

    Needless to say, there was a lot more data than what was shown in the post and more that I want to go back and collect. For example, I took measurements at different time intervals to see how the IOs were changing from read to write, as well as how IO size varied along with read and write latency. I also measured the mean time from cold boot to where I could logon and until Windows was basically ready to do work (e.g. when the cursor stopped blinking or the circle stopped spinning). Granted, there was still background work running.

    One of the things I would like to do when I get some time is to go back and record what happens during a shutdown, as well as hibernation or other activities. I'm also interested in measuring some of the network access of NAS/file shares during boot and shutdown; however, I need to address some other client projects first.

    If you have not done so, check out Tom West's hyperIO tools; see his post above, as you may want to get his tools into your labs if they are not there already.

    Oh, and needless to say, take these along with any other benchmark, performance test, simulation or workload measurement with a grain of salt; what is important is your specific applications and environment…

    Let me know if you have other questions I can address.

    Hope all is well.

    Cheers
    gs

  4. I found this article very interesting, and given that I am often performing scalability tests for virtual environments I find most of the facts presented consistent with my own observations. Normally, I focus on the bootup and the steady-state (running) IOPS requirements, but I am very interested in this idea of shutdown storms, something I had not considered before. I am curious if you have the shutdown storm data and how it compares to the bootup storm data.

    Also, you make the statement that bootup storms are generally read-intensive and shutdown storms are write-intensive. Do you have data that you can provide to support that conclusion, or is it just a general idea based on how startup and shutdown sequences are managed within Windows?

  5. Hello Paul and thanks for the comments.

    Good to hear we are seeing similar things, granted would expect some differences given various configurations, applications, etc…

    The industry focus has been around boot due to the opportunity to position solutions such as SSD or fast storage, servers and networks to address the aggravation caused by aggregation, e.g. boot storms. However, the industry (e.g. vendors/VARs/consultants/media/bloggers) is also waking up to the other aspect, which may in fact hold as much if not more opportunity: how to improve on steady state, as well as shutdown/logoff and other periods of activity. Some of that activity is latency sensitive (reads, writes or a mix), some is rate intensive (IOPS/frames/packets per second), while other activity is transfer heavy, again over a mix of reads and writes.

    Based on experiences with Windows and other systems during shutdown, expect to see more writes than reads during logoff/shutdown. Given that most systems when shutting down have to destage or flush data, there are typically more writes than reads, just as at startup there are more reads than writes. If you look at the numbers posted, they are mainly reads (80+ percent) with some writes. I will do some testing and gather some metrics on shutdown when I get a chance, hopefully sooner rather than later.

    Btw, is there any empirical data, metrics, observations or anecdotal information that you are seeing elsewhere that you can share with the rest of us?

    Hope all is well, thanks in advance.

    Cheers
    gs

  6. The elephant in the room here is the massive amount of IOPS connected to the boot time for a machine, far above the 30 IOPS papers tell you to allow for each VM in a VDI solution when sizing your storage. This is clearly where many VDI projects have failed; they may run fine in the middle of the day, but at 9am things are pretty much unusable for 30 mins while the storage unknots itself.
    This is why I always dedicate flash storage to the intensive points in a VDI solution, such as Gold Masters, replica redo disks, application injection repositories, and user profiles. It's fine for user data to sit on regular storage and utilise the better cost per GB this gives. However, with products such as Nimbus supplying 2TB of flash storage (500,000 IOPS) for as little as $25k, you have a much lower cost per IOPS than regular enterprise storage, so there really isn't an excuse to mess up your storage IO calculations on a VDI project anymore.

    Great article!

  7. Thanks John, glad you enjoyed the piece.

    You are spot on in that there is a vendor focus around boot storms, as that is easy for them to get their heads around as a means of selling SSD. Thus what one vendor does, others will follow, which is why you are not seeing or hearing many if any vendors bringing up the discussion about shutdowns, virus checks, or even normal running application modes. Ironically, for those vendors trying to sell SSD based solutions, they should figure out that SSDs also help support more writes while reducing latency as well, something that matters to users being productive. Rest assured, once some vendor figures out the other half of the VDI opportunity, as well as the importance of having good insight to plan for VDI to enable success, others will follow suit.

    And as a follow up to previous posts, I have started another round of testing to collect metrics on what happens during shutdown, and during normal running mode using applications and their impact or IO needs. Will post some results as I get further along.

    To your point John about SSD, those as well as HHDDs that incorporate flash, such as the Momentus XT with 4GB of SLC flash, support more read and write IOPS (e.g. avoiding aggregation causing aggravation) as well as lower latency on both reads and writes.

    Cheers
    gs
