One of the great things about working for a VAR is that I get to evaluate best of breed solutions in a vendor agnostic manner. Lately, I’ve been spending more and more time speaking with and doing VDI designs for both our EMC and Netapp customers.
I thought I’d take some time to discuss how each of these companies is helping to solve the “VDI IOPS problem.” For those not familiar with this particular problem space, to put it very simply: with physical desktop deployments, each desktop/user has their own hard drive. When moving to a virtual desktop, the storage is consolidated on a SAN/NAS array, and thus there is no longer a 1:1 desktop to disk drive relationship. This is not to say that you *cannot* have a 1:1 desktop to disk relationship, but doing so would be extremely cost prohibitive and would ruin any practical method for justifying VDI, regardless of any opex savings. The issue is not one of disk space, but one of performance, namely IOPS. Each disk drive can only support so many IOPS, so the question becomes: how do we design cost effective storage solutions for VDI while providing enough performance that the user experience is not impacted?
EMC and Netapp have different solutions to this problem. To illustrate, let’s look at a simple use case:
- 5000 users
- average desktop user IOPS requirement of 10 IOPS per user
Simple math gives us 5000 * 10 = 50,000 IOPS required from the storage array. Assuming a standard deployment of 15K Fibre Channel disks (at 200 random IOPS per disk with decent latency), and NOT counting RAID overhead, we would need ~250 spindles (50,000 / 200) to support this workload. This (relatively) massive spindle count will completely skew any TCO when building a business case for VDI, so we must find a way to lower this spindle count, and do it in a cost effective manner WITHOUT sacrificing the performance of the overall system. This is not a $/GB problem, this is a $/IOPS problem. Applying deduplication techniques and reducing the overall spindle requirement from a CAPACITY perspective will do nothing for solving the $/IOPS problem. Or will it? Storage is one of the biggest costs of any VDI deployment, so this is a very important place to look for efficient designs.
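The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper using the numbers from the use case (5000 users, 10 IOPS per user, 200 IOPS per 15K disk); the function name is made up for illustration, and RAID overhead is deliberately ignored, just as in the text.

```python
import math

def raw_spindle_count(users, iops_per_user, iops_per_disk):
    """Return (total IOPS, disks needed), ignoring RAID overhead and cache."""
    total_iops = users * iops_per_user
    # Round up: you cannot buy a fraction of a disk.
    disks = math.ceil(total_iops / iops_per_disk)
    return total_iops, disks

total_iops, disks = raw_spindle_count(users=5000, iops_per_user=10, iops_per_disk=200)
print(f"{total_iops} IOPS -> {disks} spindles")
```

Changing `iops_per_user` or `iops_per_disk` shows quickly how sensitive the spindle count is to the assumed desktop profile.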
Let’s first talk about Netapp’s architecture, and how one would go about designing a storage solution for VDI. For starters, Netapp employs what is called WAFL, which is a file system that sits on top of the raw storage. WAFL stands for “write anywhere file layout.” When doing write I/O, Netapp always appends and never overwrites existing blocks. This boosts write I/O performance because all write I/O becomes sequential: random write I/O is “grouped” in cache and written to disk in a sequential manner, and a metadata/mapping table is maintained in cache to track the actual blocks. The drawback, however, shows up during read I/O. Because blocks were not written in the order the host specified, reads become random in nature and lose their “sequentiality” even if the host is requesting sequential blocks on its filesystem. Read cache does help with this somewhat: since Netapp has knowledge of the metadata/mappings, it can use algorithms to pre-fetch during host sequential I/O. There is undoubtedly still overhead due to the extra seeks on disk. How much, and its exact impact on read performance, is something which has been widely debated (and tested by some). Let’s just say that if it were a REAL big issue in the real world, that is, if it were a major competitive disadvantage, Netapp would not be as successful as it is selling its kit.
The other optimization Netapp brings to the table is what they call “intelligent caching.” Essentially this is “deduplication” of the cache: Netapp will only store one actual block of data in cache for every 255 duplicate physical blocks. This allows the storage system to hold more unique blocks of data in cache, since it is not storing duplicates. More data in cache means fewer seeks to disk and faster I/O with fewer spindles.
Based on read/write workload, 95th percentile I/O, and other factors, the Netapp sizing can tell you how many spindles can be saved with this feature versus the RAW IOPS spindle count calculation. This allows for a reduction in spindle count in the VDI use case due to the proliferation of duplicate blocks. Think about the OS image on all 5000 desktops: there will undoubtedly be many duplicate blocks referenced again and again, and these can all be served via the same block in cache to some degree, saving other areas of cache for other blocks of data. The more significant reduction in spindle count comes from what Netapp calls PAM/PAM-II cards. These cards are essentially large read caches for the Netapp array. Since Netapp is already very efficient in write I/O due to WAFL, with PAM/PAM-II cards much of the read I/O can be served from cache, drastically reducing the spindle count needed to support the I/O. Again, the amount of spindle count reduction must be run through Netapp sizing while taking into consideration all aspects of the I/O profile for the particular desktop load, i.e., it will give a bigger reduction in spindle count on READ workloads than on WRITE workloads, because the PAM/PAM-II cards themselves are pure read caches. So in this design, utilizing View Linked Clones/Netapp Dedupe, the OS drive can be placed on FC, letting PAM/PAM-II do the work of serving read I/O from cache, while the data drives can be placed on a SATA tier.
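To make the read-cache effect concrete, here is a deliberately simplified sketch. It is NOT the Netapp sizing method: the read/write split and the cache hit percentage below are hypothetical placeholders, and the model assumes that only read misses and all writes reach the disks, ignoring WAFL write coalescing and RAID overhead. It only illustrates why a pure read cache helps read-heavy profiles more than write-heavy ones.

```python
import math

def spindles_with_read_cache(total_iops, read_pct, read_hit_pct, iops_per_disk):
    """Estimate disks needed when a read cache absorbs part of the read I/O.

    read_pct     -- percentage of total I/O that is reads (assumed, 0-100)
    read_hit_pct -- percentage of reads served from cache (assumed, 0-100)
    """
    read_iops = total_iops * read_pct / 100
    write_iops = total_iops - read_iops
    # Only read misses go to disk; writes are passed through unchanged here.
    read_miss_iops = read_iops * (100 - read_hit_pct) / 100
    return math.ceil((write_iops + read_miss_iops) / iops_per_disk)

# Hypothetical profiles for the 50,000 IOPS example at 200 IOPS per disk.
for read_pct in (30, 50, 70):
    disks = spindles_with_read_cache(50_000, read_pct, read_hit_pct=80, iops_per_disk=200)
    print(f"{read_pct}% reads, 80% cache hits -> {disks} spindles")
```

With a 0% hit rate the function falls back to the raw 250-spindle figure, which is a handy sanity check before plugging in numbers from real monitoring.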
What is EMC’s approach to this IOPS problem? Enterprise SSD drives. The idea is that because SSD drives can perform at several times the speed of traditional 15K spinning disk (I am intentionally leaving out an exact multiplier as that varies based on I/O workload, but we can assume a safe 5-8x increase in I/O density for desktop workloads), we need far fewer of them to support the I/O requirements. Enterprise SSDs are a perfect fit where a LOT of IOPS are required but very little capacity. Replacing hundreds of FC drives with a much smaller number of SSD drives can save cost and provide the required I/O in a much smaller footprint. EMC does perform write coalescing and cache pre-fetch, but it does not do cache “dedupe” like Netapp. However, it’s not needed, since the underlying drives themselves can serve the I/O very quickly compared to traditional FC. Using something like View Linked Clones, the master replica can be placed on I/O dense SSD drives, and the data drives can be placed on a capacity dense SATA tier. Because the SSD drives themselves offer a massive increase in performance on a per drive basis, massive amounts of read cache are not needed as they are in the Netapp solution. While not as involved as Netapp’s WAFL+Intelligent Dedupe+PAM/PAM-II design, I can assure you that it does a very good job of reducing the spindle count, lowering costs, and satisfying performance.
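As a rough illustration of the 5-8x I/O density assumption, the sketch below compares drive counts for the 50,000 IOPS example. It assumes, purely for simplicity, that the entire workload lands on a single tier and that an SSD delivers a flat multiple of the 200 IOPS used for a 15K disk; a real design would split I/O between the SSD and SATA tiers and would need to be sized against the actual workload.

```python
import math

def drives_for_iops(total_iops, iops_per_drive):
    """Drives needed to serve total_iops, ignoring RAID overhead and cache."""
    return math.ceil(total_iops / iops_per_drive)

TOTAL_IOPS = 5000 * 10   # 5000 users at 10 IOPS each
FC_15K_IOPS = 200        # per-disk figure used earlier in the post

print("15K FC:", drives_for_iops(TOTAL_IOPS, FC_15K_IOPS), "drives")
for factor in (5, 8):    # the assumed 5-8x I/O density range
    ssd_count = drives_for_iops(TOTAL_IOPS, FC_15K_IOPS * factor)
    print(f"SSD at {factor}x:", ssd_count, "drives")
```

Even at the conservative end of the range, the drive count drops from hundreds to dozens, which is where the $/IOPS argument for SSD comes from.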
So to summarize…
Netapp:
- intelligent cache dedupe to increase cache efficiency
- WAFL to optimize write I/Os
- PAM/PAM-II to optimize the read I/O
- Netapp dedupe, or Linked Clones to solve the “capacity problem”
EMC:
- Enterprise SSD storage tier to serve the OS/boot drive image
- Linked Clone (or potentially Celerra dedupe) to solve the “capacity problem”
I have specifically left out EXACT spindle count savings with PAM/PAM-II and SSD, as that is something which has to be worked out very carefully with BOTH systems by anticipating (or actually monitoring) I/O workloads, since the workload will affect the raw spindle reduction accordingly. I have also left out things such as RAID-DP versus RAID5, as I believe they don’t play as big a role as what is highlighted above.
Both vendors have different approaches to the problem, but both are effective as demonstrated in real customer deployments.
Comments always welcome.