The release of VPLEX has spawned some interesting conversations with customers. When exploring the concept of storage federation, which is enabled by storage virtualization, the question that often comes up is, how do the main storage virtualization technologies differ?
The three mainstream storage virtualization technologies in the market today are NetApp's V-Series, the HDS USP-V, and now EMC VPLEX. They all have some commonalities and differences in their architecture, function, and use cases.
First, let's explore the purpose of each of these technologies:
EMC VPLEX: This technology was brought to market for the purpose of federating storage both within the data center and over distance. The problem it aims to solve is "anywhere" data access, whether that means inside the same data center or between data centers within 100 km of each other (a current limitation, but expanding to async distances). This enables "private cloud" architectures whereby workloads can be dynamically run from any location because their data can be accessed from anywhere.
NetApp V-Series: The V-Series is NetApp's gateway box that is put in front of existing SAN arrays to virtualize their storage. The reason for doing this is to take advantage of NetApp software features such as snapshots, deduplication, intelligent storage caching, PAM, WAFL, etc., while preserving the investment in the SAN technology already on the floor. If one has existing SAN arrays and wishes to use these features, one can put a V-Series gateway in front of that storage and present the data through the V-Series, gaining access to all the advanced features NetApp has to offer.
HDS USP-V: Virtualizing storage behind a USP-V is done for much the same reasons as with a NetApp V-Series: preserving the existing storage investment while consolidating management and taking advantage of HDS features such as snapshots, replication, thin provisioning, zero page reclamation, and so on.
Now, on to some use cases / differences.
Let's assume an environment with existing storage on the floor, and explore these technologies to see what benefit they may bring.
If the purpose is simply to consolidate management and/or different silos of storage in order to reclaim space on disparate arrays and "pool" them together, any of the above technologies is capable of this. This is the most basic notion of storage virtualization. Present storage from the underlying arrays to any of these devices and they can create virtual volumes and present those virtual storage containers to hosts. Migration and movement of data from one back-end array to another can be done seamlessly with all three technologies.
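To make the pooling idea concrete, here is a minimal sketch of how a virtualization head might concatenate LUNs from disparate back-end arrays into one virtual volume and map virtual blocks back to the owning array. All class and method names here are my own illustration, not any vendor's actual API:

```python
# Illustrative sketch: a virtualization head concatenating back-end LUNs
# from different arrays into one virtual address space.

class BackendLun:
    def __init__(self, array, size_blocks):
        self.array = array
        self.size_blocks = size_blocks

class VirtualVolume:
    """Concatenates LUNs from disparate arrays into one virtual LUN."""
    def __init__(self, luns):
        self.luns = luns

    def map_block(self, vblock):
        # Walk the concatenated extents to find which array owns the block.
        offset = vblock
        for lun in self.luns:
            if offset < lun.size_blocks:
                return (lun.array, offset)
            offset -= lun.size_blocks
        raise ValueError("block beyond virtual volume size")

# Pool reclaimed space from two different arrays into one volume:
vol = VirtualVolume([BackendLun("IBM_DS", 100), BackendLun("CLARiiON", 200)])
print(vol.map_block(50))   # served by the first array
print(vol.map_block(150))  # falls through to the second array
```

The host only ever sees the single 300-block virtual LUN; which physical array services a given block is the virtualization layer's business, which is also why migrating data between back ends can be done transparently.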
How about taking advantage of advanced software features such as snapshots, replication, thin provisioning, deduplication, etc.? This takes a little explanation. The idea here is to collapse disparate replication, snapshot, and other advanced storage features onto one set of technology for the purpose of lower licensing costs, ease of management, and economies of scale. Additionally, these capabilities may not be available in the existing storage subsystems. For example, one may want block-based deduplication on an IBM array, and could get it by placing a NetApp V-Series in front of it and using NetApp's dedupe. In these situations, it is ONLY possible to do so with a NetApp V-Series or HDS USP-V type device. VPLEX has no advanced storage features such as snapshots, replication, or thin provisioning; it relies on the underlying storage array for all of those things. One caveat: VPLEX can create a distributed RAID 1 volume, where the volume is a mirror and each side of the mirror can come from a different array in a different location. One could view this as a replacement for traditional replication; however, this is a byproduct of what VPLEX is really intended to do. We can't really call it "replication software" per se, as it does not fit the standard definition, but it is a new way of looking at data protection.
How about active/active access to storage within a data center or across data centers? Within the same data center, this can be achieved with all three technologies. As long as all the storage sits behind one V-Series or one USP-V controller, there is no problem. Each of those devices can take raw LUNs from two different storage devices, combine them (in various ways: striping, concatenation, etc.), and present one virtual volume/LUN to the host, so data is written to both back-end arrays simultaneously. Where this breaks down, however, is across distance. VPLEX utilizes distributed caching to achieve read/write data access across distance. Up to 100 km (currently), one can place a pair of VPLEX clusters in front of storage at both sites, create a DR1 (distributed RAID 1) volume, and have read/write access to the data at both sites. This is NOT possible with a V-Series or USP-V. Even with a pair of V-Series or USP-V systems at each site using their respective replication technologies, the "secondary" side's data WILL be write-protected or in read-only mode, and in some cases not even mountable by a host OS, until the replication is paused and the secondary side promoted to primary. This is one of the main differences between VPLEX and the other two technologies: if the purpose is active/active read/write access to data across distance, VPLEX is the only option.
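The distributed RAID 1 behavior described above can be sketched in a few lines. This is a conceptual model only (the names are mine, and real DR1 volumes involve distributed cache coherency that is elided here); it shows why both sites end up with read/write access to the same data:

```python
# Conceptual sketch of a distributed RAID 1 (DR1) volume: one mirror
# leg per site, with both sites getting read/write access.

class SiteLeg:
    """The back-end storage behind one site's virtualization cluster."""
    def __init__(self, site):
        self.site = site
        self.blocks = {}

class DistributedRaid1:
    def __init__(self, leg_a, leg_b):
        self.legs = (leg_a, leg_b)

    def write(self, lba, data):
        # A write issued at EITHER site is committed to both legs before
        # the host sees an ACK -- this is what keeps the two sites'
        # copies identical and both sides writable.
        for leg in self.legs:
            leg.blocks[lba] = data
        return "ACK"

    def read(self, lba, local_leg):
        # Reads are satisfied from the local leg, so each site has
        # immediate read access without crossing the inter-site link.
        return local_leg.blocks[lba]

site_a, site_b = SiteLeg("A"), SiteLeg("B")
dr1 = DistributedRaid1(site_a, site_b)
dr1.write(0, "payload")        # a host at site A writes
print(dr1.read(0, site_b))     # a host at site B reads the same data
```

Contrast this with array replication, where only the primary leg accepts writes and the secondary stays read-only (or unmountable) until a failover promotes it.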
From an architectural standpoint, each of these devices sits inline to the underlying storage subsystems: I/O has to pass through them before reaching the back-end storage. They all have a good bit of redundancy built in, but approach it differently. The V-Series takes a clustered approach with two controllers. The USP-V is more akin to a DMX/VMAX in that it is an active/active storage platform with a large shared global cache and many front-end and back-end directors. The VPLEX is a scale-out cluster, meaning you add nodes as storage requirements dictate.
Also of note: both the USP-V and the V-Series are themselves storage arrays and can house their own internal storage. So it is possible to outfit them with their own FC/SATA disks as well as virtualize external storage. This is not possible with VPLEX; the VPLEX is not a storage array itself, only a virtualization / distributed caching engine.
The last thing worth mentioning is the difference in caching. VPLEX uses a write-through caching mechanism while the other two utilize write-back. When there is a write I/O in a VPLEX environment, the I/O is cached on the VPLEX but is passed all the way to the underlying storage subsystem(s) before an ACK is sent back to the host. A V-Series or USP-V, however, will store the I/O in its own cache and immediately return an ACK to the host; from there the I/Os are flushed to the back-end storage subsystems using their respective write coalescing and cache flushing algorithms. Because of the write-back behavior, there is potential for a performance gain above and beyond that of the underlying storage subsystems due to the caching on these controllers (in the case of the V-Series, the write performance would come from WAFL doing all I/O as sequential writes to disk). Conversely, there is no write performance gain to be had in VPLEX environments over and beyond the existing storage, due to the write-through cache design (keep in mind the underlying storage subsystem has to ACK the write before VPLEX sends an ACK to the host). For read hits, however, all three technologies can serve the data from their larger cache (or PAM in the case of NetApp) and improve performance.
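The two ACK behaviors can be summed up in a short sketch. This is purely illustrative (class names are my own, and real controllers do far more than this), but it captures where the host's ACK comes from in each design:

```python
# Illustrative sketch of write-through vs write-back ACK behavior.

BACKEND = {}                       # stands in for the back-end array

def backend_write(lba, data):
    BACKEND[lba] = data            # the slow path: down to the array

class WriteThroughCache:
    """VPLEX-style: the host's ACK waits on the back-end array."""
    def __init__(self):
        self.cache = {}

    def write(self, lba, data):
        self.cache[lba] = data
        backend_write(lba, data)   # must complete BEFORE we ACK
        return "ACK"

class WriteBackCache:
    """V-Series/USP-V-style: ACK from controller cache, destage later."""
    def __init__(self):
        self.cache = {}
        self.dirty = set()

    def write(self, lba, data):
        self.cache[lba] = data
        self.dirty.add(lba)
        return "ACK"               # host is ACKed before the array sees it

    def flush(self):
        # Later destage: coalesce dirty blocks and write them in order.
        for lba in sorted(self.dirty):
            backend_write(lba, self.cache[lba])
        self.dirty.clear()
```

In the write-back case the host's perceived write latency is the controller cache's, not the array's, which is where the potential performance gain comes from; in the write-through case the array's latency is always on the ACK path.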
So I hope it's obvious that there are some considerable differences between EMC VPLEX and the other storage virtualization technologies, and that the correct choice comes down to what problem(s) you are trying to solve.