Comparing storage virtualization technologies: EMC VPLEX, NetApp V-Series, HDS USP-V

The release of VPLEX has spawned some interesting conversations with customers. When exploring the concept of storage federation, which is enabled by storage virtualization, the question that often comes up is: how do the main storage virtualization technologies differ?

The three mainstream storage virtualization technologies in the market today are NetApp's V-Series, HDS USP-V, and now EMC VPLEX. They share some commonalities but also differ in architecture, function, and use cases.

First, let's explore the purpose of each of these technologies:

EMC VPLEX: This technology was brought to market for the purpose of federating storage both within the data center and over distance. The problem it is looking to solve is “anywhere” data access, whether that means inside the same data center or between different data centers within 100 km of each other (the current limitation, though this is expanding to asynchronous distances). The goal is to enable “private cloud” architectures whereby workloads can be run dynamically from any location because their data can be accessed from anywhere.

NetApp V-Series: The V-Series is NetApp's gateway box that is put in front of existing SAN arrays to virtualize the storage. Reasons for doing this include taking advantage of NetApp software features such as snapshots, deduplication, intelligent storage caching, PAM, WAFL, and so on, while preserving the investment in the current SAN technology on the floor. If one has existing SAN arrays on the floor and wishes to take advantage of these NetApp features, one can put a V-Series gateway in front of that storage and present the data through the V-Series, which allows a customer to take advantage of all the advanced features NetApp has to offer.

HDS USP-V: Virtualizing storage behind a USP-V is done for much the same reasons as a NetApp V-Series: preserving the existing storage investment while consolidating management and taking advantage of HDS features such as snapshots, replication, thin provisioning, zero-page reclamation, and so on.

Now, on to some use cases / differences.

Let's assume an environment with existing storage on the floor, and explore these technologies to see what benefit they may bring.

If the purpose is simply to consolidate management and/or different silos of storage in order to reclaim space on disparate arrays and “pool” them together, any of the above technologies is capable of this. This is the very basic notion of storage virtualization: present storage from the underlying storage arrays to any of these devices, and they can create virtual volumes and present those virtual storage containers to hosts. Migration and movement of data from one back-end array to another can be done seamlessly with all three technologies.
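To make the pooling idea a bit more concrete, here is a minimal Python sketch of how a virtualization layer might concatenate extents from two different back-end arrays into a single host-visible volume and translate a virtual address back to the array that holds it. The class names, array names, and sizes are purely illustrative assumptions, not any vendor's actual implementation or API.

```python
# Conceptual sketch (not any vendor's code): mapping one host-visible virtual
# volume onto extents borrowed from two different back-end arrays. Array
# names, LUN IDs, and sizes below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class BackendExtent:
    array: str          # back-end array that owns the capacity
    lun_id: int         # LUN on that array
    offset_gb: int      # starting offset within that back-end LUN
    size_gb: int        # how much of it this extent uses

class VirtualVolume:
    """A host-visible volume concatenated from extents on disparate arrays."""

    def __init__(self, name):
        self.name = name
        self.extents = []            # ordered list of BackendExtent

    def add_extent(self, extent):
        self.extents.append(extent)

    def locate(self, vaddr_gb):
        """Translate a virtual address (in GB) to (array, lun_id, offset)."""
        remaining = vaddr_gb
        for ext in self.extents:
            if remaining < ext.size_gb:
                return ext.array, ext.lun_id, ext.offset_gb + remaining
            remaining -= ext.size_gb
        raise ValueError("address beyond end of virtual volume")

# Pool capacity from two different arrays into one volume presented to a host.
vol = VirtualVolume("host_vol_01")
vol.add_extent(BackendExtent("legacy_array_1", lun_id=7, offset_gb=0, size_gb=500))
vol.add_extent(BackendExtent("legacy_array_2", lun_id=3, offset_gb=0, size_gb=500))

print(vol.locate(650))    # lands 150 GB into LUN 3 on legacy_array_2
```

A real virtualization engine would of course also handle striping, metadata persistence, and path management, but this address translation is the core idea behind presenting one virtual LUN built from disparate arrays.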

How about taking advantage of advanced software features such as snapshots, replication, thin provisioning, deduplication, and so on? This takes a little bit of explanation. The idea here is to collapse disparate replication, snapshot, and other advanced storage features onto one set of technology for the purpose of lower licensing costs, ease of management, and economies of scale. Additionally, these capabilities may not be available in the existing storage subsystems. For example, one may want to use block-based deduplication with an IBM array by placing a NetApp V-Series in front of it and using NetApp's dedupe to do so. In these types of situations, it is ONLY possible to do so with a NetApp V-Series or HDS USP-V type device. The VPLEX has no advanced storage features such as snapshots, replication, thin provisioning, and so on; it relies on the underlying storage array for all of those things. One caveat: the VPLEX has the ability to create a distributed RAID 1 volume. The volume is a mirror, and each side of the mirror can come from a different array in a different location. One could view this as a replacement for traditional replication; however, this is just a byproduct of what the VPLEX is really intended to do. We can't really call it “replication software” per se, as it does not fit the standard definition, but it is a new way of looking at data protection.

How about active/active access to storage within a data center or across data centers? Within the same data center, this can be achieved with all three technologies. As long as all the storage sits behind one V-Series or one USP-V controller, there is no problem. Each of those devices can take raw LUNs from two different storage devices, combine them together (in various ways: striping, concatenation, etc.), and present one virtual volume/LUN to the host. Hence data is written to both back-end arrays simultaneously. However, where this breaks down is across distance. The VPLEX utilizes distributed caching to achieve read/write data access across distance. At distances of up to 100 km (currently), one can place a pair of VPLEX clusters in front of storage at both sites, create a DR1 volume (distributed RAID 1 volume), and have read/write access to the data at both sites. This is NOT possible with a V-Series or USP-V. Even with a pair of V-Series or USP-V systems at each site, using their respective replication technologies, the “secondary” side's data WILL be write-protected or in read-only mode, and in some cases not even mountable by a host OS until the replication is paused and the secondary side is promoted to primary. This is one of the main differences between VPLEX and these two technologies. If the purpose is active/active read/write access to data across distance, VPLEX is the only option.
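The distributed RAID 1 behavior described above can be sketched in a few lines: every write is applied to a mirror leg at each site, and reads are served from whichever leg is local to the requesting host. This is only a conceptual illustration under simplifying assumptions; it ignores the distributed cache coherency, WAN latency, and failure handling that make the real thing hard, and none of the names below come from VPLEX itself.

```python
# Conceptual sketch of a distributed RAID 1 (DR1) volume: one mirror leg per
# site, every write goes to both legs, and either site can read locally.
# Purely illustrative; cache coherency, WAN latency, and site-partition
# handling are deliberately ignored.

class MirrorLeg:
    def __init__(self, site, array):
        self.site, self.array = site, array
        self.blocks = {}                      # block address -> data

    def write(self, addr, data):
        self.blocks[addr] = data

    def read(self, addr):
        return self.blocks.get(addr)


class DistributedRaid1Volume:
    def __init__(self, leg_a, leg_b):
        self.legs = {leg_a.site: leg_a, leg_b.site: leg_b}

    def write(self, addr, data):
        # Both legs must take the write before the host is acknowledged.
        for leg in self.legs.values():
            leg.write(addr, data)

    def read(self, addr, site):
        # Reads are served from the leg local to the requesting site.
        return self.legs[site].read(addr)


dr1 = DistributedRaid1Volume(MirrorLeg("site_A", "array_at_A"),
                             MirrorLeg("site_B", "array_at_B"))
dr1.write(100, b"payload")
assert dr1.read(100, "site_A") == dr1.read(100, "site_B") == b"payload"
```

The property that matters for the active/active use case is that neither leg is a passive, read-only copy: both sites see the same volume and can read and write it.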

From an architectural standpoint, each of these devices sits inline in front of the underlying storage subsystems: I/O has to pass through them before reaching the underlying storage. They all have a good bit of redundancy built in, but they approach it differently. The V-Series takes a clustered approach with two controllers. The USP-V is more akin to a DMX/VMAX in that it is an active/active storage platform with a large shared global cache and many front-end and back-end directors. The VPLEX takes a scale-out cluster approach, meaning you add nodes as storage requirements dictate.

Also of note is that both the USP-V and the V-Series are themselves storage arrays and can house their own internal storage. So it is possible to outfit them with their own respective FC/SATA disks as well as virtualize external storage. This is not possible with VPLEX. The VPLEX itself is not a storage array; it is only a virtualization and distributed caching engine.

The last thing worth mentioning is the difference in caching. The VPLEX uses a write-through caching mechanism, while the other two use write-back caching. When there is a write I/O in a VPLEX environment, the I/O is cached on the VPLEX, but it is passed all the way back to the underlying storage subsystem(s) before an ACK is sent back to the host. A V-Series/USP-V, however, will store the I/O in its own cache and immediately return an ACK to the host; from there, the I/Os are flushed to the back-end storage subsystems using their respective write-coalescing and cache-flushing algorithms. Because of this write-back behavior, there is potential for a performance gain above and beyond the performance of the underlying storage subsystems due to the caching on these controllers (in the case of the V-Series, the write performance would come from WAFL doing all I/O as sequential writes to disk). Conversely, there is no write I/O performance gain to be had in VPLEX environments over and above the existing storage, due to the write-through cache design (keep in mind the underlying storage subsystem has to ACK the write before the VPLEX sends an ACK to the host). However, for read hits, all three technologies have the ability to serve the data from their larger cache (or PAM in the case of NetApp) and improve performance.
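To illustrate the acknowledgement-path difference, here is a small Python sketch contrasting the two behaviors. The function names and the simulated latency are assumptions made for this illustration only; the point is simply when the host receives its ACK relative to the back-end write.

```python
# Illustrative contrast of write-through vs. write-back acknowledgement paths.
# All names and the simulated latency are made up for this sketch; the point
# is only *when* the host gets its ACK relative to the back-end write.

import time

def backend_write(data):
    """Stand-in for the underlying array committing a write (the slow path)."""
    time.sleep(0.005)                    # pretend 5 ms of back-end latency
    return "backend ACK"

def write_through(data, cache):
    # VPLEX-style: cache the write, but do not ACK the host until the
    # underlying array has acknowledged it.
    cache.append(data)
    backend_write(data)
    return "host ACK (after back-end ACK)"

def write_back(data, cache):
    # V-Series / USP-V style: ACK the host as soon as the write is in
    # controller cache; de-stage to the back end later.
    cache.append(data)
    return "host ACK (immediately from cache)"

def flush(cache):
    # Deferred, coalesced de-stage of cached writes to the back-end array.
    while cache:
        backend_write(cache.pop(0))

wt_cache, wb_cache = [], []
print(write_through(b"io-1", wt_cache))  # host waits out the back-end latency
print(write_back(b"io-2", wb_cache))     # host sees only cache latency
flush(wb_cache)                          # back-end work happens afterwards
```

In this sketch the write-back path returns to the host after only the cache insert, while the write-through path cannot return until backend_write() has completed, which is why VPLEX write latency tracks the underlying array.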

So I hope it's obvious that there are some considerable differences between EMC VPLEX and other storage virtualization technologies, and that the correct choice is all about what problem(s) you are trying to solve.



Categories: EMC, HDS, Netapp, storage virtualization, VPLEX

18 replies

  1. Disclosure – EMCer here. I think this is a pretty fair comparison. Thanks for the write-up!

  2. Thanks, a simple and easy way to understand some of the differences between the three vendors and their proper use cases.

  3. Disclosure: I work for HP StorageWorks.

    I just was introduced to you today based on some questions you asked on Twitter about our P4000 VSA – nice to see your blog. We need more independent storage bloggers out there!

    Missing from your list to compare is the StorageWorks SAN Virtualization Services Platform (or SVSP). It has several use cases and it’s often been discussed on my hp.com blog. You can find the product page at http://www.hp.com/go/SVSP and here’s a link to all the blogs that include SVSP: http://h30507.www3.hp.com/t5/Around-the-Storage-Block-Blog/bg-p/139/label-name/svsp. We also just announced the EVA Cluster, which is a factory integrated storage solution that includes SVSP and two StorageWorks EVA’s. Keep up the good work!

    • Calvin, thanks for the reply and the information on the P4000.

      I would have included the SVSP, but quite frankly I’m just not familiar with it enough to speak intelligently on it. For various (business) reasons, I am working to familiarize myself with the HP storage offerings, so hopefully that will change soon.

  4. This is an interesting blog, not for what it says, but for what it doesn't say. First, there is no mention of IBM's SVC, which is the premier virtualization platform out there today; and second, who is actually using the VPLEX in the real world? I hear and see nothing but people touting the VPLEX, but no reviews from anybody actually using one in their environment.

  5. The blog provides a good overview. I just wanted to add one aspect: EMC VPLEX Metro actually provides a good DR solution, and while I don't know about other products, I do know that NetApp MetroCluster has been providing the same DR solution for much longer. NetApp does not position it as a virtualization solution [which it is not]; it offers it as a synchronous DR solution, and EMC's VPLEX provides much the same.

  6. Where is the market-leading IBM SVC?

  7. Nice write-up, but I have to correct you on one very important point, as also mentioned by Idean above.

    You stated “At distances of up to 100 km (currently), one can place a pair of VPLEX clusters in front of storage at both sites, create a DR1 volume (distributed RAID 1 volume), and have read/write access to the data at both sites. This is NOT possible with a V-Series or USP-V.” We are doing exactly that over a distance of 30 km using a NetApp V-Series MetroCluster with EMC Clariion back-end disk, and have been doing so for a couple of years with good success.

    At the time of deciding what to purchase, about 3 years ago, this ability to synchronously replicate and automatically fail over to a working system without downtime was our primary design remit, and EMC's top engineers themselves admitted to us that they did not at that time have anything that could support this functionality. As most vendors do when they are missing a feature, they fed us some FUD about the issues and complexities of doing this as to why they hadn't ventured into this arena. EMC were always quite good at issuing FUD. I remember that before they supported storage pools and thin provisioning, they rubbished the fact that other products could do this and said it was bad practice. Now they are on this bandwagon also.

    3 years down the line, EMC now have the VPLEX solution, which sounds very promising. Bear in mind this is definitely nothing new. Metroclustered storage (the term, not the trademark) has been around for many years, and EMC are now finally offering it as if it were some great new creation. We have previously used Datacore SAN Symphony, which was doing the same thing, and I also know of Falconstor, LSI StorAge, NetApp, 3Par, IBM SVC and Compellent.

    • This comment is interesting. EMC has always been good at marketing; I don't think EMC's technology is better than NetApp's, or better than HDS in high-end storage.

    • But I have to correct one of your misunderstandings of VPLEX versus MetroCluster. In MetroCluster, two sets of disks are mirrored as plexes under the same aggregate, one located at the local site and the other at the remote site, and NetApp uses FC over ISL fiber to sync the data; the user can actually configure it in sync or async mode, and in most scenarios it is async. The two controllers at the local and remote sites cannot access the LUN concurrently: if a remote host accesses the local LUN, the remote controller will forward the I/O request over the interconnect between the two controllers to the local controller to handle. VPLEX, by contrast, can access the volume concurrently with cache coherency; it doesn't have to forward the I/O request to the other pair of controllers and can handle it at its local site. For example, on a read request, if there is no dirty cache on the opposite controller, it can just read locally rather than reading from the remote site.

  8. I know I am very slow to find this blog and am posting on an article written last year, but I was also surprised to see no reference to IBM's SVC (which is now complemented by the IBM Storwize V7000). The SVC is in over 6000 customers worldwide (actually I think that number is higher), so surely it deserved at least a mention?

    • The only reason SVC was not mentioned was because I am not as familiar with it. If you like, I’d be happy to work together on an SVC/VPLEX comparison that we can publish. Contact me if you are interested.

  9. This has a similar function to NetApp MetroCluster, but you should compare the architectures in more detail in your post.

    We always talk about HA and DR as different scenarios. Would you be able to share the detailed failover behavior for controller, disk, and WAN connection failures, as well as the DR scenario? It would be interesting to find out more about how VPLEX handles each situation if it occurs. One last thing: how does VPLEX handle the split-brain issue if the WAN connection drops?

    Looking forward to your reply.

  10. Really nice comparison of these three top-selling vendors. I am expecting a comparison of NetApp's block-level deduplication with other vendors' dedupe techniques.
    One more thing is about NetApp's PAM (read cache) and EMC's claim to have a write cache.

  11. Can VPLEX do auto-tiering across storage arrays?
    To be more specific: can a workload that demands very little performance be moved online from SAS drives to SATA drives across storage boxes? I guess this could be possible with Hitachi or NetApp, but I'm not sure whether it is possible with VPLEX.
