With the dust settling from the VNX2 marketing event (Speed to Lead), customers are now asking "what's in it for me?". Although the MCx rewrite is nothing short of phenomenal from an engineering perspective, the reality is that the majority of customers don't particularly care about internal architecture or code changes to a product; they simply want to know what the end benefit will be for them.
Before getting into the meat of it, I want to provide some context for the things I'm going to say. The majority of my VNX customers are what EMC would call "mid-market", or what the rest of us would identify as medium-sized enterprises: customers with real resource constraints, real budgets, and not enough hours in the day to get things done (let alone dedicate to storage management). These customers need storage to be invisible in their infrastructure, be easy to manage, and just work. These are also the same customers who need a single storage array to be good at consolidating multiple workloads: VMware, Oracle/SQL databases, file services. Further, they need a single array to do a good job of serving both capacity & performance. To provide some concrete numbers, a recent design I did was for a customer who purchased a ~150TB VNX for VMware, Oracle OLTP, Oracle DW, Exchange, and file services (CIFS), across their production & development environments.
Why bother providing the above details? Well, one of the major problems I see in new-technology discussions is too much generalization around the customer profile. When discussing a product or solution's efficacy, it's important to identify the type of customer and the use case in somewhat specific terms. Notably, I am not talking here about customers on the extreme ends of the capacity & performance spectrum who might consider buying purpose-built arrays for a particular use case. Out of scope are: service providers, VERY large enterprises who dedicate an entire array to a single app, and also very small customers who have capacity requirements in single-digit TBs and little to no performance requirements.
In the customer profile I'm describing (which I believe represents the majority of the "mid-range storage" TAM that vendors are going after), there hasn't been a single sales campaign where I have not been able to meet the requirement with a particular VNX model. So, realistically, the "1 Million IOPS" test and the other extreme performance metrics touted by the VNX2 marketing are of little relevance to these customers. If the extreme performance offered by the MCx architecture and new hardware isn't going to be relevant to this customer base, what value is there in the new VNX2 for them? These customers don't necessarily need better performance than what the original VNX had to offer, but what they could use is an even lower TCO, an even easier-to-manage system, and even better storage efficiencies. This is not to say the original VNX did not fare well in these categories, but these are areas where every product can and should be looking to improve. So for them it comes down to lowering the $/IOPS & $/TB (increased efficiencies, and thus a lower TCO) as well as lowering the management effort. With that in mind, here are the specific areas where I feel customers are going to experience a tangible benefit from the VNX2:
4x faster controllers:
In certain situations when designing an array, you have to size larger controllers than the drive count alone would dictate in order to accommodate the performance requirements. In these situations, having VNX2 controller options that are 4x faster than the VNX controllers will allow configurations to go out the door with smaller controllers for the same performance requirement. This translates to a direct cost reduction for the customer.
eMLC drives for non-Fast Cache EFDs:
In the previous-generation VNX, all that was available was SLC EFDs. The main difference between SLC and eMLC EFDs in this context is the life of the drive over a period of write cycles. SLC drives are rated for 30 writes per day per cell for 5 years; since writes are wear-leveled across all the cells, for a 100GB SLC EFD storing 12.5 million 8K blocks, 30 x 8K writes per cell per day works out to ~4,300 writes/sec sustained for 5 years before a cell could potentially fail. eMLC drives have a write-cycle resiliency rating of around 10 writes per day, which translates into ~1,500 writes/sec sustained for 5 years. There are two uses for EFDs in a VNX/VNX2: Fast Cache, and persistent data storage in either a pool or a RAID group. Fast Cache sees MUCH more intense write activity as it's an extension of the DRAM cache, while pool/RAID group EFDs generally see much less write activity. Being able to use SLC EFDs for Fast Cache and eMLC for pools/RAID groups/FAST-VP also translates to a cost reduction for the customer, as eMLC drives are quite a bit less expensive than SLC EFDs for the same capacity.
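To make the arithmetic above concrete, here is a quick back-of-the-envelope sketch (my own math, not an EMC sizing tool), assuming the per-cell daily write rating is spread evenly across the drive by wear leveling, and using decimal GB so that a 100GB drive holds 12.5 million 8K blocks:

```python
# Back-of-the-envelope EFD endurance math (assumption: wear leveling
# spreads the per-cell daily write rating evenly across the drive).
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_8k_writes_per_sec(drive_gb: float, rated_writes_per_day: float) -> float:
    """Sustained 8K writes/sec a drive can absorb for its rated life."""
    blocks_8k = drive_gb * 1_000_000 / 8  # 100GB (decimal) -> 12.5M 8K blocks
    return blocks_8k * rated_writes_per_day / SECONDS_PER_DAY

print(round(sustained_8k_writes_per_sec(100, 30)))  # SLC: 4340/sec, the "~4300" above
print(round(sustained_8k_writes_per_sec(100, 10)))  # eMLC: 1447/sec, the "~1500" above
```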
FAST-VP 256MB slice granularity:
This one, IMO, is going to be a huge win for customers. Consider the following example on the original VNX (there are private RAID groups and private LUNs under the covers of a pool structure, but for the sake of this example that's irrelevant): on the VNX, FAST-VP moves data in 1GB "slices". What this means is that if any data in a 1GB LBA range needs to be moved, the entire 1GB slice is moved. Let's imagine a 500MB contiguous LBA range of hot data sitting on a SAS or NL-SAS tier needs to be moved to EFD. An entire 1GB slice will actually be moved. If we extrapolate this situation over a 100GB EFD drive, we can end up with 100GB of data sitting on the EFD drive of which only 50GB is actually "hot": 100 x 500MB ranges are hot, but 100 x 1GB slices actually get moved. This gives the EFD drive an efficiency of 50%, because it services 50% cold data and 50% hot data, purely as an artifact of the 1GB slice size. How would this look on a VNX2? Since the slice size is 4x more granular (256MB vs 1GB), each time a 500MB contiguous LBA range needs to be moved, only the slices containing that 500MB of hot data will be moved. This results in a MUCH higher utilization of the EFD drive, since much more of the data sitting in the EFD tier will actually be hot. See the below illustration (the same applies to the SAS tier):
That's fine and dandy, but how does that benefit the customer? The 4x granularity of the slice size is going to let more hot data sit on the EFD and SAS tiers, and thus configurations can be put together with even LESS EFD ($$$), LESS SAS ($$) and more NL-SAS ($) drives, resulting in a lower overall cost to the customer for the same performance/capacity requirements. In the previous example, to accommodate 100GB of "hot data" we would need 200GB of EFD on the VNX, but we could accomplish the same with just 100GB of EFD on the VNX2. (**Note: while these are hypothetical examples to illustrate a point, the concept is sound based on quite a few VNX FAST-VP performance reviews I have done. Also, some rounding of slice numbers, capacities, etc. has been done to keep the math simple.)
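The efficiency argument can be sketched with a toy model (my own numbers, not measured data). It assumes each hot run starts on a slice boundary and no two runs share a slice, so every 500MB hot run gets rounded up to whole slices:

```python
import math

def data_moved_mb(hot_run_mb: int, runs: int, slice_mb: int) -> int:
    """MB actually promoted when each hot LBA run is rounded up to whole
    slices (toy model: runs are slice-aligned and don't share slices)."""
    slices_per_run = math.ceil(hot_run_mb / slice_mb)
    return runs * slices_per_run * slice_mb

hot_mb = 100 * 500  # 100 hot runs of 500MB = 50GB of genuinely hot data
for slice_mb in (1024, 256):  # VNX vs VNX2 slice sizes
    moved = data_moved_mb(500, 100, slice_mb)
    print(slice_mb, moved, f"{hot_mb / moved:.0%}")  # slice size, MB moved, EFD efficiency
```

With 1GB slices the model moves 100GB to service 50GB of hot data (~49% efficiency); with 256MB slices it moves only 50GB (~98% efficiency), which is the effect described above.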
Multi-Core Cache (MCC):
There are two aspects to this which I feel will be immensely beneficial to customers. First is the complete elimination of the storage administration overhead of setting up the cache. Prior to the VNX2, the storage admin had to decide how to partition the cache into read & write portions, configure high and low watermark settings, and oftentimes tweak those settings on an ongoing basis after the initial setup based on the IO patterns. The VNX2 has removed the need for manual cache settings: the cache is now self-tuning and is constantly optimized based on the ever-changing incoming IO patterns. Automated/adaptive settings + no knobs to turn = more time for higher-level, more business-impacting work.
The second benefit is the behavior around write flushing. The cache subsystem now tracks IO arrival rates and also understands the ability of the back-end disk subsystem to absorb the IO from cache flushing (i.e. the IO capability of a SAS drive vs. an EFD in a specific array configuration). Based on those two factors, the system can throttle the IO arrival rate on the front end to minimize, and in most cases completely eliminate, forced flushing (anyone who has been in a forced-flushing situation knows how crippling it can be). MCC can also take advantage of all the cores when de-staging IO, which further lessens the risk of forced flushing.
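To illustrate the idea, here is a toy sketch of arrival-rate throttling. To be clear, this is not EMC's actual MCC algorithm; the function, thresholds, and scaling are all hypothetical, chosen only to show how a cache can slow the front end smoothly instead of hitting a forced-flush cliff:

```python
def frontend_delay_us(arrival_iops: float, drain_iops: float, dirty_fraction: float) -> float:
    """Toy throttle: once IO arrives faster than the back end can drain
    it AND the cache is more than half dirty, inject a per-IO delay that
    grows with cache pressure (hypothetical 0.5 threshold and ~1ms cap)."""
    if arrival_iops <= drain_iops or dirty_fraction < 0.5:
        return 0.0  # back end is keeping up, or plenty of clean cache remains
    overload = (arrival_iops - drain_iops) / arrival_iops  # 0..1 imbalance
    pressure = (dirty_fraction - 0.5) / 0.5                # 0..1 as cache fills
    return 1000.0 * overload * pressure                    # up to ~1ms per IO
```

The key property is gradual back-pressure: the delay is zero while the back end keeps up, and ramps smoothly as the dirty-page level climbs, so applications see slightly higher latency rather than the stalls of a forced flush.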
The DRAM cache is the most valuable asset in a storage subsystem, and having it completely adaptive and self-tuning, along with drastically reducing or eliminating forced flushing, allows for more efficient use of the cache, which directly translates to reduced latencies for applications.
Policy-based Hot Sparing:
This is a subtle and small change, but I think it's worth highlighting because of the trend it starts. Everywhere in the industry we hear talk tracks along the lines of "set policies and enforce against them instead of performing manual tasks", yet there are very few areas where it actually occurs. The VNX required the storage administrator to flag certain drives as hot spares, and this was a manual process. It was also error prone, because it was up to a human to ensure there was the correct number of hot spares in the system to minimize the risk of data loss. Commonly, as arrays grew after the initial purchase, storage admins would forget to flag new upgrade drives as hot spares to align with the "1 hot spare per 30 drives" EMC best practice. The human element is now completely removed on the VNX2: the storage admin simply sets a policy stating "I want 1 hot spare per 30 drives" and the array will not allow drives to be allocated to a pool or RAID group if doing so would violate the policy. In this case, policy-based enforcement rather than manual administrative work is solving a somewhat minor issue, but I believe this is just the introduction of policy-based management in the VNX2 array family. I could see a day when the storage administrator can say "give me 200GB of storage with 1,000 IOPS that is replicated" and it just automatically happens, instead of the storage admin having to find a suitable pool, with enough capacity/performance, that is replicated, and then create and assign the storage through a manual process.
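A minimal sketch of what such a policy check amounts to (my own illustration; the function names are hypothetical, not VNX2 internals):

```python
import math

def spares_required(total_drives: int, drives_per_spare: int = 30) -> int:
    """Hot spares needed under a '1 spare per N drives' policy."""
    return math.ceil(total_drives / drives_per_spare)

def allocation_allowed(total_drives: int, unallocated: int, requested: int,
                       drives_per_spare: int = 30) -> bool:
    """Refuse any pool/RAID-group allocation that would leave fewer
    unallocated drives than the policy requires as hot spares."""
    return unallocated - requested >= spares_required(total_drives, drives_per_spare)

print(allocation_allowed(60, 5, 3))  # True: 2 drives left, policy needs 2 spares
print(allocation_allowed(60, 5, 4))  # False: only 1 drive left, policy violated
```

The point is that the check runs on every allocation, so the spare count can never silently drift out of compliance the way a manually flagged configuration could.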
This is an obvious enhancement for customers as it serves to reduce the $/TB and thus the overall TCO of the array.
Monitoring and Reporting software:
This is another subtle change, this time to the software packaging, but worth noting as I feel it provides true value to customers. The Monitoring & Reporting suite was a chargeable software package for the VNX that allowed customers to do report generation, capacity planning, trending analysis of storage consumption, and performance trending in an easy-to-use UI. In the new VNX2 line, this software is included for FREE. Every customer who purchased M&R for their VNX absolutely loves the product, as it's exactly what they had been asking for for a long time. EMC making it free with VNX2 purchases is a great move, as robust reporting capabilities are becoming table stakes these days in storage platform discussions. Customers need a way to monitor and trend capacity & performance growth, and having a tool that lets them do it for free on a VNX2 is a big benefit.
Transactional NAS enhancements:
The one exception I'll make to the limited relevance of "bigger, better, faster" is NAS performance. There are two types of NAS workloads: transactional NAS and traditional NAS. Traditional NAS is home-directory serving, MS Office-type document sharing, file server consolidation, etc. Transactional NAS is what I'm focusing on here; examples would be VMware on NFS, VDI/EUC on NFS, Oracle dNFS/NFS, Exchange, SharePoint, SQL, etc. running on NAS. These are file-framed block access patterns that are latency sensitive. I think ongoing transactional NAS performance improvements are very significant and strategic for customers, and here is why: first and foremost, NAS is simply easier to manage than LUNs and the associated block storage configuration overhead; second, the world is moving away from LUNs and traditional block storage concepts, and for good reason. Whether you look at VMware's VSAN, EMC's ScaleIO acquisition, or some of the "hyper-converged" players (such as Nutanix), none of them have the concept of LUNs. NAS is a good intermediary step towards the ultimate goal of app-aware, object-based storage, and provides many of the same benefits. Anytime I'm in storage discussions with my customers, although I'm not religious about front-end protocols, I try to educate them on the benefits of NAS for all their workloads (VMware, Oracle, et al.) wherever possible and wherever it's supported. These are the transactional NAS workloads, and the VNX2 has made significant performance improvements around them, slowly removing the barriers to putting all Tier 1 workloads on NAS regardless of performance requirements. IMO, the more customers can embrace transactional NAS as a bridge to future storage architectures, the better.
I know the sentiment around the VNX2 launch seems to be that EMC leaned too heavily on the marketing aspect and that it was "much ado about nothing" from a substance point of view. Unfortunately, that has caused the VNX2 to be largely dismissed as just a faster VNX. Hopefully this discussion helps return the conversation to where it belongs: how it will help customers. When it comes to needing a storage platform that does many things well, it's hard to argue against the VNX/VNX2 line. Building a reliable and robust storage platform to support so many functions is not something to be taken lightly, so EMC's strategy of making steady improvements to their mid-range offering while building a bridge to the future is a good one, IMO.