Anytime we have a product release or when a nascent market starts to become more mainstream, the result is massive amounts of confusion for most customers, and with good reason — vendor marketing goes into overdrive, competitive documents fly around like Amazon drones (or will one day), and every single vendor claims their technology is the best for reasons X, Y, and Z. This is compounded by the fact that since the market is just starting to take hold, the pool of customers who have already adopted said technology is very small. So a CIO at company X can’t pick up the phone, call his or her buddy CIO at company Y, and ask about their experience in a real-world scenario. And while POCs are good, they are not practical for every customer, as they often require a significant investment of time and effort from customers who are already resource constrained.
If I survey the all flash array (AFA) market today, this is exactly what is happening. Unfortunately, none of the competitive documents, feature checklists, or blog posts detailing deep architectural designs are very useful to customers, because essentially what we have is every vendor saying their flash array is the best while pointing out “perceived pitfalls” of the competition. The other facet that complicates this for customers is that building a good all flash array is different from building a good “traditional” array. So while customers (for the most part) understand how to cut through the fluff and ask the right questions when evaluating a “legacy” storage array, that isn’t the case for an AFA (details here: https://vjswami.com/2013/11/11/a-primer-on-flash-and-a-look-into-the-challenges-of-designing-an-all-flash-array/). Additionally, if you look at most vendor blog posts on their AFAs, they discuss very deep architectural design topics as a way to differentiate themselves. These low-level technical details should be relegated to product engineering teams and are something a customer should never have to concern themselves with. All a customer should care about is making the best decision to ensure their success, not how the sausage is made. Don’t get me wrong, the geek in me thrives on understanding these details, but they just aren’t particularly useful to customers.
With that said, what do customers need to understand about AFAs, and what questions should they ask when evaluating vendors?
Consistent Array performance while “flash maintenance” operations are taking place
The nature of flash requires certain “maintenance” operations to take place, such as garbage collection, flash overwrites, and wear leveling, as an array fills with data. It is critical to understand an AFA’s performance under these conditions, because that is how the system will exist in production environments. Any AFA can perform great when it’s close to empty, but in the real world, no one is going to keep their AFA 10–20% full. Customers are spending the money on flash to use as much of it as possible. A scenario where the AFA is 80%+ full, all flash pages are non-zero, and it is receiving 100% random reads/writes most accurately reflects its performance envelope in production. It is important that performance be consistent in this scenario; otherwise, as these “flash maintenance” activities take place, the application(s) will see very inconsistent performance. Garbage collection implementations vary among AFAs, but at the end of the day it doesn’t matter where (e.g., system level vs. SSD level) or how garbage collection is done, as long as it doesn’t interfere with delivering consistent performance to the application.
Takeaways: What is the AFA’s performance when all the flash pages are non-zero? How about when the AFA is 80%+ full? Some degradation should be expected as capacity approaches 100%, but how does it degrade? Is it graceful/linear, or catastrophic?
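To build intuition for why performance degrades as an array fills, here is a toy simulation of out-of-place flash writes with greedy garbage collection. It is my own illustrative model, not any vendor’s implementation; block counts and page sizes are arbitrary. The point it demonstrates is that as the fill level rises, garbage collection must relocate more valid pages per reclaimed block, so write amplification (and therefore latency) climbs:

```python
import random

def simulate_write_amplification(fill_fraction, blocks=128, pages_per_block=64,
                                 user_writes=50_000, seed=1):
    """Toy flash model: out-of-place page writes plus greedy garbage collection.

    Returns write amplification = physical page writes / user page writes at
    steady state. All sizes are illustrative; keep fill_fraction <= ~0.95 so
    garbage collection can always make progress.
    """
    rng = random.Random(seed)
    logical_pages = int(blocks * pages_per_block * fill_fraction)
    page_owner = [[None] * pages_per_block for _ in range(blocks)]
    valid_count = [0] * blocks
    mapping = {}                            # logical page -> (block, slot)
    free_blocks = list(range(blocks))
    closed_blocks = set()
    cur, cur_idx = free_blocks.pop(), 0
    physical = [0]                          # total physical page writes

    def place(lp):
        nonlocal cur, cur_idx
        old = mapping.get(lp)
        if old is not None:                 # invalidate the previous copy
            b, s = old
            page_owner[b][s] = None
            valid_count[b] -= 1
        page_owner[cur][cur_idx] = lp
        valid_count[cur] += 1
        mapping[lp] = (cur, cur_idx)
        physical[0] += 1
        cur_idx += 1
        if cur_idx == pages_per_block:      # open block full; grab a fresh one
            closed_blocks.add(cur)
            cur, cur_idx = free_blocks.pop(), 0

    def gc_once():
        # Greedy policy: reclaim the closed block with the fewest valid pages.
        victim = min(closed_blocks, key=lambda b: valid_count[b])
        closed_blocks.remove(victim)
        for s in range(pages_per_block):
            lp = page_owner[victim][s]
            if lp is not None:
                place(lp)                   # relocations are extra writes
        page_owner[victim] = [None] * pages_per_block
        valid_count[victim] = 0
        free_blocks.append(victim)          # erased, back in the free pool

    for lp in range(logical_pages):         # initial fill: every page non-zero
        while len(free_blocks) < 2:
            gc_once()
        place(lp)

    physical[0] = 0                         # measure steady state only
    for _ in range(user_writes):
        while len(free_blocks) < 2:
            gc_once()
        place(rng.randrange(logical_pages))
    return physical[0] / user_writes

if __name__ == "__main__":
    for fill in (0.50, 0.80, 0.90):
        wa = simulate_write_amplification(fill)
        print(f"{fill:.0%} full -> write amplification ~{wa:.2f}x")
```

Running it shows write amplification staying near 1x at half full and climbing sharply past 80% — exactly the region where you should be benchmarking.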
When looking at published IOPS/latency benchmarks, the read/write ratio is important
NAND flash has a unique property in that a portion of the flash has to be locked during writes and erases; this is known as flash locking. What this means is that AFAs have to manage conditions where reads and writes (or erases) are simultaneously happening to the same region of an SSD. Once again, the implementation of how this is handled differs across AFAs; what is important is that the array maintains its performance through mixed read/write workloads. No customer will have 100% reads or 100% writes going to an AFA. The real-world scenario is some mix of reads and writes (noting that this mix can also vary over time). Thus, in addition to the 100% read and 100% write workload benchmarks, it’s important to find out the IOPS/latency performance on mixed workloads.
Takeaways: What are the IOPS/latency numbers for 100% random 50/50 read/write workloads? Similarly, 65/35 and 80/20 ratios can also be looked at for some perspective.
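One practical way to run such a mixed benchmark yourself is with an open-source tool like fio. Here is a sketch of a job file for a sustained 50/50 random read/write test; the device path, queue depth, and runtime are placeholders to adjust for your environment:

```ini
; 50/50 random read/write, 4K blocks, sustained for 10 minutes
[mixed-5050]
; example device -- point at a LUN on the AFA under test
filename=/dev/nvme0n1
ioengine=libaio
direct=1
rw=randrw
rwmixread=50
bs=4k
iodepth=32
numjobs=4
runtime=600
time_based
group_reporting
```

Vary `rwmixread` (e.g., 65 and 80) to cover the other ratios, and run it against a nearly full array for the reasons covered above.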
Data Reduction Methods
Two of the main methodologies to increase the usable capacity of an AFA and reduce its $/GB are de-duplication and compression. Both are possible because flash is a random access medium, so the typical issues with de-duplicating and/or compressing primary data on HDDs do not apply. An AFA’s architecture should be designed from the ground up to leverage these data reduction technologies, not have them “bolted on” after the fact. Compression is an important data reduction technique to leverage because as non-dedupe-friendly datasets are stored on the system (such as tier-1 databases), compression can offer data reduction where dedupe cannot, often 2:1 or higher on top of any de-dupe. This further reduces the $/GB and allows more data to be stored on the AFA.
Takeaways: Are compression and de-dupe performed in-line? Are the performance benchmarks marketed with the data reduction methods “turned on”? Meaning, there should be no performance drop to leverage de-dupe or compression. Is there any situation where de-dupe or compression will/should be “turned off?”
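To make the two techniques concrete, here is a minimal sketch of how each reduction ratio could be measured: fixed-block dedupe counts unique block fingerprints, while compression shrinks the data itself. The 4 KB block size and sample datasets are purely illustrative, not how any particular AFA chunks or reduces data:

```python
import hashlib
import zlib

def dedupe_ratio(data: bytes, block_size: int = 4096) -> float:
    """Ratio of total blocks to unique blocks under fixed-block dedupe."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks) / len(unique)

def compression_ratio(data: bytes) -> float:
    """Ratio of raw size to zlib-compressed size."""
    return len(data) / len(zlib.compress(data))

# Dedupe-friendly data (think cloned VM images): many identical blocks.
block_a = bytes([1]) * 4096
block_b = bytes([2]) * 4096
vm_like = (block_a + block_b) * 8          # 16 blocks, only 2 unique

# Dedupe-unfriendly but compressible data (think tier-1 database rows).
db_like = b"order_id=1;status=shipped;" * 2000

print(f"dedupe ratio:      {dedupe_ratio(vm_like):.1f}:1")
print(f"compression ratio: {compression_ratio(db_like):.1f}:1")
```

The second dataset is the important one: it barely dedupes at all, yet compresses heavily, which is exactly why an AFA needs both techniques.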
Data Protection Mechanism
Traditional RAID algorithms are not suitable for AFAs because they cause write amplification. Write amplification is an AFA’s worst enemy because it impacts performance as well as the endurance of the SSDs. As such, the data protection mechanism should be designed specifically for flash, and it should perform better and have less overhead than RAID1/RAID5/RAID6, et al.
Takeaways: What is the capacity overhead, write overhead, and read overhead of the data protection mechanism? How many SSDs can fail simultaneously before incurring data loss?
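As a frame of reference for that overhead question, the textbook per-IO costs of traditional RAID on a small random overwrite (the classic read-modify-write path) can be tabulated. A flash-optimized scheme should beat these numbers; every extra back-end write here is write amplification the SSDs must absorb:

```python
def raid_small_write_cost(level: str, drives: int) -> dict:
    """Textbook overheads of traditional RAID for one small random overwrite.

    Returns back-end reads/writes per front-end write, plus the fraction of
    raw capacity consumed by redundancy.
    """
    if level == "RAID1":   # mirror: write both copies
        return {"reads": 0, "writes": 2, "capacity_overhead": 1 / 2}
    if level == "RAID5":   # read old data + old parity, write new data + parity
        return {"reads": 2, "writes": 2, "capacity_overhead": 1 / drives}
    if level == "RAID6":   # two parity blocks must be read and rewritten
        return {"reads": 3, "writes": 3, "capacity_overhead": 2 / drives}
    raise ValueError(f"unknown RAID level: {level}")

for level in ("RAID1", "RAID5", "RAID6"):
    print(level, raid_small_write_cost(level, drives=10))
```

So a single front-end write costs up to six back-end IOs under RAID6 — a direct hit to both performance and SSD endurance, which is why a flash-specific protection scheme matters.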
Ease of Administration
Because AFAs utilize random access media and provide such high performance, the “segregation” of workloads into separate RAID groups, pools, etc. should not be required. Additionally, the nerd knobs found in traditional arrays, such as cache tuning and other tweaks, should be eliminated from the system. There should be no choice of “RAID level,” as a single data protection scheme optimized for flash should be used. The existence of RAID level settings on an AFA is a sure sign that the architecture is not well optimized for flash.
Takeaways: The configuration of an AFA should be extremely simple: 1. Create a Volume, 2. Create Host Initiator Groups, 3. Map the Volume to an Initiator Group. If there are tuning settings required that are familiar from traditional arrays, it may be a sign that the AFA in question is not optimized for flash. The management simplicity of an AFA should be considered a core tenet.
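That three-step flow can be sketched against a hypothetical management API. The class and method names below are invented for illustration only; real arrays expose equivalent operations through their own CLI or REST interfaces:

```python
class SimpleArray:
    """Hypothetical AFA management model: volumes, initiator groups, mappings."""

    def __init__(self):
        self.volumes = {}
        self.initiator_groups = {}
        self.mappings = []   # (volume, initiator group) pairs

    def create_volume(self, name: str, size_gb: int):
        self.volumes[name] = {"size_gb": size_gb}

    def create_initiator_group(self, name: str, wwpns: list):
        self.initiator_groups[name] = wwpns

    def map_volume(self, volume: str, group: str):
        if volume not in self.volumes or group not in self.initiator_groups:
            raise KeyError("volume and initiator group must exist before mapping")
        self.mappings.append((volume, group))

# The entire provisioning workflow -- no RAID levels, pools, or cache tuning:
array = SimpleArray()
array.create_volume("oracle-data", size_gb=2048)        # 1. create a volume
array.create_initiator_group("dbhost", ["10:00:00:90:fa:xx:yy:zz"])  # 2. create a host initiator group
array.map_volume("oracle-data", "dbhost")               # 3. map the volume
```

If provisioning on the array you are evaluating takes materially more steps or knobs than this, ask the vendor why.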
Snapshots and Replication
Beyond just storing data on an AFA, customers will want to leverage snapshots (clones) and replication. Snapshots should be zero cost from both a performance and capacity perspective and be able to have (practically) infinite depth. If enabling snapshots causes a drop in performance or there are other limitations, it’s a sign that the snapshot architecture is not designed to take advantage of the random access nature of flash. The same goes for replication.
Takeaways: What is the performance difference between doing IO to a primary volume versus a snapshot/clone volume, and what impact does that have on the system? How much space does a snapshot (initially) take up? What is the replication strategy utilized and what performance impact does it have (if any)? Does the replication transmit data on the network which is de-duped and compressed? Is there any write amplification caused by the snapshot/cloning/replication data services?
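A zero-cost, metadata-only snapshot can be illustrated with a toy redirect-on-write block map: taking the snapshot copies only pointers, so it consumes no data capacity up front, and subsequent overwrites to the primary go to freshly allocated blocks instead of copying old ones. This is a simplified sketch of the general technique (real systems add reference counting, space reclamation, and much more), not any vendor’s design:

```python
import itertools

class BlockStore:
    """Shared physical block store: id -> data."""
    def __init__(self):
        self.blocks = {}
        self._ids = itertools.count()

    def allocate(self, data: bytes) -> int:
        pid = next(self._ids)
        self.blocks[pid] = data
        return pid

class ToyVolume:
    """Toy redirect-on-write volume: a block map over a shared block store."""

    def __init__(self, store: BlockStore, block_map=None):
        self.store = store
        self.block_map = dict(block_map or {})   # logical block -> physical id

    def write(self, lba: int, data: bytes):
        # Redirect-on-write: new data always lands in a fresh physical block.
        self.block_map[lba] = self.store.allocate(data)

    def read(self, lba: int) -> bytes:
        return self.store.blocks[self.block_map[lba]]

    def snapshot(self) -> "ToyVolume":
        # Metadata-only: copy the pointers, not the data.
        return ToyVolume(self.store, self.block_map)

store = BlockStore()
vol = ToyVolume(store)
vol.write(0, b"v1")
snap = vol.snapshot()                    # instant; zero additional data blocks
before = len(store.blocks)
vol.write(0, b"v2")                      # primary redirects to a new block
assert snap.read(0) == b"v1"             # snapshot still sees the old data
assert vol.read(0) == b"v2"
assert len(store.blocks) == before + 1   # only the new write consumed space
```

Note there is no copy-on-write penalty on the overwrite path and no write amplification from taking the snapshot — the behaviors the takeaway questions above are probing for.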
SSD Drive Choice
AFAs on the market today use a variety of SSD types: SLC, eMLC, cMLC, etc. I don’t think customers should concern themselves with the type of drives used, provided the vendor guarantees an acceptable endurance for the system, meets the performance requirements, and does so at the right cost. The results each AFA achieves in managing its SSDs are more important than the details of how it manages them. However, it is critical that performance be evaluated with the notes from the previous sections in mind.
Takeaways: What is the mean time between parts replacement based on field installations/testing?
Scaling Capacity and Performance
AFAs on the market today utilize both scale-up and scale-out approaches. I don’t want to get into the scale-up versus scale-out argument; what is important is to note the capacity/performance scaling points to ensure the system will meet growth needs in years 2, 3, 4, and so on. Capacity numbers are quite low today for AFAs (relatively speaking), but data reduction techniques are helping to solve that issue, and as larger and larger SSDs become qualified, this will become less and less of a problem. Physical capacity aside, each AFA is going to have an upper limit on logical capacity because of the metadata it needs to store for de-duped/compressed blocks.
Takeaways: How easy is it to add capacity/performance for growth and what is required? What is the logical capacity limit of the system and how does that scale?
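That metadata ceiling can be reasoned about with simple arithmetic. Assuming (purely for illustration) a flat map with one fixed-size entry per logical block, the addressable logical capacity is bounded by how many entries fit in the metadata budget; the entry and block sizes below are hypothetical, as real AFAs use their own metadata layouts:

```python
def logical_capacity_tb(metadata_gib: float, entry_bytes: int = 32,
                        block_bytes: int = 4096) -> float:
    """Upper bound on logical capacity (decimal TB) for a metadata budget.

    entry_bytes and block_bytes are illustrative assumptions, not any
    vendor's actual metadata layout.
    """
    entries = metadata_gib * 2**30 / entry_bytes
    return entries * block_bytes / 1e12

# e.g. a 64 GiB metadata budget, 32 B per entry, 4 KiB logical blocks:
print(round(logical_capacity_tb(64), 1), "TB of addressable logical capacity")
```

The exact numbers matter less than the shape of the question: whatever the vendor’s entry size and metadata budget are, they imply a hard logical capacity ceiling, and you should ask what it is and how it grows.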
Failure Scenarios
While this isn’t specific to AFAs, with any storage array it is important to understand the failure scenarios and their impact. As a customer, these should be well understood so that if/when they occur (let’s face it, hardware fails, period), any performance or availability degradation does not come as a surprise.
Takeaways: What is the impact of losing SSDs in the system? How many SSDs can be lost in a system and still maintain zero data loss? What is the impact of a controller failure? What is the impact of a shelf (DAE) failure? What happens if the array loses power?
Non-Disruptive Upgrades
While this also isn’t specific to AFAs, I think it’s an important one because we are in the beginning stages of the AFA market, so we should expect software improvements to iterate at a rapid pace. Thus, it is critical to be able to upgrade to the latest code releases to take advantage of these enhancements with minimal impact to the applications. Many of the AFAs on the market use commodity hardware, so the bulk of performance and feature enhancements will come by way of software. Additionally, the SSD manufacturers are constantly improving their own firmware, and this will require applying firmware updates to the SSDs themselves.
Takeaways: What is the impact of applying a code upgrade? Are there situations where a controller needs to be rebooted to apply new code? How is applying firmware updates to the SSDs themselves handled?
It’s not all about the highest performance and the lowest latency
I would also caution against putting too much weight on “extreme” performance and latency numbers. What I mean by this is that an AFA that meets the capacity requirements, provides the right data services, and performs consistently while delivering 200,000 IOPS at <1 ms would be a good fit for the majority of customers today. If another AFA delivers 250,000 IOPS, it really doesn’t provide any additional customer value unless the customer can take advantage of those extra 50,000 IOPS. The same goes for latency. If vendor A provides the IOPS at 900 µs and vendor B does so at 1.1 ms, chances are an application will never notice the difference. This is a markedly different thought process than most customers are used to, because with traditional HDD-based arrays it’s easy to push the performance envelope and drive up latencies; those arrays are spindle bound, so their performance thresholds are much lower. The bulk of AFAs on the market today deliver more performance than a customer will ever use, so for most customers the limiting factor becomes capacity, not performance. Again, it’s important not to confuse raw performance with consistent performance through the lifecycle of the AFA.
Architecture is important, but customer value and real world results matter more
From what I can see, customers are drowning in the minutiae of deep architectural details rather than evaluating AFAs based on what really matters: consistent performance and latency, $/GB and $/IOPS costs, and the required data services. The right comparisons have to be done at the right level. I’m sure this list will expand as the AFA market matures, but for now, these are the things I’d be concerned with and the questions I’d ask if I were a customer evaluating the AFA market.