VIJAY SWAMI

thoughts and musings regarding enterprise technology & business

The details behind VMAX Cloud Edition

Posted by Vijay Swami on February 27, 2013

Anytime there is a product with the name “cloud” it tends to stir up a lot of interest from customers & peers. On one end of the spectrum you have vendors that simply take existing products and rebrand them as “cloud” versions, while others actually make something worthy of the name “cloud”. I believe the VMAX Cloud Edition to be the latter, albeit a first step towards the eventual goal of Storage as a Service, Software Defined Storage, Enterprise Cloud-Friendly Storage, <insert any buzz word here>, etc. But what is it REALLY?

To summarize what a VMAX CE is: it’s a VMAX system with a self-service portal, REST API support, chargeback/showback reporting and multi-tenancy built in. It abstracts storage requirements into a discrete number of “storage offerings”, doing away with the traditional methods of provisioning and thinking around storage.

Architecture:

VMAX CE Architecture

A few facts:

  • The VMAX Cloud Edition product is based on the VMAX10K hardware. This means 1-4 engines, D@RE (Encryption) capability, etc. One point to note, however, is that you can have up to 10 VMAX Cloud Edition frames running behind a single management interface provided by two management appliances connected in an HA fashion. These appliances connect via VPN to EMC’s datacenter as well as via FC to the array / switch fabric.
  • As can be seen above, the portal is actually located in EMC’s data centers: the “consumers” of storage connect to EMC’s data centers via secure web access, and the storage admins connect via VPN. If you or your customer strictly will not allow external connectivity, this product will not work. This may or may not change in the future to where it can be 100% hosted on-prem, but for now it is a requirement.
  • You cannot manage the VMAX Cloud Edition using Unisphere. You must use the self-service portal or the REST API. However, managing the VMAX CE through Unisphere would defeat the purpose of it in the first place.
  • From a front-end host connectivity perspective, all it supports is FC (Fibre Channel). There may or may not be plans to support iSCSI/FCoE in the future.
  • You absolutely cannot upgrade an existing VMAX to VMAX CE
  • Chargeback style reporting is built-in with a granularity on a per-tenant basis (customer, business unit, department, etc)
  • The service catalog specifies the following storage offerings, and each offering has a $/GB cost associated with it when purchasing the VMAX CE: Diamond 3-4 IOPS/GB, Platinum 1-3 IOPS/GB, Gold .5-1 IOPS/GB, Silver .25-.5 IOPS/GB, Bronze .05-.15 IOPS/GB, as shown below
VMAX CE Storage Offerings

As with any movement towards “cloud” based consumption there are two aspects: a technology enablement and a business process change. The technology enablement on the VMAX CE comes in the form of a portal, service catalog, REST API, chargeback style reporting and multi-tenancy. From a business process perspective, it allows customers to purchase the array on a $/GB basis instead of worrying about the cost of individual components such as drives, engines, cache, etc. To go from XXTB to YYTB on a VMAX may require an engine upgrade and its associated cost, but to go from XXTB to YYTB on a VMAX CE will carry a linear $/GB fee that is pre-determined based on the storage band, regardless of what extra components are required beyond drives. Storage planning becomes a simple task of understanding the capacity requirements for each service level and immediately determining a cost. This is huge compared to how storage budgets are forecasted today. And lastly, by providing a service catalog (and REST API) for storage provisioning, it allows customers to automate & orchestrate the storage tasks.
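To make the linear $/GB idea concrete, here is a minimal Python sketch of that kind of capacity planning. The per-band prices are made-up placeholders (this post does not quote actual VMAX CE pricing), and the names `PRICE_PER_GB`, `growth_cost` and `plan_cost` are hypothetical.

```python
# Hypothetical $/GB prices per storage band. Placeholder values only,
# NOT actual VMAX CE pricing.
PRICE_PER_GB = {
    "Diamond": 10.0,
    "Platinum": 7.0,
    "Gold": 5.0,
    "Silver": 3.0,
    "Bronze": 1.5,
}


def growth_cost(current_gb, target_gb, band):
    """Cost of growing one band from current_gb to target_gb at a flat $/GB rate."""
    if target_gb < current_gb:
        raise ValueError("target capacity must not be smaller than current capacity")
    return (target_gb - current_gb) * PRICE_PER_GB[band]


def plan_cost(capacity_by_band_gb):
    """Total cost of a capacity plan expressed as GB per band."""
    return sum(gb * PRICE_PER_GB[band] for band, gb in capacity_by_band_gb.items())


if __name__ == "__main__":
    # Growing Gold from 20TB to 30TB is just (delta GB) x ($/GB).
    print(growth_cost(20_000, 30_000, "Gold"))             # 50000.0
    print(plan_cost({"Gold": 10_000, "Bronze": 50_000}))   # 125000.0
```

The point of the sketch is that no engine, cache or drive counts appear anywhere in the calculation; only capacity per band does.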

It is extremely important that customers turn their frame of thinking towards “service levels” and not RAID groups, # of 15K spindles, and so on to truly provide storage as a service. What will also be of tremendous help is when storage best practices white papers get re-written with service levels in mind and not specific storage configurations. For example, while a customer may buy a VMAX CE today, the Exchange 2010 best practices whitepaper still speaks in terms of # of spindles and an “old school” methodology of storage architecture, not in Gold or Silver bands à la VMAX CE verbiage. This can be a challenge when trying to translate an application architecture into storage requirements in the above model. There is no doubt this is the way of the future in storage, but we have a long way to go until it’s status quo. Software Defined Storage is in its infancy.
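One rough way to bridge the gap is to convert an application’s sizing into an IOPS/GB density and look up the band that covers it. The Python sketch below does that using the band ranges from the service catalog above; the Silver upper bound of 0.5 is my assumed reading of that list, and `pick_band` is a hypothetical helper, not anything the portal provides.

```python
# Upper bound of each band in IOPS/GB, ordered from lowest to highest band.
# Values follow the service catalog quoted earlier in this post; the Silver
# upper bound of 0.5 is an assumed reading of that list.
BAND_UPPER_IOPS_PER_GB = [
    ("Bronze", 0.15),
    ("Silver", 0.5),
    ("Gold", 1.0),
    ("Platinum", 3.0),
    ("Diamond", 4.0),
]


def pick_band(peak_iops, capacity_gb):
    """Return the lowest band whose upper IOPS/GB bound covers the workload."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    density = peak_iops / capacity_gb
    for band, upper in BAND_UPPER_IOPS_PER_GB:
        if density <= upper:
            return band
    raise ValueError(f"{density:.2f} IOPS/GB exceeds the highest band")


if __name__ == "__main__":
    # 3,000 IOPS across 4,000 GB is 0.75 IOPS/GB.
    print(pick_band(3_000, 4_000))   # Gold
    # 500 IOPS across 10,000 GB is 0.05 IOPS/GB.
    print(pick_band(500, 10_000))    # Bronze
```

A spindle-count whitepaper still has to be turned into peak IOPS and usable capacity first, which is exactly the translation work described above.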

Posted in EMC, storage, VMAX

Is vSphere Data Protection the same thing as Avamar?

Posted by Vijay Swami on February 21, 2013

One of the common conversations in my customer base and among SE peers is around vSphere Data Protection and how it compares to Avamar. It is no secret that the latest incarnation of VDP and VDP Advanced have Avamar technology under the covers: in-line deduplication, variable length block & segment size, leveraging VADP and CBT, etc. EMC and VMware teamed up to bring Avamar technology into the VMware data protection portfolio, but the question is, is it the right solution for a particular data protection requirement? As always, the devil is in the details.

First, it’s worth pointing out that VDP comes in one deployment scenario, a software-only virtual appliance, whereas Avamar can be deployed as a software-only virtual appliance (Avamar/VE) or a hardware-based solution (“full” Avamar).

Architecturally, VDP looks like this:

VDP Architecture

It is a virtual appliance that runs on an ESX host. One of the major pluses to VDP is that the UI is integrated with the vCenter (web) client instead of being an external UI like Avamar. And let me tell you, the VDP UI is simple, easy to use and very intuitive, which is a huge win for VDP IMO. The installation of VDP is also very straightforward through an OVF. It should be noted, however, that VDP requires the vCenter web client and vCenter 5.1. It does not work with previous versions of vCenter and does not work with the “full” thick Windows vCenter client. Under the covers both VDP and Avamar work very similarly, and you can expect the same de-duplication rates since they share the algorithm.

VDP comes in two editions: VDP and VDP Advanced. The major differences lie in the configuration maximums, scalability and application level agents.

Each VDP appliance can be as large as 2TB, whereas each VDP Advanced appliance can be as large as 8TB. VDP supports a maximum of 100 VMs per appliance, whereas VDP Advanced supports 400 (of course the actual number of VMs will vary based on VM size, dedupe rates, retention requirements, etc… but those are upper limits). Both allow up to 10 appliances per vCenter, but each appliance is treated independently, meaning there is no de-duplication across multiple VDP appliances even in the same vCenter. VDP Advanced also includes agents for application-consistent backup of SQL 2008 and 2012 as well as Exchange 2003, 2007 and 2010. With standard VDP the only option is an image-based backup at the VM level. And while VDP is free with vSphere Essentials Plus and higher, VDP Advanced carries a $1095/CPU list price tag (although it can be purchased bundled through some of the higher end suites).
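As a quick sanity check on those maximums, here is a small Python sketch that estimates how many appliances an environment would need per vCenter. The limits are the ones quoted above; `appliances_needed` is a hypothetical helper, and the deduplicated capacity figure you feed it is your own estimate. Because each appliance dedupes independently, splitting a dataset across appliances will likely need more capacity than a single-store estimate suggests.

```python
import math

# Per-appliance limits quoted in this post.
LIMITS = {
    "VDP": {"max_tb": 2, "max_vms": 100},
    "VDP Advanced": {"max_tb": 8, "max_vms": 400},
}
MAX_APPLIANCES_PER_VCENTER = 10


def appliances_needed(edition, vm_count, dedupe_store_tb):
    """Minimum appliance count for one vCenter, or None if it cannot fit."""
    limits = LIMITS[edition]
    by_vms = math.ceil(vm_count / limits["max_vms"])
    by_capacity = math.ceil(dedupe_store_tb / limits["max_tb"])
    needed = max(by_vms, by_capacity, 1)
    return needed if needed <= MAX_APPLIANCES_PER_VCENTER else None


if __name__ == "__main__":
    print(appliances_needed("VDP", 250, 3))           # 3 (VM count is the constraint)
    print(appliances_needed("VDP Advanced", 250, 3))  # 1
    print(appliances_needed("VDP", 1500, 5))          # None (would need 15 appliances)
```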

Now with some of the feeds & speeds of VDP out of the way, here is what I see as the major roadblocks to adoption (especially in my customer base):

First and foremost, there is no mechanism to get the data off-site for disaster recovery or compliance purposes. While Avamar supports both off-site replication to a second set of HW or Virtual Appliance, as well as tape out for off-siting through the Avamar Extended Retention capability, VDP offers neither. There are probably a couple of kludgy, error-prone, labor-intensive methods of restoring at a DR site, such as replicating the VDP appliance with array-based replication, or backing up the VDP appliance via another backup program capable of tape-out, but none of these would be acceptable in any of my customer environments. The lack of a clean methodology for off-site recovery would be a deal breaker for almost all of my customers. If you are backing up data that is critical to your business, I don’t see how this could be acceptable regardless of organizational size. And yet VDP Advanced is being marketed towards companies with 200-300 VMs, which in my customer base is HUGE.

Another drawback is in the application-level consistency. As a reminder, this is only available with VDP Advanced, and currently there are only two applications supported: SQL and Exchange. While the SQL agent supports most of the features one would expect, the Exchange agent is lacking a very major one: granular restore at the mailbox & message level. While you can restore an individual mailbox or message with the Avamar Exchange plug-in, the VDP Advanced Exchange plug-in only goes down to the Exchange database level, which is quite a nuisance if all that is needed is a single mailbox.

The VDP backup job scheduling also has a limitation: all backups start at the same time. While the backup job frequency, retention, etc. can be altered, there is only one backup window per day with VDP. What this means is that there is no good way to stagger backup jobs. In smaller environments this may not pose a problem, but any backup admin managing an environment of size will tell you that customizing the backup start time of various heavy-hitter servers to spread the workload is one of the most critical components to keeping backups running smoothly. I see this as a potential point of concern for customers of any size & complexity.

I’ve also heard rumors circulating that VDP/VDP-A can be upgraded to “full blown Avamar”, and I am being told that simply isn’t true. This could change in the future, but as of right now it’s not even in the realm of possibility… buyer beware. If this is an environment that will grow beyond the single 8TB appliance limit of VDP Advanced, and taking advantage of global deduplication across all datasets is desired, it is better to look at full Avamar from the beginning. The deduplication across the entire dataset will bring large efficiencies compared to simply deduplicating in discrete 8TB silos.
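A toy illustration of why siloed deduplication stores more than global deduplication: the Python sketch below counts unique chunks per appliance versus across all appliances. It uses fixed-size chunks and SHA-256 purely for simplicity; the real products use variable-length segments, so treat this as a conceptual sketch with hypothetical function names, not a model of the actual algorithm.

```python
import hashlib


def chunk_ids(data, chunk_size=4):
    """Split data into fixed-size chunks and return the set of chunk hashes."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def stored_chunks_siloed(datasets):
    """Each appliance deduplicates only its own dataset."""
    return sum(len(chunk_ids(d)) for d in datasets)


def stored_chunks_global(datasets):
    """One deduplication domain across every dataset."""
    seen = set()
    for d in datasets:
        seen |= chunk_ids(d)
    return len(seen)


if __name__ == "__main__":
    # Two "appliances" whose data shares the chunks AAAA and BBBB.
    datasets = [b"AAAABBBBCCCC", b"AAAABBBBDDDD"]
    print(stored_chunks_siloed(datasets))  # 6
    print(stored_chunks_global(datasets))  # 4
```

The shared chunks are stored once per silo in the first case and once overall in the second, which is the efficiency gap that grows with the number of appliances.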

With all of those things said, I don’t want to make it seem like I feel VDP is a bad product. Far from it. It does many things very well: image-level backup with file-level restore capability; a UI integrated right into vCenter; ease of use that is a 10 out of 10, and I cannot stress this enough; installation that is a snap; functionality that works very well for what it does provide; AND I feel it is a good value. But as always, it’s about matching up the requirements with the solution, and it’s important to be aware of the limitations of any product.

If the drawbacks outlined above do not conflict with your requirements, I would fully recommend giving VDP/VDP-A serious consideration. But while it may have Avamar technology under the covers, it is NOT Avamar.

Posted in backup, EMC, Virtualization, vmware | 1 Comment »

New (virtual) home for the blog

Posted by Vijay Swami on February 16, 2013

As you may have noticed, I have made a couple of small changes. Namely, the address for this blog is now http://vjswami.com instead of the old wordpress.com address. And secondly, I have changed my twitter handle to @vjswami.

Nothing else is changing in terms of content, etc. I’m working on some articles based on some recent work & research I’ve been doing that I hope you will find useful.

Cheers!

/vijay

Posted in misc

A study in VMAX & VNX auto-tiering

Posted by Vijay Swami on December 17, 2012

One of the major differences between a VMAX and a VNX is in the pooling & FAST-VP (auto-tiering SW) implementations. As more and more VNX customers are considering VMAX systems (thanks largely to the introduction of the VMAXe/VMAX10K price point), these differences are often a topic of conversation. There are some noticeable differences in the theory & operation, and subsequently the real-world management, which are worth understanding.

Data Movement Frequency & Data Movement Granularity:

The most obvious difference in the FAST-VP implementation between the two systems is how often data relocations can occur & the granularity of that data movement.

  • VMAX: data is collected continuously, analyzed continuously and can be moved continuously. The granularity of this data movement can be as small as 12 Symmetrix tracks or 768K
  • VNX: data is collected continuously, analyzed once an hour, and moved once per 24hrs. The way to think about auto-tiering on a VNX is that the system builds a “plan” based on a data collection window and then executes on that plan once every 24hrs during a specified relocation window. The granularity of data movement is 1GB
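The granularity gap is easier to appreciate with numbers. The Python sketch below works out the track size implied by “12 tracks = 768K” and how much data each system has to relocate to promote a small hot region. It assumes binary units (1K = 1024 bytes, 1GB = 2^30 bytes) and a hot region that is contiguous and aligned to the movement unit; `bytes_moved` is a hypothetical helper.

```python
KIB = 1024
VMAX_EXTENT_TRACKS = 12
VMAX_EXTENT_BYTES = 768 * KIB          # 12 Symmetrix tracks, per the figure above
VNX_SLICE_BYTES = 1024 * 1024 * KIB    # 1GB slice (binary units assumed)

# Implied size of a single Symmetrix track.
track_bytes = VMAX_EXTENT_BYTES // VMAX_EXTENT_TRACKS

# How many VMAX movement units fit in one VNX slice.
ratio = VNX_SLICE_BYTES / VMAX_EXTENT_BYTES


def bytes_moved(hot_bytes, unit_bytes):
    """Bytes relocated to promote a contiguous, unit-aligned hot region."""
    units = -(-hot_bytes // unit_bytes)  # ceiling division
    return units * unit_bytes


if __name__ == "__main__":
    print(track_bytes)       # 65536 (64K per track)
    print(round(ratio, 1))   # 1365.3
    hot = 10 * 1024 * KIB    # a 10MB hot spot
    print(bytes_moved(hot, VMAX_EXTENT_BYTES))  # 11010048 (10.5MB)
    print(bytes_moved(hot, VNX_SLICE_BYTES))    # 1073741824 (the whole 1GB slice)
```

In other words, a 10MB hot spot drags roughly 100x more data onto the top tier on the VNX than on the VMAX, which is part of why EFD capacity goes further with finer granularity.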

It’s important to take note of this major architectural difference between the two systems. On the VNX, auto-tiering is designed as more of a slow-moving process, and for this reason it should be considered mandatory that all VNX systems utilizing FAST-VP also have the appropriate amount of Fast Cache capacity. Fast Cache operates at 64K granularity and allows instant promotion of data. If some cold data living on NL-SAS suddenly became hot, the VNX can immediately promote it (or queue it for promotion) to Fast Cache to absorb that incoming workload spike, while FAST-VP would come around on the next relocation window to decide if it makes sense to actually up-tier it permanently onto the pool’s EFD drives. If that same scenario were to occur on the VMAX, the data would be promoted (or queued to the DA for promotion) almost instantly to the EFD tier. This is why the VMAX does not need the Fast Cache feature at a fundamental level (gobs of engine cache help too!)

Nerd knobs A.K.A. The “tunability” of FAST-VP across the two systems:

  • VMAX
    • Performance Time Window: when is the relevant time window to look at the data, usually set to 24/7/365 for continuous analysis
    • Data Movement Window: when can data be moved, usually set to 24/7/365 for continuous data movement
    • Workload Analysis Period: used to tune how aggressively to decay “older” IO compared to “newer” IO for data promotion/demotion ranking purposes
    • Initial Analysis Period: on a newly created LUN, how long is data collected before doing the first movements
    • FAST-VP Relocation Rate: a number from 1 to 10 on how quickly FAST-VP should move data
  • VNX:
    • Data Relocation Rate: Low/Medium/High
    • Data Relocation Schedule: which days of the week & during what window should relocation run

It should be clear which system has a more “end-user tunable” FAST-VP system. To be fair, not every customer tweaks every single FAST-VP knob on the VMAX, or any of them for that matter. The defaults or “best practices” (and how I hate that term!) actually work very well for the majority of customers.

Taxonomy of storage pools & LUN Configuration Elements:

VNX:

The VNX has 4 main configuration elements as it pertains to this discussion:

  • the physical drives
  • the storage pool
  • the LUN
  • the tiering policy of the LUN
VNX Pool Abstractions (Annotated from pg 11 of VNX Virtual Provisioning White Paper)

Based on the above, let’s examine how to create some auto-tiered storage on the VNX.

Take the EFD/SAS/NL-SAS drives, create a pool:

VNX Create Pool

Create a LUN in that pool:

VNX Create LUN

Set the tiering policy on that LUN:

VNX Select Tiering Policy

The tiering policies explained:
  • Start High Then Auto: place all the slices on the highest tier with available capacity (in this case EFD) and then examine the performance as time goes on and move the slices into the correct tiers
  • Highest: place the slices in the highest tiers possible and remain there, with the busiest slices in the highest tier
  • Auto: place the slices in the right tier based on performance thresholds, but initially slices are distributed throughout all 3 tiers (so some slices end up on EFD, some on SAS, and some on NL-SAS)
  • Lowest: place slices on the lowest available tier (in this case NL-SAS)
Assuming the first tiering policy (the recommended one for the majority of workloads), the LUN will have its slices start out on the highest tiers and over time auto-tier as appropriate.
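The initial placement behavior of the four policies can be sketched as a toy model in Python. This only models where a new LUN’s slices land first, not the later relocations, and it is not how the array actually implements it. In particular, the even round-robin spread used for “Auto” is my assumption; all that is stated above is that slices are distributed across the three tiers. `initial_placement` and the free-slice numbers are hypothetical.

```python
TIERS = ["EFD", "SAS", "NL-SAS"]  # highest to lowest


def initial_placement(policy, slices, free_slices):
    """Toy model of where a new LUN's 1GB slices land first.

    free_slices maps tier name to the number of free slices in that tier.
    """
    placed = {t: 0 for t in TIERS}
    remaining = slices

    if policy == "Auto":
        # Assumed behavior: round-robin across tiers that still have room.
        while remaining > 0:
            progressed = False
            for t in TIERS:
                if remaining > 0 and placed[t] < free_slices[t]:
                    placed[t] += 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                raise ValueError("pool is out of free slices")
        return placed

    if policy in ("Start High Then Auto", "Highest"):
        order = TIERS                    # fill from the top tier down
    elif policy == "Lowest":
        order = list(reversed(TIERS))    # fill from the bottom tier up
    else:
        raise ValueError(f"unknown policy: {policy}")

    for t in order:
        take = min(remaining, free_slices[t])
        placed[t] = take
        remaining -= take
    if remaining:
        raise ValueError("pool is out of free slices")
    return placed


if __name__ == "__main__":
    free = {"EFD": 10, "SAS": 100, "NL-SAS": 200}
    print(initial_placement("Start High Then Auto", 30, free))
    print(initial_placement("Auto", 30, free))
    print(initial_placement("Lowest", 30, free))
```

With only 10 free EFD slices, a 30GB LUN under “Start High Then Auto” spills 20 slices onto SAS on day one; the difference from “Highest” only shows up later, when FAST-VP is allowed to move slices down.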

VMAX:

The VMAX configuration elements are slightly different due to pooling architectural differences:

  • disk groups: collection of like type disks (e.g. 200GB EFD)
  • Virtual Pools: pooled storage capacity formed from disk groups
  • FAST-VP Tiers: association of a tier with the previously mentioned pooled capacity; multiple pools of like drive type & RAID protection type can be associated with a single FAST-VP Tier
  • FAST-VP Policies: auto-tiering policy specifying how much of each tier can be utilized, e.g. 100/100/100 would tell the system that 100% of the EFD, 100% of the FC and 100% of the SATA tier can be utilized for a given LUN/Storage Group
  • storage groups: collection of host LUNs
VMAX Pool Abstractions

On the VMAX each type of drive is associated with a disk group:

VMAX Disk Groups

Then virtual pools can be created from each of the drive types. A RAID protection and capacity are specified for each pool (can be full, or a subset of the entire system capacity based on the disk groups):

VMAX Pools

Next FAST-VP Tiers need to be created and associated with the Virtual Pools. Typical would be an EFD Tier, a FC Tier, and a SATA Tier. The example below shows the creation of an “EFD” FAST-VP Tier.

Creating a VMAX FAST-VP Tier

VMAX FAST-VP Tiers

All 3 tiers have been created above.

Next comes the FAST Policy. This determines the % of each tier (by capacity) that a LUN can occupy. 100/100/100 is the ideal policy as it basically tells the system “I’m not placing any restrictions on the tier percentages, you decide the best place to land the data”. This gives the system full control and if the VNX had a FAST-VP Policy parameter, this would be the setting (100/100/100):

Creating a FAST Policy

idealP FAST Policy Created

Alternatively, different policies can be created, such as 20/30/50, which tells the system that at most 20% EFD by capacity and 30% FC by capacity can be utilized and the rest needs to live on the SATA tier. So again, the FAST Policy on the VMAX allows plenty of “nerd knob” tweaking if desired. Other uses for this would be in multi-tenant or storage-as-a-service models where internal/external customers are paying different $/GB rates based on expected SLAs.
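To see what a policy like 20/30/50 means in capacity terms, here is a short Python sketch that converts the percentages into per-tier GB ceilings for a storage group. The function names are hypothetical, and the validity check (percentages summing to at least 100 so the whole storage group has somewhere to live) reflects my understanding of the rule rather than anything spelled out in this post.

```python
def tier_caps_gb(storage_group_gb, policy_pct):
    """Maximum GB of a storage group allowed on each tier under a FAST Policy."""
    return {tier: storage_group_gb * pct / 100 for tier, pct in policy_pct.items()}


def policy_is_valid(policy_pct):
    """Assumed rule: each value is 0-100 and together they can hold 100% of the group."""
    in_range = all(0 <= p <= 100 for p in policy_pct.values())
    return in_range and sum(policy_pct.values()) >= 100


if __name__ == "__main__":
    restricted = {"EFD": 20, "FC": 30, "SATA": 50}
    ideal = {"EFD": 100, "FC": 100, "SATA": 100}

    # A 10TB storage group under 20/30/50.
    print(tier_caps_gb(10_000, restricted))
    # {'EFD': 2000.0, 'FC': 3000.0, 'SATA': 5000.0}

    print(policy_is_valid(restricted))                        # True
    print(policy_is_valid(ideal))                             # True
    print(policy_is_valid({"EFD": 10, "FC": 20, "SATA": 30})) # False
```

Note that 100/100/100 yields a ceiling equal to the full storage group size on every tier, which is another way of saying the policy imposes no restriction.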

Next a storage group and host devices (LUNs) must be created. This is as expected, but one item that needs to be specified is the initial pool binding, meaning which virtual pool to associate the LUN with initially. The best practice is to choose the middle / FC tier for this. Note that if thin provisioning is used, no space is actually occupied in the pool until host writes are sent. Additionally, there is a setting in VMAX 5876 code which allows new writes to land on a tier that FAST-VP decides is best based on the data collected on host IO activity. With this setting, FAST-VP may decide to land the new write on SATA even though the initial binding was on FC if it deems appropriate. This avoids tracks landing on FC first, only to be moved down to SATA later (or up to EFD). This is a system wide setting called “allocate by FAST policy” and the recommendation is to enable it unless there is a good reason not to.

VMAX Create Storage Group

VMAX Create Storage Group

Next the storage group has to be associated with a FAST Policy:

VMAX Associate a FAST Policy

… and now the LUN is auto-tiered via the pools created utilizing the FAST Policy specified (100/100/100 in this case).

The FAST Policy can be changed anytime on the fly. For example it could be changed from 100/100/100 to 20/30/50 or any combination based on business needs. This gives a lot of flexibility in the management of the performance & capacity of the array.

To summarize the data movement process between a VMAX and VNX as it pertains to auto-tiering:

-VMAX: TDEV (LUN) is bound to a pool/tier (best practice is FC unless the workload is low); after the Initial Analysis Period, performance metrics are analyzed; extents are marked for promotion / demotion; data movements are queued up on the DAs (disk adapters); the TDEV remains bound to the pool it was originally bound to (for statistics purposes) regardless of where the tracks live; the behavior of new host writes depends on the “allocate by FAST Policy” setting.

-VNX: LUN is created in a pool; initial allocation is determined by the tiering policy of the LUN; data is collected immediately and analyzed every hour, and if necessary slices are moved during the next relocation window, once every 24hrs.

Other notable behavioral differences:

  • on a VNX, extents are only moved down to lower tiers if space is needed on higher tiers to accommodate up-movement. On the VMAX, extents will be proactively moved down if the system deems them of lesser performance regardless of available capacity in higher tiers.
  • during replication the VNX on the target side has no information on FAST-VP movements that have occurred. The VMAX does have a feature called “FAST-VP RDF Coordination” which, when enabled at the storage group level, will allow the source VMAX to communicate with the target VMAX on FAST-VP movements, and thus the R2 volumes on the target side will have data located on the appropriate tier as per the R1 workload. Note: This is for SRDF ONLY, and does not work in RecoverPoint environments (as of today)
  • in environments where block side compression is being utilized, the VMAX can automatically compress inactive data under FAST-VP management, with a configurable threshold of 40-400 days of inactivity. The compression on the VNX is more manual in nature, with a simple enable/disable mechanism.

Summary:

It’s important to understand the differences in pooling & FAST-VP between the two systems. The VMAX offers -much- more flexibility in how the pools & FAST-VP can be configured; however, not all customers require it. Many of the differences come down to how much more global memory & processing power a VMAX has compared to the VNX. This is especially true of the VMAX40K and the new VMAX10K “989” engines.

Comments/questions welcome.

Posted in EMC, storage, VMAX, VNX | 5 Comments »

WTF (What The FEX) are you talking about?

Posted by Vijay Swami on November 10, 2011

FEX, or Fabric Extender technology is a core part of Cisco’s DC strategy. There are multiple marketing FEX terms that mean different things, and I’ve seen much confusion from customers & peers alike regarding these terms. There are four main FEX terms: ToR-FEX (also called “Rack-FEX”), Blade-FEX, Adapter-FEX and finally VM-FEX.

Before continuing, it would be helpful to get a background on what FEX actually is… read about FEX here.

ToR-FEX (“Rack-FEX”):

This describes utilizing Nexus 2K FEX at the top of each rack, connected to Nexus 5K/7K upstream. The server adapter port connects to the FEX and the port shows up on the upstream switch as if it were directly connected to it; the FEX is a virtual line card in the switch, extending the fabric.

So, ToR-FEX/“Rack-FEX” = Nexus 5K/7K + Nexus 2K:

1 logical (VPC) link connects the 2K to the 5K, and each of the servers appears as if it is directly plugged in, as veth interfaces are created for each of the physical adapters.

Blade-FEX:

In the UCS chassis, there is a pair of IOMs that handle the communication from the blades to the fabric interconnects; these IOMs provide FEX capability very similar to that found in the Nexus 2K. The I/O flows from the blade mezz card, through the chassis backplane, to the IOM (FEX), and from there to the fabric interconnects at the top of the rack. Very similarly to the ToR-FEX/“Rack-FEX”, the IOM extends the fabric and the adapters on the blades show up on the fabric interconnects as vethernet interfaces, as if the IOM were a line card in the fabric interconnects themselves.

So Blade-FEX = UCS 6K (61xx/62xx) + UCS 2K:

A very similar logical diagram to the Rack-FEX, except in a blade chassis. The UCS 2K is contained in the chassis, and the blades have a backplane connection to the FEX (IOM) instead of a wire. One logical (VPC) connection (supported with 62xx HW) extends the fabric up to the 6K, and logging into the 6K you can see the individual Ethernet and FC interfaces of the blades.

Adapter-FEX:

The term Adapter-FEX is used to describe the act of virtualizing a physical adapter on a server (blade or rack) and having those virtualized adapters appear to the upstream Nexus switch as if they are physically connected to it. The “fabric extension” is happening from the adapter to the upstream switch, hence the term “Adapter-FEX”.

Now, there are two variants of Adapter-FEX — Adapter-FEX blade, and Adapter-FEX rack, applying to Cisco’s B-series (blade) and C-series (rack mount) servers equipped with the VIC:

So Adapter-FEX rack = VIC card + Nexus5K (one possibility, other combos are possible):

OR

Adapter-FEX Blade = VIC card + UCS2k + UCS6k:

The Adapter-FEX allows each server/blade to create multiple vNIC/vHBA and have them appear on the upstream device as if they are directly connected by showing up as veth or vfc devices.

 

VM-FEX:

VM-FEX is built on top of Adapter-FEX and is the ability to have control plane integration between the vSphere networking layer and the server networking. What do we mean by that?

There are two types of virtual interfaces: static and dynamic. Static vNICs are what a vSphere administrator would create (for service console, vMotion, etc). But as virtual machines are created, a dynamic vNIC is also created by UCSM and associated with the proper port group. This vNIC also shows up in the upstream switch as if it’s directly connected. So each virtual machine has a vNIC which is created and shows up on the upstream device, just like if there were a physical server plugged into a physical port. It’s all about providing a unified methodology for managing virtual & physical assets.

So, VM-FEX = Adapter-FEX + vCenter networking control plane integration via UCSM.

In other words: VIC card + UCS2k + UCS 6k + vSphere integration via UCSM (blade). The key is UCSM talking to vCenter.

The above shows the VM-FEX scenario for the blade, but the concept for rack servers is identical. There is control plane integration between UCSM and vCenter such that when a new VM is created, a new veth (for each vNIC) is also created automatically on the upstream device, making it seem like the VM is connected physically to it. This is in addition to any virtual adapters at the hypervisor level (such as vHBA for storage, or static vNIC for hypervisor networking).

Note: as of UCSM 2.0, VM-FEX is also supported in KVM environments.

There is an analogous rack methodology, but I don’t see it used often, and have never actually seen it implemented. Most customers I see building large VMware environments are doing so with B-series.

As we go further down the virtualization journey, these control plane integrations will become more and more prevalent, and perhaps even table stakes at some point. We have, for example, storage plug-ins for vCenter, and vCenter “awareness” in some storage GUIs, but how about more direct control plane integration for “other” storage-ish? Things that make you go…. hmmmm.

* Note: diagrams are not necessarily physical representations of full deployment scenarios. In most cases, only half the picture is displayed, there would be a second 2K, second 5K, etc.

Posted in Cisco, FEX

Peeling back the onion on HP-FEX

Posted by Vijay Swami on October 24, 2011

Recently, HP and Cisco in collaboration released a FEX module for the HP C7000 chassis. See here and here to read about the release from both HP and Cisco’s perspective. This post is not to discuss the business decisions behind this product release, but rather to take a closer look at the HP-FEX architecture from a technology perspective.

First of all, what the heck is a FEX? Read here and here for some background on the term.

Now, with that out of the way, let’s take a look at the networking architecture when deploying HP blade servers.

HP’s leading interconnect architecture is known as Virtual Connect FlexFabric. There are two main components to this:

  • server profile virtualization: Virtual Connect Service profiles allow one to take attributes of a server such as WWNs, MAC addresses, FC boot parameters, etc. and store them as a software construct, thus making the hardware itself “stateless”. The Cisco UCS analog to this would be Service Profiles. For a deep dive into the differences, see here
  • virtualizing the 10Gb adapter port: allowing one to present up to 4x NICs to the host OS with traditional Flex10 or 3x NICs and 1x FCoE with FlexFabric interconnects. Cisco’s analog to this would be their “VIC” card which allows one to create up to 256 vNIC and vHBA and present them to the host. There are some technical differences between Flex-10 and Palo, but that is not the focus of this post either. Plenty of information out there on that subject easily available via Google.

First, let’s take a look at what an HP BladeSystem architecture utilizing Virtual Connect FlexFabric could look like:

The components here are 1x C7000 chassis with 16 blades utilizing FlexFabric interconnects and integrated FlexFabric LOMs, which give 2x 10Gb CNAs per blade. The bottom-most diagram represents a logical view from the OS perspective of a single blade. FlexFabric allows the administrator to divide a single 10Gbps CNA port into 4 devices: 3 NICs and 1 HBA, or 4 NICs. In this case, we have chosen 3 NICs and 1 HBA to illustrate the FC/FCoE case. The operating system sees a total of 8 devices, 4 per CNA port. The OS communicates with the CNA as if they were traditional NICs and HBAs. The FlexFabric LOM then combines the NICs and HBAs into an FCoE stream and sends it through the midplane of the chassis up to the FlexFabric interconnects. The FlexFabric interconnects then split the FCoE traffic into traditional Ethernet and Fibre Channel via separate ports and send them upstream out of the chassis. In this case, a pair of Nexus 5Ks is used, which has the ability to house both LAN and SAN ports. This Nexus switch could also uplink into a “core” LAN/SAN. Many architectures are possible upstream. Note that while the LAN connections are cross-connected between switches, the SAN connections are *NOT*. This is because traditional Fibre Channel design relies on this “air-gapped” connectivity to maintain 2 separate fabrics.

Let’s contrast this with an HP BladeSystem deployment utilizing the B22HP-FEX:

This block diagram is very similar. The bottom-most figure represents a logical view of a blade from an OS perspective. Unlike the FlexFabric configuration, when utilizing HP-FEX, the administrator does NOT have the option of creating 4 individual devices per CNA port. It defaults to a “regular” CNA presenting 1 NIC and 1 HBA per port. The administrator will have to use other means of providing QoS since all the LAN traffic will travel through a single interface on the OS side. The classic example is creating multiple interfaces for VMware deployments: service console/VMotion, production VM, backup, etc. Another notable difference is that the traffic is FCoE out of the chassis, whereas in the FlexFabric design it was getting broken out into its LAN/SAN counterparts. In this example I used the same number of ports for the upstream connectivity. The B22HP-FEX talks FCoE to the upstream 5Ks, which can then connect into “core” LAN/SAN infrastructures in larger deployments.

Notable differences between the architectures:

  • in the FlexFabric deployment, you have the option of creating up to 4 interfaces per CNA port. On the FEX design, you do not have this capability.
  • the service profile features offered by Virtual Connect are available in the FlexFabric deployment, but not in the B22HP-FEX deployment. This is a big deal since one of the major selling points of an HP BladeSystem is the ability to utilize Virtual Connect to abstract away the server hardware.
  • in the FlexFabric deployment, you have to decide up front how many Ethernet and Fiber Channel connections you want upstream of the chassis. In the FEX design, since the traffic leaving the chassis is FCoE, you do not have to make physical wiring changes in order to allocate LAN/SAN bandwidth; it can be done via SW in the upstream Nexus 5Ks.
  • both the FlexFabric interconnects and B22HP-FEX offer 2:1 oversubscription, meaning there are 16 ports downstream (1 per blade) and 8 ports upstream (0.5 per blade). However, the ability to utilize vPC in the FEX on all the links allows MUCH better utilization of the links. Because some (2) of the FlexFabric connections will be chewed up for chassis interconnects to create a single Virtual Connect domain, you actually have a higher (worse) oversubscription ratio in the FlexFabric case.
  • from a points-of-management perspective, the B22HP-FEX interconnects are not managed individually. They act as remote line cards in the 5K (just like the standard Cisco 2000 series FEX). Each FlexFabric interconnect (pair), on the other hand, is a point of management.
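The oversubscription point is simple arithmetic, sketched below. The 16 downlinks and 8 uplinks come from the list above; treating both Virtual Connect stacking links as coming out of the same 8 uplinks of one module is my assumption for illustration, so take the second number as a rough indication rather than a spec.

```python
from fractions import Fraction


def oversubscription(downlinks, uplinks):
    """Return the downlink:uplink ratio as an exact fraction."""
    if uplinks <= 0:
        raise ValueError("need at least one uplink")
    return Fraction(downlinks, uplinks)


# Nominal figure for both FlexFabric and B22HP-FEX: 16 down, 8 up.
nominal = oversubscription(16, 8)

# Assumption: 2 of the 8 FlexFabric uplinks are used for chassis interconnects.
flexfabric_effective = oversubscription(16, 8 - 2)

if __name__ == "__main__":
    print(f"nominal: {float(nominal):.2f}:1")
    print(f"FlexFabric with 2 stacking links: {float(flexfabric_effective):.2f}:1")
```

Under that assumption the FlexFabric ratio moves from 2:1 to roughly 2.67:1.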

The lack of blade profile virtualization is a MAJOR downside to utilizing the FEX in HP BladeSystem. I don’t think anyone will argue that the FEX based network architecture is cleaner and simpler ESPECIALLY at scale; but customers will have to choose between a superior network architecture, or the benefits that come along with blade profile virtualization… unless they decide to go with Cisco UCS, in which case they can have both. ;)
That being said, there are clear advantages and disadvantages to going with either design, so it’s going to be up to the customer to decide what is more important to them.

Posted in FEX, HP | 8 Comments »

Getting the VMware VSA running in a nested ESXi environment

Posted by Vijay Swami on August 17, 2011

In the previous VSA article we took a look at the storage architecture of the appliance, as well as some of the caveats and considerations when deploying it. In this article, we’ll take a look at how to get it up and running in a nested ESXi environment as well as some of the functions the VSA provides.

First, in order to create a nested ESXi 5.0 environment, have a look at this great article.

When creating your environment, my recommendation is to create 4 individual vDS port groups or 4 individual standard vSwitches for the environment. You will assign each to a vNIC of the vESXi host to simulate connecting each pNIC to a physical switch in a real deployment.

Be sure to configure the vSwitches (or vDS port groups) with promiscuous mode enabled and create 2 vESXi VMs with 4 NICs minimum and a SINGLE VMFS volume (this is important or else the VSA will not install). I recommend a thin provisioned volume of about 200GB for testing.

You should end up with something like this:

Same applies if you are using standard vSwitches in your environment.

Now you need a Windows based vCenter 5.0 instance to manage this environment. Install the VSA manager software onto that vCenter which will then expose the VSA manager plug-in/tab on the vCenter client once you click on a vSphere data center:

In normal installations, you would then click on the VSA manager tab and follow the instructions to install. The problem is that “EVC” does not work with nested vESXi, and it is a requirement the installer checks for, so you will not be able to proceed:

Thus far I have not been able to find a workaround for this for the GUI based install. However, after lots of lab time I found there is a way around this problem: in order to install the VSA in nested ESXi and bypass the EVC requirement, we need to tweak a configuration file and then do the installation via command line. Download the full zipfile which includes the command line installer if you haven’t already and unzip that onto your system.

Here is the minimum syntax to get it going:

install.exe -u root -p <password_to_ESX_hosts> -si <start_address_for_VSA_front_end_IPs> -nh

Recall that the VSA has a front-end network and a back-end network. The “-si” switch tells it what public IPs to use for the front-end. You can specify a “-bs” start range for the back-end IPs, but it will default to 192.168.0.1 as the start range if you do not specify anything. You can also specify netmasks and VLANs. See the manual for details.
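If you end up re-running the installer a lot in a lab, it can help to build the argument list in a script. Below is a minimal sketch that only uses the switches shown in this article (-u, -p, -si, -bs, -nh); the function name and its parameters are hypothetical, and it assumes install.exe is in the current directory.

```python
import ipaddress


def build_vsa_install_args(esx_password, front_end_start_ip,
                           back_end_start_ip=None, skip_ha=True,
                           esx_user="root"):
    """Assemble the install.exe argument list from the switches in this article."""
    # Fail early on typos in the IP addresses.
    ipaddress.ip_address(front_end_start_ip)
    args = ["install.exe", "-u", esx_user, "-p", esx_password,
            "-si", front_end_start_ip]
    if back_end_start_ip is not None:
        ipaddress.ip_address(back_end_start_ip)
        args += ["-bs", back_end_start_ip]
    if skip_ha:
        args.append("-nh")
    return args


if __name__ == "__main__":
    # Placeholder values; substitute your own lab password and addresses.
    args = build_vsa_install_args("esx-root-password", "10.0.0.50")
    print(" ".join(args))
    # To actually run it on the vCenter server:
    #   import subprocess; subprocess.run(args, check=True)
```

Leaving back_end_start_ip unset omits “-bs” entirely, so the installer falls back to its own default for the back-end range.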

The “-nh” tells it not to join the hosts into a high availability cluster and this will be important to help bypass the EVC check. If we execute this command this will be the result:

As you can see the automated command line installer runs an audit stage and it fails for the same EVC reason!

Well, after much lab time, I figured a way around this problem. We need to change a parameter in C:\Program Files\VMware\Infrastructure\tomcat\webapps\VSAManager\WEB-INF\classes\dev.properties. Search for this line:

evc.config=true

and change it to

evc.config=false

This will effectively bypass the audit check for EVC. Cool huh?
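If you rebuild your lab often, the same edit can be scripted. This is only a sketch: it assumes the line appears exactly as “evc.config=true” (optionally with spaces around the “=”), the helper names are mine, and it keeps a backup copy before writing anything.

```python
import re
import shutil
from pathlib import Path

# Matches a line of the form "evc.config=true", tolerating spaces or tabs.
EVC_LINE = re.compile(r"^([ \t]*evc\.config[ \t]*=[ \t]*)true[ \t]*$",
                      re.MULTILINE)


def disable_evc_check(properties_text):
    """Return the properties text with evc.config switched to false."""
    new_text, count = EVC_LINE.subn(r"\g<1>false", properties_text)
    if count == 0:
        raise ValueError("no 'evc.config=true' line found")
    return new_text


def patch_properties_file(path):
    """Back up the file next to itself, then rewrite it with the check off."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(disable_evc_check(path.read_text()))


if __name__ == "__main__":
    sample = "some.other=1\nevc.config=true\n"
    print(disable_evc_check(sample))
    # On the vCenter server you would point patch_properties_file() at the
    # dev.properties path given above.
```

Restoring the .bak file puts the EVC check back, which is what you want on anything other than a nested lab.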

Now re-run the install.exe command, and it should complete:

And you end up with this in your nested ESXi environment:

The result is 2x 100GB data stores, which correlates with each VSA having 200GB of RAW storage, for a total of 400GB RAW or 200GB usable after RAID10 internal to the VSA.
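The arithmetic behind those numbers is sketched below. It only models the mirroring between VSA nodes and ignores the space taken by the ESXi install and the VSA VM itself, so treat it as a rough estimate; the function name is made up for illustration.

```python
def vsa_capacity_gb(raw_per_node_gb, nodes=2):
    """Estimate VSA capacity when every volume is mirrored across nodes."""
    total_raw = raw_per_node_gb * nodes
    usable = total_raw / 2           # half is consumed by the replica copies
    per_datastore = usable / nodes   # one NFS datastore exported per node
    return {
        "total_raw_gb": total_raw,
        "usable_gb": usable,
        "per_datastore_gb": per_datastore,
    }


if __name__ == "__main__":
    # The nested lab above: 200GB of RAW storage per VSA, 2 nodes.
    print(vsa_capacity_gb(200))
```

For the lab this gives 400GB RAW, 200GB usable, and 100GB per datastore, matching the two 100GB datastores the installer created.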

 

Here is a peek at the networking the VSA installer sets up:

There are front-end and back-end port groups that live on separate vSwitches and pNICs. You are now free to customize the networking however you see fit, but it HAS to have a default configuration starting out or else the install WILL fail.

Now that the VSA is installed, you can continue to manage it through the VSA plug-in in vCenter. We only needed the hack and the command line to get it up and running. Again, it’s important to note this would not be required in a real installation; it was required due to the limitations of nested ESXi.

End result:

In the next article, we’ll take a look at some administrative tasks, and test out some of the failure scenarios to see how the VSA handles them from a downtime/uptime/reliability perspective.

Posted in storage, storage virtualization, vmware | 6 Comments »

A closer look at VMware’s Virtual Storage Appliance 1.0 (VSA)

Posted by Vijay Swami on August 15, 2011

One of the new products which accompanies the vSphere 5.0 release is the Virtual Storage Appliance. The purpose of this product is to allow customers to utilize the local disks on the ESXi hosts in order to create a shared storage environment for their virtual infrastructure, thus being able to take advantage of the advanced features such as HA and VMotion which are reliant on shared storage. The idea behind this is to avoid the costs of a hardware based SAN/NAS system to allow SMB customers to implement vSphere and its advanced features at a more attractive price point.

VSA Cluster Storage Architecture:

vsa_arch


The VSA cluster is two (or three) VMs that run in the ESXi environment. Depicted above is the architecture from a storage perspective, and it’s important to understand the levels of abstraction and how we finally arrive at a shared storage resource.

At the very bottom of the stack is the physical ESXi host (physical server) which houses the local hard disks. Presumably, there is some kind of hardware RAID capability in this server, either as a function of the BIOS or a RAID card, which takes all the disks and combines them together using RAID protection to give a local volume. VMware says that RAID10 is a requirement here, but this is not a hard and fast requirement as far as I can tell (more on that below).

You then install ESXi onto this local volume and by doing so format it with the VMFS file system. When you install the VSA, the installer uses the remainder of the disk space not taken up by the ESXi install and the VSA VM itself for the “shared storage” capacity and presents that to the VSA VM as a series of VMDKs, which the VSA VM combines using an LVM to form a primary & secondary volume. The VSA VM runs an NFS server and exports this volume back to the ESXi host. Each VSA VM does this, and hence you end up with 2 NFS volumes (in a 2-node cluster): VSADs-1 and VSADs-2. Just a little bit of inception going on here! :) It’s important to note that only half the space is actually exported as an NFS volume due to RAID10 protection.

To elaborate a little on the primary and secondary volumes in the VSA VM: remember that each volume exported by the VSA VM is protected via RAID10. So one half of the VSADs-1 RAID10 mirror lives on VSA1 and the other half lives on VSA2. In this way, the environment can tolerate disk failure as well as node failure and still remain operational thanks to the RAID10 protection. What I haven’t been able to dig into yet is the replication mechanism for keeping the primary/secondary volumes in sync. I suspect it might be something like DRBD (not verified, just a guess).
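Here is a toy model of that placement, just to show why a single node failure leaves every datastore with a surviving copy. The 2-node layout follows the description above; placing the replica on the “next” node for a 3-node cluster is purely my assumption for illustration, not something I have verified, and the function names are made up.

```python
def place_volumes(nodes):
    """Map each datastore to a primary node and a replica on another node."""
    if len(nodes) < 2:
        raise ValueError("mirroring needs at least two nodes")
    count = len(nodes)
    return {
        f"VSADs-{i + 1}": {
            "primary": nodes[i],
            # Assumption: the replica lives on the next node in the list.
            "replica": nodes[(i + 1) % count],
        }
        for i in range(count)
    }


def surviving_copies(layout, failed_node):
    """List which nodes still hold a copy of each datastore after a failure."""
    return {
        volume: [node for node in (copies["primary"], copies["replica"])
                 if node != failed_node]
        for volume, copies in layout.items()
    }


if __name__ == "__main__":
    layout = place_volumes(["VSA1", "VSA2"])
    print(layout)
    print(surviving_copies(layout, "VSA1"))
```

With VSA1 down, both VSADs-1 and VSADs-2 still have a copy on VSA2, which is the behavior the mirror is there to provide.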

Now that we better understand how the VSA works under the covers, it’s important to note that there are a number of considerations and caveats to be aware of when deciding to utilize the VSA:

  • the VSA manager (server side of the plug-in which allows you to manage the VSA) needs to be installed on a Windows based vCenter server. This means you cannot utilize the vCenter Appliance (VCA) as it is Linux based. To me this is definitely a downside as the VCA is extremely easy to set up (done via OVF and can be up and managing an environment in minutes) and perfectly suited to SMB environments with its internal database. Hopefully this can be addressed in future releases. Looking at the VSA manager, it looks to be all tomcat/java based, so there is no obvious reason it could not run on the Linux based VCA.
  • when setting up the VSA, each ESXi host must be a fresh install with no virtual machines running on it. Furthermore, each ESXi host must have only the default vSphere standard switches or port groups. You cannot create any additional switches or port groups. Once the VSA has been set up, you are then free to modify the networking.
  • the ESXi hosts must be on the same subnet as the vCenter server
  • the ESXi hosts must not be in another HA cluster. the VSA setup utility sets up its own HA cluster for the environment
  • maximum supported hard disk capacity per ESXi host is 64TB
  • there are specific requirements around networking: each ESXi host requires 4 NIC ports minimum, and you require 2 VLANs (one for front-end and one for back-end traffic)
  • 72GB of RAM is the maximum supported & tested RAM configuration with the VSA
  • memory overcommit on VMs is not supported when utilizing the VSA. VMware’s reasoning is that if swapping occurs, there could be a severe performance slowdown. I don’t necessarily agree with this, as if you put enough spindles in the local host, it should not be an issue. But again this is VMware’s official support statement.
  • VMware says you should have 8 or more hard disks in RAID10 in the ESXi hosts. I see no reason why you could not utilize RAID5 or a different hard disk count. In fact, in my testing, I did not utilize any “local RAID” per se as I was running in a nested ESXi environment, and the actual LUN was utilizing RAID5 on the back-end in a FAST-VP pool. I suspect that VMware recommends a minimum of 8-disk RAID10 on the hard disks for performance reasons. But there is no reason why you wouldn’t treat spindle count on the ESXi hosts’ local drives just like you would for sizing a SAN LUN for traditional environments. Not enough spindles = performance issues no matter if they are local disks or SAN disks. But again, this is VMware’s official support statement requiring RAID10 and a minimum of 8 disks.
  • the VSA mirrors the data utilizing RAID10 (a primary and replica volume each on different hosts). this is not configurable, so plan on this from a capacity perspective. If you have 8 disks in your ESXi host doing a RAID10 giving you a volume of 1TB, and you have 2 hosts for a total of 2TB — you will end up with 1TB of usable capacity in your environment. in VSA1 500GB will be primary, 500GB will be secondary, and similarly for VSA2.
  • the VSA exports the volumes as NFS; there is no support for iSCSI
  • if you are running vCenter as a VM, it CANNOT be running on the hosts participating in the VSA cluster
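Several of the caveats above are easy to turn into a pre-flight checklist before you kick off the installer. The sketch below is not a VMware tool; the HostFacts fields and the preflight() function are hypothetical, and you would have to fill in the facts by hand or from your own inventory. The thresholds (4 NIC ports, 72GB of RAM) are the ones listed above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class HostFacts:
    """Facts about one ESXi host, gathered by hand (hypothetical structure)."""
    name: str
    ip: str
    nic_ports: int
    ram_gb: int
    running_vms: int = 0
    in_ha_cluster: bool = False
    extra_vswitches: int = 0      # switches/port groups beyond the defaults
    hosts_vcenter_vm: bool = False


def preflight(host, vcenter_network):
    """Return a list of problems; an empty list means no caveat was tripped."""
    problems = []
    if host.nic_ports < 4:
        problems.append("needs at least 4 NIC ports")
    if host.ram_gb > 72:
        problems.append("more than 72GB RAM is outside the tested maximum")
    if host.running_vms > 0:
        problems.append("host must be a fresh install with no running VMs")
    if host.extra_vswitches > 0:
        problems.append("only default vSwitches/port groups are allowed")
    if host.in_ha_cluster:
        problems.append("host must not already be in an HA cluster")
    if host.hosts_vcenter_vm:
        problems.append("vCenter VM cannot run on a VSA cluster member")
    network = ipaddress.ip_network(vcenter_network, strict=False)
    if ipaddress.ip_address(host.ip) not in network:
        problems.append("host is not on the same subnet as vCenter")
    return problems


if __name__ == "__main__":
    lab_host = HostFacts("vesxi-01", "10.0.0.11", nic_ports=4, ram_gb=16)
    print(preflight(lab_host, "10.0.0.0/24"))
```

The subnet is passed in CIDR form (the 10.0.0.0/24 here is just a placeholder), and anything the function returns is a reason to fix the host before running the install.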

Next, we will look at how to get the VSA up and running in a nested ESXi environment and following that some general tasks and see what is/is not possible with the VSA compared to traditional shared storage as well as how it handles some failure scenarios.

Posted in storage, vmware | 4 Comments »

Simplifying SAN management for VMware Boot from SAN, utilizing Cisco UCS and Palo

Posted by Vijay Swami on May 31, 2011

One of the great features of the Cisco UCS is the Palo or Virtual Interface Card (VIC). When utilizing this card with UCS, it allows the administrator to create many virtual NICs (vNICs) and virtual HBAs (vHBAs) (up to 128 with some limitations). In a VMware environment, the use of vNICs is well understood: you can create individual vNICs for service console, vMotion, VM network traffic, IP storage traffic, and so on. You can then apply QoS policies to them to guarantee service levels. Additionally, you have the ability to utilize dynamic vNICs and Pass-Through-Switching, which bypasses VMware’s vSwitch and dynamically assigns vNICs to VMs as they are created. The benefits of creating vNICs are clear, but how about vHBAs?

At first glance, it doesn’t seem that useful to create more than 2 vHBAs (one per SAN fabric); and after all this is something that you can do with the standard UCS mezzanine cards from Qlogic and Emulex. There is one use case where the ability to create more than two vHBAs comes in handy — that is boot from SAN in VMware environments. This applies equally to boot from SAN servers in other clustered environments, but I will be using VMware to illustrate this design option, with EMC’s midrange Clariion/VNX storage.


Posted in Cisco, EMC, storage, UCS | 3 Comments »

FCoE’s impact on a Storage Administrator

Posted by Vijay Swami on May 30, 2011

As FCoE is gaining more traction and moving from a “vision” to a real consideration for many customers, one of the most common questions I get from CxOs is: “I understand the benefits of FCoE in my datacenter, but how will it impact my storage team? Will they need to invest significant amounts of time in new methodologies, commands, concepts, etc. when administering the storage network?”


Posted in Cisco, FC, FCoE | 1 Comment »

 