There has been some talk lately about how the Cisco UCS connects into various networks and some of the “limitations”. I put “limitations” in quotes as I hope to explain why these are actually design considerations and not “limitations.”
This is an issue that hits home for me, as I discuss UCS with customers on an almost daily basis, and this is absolutely part of the conversation. The main question is: why can we not direct-connect storage devices and other hosts to the 6100, since it has Ethernet and FC ports? There are two aspects to this: a) why it is not technically feasible given the current revision of the hardware/software, and b) why you would or would not want to do this when designing a solution in the first place.
For those not familiar, UCS is Cisco's Unified Computing System. This post is not meant to describe the UCS system itself, but rather to focus on connecting it to an existing or new customer network. That being said, some background on UCS and its I/O flow is necessary…
The UCS is composed of four main components: the 6100 fabric interconnects, the 5108 chassis, the IOM/FEX modules, and the B200/B250 blades themselves. The blades plug into the chassis and communicate with the 6100 fabric interconnects via SFP+ connections from the IOM/FEX. The blades are fitted with CNA (converged network adapter) cards, so they carry FCoE traffic from the FEXs to the 6100 fabric interconnects. The 6100s have both 10G Ethernet ports and 4-Gb/s FC ports via the expansion module.
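To keep the pieces straight, here is a minimal data sketch of the I/O path just described. The structure and names are my own shorthand for the flow, not anything from UCS software:

```python
# My own shorthand for the UCS I/O path described above:
# blade CNA -> chassis IOM/FEX -> 6100 fabric interconnect -> upstream networks.
io_path = [
    ("B200/B250 blade (CNA)", "FCoE"),           # converged traffic leaves the blade
    ("5108 chassis IOM/FEX", "FCoE over SFP+"),  # carried up to the interconnect
    ("6100 fabric interconnect", "split here"),  # unified I/O is separated at this hop
    ("10G Ethernet uplink", "Ethernet"),
    ("4-Gb/s FC expansion port", "FC"),
]
hops = [hop for hop, _proto in io_path]
print(" -> ".join(hops[:3]))  # the converged portion of the path
```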
The diagram below illustrates the various flows and respective protocols.
As you can see, the blades plug into the chassis and then communicate from the chassis via the RED I/O path to the 6100 fabric interconnects using FCoE/unified I/O. From there, the unified I/O is split by the 6100: the Ethernet traffic is sent to the Ethernet switches, and the FC traffic is sent to the FC switches. In this diagram the uplink switch is one and the same for both Ethernet and FC, because it is a Nexus 5000-based switch that can do both; however, it could easily be a Catalyst 6500/Nexus 7K on the Ethernet side and an MDS 9500-series FC switch on the SAN side.
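Conceptually, that split is just a classification by EtherType: FCoE frames carry EtherType 0x8906 (that value is from the FC-BB-5 standard), and everything else is ordinary Ethernet. A toy sketch of the idea, with function and label names of my own invention:

```python
# Toy sketch (not UCS code) of splitting unified I/O by EtherType.
# 0x8906 is the real FCoE EtherType; the names here are mine.
ETHERTYPE_FCOE = 0x8906

def classify_frame(ethertype: int) -> str:
    """Return which upstream network a frame belongs to."""
    if ethertype == ETHERTYPE_FCOE:
        return "fc_uplink"       # FC payload heads toward the SAN side
    return "ethernet_uplink"     # normal LAN traffic heads to the Ethernet switches

frames = [0x0800, 0x8906, 0x0806, 0x8906]  # IPv4, FCoE, ARP, FCoE
paths = [classify_frame(et) for et in frames]
# paths == ["ethernet_uplink", "fc_uplink", "ethernet_uplink", "fc_uplink"]
```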
The obvious question from here is: WHY do we need another intermediary switch for FC? Namely, why can we not direct-connect Fibre Channel arrays to the 6100? Why can we not directly connect FCoE-capable arrays to the 6100 without the use of a Nexus 5000 as an intermediary? And if we had an IP-storage-based array (iSCSI or NAS), why can we NOT connect it directly to the 6100 via 10G Ethernet?
The technical answer to these is the easy part. The more difficult question is WHY.
The FC ports operate in what is called NPV mode, not full switched mode. To summarize, this uses software to “virtualize” all the initiators behind the device and present them as N-ports to the other devices in the fabric. In essence, it looks to the fabric like a host with many HBAs instead of a switch (it consumes no FC domain ID). Secondly, the 6100s contain NO fabric services: the zoning, name services, and aliasing you would find on a traditional FC switch. For these reasons you cannot connect a storage array to this device, because none of the required functions exist.
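The behavior can be modeled in a few lines. This is a toy model with names of my own choosing, not a Cisco API; it captures the two points above: logins are proxied upstream (a server FLOGI becomes an FDISC on the NP port, so the device never owns a domain ID), and fabric services such as zoning simply do not exist on the box:

```python
# Toy model (my own names, not Cisco software) of an NPV-mode device.
class NpvDevice:
    def __init__(self):
        self.domain_id = None      # an NPV device consumes no FC domain ID
        self.proxied_logins = []   # server WWPNs presented upstream

    def server_flogi(self, wwpn: str) -> str:
        # A server FLOGI is proxied upstream as an FDISC; to the fabric,
        # this box looks like a single host with many HBAs behind it.
        self.proxied_logins.append(wwpn)
        return f"FDISC {wwpn} -> upstream NP port"

    def zone(self, *members):
        # No fabric services live here, so a storage array has nothing
        # to talk to: zoning must happen on the upstream FC switch.
        raise NotImplementedError("no fabric services on an NPV device")
```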
On the Ethernet side of the 6100, the intended use of the device is to operate in what is called “end host mode.” This is very similar to NPV mode on the FC side, in that it looks like another “host” rather than a “switch” to other devices: it presents itself as a host with many MAC addresses behind it. So when connecting to an upstream switch, it simply looks like a host with many NICs. For this reason, you cannot connect other hosts to these ports, only upstream switches. The 6100s can also operate in “switched mode” on the Ethernet side, meaning they can run a spanning tree protocol, learn upstream MACs, and do the other things you would expect from a traditional Ethernet switch; however, direct-connecting IP storage (or any hosts, for that matter) is still NOT supported. A port on a 6100 must be designated as either an uplink port (connecting to an upstream switch) or a server port (connecting to a downstream FEX). Additionally, there is no Layer 3 capability on the Ethernet side of the 6100, so the ports would need to connect to an upstream switch for routing purposes regardless.
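Again, a toy model helps show the asymmetry. In this sketch (my own names, not Cisco software), server MACs are pinned to an uplink, while nothing is ever learned from the uplink side, which is why a host hanging off one of these ports would never receive traffic:

```python
# Toy model (my own names) of Ethernet "end host mode": server vNIC MACs
# are pinned to an uplink; no MACs are learned from the uplink side.
import itertools

class EndHostMode:
    def __init__(self, uplinks):
        self._round_robin = itertools.cycle(uplinks)
        self.pinning = {}  # server MAC -> pinned uplink port

    def attach_server(self, mac: str) -> str:
        # Each server MAC is pinned to one uplink; upstream switches just
        # see a host with many NICs, so no spanning tree is required.
        self.pinning[mac] = next(self._round_robin)
        return self.pinning[mac]

    def frame_from_uplink(self, dst_mac: str):
        # Deliver only to known (locally attached, i.e. server-side) MACs.
        # Unknown destinations are dropped, never flooded or learned.
        return dst_mac if dst_mac in self.pinning else None
```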
So understanding the technical limitations which prevent us from direct connecting devices into the 6100 is easy, but now the question… WHY?
The answer lies in the concept and architecture of the UCS. The UCS is designed to be a STATELESS EDGE device. This is the key. It is the same concept that moved Layer 3-capable ports from the edge to the core in traditional LAN design: why have many costly ports (costly in both price and performance) to maintain at the edge, when you can keep “cheap” ports with no intelligence there and have just a few costly ports in the core of the network (be it LAN or SAN) perform all the decision-making? It centralizes management and reduces cost, because the plentiful ports at the edge stay “dumb” while the expensive ports are minimized and kept in the core.
The UCS should be viewed as one large compute system, not a collection of individual servers or blades with interfaces. It is a system that serves compute to the infrastructure, and as such the 6100 should really be viewed as just a large I/O device passing storage and Ethernet traffic to their respective networks. This is why it's important to note that the 6100 IS NOT A SWITCH. It is a FABRIC INTERCONNECT, meaning it is meant to aggregate I/O from the compute nodes and distribute it to the proper place in each network. This design plays very well in large deployments, where maintaining the routing/FC zoning in very few core Nexus 7000s/MDS 9500s makes sense. There are plenty of design options, which are beyond the scope of this post.
The problem, however, occurs in small 1-2 chassis, entry-level/SMB deployments, where the additional devices needed to support the solution really drive the cost up. In these situations it would be convenient to direct-connect devices to the 6100s to keep costs down, but again, this violates the concept of the UCS itself. Direct-connecting hosts and performing Layer 3/FC services in the 6100 is a design that simply does NOT scale for large deployments. Imagine a full 40-chassis, 320-blade deployment and the overhead hit the 6100s would take if they had to process Layer 3/FC services on all that I/O. Not happening. However, it would be nice to have a choice: perhaps an SMB version of the 6100 that is an “all-in-one” device for smaller deployments (to avoid needing a Nexus 5K or other 10G Ethernet and FC switching devices), alongside the current 6100 design, which exists solely for I/O aggregation and distribution. Or better yet, simply a software option to turn the ports from “dumb” to “smart” as discussed above. There would surely be a performance impact from the extra processing overhead of Layer 3/FC services on each port, so a tunable option would be ideal, letting the design match the size of the deployment. There are always pros and cons to every design, but giving customers and architects a choice is always a great thing.
In any case, I hope this sheds some light on some of the “WHY” behind some of the “limitations” of the UCS. I have been told that some of this functionality is “coming”, but I don’t want to talk about futures here.