Posts Tagged ‘virtualisation’

I had been repeatedly receiving a vCenter alarm from two of our new hosts for the last 3 or 4 days, both reporting that vmnic0 had lost connectivity. The initial investigation confirmed that the physical NIC was up and passing traffic. A review of the host logs showed no signs of an error and the physical upstream switch had no record of the link going down.

Target: xxxx.xxx.co.nz

Previous Status: Green

New Status: Red

 Alarm Definition:

([Event alarm expression: Lost Network Connectivity; Status = Red] OR [Event alarm expression: Restored network connectivity to portgroups; Status = Green] OR [Event alarm expression: Lost Network Connectivity to DVPorts; Status = Red] OR [Event alarm expression: Restored Network Connectivity to DVPorts; Status = Green])

 Event details:

 Lost network connectivity on virtual switch “vSwitch0”. Physical NIC vmnic0 is down. Affected portgroups: “Vmotion”, “Management Network”.

The alert reported that the loss of connectivity was affecting two portgroups which didn’t even have this pNIC as their active adapter. The portgroups that did have this adapter set as active were not listed.

It then became apparent that the alerts were being sent exactly 1 hour apart. Smelling a rat, I restarted the vCenter service, and so far the alerts have stopped. I have yet to find a root cause for these erroneous alerts or any KB article that fits the problem, but it only occurred on the new IBM HS23E blades running ESXi 5 and not on the recently built HS22 blades.
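For anyone wanting to check their own alarm emails for the same pattern, here is a minimal sketch in Python. The timestamps are made up for illustration and do not come from our vCenter. The function names and the 30-second tolerance are also my own assumptions.

```python
from datetime import datetime, timedelta


def alert_intervals(timestamps):
    """Return the gaps between consecutive alert times, oldest first."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_scheduled(timestamps, expected=timedelta(hours=1),
                    tolerance=timedelta(seconds=30)):
    """True if every gap is within `tolerance` of `expected`.

    Needs at least three alerts (two gaps) before calling it a pattern.
    """
    gaps = alert_intervals(timestamps)
    return len(gaps) >= 2 and all(abs(gap - expected) <= tolerance for gap in gaps)


# Hypothetical alert times, for illustration only.
sample = [
    datetime(2012, 1, 1, 9, 0, 5),
    datetime(2012, 1, 1, 10, 0, 4),
    datetime(2012, 1, 1, 11, 0, 6),
]
print(looks_scheduled(sample))
```

A real link flap would be unlikely to recur at such regular intervals, so evenly spaced alerts point towards something on the management side rather than the NIC itself.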


An interesting issue popped up recently when we started to allocate LUNs from our existing EMC CX-320 arrays to our SanSymphony SAN virtualisation servers.

At its simplest, it affects any storage array that implements active/passive controllers when you try to connect it to SanSymphony as back-end storage.

I’ve got another post in the pipeline which details how we’ve used SanSymphony in our environment, but I felt this was an interesting issue to highlight for those either using or planning to use SanSymphony to virtualise mid-range storage arrays, like the Clariions, that use active/passive designs.

For those who haven’t had a lot of exposure to SanSymphony, you need to be aware that it uses its own FC HBA driver to implement features that aren’t available in a normal Windows server HBA driver, such as acting as a target. Because SanSymphony uses its own driver out of the box, other MPIO utilities like PowerPath can’t be used on these storage servers. (Well, not totally true, but that comes later.)

It also means that SanSymphony will actively try to use all paths to a back-end target for active IO. In most cases an active/passive array will report that a passive path is up, but when an IO request is sent down that path the array responds with “Not Ready”. SanSymphony then appears to retry the command down the same path over and over. As a result, when you first try to add a new back-end LUN to SanSymphony you’re quite likely to find that it can’t be discovered. Even worse, if you do manage to add it to a disk pool, it could become unavailable should the LUN be transitioned to the alternate storage processor.
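The behaviour described above can be sketched as a toy model in Python. This is not SanSymphony’s actual driver logic, which I haven’t seen. The class, function names, path names and retry count are all hypothetical, and the model only mimics what the symptoms suggest: one initiator keeps retrying a passive path, while another fails over to the next path.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    active: bool  # True if this path leads to the storage processor owning the LUN

    def send_io(self):
        # A passive path reports link up but rejects the IO itself.
        return "GOOD" if self.active else "NOT READY"


def naive_initiator(paths, max_retries=5):
    """Pick the first path and keep retrying it, as the symptoms suggest."""
    path = paths[0]
    for attempt in range(1, max_retries + 1):
        if path.send_io() == "GOOD":
            return path.name, attempt
    return None, max_retries


def failover_aware_initiator(paths):
    """Move on to the next path when one answers Not Ready."""
    for attempt, path in enumerate(paths, start=1):
        if path.send_io() == "GOOD":
            return path.name, attempt
    return None, len(paths)


# Hypothetical LUN owned by SP A, with the SP B path listed first.
paths = [Path("SPB-port0", active=False), Path("SPA-port0", active=True)]
print(naive_initiator(paths))           # never reaches the active path
print(failover_aware_initiator(paths))  # succeeds on the second path
```

In this model the naive initiator only works if the first path it picks happens to be the active one, which would explain why a LUN can drop out after a trespass to the other storage processor.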

According to DataCore Tech Bulletin 1302, the recommendation is to set aside HBA controllers to connect to these SAN arrays and use the array’s supported software/drivers instead of SanSymphony’s drivers on the back-end ports. If you do this, you won’t be able to use those ports for any other SanSymphony function (such as front-end or mirror ports). The only alternative is to zone in and register paths to just one of the back-end array’s storage controllers and set the host and LUN to auto-trespass to that controller. I’d caution that this is really only a stop-gap measure while you migrate volumes off the array, not a permanent solution.

As always with SanSymphony, it’s best to plan your connection options carefully at the start. Messing around with drivers on a storage controller in production is asking for trouble if it’s done wrong.