
Hyper-V 2012R2 FO Cluster with synthetic Fibre Channel guests - Quick Migrate ok, Live mostly not


Hi,

Another post with more or less the same issue as numerous other posts, for example:

http://social.technet.microsoft.com/Forums/en-US/a09e56fc-f952-427d-8dc5-53fbd1c3ca38/live-migration-failed-while-quick-migration-is-ok?forum=winserverhyperv

http://social.technet.microsoft.com/Forums/sharepoint/en-US/822bb097-e7ff-47ac-ad83-c40174e7b441/live-migration-failed-while-quick-migration-is-okvirtual-machine-with-synthetic-fc-hba-?forum=winserverhyperv

http://social.technet.microsoft.com/Forums/scriptcenter/en-US/51ddfc57-4250-4553-9592-1702ad87c12a/live-migration-failed-using-virtual-hbas-and-guest-clustering?forum=winserverhyperv

The mentioned posts were of no help though. In this setup I have two 2012 R2 Hyper-V boxes in a Failover Cluster and an HP MSA2040 SAN. Within this cluster I have two 2012 R2 guests that are also clustered. They have two Synthetic Fibre Channel adapters, each one connected to a different physical port on the host. Quick Migration works every time, Live Migration only once in a while. I get messages like:

'<GuestOS>' Synthetic FibreChannel Port: Failed to finish reserving resources with Error 'Unspecified error' (0x80004005). (Virtual machine ID 8AABB243-7333-4729-A060-120EF8E993A7)
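For reference, this is roughly how the guest's virtual HBAs and their A/B WWPN sets can be checked from the host; a minimal PowerShell sketch, with the VM name as a placeholder and the property names as I remember them from 2012 R2:

    # On the Hyper-V host: list the guest's virtual FC adapters and the
    # A/B WWPN sets that alternate between migrations
    Get-VMFibreChannelHba -VMName '<GuestOS>' |
        Select-Object SanName,
                      WorldWidePortNameSetA, WorldWideNodeNameSetA,
                      WorldWidePortNameSetB, WorldWideNodeNameSetB

Both sets are zoned and visible on the switches, as listed under the facts below.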

Facts:

  • Basic stuff like NPIV support is of course enabled, and all HBAs and switches support it;
  • MPIO is set up according to HP recommendations;
  • The hosts have 2 fibre ports, each connected to a separate fibre switch;
  • Guest VMs have 2 Fibre Channel adapters, each connected to a separate Virtual SAN;
  • Initially I had all WWPNs defined in one host on the SAN; as per a recommendation in one of the mentioned threads I split that up and made a separate host on the switch for each WWPN set. As the switch only cares about WWPNs/WWNNs rather than the aliases they are 'bound' to, this did not help;
  • Virtual SANs are set up identically on the hosts, and each matching Virtual SAN is connected to the same fibre switch, so physical pathing should be OK (see the comparison sketch after this list);
  • Zoning is correct; all 4 WWPNs are added to the zoning and can access the LUNs (if not, it would not work 'sometimes');
  • When I start the live migration, the 'not-active' WWPN set is immediately presented at the switches, and the switches see the respective WWPNs as active. At that point all 4 WWPNs of the guest are active according to my switches. The switches also recognize these WWPNs as NPIV WWPNs;
  • When I stop the guest, manually swap the addresses of set A and B and start it again, I see all my storage. This again confirms that zoning is OK and all storage is available on all 4 WWPNs;
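To rule out a configuration mismatch between the two nodes, the Virtual SAN definitions can be dumped side by side; a minimal sketch, assuming hypothetical node names HV-HOST1 and HV-HOST2:

    # Dump the Virtual SAN definitions of both cluster nodes for comparison
    foreach ($node in 'HV-HOST1', 'HV-HOST2') {
        "=== $node ==="
        Get-VMSan -ComputerName $node | Format-List *
    }

In my case the Virtual SAN names and the physical HBA ports behind them match on both nodes, which is what the 'set up identically' bullet above refers to.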

Now the point of interest: the MSA2040 seems to be rather slow to recognize, or 'discover' as HP calls it, newly presented WWPNs. This is the same for physical hosts as well as virtual ones. Sometimes it takes up to a minute or two before a host is actually discovered by the SAN, and presenting the configured LUNs to it then takes another while. I think this is the base issue here. When I do a live migration, all 4 WWPNs are available at the switches, but the SAN has only discovered 2 of them. At the sporadic times Live Migration DOES work, the SAN has (of course) already discovered the WWPNs, and as soon as the LUNs are presented to them the migration immediately continues as expected.
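To put a number on that delay, the Hyper-V event log on the target host can be correlated with the moment the WWPNs appear on the switches; a sketch only, and the VMMS channel name is an assumption on my part, so adjust it to whatever -ListLog shows on your hosts:

    # List the Hyper-V related event channels on the host
    Get-WinEvent -ListLog '*Hyper-V*' | Select-Object LogName, RecordCount

    # Recent events mentioning the synthetic FC ports, with timestamps
    # (channel name is an assumption -- pick the right one from the list above)
    Get-WinEvent -LogName 'Microsoft-Windows-Hyper-V-VMMS-Admin' -MaxEvents 200 |
        Where-Object { $_.Message -match 'FibreChannel' } |
        Select-Object TimeCreated, Id, Message |
        Format-Table -Wrap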

The same happens when starting a guest with Synthetic Fibre Channel adapters: starting such a VM immediately presents the WWPNs at switch level, then 'stalls' at 10% for about a minute before it continues booting. Sometimes the SAN discovers the WWPNs faster, and as soon as the LUNs are presented the boot continues and everything is OK. Most of the time, however, the SAN is 'too slow', and after a timeout of a minute or so the VM continues booting anyway. This results in the storage not being available inside the guest until the SAN has discovered the WWPNs.

The same happens with Quick Migration. With Quick Migration there is no requirement for the LUNs to be active, and therefore it works. Again, when the SAN is quick enough the storage is there immediately once the migration is done; when it is too slow, the storage is not available after migration until the SAN has discovered the WWPNs again.
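As a crude stopgap for the Quick Migration and VM start cases, something like the following loop inside the guest should pick the disks up again as soon as the MSA has discovered the WWPNs; just a sketch, and the expected LUN count is an assumption specific to my setup:

    # Inside the guest: poll until the FC LUNs are back, rescanning each time
    $expectedLuns = 2                    # assumption: number of shared FC LUNs this guest should see
    do {
        Update-HostStorageCache          # rescan the storage stack for newly presented disks
        Start-Sleep -Seconds 10
        $fcDisks = @(Get-Disk | Where-Object { $_.BusType -match 'Fibre' })
    } until ($fcDisks.Count -ge $expectedLuns)
    "FC storage back online: $($fcDisks.Count) disk(s) visible"

This obviously does not help the Live Migration case, since that fails before the guest ever gets to rescan anything.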

With Live Migration, of course, there is a requirement that the LUNs are available on all 4 ports at the time of migration. Unlike a Quick Migration, a Live Migration will not continue when the LUNs aren't available.

So I have 2 options here that I think would fix my issue:

  • Have the SAN pick up newly presented hosts faster;
  • Have Hyper-V wait longer before a live migration times out.

The first is not an option: HP does not expose any settings for this on the MSA2040 (nor on the P2000 G3, which we also have in this setup as a backup).

So maybe we can increase Hyper-V's live-migration wait-for-LUN timeout?

The concrete question:

Is there a registry key or something for increasing the 'waiting for storage' timeout on live migration?

Of course, any other 'magical' solution to this issue is welcome :)


