Good morning,
For us, the morning began badly. One of our cluster shared volumes went offline, causing roughly 30 virtual servers to crash. In the cluster manager we could see that the disk was in a failed state, and we could not get it back online. We had to disconnect the disk, reconnect it and then add it as a CSV again to get things working. Figuring that out did not take long, since it had happened before, but starting the servers and services for our production line did. I'm hoping that someone can point me in the right direction.
We have four nodes running Hyper-V 2008 R2. One of the nodes has a different CPU configuration because it was added later, but it has always run just fine. The two cluster shared volumes are disks on our Dell EqualLogic SANs (firmware V7.1.5 R408054) and are connected to the hosts through Microsoft's iSCSI initiator. We use Commvault to back up our VMs and data, and so far, every time the CSV has gone offline, it has happened at 04:00. We've also asked our backup partner to look into this, but I would appreciate some thoughts from Windows/Hyper-V users rather than only from the people who manage the backup systems.
Some events on HOST #2:
Event ID 4096: The Virtual Machines configuration **** at 'C:\ClusterStorage\Volume2\****' is no longer accessible: Invalid handle (0x80070006)
Event ID 16400: '****' cannot access the data folder of the virtual machine. The worker process (Process ID Invalid handle) may not be functional anymore. (Virtual machine ID ****)
Event ID 16410: '****' cannot access the data folder of the virtual machine. (Virtual machine ID ****)
Some events on HOST #4:
Event ID 3220: '****' is unable to save RAM contents at address 31494144. (Virtual machine ID ****)
Event ID 10101: Failed to change state of virtual machine '****'. (Virtual machine ID ****)
Event ID 12054: '****' failed to save state. (Virtual machine ID ****)
Event ID 16400: '*****' cannot access the data folder of the virtual machine. The worker process (Process ID Invalid handle) may not be functional anymore. (Virtual machine ID ****)
Event ID 10102: Failed to create the backup of virtual machine '****'. (Virtual machine ID ****)
Event ID 14090: Virtual Machine Management service is shutting down while some virtual machines begin running. All running virtual machines will remain running with no management access.
Event ID 14094: Virtual Machine Management service is started successfully.
Event ID 19500: The Integration Services Setup Disk image was successfully updated.
Event ID 4098: The Virtual Machines configuration **** at 'C:\ClusterStorage\Volume2\****' is now accessible.
Event ID 23014: Device 'Microsoft Synthetic Display Controller' in '****' is loaded but has a different version from the server. Server version 3.0 Client version 3.3 (Virtual machine ID ****). The device will work, but this is an unsupported configuration. This means that technical support will not be provided until this problem is resolved. To fix this problem, upgrade the integration services. To upgrade, connect to the virtual machine and select Insert Integration Services Setup Disk from the Action menu.
Some events in Failover Cluster Manager:
Event ID 5121 (a known problem in our 2008 R2 environment; we see the same messages every day): Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network through the node that owns the volume. This may result in degraded performance. If redirected access is turned on for this volume, please turn it off. If redirected access is turned off, please troubleshoot this node's connectivity to the storage device and I/O will resume to a healthy state once connectivity to the storage device is reestablished.
Event ID 1034: Cluster physical disk resource 'Cluster Disk 4' cannot be brought online because the associated disk could not be found. The expected signature of the disk was '****'. If the disk was replaced or restored, in the Failover Cluster Manager snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource.
Event ID 1069: Cluster resource 'Cluster Disk 4' in clustered service or application '****' failed.
Event ID 1205: The Cluster service failed to bring clustered service or application '****' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
I can post any additional information that would help; I'm still gathering more myself.
Kind regards,
Dennis Lans