Morning all! I have a cluster of 5 Hyper-V servers managed by SCVMM 2012 R2. The nodes run Server 2012 (Datacenter), version 6.2.9200. Every now and then, the same 8 VMs in the cluster randomly fail. When I look at the cluster log, I see
a lot of these messages:
INFO [RCM [RES] SCVMM {servername} embedded failure notifciation, code=0 _isEmbeddedFailure=false _embeddedFailureAction=0
Eight of these failure notifications are logged every minute, one for each of the 8 VMs that randomly fail. At a certain moment, the following is logged (I've filtered out the entries that don't contain the hostname):
[RCM] HandleMonitorReply: LOCKEDMODE for 'SCVMM {servername} Configuration', gen(0) result 0/0.
[RCM] HandleMonitorReply: LOCKEDMODE for 'SCVMM {servername}', gen(0) result 0/0.
[RCM] SCVMM {servername}: Flags 1 added to StatusInformation. New StatusInformation 1
[RCM] SCVMM {servername} Resources: Added Flags 1 to StatusInformation. New StatusInformation 1
[RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'SCVMM {servername}', gen(0) result 0/0.
[RCM] HandleMonitorReply: LOCKEDMODE for 'SCVMM {servername} Configuration', gen(0) result 0/0.
[RCM] HandleMonitorReply: LOCKEDMODE for 'SCVMM {servername}', gen(0) result 0/0.
[RCM] SCVMM {servername}: Flags 1 removed from StatusInformation. New StatusInformation 0
[RCM] SCVMM {servername} Resources: Removed Flags 1 from StatusInformation. New StatusInformation 0
[RCM] [RES] SCVMM {servername} embedded failure notifciation, code=0 _isEmbeddedFailure=false _embeddedFailureAction=0
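For anyone who wants to reproduce the filtering, here is a rough Python sketch of how the cluster log can be narrowed down to one VM's entries. The function names and sample lines are made up for illustration, and it only assumes the log is plain text with one entry per line. It matches on "embedded failure" rather than the full message, so it does not depend on the exact spelling in the log.

```python
def filter_cluster_log(lines, name):
    """Return only the log lines that mention `name` (case-insensitive)."""
    needle = name.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


def count_matching(lines, phrase):
    """Count how many lines contain `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return sum(1 for line in lines if needle in line.lower())


# Hypothetical usage against an exported cluster log (path is an assumption):
#
#   with open("Cluster.log", errors="replace") as handle:
#       hits = filter_cluster_log(handle, "SCVMM myvm01")
#   print(len(hits), "entries,",
#         count_matching(hits, "embedded failure"), "failure notifications")
```

Counting the notifications per VM this way makes it easier to confirm that it is always the same 8 machines that show up.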
When I open the event log (Hyper-V-Config, Admin), I see these messages for the VMs that randomly fail:
The Virtual Machines configuration 276DACA8-351C-4FA3-BE9A-8BA5E5746600 at 'C:\ClusterStorage\Volume2\{servername}' is no longer accessible: The volume for a file has been externally altered so that the opened file is no longer valid. (0x800703EE).
In the Hyper-V-VMMS Admin event log, these messages are logged:
'{servername}' cannot access the data folder of the virtual machine. The worker process (Process ID 9556) may not be functional anymore. (Virtual machine ID 276DACA8-351C-4FA3-BE9A-8BA5E5746600)
This is not the first time it has happened, and I need to find the root cause. The affected VMs don't have snapshots/checkpoints. I haven't found any useful info on the net. Can anyone help me find the root cause? Thanks!