Hi Everyone,
Got something strange happening in our lab at the moment and was wondering if anyone had experienced the same thing (and maybe has a solution).
Our lab environment in a nutshell:
A 2-node Windows Server 2012 Hyper-V cluster connected to a "home-made" SAN based on the Windows Server 2012 iSCSI target.
Each Hyper-V host has two 1Gbps network cards to connect to the SAN via the Microsoft iSCSI initiator, with MPIO in load-balancing mode (least queue depth).
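For reference, the load-balance policy is set on each host roughly like this (from memory, so the exact commands in the lab may differ slightly):

    # List the disks MPIO has claimed and their current load-balance policy
    mpclaim -s -d
    # Set the MSDSM default load-balance policy to Least Queue Depth (4 = LQD)
    mpclaim -L -M 4
    # Or the equivalent with the MPIO PowerShell module on Server 2012
    Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LQD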
The SAN (a Windows Server 2012 server running the iSCSI target) has 4x 1Gbps cards teamed in pairs, so it presents two IP addresses that each host connects to (via MPIO).
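The teaming is nothing exotic, just the built-in Windows Server 2012 NIC teaming, created along these lines (team and NIC names below are placeholders):

    # Two switch-independent teams, each made of two 1Gbps ports
    New-NetLbfoTeam -Name "iSCSI-Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent
    New-NetLbfoTeam -Name "iSCSI-Team2" -TeamMembers "NIC3","NIC4" -TeamingMode SwitchIndependent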
The disk subsystem on the SAN is an external HP StorageWorks enclosure with 25x HP 500GB SATA disks, connected to the server via an Intel RAID controller with 2x 240GB SSDs used as read/write cache.
The iSCSI network is on a dedicated HP switch, with flow control and jumbo frames enabled (tested OK).
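The jumbo frame test was simply a don't-fragment ping with a near-9000-byte payload from each host to both SAN addresses, along the lines of:

    # 8972 bytes of ICMP payload + 28 bytes of IP/ICMP headers = a 9000-byte packet, with "don't fragment" set
    ping -f -l 8972 <SAN iSCSI IP>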
Now the problem:
I've built a few virtual machines on the two Hyper-V nodes, and I'm getting very bad disk response times as soon as disk traffic increases.
When the virtual machine is doing very little, I get a normal 6-8ms, but as soon as I increase the traffic (for example by copying a big file or installing an application), this figure shoots up to 200ms, 300ms and more!
So I first suspected the disk subsystem (and the SAN server), but while the spikes are happening inside the virtual machine, the disks on the SAN server sit at about 10ms, with occasional spikes to about 20ms (which is pretty good, and what I would expect to see inside the VM thanks to the SSD cache).
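For what it's worth, the latency figures above are just the standard perfmon disk latency counters, watched inside the VM and on the SAN server with something like:

    # Average disk latency in seconds (0.2 = 200ms), sampled every 5 seconds
    Get-Counter -Counter "\LogicalDisk(*)\Avg. Disk sec/Transfer" -SampleInterval 5 -Continuous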
I then thought it could be the network, but during those periods of activity the network is nowhere near saturated: barely 150-200Mbps per link.
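Per-link throughput was checked with the perfmon network counters, e.g.:

    # Bytes/sec per NIC - multiply by 8 for bits; the iSCSI NICs stay around 20-25MB/s (~150-200Mbps) during the spikes
    Get-Counter -Counter "\Network Interface(*)\Bytes Total/sec" -SampleInterval 5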
I even tried disabling MPIO and running everything across a single Ethernet link, but got the same result.
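A quick way to confirm only a single path/session was left during that test:

    # One session and one connection per target = a single active path
    Get-IscsiSession
    Get-IscsiConnection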
Am I missing something here? Am I doing something wrong? Or is this expected behaviour?
Thank you,
Stephane