Part one of this series.
A storage environment needs attention on both ends of the link to operate properly: the server side and the storage management side. A flaw on either side can degrade both at the same time.
A virtual machine is configured with one or more virtual hard drives to store its data. It is very common for a large number of virtual hard drives to share a common storage environment. This ends up being a requirement if you want to use any of the advanced migration functions that a particular vendor offers. The hypervisor ushers the storage requests (reads and writes) through a shared infrastructure to the storage environment for processing.
A highly contentious storage environment can wreak havoc on the overall performance of the virtualization environment AND on any other system outside of the virtual environment that uses the shared storage environment.
Let’s take a quick look at a data warehouse environment for a moment. Data warehouses provide a high level of analytics by processing large amounts of data from what can be disparate data sources (including the warehouse itself) and running any number of operations on that data to produce meaningful information. These environments tend to be rough on storage infrastructure because of the large number of requests they generate. Without proper consideration, the requests can completely overwhelm the server, the storage links (Fibre Channel, Ethernet, local bus), the storage controller, or the disk configuration.
If another request for a storage operation comes into the environment at that point, it is immediately in a contentious state and can be queued up somewhere along the path. This results in reduced performance for any system attempting to use the storage environment. Depending on the source of the contention, the effects can be widespread across the enterprise. Yuck!
What’s worse, troubleshooting contention problems is very difficult and frustrating. Not only is it difficult to see storage interactions between your virtual and physical environments, but most storage systems don’t provide per-VM monitoring or reporting information to help visualize what is happening in your virtual environment.
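To get a feel for why queuing hurts so quickly, a rough sketch may help. The snippet below treats a single storage path as an M/M/1 queue, which is a textbook simplification and not a model of any particular array or fabric. The function name and the 5 ms service time are assumptions made up for illustration.

```python
def mean_response_ms(service_ms: float, utilization: float) -> float:
    """Mean time a request spends queued plus being serviced (M/M/1 model).

    service_ms  -- average time to service one request on an idle path (assumed)
    utilization -- fraction of time the path is busy, from 0 up to (not including) 1
    """
    if service_ms <= 0:
        raise ValueError("service_ms must be positive")
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    # Classic M/M/1 result: response time = service time / (1 - utilization)
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical 5 ms service time; watch latency climb as the path fills up.
    for busy in (0.10, 0.50, 0.80, 0.90, 0.95):
        print(f"{busy:.0%} busy -> {mean_response_ms(5.0, busy):.1f} ms per request")
```

Under these assumptions a request takes about 10 ms on a path that is 50% busy and about 50 ms on one that is 90% busy, so a data warehouse job that pushes a shared path toward saturation slows every other consumer of that path far more than the raw utilization figure suggests.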
By design, networks are meant to handle simultaneous communications, and virtual networking is no different. However, as virtualization encourages the creation of more virtual guests, the likelihood of contention keeps growing.
Network links have bandwidth limitations. In most instances this is a hardware limitation, although some environments (like HP blade enclosures) allow virtual network interfaces to be carved out of the server hardware (via HP Virtual Connect Flex-10 modules). Techniques such as LACP, EtherChannel, and path selection policies help ensure maximum bandwidth utilization across the configured links.
Virtual machines join the network and operate on it just like any other server would. They communicate on the network to pass data. Depending on the demand for data, a single virtual machine OR a number of virtual machines can completely overwhelm the network capacity of a single host. Combine virtual machine networking with converged networking (that is, storage over Ethernet via iSCSI, FCoE, NFS, SMB, and so on), and NIC contention abounds.
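A quick back-of-the-envelope check can show how close a host is to that point. The sketch below is only arithmetic: it compares the summed peak demand of everything sharing a host's uplinks against the uplink capacity. The function name and all of the traffic figures are hypothetical, and real aggregate demand rarely peaks everywhere at once, so treat the ratio as a worst case.

```python
from typing import Iterable


def oversubscription_ratio(peak_demands_mbps: Iterable[float],
                           uplink_capacity_mbps: float) -> float:
    """Ratio of summed peak demand to uplink capacity.

    A value above 1.0 means the uplinks cannot carry every consumer's
    peak at the same time, so contention is possible.
    """
    if uplink_capacity_mbps <= 0:
        raise ValueError("uplink_capacity_mbps must be positive")
    return sum(peak_demands_mbps) / uplink_capacity_mbps


if __name__ == "__main__":
    uplinks = 2 * 10_000          # hypothetical host with two 10 Gb uplinks
    vm_traffic = [400.0] * 40     # hypothetical: 40 VMs peaking at 400 Mbps each
    storage_traffic = [8_000.0]   # hypothetical converged storage traffic

    print("VM traffic only:      ",
          oversubscription_ratio(vm_traffic, uplinks))
    print("VM + storage traffic: ",
          oversubscription_ratio(vm_traffic + storage_traffic, uplinks))
```

With these made-up numbers the VM traffic alone fits within the uplinks, but adding converged storage traffic pushes the ratio past 1.0, which is the situation the paragraph above describes.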
How to deal with these issues
- Do not rely solely on the real-time meters that show how much CPU, memory, network, and storage are being used… Those provide only a brief moment of comfort, because they reflect the environment at that very instant. Historical metrics go a long way toward determining when an environment has experienced contention.
- Utilize integrations between components (such as VAAI/ODX) to move some of the processing to the storage infrastructure and relieve pressure from the host/guest OSes.
- Configure QoS, resource reservations, and resource limitations wherever necessary to ensure the performance and functionality are acceptable during times of contention.
- Separate virtual machine network traffic from management and storage traffic to help eliminate cross contention over a shared network infrastructure.
- Right-size virtual machines. The days of giving a ton of resources to a server are long gone. Providing the right amount of resources can be a difficult task… especially as servers are provisioned.
- Use an understanding of possible contention issues as another metric when determining when to add resources.
- Ultimately, it takes an understanding of the behaviors in your virtualization environment.
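The point about historical metrics in the first bullet can be sketched in a few lines. The example below is a minimal illustration, assuming you already have a series of samples (storage latency here) exported from whatever monitoring tool you use. The names `ContentionSummary` and `summarize`, the 20 ms threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ContentionSummary:
    latest: float         # what a real-time meter would show right now
    peak: float           # worst sample in the history
    fraction_over: float  # share of samples above the threshold


def summarize(samples: Sequence[float], threshold: float) -> ContentionSummary:
    """Compare the most recent reading with the history behind it."""
    if not samples:
        raise ValueError("samples must not be empty")
    over = sum(1 for value in samples if value > threshold)
    return ContentionSummary(
        latest=samples[-1],
        peak=max(samples),
        fraction_over=over / len(samples),
    )


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds, oldest first.
    latency_ms = [4, 5, 6, 38, 45, 41, 7, 5, 4, 5]
    print(summarize(latency_ms, threshold=20))
```

In this made-up series the latest reading is a comfortable 5 ms, yet 30% of the samples exceeded the 20 ms threshold and the peak reached 45 ms. A glance at the live meter would have missed the contention entirely.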
Virtualization breeds contention and hypervisors love it. They were designed to manage contention and provide the maximum performance possible for the guest virtual machines. That said, design decisions can result in the hypervisor running out of physical resources. When that happens, the hypervisor can only provide a sub-optimal operating environment, because degrading service is the only response left to the contention placed on it.
Ultimately, it takes an understanding of the behaviors in your virtualization environment to properly address the contention issue. The hypervisor is a contention manager. Understanding how it deals with contention, and what else can introduce additional contention, will lead to a better performing environment and a hypervisor that provides the highest levels of service possible.
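To make "managing contention" a little more concrete, here is a simplified sketch of dividing a scarce resource using reservations, limits, and shares. It is an illustration of the general idea only and does not reproduce how any particular hypervisor schedules resources. The `VM` class, the `allocate` function, and every number in the example are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VM:
    name: str
    demand: float                  # what the VM wants right now
    shares: int                    # relative priority under contention
    reservation: float = 0.0       # guaranteed minimum
    limit: float = float("inf")    # hard ceiling


def allocate(capacity: float, vms: List[VM]) -> Dict[str, float]:
    """Grant reservations first, then split what is left in proportion to shares."""
    if any(vm.shares <= 0 for vm in vms):
        raise ValueError("shares must be positive")

    # Step 1: every VM gets its reservation (never more than it is asking for).
    grant = {vm.name: min(vm.reservation, vm.demand) for vm in vms}
    remaining = capacity - sum(grant.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")

    # A VM can never receive more than its demand or its limit.
    ceiling = {vm.name: min(vm.demand, vm.limit) for vm in vms}
    eps = 1e-9
    active = [vm for vm in vms if ceiling[vm.name] - grant[vm.name] > eps]

    # Step 2: hand out the remainder by shares; anything a capped VM
    # cannot use is redistributed to the others in the next round.
    while active and remaining > eps:
        total_shares = sum(vm.shares for vm in active)
        pool = remaining
        for vm in active:
            offer = pool * vm.shares / total_shares
            take = min(offer, ceiling[vm.name] - grant[vm.name])
            grant[vm.name] += take
            remaining -= take
        active = [vm for vm in active if ceiling[vm.name] - grant[vm.name] > eps]
    return grant


if __name__ == "__main__":
    # Hypothetical host with 10,000 MHz of CPU and 18,000 MHz of total demand.
    guests = [
        VM("db", demand=6000, shares=2000, reservation=3000),
        VM("web", demand=6000, shares=1000),
        VM("batch", demand=6000, shares=1000, limit=2000),
    ]
    for name, mhz in allocate(10_000, guests).items():
        print(f"{name}: {mhz:.0f} MHz")
```

The total granted never exceeds the capacity, so when demand outstrips the physical resource, some guest has to receive less than it asked for. Reservations, limits, and shares only decide who absorbs the shortfall.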
Unfortunately, many products today were designed to manage physical environments and don’t provide enough per-VM monitoring or reporting to help visualize what is happening in the virtual environment. The good news is that this is starting to change, as vendors begin to create new products for managing the software-defined virtual infrastructure.