For many applications, delivering performance that’s predictable is just as important as low latency or raw IOPS. With HCI, storage and compute activities run on the same nodes. In most cases, storage software runs inside a VM as a virtual appliance at the guest layer. Each IO operation flows through the hypervisor’s CPU scheduler four times: twice for the IO, twice for IO acknowledgment.
This is less of a problem when system utilization is low, but it becomes a major bottleneck when utilization becomes moderate or heavy because the CPU resource is shared. You could apply CPU reservations, but per-VM reservations introduce a new set of challenges all the way up to impacting cluster-wide HA-failover policies. And CPU reservations do not guarantee the virtual appliance will have instant access to the CPU. If another vCPU is scheduled, it is allowed to finish its operation, causing IO delays within the virtual appliance.
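To make the scheduling argument concrete, here is a minimal toy simulation, not a model of any real hypervisor. It assumes that each of the four scheduler passes has a chance of waiting equal to the node's CPU utilization, and that a wait lasts an exponentially distributed time. The function names and all numbers (`base_us`, `mean_wait_us`, the utilization levels) are hypothetical values chosen for illustration.

```python
import random


def simulate_io_latency_us(utilization, passes=4, base_us=100.0,
                           mean_wait_us=200.0, samples=10_000, seed=0):
    """Return sorted per-IO latencies (microseconds) for a toy model.

    Assumption: every scheduler pass waits with probability equal to
    `utilization`, for an exponentially distributed time with mean
    `mean_wait_us`. All parameter values are illustrative only.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(samples):
        latency = base_us
        for _ in range(passes):
            if rng.random() < utilization:
                latency += rng.expovariate(1.0 / mean_wait_us)
        latencies.append(latency)
    latencies.sort()
    return latencies


def percentile(sorted_values, pct):
    """Pick the value at the given percentile from an already sorted list."""
    index = min(len(sorted_values) - 1, int(len(sorted_values) * pct / 100))
    return sorted_values[index]


for utilization in (0.1, 0.5, 0.8):
    lat = simulate_io_latency_us(utilization)
    print(f"utilization {utilization:.0%}: "
          f"median {percentile(lat, 50):.0f} us, "
          f"p99 {percentile(lat, 99):.0f} us")
```

Under these assumptions the median barely moves at low utilization, while the 99th percentile climbs steeply as the node gets busier. That growing gap between typical and worst-case IOs is the unpredictability described here.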
The result is less predictable latency with unexpected spikes when a node or cluster becomes busy. Applications may see latency that varies widely from one IO to the next, which can be a disaster for those that are latency-sensitive. This is exacerbated in an enterprise cloud environment where an organization is managing thousands of virtual machines and/or containers. Despite claims of delivering “web-scale,” conventional HCI is rarely able to meet performance expectations at scale.
HCI promises significant benefits, but as with any major infrastructure decision, let the buyer beware. Enterprise IT teams want—and in many cases, need—to use all the functionality infrastructure can deliver.
With HCI, enabling new functionality can increase resource utilization beyond acceptable levels. As features like snapshots, replication, deduplication, and compression are enabled, you either add more hardware or accept a hit to the predictability and performance of your infrastructure. Making these trade-offs can become an almost daily fact of life for HCI admins.
Best-of-Breed Architecture Offers Better Performance—Especially with All-Flash
The main reasons for choosing all-flash storage are:
- Dramatic reductions in the latency of each IO operation
- Big increases in total IOPS
- More predictable performance for every IO
External storage systems do a much better job delivering all three of these than HCI.
Tintri’s CONNECT architecture goes further: we automatically assign every virtual machine and container to its own lane to eliminate any conflict over resources. That makes it simple to set minimum and maximum quality of service on individual virtual machines, and guarantee application performance. If performance is a primary consideration, you’ll want to evaluate all options carefully and think twice before choosing HCI.
Next time we’ll look at HCI complexity and risk.
Conventional HCI architectures may fail to deliver the predictable performance necessary for enterprise IT at scale.
This post is part 2 in a series looking at the limitations of deploying conventional HCI architectures for enterprise IT needs.
Performance is a key consideration in almost any IT infrastructure deployment. Enterprise data centers have quickly adopted all-flash storage as a means to deliver the IO performance needed to power applications of all kinds—especially analytics and new mobile and customer-facing applications.
Although hyper-converged infrastructure (HCI) has gotten a lot of attention in industry press, conventional HCI architectures lag behind best-of-breed external storage systems—both in hybrid flash and all-flash configurations—in a number of important performance metrics:
- Latency
- IOPS (especially with all-flash)
- Predictability
If you’re due for an infrastructure refresh, look carefully at performance before deploying HCI.
The latency of IO operations on conventional HCI implementations suffers in comparison to external storage systems because of the requirement to store multiple copies of each block of data across the network. All data must be mirrored or copied to one or two other nodes. Some vendors support erasure coding, but it comes with a high performance and latency penalty. Others support post-process erasure coding, but just for cold data.
Mirroring or erasure coding affects write latencies and may affect read latencies as well. In a recent study, ESG compared the performance of several HCI platforms under different conditions. The best latency achieved by any solution was around 5ms, which is far slower than best-of-breed all-flash arrays.
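A rough sketch of why mirroring adds write latency, assuming synchronous mirroring in which a write is acknowledged only after every copy has been stored. The function name and the millisecond figures are hypothetical and are not taken from the ESG study or from any vendor's documentation.

```python
def mirrored_write_latency_ms(local_write_ms, remote_copies):
    """Latency of one synchronously mirrored write.

    remote_copies is a list of (network_round_trip_ms, remote_write_ms)
    pairs, one per additional node holding a copy. Assumption: copies are
    written in parallel and the write completes when the slowest one
    acknowledges.
    """
    remote_latencies = [rtt + write for rtt, write in remote_copies]
    return max([local_write_ms] + remote_latencies)


# Hypothetical numbers: a fast local write plus two remote copies.
local_only = mirrored_write_latency_ms(0.2, [])
two_copies = mirrored_write_latency_ms(0.2, [(0.3, 0.2), (0.5, 0.25)])
print(f"local only: {local_only} ms, with two remote copies: {two_copies} ms")
```

In this sketch the slowest remote copy sets the latency of the whole write, so a single busy node or congested link is enough to slow every write that mirrors to it.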
Apart from mirroring and erasure coding, activities like VMware vMotion, HA events, node maintenance, and node failures can increase latencies for workloads, both because vMotion and HA events are noisy and because they reduce the total resources available.
The IOPS performance that storage can deliver, especially all-flash storage, correlates directly to how much CPU you have. Most standalone all-flash arrays use 28-40 cores per controller for 13-24 SSDs. (Some arrays scale up even higher, but Tintri believes that this can negatively impact the IO density of all-flash and, as a result, performance predictability.)
HCI implementations limit the amount of CPU available for storage; typical limits are up to 8 vCPUs or 20% of available CPU. This is not enough horsepower to deliver full performance from the 6-24 flash drives on each node, resulting in a lot of wasted flash IOPS. Enabling data reduction on HCI platforms consumes even more CPU, making the situation that much worse. That’s why data reduction is optional on many HCI implementations.
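The CPU-per-drive arithmetic behind these figures can be worked through in a few lines. This is a back-of-the-envelope sketch using only the ranges quoted above. It treats one vCPU as equivalent to one core, which is a simplifying assumption, and the helper name `cores_per_drive` is made up for illustration.

```python
def cores_per_drive(cpu_count, drive_count):
    """CPU cores (or vCPUs) available per flash drive."""
    return cpu_count / drive_count


# Standalone all-flash array: 28-40 cores per controller for 13-24 SSDs.
array_range = (cores_per_drive(28, 24), cores_per_drive(40, 13))

# HCI node: up to 8 vCPUs for storage, 6-24 flash drives per node.
# Assumption: one vCPU is counted as one core.
hci_range = (cores_per_drive(8, 24), cores_per_drive(8, 6))

print(f"Array controller: {array_range[0]:.2f} to {array_range[1]:.2f} cores per SSD")
print(f"HCI node:         {hci_range[0]:.2f} to {hci_range[1]:.2f} vCPUs per drive")

# Same drive count on both sides: 24 drives.
ratio_at_24_drives = cores_per_drive(28, 24) / cores_per_drive(8, 24)
print(f"At 24 drives, the array has {ratio_at_24_drives:.1f}x the CPU per drive")
```

With 24 drives on each side, even the smallest array controller in the quoted range has 3.5 times the CPU per drive of an HCI node capped at 8 vCPUs.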
Increasing the amount of CPU dedicated to storage, if possible, can end up having a big impact on licensing costs as discussed in my previous post on HCI costs. You don’t want to be stuck paying for expensive hypervisor, SQL Server, and/or Oracle licenses on CPUs running storage functions.