Tintri Blog

Building predictive analytics for VM scale-out

May 17, 2016


  • Predictive analytics not only solves the immediate problem, but also finds a set of virtual machines to move that makes the problem less likely to recur.
  • Only native per-VM visibility makes it efficient to balance workloads across heterogeneous systems with VMs from multiple hypervisors.
  • VM scale-out considers not just capacity limits, but also the rate of growth, the amount of I/O load, the size of the flash working set and the presence of native snapshots.

Tintri’s VM Scale-Out feature uses per-VM information to determine the best set of VMs to move in response to capacity and performance limits. Tintri Global Center makes predictions covering the next week of activity, looks for potential problems, and selects VMs to move that will solve the issues it discovers.

If we move “desktop-VM,” that will free up some space, but the problem is likely to recur. Rerunning the predictive model with this VM removed from the VMstore shows that there is still some chance of exceeding the threshold value:

In this case we’re pretty certain that there will be a problem. Moving a VM to another member of the pool will ensure that this VMstore doesn’t run out of space. But there are many possible choices. Let’s look at two VMs that are using a large amount of space:

On the other hand, if we move vid-repo, the rapidly growing VM, onto a VMstore which has room to support its growth, then the original VMstore is in good shape:

VM Scale-Out will recommend moving “vid-repo” in this scenario, ensuring that the expanding needs of this VM can be met.

Our predictive model for space is an ensemble predictor, which combines multiple techniques into a single predictor. It examines both long- and short-term trends, as well as running a Monte Carlo simulation based on the observed increases and decreases of space usage. This allows us to capture not just organic growth in individual VMs, but also highly variable loads such as test-and-dev workloads which rapidly add and delete virtual machines.
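To make the Monte Carlo portion of this concrete, here is a minimal sketch in Python. It is not Tintri’s implementation: the function name `simulate_space_paths`, its parameters, and the choice of simple bootstrap resampling are all assumptions made for illustration. The sketch resamples the observed day-to-day changes in space usage to generate many possible futures for the coming week.

```python
import random


def simulate_space_paths(daily_usage_gib, horizon_days=7, n_paths=1000, seed=0):
    """Generate possible future usage paths by resampling observed daily changes.

    Hypothetical helper for illustration only; the model described in the
    post is an ensemble that also weighs long- and short-term trends.
    """
    if len(daily_usage_gib) < 2:
        raise ValueError("need at least two observations to compute a change")
    rng = random.Random(seed)
    # Observed increases and decreases between consecutive days.
    deltas = [b - a for a, b in zip(daily_usage_gib, daily_usage_gib[1:])]
    paths = []
    for _ in range(n_paths):
        level = daily_usage_gib[-1]
        path = []
        for _ in range(horizon_days):
            # Usage cannot drop below zero, so clamp after each sampled change.
            level = max(0.0, level + rng.choice(deltas))
            path.append(level)
        paths.append(path)
    return paths
```

Because each path is built from changes that actually occurred, a history with large swings (for example, a test-and-dev workload that creates and deletes VMs) naturally produces a wide spread of outcomes, while steady organic growth produces a narrow one.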

Other factors

It’s important to consider factors other than the VM’s live size. Array-side snapshots of the VM may exist, and deleting the live VM won’t free up the space trapped in snapshots. Our space predictions take the size of snapshots into account, and the VM Scale-Out recommendation executor will move snapshots along with the live VM. A single array may also be overloaded on throughput rather than capacity, and VM Scale-Out makes a similar prediction about the effect of moving the VM on overall system load.
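The snapshot point can be illustrated with a small sketch. The field names `live_gib` and `snapshot_gib` are hypothetical, and the sketch assumes `snapshot_gib` counts only space held exclusively by snapshots; real accounting on an array with shared blocks is more involved.

```python
def space_freed_by_move(vm):
    """Space released on the source array when a VM moves with its snapshots.

    Assumes 'snapshot_gib' is space held only by snapshots (hypothetical field).
    """
    return vm["live_gib"] + vm["snapshot_gib"]


def largest_footprint(vms):
    """Name of the VM whose move would free the most space, snapshots included."""
    return max(vms, key=lambda name: space_freed_by_move(vms[name]))
```

Under these assumptions, a VM with a modest live size but a long snapshot history can be a better candidate than a VM that merely looks large.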

Tintri sells both hybrid and all-flash arrays, and VM Scale-Out can combine both into a single pool (including previous-generation platforms!). In addition to space and I/O load measurements, we also predict the expected flash hit rate, which lets us identify VMs that belong entirely in flash, while keeping VMs with smaller working set sizes on a combination of flash and disk.

All of these predictions are combined into a single model which lets us make the best decision for the overall health of the pool. Tintri Global Center’s existing features migrate protection policies, maintain quality-of-service settings, and present a unified performance history for each VM. These capabilities work across VMs from multiple hypervisors to present the abstraction of a single pool of storage. No other load balancing product natively understands both VMs and the behavior of the Tintri VMstore.

VM Scale-Out’s ability to predict the impact of changes at the VM level means that we can not only find problems ahead of time, but also solve them appropriately. It’s a scale-out solution that’s future-proof, with no custom hardware requirements, and one that will let your business scale from hundreds of VMs to 160,000 VMs.

It’s time to stop spending so much time and money simply going backwards. Scale storage differently with Tintri.

You want to scale out your storage because your business is growing, and along with it, your virtualized applications. But if you’re on traditional storage, you’re stuck with traditional scale-out: scale-out based on blocks and custom hardware that does not make the best use of your storage, time and money.

You can’t avoid problems on traditional storage.

By the time your storage array has run out of space, it’s too late. VMs that aren’t thick-provisioned will experience write errors and stop working. Backups that use native snapshot and replication capabilities will be missing. Even for less-critical errors, heavy I/O load will annoy your storage’s users and cause support calls.

This isn’t something you want to react to. You want to avoid these problems entirely.

Without the ability to “look ahead,” the best we can do is set threshold values low enough that the capacity problem can be resolved before it causes any harm. This leads to overprovisioning and wastes expensive resources. It also means that the storage system may spend a lot of time moving short-lived data, or stay constantly busy instead of identifying the real culprits.

Unless, of course, you’re on Tintri.

Tintri VM Scale-Out makes your virtualized applications happy.

With VM Scale-Out, we predict the next week of behavior for every VMstore in the pool, using the past 30 days of data available on Tintri Global Center. This prediction identifies potential problems involving resource shortages. Then, we look at which VMs could be moved to eliminate those problems, and reapply the predictive model on the hypothetical moves. This lets us identify VMs which not only decrease space or I/O load, but also affect the trend line for space consumption.
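The “reapply the model on hypothetical moves” step can be sketched as a what-if loop. Everything here is an assumption made for illustration: the linear per-VM growth model stands in for the real predictor, the field names are hypothetical, and the numbers are made up. Only the names “desktop-VM” and “vid-repo” come from the example in this post.

```python
def projected_usage_gib(vms, horizon_days=7):
    """Projected total space after horizon_days, assuming linear growth per VM."""
    return sum(
        vm["size_gib"] + vm["growth_gib_per_day"] * horizon_days
        for vm in vms.values()
    )


def rank_moves(vms, horizon_days=7):
    """Re-run the projection with each VM hypothetically removed.

    Returns (vm_name, projected_usage_without_it) pairs, best candidate first.
    """
    results = []
    for name in vms:
        remaining = {n: vm for n, vm in vms.items() if n != name}
        results.append((name, projected_usage_gib(remaining, horizon_days)))
    return sorted(results, key=lambda item: item[1])


# Made-up numbers; "web-01", "db-01" and "build-01" are hypothetical VMs.
vmstore = {
    "desktop-VM": {"size_gib": 400.0, "growth_gib_per_day": 0.0},
    "vid-repo": {"size_gib": 350.0, "growth_gib_per_day": 20.0},
    "web-01": {"size_gib": 50.0, "growth_gib_per_day": 0.5},
    "db-01": {"size_gib": 120.0, "growth_gib_per_day": 1.0},
    "build-01": {"size_gib": 80.0, "growth_gib_per_day": 0.0},
}
ranking = rank_moves(vmstore)
best_candidate = ranking[0][0]
```

With these numbers, ranking by current size alone would pick “desktop-VM,” while ranking by the projected outcome picks “vid-repo,” because removing it also removes most of the growth.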

Our predictions are probabilistic in nature. That is, instead of predicting just a single data point, we predict a range of possible outcomes and look at what fraction of them are above a threshold value (such as running out of space, or reaching a level at which an alert is generated).
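As a rough sketch of that idea, suppose the range of outcomes is represented as a list of simulated usage paths, one list of daily values per path. The representation and the function name `exceedance_probability` are assumptions for illustration, not the product’s API.

```python
def exceedance_probability(paths, threshold):
    """Fraction of simulated paths whose peak reaches or crosses the threshold."""
    if not paths:
        raise ValueError("at least one simulated path is required")
    hits = sum(1 for path in paths if max(path) >= threshold)
    return hits / len(paths)
```

The same function can be evaluated against more than one threshold, such as the alert level and the physical capacity, to distinguish “likely to warn” from “likely to run out of space.”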

Let’s look at a simple example. A VMstore with five VMs on it is likely to run out of space within the next week; its aggregate space usage is plotted below, along with the bounds on the prediction.
