Tintri Blog

Automating Data Refreshes with SyncVM

June 16, 2015

Copy data is a huge expense and management burden for organizations. The volume of copy data often exceeds that of the original data. It takes many forms, from multiple data warehouses that each hold the same data for different reporting and development efforts, to multiple development environments that each require their own copy of the same or nearly the same data. This duplication wastes capital and operational storage budget, and it makes data management inefficient.

When managing at the VM level, copy data becomes even more painful if you are stuck in the paradigm of volumes and LUNs. Cloning VMs duplicates more and more data, and automating those clones only accelerates the problem. With traditional storage and virtual machines, you can clone a VM as many times as you want, with the caveat that you need enough space for each VM and enough LUNs provisioned to handle the additional VMs being added to the environment.

Figure: Cloning on traditional storage. No space efficiency; the OS must be reconfigured each time a VM is cloned.

Efficiently enabling automation and cloning VMs requires storage that manages and understands VMs natively. Tintri is the only storage company that was built from the ground up with virtualization in mind, and because of that, features like cloning, snapshotting and replication have all been enabled at the storage level per VM.

The per-VM cloning capability, performed natively at the storage level, works much like linked clones for VDI desktops: one master copy, with only delta data written for the newly minted VMs. Since no extra software or plugins are needed, it can be extended to any VM stored on a Tintri VMstore. This makes copying VMs for dev/test scenarios very efficient in terms of both space and speed.
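
The "one master copy plus delta data" idea can be illustrated with a small copy-on-write model. This is only a conceptual sketch, not Tintri's implementation: the `Disk` class, its block map, and the assumption that the master stays frozen after cloning are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Disk:
    """A toy virtual disk: its own written blocks plus an optional parent."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    parent: Optional["Disk"] = None

    def read(self, block_no: int) -> bytes:
        # Look in this disk's own delta first, then walk up the parent chain.
        disk = self
        while disk is not None:
            if block_no in disk.blocks:
                return disk.blocks[block_no]
            disk = disk.parent
        return b"\x00"  # blocks never written anywhere read as zero

    def write(self, block_no: int, data: bytes) -> None:
        # Writes always land in this disk's own delta, never in the parent.
        self.blocks[block_no] = data

    def clone(self) -> "Disk":
        # A clone starts with no blocks of its own; it only points at the
        # master. Assumption: the master is treated as frozen after cloning.
        return Disk(parent=self)

    def delta_blocks(self) -> int:
        # Number of blocks this disk stores itself (its space cost).
        return len(self.blocks)
```

In this model, a clone of a master disk holding thousands of blocks stores nothing until it is written to, and each clone pays only for the blocks it changes.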

Cloning in and of itself does not solve the main issue of only refreshing data, which is important once development environments have been built out and customized. At that point, automating the rebuild of a development environment just to do a data refresh would be a daunting task, and hard to maintain. How do you retain the custom apps that are installed? What about operating system configurations?

Must reconfigure OS

Tintri has now solved this problem with a new VMstore feature called SyncVM. Two capabilities enabled by SyncVM are snapshot “time travel” and data refresh. This discussion focuses on data refresh and how it can be automated in virtualized environments. The operating system, with all of its configuration and tuning, can be left alone while virtualized copies of just the data disks are made.

Never modified

SyncVM can be used either manually through the VMstore UI or automated via scripting or vRO workflows.

VMstore UI

  1. In the VMstore UI, select the VMs you would like to sync the data into and select Refresh Virtual Disks:
    Tintri UI
  2. Select the VM you’d like to synchronize data from:
    VM to refresh from
  3. Pick the snapshot you’d like to take the data from:
    Snapshot to refresh from
  4. Finally, map the disks:
    vDisks to refresh

With the final step of mapping the disks, you can see that the layout of disks is not predicated on the two VMs matching. You have the option to choose how you match up the disks as you look to refresh the data. Also, note that SCSI 0:0 was not chosen, as that is the operating system disk where applications are also installed.
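
When scripting a refresh, it may help to check a disk mapping before submitting it. The sketch below is one way to do that, under stated assumptions: `validate_disk_mapping`, the `OS_DISK` constant, and the use of plain strings such as `"SCSI 0:1"` as disk identifiers are hypothetical and are not part of any Tintri API.

```python
from typing import Dict, Set

# Per the walkthrough above, SCSI 0:0 holds the OS and installed applications,
# so it is excluded from the refresh. (Assumption: this holds for your VMs.)
OS_DISK = "SCSI 0:0"


def validate_disk_mapping(
    mapping: Dict[str, str],
    target_disks: Set[str],
    source_disks: Set[str],
) -> Dict[str, str]:
    """Check a mapping of target disk -> source disk to refresh from.

    The two VMs do not need matching layouts; each mapped disk only has to
    exist on its own VM, and the target's OS disk must not be overwritten.
    """
    for target, source in mapping.items():
        if target == OS_DISK:
            raise ValueError(f"refusing to overwrite the OS disk {OS_DISK}")
        if target not in target_disks:
            raise ValueError(f"target VM has no disk {target}")
        if source not in source_disks:
            raise ValueError(f"source VM has no disk {source}")
    return dict(mapping)
```

For example, a development VM with disks `SCSI 0:0` and `SCSI 0:1` could map its `SCSI 0:1` to the production VM's `SCSI 0:2`, even though the two layouts differ.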

After the synchronization is kicked off, a safety snapshot is created for each VM being synchronized, so you can revert to the state before the data synchronization event:

safety snapshot

Then the two virtual machines are powered off, their disks are replaced with pointers to the production VM’s disks, and the VMs are powered back on. All of this occurs within minutes. Because only pointers change, the dataset size does not affect how long the refresh takes: small and large datasets see the same speed and efficiency.
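
The sequence described above (safety snapshot, power off, repoint disks, power on) can be modeled in a few lines. This is a simplified in-memory sketch, not the VMstore's actual mechanism: the `VM` class, `refresh_vm`, `revert`, and the string identifiers standing in for disk data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VM:
    name: str
    disks: Dict[str, str]  # disk slot -> identifier of the data it points at
    powered_on: bool = True
    snapshots: List[Dict[str, str]] = field(default_factory=list)


def refresh_vm(target: VM, source_snapshot: Dict[str, str],
               mapping: Dict[str, str]) -> None:
    """Refresh mapped disks on `target` from a source VM snapshot."""
    # 1. Safety snapshot, so the refresh can be undone.
    target.snapshots.append(dict(target.disks))
    # 2. Power off before swapping disks.
    target.powered_on = False
    # 3. Repoint each mapped disk. Only references change and no data is
    #    copied, which is why the cost does not depend on dataset size.
    for target_slot, source_slot in mapping.items():
        target.disks[target_slot] = source_snapshot[source_slot]
    # 4. Power back on.
    target.powered_on = True


def revert(target: VM) -> None:
    """Restore the disk pointers saved by the most recent safety snapshot."""
    target.disks = target.snapshots.pop()
```

Disks left out of the mapping, such as the OS disk, keep pointing at the target VM's own data throughout.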

Automation Options

All of these features can also be automated through scripts. Examples of these scripts can be found on Tintri’s GitHub page.

With vRO, you can automate the process to run on a schedule, or make it an ad hoc process that your developers can access through vRA or the vCenter Web GUI.
