Resilient and Cost-Effective Scheduling of a Bag of Tasks on Spot VMs in Java

Abstract:

Many data- and task-parallel applications can be modeled as a Bag of Tasks (BoT) and scheduled on distributed systems such as Grids, Clusters, and Clouds. We propose AutoBoT, a collection of scheduling strategies for BoTs with hard deadlines on Cloud Virtual Machines (VMs) that lowers the overall monetary cost, a distinctive factor for Clouds. Besides reliable fixed-price VMs, AutoBoT uniquely reduces costs by also using preemptible spot-priced VMs, which are much cheaper but unreliable and have time-variant pricing. It guarantees timely completion by making active runtime decisions on pricing, on the number of VMs to acquire or release, and on task placement, checkpointing, and migration. Our rigorous simulations of 7 million BoT runs, sampled from the Google cluster workload, use a realistic Cloud model and 6 months of Amazon EC2 pricing data to compare AutoBoT against two baseline algorithms. We analyze the impact of BoT size, data centers, time periods, deadline duration, loss budget, and checkpointing strategies. AutoBoT often gives ≈ 80% profit, with rare but bounded losses, compared to using only fixed-price VMs. Further, its 100% completion guarantee is 23-42% better than using only spot-priced VMs, which offer a similar profit.
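To make the trade-off between cheap spot-priced VMs and a hard deadline concrete, the following is a minimal Java sketch of a generic deadline-aware fallback rule. It is not the AutoBoT algorithm: the class name `SpotFallbackSketch`, its methods, and all prices and durations are hypothetical, chosen only for illustration. The sketch also assumes that progress made on a spot VM is preserved (for example, through checkpointing), so that only the remaining work has to be finished after a migration.

```java
/** Illustrative only: a generic deadline-aware fallback rule, not the AutoBoT algorithm. */
final class SpotFallbackSketch {

    /**
     * Returns true when a task can no longer afford to stay on a spot VM:
     * the time left before the deadline is just enough (or not enough) to
     * migrate and finish the remaining work on a reliable fixed-price VM.
     * All arguments are in hours; nowHours and deadlineHours share one clock.
     */
    static boolean mustSwitchToFixed(double nowHours, double deadlineHours,
                                     double remainingWorkHours, double migrationHours) {
        double slack = deadlineHours - nowHours - remainingWorkHours - migrationHours;
        return slack <= 0.0;
    }

    /** Monetary cost of a run that mixes spot-priced and fixed-price VM hours. */
    static double cost(double spotHours, double spotPricePerHour,
                       double fixedHours, double fixedPricePerHour) {
        return spotHours * spotPricePerHour + fixedHours * fixedPricePerHour;
    }

    /** Fractional saving of a mixed run relative to using only fixed-price VMs. */
    static double savingVersusFixedOnly(double mixedCost, double totalHours,
                                        double fixedPricePerHour) {
        return 1.0 - mixedCost / (totalHours * fixedPricePerHour);
    }

    public static void main(String[] args) {
        // Hypothetical prices (USD per VM-hour); real spot prices vary over time.
        double fixedPrice = 0.10;
        double spotPrice = 0.02;

        // A task with 10 hours of work: 8 hours ran on spot, 2 hours on fixed-price.
        double mixed = cost(8.0, spotPrice, 2.0, fixedPrice);
        double saving = savingVersusFixedOnly(mixed, 10.0, fixedPrice);
        System.out.printf("mixed cost = %.2f USD, saving = %.0f%%%n", mixed, saving * 100);

        // At hour 7.5 of a 10-hour deadline, with 2 hours of work left and a
        // 0.5-hour migration overhead, no slack remains, so the rule fires.
        System.out.println("switch now? " + mustSwitchToFixed(7.5, 10.0, 2.0, 0.5));
    }
}
```

The rule keeps a task on the cheaper spot VM for as long as a safe fallback still exists, which is one simple way to combine a completion guarantee with spot-price savings. A full scheduler would also have to handle bidding, preemptions, checkpoint overheads, and many tasks sharing VMs, none of which is modeled here.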