Abstract:
As a crucial problem in heterogeneous distributed systems, DAG scheduling models an application consisting of a set of distributed tasks as a Directed Acyclic Graph (DAG). The goal is to assign the tasks to processors so that the whole application finishes as early as possible. The Task Duplication-Based (TDB) scheme is an important technique for this problem. Its main idea is to duplicate tasks on multiple machines so that their results are available locally on each of those machines, trading extra computation time for reduced communication time. Existing TDB algorithms enumerate and test all possible duplication candidates and keep only those that improve the overall schedule. We observe that a duplication candidate that is ineffective at the moment can become effective after other duplications have been applied, and that it can in turn make further ineffective duplications effective. We call this phenomenon the chain reaction of task duplication. We propose a novel Task Duplication based Clustering Algorithm (TDCA) that improves schedule performance by exploiting task duplication more thoroughly. TDCA improves parameter calculation, task duplication, and task merging. The analysis and experiments are based on randomly generated graphs with various characteristics, including DAG depth and width, communication-to-computation cost ratio, and varying computational power of processors. Our results demonstrate that TDCA is very competitive: it improves on the schedule makespan of existing task duplication-based algorithms for heterogeneous systems across various communication-to-computation cost ratios.
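To illustrate the chain reaction of task duplication, here is a minimal sketch under stated assumptions; it is not TDCA. The three-task DAG (g -> p -> t plus a direct edge g -> t), the communication and computation costs, the two heterogeneous processors P1 and P2, and the helper `makespan()` are all hypothetical. The evaluator runs each processor's assigned tasks in topological order and lets every task copy take each parent's data from that parent's earliest-available copy, with communication cost charged only between different processors.

```python
from typing import Dict, List, Tuple

# Hypothetical DAG: g -> p -> t, plus a direct edge g -> t.
# Edge weights are communication costs, paid only when the producing copy
# and the consuming copy run on different processors.
EDGES: Dict[Tuple[str, str], int] = {("g", "p"): 10, ("p", "t"): 3, ("g", "t"): 20}
TOPO_ORDER = ["g", "p", "t"]
# Hypothetical heterogeneous computation costs: P2 is twice as slow as P1.
COST: Dict[str, Dict[str, int]] = {
    "P1": {"g": 1, "p": 1, "t": 1},
    "P2": {"g": 2, "p": 2, "t": 2},
}


def makespan(schedule: Dict[str, List[str]]) -> int:
    """Evaluate a schedule in which tasks may be duplicated on several processors.

    Each processor executes the tasks assigned to it in TOPO_ORDER. A copy
    starts once its processor is free and the data of every parent has
    arrived from the earliest-available copy of that parent.
    """
    finish: Dict[Tuple[str, str], int] = {}  # (task, processor) -> finish time
    ready = {proc: 0 for proc in schedule}   # time each processor becomes free
    for task in TOPO_ORDER:
        for proc, tasks in schedule.items():
            if task not in tasks:
                continue
            start = ready[proc]
            for (src, dst), comm in EDGES.items():
                if dst != task:
                    continue
                # A local parent copy costs no communication; remote copies pay `comm`.
                arrival = min(f + (0 if q == proc else comm)
                              for (name, q), f in finish.items() if name == src)
                start = max(start, arrival)
            finish[(task, proc)] = start + COST[proc][task]
            ready[proc] = finish[(task, proc)]
    return max(finish.values())


print(makespan({"P1": ["g", "p"], "P2": ["t"]}))            # 23: no duplication
print(makespan({"P1": ["g", "p"], "P2": ["p", "t"]}))       # 23: duplicating p alone is ineffective
print(makespan({"P1": ["g", "p"], "P2": ["g", "t"]}))       # 7: duplicating g is effective
print(makespan({"P1": ["g", "p"], "P2": ["g", "p", "t"]}))  # 6: after g, duplicating p is effective
```

In this toy case, duplicating p onto P2 does not change the makespan on its own, yet once g has been duplicated there, the same duplication of p shortens the schedule further. A strategy that tests each candidate once and discards it when it brings no improvement would miss this gain if it happened to test p before g.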