Abstract
To efficiently handle a large volume of data, scheduling algorithms in stream processing systems need to minimise the data movement between communicating tasks to improve system throughput. However, finding an optimal scheduling algorithm for these systems is NP-hard. In this paper, we propose a heuristic scheduling algorithm for a heterogeneous cluster-T3 Scheduler-that can efficiently identify the communicating tasks and assign them to the same node, up to a specified level of utilisation for that node. Using three common micro-benchmarks and an evaluation using a real-world application, we demonstrate that T3-Scheduler outperforms current state-of-the-art scheduling algorithms, such as Aniello et al.'s popular 'Online scheduler', improving throughput by 20-72% for micro-benchmarks and 60% for the real-world application.