Abstract
With ever-accelerating data creation rates in Big Data applications, there is a need for efficient stream processing engines. Apache Storm has been of interest in both academia and industry because of its real-time, distributed, scalable and reliable framework for stream processing. In this paper, we propose an adaptive hierarchical scheduler for the Storm framework, to allocate the resources more efficiently and improve performance. In our method, we consider the data transfer rate and traffic pattern between Storm's tasks and assign highly-communicating task pairs to the same computing node by dynamically employing two phases of graph partitioning. We also calculate the number of required computing nodes in the cluster based on the overall load of the application and use this information to reduce inter-node traffic. Our performance evaluation shows a significant improvement compared to the default scheduler provided by Storm, which evenly distributes the tasks across the cluster, ignoring the communication patterns between them.