Shahed University

DTHMM ExaLB: discrete-time hidden Markov model for load balancing in distributed exascale computing environment

Ulphat Bakhishoff | Ehsan Mousavi Khaneghah | Araz R. Aliev | Amirhossein Reyhani Showkatabadi

URL :   http://research.shahed.ac.ir/WSR/WebPages/Report/PaperView.aspx?PaperID=137528
Date :  2020/04/26
Published in :     Cogent Engineering
DOI :  https://doi.org/10.1080/23311916.2020.1743404
Link :  http://dx.doi.org/10.1080/23311916.2020.1743404
Keywords :  High Performance Computing, Distributed Exascale Computing Systems, Load Balancing, Hidden Markov Models

Abstract :
In high performance computing systems, the load-balancing manager decides whether to perform load-redistribution activities based on the information it receives about the system state. In Distributed Exascale systems, unlike traditional High Performance Computing (HPC) systems, dynamic and interactive events occur that change the system state and therefore disrupt the activities of the load-balancing manager. Managing these events requires a thorough analysis of the dynamic and interactive occurrences that lead to this situation. In this paper, dynamic and interactive events that violate the function and activity of the load-balancing manager are analyzed using a Discrete-Time Hidden Markov Model, which captures the effect of such events on the system state. The model enables the load balancer to act on the changes in system state that violate its functionality, instead of analyzing each dynamic and interactive event individually. Based on this model, the load balancer can rebalance the system when dynamic and interactive events occur. As a result, the load balancer can continue carrying out its previous activities while taking into account not only the dynamic and interactive behaviors but also the new state of the system. The evaluation results show that the model makes it possible to predict the load difference on resources using simple mathematical calculations when a dynamic and interactive event occurs.
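
The idea of predicting load difference from a discrete-time hidden Markov model can be illustrated with a minimal sketch of the standard forward (filtering) recursion. This is not the paper's model: the two hidden states, the two observation symbols, and every probability below are hypothetical values chosen only to show the kind of simple calculation involved.

```python
# Illustrative two-state discrete-time HMM. All names and numbers here are
# assumptions made for this sketch; none of them come from the paper.
STATES = ["balanced", "imbalanced"]   # hidden system states (hypothetical)
OBS = ["low_diff", "high_diff"]       # discretized load difference (hypothetical)

INIT = [0.8, 0.2]                     # P(state at first step)
TRANS = [[0.9, 0.1],                  # TRANS[i][j] = P(next state j | state i)
         [0.3, 0.7]]
EMIT = [[0.85, 0.15],                 # EMIT[j][k] = P(observation k | state j)
        [0.25, 0.75]]


def step_states(belief):
    """Propagate a state distribution one time step through TRANS."""
    n = len(STATES)
    return [sum(belief[i] * TRANS[i][j] for i in range(n)) for j in range(n)]


def forward_filter(observations):
    """Return P(current hidden state | observations so far).

    Uses the normalized forward recursion: predict with TRANS,
    weight by the emission probability, then renormalize.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    belief = None
    for t, symbol in enumerate(observations):
        k = OBS.index(symbol)
        prior = INIT if t == 0 else step_states(belief)
        weighted = [prior[j] * EMIT[j][k] for j in range(len(STATES))]
        total = sum(weighted)
        belief = [w / total for w in weighted]
    return belief


def predict_next_observation(belief):
    """Return P(next observation symbol | current state belief)."""
    nxt = step_states(belief)
    return [sum(nxt[j] * EMIT[j][k] for j in range(len(STATES)))
            for k in range(len(OBS))]


if __name__ == "__main__":
    belief = forward_filter(["high_diff", "high_diff"])
    print(dict(zip(STATES, belief)))
    print(dict(zip(OBS, predict_next_observation(belief))))
```

In this sketch the load balancer would react to the filtered state belief rather than to each individual event, which mirrors the abstract's claim only at a conceptual level.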