TY - GEN
T1 - Heterogeneous MacroTasking (HeMT) for Parallel Processing in the Cloud
AU - Shan, Yuquan
AU - Kesidis, George
AU - Jain, Aman
AU - Urgaonkar, Bhuvan
AU - Khamse-Ashari, Jalal
AU - Lambadaris, Ioannis
N1 - Funding Information:
This research was supported in part by NSF grant CNS 1717571 and a Cisco Systems URP gift.
Publisher Copyright:
© 2020 ACM.
PY - 2020/12/7
Y1 - 2020/12/7
AB - Using tiny tasks (microtasks) has long been regarded as an effective way of load balancing in parallel computing systems. When combined with containerized execution nodes that pull in work upon becoming idle, microtasking has the desirable property of automatically adapting its load distribution to the processing capacities of participating nodes: more powerful nodes finish their work sooner and therefore pull in additional work faster. As a result, microtasking is deemed especially desirable in settings with heterogeneous processing capacities and poorly characterized workloads. However, microtasking incurs additional scheduling and I/O overheads that may make it costly in some scenarios, and the optimal task size generally needs to be learned. We herein study an alternative load-balancing scheme, Heterogeneous MacroTasking (HeMT), wherein the workload is intentionally skewed according to the nodes' processing capacities. We implemented and open-sourced a prototype of HeMT within the Apache Spark application framework and conducted experiments using the Apache Mesos cluster manager. Our experiments show that, when workload-specific estimates of the nodes' processing capacities are learned, Spark with HeMT offers up to 10% shorter average completion times for realistic, multistage data-processing workloads compared with the baseline Homogeneous microTasking (HomT) system.
UR - http://www.scopus.com/inward/record.url?scp=85100487237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100487237&partnerID=8YFLogxK
U2 - 10.1145/3429885.3429962
DO - 10.1145/3429885.3429962
M3 - Conference contribution
AN - SCOPUS:85100487237
T3 - WOC 2020 - Proceedings of the 2020 6th International Workshop on Container Technologies and Container Clouds, Part of Middleware 2020
SP - 7
EP - 12
BT - WOC 2020 - Proceedings of the 2020 6th International Workshop on Container Technologies and Container Clouds, Part of Middleware 2020
PB - Association for Computing Machinery, Inc.
T2 - 6th International Workshop on Container Technologies and Container Clouds, WOC 2020 - Part of Middleware 2020
Y2 - 7 December 2020 through 11 December 2020
ER -