There are more than 190 configuration parameters that affect the performance of MapReduce jobs on Hadoop. For general users without deep knowledge of Hadoop configuration, tuning these parameters to achieve optimal performance for a MapReduce job is tedious and time-consuming. A self-tuning system that improves MapReduce performance automatically and efficiently in a complex Hadoop environment is therefore needed. This article explores multiple tuning methods that make MapReduce performance tuning on Hadoop more efficient. The proposed Catla system employs succinct templates and appropriate schemes of MapReduce algorithms, which together facilitate the tuning and optimization of MapReduce performance. The article presents a comprehensive evaluation of Catla with support for multiple tuning approaches. Direct search-based and derivative-free optimization-based tuning techniques, intended to improve efficiency and usability, are evaluated through a series of tuning experiments. The experimental results show that our work can identify optimal Hadoop parameters for deployed MapReduce jobs in a noninvasive, flexible, automated, and comprehensive manner.
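To make the idea of direct-search tuning concrete, here is a minimal sketch of one generic direct-search, derivative-free method (compass search) applied to two Hadoop parameters. The names `mapreduce.task.io.sort.mb` and `mapreduce.job.reduces` are real Hadoop settings, but the bounds and step sizes are illustrative assumptions, and `simulated_runtime` is a hypothetical stand-in for submitting a job and timing it. This is not Catla's implementation, and Catla may use different direct-search or derivative-free algorithms.

```python
from typing import Callable, Dict, Tuple

Config = Dict[str, int]

# Real Hadoop parameter names; (low, high, initial step) values are assumptions.
SEARCH_SPACE: Dict[str, Tuple[int, int, int]] = {
    "mapreduce.task.io.sort.mb": (50, 400, 50),
    "mapreduce.job.reduces": (1, 32, 8),
}


def simulated_runtime(cfg: Config) -> float:
    """Hypothetical surrogate for running a MapReduce job and measuring its runtime.

    The made-up optimum is sort.mb=200, reduces=12.
    """
    sort_mb = cfg["mapreduce.task.io.sort.mb"]
    reduces = cfg["mapreduce.job.reduces"]
    return 100.0 + 0.001 * (sort_mb - 200) ** 2 + 0.5 * (reduces - 12) ** 2


def compass_search(
    cost: Callable[[Config], float],
    space: Dict[str, Tuple[int, int, int]],
    start: Config,
    max_evals: int = 200,
) -> Tuple[Config, float, int]:
    """Compass (coordinate pattern) search: probe +/- step along each parameter,
    accept any strict improvement, and halve all steps when a full pass fails.
    Uses only cost values, never gradients. The budget is checked once per pass."""
    best = dict(start)
    best_cost = cost(best)
    evals = 1
    steps = {name: bounds[2] for name, bounds in space.items()}

    while evals < max_evals and any(s >= 1 for s in steps.values()):
        improved = False
        for name, (low, high, _) in space.items():
            step = steps[name]
            if step < 1:
                continue  # this parameter's step has shrunk to zero
            for direction in (1, -1):
                cand = dict(best)
                # Clamp the probe to the parameter's allowed range.
                cand[name] = min(high, max(low, best[name] + direction * step))
                if cand[name] == best[name]:
                    continue  # clamping produced no move
                c = cost(cand)
                evals += 1
                if c < best_cost:
                    best, best_cost = cand, c
                    improved = True
                    break  # move on to the next parameter from the new point
        if not improved:
            # No probe helped: refine the mesh by halving every step size.
            steps = {name: s // 2 for name, s in steps.items()}

    return best, best_cost, evals


if __name__ == "__main__":
    start = {"mapreduce.task.io.sort.mb": 100, "mapreduce.job.reduces": 1}
    best, best_cost, evals = compass_search(simulated_runtime, SEARCH_SPACE, start)
    print(best, best_cost, evals)
```

In a real cluster each call to the cost function is a full job run, so the evaluation budget (`max_evals`) usually matters more than how quickly the search converges in iterations.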