Noninvasive MapReduce Performance Tuning Using Multiple Tuning Methods on Hadoop

Donghua Chen, Runtong Zhang, Robin Guanghua Qiu

Research output: Contribution to journal › Article › peer-review

Abstract

More than 190 configuration parameters affect the performance of MapReduce jobs on Hadoop. Tuning the parameters of a MapReduce job for optimal performance is time-consuming and tedious for general users who have no deep knowledge of Hadoop configuration. A self-tuning system is therefore needed to improve MapReduce performance in an automated and efficient manner in a complicated Hadoop environment. This article explores multiple tuning methods to improve tuning efficiency for MapReduce performance on Hadoop. The proposed Catla system employs succinct templates and proper schemes of MapReduce algorithms, which can be incorporated to facilitate the tuning and optimization of MapReduce performance. A comprehensive evaluation of the Catla system, with the support of multiple tuning approaches, is discussed in this article. Direct search-based and derivative-free optimization-based tuning techniques for improved efficiency and usability are evaluated using a series of tuning experiments. The experimental results reveal that our work can identify optimal Hadoop parameters for deployed MapReduce jobs in a noninvasive, flexible, automated, and comprehensive manner.

Original language: English (US)
Article number: 9205847
Pages (from-to): 2906-2917
Number of pages: 12
Journal: IEEE Systems Journal
Volume: 15
Issue number: 2
State: Published - Jun 2021

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
