Design and implementation of a parallel geographically weighted k-nearest neighbor classifier

Yingxia Pu, Xinyi Zhao, Guangqing Chi, Shuhe Zhao, Jiechen Wang, Zhibin Jin, Junjun Yin

Research output: Contribution to journal (Article)

Abstract

The development of high-performance classifiers is an important step toward timely remote sensing classification in the era of high spatial resolution. The geographically weighted k-nearest neighbors (gwk-NN) classifier, which incorporates spatial information into the traditional k-NN classifier, has demonstrated better performance in mitigating salt-and-pepper noise and misclassification. However, integrating spatial dependence with spectral information is computationally intensive. To improve the computing performance of the gwk-NN classifier, this study first considered two commonly used parallel strategies, data parallelism and task parallelism, in the model training and image classification stages. The corresponding parallel algorithms were then implemented in C++ using the Message Passing Interface (MPI) and the Geospatial Data Abstraction Library (GDAL) on a standalone eight-core computer. Based on the performance of these two strategies, the potential of dual parallelism (the simultaneous exploitation of data and task parallelism) in image classification was further investigated. The experimental results indicate that the parallel gwk-NN classifier can improve the classification efficiency of high-resolution remote sensing images with multiple land cover types. Specifically, data parallelism is more effective than task parallelism in both the model training and classification stages because of the minor effect of parallel overhead on the total execution time. In addition, dual parallelism can take advantage of both data and task parallel strategies, as evidenced by the two largest speedups being attained under dual parallelism I (5.28×), which is based on the premise of task parallelism, and dual parallelism II (5.73×), in which priority is given to data decomposition. Of the two, dual parallelism II provides the best performance by overlapping computation and data transmission, which is compatible with the current trend toward multicore architectures.
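
As a rough illustration of two ideas in the abstract, the sketch below shows one common way to decompose an image by rows for data parallelism, and how a reported speedup translates into parallel efficiency. It is written in Python for brevity and is not the authors' C++/MPI implementation. The function names `split_rows` and `parallel_efficiency` are hypothetical, and the efficiency figures assume that the reported speedups were measured with all eight cores in use, which the abstract does not state explicitly.

```python
def split_rows(n_rows, n_workers):
    """Split n_rows image rows into contiguous (start, stop) blocks.

    Assumed scheme (not taken from the paper): one block per worker,
    with block sizes differing by at most one row.
    """
    base, extra = divmod(n_rows, n_workers)
    ranges = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers each take one additional row.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def parallel_efficiency(speedup, n_cores):
    """Parallel efficiency E = S / p for speedup S on p cores."""
    return speedup / n_cores


if __name__ == "__main__":
    # Ten rows shared among three hypothetical workers.
    print(split_rows(10, 3))
    # Speedups reported in the abstract; eight cores is an assumption.
    for label, s in [("dual parallelism I", 5.28), ("dual parallelism II", 5.73)]:
        print(label, round(parallel_efficiency(s, 8), 3))
```

Under the eight-core assumption, the reported speedups of 5.28× and 5.73× would correspond to parallel efficiencies of roughly 0.66 and 0.72.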

Original language: English (US)
Pages (from-to): 111-122
Number of pages: 12
Journal: Computers and Geosciences
Volume: 127
DOI: 10.1016/j.cageo.2019.02.009
State: Published - Jun 1 2019

Fingerprint

Classifiers
Image classification
Remote sensing
Data transmission
Message passing
Parallel algorithms
Data communication systems
Land cover
Spatial resolution
Decomposition
Salts
Method

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computers in Earth Sciences

Cite this

Pu, Yingxia; Zhao, Xinyi; Chi, Guangqing; Zhao, Shuhe; Wang, Jiechen; Jin, Zhibin; Yin, Junjun. Design and implementation of a parallel geographically weighted k-nearest neighbor classifier. In: Computers and Geosciences. 2019; Vol. 127. pp. 111-122.
@article{ad5fda334c9745b4b79cfadb8ad5a4a6,
title = "Design and implementation of a parallel geographically weighted k-nearest neighbor classifier",
author = "Yingxia Pu and Xinyi Zhao and Guangqing Chi and Shuhe Zhao and Jiechen Wang and Zhibin Jin and Junjun Yin",
year = "2019",
month = "6",
day = "1",
doi = "10.1016/j.cageo.2019.02.009",
language = "English (US)",
volume = "127",
pages = "111--122",
journal = "Computers and Geosciences",
issn = "0098-3004",
publisher = "Elsevier Limited",

}


TY - JOUR

T1 - Design and implementation of a parallel geographically weighted k-nearest neighbor classifier

AU - Pu, Yingxia

AU - Zhao, Xinyi

AU - Chi, Guangqing

AU - Zhao, Shuhe

AU - Wang, Jiechen

AU - Jin, Zhibin

AU - Yin, Junjun

PY - 2019/6/1

Y1 - 2019/6/1

UR - http://www.scopus.com/inward/record.url?scp=85063545491&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063545491&partnerID=8YFLogxK

U2 - 10.1016/j.cageo.2019.02.009

DO - 10.1016/j.cageo.2019.02.009

M3 - Article

VL - 127

SP - 111

EP - 122

JO - Computers and Geosciences

JF - Computers and Geosciences

SN - 0098-3004

ER -