Mixed-precision block Gram-Schmidt orthogonalization

Ichitaro Yamazaki, Stanimire Tomov, Jakub Kurzak, Jack Dongarra, Jesse Barlow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly on the condition number of the input matrix. However, when the desired higher precision is not supported by the hardware, software-emulated arithmetic is needed, which can significantly increase its computational cost. When a large number of columns must be orthogonalized, this computational overhead can have a dramatic impact on the orthogonalization time, and the mixed-precision CholQR can be much slower than the standard CholQR. In this paper, we examine several block variants of the algorithm that reduce the computational overhead associated with the software-emulated arithmetic while maintaining the same orthogonality error bound as the mixed-precision CholQR. Our numerical and performance results on multicore CPUs with a GPU, as well as on a hybrid CPU/GPU cluster, demonstrate that, compared to the mixed-precision CholQR, such a block variant can obtain speedups of up to 7.1× while maintaining numerical errors of about the same order.
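The contrast the abstract draws between standard and mixed-precision CholQR can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation and not one of their block variants: float32 plays the role of the working precision and float64 the role of the higher precision (hardware-supported here, rather than software-emulated as in the paper's setting); the mixed-precision variant forms the Gram matrix and its Cholesky factor in float64 and then rounds R back to float32 before the triangular solve, which is one plausible arrangement assumed for this example. The helper names `trsm_right_upper`, `cholqr`, `mixed_cholqr`, and `orth_error` are hypothetical. Standard CholQR is known to have an orthogonality error that grows with the square of the condition number, so the sketch compares the two errors on a matrix with condition number 100.

```python
import numpy as np


def trsm_right_upper(V, R):
    """Solve Q @ R = V for Q (R upper triangular), one column at a time.

    Column j of V = Q R equals sum_{i <= j} Q[:, i] * R[i, j], so Q[:, j] is
    recovered from the columns already computed. All arithmetic stays in the
    dtype of V and R (float32 below), whereas np.linalg routines may compute
    internally in float64.
    """
    Q = np.empty_like(V)
    for j in range(V.shape[1]):
        Q[:, j] = (V[:, j] - Q[:, :j] @ R[:j, j]) / R[j, j]
    return Q


def cholqr(V):
    """Standard CholQR: the Gram matrix is formed and stored in V's precision."""
    B = V.T @ V                                   # n x n Gram matrix, V's dtype
    R = np.linalg.cholesky(B).T.astype(V.dtype)   # upper factor, B = R^T R
    return trsm_right_upper(V, R), R


def mixed_cholqr(V, high=np.float64):
    """Mixed-precision CholQR sketch (assumed arrangement): Gram matrix and
    Cholesky in `high`, then R is rounded to V's precision for the solve."""
    Vh = V.astype(high)
    B = Vh.T @ Vh                                 # Gram matrix in higher precision
    R = np.linalg.cholesky(B).T.astype(V.dtype)   # round factor to working precision
    return trsm_right_upper(V, R), R


def orth_error(Q):
    """Orthogonality error ||I - Q^T Q||_2, measured in float64."""
    Qd = Q.astype(np.float64)
    return np.linalg.norm(np.eye(Q.shape[1]) - Qd.T @ Qd, 2)


# Test matrix V = U diag(s) W^T with condition number kappa, stored in float32.
rng = np.random.default_rng(0)
m, n, kappa = 2000, 20, 1e2
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -np.log10(kappa), n)
V = ((U * s) @ W.T).astype(np.float32)

err_std = orth_error(cholqr(V)[0])
err_mix = orth_error(mixed_cholqr(V)[0])
print(f"standard CholQR:        {err_std:.1e}")
print(f"mixed-precision CholQR: {err_mix:.1e}")
```

On a typical run the standard variant's error should come out noticeably larger, roughly in line with its dependence on the squared condition number versus the linear dependence of the mixed-precision variant; the exact digits depend on the BLAS library. The triangular solve is written out by hand so that it genuinely runs in float32.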

Original language: English (US)
Title of host publication: Proceedings of ScalA 2015
Subtitle of host publication: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450340113
DOI: https://doi.org/10.1145/2832080.2832082
State: Published - Nov 15 2015
Event: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2015 - Austin, United States
Duration: Nov 15 2015 to Nov 20 2015

Publication series

Name: Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Software

Cite this

Yamazaki, I., Tomov, S., Kurzak, J., Dongarra, J., & Barlow, J. (2015). Mixed-precision block Gram-Schmidt orthogonalization. In Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis [2] (Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis). Association for Computing Machinery, Inc. https://doi.org/10.1145/2832080.2832082
Yamazaki, Ichitaro ; Tomov, Stanimire ; Kurzak, Jakub ; Dongarra, Jack ; Barlow, Jesse. / Mixed-precision block Gram-Schmidt orthogonalization. Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery, Inc, 2015. (Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis).
@inproceedings{2179e05840a9452e99aa6f30ee232427,
title = "Mixed-precision block {Gram-Schmidt} orthogonalization",
abstract = "The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly on the condition number of the input matrix. However, when the desired higher precision is not supported by the hardware, software-emulated arithmetic is needed, which can significantly increase its computational cost. When a large number of columns must be orthogonalized, this computational overhead can have a dramatic impact on the orthogonalization time, and the mixed-precision CholQR can be much slower than the standard CholQR. In this paper, we examine several block variants of the algorithm that reduce the computational overhead associated with the software-emulated arithmetic while maintaining the same orthogonality error bound as the mixed-precision CholQR. Our numerical and performance results on multicore CPUs with a GPU, as well as on a hybrid CPU/GPU cluster, demonstrate that, compared to the mixed-precision CholQR, such a block variant can obtain speedups of up to 7.1× while maintaining numerical errors of about the same order.",
author = "Ichitaro Yamazaki and Stanimire Tomov and Jakub Kurzak and Jack Dongarra and Jesse Barlow",
year = "2015",
month = "11",
day = "15",
doi = "10.1145/2832080.2832082",
language = "English (US)",
series = "Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis",
publisher = "Association for Computing Machinery, Inc",
booktitle = "Proceedings of ScalA 2015",

}

Yamazaki, I, Tomov, S, Kurzak, J, Dongarra, J & Barlow, J 2015, Mixed-precision block Gram-Schmidt orthogonalization. in Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis., 2, Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery, Inc, 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2015, Austin, United States, 11/15/15. https://doi.org/10.1145/2832080.2832082

Mixed-precision block Gram-Schmidt orthogonalization. / Yamazaki, Ichitaro; Tomov, Stanimire; Kurzak, Jakub; Dongarra, Jack; Barlow, Jesse.

Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery, Inc, 2015. 2 (Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis).

TY - GEN

T1 - Mixed-precision block Gram-Schmidt orthogonalization

AU - Yamazaki, Ichitaro

AU - Tomov, Stanimire

AU - Kurzak, Jakub

AU - Dongarra, Jack

AU - Barlow, Jesse

PY - 2015/11/15

Y1 - 2015/11/15

AB - The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly on the condition number of the input matrix. However, when the desired higher precision is not supported by the hardware, software-emulated arithmetic is needed, which can significantly increase its computational cost. When a large number of columns must be orthogonalized, this computational overhead can have a dramatic impact on the orthogonalization time, and the mixed-precision CholQR can be much slower than the standard CholQR. In this paper, we examine several block variants of the algorithm that reduce the computational overhead associated with the software-emulated arithmetic while maintaining the same orthogonality error bound as the mixed-precision CholQR. Our numerical and performance results on multicore CPUs with a GPU, as well as on a hybrid CPU/GPU cluster, demonstrate that, compared to the mixed-precision CholQR, such a block variant can obtain speedups of up to 7.1× while maintaining numerical errors of about the same order.

UR - http://www.scopus.com/inward/record.url?scp=84968627043&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84968627043&partnerID=8YFLogxK

U2 - 10.1145/2832080.2832082

DO - 10.1145/2832080.2832082

M3 - Conference contribution

AN - SCOPUS:84968627043

T3 - Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis

BT - Proceedings of ScalA 2015

PB - Association for Computing Machinery, Inc

ER -

Yamazaki I, Tomov S, Kurzak J, Dongarra J, Barlow J. Mixed-precision block Gram-Schmidt orthogonalization. In Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery, Inc. 2015. 2. (Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis). https://doi.org/10.1145/2832080.2832082