A compiler framework for extracting superword level parallelism

Jun Liu, Yuanrui Zhang, Ohyoung Jang, Wei Ding, Mahmut Kandemir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Citations (Scopus)

Abstract

SIMD (single-instruction multiple-data) instruction set extensions are quite common today in both high performance and embedded microprocessors, and enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that significant performance savings are possible when SLP is exploited, placing SIMD instructions in an application code manually can be very difficult and error prone. In this paper, we propose a novel automated compiler framework for improving superword level parallelism exploitation. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, of which the primary goals are to increase SIMD parallelism and, more importantly, capture more superword reuses among the superword statements through global data access and reuse pattern analysis. Further, as a complementary optimization, our data layout optimization organizes data in memory space such that the price of memory operations for SLP is minimized. The results from our compiler implementation and tests on two systems indicate performance improvements as high as 15.2% over a state-of-the-art SLP optimization algorithm.
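
For readers unfamiliar with SLP, the following is a minimal sketch of what "statement grouping" means at its simplest: isomorphic scalar statements that touch adjacent array elements are packed into one superword statement. It is a toy greedy pass over a straight-line block, not the grouping or scheduling algorithm proposed in the paper. The names Ref, Stmt, isomorphic_adjacent, independent, group_statements and elementwise are all hypothetical and chosen only for this illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Ref(NamedTuple):
    array: str  # array name, e.g. "a"
    index: int  # constant element index


class Stmt(NamedTuple):
    dest: Ref   # location written
    op: str     # binary operator, e.g. "+"
    lhs: Ref    # first operand
    rhs: Ref    # second operand


def isomorphic_adjacent(s: Stmt, t: Stmt) -> bool:
    """True if t applies the same operator as s to the next element of each array."""
    if s.op != t.op:
        return False
    pairs = zip((s.dest, s.lhs, s.rhs), (t.dest, t.lhs, t.rhs))
    return all(p.array == q.array and q.index == p.index + 1 for p, q in pairs)


def independent(group: list[Stmt], t: Stmt) -> bool:
    """Conservative check: t touches no location already written by the group."""
    written = {s.dest for s in group}
    return not ({t.dest, t.lhs, t.rhs} & written)


def group_statements(block: list[Stmt], width: int = 4) -> list[list[Stmt]]:
    """Greedily pack consecutive statements into groups of at most `width` lanes."""
    groups: list[list[Stmt]] = []
    current: list[Stmt] = []
    for stmt in block:
        can_extend = (
            bool(current)
            and len(current) < width
            and isomorphic_adjacent(current[-1], stmt)
            and independent(current, stmt)
        )
        if can_extend:
            current.append(stmt)
        else:
            if current:
                groups.append(current)
            current = [stmt]
    if current:
        groups.append(current)
    return groups


def elementwise(dest: str, op: str, lhs: str, rhs: str, n: int) -> list[Stmt]:
    """Build n scalar statements dest[i] = lhs[i] op rhs[i] (an unrolled loop body)."""
    return [Stmt(Ref(dest, i), op, Ref(lhs, i), Ref(rhs, i)) for i in range(n)]


# a[i] = b[i] + c[i] followed by d[i] = a[i] * e[i], for i in 0..3
block = elementwise("a", "+", "b", "c", 4) + elementwise("d", "*", "a", "e", 4)
groups = group_statements(block, width=4)

for g in groups:
    lo, hi = g[0].dest.index, g[-1].dest.index + 1
    first = g[0]
    print(f"{first.dest.array}[{lo}:{hi}] = "
          f"{first.lhs.array}[{lo}:{hi}] {first.op} {first.rhs.array}[{lo}:{hi}]")
```

With a lane width of 4, the eight scalar statements collapse into two groups of four. The superword a[0:4] written by the first group is exactly the operand read by the second, which is loosely the kind of superword reuse the abstract refers to; how the paper actually detects and schedules for such reuse is not shown here.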

Original language: English (US)
Title of host publication: PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation
Pages: 347-357
Number of pages: 11
DOIs: https://doi.org/10.1145/2254064.2254106
State: Published - Jul 9 2012
Event: 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI'12 - Beijing, China
Duration: Jun 11 2012 → Jun 16 2012

Publication series

Name: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Other

Other: 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI'12
Country: China
City: Beijing
Period: 6/11/12 → 6/16/12

Fingerprint

Data storage equipment
Microprocessor chips
Scheduling

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Liu, J., Zhang, Y., Jang, O., Ding, W., & Kandemir, M. (2012). A compiler framework for extracting superword level parallelism. In PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 347-357). (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). https://doi.org/10.1145/2254064.2254106
Liu, Jun ; Zhang, Yuanrui ; Jang, Ohyoung ; Ding, Wei ; Kandemir, Mahmut. / A compiler framework for extracting superword level parallelism. PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation. 2012. pp. 347-357 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).
@inproceedings{4c8ace8a223c48a5bd104b34fa67cfed,
title = "A compiler framework for extracting superword level parallelism",
abstract = "SIMD (single-instruction multiple-data) instruction set extensions are quite common today in both high performance and embedded microprocessors, and enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that significant performance savings are possible when SLP is exploited, placing SIMD instructions in an application code manually can be very difficult and error prone. In this paper, we propose a novel automated compiler framework for improving superword level parallelism exploitation. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, of which the primary goals are to increase SIMD parallelism and, more importantly, capture more superword reuses among the superword statements through global data access and reuse pattern analysis. Further, as a complementary optimization, our data layout optimization organizes data in memory space such that the price of memory operations for SLP is minimized. The results from our compiler implementation and tests on two systems indicate performance improvements as high as 15.2{\%} over a state-of-the-art SLP optimization algorithm.",
author = "Jun Liu and Yuanrui Zhang and Ohyoung Jang and Wei Ding and Mahmut Kandemir",
year = "2012",
month = "7",
day = "9",
doi = "10.1145/2254064.2254106",
language = "English (US)",
isbn = "9781450312059",
series = "Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)",
pages = "347--357",
booktitle = "PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation",

}

Liu, J, Zhang, Y, Jang, O, Ding, W & Kandemir, M 2012, A compiler framework for extracting superword level parallelism. in PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 347-357, 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI'12, Beijing, China, 6/11/12. https://doi.org/10.1145/2254064.2254106

A compiler framework for extracting superword level parallelism. / Liu, Jun; Zhang, Yuanrui; Jang, Ohyoung; Ding, Wei; Kandemir, Mahmut.

PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation. 2012. p. 347-357 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).

TY - GEN

T1 - A compiler framework for extracting superword level parallelism

AU - Liu, Jun

AU - Zhang, Yuanrui

AU - Jang, Ohyoung

AU - Ding, Wei

AU - Kandemir, Mahmut

PY - 2012/7/9

Y1 - 2012/7/9

N2 - SIMD (single-instruction multiple-data) instruction set extensions are quite common today in both high performance and embedded microprocessors, and enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that significant performance savings are possible when SLP is exploited, placing SIMD instructions in an application code manually can be very difficult and error prone. In this paper, we propose a novel automated compiler framework for improving superword level parallelism exploitation. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, of which the primary goals are to increase SIMD parallelism and, more importantly, capture more superword reuses among the superword statements through global data access and reuse pattern analysis. Further, as a complementary optimization, our data layout optimization organizes data in memory space such that the price of memory operations for SLP is minimized. The results from our compiler implementation and tests on two systems indicate performance improvements as high as 15.2% over a state-of-the-art SLP optimization algorithm.

UR - http://www.scopus.com/inward/record.url?scp=84863427661&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863427661&partnerID=8YFLogxK

U2 - 10.1145/2254064.2254106

DO - 10.1145/2254064.2254106

M3 - Conference contribution

AN - SCOPUS:84863427661

SN - 9781450312059

T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

SP - 347

EP - 357

BT - PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation

ER -

Liu J, Zhang Y, Jang O, Ding W, Kandemir M. A compiler framework for extracting superword level parallelism. In PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation. 2012. p. 347-357. (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). https://doi.org/10.1145/2254064.2254106