FPGA architecture for 2D discrete Fourier transform based on 2D decomposition for large-sized data

Chi Li Yu, Jung Sub Kim, Lanping Deng, Srinidhi Kestur, Vijaykrishnan Narayanan, Chaitali Chakrabarti

Research output: Contribution to journal, Article

10 Citations (Scopus)

Abstract

Applications based on Discrete Fourier Transforms (DFT) are extensively used in several areas of signal and digital image processing. Of particular interest is the two-dimensional (2D) DFT, which is more computation- and bandwidth-intensive than the one-dimensional (1D) DFT. Traditionally, a 2D DFT is computed using Row-Column (RC) decomposition, where 1D DFTs are computed along the rows followed by 1D DFTs along the columns. Both application-specific and reconfigurable hardware have utilized this scheme for high-performance implementations of 2D DFT. However, architectures based on RC decomposition are not efficient for large input sizes because of memory bandwidth constraints. In this paper, we propose an efficient architecture to implement 2D DFT for large-sized input data based on a novel 2D decomposition algorithm. This architecture achieves very high throughput by exploiting the parallelism inherent in the algorithm decomposition and by utilizing the row-wise burst access pattern of the external memory. A high-throughput memory interface has been designed to enable maximum utilization of the memory bandwidth. In addition, an automatic system generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2K × 2K input size, the proposed architecture is 1.96 times faster than an RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations.
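The Row-Column (RC) decomposition mentioned in the abstract can be illustrated with a small software sketch. This is not the paper's hardware architecture or its novel 2D decomposition (the abstract does not describe that algorithm); it only shows the conventional baseline, using a naive O(n^2) 1D DFT for clarity. The function names `dft_1d`, `dft_2d_row_column`, and `dft_2d_direct` are hypothetical and chosen for this illustration.

```python
import cmath


def dft_1d(x):
    """Naive O(n^2) 1D DFT of a sequence (illustrative, not an FFT)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_2d_row_column(a):
    """2D DFT by RC decomposition: 1D DFTs along rows, then along columns."""
    # Pass 1: transform every row.
    rows = [dft_1d(row) for row in a]
    # Pass 2: transform every column of the row-transformed data.
    # zip(*rows) iterates over columns; the result is stored column by column.
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the output is indexed as out[u][v].
    return [list(r) for r in zip(*cols)]


def dft_2d_direct(a):
    """2D DFT straight from the definition, used only as a reference."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    rc = dft_2d_row_column(data)
    ref = dft_2d_direct(data)
    max_err = max(abs(rc[u][v] - ref[u][v]) for u in range(4) for v in range(4))
    print("max abs difference between RC and direct 2D DFT:", max_err)
```

If the input is stored row by row in external memory, the second pass reads elements that are a full row apart in the address space, which is one way to see why the column pass does not match a row-wise burst access pattern.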

Original language: English (US)
Pages (from-to): 109-122
Number of pages: 14
Journal: Journal of Signal Processing Systems
Volume: 64
Issue number: 1
DOI: 10.1007/s11265-010-0500-y
State: Published - Jul 1 2011

Fingerprint

Discrete Fourier transform
Discrete Fourier transforms
Field Programmable Gate Array
Field programmable gate arrays (FPGA)
Decomposition
Decompose
Data storage equipment
Bandwidth
Decomposition Algorithm
High Throughput
Reconfigurable Hardware
External Memory
Digital Image Processing
Throughput
Burst
Reconfigurable hardware
Parallelism
Architecture
High Performance
Generator

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Theoretical Computer Science
  • Signal Processing
  • Information Systems
  • Modeling and Simulation
  • Hardware and Architecture

Cite this

Yu, Chi Li ; Kim, Jung Sub ; Deng, Lanping ; Kestur, Srinidhi ; Narayanan, Vijaykrishnan ; Chakrabarti, Chaitali. / FPGA architecture for 2d discrete Fourier transform based on 2d decomposition for large-sized data. In: Journal of Signal Processing Systems. 2011 ; Vol. 64, No. 1. pp. 109-122.
@article{c0d1c482d0104a13896a62487a90b2c4,
title = "FPGA architecture for 2d discrete Fourier transform based on 2d decomposition for large-sized data",
author = "Yu, {Chi Li} and Kim, {Jung Sub} and Lanping Deng and Srinidhi Kestur and Vijaykrishnan Narayanan and Chaitali Chakrabarti",
year = "2011",
month = "7",
day = "1",
doi = "10.1007/s11265-010-0500-y",
language = "English (US)",
volume = "64",
pages = "109--122",
journal = "Journal of Signal Processing Systems",
issn = "1939-8018",
publisher = "Springer New York",
number = "1",

}
