### Abstract

We consider the problem of computing the convolution of two long vectors using parallel processors in the presence of 'stragglers'. Stragglers are the small fraction of faulty or slow processors that delay the entire computation in time-critical distributed systems. We first show that splitting the vectors into smaller pieces and using a linear code to encode these pieces provides better resilience against stragglers than replication-based schemes under a simple, worst-case straggler analysis. We then demonstrate that under commonly used models of computation time, coding can dramatically improve the probability of finishing the computation within a target 'deadline' time. As opposed to the more commonly used technique of expected computation time analysis, we quantify the exponents of the probability of failure in the limit of large deadlines. Our exponent metric captures the probability of failing to finish before a specified deadline time, i.e., the behavior of the 'tail'. Moreover, our technique also allows for simple closed-form expressions for more general models of computation time, e.g., shifted Weibull models instead of only shifted exponentials. Thus, through this problem of coded convolution, we establish the utility of a novel asymptotic failure exponent analysis for distributed systems.
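The idea of splitting a vector into pieces and encoding them with a linear code can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not necessarily the construction used in the paper: only one of the two vectors is split, the code is a Vandermonde (MDS-style) code with evaluation points 1, 2, ..., n, and the helper names `conv`, `encode`, `solve`, and `decode` are hypothetical. Because convolution is linear, each worker can convolve a coded piece with `x`, and the results of any `k` of the `n` workers suffice to recover the full convolution.

```python
from fractions import Fraction
from itertools import combinations

def conv(a, b):
    """Plain linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def encode(pieces, n):
    """Worker w receives sum_i (w+1)**i * pieces[i] (a Vandermonde code)."""
    m = len(pieces[0])
    return [[sum((w + 1) ** i * p[t] for i, p in enumerate(pieces))
             for t in range(m)] for w in range(n)]

def solve(A, B):
    """Solve A @ P = B exactly by Gauss-Jordan elimination over Fractions."""
    k = len(A)
    M = [[Fraction(v) for v in A[r]] + [Fraction(v) for v in B[r]]
         for r in range(k)]
    for c in range(k):
        p = next(r for r in range(c, k) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(k):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[k:] for row in M]

def decode(results, k, m, x_len):
    """Recover conv(a, x) from the outputs of any k finished workers.

    results maps a worker index to conv(coded_piece, x).
    """
    workers = sorted(results)[:k]
    A = [[(w + 1) ** i for i in range(k)] for w in workers]
    piece_convs = solve(A, [results[w] for w in workers])
    # piece_convs[i] equals conv(pieces[i], x); overlap-add with offset i*m.
    out = [Fraction(0)] * (k * m + x_len - 1)
    for i, pc in enumerate(piece_convs):
        for t, v in enumerate(pc):
            out[i * m + t] += v
    return [int(v) for v in out]

# Demo: k = 3 pieces, n = 5 workers, so any 2 workers may straggle.
a, x = [1, 2, 3, 4, 5, 6], [1, -1, 2]
k, m, n = 3, 2, 5
pieces = [a[i:i + m] for i in range(0, len(a), m)]
coded = encode(pieces, n)
for finished in combinations(range(n), k):
    results = {w: conv(coded[w], x) for w in finished}
    assert decode(results, k, m, len(x)) == conv(a, x)
print(conv(a, x))
```

In this sketch any `n - k` workers may straggle without affecting the result. By contrast, replicating each of the `k` pieces on `n / k` workers fails as soon as all copies of a single piece straggle, so it tolerates only `n / k - 1` stragglers in the worst case.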

Original language | English (US) |
---|---|
Title of host publication | 2017 IEEE International Symposium on Information Theory, ISIT 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 2403-2407 |
Number of pages | 5 |
ISBN (Electronic) | 9781509040964 |
DOIs | https://doi.org/10.1109/ISIT.2017.8006960 |
State | Published - Aug 9 2017 |
Event | 2017 IEEE International Symposium on Information Theory, ISIT 2017 - Aachen, Germany; Duration: Jun 25 2017 → Jun 30 2017 |

### Publication series

Name | IEEE International Symposium on Information Theory - Proceedings |
---|---|
ISSN (Print) | 2157-8095 |

### Other

Other | 2017 IEEE International Symposium on Information Theory, ISIT 2017 |
---|---|
Country | Germany |
City | Aachen |
Period | 6/25/17 → 6/30/17 |

### All Science Journal Classification (ASJC) codes

- Theoretical Computer Science
- Information Systems
- Modeling and Simulation
- Applied Mathematics

### Cite this

Dutta, S., Cadambe, V. R., & Grover, P. (2017). Coded convolution for parallel and distributed computing within a deadline. In *2017 IEEE International Symposium on Information Theory, ISIT 2017* (pp. 2403-2407). [8006960] (IEEE International Symposium on Information Theory - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISIT.2017.8006960

**Coded convolution for parallel and distributed computing within a deadline.** / Dutta, Sanghamitra; Cadambe, Viveck Ramesh; Grover, Pulkit.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Coded convolution for parallel and distributed computing within a deadline

AU - Dutta, Sanghamitra

AU - Cadambe, Viveck Ramesh

AU - Grover, Pulkit

PY - 2017/8/9

Y1 - 2017/8/9

UR - http://www.scopus.com/inward/record.url?scp=85034107899&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85034107899&partnerID=8YFLogxK

U2 - 10.1109/ISIT.2017.8006960

DO - 10.1109/ISIT.2017.8006960

M3 - Conference contribution

AN - SCOPUS:85034107899

T3 - IEEE International Symposium on Information Theory - Proceedings

SP - 2403

EP - 2407

BT - 2017 IEEE International Symposium on Information Theory, ISIT 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -