Coded convolution for parallel and distributed computing within a deadline

Sanghamitra Dutta, Viveck Ramesh Cadambe, Pulkit Grover

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    15 Citations (Scopus)

    Abstract

    We consider the problem of computing the convolution of two long vectors using parallel processors in the presence of 'stragglers'. Stragglers are the small fraction of faulty or slow processors that delay the entire computation in time-critical distributed systems. We first show that splitting the vectors into smaller pieces and using a linear code to encode these pieces provides better resilience against stragglers than replication-based schemes under a simple, worst-case straggler analysis. We then demonstrate that, under commonly used models of computation time, coding can dramatically improve the probability of finishing the computation within a target 'deadline' time. In contrast to the more commonly used expected-computation-time analysis, we quantify the exponents of the probability of failure in the limit of large deadlines. Our exponent metric captures the probability of failing to finish before a specified deadline time, i.e., the behavior of the 'tail'. Moreover, our technique also allows for simple closed-form expressions for more general models of computation time, e.g., shifted Weibull models instead of only shifted exponentials. Thus, through this problem of coded convolution, we establish the utility of a novel asymptotic failure exponent analysis for distributed systems.
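
The idea in the abstract can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the authors' construction: it splits only one of the two vectors, uses a real-valued Vandermonde matrix as the linear code, and models each worker's finishing time as an independent shifted exponential. The function names (`encode`, `worker`, `decode`, `fail_prob`), the evaluation points, and the parameters `mu` and `tau` are all assumptions made for illustration.

```python
import numpy as np
from math import comb, exp

def encode(a, k, n):
    """Split `a` into k equal pieces and form n coded pieces (len(a) must be divisible by k)."""
    pieces = np.asarray(a, dtype=float).reshape(k, -1)   # row j is the j-th piece of a
    nodes = np.arange(1, n + 1, dtype=float)             # assumed distinct evaluation points
    G = np.vander(nodes, k, increasing=True)             # n x k generator; any k rows are invertible
    return G, G @ pieces                                 # row i is the coded piece for worker i

def worker(coded_piece, b):
    """Task run by one processor: convolve its coded piece with the second vector."""
    return np.convolve(coded_piece, b)

def decode(G, results, k, piece_len, len_b):
    """Recover conv(a, b) from the outputs of any k workers (dict: worker index -> output)."""
    idx = sorted(results)[:k]
    Y = np.stack([results[i] for i in idx])
    X = np.linalg.solve(G[idx], Y)                       # row j is conv(piece_j, b), by linearity
    out = np.zeros(k * piece_len + len_b - 1)
    for j in range(k):                                   # overlap-add the shifted partial results
        out[j * piece_len : j * piece_len + piece_len + len_b - 1] += X[j]
    return out

def fail_prob(n, k, t, mu, tau):
    """P(fewer than k of n workers finish by deadline t), shifted-exponential finishing times."""
    p = 0.0 if t < tau else 1.0 - exp(-mu * (t - tau))   # P(one worker finishes by t)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.standard_normal(12), rng.standard_normal(7)
    k, n = 3, 5
    G, coded = encode(a, k, n)
    # Pretend workers 1 and 3 straggle: only workers 0, 2 and 4 return a result.
    results = {i: worker(coded[i], b) for i in (0, 2, 4)}
    recovered = decode(G, results, k, piece_len=len(a) // k, len_b=len(b))
    print(np.allclose(recovered, np.convolve(a, b)))
    print(fail_prob(n, k, 2.0, 1.0, 0.5), fail_prob(k, k, 2.0, 1.0, 0.5))
```

In this toy model the coded scheme fails only when more than n - k workers miss the deadline, so its failure probability decays roughly like exp(-mu (n - k + 1) t) for large t, compared with exp(-mu t) when all k uncoded workers must finish. A Vandermonde code over the reals becomes ill-conditioned as n grows, so the sketch is meant for small n only.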

    Original language: English (US)
    Title of host publication: 2017 IEEE International Symposium on Information Theory, ISIT 2017
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 2403-2407
    Number of pages: 5
    ISBN (Electronic): 9781509040964
    DOIs: https://doi.org/10.1109/ISIT.2017.8006960
    State: Published - Aug 9 2017
    Event: 2017 IEEE International Symposium on Information Theory, ISIT 2017 - Aachen, Germany
    Duration: Jun 25 2017 to Jun 30 2017

    Publication series

    Name: IEEE International Symposium on Information Theory - Proceedings
    ISSN (Print): 2157-8095

    Other

    Other: 2017 IEEE International Symposium on Information Theory, ISIT 2017
    Country: Germany
    City: Aachen
    Period: 6/25/17 to 6/30/17

    Fingerprint

    Parallel and Distributed Computing
    Distributed computer systems
    Deadline
    Parallel processing systems
    Convolution
    Models of Computation
    Exponent
    Distributed Systems
    Weibull Model
    Worst-case Analysis
    Parallel Processors
    Resilience
    Failure analysis
    Linear Codes
    Replication
    Tail
    Closed-form
    Quantify
    Coding
    Entire

    All Science Journal Classification (ASJC) codes

    • Theoretical Computer Science
    • Information Systems
    • Modeling and Simulation
    • Applied Mathematics

    Cite this

    Dutta, S., Cadambe, V. R., & Grover, P. (2017). Coded convolution for parallel and distributed computing within a deadline. In 2017 IEEE International Symposium on Information Theory, ISIT 2017 (pp. 2403-2407). [8006960] (IEEE International Symposium on Information Theory - Proceedings). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ISIT.2017.8006960
    @inproceedings{f61fb404a92047e8a6536dd30e7ecd68,
    title = "Coded convolution for parallel and distributed computing within a deadline",
    author = "Sanghamitra Dutta and Cadambe, {Viveck Ramesh} and Pulkit Grover",
    year = "2017",
    month = "8",
    day = "9",
    doi = "10.1109/ISIT.2017.8006960",
    language = "English (US)",
    series = "IEEE International Symposium on Information Theory - Proceedings",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",
    pages = "2403--2407",
    booktitle = "2017 IEEE International Symposium on Information Theory, ISIT 2017",
    address = "United States",

    }

    TY - GEN

    T1 - Coded convolution for parallel and distributed computing within a deadline

    AU - Dutta, Sanghamitra

    AU - Cadambe, Viveck Ramesh

    AU - Grover, Pulkit

    PY - 2017/8/9

    Y1 - 2017/8/9

    UR - http://www.scopus.com/inward/record.url?scp=85034107899&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85034107899&partnerID=8YFLogxK

    U2 - 10.1109/ISIT.2017.8006960

    DO - 10.1109/ISIT.2017.8006960

    M3 - Conference contribution

    AN - SCOPUS:85034107899

    T3 - IEEE International Symposium on Information Theory - Proceedings

    SP - 2403

    EP - 2407

    BT - 2017 IEEE International Symposium on Information Theory, ISIT 2017

    PB - Institute of Electrical and Electronics Engineers Inc.

    ER -