TY - JOUR
T1 - "Short-Dot"
T2 - 30th Annual Conference on Neural Information Processing Systems, NIPS 2016
AU - Dutta, Sanghamitra
AU - Cadambe, Viveck
AU - Grover, Pulkit
N1 - Funding Information:
This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA. We also acknowledge NSF Awards 1350314, 1464336, and 1553248. S. Dutta was also supported by the Prabhu and Poonam Goel Graduate Fellowship.
Publisher Copyright:
© 2016 NIPS Foundation - All Rights Reserved.
PY - 2016
Y1 - 2016
N2 - Faced with the saturation of Moore's law and the increasing size and dimension of data, system designers have increasingly resorted to parallel and distributed computing to reduce the computation time of machine-learning algorithms. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers" that reduce the speed of computation because the fusion node has to wait for all processors to complete their processing. To combat the effect of stragglers, recent literature proposes introducing redundancy in computations across processors, e.g., using repetition-based strategies or erasure codes. The fusion node can exploit this redundancy by completing the computation using outputs from only a subset of the processors, ignoring the stragglers. In this paper, we propose a novel technique, which we call "Short-Dot", to introduce redundant computations in a coding-theory-inspired fashion for computing linear transforms of long vectors. Instead of computing the long dot products required in the original linear transform, we construct a larger number of redundant, short dot products that can be computed more efficiently at individual processors. Further, only a subset of these short dot products is required at the fusion node to finish the computation successfully. We demonstrate through probabilistic analysis as well as experiments on computing clusters that Short-Dot offers significant speed-up compared to existing techniques. We also derive trade-offs between the length of the dot products and the resilience to stragglers (the number of processors required to finish) for any such strategy, and compare them to those achieved by our strategy.
AB - Faced with the saturation of Moore's law and the increasing size and dimension of data, system designers have increasingly resorted to parallel and distributed computing to reduce the computation time of machine-learning algorithms. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers" that reduce the speed of computation because the fusion node has to wait for all processors to complete their processing. To combat the effect of stragglers, recent literature proposes introducing redundancy in computations across processors, e.g., using repetition-based strategies or erasure codes. The fusion node can exploit this redundancy by completing the computation using outputs from only a subset of the processors, ignoring the stragglers. In this paper, we propose a novel technique, which we call "Short-Dot", to introduce redundant computations in a coding-theory-inspired fashion for computing linear transforms of long vectors. Instead of computing the long dot products required in the original linear transform, we construct a larger number of redundant, short dot products that can be computed more efficiently at individual processors. Further, only a subset of these short dot products is required at the fusion node to finish the computation successfully. We demonstrate through probabilistic analysis as well as experiments on computing clusters that Short-Dot offers significant speed-up compared to existing techniques. We also derive trade-offs between the length of the dot products and the resilience to stragglers (the number of processors required to finish) for any such strategy, and compare them to those achieved by our strategy.
UR - http://www.scopus.com/inward/record.url?scp=85019226266&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019226266&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85019226266
SN - 1049-5258
SP - 2100
EP - 2108
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 5 December 2016 through 10 December 2016
ER -