Implications of Public Cloud Resource Heterogeneity for Inference Serving

Jashwant Raj Gunasekaran, Cyan Subhra Mishra, Prashanth Thinakaran, Mahmut Taylan Kandemir, Chita R. Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems across different application domains, including product recommendation systems, personal assistant devices, and facial recognition. These applications typically have diverse requirements in terms of accuracy and response latency, which can be satisfied by a myriad of ML models. However, the deployment cost of prediction serving primarily depends on the type of resources being procured, which are themselves heterogeneous in terms of provisioning latencies and billing complexity. Thus, it is difficult for an inference serving system to choose from this confounding array of resource types and model types to provide low-latency and cost-effective inferences. In this work, we quantitatively characterize the cost, accuracy, and latency implications of hosting ML inferences on different public cloud resource offerings. Our evaluation shows that prior work does not address the problem along both dimensions of model and resource heterogeneity. Hence, to address this problem holistically, we need to solve the issues that arise from combining model and resource heterogeneity when optimizing for application constraints. To this end, we discuss the design implications of a self-managed inference serving system that can optimize for application requirements based on public cloud resource characteristics.

Original language: English (US)
Title of host publication: WOSC 2020 - Proceedings of the 2020 6th International Workshop on Serverless Computing, Part of Middleware 2020
Publisher: Association for Computing Machinery, Inc
Pages: 7-12
Number of pages: 6
ISBN (Electronic): 9781450382045
State: Published - Dec 7 2020
Event: 6th International Workshop on Serverless Computing, WOSC 2020 - Part of Middleware 2020 - Virtual, Online, Netherlands
Duration: Dec 7 2020 to Dec 11 2020

Publication series

Name: WOSC 2020 - Proceedings of the 2020 6th International Workshop on Serverless Computing, Part of Middleware 2020

Conference

Conference: 6th International Workshop on Serverless Computing, WOSC 2020 - Part of Middleware 2020
Country: Netherlands
City: Virtual, Online
Period: 12/7/20 to 12/11/20

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
