Data skeletons: Simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation

James P. McDermott, G. Jogesh Babu, John C. Liechty, Dennis K.J. Lin

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

We consider the problem of density estimation when the data is in the form of a continuous stream with no fixed length. In this setting, implementations of the usual methods of density estimation such as kernel density estimation are problematic. We propose a method of density estimation for massive datasets that is based upon taking the derivative of a smooth curve that has been fit through a set of quantile estimates. To achieve this, a low-storage, single-pass, sequential method is proposed for simultaneous estimation of multiple quantiles for massive datasets that form the basis of this method of density estimation. For comparison, we also consider a sequential kernel density estimator. The proposed methods are shown through simulation study to perform well and to have several distinct advantages over existing methods.
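The two ideas in the abstract, a low-storage single-pass estimate of several quantiles and a density obtained by differentiating a curve through those quantiles, can be illustrated with a rough sketch. This is not the algorithm from the paper. It assumes a simple batch-averaging scheme for the quantiles, and it swaps the paper's smooth curve for a piecewise-linear interpolant. The names `BatchQuantileSketch`, `density_from_quantiles`, and `demo` are hypothetical and exist only for this illustration.

```python
import random


class BatchQuantileSketch:
    """Single-pass estimator for several quantiles using fixed-size batches.

    Assumption: batch averaging is an illustrative stand-in, not the
    estimator proposed in the paper. Storage is one batch plus one
    running estimate per quantile, regardless of stream length.
    """

    def __init__(self, probs, batch_size=500):
        self.probs = list(probs)
        self.batch_size = batch_size
        self.buffer = []
        self.n_batches = 0
        self.estimates = [0.0] * len(self.probs)

    def update(self, x):
        """Consume one observation; flush when the batch is full."""
        self.buffer.append(x)
        if len(self.buffer) == self.batch_size:
            self._flush()

    def _flush(self):
        batch = sorted(self.buffer)
        self.buffer = []
        self.n_batches += 1
        for k, p in enumerate(self.probs):
            # Sample quantile by linear interpolation of order statistics.
            pos = p * (len(batch) - 1)
            lo = int(pos)
            hi = min(lo + 1, len(batch) - 1)
            q = batch[lo] + (pos - lo) * (batch[hi] - batch[lo])
            # Running mean of the per-batch quantiles.
            self.estimates[k] += (q - self.estimates[k]) / self.n_batches


def density_from_quantiles(probs, quantiles):
    """Slope of a piecewise-linear CDF through the (quantile, prob) points.

    Returns (midpoint, density) pairs. A smooth monotone curve, as the
    abstract describes, would replace this linear interpolant.
    """
    points = []
    for i in range(len(probs) - 1):
        width = quantiles[i + 1] - quantiles[i]
        if width > 0:
            mid = 0.5 * (quantiles[i] + quantiles[i + 1])
            points.append((mid, (probs[i + 1] - probs[i]) / width))
    return points


def demo(n=100_000, seed=0):
    """Stream n standard-normal draws; a trailing partial batch is ignored."""
    rng = random.Random(seed)
    probs = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    sketch = BatchQuantileSketch(probs)
    for _ in range(n):
        sketch.update(rng.gauss(0.0, 1.0))
    density = density_from_quantiles(probs, sketch.estimates)
    return probs, sketch.estimates, density
```

Calling `demo()` returns the probability levels, the estimated quantiles, and the density evaluated at the midpoints between adjacent quantiles. For standard-normal input, the density near zero should fall close to 0.4. The batch size trades memory against the bias of the per-batch quantiles, which is one reason a more careful sequential scheme would be preferable in practice.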

Original language: English (US)
Pages (from-to): 311-321
Number of pages: 11
Journal: Statistics and Computing
Volume: 17
Issue number: 4
State: Published - Dec 1 2007

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Computational Theory and Mathematics
