Abstract

Analysis of massive data sets is challenging owing to limitations of computer primary memory. In this paper, we propose an approach to estimate population parameters from a massive data set. The proposed approach significantly reduces the required amount of primary memory, and the resulting estimate is as efficient as if the entire data set were analyzed simultaneously. Asymptotic properties of the resulting estimate are studied, and the asymptotic normality of the resulting estimator is established. A standard error formula for the resulting estimate is proposed and empirically tested; thus, statistical inference for parameters of interest can be performed. The effectiveness of the proposed approach is illustrated using simulation studies and an Internet traffic data example.
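The abstract does not spell out the authors' estimator, so the following is only a minimal sketch of the general block-wise idea for the simplest population parameter, the mean, assuming i.i.d. observations and blocks small enough to fit in memory. Each block is reduced to a constant-size summary (count, mean, sum of squared deviations), and summaries are combined with the pairwise update of Chan, Golub and LeVeque, a swapped-in standard technique rather than the paper's method. The names `BlockSummary`, `summarize`, `merge`, `read_blocks` and `blockwise_mean_ci` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class BlockSummary:
    """Statistics retained in memory for a block (or a merged group of blocks)."""
    n: int       # number of observations
    mean: float  # sample mean
    m2: float    # sum of squared deviations from the mean


def summarize(block: List[float]) -> BlockSummary:
    """Summarize one block that fits in primary memory."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return BlockSummary(n, mean, m2)


def merge(a: BlockSummary, b: BlockSummary) -> BlockSummary:
    """Combine two summaries exactly (pairwise update of Chan, Golub and LeVeque)."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return BlockSummary(n, mean, m2)


def read_blocks(data: Iterable[float], block_size: int) -> Iterator[List[float]]:
    """Yield consecutive blocks; a stand-in for reading a large file chunk by chunk."""
    block: List[float] = []
    for x in data:
        block.append(x)
        if len(block) == block_size:
            yield block
            block = []
    if block:
        yield block


def blockwise_mean_ci(
    data: Iterable[float], block_size: int, z: float = 1.96
) -> Tuple[float, float, Tuple[float, float]]:
    """Return (mean estimate, standard error, normal-approximation confidence interval).

    Only one block plus a constant-size running summary is held in memory at a time.
    """
    total: Optional[BlockSummary] = None
    for block in read_blocks(data, block_size):
        s = summarize(block)
        total = s if total is None else merge(total, s)
    if total is None or total.n < 2:
        raise ValueError("need at least two observations")
    variance = total.m2 / (total.n - 1)  # unbiased sample variance
    se = math.sqrt(variance / total.n)   # standard error of the sample mean
    return total.mean, se, (total.mean - z * se, total.mean + z * se)


if __name__ == "__main__":
    random.seed(0)
    # Simulated data; in practice this iterator would stream from disk.
    stream = (random.gauss(10.0, 2.0) for _ in range(100_000))
    est, se, (lo, hi) = blockwise_mean_ci(stream, block_size=4_096)
    print(f"mean={est:.4f}  se={se:.5f}  95% CI=({lo:.4f}, {hi:.4f})")
```

For the mean, merging block summaries reproduces the full-data estimate and standard error up to floating-point rounding, which loosely mirrors the "as efficient as analyzing the entire data set" property. For most other estimators no exact merge exists, and the paper's guarantees are asymptotic; this sketch does not reproduce that analysis.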

Original language: English (US)
Pages (from-to): 399-409
Number of pages: 11
Journal: Applied Stochastic Models in Business and Industry
Volume: 29
Issue number: 5
DOI: 10.1002/asmb.1927
State: Published - Sep 1 2013

Fingerprint

Statistical Inference
Data storage equipment
Estimate
Internet
Population parameter
Internet Traffic
Standard error
Asymptotic Normality
Asymptotic Properties
Simulation Study
Entire
Estimator

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Business, Management and Accounting (all)
  • Management Science and Operations Research

Cite this

@article{ca85d16cd98a492d8923de14796e0d1c,
title = "Statistical inference in massive data sets",
abstract = "Analysis of massive data sets is challenging owing to limitations of computer primary memory. In this paper, we propose an approach to estimate population parameters from a massive data set. The proposed approach significantly reduces the required amount of primary memory, and the resulting estimate is as efficient as if the entire data set were analyzed simultaneously. Asymptotic properties of the resulting estimate are studied, and the asymptotic normality of the resulting estimator is established. A standard error formula for the resulting estimate is proposed and empirically tested; thus, statistical inference for parameters of interest can be performed. The effectiveness of the proposed approach is illustrated using simulation studies and an Internet traffic data example.",
author = "Runze Li and Lin, {Dennis K.J.} and Bing Li",
year = "2013",
month = "9",
day = "1",
doi = "10.1002/asmb.1927",
language = "English (US)",
volume = "29",
pages = "399--409",
journal = "Applied Stochastic Models in Business and Industry",
issn = "1524-1904",
publisher = "John Wiley and Sons Ltd",
number = "5",

}

Statistical inference in massive data sets. / Li, Runze; Lin, Dennis K.J.; Li, Bing.

In: Applied Stochastic Models in Business and Industry, Vol. 29, No. 5, 01.09.2013, p. 399-409.

Research output: Contribution to journal › Article

TY  - JOUR
T1  - Statistical inference in massive data sets
AU  - Li, Runze
AU  - Lin, Dennis K.J.
AU  - Li, Bing
PY  - 2013/9/1
Y1  - 2013/9/1
AB  - Analysis of massive data sets is challenging owing to limitations of computer primary memory. In this paper, we propose an approach to estimate population parameters from a massive data set. The proposed approach significantly reduces the required amount of primary memory, and the resulting estimate is as efficient as if the entire data set were analyzed simultaneously. Asymptotic properties of the resulting estimate are studied, and the asymptotic normality of the resulting estimator is established. A standard error formula for the resulting estimate is proposed and empirically tested; thus, statistical inference for parameters of interest can be performed. The effectiveness of the proposed approach is illustrated using simulation studies and an Internet traffic data example.
UR  - http://www.scopus.com/inward/record.url?scp=84885584832&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84885584832&partnerID=8YFLogxK
U2  - 10.1002/asmb.1927
DO  - 10.1002/asmb.1927
M3  - Article
AN  - SCOPUS:84885584832
VL  - 29
SP  - 399
EP  - 409
JO  - Applied Stochastic Models in Business and Industry
JF  - Applied Stochastic Models in Business and Industry
SN  - 1524-1904
IS  - 5
ER  -