TY - JOUR
T1 - An EM-like algorithm for semi- and nonparametric estimation in multivariate mixtures
AU - Benaglia, Tatiana
AU - Chauveau, Didier
AU - Hunter, David R.
N1 - Funding Information: The authors are grateful to the reviewers for their constructive comments and helpful suggestions. This research was partially supported by NSF Award SES-0518772. D. R. Hunter thanks Le Studium, CNRS Orléans, France for additional support of this research.
PY - 2009
Y1 - 2009
AB - We propose an algorithm for nonparametric estimation for finite mixtures of multivariate random vectors that strongly resembles a true EM algorithm. The vectors are assumed to have independent coordinates conditional upon knowing from which mixture component they come, but otherwise their density functions are completely unspecified. Sometimes, the density functions may be partially specified by Euclidean parameters, a case we call semiparametric. Our algorithm is much more flexible and easily applicable than existing algorithms in the literature; it can be extended to any number of mixture components and any number of vector coordinates of the multivariate observations. Thus it may be applied even in situations where the model is not identifiable, so care is called for when using it in situations for which identifiability is difficult to establish conclusively. Our algorithm yields much smaller mean integrated squared errors than an alternative algorithm in a simulation study. In another example using a real dataset, it provides new insights that extend previous analyses. Finally, we present two different variations of our algorithm, one stochastic and one deterministic, and find anecdotal evidence that there is not a great deal of difference between the performance of these two variants. The computer code and data used in this article are available online.
UR - http://www.scopus.com/inward/record.url?scp=69949133535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69949133535&partnerID=8YFLogxK
U2 - 10.1198/jcgs.2009.07175
DO - 10.1198/jcgs.2009.07175
M3 - Article
AN - SCOPUS:69949133535
SN - 1061-8600
VL - 18
SP - 505
EP - 526
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 2
ER -