TY - GEN

T1 - Strong lower bounds for approximating distribution support size and the distinct elements problem

AU - Raskhodnikova, Sofya

AU - Ron, Dana

AU - Shpilka, Amir

AU - Smith, Adam

PY - 2007/12/1

Y1 - 2007/12/1

N2 - We consider the problem of approximating the support size of a distribution from, a small number of samples, when each element in the distribution appears with probability at least 1/n. This problem is closely related to the problem of approximating the number of distinct elements in a sequence of length n. For both problems, we prove a nearly linear in n lower bound on the query complexity, applicable even for approximation with additive error. At the heart of the lower bound is a construction of two positive integer random variables, X1 and X2, with very different expectations and the following condition on the first k moments: E[X1]/E[X2] =E[X12]/ E[X22] = ... = E[X 1k]/ E[X2k]. Our lower bound method is also applicable to other problems. In particular, it gives new lower bounds for the sample complexity of (1) approximating the entropy of a distribution and (2) approximating how well a given string is compressed by the Lempel-Ziv scheme.

AB - We consider the problem of approximating the support size of a distribution from, a small number of samples, when each element in the distribution appears with probability at least 1/n. This problem is closely related to the problem of approximating the number of distinct elements in a sequence of length n. For both problems, we prove a nearly linear in n lower bound on the query complexity, applicable even for approximation with additive error. At the heart of the lower bound is a construction of two positive integer random variables, X1 and X2, with very different expectations and the following condition on the first k moments: E[X1]/E[X2] =E[X12]/ E[X22] = ... = E[X 1k]/ E[X2k]. Our lower bound method is also applicable to other problems. In particular, it gives new lower bounds for the sample complexity of (1) approximating the entropy of a distribution and (2) approximating how well a given string is compressed by the Lempel-Ziv scheme.

UR - http://www.scopus.com/inward/record.url?scp=46749118848&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=46749118848&partnerID=8YFLogxK

U2 - 10.1109/FOCS.2007.4389525

DO - 10.1109/FOCS.2007.4389525

M3 - Conference contribution

AN - SCOPUS:46749118848

SN - 0769530109

SN - 9780769530109

T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

SP - 559

EP - 569

BT - Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2007

T2 - 48th Annual Symposium on Foundations of Computer Science, FOCS 2007

Y2 - 20 October 2007 through 23 October 2007

ER -