We consider the problem of approximating the entropy of a discrete distribution under several models. If the distribution is given explicitly as an array, where the i-th location holds the probability of the i-th element, then linear time is both necessary and sufficient for approximating the entropy. We next consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a γ-multiplicative approximation to the entropy can be obtained in O(n^((1+η)/γ²) · poly(log n)) time for distributions with entropy Ω(γ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model; even for the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^(max(1/(2γ²), 2/(5γ²−2)))). We then consider a hybrid model in which both the explicit distribution and independent samples are available. Here, significantly more efficient algorithms can be achieved: a γ-multiplicative approximation to the entropy can be obtained in O(γ² log² n / (h²(γ−1)²)) time for distributions with entropy Ω(h); we also show a lower bound of Ω(log n / (h(γ²−1))). Finally, we consider two special families of distributions: those in which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
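To make the two basic access models concrete, here is a minimal Python sketch: an exact linear-time entropy computation for the explicit model, and a naive plug-in estimate for the sample-only model. This only illustrates the models; it is not the γ-multiplicative approximation algorithm analyzed in the paper, and the function names are ours.

```python
import math
import random

def entropy_explicit(p):
    """Shannon entropy (base 2) of an explicitly given distribution.
    One pass over the array: linear time in the domain size, matching
    the explicit model in the abstract."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def entropy_from_samples(samples):
    """Naive plug-in estimate for the sample-only model: build the
    empirical distribution from the samples and return its entropy.
    NOT the paper's sublinear approximation algorithm."""
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    m = len(samples)
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

# Example on a domain of size 4.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy_explicit(p))  # 1.75 bits

samples = random.choices(range(len(p)), weights=p, k=100_000)
print(entropy_from_samples(samples))  # close to 1.75 for large sample sizes
```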