Estimating Unobserved Probability and the Number of Unobserved Outcomes of an Experiment
Suppose that an experiment with an unknown, possibly infinite, number of outcomes is performed and that these outcomes occur according to some random mechanism. Suppose that n independent trials are carried out and that N distinct outcomes are observed. We attempt to answer the following questions: What is the probability that the next trial produces an outcome not observed before? (This is called the problem of unobserved probability.) What is the total number of outcomes not yet observed? This second problem has a long history going back to Turing and is, apart from its mathematical interest, important in many areas such as biology (species sampling), numismatics, and literary scholarship. We'll give a brief survey of past work and also discuss recent joint work with Alberto Gandolfi in which a Bayes-like estimator for the number of unobserved outcomes is derived. Its advantage over the existing estimators, due to Chao and Lee and others, is that it is derived from first principles, without any ad hoc assumptions beyond Turing's ansatz (which everyone else uses as well), and that it includes previous estimators as special cases. We'll also briefly discuss an almost complete classification of infinite discrete probability measures, which emerges as a by-product of a solution we have obtained for the problem of unobserved probability.
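To make the two questions concrete, here is a minimal sketch of the classical starting point, not of the Bayes-like estimator discussed in the talk. It assumes the standard Good-Turing estimate, which takes the unobserved probability to be the fraction of trials whose outcome was seen exactly once, and a simple coverage-based estimate N / (1 - n1/n) for the total number of outcomes. The function name `good_turing_summary` and the returned keys are hypothetical, chosen only for illustration.

```python
from collections import Counter


def good_turing_summary(sample):
    """Classical estimates from a list of observed outcomes (illustrative only)."""
    n = len(sample)                      # number of trials
    if n == 0:
        raise ValueError("sample must contain at least one trial")

    counts = Counter(sample)             # outcome -> number of occurrences
    N = len(counts)                      # number of distinct outcomes observed
    n1 = sum(1 for c in counts.values() if c == 1)  # outcomes seen exactly once

    # Good-Turing estimate of the probability that the next trial is new.
    unobserved_probability = n1 / n

    # Estimated sample coverage: total probability of the outcomes seen so far.
    coverage = 1.0 - unobserved_probability

    # Simple coverage-based estimate of the number of unobserved outcomes.
    # If every outcome is a singleton the coverage estimate is 0 and the
    # estimate is unbounded.
    if coverage > 0:
        unobserved_outcomes = N / coverage - N
    else:
        unobserved_outcomes = float("inf")

    return {
        "n": n,
        "N": N,
        "n1": n1,
        "unobserved_probability": unobserved_probability,
        "unobserved_outcomes": unobserved_outcomes,
    }


if __name__ == "__main__":
    print(good_turing_summary(list("aabbbcd")))
```

The coverage-based count implicitly treats the unobserved outcomes as roughly as likely as the observed ones, which is the kind of ad hoc assumption the abstract says the new estimator avoids.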