Machine Learning Study Notes: PRML Chapter 2.0 : Prerequisite 1 - Sufficient Statistics
Christopher M. Bishop, PRML, Chapter 2 Probability Distributions
1. Introduction
In the process of estimating parameters, we summarize, or reduce, the information in a sample of size $n$, $X_1, X_2, \ldots, X_n$, to a single number, such as the sample mean $\bar{x}$. The actual sample values are no longer important to us. That is, if we use a sample mean of $\bar{x} = 3$ to estimate the population mean $\mu$, it doesn't matter whether the original data values were, say, $(1, 3, 5)$ or $(2, 3, 4)$: both samples have the same mean.
Problems:
- Has this process of reducing the $n$ data points to a single number retained all of the information about $\mu$ that was contained in the original $n$ data points?
- Or has some information about the parameter been lost through the process of summarizing the data?
In this lesson, we’ll learn how to find statistics that summarize all of the information in a sample about the desired parameter. Such statistics are called sufficient statistics.
2. Definition of Sufficiency
2.1 Definition:
Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with unknown parameter $\theta$. Then, the statistic
$$Y = u(X_1, X_2, \ldots, X_n)$$
is said to be sufficient for $\theta$ if the conditional distribution of $X_1, X_2, \ldots, X_n$, given the statistic $Y$, i.e.,
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y = y),$$
does not depend on the parameter $\theta$.
- Why called "sufficient"?
- We say that $Y$ is sufficient for $\theta$ because once the value of $Y$ is known (i.e., given $Y$, we already have all of the available information about the unknown parameter $\theta$), no other function of $X_1, X_2, \ldots, X_n$ will provide any additional information about the possible value of $\theta$.
- Sufficiency means that if we know the value of $Y$, we cannot gain any further information about the parameter $\theta$ by considering other functions of the data $X_1, X_2, \ldots, X_n$.
2.2 Example 1 - Binomial Distribution:
Consider Bernoulli trials: Let $X_1, X_2, \ldots, X_n$ be a random sample of $n$ Bernoulli trials in which success has probability $p$ and failure has probability $1-p$, i.e., $P(X_i = 1) = p$ and $P(X_i = 0) = 1-p$, for $i = 1, 2, \ldots, n$. Suppose, in a random sample of size $n$, that $y = \sum_{i=1}^{n} x_i$ successes occur in total. If we know the value of $Y = \sum_{i=1}^{n} X_i$, the number of successes in $n$ trials, can we gain any further information about the parameter $p$ by considering other functions of the data $X_1, X_2, \ldots, X_n$? Or, equivalently, is $Y = \sum_{i=1}^{n} X_i$ sufficient for $p$?
Solution:
The definition of sufficiency tells us that if the conditional distribution of $X_1, X_2, \ldots, X_n$, given the statistic $Y$, does not depend on $p$, then $Y$ is said to be a sufficient statistic for the unknown parameter $p$. The conditional distribution of $X_1, X_2, \ldots, X_n$, given $Y$, is given by:
$$P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \frac{P(X_1 = x_1, \ldots, X_n = x_n)}{P(Y = y)} \tag{2.1}$$
for any $(x_1, \ldots, x_n)$ with $\sum_{i=1}^{n} x_i = y$; if $\sum_{i=1}^{n} x_i \neq y$, the event in the numerator is impossible and the conditional probability is $0$. In the possible case, we have, by independence:
$$P(X_1 = x_1, \ldots, X_n = x_n) = p^{x_1}(1-p)^{1-x_1} \cdots p^{x_n}(1-p)^{1-x_n} = p^{y}(1-p)^{n-y}$$
Now, the denominator in (2.1) is the binomial probability of getting exactly $y$ successes in $n$ trials with a probability of success $p$. That is, the denominator is:
$$P(Y = y) = \binom{n}{y} p^{y} (1-p)^{n-y}$$
Putting the numerator and denominator together, we get:
$$P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \frac{p^{y}(1-p)^{n-y}}{\binom{n}{y} p^{y}(1-p)^{n-y}} = \frac{1}{\binom{n}{y}}$$
Conclusion 1:
We have just shown that the conditional distribution of $X_1, X_2, \ldots, X_n$ given $Y$ does not depend on $p$. Therefore, $Y = \sum_{i=1}^{n} X_i$ is indeed sufficient for $p$. That is, once the value of $Y$ is known, no other function of $X_1, X_2, \ldots, X_n$ will provide any additional information about the possible value of $p$.
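As a sanity check (not part of the original lesson), a short Python simulation can illustrate this conclusion: whatever the value of $p$, the empirical conditional distribution of the individual outcomes given $Y = y$ is approximately uniform over the $\binom{n}{y}$ possible arrangements. The function name `conditional_dist` and the sample sizes are my own illustrative choices.

```python
import random

def conditional_dist(p, n=4, y=2, trials=200_000, seed=0):
    """Empirically estimate P(X_1,...,X_n = x | sum(X) = y) under Bernoulli(p)."""
    rng = random.Random(seed)
    counts = {}
    total = 0
    for _ in range(trials):
        x = tuple(1 if rng.random() < p else 0 for _ in range(n))
        if sum(x) == y:
            counts[x] = counts.get(x, 0) + 1
            total += 1
    return {x: c / total for x, c in counts.items()}

# Whatever p is, each of the C(4,2) = 6 arrangements with two successes
# occurs with conditional probability close to 1/6, i.e. 1 / C(n, y).
for p in (0.2, 0.5, 0.8):
    dist = conditional_dist(p)
    assert len(dist) == 6
    assert all(abs(prob - 1/6) < 0.02 for prob in dist.values())
```

The conditional probabilities cluster around $1/\binom{4}{2} = 1/6$ for every choice of $p$, which is exactly what sufficiency of $Y$ predicts.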
3. Factorization Theorem
3.1 We need an easier method to identify sufficiency:
While the definition of sufficiency may make sense intuitively, it is not always easy to find the conditional distribution of $X_1, X_2, \ldots, X_n$ given $Y$. Not to mention that we'd have to find the conditional distribution of $X_1, X_2, \ldots, X_n$ given $Y$ for every statistic $Y = u(X_1, \ldots, X_n)$ that we'd want to consider as a possible sufficient statistic! Therefore, using the formal definition of sufficiency as a way of identifying a sufficient statistic for a parameter $\theta$ can often be a daunting road to follow. Thankfully, a theorem often referred to as the Factorization Theorem provides an easier alternative!
3.2 Factorization Theorem:
Let $X_1, X_2, \ldots, X_n$ denote random variables with joint probability density function or joint probability mass function $f(x_1, x_2, \ldots, x_n; \theta)$, which depends on the parameter $\theta$. Then, the statistic $Y = u(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if and only if the p.d.f. (or p.m.f.) can be factored into two components, that is:
$$f(x_1, x_2, \ldots, x_n; \theta) = \phi\left[u(x_1, x_2, \ldots, x_n); \theta\right] \cdot h(x_1, x_2, \ldots, x_n)$$
where:
- $\phi$ is a function that depends on the data $x_1, x_2, \ldots, x_n$ only through the function $u(x_1, x_2, \ldots, x_n)$ (and may depend on the parameter $\theta$), and
- the function $h(x_1, x_2, \ldots, x_n)$ does not depend on the parameter $\theta$.
3.3 Example 2 - Poisson Distribution:
Recall that the mathematical constant $e$ is the unique real number such that the value of the derivative (slope of the tangent line) of the function $f(x) = e^{x}$ at the point $x = 0$ is equal to $1$. It turns out that $e$ is irrational, but to five decimal places, it equals $e \approx 2.71828$. Also, note that there are (theoretically) an infinite number of possible Poisson distributions. Any specific Poisson distribution depends on the parameter $\lambda$:
$$f(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution with parameter $\lambda > 0$. Find a sufficient statistic for the parameter $\lambda$.
Solution:
Because $X_1, X_2, \ldots, X_n$ is a random sample, the joint probability mass function of $X_1, X_2, \ldots, X_n$ is, by independence:
$$f(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \left(e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}\right) \cdot \frac{1}{x_1! x_2! \cdots x_n!}$$
Hey, look at that! We just factored the joint p.m.f. into two functions, one ($\phi$) being only a function of the statistic $Y = \sum_{i=1}^{n} X_i$ and the other ($h$) not depending on the parameter $\lambda$:
$$\phi\left[u(x_1, \ldots, x_n); \lambda\right] = e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}, \qquad h(x_1, \ldots, x_n) = \frac{1}{x_1! x_2! \cdots x_n!}$$
So the Factorization Theorem tells us that $Y = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\lambda$. We can also write the joint p.m.f. as:
$$f(x_1, \ldots, x_n; \lambda) = \left(e^{-n\lambda}\lambda^{n\bar{x}}\right) \cdot \frac{1}{x_1! x_2! \cdots x_n!}$$
so the same factorization shows that the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is also a sufficient statistic. If you think about it, it makes sense that $\bar{X}$ and $\sum_{i=1}^{n} X_i$ are both sufficient statistics, because if we know $\bar{X}$, we can easily find $\sum_{i=1}^{n} X_i$, and vice versa.
Conclusion 2:
There can be more than one sufficient statistic for a parameter $\theta$. In general, if $Y$ is a sufficient statistic for a parameter $\theta$, then every one-to-one function of $Y$ not involving $\theta$ is also a sufficient statistic for $\theta$.
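A quick numerical illustration (my own, not from the lesson): if two different samples share the value of the sufficient statistic $\sum_i x_i$, their Poisson likelihood ratio is a constant that does not involve $\lambda$, because the $\lambda$-dependent factor $e^{-n\lambda}\lambda^{\sum x_i}$ cancels, leaving only $h(\mathbf{a})/h(\mathbf{b})$.

```python
import math

def poisson_joint_pmf(xs, lam):
    """Joint p.m.f. of an i.i.d. Poisson(lam) sample: prod_i e^(-lam) lam^x_i / x_i!."""
    return math.prod(math.exp(-lam) * lam ** x / math.factorial(x) for x in xs)

# Two different samples that share the sufficient statistic sum(x) = 6.
a, b = [1, 2, 3], [0, 2, 4]

# The lam-dependent factor e^(-3 lam) lam^6 cancels in the ratio, leaving
# h(a)/h(b) = (0! * 2! * 4!) / (1! * 2! * 3!) = 48 / 12 = 4 for every lam.
ratios = [poisson_joint_pmf(a, lam) / poisson_joint_pmf(b, lam)
          for lam in (0.5, 1.0, 3.0, 7.0)]
```

Since the ratio is free of $\lambda$, observing which of the two samples occurred tells us nothing more about $\lambda$ than the shared value $\sum_i x_i = 6$ already does.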
3.4 Example 3 - Gaussian Distribution $N(\mu, \sigma^2)$:
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, where $\sigma^2$ is regarded as known. Find a sufficient statistic for the parameter $\mu$.
Solution:
For i.i.d. data $x_1, x_2, \ldots, x_n$, the joint probability density function of $X_1, X_2, \ldots, X_n$ is
$$f(x_1, \ldots, x_n; \mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right)$$
A trick to making the factoring of the joint p.d.f. an easier task is to add $0 = -\bar{x} + \bar{x}$ to the quantity in parentheses in the summation. That is:
$$\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}\left[(x_i - \bar{x}) + (\bar{x} - \mu)\right]^2$$
Now, squaring the quantity in parentheses, we get:
$$\sum_{i=1}^{n}(x_i - \bar{x})^2 + 2(\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \bar{x}) + \sum_{i=1}^{n}(\bar{x} - \mu)^2$$
But, the middle term is $0$, because $\sum_{i=1}^{n}(x_i - \bar{x}) = n\bar{x} - n\bar{x} = 0$, and the last term, because it doesn't depend on the index $i$, can be added up $n$ times:
$$\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$
So, simplifying, we get:
$$f(x_1, \ldots, x_n; \mu) = \exp\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right) \cdot \left[\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left(-\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{2\sigma^2}\right)\right]$$
In summary, we have factored the joint p.d.f. into two functions, one ($\phi$) being only a function of the statistic $\bar{X}$ and the other ($h$) not depending on the parameter $\mu$.
Conclusion 3:
- Therefore, the Factorization Theorem tells us that $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a sufficient statistic for $\mu$.
- Now, $\sum_{i=1}^{n} X_i$ is also sufficient for $\mu$, because if we are given the value of $\bar{X}$, we can easily get the value of $\sum_{i=1}^{n} X_i$ through the one-to-one function $y = n\bar{x}$, that is: $\sum_{i=1}^{n} X_i = n\bar{X}$.
- However, $\bar{X}^2$ is not a sufficient statistic for $\mu$, because it is not a one-to-one function of $\bar{X}$, with both $\bar{x}$ and $-\bar{x}$ mapped to $\bar{x}^2$.
3.5 Example 4 - Exponential Distribution:
Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential distribution with parameter $\theta$, i.e., with p.d.f. $f(x; \theta) = \frac{1}{\theta}e^{-x/\theta}$ for $x > 0$ (the mean parametrization). Find a sufficient statistic for the parameter $\theta$.
Solution:
The joint probability density function of $X_1, X_2, \ldots, X_n$ is, by independence:
$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} \frac{1}{\theta}e^{-x_i/\theta} = \left[\frac{1}{\theta^{n}}\exp\left(-\frac{1}{\theta}\sum_{i=1}^{n} x_i\right)\right] \cdot 1$$
Conclusion 4:
Therefore, the Factorization Theorem tells us that $Y = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$. And, since $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a one-to-one function of $\sum_{i=1}^{n} X_i$, it implies that $\bar{X}$ is also a sufficient statistic for $\theta$.
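As a small check (a sketch of my own, assuming the mean parametrization $f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}$ used above), the product of the individual densities equals the factored form $\theta^{-n}\exp(-\sum_i x_i/\theta) \cdot 1$ numerically, so the data enter the likelihood only through $\sum_i x_i$.

```python
import math
import random

# Check the factorization of the joint exponential density numerically:
#   prod_i (1/theta) exp(-x_i/theta) == theta^(-n) * exp(-sum(x_i)/theta) * 1
rng = random.Random(1)
xs = [rng.expovariate(1 / 2.5) for _ in range(10)]  # i.i.d. sample, true mean 2.5

for theta in (0.5, 2.5, 8.0):
    product = math.prod((1 / theta) * math.exp(-x / theta) for x in xs)
    factored = theta ** (-len(xs)) * math.exp(-sum(xs) / theta)
    assert math.isclose(product, factored, rel_tol=1e-9)
```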
4. Exponential Form
4.1 Exponential Form
You might not have noticed that in all of the examples we have considered so far in this lesson, every p.d.f. or p.m.f. could be written in what is often called exponential form, that is:
$$f(x; \theta) = \exp\left[K(x)p(\theta) + S(x) + q(\theta)\right]$$
with the support of $x$ not depending on the parameter $\theta$.
1) Exponential Form of Bernoulli Distribution:
For example, the Bernoulli p.m.f. $f(x; p) = p^{x}(1-p)^{1-x}$, $x \in \{0, 1\}$, is written in exponential form as:
$$f(x; p) = \exp\left[x\ln\frac{p}{1-p} + \ln(1-p)\right]$$
with
- (1) $K(x) = x$ and $S(x) = 0$ being functions only of $x$,
- (2) $p(\theta) = \ln\frac{p}{1-p}$ and $q(\theta) = \ln(1-p)$ being functions only of the parameter $p$, and
- (3) the support $x \in \{0, 1\}$ not depending on the parameter $p$.
2) Exponential Form of Poisson Distribution:
The Poisson p.m.f. $f(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}$ is written in exponential form as:
$$f(x; \lambda) = \exp\left[x\ln\lambda - \ln(x!) - \lambda\right]$$
with
- (1) $K(x) = x$ and $S(x) = -\ln(x!)$ being functions only of $x$,
- (2) $p(\lambda) = \ln\lambda$ and $q(\lambda) = -\lambda$ being functions only of the parameter $\lambda$, and
- (3) the support $x \in \{0, 1, 2, \ldots\}$ not depending on the parameter $\lambda$.
3) Exponential Form of Gaussian Distribution $N(\mu, \sigma^2)$ (with $\sigma^2$ known):
$$f(x; \mu) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \exp\left[x \cdot \frac{\mu}{\sigma^2} - \frac{x^2}{2\sigma^2} - \ln\left(\sigma\sqrt{2\pi}\right) - \frac{\mu^2}{2\sigma^2}\right]$$
with
- (1) $K(x) = x$ and $S(x) = -\frac{x^2}{2\sigma^2} - \ln(\sigma\sqrt{2\pi})$ being functions only of $x$,
- (2) $p(\mu) = \frac{\mu}{\sigma^2}$ and $q(\mu) = -\frac{\mu^2}{2\sigma^2}$ being functions only of the parameter $\mu$, and
- (3) the support $x \in \mathbb{R}$ not depending on the parameter $\mu$.
4) Exponential Form of Exponential Distribution:
$$f(x; \theta) = \frac{1}{\theta}e^{-x/\theta} = \exp\left[-\frac{x}{\theta} - \ln\theta\right]$$
with
- (1) $K(x) = x$ and $S(x) = 0$ being functions only of $x$,
- (2) $p(\theta) = -\frac{1}{\theta}$ and $q(\theta) = -\ln\theta$ being functions only of the parameter $\theta$, and
- (3) the support $x > 0$ not depending on the parameter $\theta$.
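These exponential-form rewrites are simple enough to check numerically. The sketch below (my own, assuming the standard forms $p^x(1-p)^{1-x} = \exp[x\ln\frac{p}{1-p} + \ln(1-p)]$ and $\frac{e^{-\lambda}\lambda^x}{x!} = \exp[x\ln\lambda - \ln(x!) - \lambda]$) confirms the Bernoulli and Poisson cases at several parameter values.

```python
import math

# Bernoulli: p^x (1-p)^(1-x) == exp[x ln(p/(1-p)) + ln(1-p)]
for p in (0.2, 0.7):
    for x in (0, 1):
        direct = p ** x * (1 - p) ** (1 - x)
        exp_form = math.exp(x * math.log(p / (1 - p)) + math.log(1 - p))
        assert math.isclose(direct, exp_form, rel_tol=1e-12)

# Poisson: e^(-lam) lam^x / x! == exp[x ln(lam) - ln(x!) - lam]
for lam in (0.5, 4.0):
    for x in range(6):
        direct = math.exp(-lam) * lam ** x / math.factorial(x)
        exp_form = math.exp(x * math.log(lam) - math.log(math.factorial(x)) - lam)
        assert math.isclose(direct, exp_form, rel_tol=1e-12)
```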
4.2 Exponential Criterion
It turns out that writing p.d.f.s and p.m.f.s in exponential form provides us yet a third way of identifying sufficient statistics for our parameters. The following theorem tells us how.
Theorem:
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a p.d.f. or p.m.f. of the exponential form:
$$f(x; \theta) = \exp\left[K(x)p(\theta) + S(x) + q(\theta)\right]$$
with
- (1) $K(x)$ and $S(x)$ being functions only of $x$,
- (2) $p(\theta)$ and $q(\theta)$ being functions only of the parameter $\theta$, and
- (3) the support being free of the parameter $\theta$.
Then, the statistic:
$$Y = \sum_{i=1}^{n} K(X_i)$$
is sufficient for $\theta$.
Proof:
By independence, the joint p.d.f. (or p.m.f.) of $X_1, X_2, \ldots, X_n$ is
$$\prod_{i=1}^{n} f(x_i; \theta) = \exp\left[p(\theta)\sum_{i=1}^{n} K(x_i) + nq(\theta)\right] \cdot \exp\left[\sum_{i=1}^{n} S(x_i)\right]$$
which is factored into two functions:
- one ($\phi$) being only a function of the statistic $\sum_{i=1}^{n} K(x_i)$ (and the parameter $\theta$), and
- the other ($h$) not depending on the parameter $\theta$.
Therefore, the Factorization Theorem tells us that $Y = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for $\theta$.
4.3 Example 5 - Geometric Distribution:
Let $X_1, X_2, \ldots, X_n$ be a random sample from a geometric distribution with parameter $p$. Find a sufficient statistic for the parameter $p$.
Solution:
The probability mass function of a geometric random variable is:
$$f(x; p) = (1-p)^{x-1}p, \qquad x = 1, 2, 3, \ldots$$
which can be written in exponential form as:
$$f(x; p) = \exp\left[x\ln(1-p) + \ln\frac{p}{1-p}\right]$$
with $K(x) = x$ and a support that does not depend on $p$.
Conclusion 5:
Therefore, the Exponential Criterion tells us that $Y = \sum_{i=1}^{n} X_i$ is sufficient for $p$. Easy as pie!
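A quick check of the exponential-form rewrite used here (my own sketch, assuming the p.m.f. $f(x;p) = (1-p)^{x-1}p$ on $x = 1, 2, \ldots$):

```python
import math

# Check the exponential-form rewrite of the geometric p.m.f.:
#   (1-p)^(x-1) * p == exp[x ln(1-p) + ln(p/(1-p))]
for p in (0.1, 0.5, 0.9):
    for x in range(1, 8):
        direct = (1 - p) ** (x - 1) * p
        exp_form = math.exp(x * math.log(1 - p) + math.log(p / (1 - p)))
        assert math.isclose(direct, exp_form, rel_tol=1e-12)
```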
5. Two or More Parameters
What happens if a probability distribution has two parameters, $\theta_1$ and $\theta_2$, say, for which we want to find sufficient statistics, $Y_1$ and $Y_2$? Fortunately, the definitions of sufficiency can easily be extended to accommodate two (or more) parameters. Let's start by extending the Factorization Theorem.
5.1 Factorization Theorem
Let $X_1, X_2, \ldots, X_n$ denote random variables with joint p.d.f. or p.m.f. $f(x_1, \ldots, x_n; \theta_1, \theta_2)$, which depends on the parameters $\theta_1$ and $\theta_2$. Then, the statistics $Y_1 = u_1(X_1, \ldots, X_n)$ and $Y_2 = u_2(X_1, \ldots, X_n)$ are jointly sufficient for $\theta_1$ and $\theta_2$ if and only if:
$$f(x_1, \ldots, x_n; \theta_1, \theta_2) = \phi\left[u_1(x_1, \ldots, x_n), u_2(x_1, \ldots, x_n); \theta_1, \theta_2\right] \cdot h(x_1, \ldots, x_n)$$
where $\phi$ depends on the data only through the functions $u_1$ and $u_2$, and $h$ does not depend on either parameter.
5.2 Example 6 - Gaussian Distribution $N(\theta_1, \theta_2)$:
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution $N(\theta_1, \theta_2)$. That is, $\theta_1$ denotes the mean $\mu$ and $\theta_2$ denotes the variance $\sigma^2$. Use the Factorization Theorem to find joint sufficient statistics for $\theta_1$ and $\theta_2$.
Solution:
The joint probability density function of $X_1, X_2, \ldots, X_n$ is, by independence:
$$f(x_1, \ldots, x_n; \theta_1, \theta_2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta_2}}\exp\left(-\frac{(x_i - \theta_1)^2}{2\theta_2}\right) = (2\pi\theta_2)^{-n/2}\exp\left(-\frac{1}{2\theta_2}\sum_{i=1}^{n}(x_i - \theta_1)^2\right)$$
Expanding the square in the exponent, we get:
$$f(x_1, \ldots, x_n; \theta_1, \theta_2) = (2\pi\theta_2)^{-n/2}\exp\left(-\frac{1}{2\theta_2}\left[\sum_{i=1}^{n} x_i^2 - 2\theta_1\sum_{i=1}^{n} x_i + n\theta_1^2\right]\right)$$
Simplifying yet more, we get:
$$f(x_1, \ldots, x_n; \theta_1, \theta_2) = \phi\left[\sum_{i=1}^{n} x_i, \sum_{i=1}^{n} x_i^2; \theta_1, \theta_2\right] \cdot 1$$
Look at that! We have factored the joint p.d.f. into two functions, one ($\phi$) being only a function of the statistics $Y_1 = \sum_{i=1}^{n} X_i$ and $Y_2 = \sum_{i=1}^{n} X_i^2$, and the other ($h = 1$) not depending on the parameters $\theta_1$ and $\theta_2$.
Conclusion 6.1:
- Therefore, the Factorization Theorem tells us that $Y_1 = \sum_{i=1}^{n} X_i$ and $Y_2 = \sum_{i=1}^{n} X_i^2$ are joint sufficient statistics for $\theta_1$ and $\theta_2$.
- And, the one-to-one functions of $Y_1$ and $Y_2$, namely:
$$\bar{X} = \frac{Y_1}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \text{and} \qquad S^2 = \frac{1}{n-1}\left(Y_2 - \frac{Y_1^2}{n}\right) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
are also joint sufficient statistics for $\theta_1$ and $\theta_2$.
- We have just shown that the intuitive estimators of $\mu$ and $\sigma^2$ are also sufficient estimators. That is, the data contain no more information than the estimators $\bar{X}$ and $S^2$ do about the parameters $\mu$ and $\sigma^2$. That seems like a good thing!
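To see joint sufficiency concretely, here is a small illustration of my own: two different samples that agree on both $\sum_i x_i$ and $\sum_i x_i^2$ have identical normal likelihoods at every $(\mu, \sigma^2)$, so beyond those two statistics the data carry no extra information about the parameters.

```python
import math

def normal_loglik(xs, mu, var):
    """Log-likelihood of an i.i.d. N(mu, var) sample."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))

# Two different samples that share both joint sufficient statistics:
# sum = 6 and sum of squares = 18 for each.
a, b = [0.0, 3.0, 3.0], [1.0, 1.0, 4.0]
assert sum(a) == sum(b) and sum(v * v for v in a) == sum(v * v for v in b)

# Their likelihoods agree at every (mu, var): the data enter the normal
# likelihood only via (sum x, sum x^2).
for mu in (-1.0, 0.0, 2.0):
    for var in (0.5, 1.0, 4.0):
        assert abs(normal_loglik(a, mu, var) - normal_loglik(b, mu, var)) < 1e-12
```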
5.3 Exponential Criterion
We have just extended the Factorization Theorem. Now, the Exponential Criterion can also be extended to accommodate two (or more) parameters. It is stated here without proof.
Exponential Criterion:
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a p.d.f. or p.m.f. of the form:
$$f(x; \theta_1, \theta_2) = \exp\left[K_1(x)p_1(\theta_1, \theta_2) + K_2(x)p_2(\theta_1, \theta_2) + S(x) + q(\theta_1, \theta_2)\right]$$
with a support that does not depend on the parameters. Then, the statistics $Y_1 = \sum_{i=1}^{n} K_1(X_i)$ and $Y_2 = \sum_{i=1}^{n} K_2(X_i)$ are joint sufficient statistics for $\theta_1$ and $\theta_2$.
5.4 Example 6 - Gaussian Distribution $N(\theta_1, \theta_2)$ (continued):
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution $N(\theta_1, \theta_2)$. That is, $\theta_1$ denotes the mean $\mu$ and $\theta_2$ denotes the variance $\sigma^2$. Use the Exponential Criterion to find joint sufficient statistics for $\theta_1$ and $\theta_2$.
Solution:
The probability density function of a normal random variable with mean $\theta_1$ and variance $\theta_2$ can be written in exponential form as:
$$f(x; \theta_1, \theta_2) = \exp\left[x^2\left(-\frac{1}{2\theta_2}\right) + x\left(\frac{\theta_1}{\theta_2}\right) - \frac{\theta_1^2}{2\theta_2} - \frac{1}{2}\ln(2\pi\theta_2)\right]$$
with $K_1(x) = x^2$, $K_2(x) = x$, and $S(x) = 0$.
Conclusion 6.2:
Therefore, the Exponential Criterion tells us that the statistics $Y_1 = \sum_{i=1}^{n} X_i^2$ and $Y_2 = \sum_{i=1}^{n} X_i$ are joint sufficient statistics for $\theta_1$ and $\theta_2$.
6. Reference
[1]: Lesson 53: Sufficient Statistics (https://onlinecourses.science.psu.edu/stat414/print/book/export/html/244)