Notes on the Chi-Square Distribution

The chi-square distribution is a continuous probability distribution that is widely used in statistical inference, particularly in the context of hypothesis testing and in the construction of confidence intervals. It arises primarily in the context of estimating the variance of a normally distributed population and in the testing of independence in contingency tables.

Here are some key points about the chi-square distribution:

1. **Definition:** The chi-square distribution with \( k \) degrees of freedom is the distribution of a sum of the squares of \( k \) independent standard normal random variables. If \( Z_1, Z_2, ..., Z_k \) are independent standard normal random variables, then the random variable \( Q \), defined as \( Q = Z_1^2 + Z_2^2 + ... + Z_k^2 \), follows a chi-square distribution with \( k \) degrees of freedom, denoted as \( Q \sim \chi^2(k) \).

2. **Shape:** The shape of the chi-square distribution depends on the degrees of freedom. With 1 degree of freedom, the distribution is heavily skewed to the right, but as the degrees of freedom increase, the distribution becomes more symmetric and approaches a normal distribution.

3. **Non-negativity:** Since it is defined as a sum of squares, the chi-square distribution is always non-negative.

4. **Applications:**
- In hypothesis testing, it is used for tests of independence and goodness-of-fit.
- In confidence interval estimation, it helps to construct intervals for population variances.
- It's also used in the analysis of variance (ANOVA).

5. **Probability Density Function (PDF):** The PDF of a chi-square distribution with \( k \) degrees of freedom is given by:

\[ f(x; k) = \frac{1}{2^{k/2}\Gamma(k/2)} x^{(k/2)-1} e^{-x/2} \]

for \( x > 0 \), where \( \Gamma \) denotes the gamma function, which extends the factorial function to non-integer values.

6. **Mean and Variance:** The mean of a chi-square distribution is equal to its degrees of freedom (\( k \)), and its variance is twice its degrees of freedom (\( 2k \)).

The chi-square distribution is a special case of the gamma distribution and plays a crucial role in various statistical methodologies.
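As a quick sanity check on the definition above, here is a minimal Python sketch (assuming `numpy` and `scipy` are installed) that simulates sums of \( k \) squared standard normals and compares the empirical mean and variance with the theoretical values \( k \) and \( 2k \):

```python
import numpy as np
from scipy import stats

k = 5                # degrees of freedom
n_draws = 100_000    # number of simulated chi-square variates

rng = np.random.default_rng(42)
# Sum of squares of k independent standard normals -> chi-square(k)
z = rng.standard_normal((n_draws, k))
q = (z ** 2).sum(axis=1)

print(f"empirical mean {q.mean():.3f} vs theoretical {k}")
print(f"empirical variance {q.var():.3f} vs theoretical {2 * k}")

# scipy.stats.chi2 exposes the same distribution directly
print("scipy mean, variance:", stats.chi2.mean(df=k), stats.chi2.var(df=k))
```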


The chi-square test statistic used specifically for inference about the variance of a normally distributed population is \( \frac{(n-1)s^2}{\sigma^2} \). This is different from the statistic used in the chi-square goodness-of-fit test or the test of independence.

Here is the context for this test statistic:

1. **Purpose:** This test statistic is used when you want to test a hypothesis about the population variance \( \sigma^2 \). Specifically, you might be testing whether the population variance differs from a hypothesized value, which appears as \( \sigma^2 \) in the formula below.

2. **Sample Variance (\( s^2 \)):** This is calculated from your sample data as:

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

where \( x_i \) are the sample data points, \( \bar{x} \) is the sample mean, and \( n \) is the sample size.

3. **Population Variance (\( \sigma^2 \)):** This is the hypothesized value for the population variance under the null hypothesis.

4. **Test Statistic:** The test statistic for this hypothesis test is given by:

\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]

where \( n-1 \) is the degrees of freedom associated with the sample variance.

5. **Chi-Square Distribution:** When the null hypothesis is true (i.e., the sample comes from a normal distribution with variance \( \sigma^2 \)), the test statistic follows a chi-square distribution with \( n-1 \) degrees of freedom.

We use this particular test statistic because of the properties of the sample variance: if the data are normally distributed, then \( (n-1)s^2/\sigma^2 \) follows a chi-square distribution with \( n-1 \) degrees of freedom. This allows us to calculate p-values and make decisions about the null hypothesis.

If you compute this statistic from your data and compare it to a critical value from the chi-square distribution with \( n-1 \) degrees of freedom, you can determine whether to reject or fail to reject your null hypothesis about the population variance.
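To see this distributional fact concretely, here is a minimal simulation sketch (again assuming `numpy` and `scipy`): it repeatedly draws normal samples, computes \( (n-1)s^2/\sigma^2 \) for each, and checks a few empirical quantiles against those of \( \chi^2(n-1) \):

```python
import numpy as np
from scipy import stats

n, sigma = 10, 2.0   # sample size and true population standard deviation
n_reps = 50_000      # number of simulated samples

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=sigma, size=(n_reps, n))
s2 = samples.var(axis=1, ddof=1)      # sample variance with n-1 divisor
stat = (n - 1) * s2 / sigma**2        # should follow chi-square with n-1 df

# Compare empirical quantiles against chi2(n-1)
for p in (0.05, 0.5, 0.95):
    emp = np.quantile(stat, p)
    theo = stats.chi2.ppf(p, df=n - 1)
    print(f"p={p:.2f}: empirical {emp:.3f} vs chi2 {theo:.3f}")
```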


An example: 

To compute the test statistic and compare it to a critical value from the chi-square distribution, you need to follow these steps:

1. **Calculate the sample variance (\( s^2 \)):**
Use your sample data to calculate \( s^2 \) with the formula:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
where \( x_i \) are the individual sample points, \( \bar{x} \) is the sample mean, and \( n \) is the sample size.

2. **Compute the test statistic:**
Use the sample variance and the hypothesized population variance (\( \sigma^2 \)) to compute the test statistic:
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]

3. **Determine the critical value:**
Decide on your significance level (\( \alpha \)), commonly 0.05 (5%). Then, find the critical value from the chi-square distribution table with \( n-1 \) degrees of freedom. This value is the cutoff point beyond which you will reject the null hypothesis (this describes an upper-tail test; a two-sided test uses both a lower and an upper critical value).

4. **Compare the test statistic to the critical value:**
- If \( \chi^2 \) is greater than the critical value, you reject the null hypothesis.
- If \( \chi^2 \) is less than or equal to the critical value, you fail to reject the null hypothesis.

Let's go through a simple example:

Suppose you have a sample of 5 measurements: 10, 12, 14, 16, 18. You want to test whether these data come from a population with a variance of 4 (\( \sigma^2 = 4 \)) at a significance level of 0.05.

**Step 1: Calculate the sample variance (\( s^2 \)):**

First, calculate the mean \( \bar{x} \):
\[ \bar{x} = \frac{10 + 12 + 14 + 16 + 18}{5} = 14 \]

Next, calculate \( s^2 \):
\[ s^2 = \frac{(10-14)^2 + (12-14)^2 + (14-14)^2 + (16-14)^2 + (18-14)^2}{5-1} \]
\[ s^2 = \frac{(-4)^2 + (-2)^2 + 0^2 + 2^2 + 4^2}{4} \]
\[ s^2 = \frac{16 + 4 + 0 + 4 + 16}{4} \]
\[ s^2 = \frac{40}{4} = 10 \]

**Step 2: Compute the test statistic:**

\[ \chi^2 = \frac{(5-1)10}{4} = \frac{40}{4} = 10 \]

**Step 3: Determine the critical value:**

Looking at a chi-square distribution table for \( n-1 = 5-1 = 4 \) degrees of freedom and \( \alpha = 0.05 \), you find that the critical value is approximately 9.488.

**Step 4: Compare the test statistic to the critical value:**

Since our calculated \( \chi^2 = 10 \) is greater than the critical value of approximately 9.488, we reject the null hypothesis that the population variance is 4 (using an upper-tail test).

This is a simplified example to illustrate the process. In practice, you would use larger samples and more precise methods (like software or a calculator) for finding the critical values from the chi-square distribution.
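For reference, the whole example can be reproduced in a few lines of Python (a sketch assuming `scipy` is available; `scipy.stats.chi2.ppf` replaces the table lookup):

```python
import numpy as np
from scipy import stats

data = np.array([10, 12, 14, 16, 18])
sigma2_0 = 4.0   # hypothesized population variance
alpha = 0.05

n = data.size
s2 = data.var(ddof=1)                # sample variance (n-1 divisor) -> 10.0
chi2_stat = (n - 1) * s2 / sigma2_0  # test statistic -> 10.0

crit = stats.chi2.ppf(1 - alpha, df=n - 1)    # upper-tail critical value -> ~9.488
p_value = stats.chi2.sf(chi2_stat, df=n - 1)  # upper-tail p-value -> ~0.040

print(f"chi2 = {chi2_stat:.3f}, critical = {crit:.3f}, p = {p_value:.4f}")
print("reject H0" if chi2_stat > crit else "fail to reject H0")
```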


In other contexts, the chi-square test statistic typically takes the form:

\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]

Here's how the chi-square test statistic is used in different contexts:

1. **Goodness-of-Fit Test:**
In a goodness-of-fit test, you are testing how well your observed data fit an expected distribution. The "Observed" values are the counts you have actually observed in your data. The "Expected" values are what you would expect to observe if the null hypothesis were true (usually based on theoretical probabilities). Each term in the sum corresponds to a different category or outcome.

2. **Test of Independence:**
In a test of independence (often used with contingency tables), you're looking at whether two categorical variables are independent of each other. The "Observed" values are the counts in each cell of the table, and the "Expected" values are calculated based on the product of the marginal totals divided by the grand total.

3. **Homogeneity Test:**
Similar to the test of independence, this test compares the distribution of counts across different populations using a contingency table. The computation of the "Expected" values is similar to that of the test of independence.

For all these tests, if the chi-square test statistic is sufficiently large, it suggests that there is a significant difference between the observed and expected values, leading to the rejection of the null hypothesis. The degrees of freedom for the chi-square distribution used to determine this statistic depend on the number of categories or cells minus any constraints required by the null hypothesis.

The critical value for \( \chi^2 \) is determined by the desired significance level (commonly 0.05) and the degrees of freedom. If the calculated \( \chi^2 \) statistic is greater than the critical value from the chi-square distribution with the appropriate degrees of freedom, the null hypothesis is rejected.


Now let's consider a more realistic example, in which the observed frequencies do not exactly match the expected frequencies.

**Example: Goodness-of-Fit Test for a Six-Sided Die**

Suppose you suspect that a six-sided die is biased. To test this, you roll the die 120 times and record the following observed counts for each face:

- Face 1: 16 times
- Face 2: 20 times
- Face 3: 25 times
- Face 4: 19 times
- Face 5: 18 times
- Face 6: 22 times

If the die is fair, you would expect each face to come up \( \frac{1}{6} \) of the time. Therefore, the expected frequency for each face is \( \frac{120}{6} = 20 \) times.

Now, calculate the chi-square test statistic:

\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]

Substituting in our observed and expected frequencies:

\[ \chi^2 = \frac{(16-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20} \]

\[ \chi^2 = \frac{(-4)^2}{20} + \frac{0^2}{20} + \frac{5^2}{20} + \frac{(-1)^2}{20} + \frac{(-2)^2}{20} + \frac{2^2}{20} \]

\[ \chi^2 = \frac{16}{20} + 0 + \frac{25}{20} + \frac{1}{20} + \frac{4}{20} + \frac{4}{20} \]

\[ \chi^2 = 0.8 + 0 + 1.25 + 0.05 + 0.2 + 0.2 \]

\[ \chi^2 = 2.5 \]

You would then compare this \( \chi^2 \) value to the critical value from the chi-square distribution table with \( k - 1 = 6 - 1 = 5 \) degrees of freedom, where \( k \) is the number of categories, at your chosen significance level (commonly 0.05).

Suppose the critical value for \( \chi^2_{(5)} \) at a significance level of 0.05 is approximately 11.07. Since our calculated \( \chi^2 \) value of 2.5 is less than the critical value of 11.07, we would fail to reject the null hypothesis and conclude that there isn't enough evidence to suggest the die is biased based on our sample.
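scipy ships this test as `scipy.stats.chisquare`, so the hand calculation above can be checked directly (a sketch assuming `scipy` is installed):

```python
from scipy import stats

observed = [16, 20, 25, 19, 18, 22]   # counts from the 120 rolls
expected = [20] * 6                   # fair die: 120 / 6 per face

result = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# chi2 = 2.50 with 5 df; p is roughly 0.78, consistent with a fair die
```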


Next, let's perform a chi-square test of independence using a hypothetical example.

**Example: Chi-Square Test of Independence for Preference by Gender**

Imagine we conducted a survey to understand if there is an association between gender and preference for three different flavors of ice cream: Chocolate, Vanilla, and Strawberry. The respondents were asked to choose their favorite flavor, and the results were as follows:

| Gender | Chocolate | Vanilla | Strawberry | Total |
|------------|-----------|---------|------------|-------|
| Male | 30 | 10 | 10 | 50 |
| Female | 20 | 20 | 10 | 50 |
| Total | 50 | 30 | 20 | 100 |

To perform the chi-square test of independence, follow these steps:

1. **Calculate Expected Frequencies:**
The expected frequency for each cell in a contingency table is calculated by multiplying the row total by the column total and then dividing by the grand total.

For example, the expected frequency for males who prefer chocolate is:
\( E_{\text{Chocolate, Male}} = \frac{\text{Total Male} \times \text{Total Chocolate}}{\text{Grand Total}} = \frac{50 \times 50}{100} = 25 \)

You would perform this calculation for each cell.

2. **Compute the Chi-Square Statistic:**
Use the formula:
\( \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \)

For males who prefer chocolate:
\( \chi^2_{\text{Chocolate, Male}} = \frac{(30 - 25)^2}{25} = \frac{25}{25} = 1 \)

Repeat this for every cell and sum the values to get the total chi-square statistic.

3. **Determine Degrees of Freedom:**
The degrees of freedom for a test of independence are calculated as:
\( df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1) \)

In this case:
\( df = (2 - 1) \times (3 - 1) = 1 \times 2 = 2 \)

4. **Find the Critical Value or P-Value:**
Using a chi-square distribution table or calculator, find the critical value for your chosen significance level (commonly 0.05) with the calculated degrees of freedom. Alternatively, compute the p-value.

5. **Make a Decision:**
If the chi-square statistic is greater than the critical value, or if the p-value is less than the significance level, reject the null hypothesis that there is no association between gender and ice cream flavor preference.

Let's compute the chi-square statistic for all cells:

\[ \chi^2 = \frac{(30-25)^2}{25} + \frac{(10-15)^2}{15} + \frac{(10-10)^2}{10} + \frac{(20-25)^2}{25} + \frac{(20-15)^2}{15} + \frac{(10-10)^2}{10} \]

\[ \chi^2 = 1 + 1.667 + 0 + 1 + 1.667 + 0 \]

\[ \chi^2 \approx 5.333 \]

The critical value from the chi-square distribution table for \( df = 2 \) at significance level \( \alpha = 0.05 \) is approximately 5.991. Since our chi-square statistic (5.333) is less than the critical value (5.991), we fail to reject the null hypothesis at this level of significance, concluding that there is not enough evidence to suggest an association between gender and ice cream flavor preference in the population sampled.
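`scipy.stats.chi2_contingency` performs this entire procedure, including the expected-frequency calculation (a sketch assuming `scipy`; by default it applies Yates' continuity correction only to 2×2 tables, so none is applied here):

```python
import numpy as np
from scipy import stats

# Rows: Male, Female; columns: Chocolate, Vanilla, Strawberry
observed = np.array([[30, 10, 10],
                     [20, 20, 10]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print("expected frequencies:\n", expected)
# chi2 is about 5.333 with 2 df; p is about 0.07, so we fail to reject at 0.05
```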


Finally, let's perform a chi-square test for homogeneity using a hypothetical example.

**Example: Chi-Square Test for Homogeneity for Educational Preferences Across Different Cities**

Imagine an education department wants to determine if the preference for a particular type of educational program (Science, Arts, Commerce) is the same across three different cities. They sample a number of students in each city and ask them about their preferred educational program. The results are as follows:

| City | Science | Arts | Commerce | Total |
|------------|---------|------|----------|-------|
| City A | 90 | 60 | 50 | 200 |
| City B | 80 | 70 | 50 | 200 |
| City C | 75 | 55 | 70 | 200 |
| Total | 245 | 185 | 170 | 600 |

To perform the chi-square test for homogeneity, follow these steps:

1. **Calculate Expected Frequencies:**
The expected frequency for each cell in the table is calculated by multiplying the row total by the column total and then dividing by the grand total.

For example, the expected frequency for Science in City A is:
\( E_{\text{Science, City A}} = \frac{\text{Total City A} \times \text{Total Science}}{\text{Grand Total}} = \frac{200 \times 245}{600} \approx 81.67 \)

Perform this calculation for each cell.

2. **Compute the Chi-Square Statistic:**
Use the formula:
\( \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \)

For Science in City A:
\( \chi^2_{\text{Science, City A}} = \frac{(90 - E_{\text{Science, City A}})^2}{E_{\text{Science, City A}}} \)

Repeat this for every cell and sum the values to get the total chi-square statistic.

3. **Determine Degrees of Freedom:**
The degrees of freedom for a test of homogeneity are calculated as:
\( df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1) \)

In this case:
\( df = (3 - 1) \times (3 - 1) = 2 \times 2 = 4 \)

4. **Compare to Critical Value:**
Determine the critical value from the chi-square distribution table with the calculated degrees of freedom at your chosen significance level (commonly 0.05). Compare your chi-square statistic to this critical value to decide whether to reject the null hypothesis.

The null hypothesis for the chi-square test for homogeneity states that there are no differences in preferences across the groups being compared—in this case, the students from different cities. If your chi-square statistic is greater than the critical value, you would reject the null hypothesis, suggesting that preferences do vary significantly by city. If it is lower, you would fail to reject the null hypothesis, suggesting no significant difference in preferences across cities.
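Computationally, the homogeneity test is identical to the independence test, so `scipy.stats.chi2_contingency` applies here as well (a sketch assuming `scipy`; the values in the comments are approximate):

```python
import numpy as np
from scipy import stats

# Rows: City A, City B, City C; columns: Science, Arts, Commerce
observed = np.array([[90, 60, 50],
                     [80, 70, 50],
                     [75, 55, 70]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
# chi2 comes out around 8.03 with 4 df; the 0.05 critical value is 9.488,
# so we fail to reject the null hypothesis of identical preferences across cities.
```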

posted @ 2024-02-14 17:13 热爱工作的宁致桑