10 Two-Population Inferences

Learning Objectives

After completing this chapter, you will be able to:

Construct and interpret confidence intervals for the difference between two population means with large samples
Apply pooled variance methods for small sample comparisons when population variances are equal
Use adjusted degrees of freedom methods when population variances are unequal
Implement paired-sample analysis for before-after or matched-pairs designs
Estimate and test differences between two population proportions
Determine appropriate sample sizes for two-population studies
Conduct hypothesis tests comparing two means using independent samples
Perform paired-sample hypothesis tests
Test hypotheses about differences between two proportions
Apply two-population inference methods to real business decision-making scenarios

10.1 Opening Scenario: U.S. Foreign Investment Strategy

International Investment Analyst Challenge

Context: October 1996 - Fortune Magazine Analysis

Fortune magazine published a series of articles examining trends in U.S. foreign trade, focusing on massive international transactions and the competition between Europe and Asia for American investment dollars.

The Numbers:

European Investment: $364 billion in 1996, up 17% from previous year’s record
Asian Investment: Over $100 billion, representing 16% growth in U.S. commercial participation

The Debate:

These articles challenged conventional wisdom that U.S. companies preferred investing in Asia’s rapidly growing economies over Europe’s established markets. Instead, the analysis suggested American business interests still view Europe as offering more lucrative opportunities for corporate growth.

Your Role:

You are an international analyst for a major U.S. corporation. You must prepare a comprehensive comparative report on the advantages of investing in each geopolitical region. This report will be presented to division executives who will decide the future course of foreign investment for the coming years.

graph TD
    A[Inferences about<br/>Two Populations] --> B[Interval Estimation]
    A --> C[Hypothesis Testing]
    
    B --> B1[Independent<br/>Samples]
    B --> B2[Paired<br/>Sampling]
    B --> B3[Difference between<br/>Two Proportions]
    
    B1 --> B1a[Estimation with<br/>Large Samples]
    B1 --> B1b[Equal Variances -<br/>Pooled Data]
    B1 --> B1c[Unequal<br/>Variances]
    
    C --> C1[Independent<br/>Samples]
    C --> C2[Paired<br/>Samples]
    C --> C3[Tests for the<br/>Difference between<br/>Two Proportions]
    
    C1 --> C1a[Tests with<br/>Large Samples]
    C1 --> C1b[Equal Variances -<br/>Pooled Data]
    C1 --> C1c[Unequal<br/>Variances]
    
    style A fill:#333,stroke:#000,stroke-width:4px,color:#fff
    style B fill:#fff,stroke:#000,stroke-width:3px
    style C fill:#fff,stroke:#000,stroke-width:3px

Chapter 9: Two-Population Inferences Structure

10.1.1 Your Analysis Requirements

To guide the investment decision, you must:

Compare average return on investment in Europe versus Asia
Determine which region has a lower percentage of failed investment projects
Estimate average investment levels in both Europe and Asia
Provide thorough comparison of all relevant financial measures in these two foreign markets

This chapter provides the statistical tools you’ll need to make these critical comparisons.

10.2 9.1 Introduction: Why Compare Two Populations?

Chapters 7 and 8 demonstrated how to construct confidence intervals and test hypotheses for a single population. However, many of the most important business questions require comparing two populations:

Real-World Comparison Questions

Manufacturing: What’s the difference, if any, between the average durability of ski boots made by North Slope versus those produced by Head?

Operations: Do workers in Plant A produce more on average than workers in Plant B?

Quality Control: Is there a difference between the proportion of defective units produced by Method 1 versus Method 2?

Marketing: Does advertising campaign A generate more sales than campaign B?

Human Resources: Are employee satisfaction scores higher after implementing a new policy?

Finance: Do European investments yield higher returns than Asian investments?

10.2.1 Two Fundamental Sampling Approaches

The exact statistical procedure depends on the sampling technique used:

1. Independent Samples

As the name indicates, independent sampling involves collecting separate, unrelated samples from each population.

Samples don’t need to be the same size
Observations in one sample have no relationship to observations in the other
Example: Comparing durability of 100 Brand A tires versus 80 Brand B tires

2. Paired Samples (Matched Pairs)

With paired sampling, observations from each population are matched or correspond to each other.

Observations are paired based on similarity in relevant characteristics
Also called “before-after” comparisons when measuring same units twice
Example: Measuring employee productivity before and after training

We’ll begin with independent sampling methods.

10.3 9.2 Interval Estimation for Independent Samples

When comparing two populations, we’re interested in estimating the difference between two population means: (\mu_1 - \mu_2)

The appropriate method depends on the sample sizes n_1 and n_2:

Large samples (both n_1 \geq 30 and n_2 \geq 30): Use Z-distribution
Small samples (either n_1 < 30 or n_2 < 30): Use t-distribution

10.3.1 A. Estimation with Large Samples (n₁ ≥ 30 and n₂ ≥ 30)

Point Estimate:

The point estimate of the difference (\mu_1 - \mu_2) is the difference between sample means:

\bar{X}_1 - \bar{X}_2

Sampling Distribution:

When both n_1 and n_2 are large, the sampling distribution of the differences (\bar{X}_1 - \bar{X}_2) is approximately normal and centered at (\mu_1 - \mu_2).

Standard Error of the Difference:

The standard error measures how much the differences between sample means tend to vary:

Standard Error of Difference Between Sample Means

\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

Where:

\sigma_1^2, \sigma_2^2 = population variances
n_1, n_2 = sample sizes

In practice, population variances are usually unknown. We estimate them using sample variances:

s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Confidence Interval Formula:

Confidence Interval for (\mu_1 - \mu_2) — Large Samples

\text{C.I. for } (\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm Z \cdot s_{\bar{X}_1 - \bar{X}_2}

Where:

(\bar{X}_1 - \bar{X}_2) = point estimate
Z = critical Z-value for desired confidence level
s_{\bar{X}_1 - \bar{X}_2} = estimated standard error

Important Note: We’re not interested in the individual values of \mu_1 or \mu_2, but only in their difference.

10.3.2 Example: Transfer Trucking Route Comparison

Scenario: Delivery Time Analysis

Transfer Trucking transports shipments between Chicago and Kansas City using two different routes. The dispatcher, Delmar, wants to determine if there’s a difference in average transit times.

Sample Data:

Route	Sample Size	Mean Time	Std Dev
North	n = 100	\bar{X}_N = 17.2 hrs	s_N = 5.3 hrs
South	n = 75	\bar{X}_S = 19.4 hrs	s_S = 4.5 hrs

Objective: Develop a 95% confidence interval for the difference in average transit time.

Solution:

Step 1: Calculate Standard Error

Since population standard deviations are unknown, we use the sample standard deviations:

s_{\bar{X}_N - \bar{X}_S} = \sqrt{\frac{s_N^2}{n_N} + \frac{s_S^2}{n_S}} = \sqrt{\frac{(5.3)^2}{100} + \frac{(4.5)^2}{75}}

= \sqrt{\frac{28.09}{100} + \frac{20.25}{75}} = \sqrt{0.2809 + 0.27} = \sqrt{0.5509} = 0.742 \text{ hours}

Step 2: Find Critical Value

For 95% confidence level: Z = 1.96

Step 3: Compute Confidence Interval

\text{C.I.} = (\bar{X}_N - \bar{X}_S) \pm Z \cdot s_{\bar{X}_N - \bar{X}_S}

= (17.2 - 19.4) \pm (1.96)(0.742)

= -2.2 \pm 1.45

-3.65 \leq (\mu_N - \mu_S) \leq -0.75

Interpretation:

The results can be interpreted two ways:

Technical: Delmar can be 95% confident that (\mu_N - \mu_S) is between -3.65 hours and -0.75 hours.
Practical: Since we subtracted the South route mean from the North route mean and got negative numbers, Delmar can be 95% confident that the South route takes between 0.75 and 3.65 hours longer than the North route.

Business Decision: On average, the North route is faster. If minimizing transit time is important, Transfer Trucking should prioritize the North route.
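The three steps of the Transfer Trucking calculation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name `large_sample_ci` and its argument order are assumptions made here, and the critical value comes from the standard library's `NormalDist` instead of a Z-table.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Z-based confidence interval for mu1 - mu2 (intended for n1, n2 >= 30).

    Hypothetical helper for illustration; returns (lower, upper, standard_error).
    """
    # Step 1: estimated standard error of the difference between sample means
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Step 2: two-sided critical Z-value (about 1.96 for 95% confidence)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 3: point estimate plus or minus the margin of error
    diff = mean1 - mean2
    return diff - z * se, diff + z * se, se


# Transfer Trucking data: North route first, South route second
lower, upper, se = large_sample_ci(17.2, 5.3, 100, 19.4, 4.5, 75)
print(f"Standard error: {se:.3f} hours")
print(f"95% C.I. for (mu_N - mu_S): {lower:.2f} to {upper:.2f} hours")
```

Passing the South route first would simply flip the signs of both endpoints; the practical conclusion is unchanged.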

10.4 Section Exercises: Large Sample Confidence Intervals

1. Clark Insurance Claims Analysis: Clark Insurance sells policies to residents throughout the Chicago area. The owner wants to estimate the difference in average claims between people living in urban zones versus those in suburbs.

Urban sample: n = 180, \bar{X} = $2,025, s = $918
Suburban sample: n = 200, \bar{X} = $1,802, s = $512

What does a 95% confidence interval tell the owner about average claims filed by these two groups?

2. Steel Tube Production Comparison: Two production processes are used to produce steel tubes.

Process 1: n = 100, \bar{X} = 27.3 inches, s = 10.3 inches
Process 2: n = 100, \bar{X} = 30.1 inches, s = 5.2 inches

What does a 99% confidence interval reveal about the difference in average lengths of tubes produced by these two methods?

3. Chapman Industries Phone System Comparison: Chuck Chapman wants to determine if customers calling one phone system are kept on hold longer on average than those calling another system.

System 1: n = 75, \bar{X} = 25.2 seconds, s = 4.8 seconds
System 2: n = 70, \bar{X} = 21.3 seconds, s = 3.8 seconds

What recommendation would you provide to Chuck based on a 90% confidence interval estimating the difference in average wait times if he wants to minimize customer wait time?

4. Production Design Time Comparison: Two production designs are used to manufacture a certain product.

Old design: n = 150, \bar{X} = 3.51 days, s = 0.79 days
New design: n = 150, \bar{X} = 3.32 days, s = 0.73 days

What does a 99% confidence interval reveal about the difference between average times required to make the product? Which design should be used?

5. Conceptual Question: Explain exactly what the standard error of the difference between sample means actually measures.


10.5 9.2B Estimation with Small Samples: The t-Distribution

When either sample is small (n < 30), we cannot assume that the distribution of differences (\bar{X}_1 - \bar{X}_2) follows a normal distribution. Therefore, we cannot use the Z-distribution.

Requirements for Using t-Distribution:

Populations are normally distributed (or approximately normal)
Population variances are unknown

When these conditions are met, we must use the t-distribution.

10.5.1 Critical Question: Are the Variances Equal?

An important consideration is whether the two population variances are equal (\sigma_1^2 = \sigma_2^2).

Why This Matters

The paradox: How can we assume variances are equal if we don’t know what they are?

The answer: In many practical situations, there are reasonable grounds for assuming equal variances:

Assembly line processes: Machines periodically adjusted may have changing average fill levels, but variance in fill amounts remains constant
Before-after comparisons: Training may change average performance, but performance variability stays the same
Quality control: Defect rates may shift, but variation in defect patterns remains stable

Later in this chapter, we’ll present a formal F-test to statistically test whether two variances are equal.

We’ll examine two approaches:

Equal variances (\sigma_1^2 = \sigma_2^2): Use pooled variance method
Unequal variances (\sigma_1^2 \neq \sigma_2^2): Use separate variance method with adjusted degrees of freedom

10.5.2 1. Equal Variances: Pooled Variance Method

When population variances are equal, there exists some common variance \sigma^2 shared by both populations:

\sigma_1^2 = \sigma_2^2 = \sigma^2

However, due to sampling error, the two sample variances s_1^2 and s_2^2 will likely differ from each other and from the common \sigma^2.

Solution: Pool the data from both samples to obtain a single estimate of \sigma^2.

Pooled Variance Formula

s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}

This is a weighted average of the two sample variances, where the weights are the degrees of freedom (n - 1) for each sample.

Degrees of freedom: df = n_1 + n_2 - 2

Confidence Interval Formula:

C.I. for (\mu_1 - \mu_2) with Equal Variances (Small Samples)

\text{C.I. for } (\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}

Where:

t = critical t-value with n_1 + n_2 - 2 degrees of freedom
s_p^2 = pooled variance estimate

10.5.3 Example: Vending Machine Beverage Dispenser

Scenario: Quality Control for Student Cafeteria

A vending machine in the student cafeteria dispenses beverages into paper cups. The facilities manager wants to know if a recent machine adjustment changed the average fill amount.

Sample Data:

Timing	Sample Size	Mean	Variance
Before adjustment	n₁ = 15	\bar{X}_1 = 15.3 oz	s_1^2 = 3.5
After adjustment	n₂ = 10	\bar{X}_2 = 17.1 oz	s_2^2 = 3.9

Assumptions:

Variance \sigma^2 is constant before and after adjustment
Dispensed amounts are normally distributed

Objective: Construct a 95% confidence interval for the difference in average fill amounts.

Solution:

Step 1: Calculate Pooled Variance

s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}

= \frac{3.5(14) + 3.9(9)}{15 + 10 - 2} = \frac{49 + 35.1}{23} = \frac{84.1}{23} = 3.66

Step 2: Find Critical t-Value

Confidence level: 95% → α = 0.05
Degrees of freedom: df = n_1 + n_2 - 2 = 15 + 10 - 2 = 23
From t-table: t_{0.025, 23} = 2.069

Step 3: Calculate Confidence Interval

\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}

= (15.3 - 17.1) \pm 2.069 \sqrt{3.66 \left(\frac{1}{15} + \frac{1}{10}\right)}

= -1.8 \pm 2.069 \sqrt{3.66(0.0667 + 0.1)} = -1.8 \pm 2.069\sqrt{3.66 \times 0.1667}

= -1.8 \pm 2.069\sqrt{0.6101} = -1.8 \pm 2.069(0.781) = -1.8 \pm 1.62

-3.42 \leq (\mu_1 - \mu_2) \leq -0.18

Interpretation:

Subtracting the after-adjustment mean (17.1 oz) from the before-adjustment mean (15.3 oz) produces negative values for both endpoints.

Conclusion: We can be 95% confident that the adjustment increased the average fill level by between 0.18 and 3.42 ounces.

The interval does not contain zero, so the data indicate a statistically significant change in the average fill level.
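The pooled-variance steps can be reproduced with a short Python sketch. The helper name `pooled_ci` is hypothetical, and the critical value is passed in as an argument because, as in the worked solution, it is assumed to come from a t-table lookup.

```python
from math import sqrt


def pooled_ci(mean1, var1, n1, mean2, var2, n2, t_crit):
    """t-based C.I. for mu1 - mu2 assuming equal population variances.

    t_crit is the two-sided critical t-value for n1 + n2 - 2 degrees of
    freedom, taken from a t-table (an assumption of this sketch).
    """
    df = n1 + n2 - 2
    # Weighted average of the sample variances; weights are (n - 1)
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    # Standard error built from the single pooled estimate
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se, pooled_var, df


# Vending machine data: before vs. after adjustment, t(0.025, 23) = 2.069
lower, upper, pooled_var, df = pooled_ci(15.3, 3.5, 15, 17.1, 3.9, 10, t_crit=2.069)
print(f"Pooled variance: {pooled_var:.2f} with {df} degrees of freedom")
print(f"95% C.I. for (mu_1 - mu_2): {lower:.2f} to {upper:.2f} oz")
```

Because the code carries full precision through every step, its endpoints can differ in the last digit from a hand calculation that rounds intermediate values.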

10.5.4 Example 9.2: Labor Negotiations — Atlanta vs. Newport News

Scenario: Wage Equity Analysis

Labor negotiations between your company and the workers’ union are on the verge of breaking down. There’s considerable disagreement about average wage levels between workers at the Atlanta plant and the Newport News, Virginia plant.

Background:

Wages were set by the old labor agreement three years ago
Wages are based strictly on seniority
Contract tightly controls wages → variance is same at both plants ✓
Wages are normally distributed ✓
Different seniority patterns → different average wages suspected

Your Task:

The management negotiator wants you to develop a 98% confidence interval to estimate the difference between average wage levels. If a difference exists, adjustments must be made to bring lower wages up to match higher wages.

Sample Data:

	Atlanta Plant	Newport News Plant
Sample size	n_A = 23	n_N = 19
Sample mean	\bar{X}_A = \$17.53/hr	\bar{X}_N = \$15.50/hr
Sample variance	s_A^2 = 92.10	s_N^2 = 87.10

Solution:

Step 1: Calculate Pooled Variance

s_p^2 = \frac{s_A^2(n_A - 1) + s_N^2(n_N - 1)}{n_A + n_N - 2}

= \frac{92.10(22) + 87.10(18)}{23 + 19 - 2} = \frac{2026.2 + 1567.8}{40} = \frac{3594}{40} = 89.85

Step 2: Find Critical t-Value

α = 0.02 (98% confidence level)
df = 23 + 19 - 2 = 40
From t-table: t_{0.01, 40} = 2.423

Step 3: Calculate Confidence Interval

\text{C.I.} = (\bar{X}_A - \bar{X}_N) \pm t \sqrt{s_p^2 \left(\frac{1}{n_A} + \frac{1}{n_N}\right)}

= (17.53 - 15.50) \pm 2.423 \sqrt{89.85 \left(\frac{1}{23} + \frac{1}{19}\right)}

= 2.03 \pm 2.423 \sqrt{89.85(0.0435 + 0.0526)} = 2.03 \pm 2.423\sqrt{89.85 \times 0.0961}

= 2.03 \pm 2.423\sqrt{8.635} = 2.03 \pm 2.423(2.939) = 2.03 \pm 7.12

-5.09 \leq (\mu_A - \mu_N) \leq 9.15

Interpretation:

We can be 98% confident that the average Atlanta wage is somewhere between $5.09 less than and $9.15 more than the average Newport News wage.

Critical Finding: Interval Contains Zero

Because this interval contains $0, we cannot conclude at the 98% confidence level that average wages differ between the two plants. A difference of zero remains a plausible value.

Business Recommendation: No wage adjustment is warranted. The apparent difference of $2.03/hr could easily be due to sampling variation rather than a true population difference.

10.6 2. Unequal Variances: Separate Variance Method

When population variances are unequal (\sigma_1^2 \neq \sigma_2^2), or there’s no evidence to assume equality, the pooled variance method doesn’t apply.

The Challenge:

The distribution of (\bar{X}_1 - \bar{X}_2) doesn’t follow a t-distribution with n_1 + n_2 - 2 degrees of freedom. No exact distribution has been found, only approximations.

The Solution:

Use a modified t-statistic (t') with adjusted degrees of freedom.

Adjusted Degrees of Freedom (Welch-Satterthwaite)

\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}

Rule: If df is fractional, round DOWN to the nearest integer.

Confidence Interval Formula:

C.I. for (\mu_1 - \mu_2) with Unequal Variances

\text{C.I. for } (\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Where:

t' = critical t-value with adjusted df
No pooling of variances

10.6.1 Example: IBM Executive Training Programs

Scenario: Wall Street Journal Report

The Wall Street Journal described two training programs used by IBM. A comparison is needed to determine if one program is more effective.

Sample Data:

Program	Sample Size	Mean Score	Variance
Program 1	n₁ = 12	\bar{X}_1 = 73.5	s_1^2 = 100.2
Program 2	n₂ = 15	\bar{X}_2 = 79.8	s_2^2 = 121.3

Note: Variances appear different → use separate variance method

Objective: Construct a 95% confidence interval for the difference in average scores.

Solution:

Step 1: Calculate Adjusted Degrees of Freedom

\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}

= \frac{\left(\frac{100.2}{12} + \frac{121.3}{15}\right)^2}{\frac{(100.2/12)^2}{11} + \frac{(121.3/15)^2}{14}}

= \frac{(8.35 + 8.087)^2}{\frac{(8.35)^2}{11} + \frac{(8.087)^2}{14}} = \frac{(16.437)^2}{\frac{69.72}{11} + \frac{65.40}{14}}

= \frac{270.18}{6.338 + 4.671} = \frac{270.18}{11.009} = 24.54

Round down: df = 24

Step 2: Find Critical t’-Value

95% confidence level
df = 24
From t-table: t'_{0.025, 24} = 2.064

Step 3: Calculate Confidence Interval

\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

= (73.5 - 79.8) \pm 2.064 \sqrt{\frac{100.2}{12} + \frac{121.3}{15}}

= -6.3 \pm 2.064 \sqrt{8.35 + 8.087} = -6.3 \pm 2.064\sqrt{16.437}

= -6.3 \pm 2.064(4.054) = -6.3 \pm 8.37

-14.67 \leq (\mu_1 - \mu_2) \leq 2.07

Interpretation:

Because the interval contains zero, the data provide no strong evidence of a difference in training program effectiveness. On this evidence, neither program can be shown to be superior for training IBM executives.
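The adjusted degrees of freedom are tedious to compute by hand, so a small Python sketch may help. The function names `welch_df` and `separate_variance_ci` are hypothetical, and the critical value t' is again assumed to come from a t-table once the rounded-down df is known.

```python
from math import floor, sqrt


def welch_df(var1, n1, var2, n2):
    """Adjusted degrees of freedom, rounded down to an integer."""
    a, b = var1 / n1, var2 / n2
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return floor(df)


def separate_variance_ci(mean1, var1, n1, mean2, var2, n2, t_crit):
    """C.I. for mu1 - mu2 without pooling the sample variances."""
    se = sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


# IBM training programs: sample variances 100.2 (n = 12) and 121.3 (n = 15)
df = welch_df(100.2, 12, 121.3, 15)
# t'(0.025, 24) = 2.064 from the t-table
lower, upper = separate_variance_ci(73.5, 100.2, 12, 79.8, 121.3, 15, t_crit=2.064)
print(f"Adjusted df: {df}")
print(f"95% C.I. for (mu_1 - mu_2): {lower:.2f} to {upper:.2f}")
```

Rounding df down, as the rule above requires, gives a slightly larger critical value and therefore a slightly wider, more conservative interval.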

10.6.2 Example 9.3: Acme Ltd. Rubber Shock Absorbers

Scenario: Product Durability Comparison

Acme Ltd. sells two types of rubber shock absorbers for baby carriages. Wear tests measure durability.

Sample Data:

Type	Sample Size	Mean Duration	Std Dev
Type 1	n₁ = 13	\bar{X}_1 = 11.3 weeks	s₁ = 3.5 weeks
Type 2	n₂ = 10	\bar{X}_2 = 7.5 weeks	s₂ = 2.7 weeks

Business Context:

Type 1 is more expensive to manufacture
CEO will only use Type 1 if it lasts at least 8 weeks longer than Type 2
CEO tolerates only 2% probability of error (α = 0.02)
No evidence that variances are equal → use separate variance method

Solution:

Step 1: Calculate Adjusted Degrees of Freedom

\text{df} = \frac{\left[\frac{(3.5)^2}{13} + \frac{(2.7)^2}{10}\right]^2}{\frac{[(3.5)^2/13]^2}{12} + \frac{[(2.7)^2/10]^2}{9}}

= \frac{\left[\frac{12.25}{13} + \frac{7.29}{10}\right]^2}{\frac{(0.942)^2}{12} + \frac{(0.729)^2}{9}}

= \frac{(0.942 + 0.729)^2}{\frac{0.887}{12} + \frac{0.531}{9}} = \frac{(1.671)^2}{0.0739 + 0.059}

= \frac{2.792}{0.1329} = 20.99 \approx 20 \text{ (round down)}

Step 2: Find Critical t’-Value

98% confidence level (α = 0.02)
df = 20
From t-table: t'_{0.01, 20} = 2.528

Step 3: Calculate Confidence Interval

\text{C.I.} = (11.3 - 7.5) \pm 2.528 \sqrt{\frac{(3.5)^2}{13} + \frac{(2.7)^2}{10}}

= 3.8 \pm 2.528 \sqrt{0.942 + 0.729} = 3.8 \pm 2.528\sqrt{1.671}

= 3.8 \pm 2.528(1.293) = 3.8 \pm 3.3

0.5 \leq (\mu_1 - \mu_2) \leq 7.1 \text{ weeks}

Interpretation:

Acme can be 98% confident that Type 1 lasts between 0.5 and 7.1 weeks longer than Type 2.

Business Decision: Do NOT Use Type 1

The required difference of 8 weeks is NOT in the interval. Even at the upper limit (7.1 weeks), Type 1 doesn’t meet the CEO’s criterion.

Recommendation: Continue using Type 2 (the less expensive option) since Type 1 doesn’t provide sufficient additional durability to justify its higher manufacturing cost.

10.7 Section Exercises: Small Sample Confidence Intervals

6. Conceptual Question: What conditions must be met before the t-distribution can be used for two-population inference?

7. Croc Aid vs. Energy Pro: Seventeen cans of Croc Aid show a mean of 17.2 ounces with a standard deviation of 3.2 ounces, and 13 cans of Energy Pro produce a mean of 18.1 ounces and s = 2.7 ounces. Assuming equal variances and normal distributions in population weights, what conclusions can be drawn regarding the difference in average weights based on a 98% confidence interval?

8. Grow-rite Fertilizer Complaint: Grow-rite sells commercial fertilizer produced in two plants (Atlanta and Dallas). Recent customer complaints suggest Atlanta shipments are underweight compared to Dallas shipments. If 10 boxes from Atlanta average 96.3 pounds with s = 12.5, and 15 boxes from Dallas average 101.1 with s = 10.3, does a 99% confidence interval confirm this complaint? Assume equal variances.

9. Opus Gold Extraction: Opus, Inc. has developed a process for producing gold from seawater. Fifteen gallons from the Pacific Ocean produced a mean of 12.7 ounces of gold per gallon with s = 4.2 ounces, and 12 gallons from the Atlantic Ocean produced similar figures of 15.9 and 1.7. Based on a 95% interval, what is your estimate of the difference in average ounces of gold from these two sources? There’s no reason to assume variances are equal.

10. Ralphie’s Apartment Search: Ralphie starts college next fall. He samples apartments on the north and south ends of the city to see if there’s a difference in average rents.

North apartments: $600, $650, $530, $800, $750, $700, $750
South apartments: $500, $450, $800, $650, $500, $500, $450, $400

If there’s no evidence that variances are equal, what does a 99% interval tell Ralphie about the difference in average rents?

11. Bigelow Products Sales Comparison: Bigelow Products wants to develop a 95% interval to estimate the difference in average weekly sales in two target markets. A sample of 9 weeks in market 1 produced a mean and standard deviation (in hundreds of dollars) of 5.72 and 1.008 respectively. Comparable figures for market 2, based on a sample of 10 weeks, were 8.72 and 1.208. Assuming equal variances, what results do they report?

12. U.S. Manufacturing Supplier Comparison: U.S. Manufacturing buys raw materials from two suppliers. Management is concerned about production delays from late shipment deliveries. A sample of 10 shipments from Supplier A have an average delivery time of 6.8 days and s = 2.57 days, while 12 shipments from Supplier B average 4.08 days and s = 1.93. If equal variances cannot be assumed, what recommendation would you make based on a 90% interval for the difference in average delivery times?


10.8 9.3 Paired Samples: Matched Pairs Analysis

In the previous sections, we examined independent samples—observations from one population are completely unrelated to observations from the second population. Now we examine paired samples (also called matched pairs or dependent samples).

What Are Paired Samples?

Paired samples occur when observations are matched in pairs, creating a natural correspondence between measurements.

Common pairing scenarios:

Before-After comparisons: Same subject measured twice
- Employee performance before and after training
- Hospital costs before and after new billing system
- Product quality before and after process improvement
Matched subjects: Different subjects paired by characteristics
- Twins in medical studies
- Similar stores in different locations
- Matched competitors in market research
Repeated measurements: Same unit tested under different conditions
- Car performance with different fuels
- Machine output with different settings
- Customer satisfaction across product versions

10.8.1 Why Use Paired Samples?

Key Advantage: Pairing reduces variability by controlling for differences between subjects.

Example: Testing a training program’s effectiveness

Independent samples approach: Compare Group A (trained) vs. Group B (untrained)
- Problem: Groups may differ in experience, education, motivation
- Result: High variability masks training effect

Paired samples approach: Measure each person before AND after training
- Benefit: Each person serves as their own control
- Result: Lower variability, more powerful test

10.8.2 The Paired Difference Approach

Instead of analyzing two separate samples, we analyze one sample of differences.

Key Transformation: Two Samples → One Sample

For each pair (X_{1i}, X_{2i}), calculate the difference:

d_i = X_{1i} - X_{2i}

Then analyze the differences using single-sample methods!

Sample mean of differences: \bar{d} = \frac{\sum d_i}{n}
Sample standard deviation: s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}}
Sample size: n = number of pairs

10.8.3 Confidence Interval for Paired Differences

C.I. for (\mu_1 - \mu_2) Using Paired Samples

\text{C.I. for } (\mu_1 - \mu_2) = \bar{d} \pm t \frac{s_d}{\sqrt{n}}

Where:

\bar{d} = mean of the paired differences
s_d = standard deviation of the differences
t = critical t-value with n-1 degrees of freedom
n = number of pairs

Degrees of freedom: df = n - 1 (NOT 2n - 2!)

10.8.4 Example: Employee Training Program Effectiveness

Scenario: Management Development Assessment

A company instituted a training program to improve employee performance scores. To evaluate effectiveness, 10 employees were scored before and after training.

Sample Data:

Employee	Before Training	After Training	Difference (d)
1	72	78	-6
2	65	71	-6
3	83	89	-6
4	91	93	-2
5	58	68	-10
6	77	82	-5
7	69	75	-6
8	74	79	-5
9	88	91	-3
10	81	86	-5

Note: d_i = \text{Before} - \text{After} (negative values indicate improvement)

Objective: Construct a 95% confidence interval for the mean improvement.

Solution:

Step 1: Calculate Mean Difference

\bar{d} = \frac{\sum d_i}{n} = \frac{-54}{10} = -5.4

Step 2: Calculate Standard Deviation

First, find \sum d_i^2: \sum d_i^2 = (-6)^2 + (-6)^2 + (-6)^2 + (-2)^2 + (-10)^2 + (-5)^2 + (-6)^2 + (-5)^2 + (-3)^2 + (-5)^2 = 36 + 36 + 36 + 4 + 100 + 25 + 36 + 25 + 9 + 25 = 332

Then: s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}} = \sqrt{\frac{332 - 10(-5.4)^2}{9}} = \sqrt{\frac{332 - 291.6}{9}} = \sqrt{\frac{40.4}{9}} = \sqrt{4.489} = 2.12

Step 3: Find Critical t-Value

95% confidence level
df = n - 1 = 10 - 1 = 9
From t-table: t_{0.025, 9} = 2.262

Step 4: Calculate Confidence Interval

\text{C.I.} = \bar{d} \pm t \frac{s_d}{\sqrt{n}} = -5.4 \pm 2.262 \times \frac{2.12}{\sqrt{10}} = -5.4 \pm 2.262 \times 0.670 = -5.4 \pm 1.52

-6.92 \leq (\mu_{\text{before}} - \mu_{\text{after}}) \leq -3.88

Interpretation:

Converting to positive values (After - Before), we can be 95% confident that training improves average performance by 3.88 to 6.92 points.

Business Conclusion: Training Is Effective

The interval does not contain zero, providing strong evidence that training genuinely improves employee performance. The company should continue the program.

Estimated ROI: The average improvement of 5.4 points gives management a starting point for weighing the program’s cost against the expected productivity gains.
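The paired-difference approach reduces to single-sample arithmetic on the differences, which the following Python sketch illustrates with the ten employee scores above. Variable names are chosen for this illustration only, and the critical value 2.262 is taken from the t-table as in Step 3.

```python
from math import sqrt
from statistics import mean, stdev

# Scores for the same 10 employees, in matching order
before = [72, 65, 83, 91, 58, 77, 69, 74, 88, 81]
after = [78, 71, 89, 93, 68, 82, 75, 79, 91, 86]

# One sample of differences, d_i = Before - After
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

d_bar = mean(diffs)   # mean of the paired differences
s_d = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)

t_crit = 2.262        # t(0.025, 9) from the t-table, df = n - 1
margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"d-bar = {d_bar:.1f}, s_d = {s_d:.2f}")
print(f"95% C.I. for (mu_before - mu_after): {lower:.2f} to {upper:.2f}")
```

Note that `stdev` uses the definitional formula for the sample standard deviation, which is algebraically equivalent to the computational formula for s_d used in Step 2.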

10.8.5 Example 9.4: Hospital Billing Comparison (Vicki Peplow)

Scenario: Healthcare Cost Analysis

Vicki Peplow works for a large corporation that self-insures medical costs. The company has negotiated contracts with two hospitals. Vicki wants to determine if there’s a difference in average costs between the hospitals for identical procedures.

Strategy: Use paired samples by matching the same procedures at both hospitals.

Sample Data: 15 common procedures

Procedure	Hospital 1 Cost	Hospital 2 Cost	Difference (d)
Appendectomy	$4,250	$4,180	$70
Cholecystectomy	$6,780	$6,920	-$140
Hernia repair	$3,450	$3,610	-$160
Hysterectomy	$7,200	$7,050	$150
Knee arthroscopy	$5,100	$5,280	-$180
Cataract surgery	$2,800	$2,650	$150
Tonsillectomy	$1,950	$2,100	-$150
Colonoscopy	$1,680	$1,720	-$40
Hip replacement	$18,500	$18,700	-$200
Angioplasty	$12,300	$12,150	$150
Mastectomy	$8,900	$9,200	-$300
Spinal fusion	$22,100	$22,350	-$250
Cesarean section	$6,400	$6,300	$100
Cardiac bypass	$35,200	$35,100	$100
Total knee	$24,500	$24,800	-$300

Calculations:

\sum d_i = -1,000

\sum d_i^2 = 476,600

Objective: Construct a 95% confidence interval for the difference in average costs.

Solution:

Step 1: Calculate Mean Difference

\bar{d} = \frac{\sum d_i}{n} = \frac{-1,000}{15} = -66.67

Step 2: Calculate Standard Deviation

s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}} = \sqrt{\frac{476,600 - 15(-66.667)^2}{14}} = \sqrt{\frac{476,600 - 66,667}{14}} = \sqrt{\frac{409,933}{14}} = \sqrt{29,280.9} = 171.1

Step 3: Find Critical t-Value

95% confidence level
df = 15 - 1 = 14
From t-table: t_{0.025, 14} = 2.145

Step 4: Calculate Confidence Interval

\text{C.I.} = \bar{d} \pm t \frac{s_d}{\sqrt{n}} = -66.67 \pm 2.145 \times \frac{171.1}{\sqrt{15}} = -66.67 \pm 2.145 \times 44.18 = -66.67 \pm 94.77

-161.44 \leq (\mu_1 - \mu_2) \leq 28.10

Interpretation:

We can be 95% confident that the average cost at Hospital 1 is somewhere between $161.44 lower and $28.10 higher than the average cost at Hospital 2.

Business Recommendation: No Clear Cost Advantage

Because this interval contains zero, there’s no statistically significant difference in costs between hospitals at the 95% confidence level.

Vicki’s Report: “Our analysis of 15 common procedures shows no consistent cost difference between Hospital 1 and Hospital 2. The company can negotiate with either hospital without concern about systematic cost differences.”

Additional Consideration: Other factors (quality ratings, location convenience, specialist availability) should guide hospital contract decisions.
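Steps 1 through 4 of the hospital example can be sketched in a few lines of Python. This is a minimal illustration that works from the summary sums quoted in the example (\sum d_i = -884 and \sum d_i^2 = 400,716) and takes the critical t-value from the table; the function name `paired_ci_from_sums` is hypothetical, not part of any library. Because the code carries unrounded intermediate values, the last digit of each endpoint may differ slightly from the hand calculation.

```python
import math

def paired_ci_from_sums(n, sum_d, sum_d2, t_crit):
    """Confidence interval for the mean paired difference from summary sums.

    t_crit is the table value t_{alpha/2, n-1}; it is passed in rather than
    computed because the standard library has no t-distribution.
    """
    d_bar = sum_d / n
    # Computational formula for the sample variance of the differences
    var_d = (sum_d2 - n * d_bar ** 2) / (n - 1)
    s_d = math.sqrt(var_d)
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar, s_d, (d_bar - margin, d_bar + margin)

# Values taken from the hospital billing example (95% confidence, df = 14)
d_bar, s_d, (lo, hi) = paired_ci_from_sums(n=15, sum_d=-884, sum_d2=400_716, t_crit=2.145)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.1f}")
print(f"95% CI: {lo:.2f} to {hi:.2f}")
print("Interval contains zero:", lo <= 0 <= hi)
```

Passing the critical value as an argument keeps the sketch dependency-free; with a statistics package installed, the same value could be looked up programmatically instead.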

10.9 9.4 Confidence Intervals for Two Proportions

Many business decisions require comparing proportions from two populations:

- Defect rates from two production processes
- Customer satisfaction rates across product versions
- Default rates between loan portfolios
- Click-through rates for two ad campaigns

10.9.1 Point Estimate for Difference

The point estimate for the difference between population proportions is:

\text{Point estimate: } p_1 - p_2

Where p_1 and p_2 are the sample proportions.

10.9.2 Standard Error of the Difference

Standard Error for (p_1 - p_2)

s_{p_1 - p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}

Requirements for using the Z-distribution:

- n_1 p_1 \geq 5 and n_1(1-p_1) \geq 5
- n_2 p_2 \geq 5 and n_2(1-p_2) \geq 5

10.9.3 Confidence Interval Formula

C.I. for (\pi_1 - \pi_2)

\text{C.I. for } (\pi_1 - \pi_2) = (p_1 - p_2) \pm Z \cdot s_{p_1-p_2}

Where:

- p_1, p_2 = sample proportions
- Z = critical Z-value for desired confidence level
- s_{p_1-p_2} = standard error of the difference

10.9.4 Example: Worker Absenteeism Study

Scenario: Human Resources Analysis

A large manufacturing company suspects that night shift workers have higher absenteeism rates than day shift workers. HR collects data from both shifts.

Sample Data:

Shift	Sample Size	Number Absent	Sample Proportion
Night shift	n₁ = 150	22	p₁ = 0.147
Day shift	n₂ = 150	14	p₂ = 0.093

Objective: Construct a 90% confidence interval for the difference in absenteeism rates.

Solution:

Step 1: Verify Sample Size Requirements

Night shift: n_1 p_1 = 150(0.147) = 22 \geq 5 ✓ and n_1(1-p_1) = 150(0.853) = 128 \geq 5 ✓
Day shift: n_2 p_2 = 150(0.093) = 14 \geq 5 ✓ and n_2(1-p_2) = 150(0.907) = 136 \geq 5 ✓

Step 2: Calculate Standard Error

s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} = \sqrt{\frac{0.147(0.853)}{150} + \frac{0.093(0.907)}{150}} = \sqrt{\frac{0.1254}{150} + \frac{0.0844}{150}} = \sqrt{0.000836 + 0.000563} = \sqrt{0.001399} = 0.0374

Step 3: Find Critical Z-Value

90% confidence level → α = 0.10
Z_{0.05} = 1.645

Step 4: Calculate Confidence Interval

\text{C.I.} = (p_1 - p_2) \pm Z \cdot s_{p_1-p_2} = (0.147 - 0.093) \pm 1.645(0.0374) = 0.054 \pm 0.0615

-0.0075 \leq (\pi_1 - \pi_2) \leq 0.1155

Or: -0.75% ≤ (π₁ - π₂) ≤ 11.55%
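The four steps above can be sketched as a small Python function. This is an illustrative sketch, not a reference implementation: the name `two_proportion_ci` is hypothetical, and the critical Z-value comes from the standard library's `statistics.NormalDist`. The code works from the raw counts (22 of 150 and 14 of 150) rather than the rounded proportions, so its endpoints differ from the hand calculation in the third decimal place.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for pi_1 - pi_2 from raw counts."""
    # Step 1: sample-size requirements (n*p and n*(1-p) are just the counts)
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("Normal approximation requirements are not met")
    p1, p2 = x1 / n1, x2 / n2
    # Step 2: standard error of the difference
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Step 3: critical Z-value for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 4: interval
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)

# Night shift vs. day shift absenteeism, 90% confidence
diff, se, (lo, hi) = two_proportion_ci(22, 150, 14, 150, confidence=0.90)
print(f"difference = {diff:.4f}, standard error = {se:.4f}")
print(f"90% CI: {lo:.4f} to {hi:.4f}")
```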

Interpretation:

Because the interval contains zero (-0.75% to +11.55%), we cannot conclusively state that night shift absenteeism is higher at the 90% confidence level.

HR Recommendation: While the data suggest night shift absenteeism may be up to 11.55 percentage points higher, the difference could also be as low as -0.75 percentage points (day shift slightly higher). More data would be needed for a definitive conclusion.

10.9.5 Example 9.5: Ice Capades Costume Defects

Scenario: Quality Control for Entertainment Production

Ice Capades produces elaborate costumes using two manufacturing methods. Quality control inspects costumes for defects.

Sample Data:

Method	Costumes Inspected	Defective	Defect Rate
Method A	n₁ = 200	42	p₁ = 0.21
Method B	n₂ = 250	65	p₂ = 0.26

Objective: Construct a 95% confidence interval for the difference in defect rates.

Solution:

Step 1: Calculate Standard Error

s_{p_1-p_2} = \sqrt{\frac{0.21(0.79)}{200} + \frac{0.26(0.74)}{250}} = \sqrt{\frac{0.1659}{200} + \frac{0.1924}{250}} = \sqrt{0.0008295 + 0.0007696} = \sqrt{0.0015991} = 0.04

Step 2: Find Critical Z-Value

95% confidence level
Z_{0.025} = 1.96

Step 3: Calculate Confidence Interval

\text{C.I.} = (0.21 - 0.26) \pm 1.96(0.04) = -0.05 \pm 0.0784

-0.1284 \leq (\pi_A - \pi_B) \leq 0.0284

Or: -12.84% ≤ (π_A - π_B) ≤ 2.84%

Interpretation:

Method A could have a defect rate that is:

- As much as 12.84 percentage points lower than Method B (Method A better), OR
- As much as 2.84 percentage points higher than Method B (Method B better)

Business Decision: Methods Appear Equivalent

The interval contains zero, suggesting no statistically significant difference at 95% confidence. Ice Capades can choose either method based on cost, production speed, or other non-quality factors.

If Method A is less expensive, use Method A. If Method B is faster, use Method B. Quality differences are not proven.

10.10 9.5 Sample Size Determination for Two-Population Studies

When planning a study comparing two populations, researchers must determine adequate sample sizes.

10.10.1 For Comparing Two Means

Sample Size Formula for Two Means (Equal n)

n = \frac{Z^2(\sigma_1^2 + \sigma_2^2)}{E^2}

Where:

- Z = critical Z-value for desired confidence level
- \sigma_1^2, \sigma_2^2 = population variances (estimate from pilot studies)
- E = desired margin of error (maximum error)
- Both samples use the same size n

Example: Estimate the difference in average customer wait times between two service centers within ±2 minutes at 95% confidence. From pilot data: \sigma_1 = 8 minutes, \sigma_2 = 10 minutes.

n = \frac{(1.96)^2(64 + 100)}{(2)^2} = \frac{3.8416(164)}{4} = \frac{630.02}{4} = 157.5

Required sample size: n = 158 customers from each center.
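The wait-time calculation can be reproduced with a short helper. This is a minimal sketch; the function name `sample_size_two_means` is hypothetical, and the Z-value is passed in as the table value (1.96) so the result lines up with the hand calculation. The result is always rounded up, since rounding down would give a margin of error slightly larger than requested.

```python
import math

def sample_size_two_means(z, sigma1, sigma2, margin):
    """Per-group sample size for estimating mu_1 - mu_2 within +/- margin."""
    n = z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / margin ** 2
    return math.ceil(n)  # always round up to guarantee the margin

# Service-center wait times: sigma_1 = 8, sigma_2 = 10, E = 2 minutes, 95% confidence
n_per_center = sample_size_two_means(z=1.96, sigma1=8, sigma2=10, margin=2)
print(n_per_center)
```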

10.10.2 For Comparing Two Proportions

Sample Size Formula for Two Proportions (Equal n)

n = \frac{Z^2[\pi_1(1-\pi_1) + \pi_2(1-\pi_2)]}{E^2}

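If no prior estimates: Use conservative \pi_1 = \pi_2 = 0.5, which makes \pi_1(1-\pi_1) + \pi_2(1-\pi_2) = 0.25 + 0.25 = 0.5, the largest value the bracketed term can take.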

n = \frac{Z^2(0.5)}{E^2} = \frac{0.5Z^2}{E^2}

Example: Estimate the difference in customer satisfaction rates between two product versions within ±5% at 99% confidence. No prior data available.

n = \frac{0.5(2.576)^2}{(0.05)^2} = \frac{0.5(6.636)}{0.0025} = \frac{3.318}{0.0025} = 1,327.2

Required sample size: n = 1,328 customers per product version.
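The same calculation for proportions can be sketched as follows. The function name `sample_size_two_proportions` is hypothetical, the Z-value is the table value used in the example (2.576), and the planning proportions default to the conservative 0.5. The second call uses made-up prior estimates (0.2 and 0.3) purely to illustrate that any prior information reduces the required sample size.

```python
import math

def sample_size_two_proportions(z, margin, pi1=0.5, pi2=0.5):
    """Per-group sample size for estimating pi_1 - pi_2 within +/- margin."""
    spread = pi1 * (1 - pi1) + pi2 * (1 - pi2)  # equals 0.5 in the conservative case
    return math.ceil(z ** 2 * spread / margin ** 2)

# Customer satisfaction example: +/- 5 percentage points at 99% confidence, no prior data
n_conservative = sample_size_two_proportions(z=2.576, margin=0.05)

# Hypothetical prior estimates, for illustration only
n_with_priors = sample_size_two_proportions(z=2.576, margin=0.05, pi1=0.2, pi2=0.3)

print(n_conservative, n_with_priors)
```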

10.11 Section Exercises: Paired Samples and Proportions

13. Paired Data Interpretation: A confidence interval for paired data yields: -12.5 \leq (\mu_1 - \mu_2) \leq 5.3. What can you conclude about the relationship between the two population means?

14. Magazine Subscription Prices: A consumer group wants to compare the newsstand price and the mail-subscription price of the same magazines. They sample 8 magazines:

Magazine	Newsstand	Mail	Difference
Time	$4.95	$3.50	$1.45
Newsweek	$4.50	$3.25	$1.25
Fortune	$5.95	$4.75	$1.20
Sports Illustrated	$3.95	$2.95	$1.00
Vogue	$4.25	$3.50	$0.75
Business Week	$5.50	$4.50	$1.00
The Economist	$6.95	$5.95	$1.00
National Geographic	$3.50	$2.75	$0.75

Construct a 90% confidence interval for the average price difference.

15. Diet Program Effectiveness: A weight-loss clinic measures 12 clients before and after a 6-week program:

Before (lbs): 185, 220, 198, 175, 210, 195, 188, 203, 225, 192, 178, 207
After (lbs): 178, 210, 192, 170, 201, 189, 183, 196, 215, 186, 175, 200

Calculate a 95% confidence interval for the mean weight loss.

16. Department Store Credit Usage: Two samples of credit customers show:

- Downtown store: 120 customers, 69 used credit (p₁ = 0.575)
- Suburban store: 150 customers, 73 used credit (p₂ = 0.487)

Construct a 99% confidence interval for the difference in credit usage rates.

17. Manufacturing Defect Rates: Process A produces 500 units with 28 defects. Process B produces 700 units with 31 defects. Find a 90% confidence interval for the difference in defect rates.

18. Sample Size for Salary Comparison: A consultant wants to estimate the difference in average salaries between two regions within ±$5,000 at 95% confidence. Pilot data suggests \sigma_1 = \$18,000 and \sigma_2 = \$22,000. What sample size is needed?

19. Sample Size for Market Share: A company wants to estimate the difference in market share between two products within ±3% at 99% confidence. No prior data exists. What sample size is required?

20. Advertising Effectiveness (Paired): A retailer measures daily sales for 7 days before and after an advertising campaign:

Before: $12,500, $11,800, $13,200, $12,100, $11,900, $12,700, $13,500
After: $13,800, $12,900, $14,100, $13,200, $12,800, $13,900, $14,700

Does a 95% confidence interval suggest the campaign increased sales?

21. Employee Turnover Rates: Company A (n=300) had 42 employees leave last year. Company B (n=250) had 28 employees leave. Construct a 95% confidence interval for the difference in turnover rates and interpret the business implications.


10.12 9.6 Hypothesis Testing for Two Population Means

In previous sections, we focused on interval estimation—constructing confidence intervals to estimate the difference between population means. Now we turn to hypothesis testing—making decisions about whether a claimed difference exists.

Hypothesis Testing vs. Confidence Intervals

Both methods analyze the same data, but serve different purposes:

Confidence Intervals:

- Estimate the magnitude of difference
- Range of plausible values
- Example: “The difference is between 2.5 and 7.8 units”

Hypothesis Tests:

- Decide whether a specific difference exists
- Yes/No decision at specified significance level
- Example: “Is there evidence that μ₁ > μ₂?”

Important Connection: If a (1 - α) confidence interval for μ₁ - μ₂ contains zero, the two-tailed test of H₀: μ₁ = μ₂ at significance level α fails to reject H₀.

10.12.1 General Framework for Hypothesis Tests

The test statistic structure mirrors the confidence interval approach:

\text{Test Statistic} = \frac{(\text{Sample Difference}) - (\text{Hypothesized Difference})}{\text{Standard Error}}

Common null hypothesis: H_0: \mu_1 = \mu_2 (equivalent to \mu_1 - \mu_2 = 0)

Alternative hypotheses (three possibilities):

Two-tailed: H_A: \mu_1 \neq \mu_2 (difference exists, direction unknown)
Right-tailed: H_A: \mu_1 > \mu_2 (first population mean is greater)
Left-tailed: H_A: \mu_1 < \mu_2 (first population mean is smaller)

10.13 A. Hypothesis Tests with Large Samples

When both samples are large (n₁ ≥ 30 and n₂ ≥ 30), we use the Z-test.

Z-Test for Two Means (Large Samples)

Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}

Where: s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Decision Rule:

- Two-tailed: Reject H₀ if |Z| > Z_{α/2}
- Right-tailed: Reject H₀ if Z > Z_α
- Left-tailed: Reject H₀ if Z < -Z_α

10.13.1 Example: Golf Course Playing Times (Men vs. Women)

Scenario: Course Management Resource Allocation

The manager of Pine Valley Golf Course wants to determine if there’s a difference in average playing times between men and women. This information will help schedule tee times and allocate course resources.

Sample Data:

Group	Sample Size	Mean Time	Std Dev
Men	n_m = 100	\bar{X}_m = 4.2 hrs	s_m = 0.8 hrs
Women	n_w = 75	\bar{X}_w = 4.7 hrs	s_w = 0.6 hrs

Hypotheses:

H_0: \mu_m = \mu_w \quad \text{(no difference in average playing times)}
H_A: \mu_m \neq \mu_w \quad \text{(playing times differ)}

Significance Level: α = 0.01 (1%)

Solution:

Step 1: Calculate Standard Error

s_{\bar{X}_m - \bar{X}_w} = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = \sqrt{\frac{(0.8)^2}{100} + \frac{(0.6)^2}{75}} = \sqrt{\frac{0.64}{100} + \frac{0.36}{75}} = \sqrt{0.0064 + 0.0048} = \sqrt{0.0112} = 0.106

Step 2: Calculate Test Statistic

Z = \frac{(\bar{X}_m - \bar{X}_w) - 0}{s_{\bar{X}_m - \bar{X}_w}} = \frac{(4.2 - 4.7) - 0}{0.106} = \frac{-0.5}{0.106} = -4.72

Step 3: Determine Critical Values and Decision Rule

For α = 0.01 (two-tailed test): - Critical values: Z_{0.005} = \pm 2.576

Decision Rule: “Reject H₀ if Z < -2.576 or Z > +2.576”

Step 4: Make Decision

Z = -4.72 < -2.576 → Reject H₀

Step 5: Calculate p-value

For Z = -4.72: - Area in left tail ≈ 0.000001 - p-value = 2(0.000001) ≈ 0.000002 (two-tailed)
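Steps 1, 2, and 5 can be checked with a short Python sketch. The function name `two_sample_z_test` is hypothetical; the tail areas come from the standard library's `statistics.NormalDist`. Because the code does not round the standard error to 0.106, the Z-value and p-values agree with the hand calculation only to the precision shown there.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Large-sample Z-test for mu_1 - mu_2; returns Z and all three p-values."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z = ((mean1 - mean2) - hypothesized_diff) / se
    p_left = NormalDist().cdf(z)   # area to the left of Z
    p_right = 1 - p_left           # area to the right of Z
    return z, {
        "two-tailed": 2 * min(p_left, p_right),
        "left-tailed": p_left,
        "right-tailed": p_right,
    }

# Golf course example: men (group 1) vs. women (group 2)
z, p_values = two_sample_z_test(4.2, 0.8, 100, 4.7, 0.6, 75)
print(f"Z = {z:.2f}")
print(f"two-tailed p-value = {p_values['two-tailed']:.7f}")
print("Reject H0 at alpha = 0.01:", p_values["two-tailed"] < 0.01)
```

Returning all three p-values lets the same function serve a directional hypothesis by reading the left-tailed or right-tailed entry instead.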

Interpretation:

Business Conclusion: Significant Difference Exists

Statistical Finding: At α = 0.01, there’s strong evidence that average playing times differ between men and women. A p-value of about 0.000002 means that a sample difference this large would be extremely unlikely if the true means were equal.

Practical Finding: Women take an average of 0.5 hours (30 minutes) longer to complete a round.

Management Recommendations:

1. Tee time scheduling: Build in 30-minute buffers when men’s groups follow women’s groups, since on average the faster group would otherwise catch up
2. Course pacing: Post different pace-of-play guidelines for different groups
3. Resource allocation: Consider dedicated tee times for women’s leagues
4. Revenue optimization: Adjust pricing to account for longer course occupation times

Important Note: This finding reflects averages and should not influence individual golfer policies. Many women play faster than many men.

10.13.2 Alternative Hypothesis Formulation: One-Tailed Test

Suppose the golf course manager had a directional hypothesis—specifically suspecting that women take longer than men.

Revised Hypotheses:

H_0: \mu_w \leq \mu_m \quad \text{(women don't take longer)}
H_A: \mu_w > \mu_m \quad \text{(women take longer)}

Equivalently (subtracting in opposite order):

H_0: \mu_m \geq \mu_w
H_A: \mu_m < \mu_w

Solution for One-Tailed Test:

The test statistic remains: Z = -4.72

New Decision Rule (left-tailed test, α = 0.01):

- Critical value: -Z_{0.01} = -2.33
- Decision Rule: “Reject H₀ if Z < -2.33”

Decision: Z = -4.72 < -2.33 → Reject H₀

p-value (one-tailed): ≈ 0.000001

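Conclusion: Same result, with strong evidence that women take longer than men. The one-tailed p-value is half the two-tailed p-value because only one tail counts as evidence against H₀, and the critical value is less extreme (2.33 vs. 2.576) because all of α is concentrated in that tail.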

10.14 B. Hypothesis Tests with Small Samples: The t-Distribution

When either sample is small (n < 30), we must use the t-distribution instead of the Z-distribution. These t-based methods assume the populations are approximately normally distributed.

10.14.1 1. Equal Variances: Pooled t-Test

When population variances are equal (\sigma_1^2 = \sigma_2^2):

Pooled t-Test for Two Means (Small Samples, Equal Variances)

t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

Where: s_p^2 = \frac{s_1^2(n_1-1) + s_2^2(n_2-1)}{n_1 + n_2 - 2}

Degrees of freedom: df = n_1 + n_2 - 2

10.14.2 Example: Revisiting Charles Schwab Training (from Example 9.1)

Context: In Example 9.1, we constructed a 99% confidence interval for the difference in competency levels between two employee training programs. The result was:

-8.34 \leq (\mu_1 - \mu_2) \leq 4.40

What if we wanted to TEST the hypothesis that the programs produce equal competency levels?

Sample Data (from Example 9.1):

Program	Sample Size	Mean Score	Std Dev
Program 1	n₁ = 45	\bar{X}_1 = 76.0	s₁ = 13.5
Program 2	n₂ = 40	\bar{X}_2 = 77.97	s₂ = 9.05

Hypotheses:

H_0: \mu_1 = \mu_2 \quad \text{(programs equally effective)}
H_A: \mu_1 \neq \mu_2 \quad \text{(programs differ in effectiveness)}

Significance Level: α = 0.01

Solution:

Both samples are large (n₁ = 45 and n₂ = 40), so the Z-test applies. Given the data from Example 9.1, the standard error was calculated as:

s_{\bar{X}_1 - \bar{X}_2} = 2.47

Step 1: Calculate Test Statistic

Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{(76.0 - 77.97) - 0}{2.47} = \frac{-1.97}{2.47} = -0.79

Step 2: Critical Values and Decision Rule

For α = 0.01 (two-tailed): - Critical values: Z_{0.005} = \pm 2.58

Decision Rule: “Reject H₀ if |Z| > 2.58”

Step 3: Make Decision

|Z| = |-0.79| = 0.79 < 2.58 → Do NOT Reject H₀

Step 4: Calculate p-value

For Z = -0.79: - Area in left tail = 0.5 - 0.2852 = 0.2148 - p-value = 2(0.2148) = 0.4296

Interpretation:

Business Conclusion: No Evidence of Difference

At α = 0.01, there’s no statistical evidence that the two training programs produce different competency levels.

Key Observation: This conclusion is confirmed by the confidence interval from Example 9.1, which contained zero (-8.34 to 4.40). When a CI contains zero, the corresponding hypothesis test will fail to reject H₀: μ₁ = μ₂.

Charles Schwab Recommendation: Either training program is acceptable. Choose based on cost, time requirements, instructor availability, or employee preferences rather than effectiveness differences.

p-value interpretation: There’s a 43% probability of observing a difference this large (or larger) purely by chance if the programs are truly equal.

10.14.3 Example 9.6: Labor Negotiations Revisited (from Example 9.2)

In Example 9.2, we constructed a 98% confidence interval for wage differences between Atlanta and Newport News plants:

-5.09 \leq (\mu_A - \mu_N) \leq 9.15

Now test the hypothesis of equal wages.

Sample Data (from Example 9.2):

Plant	Sample Size	Mean Wage	Variance
Atlanta	n_A = 23	\bar{X}_A = \$17.53/hr	s_A^2 = 92.10
Newport News	n_N = 19	\bar{X}_N = \$15.50/hr	s_N^2 = 87.10

From Example 9.2: Pooled variance s_p^2 = 89.85

Hypotheses:

H_0: \mu_A = \mu_N
H_A: \mu_A \neq \mu_N

Significance Level: α = 0.02

Solution:

Step 1: Calculate Test Statistic

t = \frac{(17.53 - 15.50) - 0}{\sqrt{89.85\left(\frac{1}{23} + \frac{1}{19}\right)}} = \frac{2.03}{\sqrt{89.85(0.0435 + 0.0526)}} = \frac{2.03}{\sqrt{89.85 \times 0.0961}} = \frac{2.03}{\sqrt{8.635}} = \frac{2.03}{2.939} = 0.69

Step 2: Critical Values and Decision Rule

df = 23 + 19 - 2 = 40
α = 0.02 (two-tailed)
From t-table: t_{0.01, 40} = \pm 2.423

Decision Rule: “Reject H₀ if |t| > 2.423”

Step 3: Make Decision

|t| = 0.69 < 2.423 → Do NOT Reject H₀
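The pooled calculation above can be sketched in Python from the summary statistics alone. The function name `pooled_t_test` is hypothetical, and the critical value 2.423 is copied from the t-table step rather than computed, since the standard library does not include the t-distribution.

```python
import math

def pooled_t_test(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic for two independent small samples."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / se
    return t, df, pooled_var

# Atlanta vs. Newport News wages (variances, not standard deviations, are given)
t, df, pooled_var = pooled_t_test(17.53, 92.10, 23, 15.50, 87.10, 19)
t_crit = 2.423  # table value t_{0.01, 40}, taken from the example
print(f"pooled variance = {pooled_var:.2f}, t = {t:.2f}, df = {df}")
print("Reject H0:", abs(t) > t_crit)
```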

Interpretation:

No evidence of a wage difference between plants. This confirms the confidence interval result (which contained zero). The labor negotiator can report that the data show no statistically significant difference in mean wages, although failing to reject H₀ does not prove the wages are identical.

10.14.4 2. Unequal Variances: Separate Variance t-Test

When variances are unequal (\sigma_1^2 \neq \sigma_2^2):

Separate Variance t-Test (Unequal Variances)

t' = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Adjusted degrees of freedom: df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}

10.14.5 Example: Acme Shock Absorbers Revisited (from Example 9.3)

In Example 9.3, we found a 98% confidence interval:

0.5 \leq (\mu_1 - \mu_2) \leq 7.1 \text{ weeks}

Test whether the mean durability of Type 1 and Type 2 shock absorbers differs.

Sample Data (from Example 9.3):

Type	Sample Size	Mean Duration	Std Dev
Type 1	n₁ = 13	\bar{X}_1 = 11.3 wks	s₁ = 3.5 wks
Type 2	n₂ = 10	\bar{X}_2 = 7.5 wks	s₂ = 2.7 wks

Hypotheses:

H_0: \mu_1 = \mu_2
H_A: \mu_1 \neq \mu_2

Significance Level: α = 0.02

Solution:

From Example 9.3: Adjusted df = 20

Step 1: Calculate Test Statistic

t' = \frac{(11.3 - 7.5) - 0}{\sqrt{\frac{(3.5)^2}{13} + \frac{(2.7)^2}{10}}} = \frac{3.8}{\sqrt{0.942 + 0.729}} = \frac{3.8}{1.293} = 2.94

Step 2: Critical Values

df = 20, α = 0.02 (two-tailed)
From t-table: t'_{0.01, 20} = \pm 2.528

Decision Rule: “Reject H₀ if |t’| > 2.528”

Step 3: Make Decision

|t’| = 2.94 > 2.528 → Reject H₀
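The separate-variance statistic and the adjusted degrees of freedom can be sketched together. The function name `separate_variance_t_test` is hypothetical. The adjusted df formula rarely gives a whole number; rounding it down before consulting the t-table is an assumption made here because it is the conservative choice and it matches the df of 20 quoted from Example 9.3.

```python
import math

def separate_variance_t_test(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Separate-variance t' statistic with adjusted degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # per-sample variance of the mean
    t_prime = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_prime, df

# Acme shock absorbers: Type 1 vs. Type 2
t_prime, df = separate_variance_t_test(11.3, 3.5, 13, 7.5, 2.7, 10)
df_table = math.floor(df)  # assumption: round down for the table lookup
t_crit = 2.528             # table value t_{0.01, 20}, taken from the example
print(f"t' = {t_prime:.2f}, adjusted df = {df:.2f} (use {df_table})")
print("Reject H0:", abs(t_prime) > t_crit)
```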

Interpretation:

Statistical Finding: Type 1 IS More Durable

At α = 0.02, the mean durations differ significantly. Because the sample difference is positive (t’ = 2.94), the evidence points to Type 1 shock absorbers lasting longer than Type 2.

However, recall from Example 9.3: The CEO requires at least 8 weeks additional durability to justify Type 1’s higher cost. The confidence interval (0.5 to 7.1 weeks) shows the true difference is likely less than 8 weeks.

Business Decision: Despite statistical significance, Type 1 does not meet the business requirement. Continue using Type 2 (less expensive option).

Key Lesson: Statistical significance ≠ Practical significance!

10.15 9.7 Hypothesis Testing for Paired Samples

Paired samples hypothesis testing follows the same logic as paired confidence intervals—analyze the differences as a single sample.

t-Test for Paired Samples

t = \frac{\bar{d} - (\mu_1 - \mu_2)}{\frac{s_d}{\sqrt{n}}}

Where:

- \bar{d} = mean of paired differences
- s_d = standard deviation of differences
- n = number of pairs
- df = n - 1

For testing equality: H_0: \mu_1 = \mu_2 becomes H_0: \mu_d = 0

10.15.1 Example: Hospital Billing Revisited (from Example 9.4)

In Example 9.4, Vicki Peplow constructed a 95% confidence interval for hospital cost differences:

-\$146.33 \leq (\mu_1 - \mu_2) \leq \$28.47

Test the hypothesis of equal average costs.

Sample Data (from Example 9.4):

- n = 15 paired procedures
- \sum d_i = -884
- \sum d_i^2 = 400,716

From Example 9.4 calculations:

- \bar{d} = -58.93
- s_d = 157.8

Hypotheses:

H_0: \mu_1 = \mu_2 \quad \text{(equal costs)}
H_A: \mu_1 \neq \mu_2 \quad \text{(costs differ)}

Significance Level: α = 0.05

Solution:

Step 1: Calculate Test Statistic

t = \frac{\bar{d} - 0}{\frac{s_d}{\sqrt{n}}} = \frac{-58.93}{\frac{157.8}{\sqrt{15}}} = \frac{-58.93}{40.74} = -1.45

Step 2: Critical Values

df = 15 - 1 = 14
α = 0.05 (two-tailed)
From t-table: t_{0.025, 14} = \pm 2.145

Decision Rule: “Reject H₀ if |t| > 2.145”

Step 3: Make Decision

|t| = 1.45 < 2.145 → Do NOT Reject H₀

Interpretation:

No evidence of cost difference between hospitals. This confirms the confidence interval result (which contained zero). Vicki reports that both hospitals appear equivalent for cost purposes.
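The paired test reduces to a one-sample t-test on the differences, which a few lines of Python make explicit. The function name `paired_t_statistic` is hypothetical, and the critical value is the table value quoted in the example. The statistic computed this way agrees with the hand calculation up to rounding.

```python
import math

def paired_t_statistic(d_bar, s_d, n, hypothesized_diff=0.0):
    """t statistic and degrees of freedom for a paired-samples test."""
    se = s_d / math.sqrt(n)  # standard error of the mean difference
    return (d_bar - hypothesized_diff) / se, n - 1

# Hospital billing: summary values carried over from Example 9.4
t, df = paired_t_statistic(d_bar=-58.93, s_d=157.8, n=15)
t_crit = 2.145  # table value t_{0.025, 14}
print(f"t = {t:.3f}, df = {df}")
print("Reject H0:", abs(t) > t_crit)
```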

10.16 9.8 Hypothesis Testing for Two Proportions

Business problems frequently require comparing proportions between two populations:

- Defect rates from two production methods
- Default rates between loan portfolios
- Customer satisfaction between product versions
- Response rates to different marketing campaigns

Z-Test for Difference Between Two Proportions

Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{s_{p_1 - p_2}}

Where: s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}

Requirements:

- n_1p_1 \geq 5, n_1(1-p_1) \geq 5
- n_2p_2 \geq 5, n_2(1-p_2) \geq 5

10.16.1 Example: Retail Credit Usage by Gender

Scenario: Credit Department Analysis

A retail store wants to test whether the proportion of male customers who use credit equals the proportion of female customers who use credit.

Sample Data:

Gender	Sample Size	Used Credit	Proportion
Men	n_m = 100	57	p_m = 0.57
Women	n_w = 110	52	p_w = 0.473

Hypotheses:

H_0: \pi_m = \pi_w
H_A: \pi_m \neq \pi_w

Significance Level: α = 0.01

Solution:

Step 1: Verify Requirements

Men: 100(0.57) = 57 \geq 5 ✓ and 100(0.43) = 43 \geq 5 ✓
Women: 110(0.473) = 52 \geq 5 ✓ and 110(0.527) = 58 \geq 5 ✓

Step 2: Calculate Standard Error

s_{p_m - p_w} = \sqrt{\frac{0.57(0.43)}{100} + \frac{0.473(0.527)}{110}} = \sqrt{\frac{0.2451}{100} + \frac{0.2493}{110}} = \sqrt{0.002451 + 0.002266} = \sqrt{0.004717} = 0.069

Step 3: Calculate Test Statistic

Z = \frac{(0.57 - 0.473) - 0}{0.069} = \frac{0.097}{0.069} = 1.41

Step 4: Critical Values and Decision Rule

For α = 0.01 (two-tailed): - Critical values: Z_{0.005} = \pm 2.58

Decision Rule: “Reject H₀ if |Z| > 2.58”

Step 5: Make Decision

|Z| = 1.41 < 2.58 → Do NOT Reject H₀
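The five steps above can be sketched as follows. The function name `two_proportion_z_test` is hypothetical. The standard error follows the formula given in this section (each sample's own proportion, with no pooling), and the code starts from the raw counts, so the Z-value differs slightly from the hand calculation, which rounds the proportions first. The p-value is an addition for illustration; the example itself stops at the critical-value decision.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, hypothesized_diff=0.0):
    """Two-tailed Z-test for pi_1 - pi_2 using the section's standard error."""
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("Normal approximation requirements are not met")
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = ((p1 - p2) - hypothesized_diff) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Retail credit usage: 57 of 100 men vs. 52 of 110 women
z, p_value = two_proportion_z_test(57, 100, 52, 110)
print(f"Z = {z:.2f}, two-tailed p-value = {p_value:.3f}")
print("Reject H0 at alpha = 0.01:", p_value < 0.01)
```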

Interpretation:

At α = 0.01, there’s no evidence that credit usage proportions differ between men and women. The store should not implement gender-specific credit marketing strategies.

10.16.2 Example 9.7: Johnson Manufacturing Defect Rates

Scenario: Quality Control for Shift Performance

Johnson Manufacturing has experienced increased defect rates. The production supervisor suspects the night shift produces a higher proportion of defects than the day shift.

Sample Data:

Shift	Units Inspected	Defects	Defect Rate
Day shift	n_D = 500	14	p_D = 0.028
Night shift	n_N = 700	22	p_N = 0.031

Decision Context: If night shift defect rate is significantly higher, institute a training program for night workers.

Hypotheses:

H_0: \pi_N \leq \pi_D \quad \text{(night shift not worse)}
H_A: \pi_N > \pi_D \quad \text{(night shift has higher defects)}

Significance Level: α = 0.05

Solution:

Step 1: Calculate Standard Error

s_{p_N - p_D} = \sqrt{\frac{0.031(0.969)}{700} + \frac{0.028(0.972)}{500}} = \sqrt{\frac{0.0300}{700} + \frac{0.0272}{500}} = \sqrt{0.0000429 + 0.0000544} = \sqrt{0.0000973} = 0.0099

Step 2: Calculate Test Statistic

Z = \frac{(0.031 - 0.028) - 0}{0.0099} = \frac{0.003}{0.0099} = 0.303

Step 3: Critical Value (Right-Tailed Test)

For α = 0.05 (right-tailed): - Critical value: Z_{0.05} = 1.65

Decision Rule: “Reject H₀ if Z > 1.65”

Step 4: Make Decision

Z = 0.303 < 1.65 → Do NOT Reject H₀

Interpretation:

Business Decision: Do NOT Institute Training Program

Statistical Finding: At α = 0.05, there’s insufficient evidence to conclude that night shift workers produce a higher defect rate than day shift workers.

The observed difference (3.1% vs. 2.8%) could easily be due to random variation rather than a systematic problem with night shift performance.

Supervisor’s Recommendation:

- Do not implement the training program (saves training costs)
- Continue monitoring defect rates over time
- Investigate other factors if defects remain elevated (equipment maintenance, raw material quality, environmental conditions)

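Cost-Benefit Note: The training program is not justified by this evidence. With samples of 500 and 700 units, a large gap between shifts would likely have produced a significant result, but a small true difference could still go undetected, which is why continued monitoring is recommended.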

10.17 Section Exercises: Hypothesis Testing

22. Samples of sizes 50 and 60 reveal means of 512 and 587, and standard deviations of 125 and 145 respectively. At α = 0.02, test the hypothesis that μ₁ = μ₂.

23. At α = 0.01, test equality of means if samples of size 10 and 8 give means of 36 and 49, and standard deviations of 12 and 18, respectively. Assume variances are NOT equal.

24. Repeat problem 23 assuming variances ARE equal.

25. Paired samples of size 81 give a mean difference of 36.5 and a standard deviation of differences of 29.1. Test equality of means at α = 0.01.

26. Test H_0: \mu_1 \leq \mu_2 if samples of sizes 64 and 81 produce means of 65.2 and 58.6, and standard deviations of 21.2 and 25.3. Use α = 0.05.

27. Test H_0: \mu_1 \geq \mu_2 if two samples of size 100 produce means of 2.3 and 3.1 with standard deviations of 0.26 and 0.31. Use α = 0.01.

28. Paired samples of size 25 reported a mean difference of 45.2 and a standard deviation of differences of 21.6. Test equality of means at α = 0.05.

29. Samples of sizes 120 and 150 produced proportions of 0.69 and 0.73. Test equality of proportions at α = 0.05.

30. Two samples of size 500 each tested H_0: \pi_1 \leq \pi_2. Sample proportions are 14% and 11%. At α = 0.10, what is your conclusion?

31. Samples of sizes 200 and 250 reveal proportions of 21% and 26%. Test H_0: \pi_1 \geq \pi_2 at α = 0.01.


10.18 9.9 Testing for Equality of Variances: The F-Test

Several statistical tests discussed earlier assumed equal population variances. We initially accepted this assumption without proof. Now we demonstrate how to formally test whether the assumption of equal variances is reasonable.

Why Test for Equal Variances?

Many statistical procedures depend on the variance assumption:

Pooled t-tests require \sigma_1^2 = \sigma_2^2
ANOVA (next chapter) assumes equal group variances
Some regression techniques assume homoscedasticity (equal variances)

The F-test helps decide whether to use:

- Pooled methods (when variances are equal)
- Separate variance methods (when variances differ)

10.18.1 The F-Distribution

The test for comparing variances uses the F-distribution, named by George W. Snedecor in honor of Sir Ronald A. Fisher (1890-1962), one of the founders of modern statistics, who developed the variance-ratio idea in the 1920s.

Key Properties of the F-Distribution:

Right-skewed (not symmetric)
Bounded by zero on the left (cannot be negative)
Two degrees of freedom parameters:
- df₁ = numerator degrees of freedom = n₁ - 1
- df₂ = denominator degrees of freedom = n₂ - 1
Values always ≥ 0

10.18.2 The F-Ratio

F-Ratio for Comparing Two Variances

F = \frac{s_L^2}{s_S^2}

Where: - s_L^2 = larger sample variance - s_S^2 = smaller sample variance

Convention: Always place the larger variance in the numerator to ensure F ≥ 1

Logic of the Test:

If population variances are truly equal (\sigma_1^2 = \sigma_2^2), then F \approx 1
The more s_L^2 exceeds s_S^2, the larger F becomes
A sufficiently large F provides evidence that \sigma_1^2 \neq \sigma_2^2

10.18.3 Critical Value Adjustment for Two-Tailed Tests

Important: Divide α by 2

Because we force F ≥ 1 by placing the larger variance in the numerator, we can only reject in the right tail. This eliminates the left tail rejection region.

For a two-tailed test of H_0: \sigma_1^2 = \sigma_2^2:

- Use significance level α/2 (not α)
- Critical value: F_{\alpha/2, df_1, df_2}

Example: For α = 0.10 (10% significance), use F_{0.05} from the F-table

10.18.4 Example: Management Consultant’s Variance Test

Scenario: Preliminary Analysis Before t-Test

A management consultant wants to test a hypothesis about two population means. Before conducting the t-test, the consultant must decide whether to assume equal variances.

Sample Data:

Sample	Size	Std Dev	Variance
Sample 1	n₁ = 10	s₁ = 12.2	s_1^2 = 148.84
Sample 2	n₂ = 10	s₂ = 15.4	s_2^2 = 237.16

Hypotheses:

H_0: \sigma_1^2 = \sigma_2^2 \quad \text{(variances are equal)}
H_A: \sigma_1^2 \neq \sigma_2^2 \quad \text{(variances differ)}

Significance Level: α = 0.05

Solution:

Step 1: Calculate F-Ratio (Larger Variance in Numerator)

F = \frac{s_2^2}{s_1^2} = \frac{(15.4)^2}{(12.2)^2} = \frac{237.16}{148.84} = 1.59

Step 2: Determine Degrees of Freedom

Numerator (larger variance): df₁ = n₂ - 1 = 10 - 1 = 9
Denominator (smaller variance): df₂ = n₁ - 1 = 10 - 1 = 9

Step 3: Find Critical Value

For α = 0.05 (two-tailed test):

- Use α/2 = 0.025 in the F-table
- Look up F_{0.025, 9, 9} = 4.03

Step 4: Decision Rule

Decision Rule: “Do not reject H₀ if F ≤ 4.03. Reject if F > 4.03”

Step 5: Make Decision

F = 1.59 < 4.03 → Do NOT Reject H₀
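The bookkeeping in Steps 1 and 2 (larger variance on top, degrees of freedom following their own sample) is easy to get wrong by hand, so a small sketch may help. The function name `f_ratio` is hypothetical, and the critical value 4.03 is copied from the F-table step because the standard library has no F-distribution.

```python
def f_ratio(s1, n1, s2, n2):
    """F-ratio with the larger sample variance in the numerator.

    Returns (F, numerator df, denominator df); each df belongs to the
    sample whose variance sits in that position.
    """
    var1, var2 = s1 ** 2, s2 ** 2
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Management consultant example: s_1 = 12.2, s_2 = 15.4, both n = 10
f_stat, df_num, df_den = f_ratio(12.2, 10, 15.4, 10)
f_crit = 4.03  # table value F_{0.025, 9, 9}, taken from the example
print(f"F = {f_stat:.2f} with df = ({df_num}, {df_den})")
print("Reject H0 (equal variances):", f_stat > f_crit)
```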

Interpretation:

Conclusion: Assume Equal Variances

At α = 0.05, there’s insufficient evidence to conclude that the population variances differ.

Practical Implication: The consultant can proceed with the hypothesis test for means using the pooled variance method (Section 9.2B), which assumes \sigma_1^2 = \sigma_2^2.

Statistical Note: Failing to reject H₀ doesn’t prove variances are equal—it simply means the sample evidence isn’t strong enough to conclude they’re different.

10.19 Solved Problems

The following worked examples demonstrate complete solutions to two-population inference problems, integrating concepts from throughout the chapter.

10.19.1 Solved Problem 1: Yuppies’ Work Ethic

Source: Fortune magazine (April 1991)

Context: Study of workaholic baby boomers (ages 25-43) in administrative positions

A Fortune article compared work hours between young executives on the corporate fast track (Group 1) versus those who spent less time at work (Group 2). While fast-trackers often reported 70, 80, or even 90 hours per week, approximately 60 hours was typical.

Sample Data:

Group	Mean Hours	Std Dev	Sample Size
Fast track	\bar{X}_1 = 62.5	s₁ = 23.7	n₁ = 175
Less time	\bar{X}_2 = 39.7	s₂ = 8.9	n₂ = 168

Tasks:

1. Construct a 90% confidence interval for the difference in average work hours
2. Test the hypothesis of equal means at α = 0.10

Solution:

Part 1: Confidence Interval

Step 1: Calculate Standard Error

s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(23.7)^2}{175} + \frac{(8.9)^2}{168}} = \sqrt{\frac{561.69}{175} + \frac{79.21}{168}} = \sqrt{3.210 + 0.472} = \sqrt{3.682} = 1.92

Step 2: Find Critical Z-Value

90% confidence level → α = 0.10 → Z_{0.05} = 1.65

Step 3: Calculate Confidence Interval

\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm Z \cdot s_{\bar{X}_1 - \bar{X}_2} = (62.5 - 39.7) \pm 1.65(1.92) = 22.8 \pm 3.17

19.63 \leq (\mu_1 - \mu_2) \leq 25.97 \text{ hours}

Interpretation: We can be 90% confident that fast-track executives work an average of 19.63 to 25.97 hours more per week than their less work-focused counterparts.

Part 2: Hypothesis Test

Hypotheses:

H_0: \mu_1 = \mu_2 \quad \text{(equal average work hours)}

H_A: \mu_1 \neq \mu_2 \quad \text{(work hours differ)}

Test Statistic:

Z = \frac{(62.5 - 39.7) - 0}{1.92} = \frac{22.8}{1.92} = 11.88

Critical Values: For α = 0.10 (two-tailed): Z_{0.05} = \pm 1.65

Decision Rule: “Do not reject if -1.65 ≤ Z ≤ 1.65. Otherwise reject.”

Decision: Z = 11.88 > 1.65 → Reject H₀

Conclusion: There is overwhelming evidence (p-value < 0.0001) that fast-track executives work more hours per week, on average, than other administrators. A 22.8-hour difference is far too large to attribute to chance.
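Both parts of this solved problem follow from the same standard error, which makes them easy to script. The sketch below uses only the Python standard library; the function name `two_mean_z` is an illustrative choice, not something defined in the text.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z(x1, s1, n1, x2, s2, n2, confidence=0.90):
    """Large-sample CI and Z statistic for mu1 - mu2 under H0: mu1 = mu2."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)          # estimated standard error
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = x1 - x2
    z_stat = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # two-tailed p-value
    ci = (diff - z_crit * se, diff + z_crit * se)
    return se, ci, z_stat, p_value


# Fast track (group 1) vs. less time (group 2)
se, ci, z_stat, p_value = two_mean_z(62.5, 23.7, 175, 39.7, 8.9, 168)
print(f"SE = {se:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f}), Z = {z_stat:.2f}")
```

Software uses the critical value Z ≈ 1.645 rather than the table value 1.65, so the interval limits can differ from the hand calculation in the second decimal place. The decision is unaffected.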

10.19.2 Solved Problem 2: Inflation and Market Power

Context: Economic Study of Industry Concentration

Economists fear that industries with high concentration (market power in few firms’ hands) may exploit their dominance. Firms in nine high-concentration industries were paired with firms in nine industries where economic power was dispersed. Industries were matched on foreign competition, cost structures, and other price-affecting factors.

Data: Average percentage price increases by industry

Industry Pair	Concentrated (%)	Less Concentrated (%)	Difference (d)	d²
1	3.7	3.2	0.5	0.25
2	4.1	3.7	0.4	0.16
3	2.1	2.6	-0.5	0.25
4	-0.9	0.1	-1.0	1.00
5	4.6	4.1	0.5	0.25
6	5.2	4.8	0.4	0.16
7	6.7	5.2	1.5	2.25
8	3.8	3.9	-0.1	0.01
9	4.9	4.6	0.3	0.09
Totals			2.0	4.42

Question: At α = 0.10, do concentrated industries show more pronounced inflationary pressures?

Solution:

Step 1: Calculate Mean and Standard Deviation of Differences

\bar{d} = \frac{\sum d_i}{n} = \frac{2.0}{9} = 0.22\%

s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}} = \sqrt{\frac{4.42 - 9(0.22)^2}{8}} = \sqrt{\frac{4.42 - 0.436}{8}} = \sqrt{\frac{3.984}{8}} = \sqrt{0.498} = 0.706

Step 2: Construct 90% Confidence Interval

df = n - 1 = 9 - 1 = 8
For 90% CI: t_{0.05, 8} = 1.860

\text{C.I.} = \bar{d} \pm t \frac{s_d}{\sqrt{n}} = 0.22 \pm 1.860 \times \frac{0.706}{\sqrt{9}} = 0.22 \pm 1.860 \times 0.235 = 0.22 \pm 0.438

-0.218 \leq \mu_d \leq 0.658

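Interpretation: We’re 90% confident that average price increases in concentrated industries are between 0.218 percentage points lower and 0.658 percentage points higher than in less concentrated industries. Because the interval contains zero, a zero difference remains plausible.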

Step 3: Hypothesis Test

Hypotheses:

H_0: \mu_{\text{conc}} = \mu_{\text{less}} \quad \text{(no inflation difference)}

H_A: \mu_{\text{conc}} \neq \mu_{\text{less}} \quad \text{(inflation differs)}

Test Statistic:

t = \frac{\bar{d} - 0}{\frac{s_d}{\sqrt{n}}} = \frac{0.22}{\frac{0.706}{\sqrt{9}}} = \frac{0.22}{0.235} = 0.935

Critical Values: For α = 0.10, df = 8: t_{0.05, 8} = \pm 1.860

Decision Rule: “Do not reject if -1.860 ≤ t ≤ 1.860. Otherwise reject.”

Decision: t = 0.935 < 1.860 → Do NOT Reject H₀

Conclusion: At α = 0.10, there is insufficient evidence to conclude that concentrated industries have higher inflationary pressures than less concentrated industries. The observed difference of 0.22 percentage points could easily be due to chance.
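The paired analysis reduces to one-sample statistics on the column of differences, which can be sketched directly from the data table. This example uses only the Python standard library; the variable names are illustrative, and the critical value 1.860 is copied from the t-table value used in the solution rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Average percentage price increases for the nine matched industry pairs
concentrated = [3.7, 4.1, 2.1, -0.9, 4.6, 5.2, 6.7, 3.8, 4.9]
less_concentrated = [3.2, 3.7, 2.6, 0.1, 4.1, 4.8, 5.2, 3.9, 4.6]
T_CRIT = 1.860  # t_{0.05, 8}, taken from the t-table (assumption: not computed here)

diffs = [c - l for c, l in zip(concentrated, less_concentrated)]
n = len(diffs)
d_bar = mean(diffs)      # mean of the paired differences
s_d = stdev(diffs)       # sample standard deviation (n - 1 divisor)
se = s_d / sqrt(n)       # standard error of the mean difference
t_stat = d_bar / se      # test statistic under H0: mu_d = 0
ci = (d_bar - T_CRIT * se, d_bar + T_CRIT * se)
reject = abs(t_stat) > T_CRIT

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}")
print(f"90% CI = ({ci[0]:.3f}, {ci[1]:.3f}); reject H0: {reject}")
```

Because the code carries the unrounded mean difference (0.2222...) through every step, it reports t of about 0.95 rather than the hand-calculated 0.935. The statistic is still well inside the non-rejection region.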

10.19.3 Solved Problem 3: Drilling Rig Bit Comparison

Context: Oil Drilling Equipment Testing

A drilling company tests two drill bits by drilling to a maximum depth of 112 feet and recording completion time.

Sample Data:

Bit	Wells Drilled	Mean Time	Std Dev
Bit 1	n₁ = 12	\bar{X}_1 = 27.3 hrs	s₁ = 8.7 hrs
Bit 2	n₂ = 10	\bar{X}_2 = 31.7 hrs	s₂ = 8.3 hrs

Conditions:

No evidence that variances are equal → use separate variance method
α = 0.10
All wells drilled with same equipment and soil type

Question: Does one bit appear more effective?

Solution:

Step 1: Calculate Adjusted Degrees of Freedom

df = \frac{\left[\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right]^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}

= \frac{\left[\frac{(8.7)^2}{12} + \frac{(8.3)^2}{10}\right]^2}{\frac{[(8.7)^2/12]^2}{11} + \frac{[(8.3)^2/10]^2}{9}}

= \frac{[6.308 + 6.889]^2}{\frac{(6.308)^2}{11} + \frac{(6.889)^2}{9}}

= \frac{(13.197)^2}{\frac{39.79}{11} + \frac{47.46}{9}} = \frac{174.16}{3.617 + 5.273}

= \frac{174.16}{8.890} = 19.59 \approx 19 \text{ (round down)}

Step 2: Construct 90% Confidence Interval

For df = 19, 90% CI: t'_{0.05, 19} = 1.729

\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

= (27.3 - 31.7) \pm 1.729 \sqrt{\frac{(8.7)^2}{12} + \frac{(8.3)^2}{10}}

= -4.4 \pm 1.729\sqrt{6.308 + 6.889} = -4.4 \pm 1.729(3.633)

= -4.4 \pm 6.28

-10.68 \leq (\mu_1 - \mu_2) \leq 1.88 \text{ hours}

Interpretation: We’re 90% confident that Bit 1 takes between 1.88 hours more and 10.68 hours less than Bit 2.

Step 3: Hypothesis Test

Hypotheses:

H_0: \mu_1 = \mu_2

H_A: \mu_1 \neq \mu_2

Test Statistic:

t' = \frac{(27.3 - 31.7) - 0}{\sqrt{\frac{(8.7)^2}{12} + \frac{(8.3)^2}{10}}} = \frac{-4.4}{3.633} = -1.211

Critical Values: t'_{0.05, 19} = \pm 1.729

Decision Rule: “Do not reject if -1.729 ≤ t’ ≤ 1.729. Otherwise reject.”

Decision: t’ = -1.211 > -1.729 → Do NOT Reject H₀

Conclusion: At α = 0.10, there is insufficient evidence that one bit is more effective. The 4.4-hour difference could be due to chance.
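The adjusted degrees of freedom are the most error-prone part of this solution when worked by hand, so a short script is a useful check. This is a minimal standard-library sketch; `welch_test` is an illustrative name, and the critical value 1.729 is the t-table value quoted above, not one computed by the code.

```python
from math import floor, sqrt


def welch_test(x1, s1, n1, x2, s2, n2):
    """Separate-variance t statistic with adjusted df (rounded down)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2           # per-sample variance of the mean
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    se = sqrt(v1 + v2)
    t_stat = (x1 - x2) / se
    return floor(df), se, t_stat


T_CRIT = 1.729  # t'_{0.05, 19} from the t-table (assumption: looked up, not computed)

df, se, t_stat = welch_test(27.3, 8.7, 12, 31.7, 8.3, 10)
diff = 27.3 - 31.7
ci = (diff - T_CRIT * se, diff + T_CRIT * se)
reject = abs(t_stat) > T_CRIT

print(f"df = {df}, SE = {se:.3f}, t' = {t_stat:.3f}")
print(f"90% CI = ({ci[0]:.2f}, {ci[1]:.2f}); reject H0: {reject}")
```

Rounding the degrees of freedom down, as the text does, is the conservative choice: it gives a slightly larger critical value and therefore a slightly wider interval.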

Alternative: If Variances Were Equal

If equipment/soil similarity justified equal variances:

s_p^2 = \frac{(8.7)^2(11) + (8.3)^2(9)}{20} = \frac{832.59 + 620.01}{20} = 72.63

With df = n₁ + n₂ - 2 = 20 and 90% confidence: t_{0.05, 20} = 1.725

\text{C.I.} = -4.4 \pm 1.725\sqrt{72.63\left(\frac{1}{12} + \frac{1}{10}\right)} = -4.4 \pm 6.29

-10.69 \leq (\mu_1 - \mu_2) \leq 1.89

Result is nearly identical—conclusion unchanged.
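The pooled alternative can be sketched the same way. The code below is a standard-library illustration; `pooled_ci` is a hypothetical helper name, and the critical value 1.725 is taken from the t-table for 20 degrees of freedom rather than computed. Small differences from hand-rounded intermediate values are to be expected.

```python
from math import sqrt


def pooled_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = (s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = x1 - x2
    return sp2, df, (diff - t_crit * se, diff + t_crit * se)


sp2, df, ci = pooled_ci(27.3, 8.7, 12, 31.7, 8.3, 10, t_crit=1.725)
print(f"s_p^2 = {sp2:.2f}, df = {df}, 90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With sample standard deviations this close (8.7 and 8.3), the pooled and separate-variance intervals nearly coincide, which is why the choice of method does not change the conclusion here.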

10.19.4 Solved Problem 4: The Credit Crunch

Context: Retail Credit Card Usage by Gender

A Retail Management study examined credit card usage patterns.

Sample Data:

Gender	Shoppers	Used Card	Proportion
Women	n_w = 468	131	p_w = 0.28
Men	n_m = 237	57	p_m = 0.24

Question: At α = 0.05, is there evidence of a difference in credit card usage proportions?

Solution:

Part 1: Confidence Interval

Step 1: Calculate Standard Error

s_{p_w - p_m} = \sqrt{\frac{p_w(1-p_w)}{n_w} + \frac{p_m(1-p_m)}{n_m}}

= \sqrt{\frac{0.28(0.72)}{468} + \frac{0.24(0.76)}{237}}

= \sqrt{\frac{0.2016}{468} + \frac{0.1824}{237}} = \sqrt{0.000431 + 0.000770}

= \sqrt{0.001201} = 0.035

Step 2: Construct 95% Confidence Interval

Z_{0.025} = 1.96

\text{C.I.} = (p_w - p_m) \pm Z \cdot s_{p_w - p_m} = (0.28 - 0.24) \pm 1.96(0.035) = 0.04 \pm 0.069

-0.029 \leq (\pi_w - \pi_m) \leq 0.109

Or: -2.9 ≤ difference ≤ 10.9 percentage points

Interpretation: No evidence of a difference—the interval contains zero.

Part 2: Hypothesis Test

Hypotheses:

H_0: \pi_w = \pi_m

H_A: \pi_w \neq \pi_m

Test Statistic:

Z = \frac{(0.28 - 0.24) - 0}{0.035} = \frac{0.04}{0.035} = 1.14

Critical Values: For α = 0.05: Z_{0.025} = \pm 1.96

Decision Rule: “Do not reject if -1.96 ≤ Z ≤ 1.96. Otherwise reject.”

Decision: Z = 1.14 < 1.96 → Do NOT Reject H₀

Conclusion: At α = 0.05, there is insufficient evidence that credit card usage proportions differ by gender. These data give retailers no statistical basis for gender-specific credit marketing strategies.
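The proportion comparison can be reproduced from the raw counts in the data table. The sketch below uses only the Python standard library and follows the standard error used in this solution (each sample proportion in its own term); `two_proportion_z` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, confidence=0.95):
    """CI and Z statistic for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, ci, diff / se


# Women: 131 of 468 used a card; men: 57 of 237
diff, se, ci, z_stat = two_proportion_z(131, 468, 57, 237)
print(f"diff = {diff:.3f}, SE = {se:.3f}, Z = {z_stat:.2f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Working from counts instead of proportions rounded to two decimals shifts the interval limits slightly in the third decimal place. The interval still contains zero and Z still falls inside ±1.96.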

10.20 Formula Summary

Complete Formula List for This Chapter

CONFIDENCE INTERVALS

[9.1] Two Means (Large Samples): (\bar{X}_1 - \bar{X}_2) \pm Z \sigma_{\bar{X}_1 - \bar{X}_2}

[9.2] Standard Error (Known σ): \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

[9.3] Estimated Standard Error: s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

[9.4] Two Means (Large Samples, Unknown σ): (\bar{X}_1 - \bar{X}_2) \pm Z s_{\bar{X}_1 - \bar{X}_2}

[9.5] Pooled Variance: s_p^2 = \frac{s_1^2(n_1-1) + s_2^2(n_2-1)}{n_1 + n_2 - 2}

[9.6] Two Means (Equal Variances): (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}

[9.7] Adjusted df (Unequal Variances): df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}

[9.8] Two Means (Unequal Variances): (\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

[9.9] Mean of Paired Differences: \bar{d} = \frac{\sum d_i}{n}

[9.10] Std Dev of Differences: s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}}

[9.11] Paired Differences CI: \bar{d} \pm t \frac{s_d}{\sqrt{n}}

[9.12] Standard Error for Two Proportions: s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}

[9.13] Two Proportions CI: (p_1 - p_2) \pm Z \cdot s_{p_1-p_2}

SAMPLE SIZE FORMULAS

[9.14] Sample Size for Two Means: n = \frac{Z^2(\sigma_1^2 + \sigma_2^2)}{E^2}

[9.15] Sample Size for Two Proportions: n = \frac{Z^2[\pi_1(1-\pi_1) + \pi_2(1-\pi_2)]}{E^2}
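Formulas [9.14] and [9.15] translate directly into code. The sketch below is illustrative only: the function names and all input values are hypothetical, and n is read here as the size required for each of the two samples (an assumption about how the formula is applied). Results are rounded up because a sample size must be a whole number.

```python
from math import ceil


def sample_size_two_means(z, var1, var2, error):
    """Formula [9.14]: n = Z^2 (sigma1^2 + sigma2^2) / E^2, rounded up."""
    return ceil(z ** 2 * (var1 + var2) / error ** 2)


def sample_size_two_proportions(z, p1, p2, error):
    """Formula [9.15]: n = Z^2 [pi1(1 - pi1) + pi2(1 - pi2)] / E^2, rounded up."""
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / error ** 2)


# Hypothetical inputs: 95% confidence (Z = 1.96), variances 25 and 36, error of 2 units
n_means = sample_size_two_means(1.96, 25, 36, 2)

# Hypothetical inputs: no prior estimate of either proportion, so 0.5 is used for both
n_props = sample_size_two_proportions(1.96, 0.5, 0.5, 0.05)

print(n_means, n_props)
```

Using 0.5 for both proportions maximizes π(1 − π), so it gives the largest (most cautious) sample size when no prior estimates are available.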

HYPOTHESIS TESTS

[9.16] Z-Test for Two Means (Large): Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1-\bar{X}_2}}

[9.17] t-Test (Equal Variances): t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

[9.18] t-Test (Unequal Variances): t' = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

[9.19] t-Test for Paired Samples: t = \frac{\bar{d} - (\mu_1 - \mu_2)}{\frac{s_d}{\sqrt{n}}}

[9.20] Z-Test for Two Proportions: Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{s_{p_1-p_2}}

[9.21] F-Test for Variances: F = \frac{s_L^2}{s_S^2}

10.21 Chapter Summary

10.21.1 Key Concepts Mastered

1. Two-Population Framework

Comparing means, proportions, and variances across two populations
Independent vs. paired samples
Large vs. small sample methods

2. Interval Estimation

Confidence intervals quantify uncertainty about population differences
Critical connection: a (1 − α) confidence interval contains zero ↔ a two-tailed test at level α fails to reject H₀: μ₁ = μ₂

3. Hypothesis Testing

Formal decision-making about population differences
One-tailed vs. two-tailed tests
Statistical significance ≠ practical significance (Acme example)

4. Variance Assumptions Matter

Pooled methods (equal variances): More powerful when the assumption is valid
Separate variance methods (unequal variances): More robust, typically wider CIs
F-test: Formal test for variance equality

5. Paired Samples Power

Pairing reduces variability by controlling for confounding factors
Transforms a two-sample problem into a one-sample problem
More powerful when pairing is appropriate

10.21.2 Decision Framework

Situation	Sample Size	Variances	Method
Compare means	Both ≥ 30	Any	Z-test, Formula [9.4]
Compare means	Either < 30	Equal	Pooled t-test, Formula [9.6]
Compare means	Either < 30	Unequal	Separate variance t-test, Formula [9.8]
Matched pairs	Any	N/A	Paired t-test, Formula [9.11]
Compare proportions	Large enough*	N/A	Z-test for proportions, Formula [9.13]
Compare variances	Any	N/A	F-test, Formula [9.21]

*Requires: np \geq 5 and n(1-p) \geq 5 for both samples
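The decision table can also be written as a small lookup function, which makes the branching explicit. This is only a sketch of the table above: `choose_method` and its argument names are hypothetical, and the returned strings simply repeat the table's Method column.

```python
def choose_method(comparison, n1=None, n2=None, equal_variances=None):
    """Return the method from the decision framework table.

    comparison: "means", "paired", "proportions", or "variances".
    equal_variances is needed only for small-sample comparisons of means.
    """
    if comparison == "paired":
        return "Paired t-test, Formula [9.11]"
    if comparison == "proportions":
        return "Z-test for proportions, Formula [9.13]"
    if comparison == "variances":
        return "F-test, Formula [9.21]"
    if comparison == "means":
        if n1 is None or n2 is None:
            raise ValueError("n1 and n2 are required when comparing means")
        if n1 >= 30 and n2 >= 30:
            return "Z-test, Formula [9.4]"
        if equal_variances is None:
            raise ValueError("state whether equal variances are assumed")
        if equal_variances:
            return "Pooled t-test, Formula [9.6]"
        return "Separate variance t-test, Formula [9.8]"
    raise ValueError(f"unknown comparison type: {comparison!r}")


print(choose_method("means", n1=175, n2=168))                       # large samples
print(choose_method("means", n1=12, n2=10, equal_variances=False))  # small, unequal
```

The function does not check the np ≥ 5 and n(1 − p) ≥ 5 conditions for proportions; those still need to be verified separately.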

10.21.3 Business Applications

Throughout this chapter, we’ve seen two-population inference applied to:

Human Resources: Training program effectiveness, wage equity, employee turnover
Quality Control: Production process comparison, defect rates, product durability
Healthcare: Hospital cost analysis, treatment effectiveness
Marketing: Credit usage patterns, customer satisfaction, advertising effectiveness
Finance: Investment strategies, pricing decisions, cost comparison
Operations: Service delivery times, productivity comparisons, equipment efficiency

10.22 Closing Scenario: U.S. Foreign Investment Decision

Revisiting the Opening Scenario

Recall the Fortune magazine analysis (October 1996) of U.S. foreign investment:

Europe: $364 billion (17% increase)
Asia: Over $100 billion (16% increase)

Your Executive Summary (Based on This Chapter’s Methods):

As the analyst preparing the comparative report, you would apply the two-population methods learned in this chapter:

Investment Strategy Recommendation

Analysis Approach:

1. Construct confidence intervals for the difference in average returns between Europe and Asia
2. Test hypotheses about risk differences (variance comparison using the F-test)
3. Consider paired analysis if the same companies invested in both regions
4. Evaluate proportions of successful investments in each region

Key Questions Answered:

Is the average ROI significantly different? (t-test for means)
Is investment risk (variance) comparable? (F-test)
What’s the plausible range for the difference? (Confidence interval)

Strategic Implications:

If the CI for (μ_Europe − μ_Asia) contains zero: No clear advantage in average return; diversify across both regions
If the CI lies entirely above zero: Europe provides superior average returns; increase allocation
If the F-test indicates Asia’s variance is lower: Asia offers more stable returns; suited to risk-averse preferences

Final Recommendation: Use statistical evidence to support data-driven geographic allocation decisions, balancing return expectations with risk tolerance.

10.23 Chapter Exercises (Selected)

32. AT&T vs. Sprint: Phone service comparison

AT&T: n=145, \bar{X}=$4.07, s=$0.97
Sprint: n=102, \bar{X}=$3.89, s=$0.85

What does a 95% CI reveal about the mean cost difference?

37. Grant Applications: NSF (n=14, \bar{X}=45.7 weeks, s=12.6) vs. HHS (n=12, \bar{X}=32.9 weeks, s=16.8). Construct a 90% CI. If NSF takes >5 weeks more, submit to HHS. What should James do? (Assume equal variances)

39. Quality Control Teams: Two teams solve 10 problems. Paired data provided. Construct a 90% CI for the difference in average solution times.

44. Mutual Funds: Income-oriented funds (n=12) vs. growth-oriented funds (n=14). Unequal variances assumed.

a. Construct an 80% CI for the difference in average returns
b. What sample size is needed for 95% confidence with error ≤ $10?

45. Baldwin Piano Teaching Method: Your method (n=100, \bar{X}=149 hrs, s=37.7) vs. competitor (n=130, \bar{X}=186 hrs, s=42.2)

a. 99% CI: is your method better?
b. What sample size is needed for 99% confidence with error ≤ 5 hours?


# Two-Population Inferences **Learning Objectives** After completing this chapter, you will be able to: 1. Construct and interpret confidence intervals for the difference between two population means with large samples 2. Apply pooled variance methods for small sample comparisons when population variances are equal 3. Use adjusted degrees of freedom methods when population variances are unequal 4. Implement paired-sample analysis for before-after or matched-pairs designs 5. Estimate and test differences between two population proportions 6. Determine appropriate sample sizes for two-population studies 7. Conduct hypothesis tests comparing two means using independent samples 8. Perform paired-sample hypothesis tests 9. Test hypotheses about differences between two proportions 10. Apply two-population inference methods to real business decision-making scenarios --- ## Opening Scenario: U.S. Foreign Investment Strategy ::: {.callout-note icon="🌍" appearance="default"} ## International Investment Analyst Challenge **Context:** October 1996 - Fortune Magazine Analysis Fortune magazine published a series of articles examining trends in U.S. foreign trade, focusing on massive international transactions and the competition between Europe and Asia for American investment dollars. **The Numbers:** - **European Investment:** $364 billion in 1996, up 17% from previous year's record - **Asian Investment:** Over $100 billion, representing 16% growth in U.S. commercial participation **The Debate:** These articles challenged conventional wisdom that U.S. companies preferred investing in Asia's rapidly growing economies over Europe's established markets. Instead, the analysis suggested American business interests still view Europe as offering more lucrative opportunities for corporate growth. **Your Role:** You are an international analyst for a major U.S. corporation. You must prepare a comprehensive comparative report on the advantages of investing in each geopolitical region. 
This report will be presented to division executives who will decide the future course of foreign investment for the coming years. ::: ```{mermaid} %%| fig-cap: "Chapter 9: Two-Population Inferences Structure" %%| fig-width: 100% graph TD A[Inferences about Two Populations] --> B[Interval Estimation] A --> C[Hypothesis Testing] B --> B1[Independent Samples] B --> B2[Paired Sampling] B --> B3[Difference between Two Proportions] B1 --> B1a[Estimation with Large Samples] B1 --> B1b[Equal Variances - Pooled Data] B1 --> B1c[Unequal Variances] C --> C1[Independent Samples] C --> C2[Paired Samples] C --> C3[Tests for the Difference between Two Proportions] C1 --> C1a[Tests with Large Samples] C1 --> C1b[Equal Variances - Pooled Data] C1 --> C1c[Unequal Variances] style A fill:#333,stroke:#000,stroke-width:4px,color:#fff style B fill:#fff,stroke:#000,stroke-width:3px style C fill:#fff,stroke:#000,stroke-width:3px ``` ### Your Analysis Requirements To guide the investment decision, you must: - **Compare average return on investment** in Europe versus Asia - **Determine which region** has a lower percentage of failed investment projects - **Estimate average investment levels** in both Europe and Asia - **Provide thorough comparison** of all relevant financial measures in these two foreign markets This chapter provides the statistical tools you'll need to make these critical comparisons. --- ## 9.1 Introduction: Why Compare Two Populations? Chapters 7 and 8 demonstrated how to construct confidence intervals and test hypotheses for a **single population**. However, many of the most important business questions require comparing **two populations**: ::: {.callout-tip icon="💼" appearance="simple"} ## Real-World Comparison Questions **Manufacturing:** - What's the difference, if any, between the average durability of ski boots made by North Slope versus those produced by Head? **Operations:** - Do workers in Plant A produce more on average than workers in Plant B? 
**Quality Control:** - Is there a difference between the proportion of defective units produced by Method 1 versus Method 2? **Marketing:** - Does advertising campaign A generate more sales than campaign B? **Human Resources:** - Are employee satisfaction scores higher after implementing a new policy? **Finance:** - Do European investments yield higher returns than Asian investments? ::: ### Two Fundamental Sampling Approaches The exact statistical procedure depends on the **sampling technique** used: **1. Independent Samples** As the name indicates, independent sampling involves collecting **separate, unrelated samples** from each population. - Samples don't need to be the same size - Observations in one sample have no relationship to observations in the other - Example: Comparing durability of 100 Brand A tires versus 80 Brand B tires **2. Paired Samples (Matched Pairs)** With paired sampling, observations from each population are **matched** or **correspond** to each other. - Observations are paired based on similarity in relevant characteristics - Also called "before-after" comparisons when measuring same units twice - Example: Measuring employee productivity before and after training We'll begin with independent sampling methods. --- ## 9.2 Interval Estimation for Independent Samples When comparing two populations, we're interested in estimating the **difference between two population means**: $(\mu_1 - \mu_2)$ The appropriate method depends on the sample sizes $n_1$ and $n_2$: - **Large samples** (both $n_1 \geq 30$ and $n_2 \geq 30$): Use Z-distribution - **Small samples** (either $n_1 < 30$ or $n_2 < 30$): Use t-distribution ### A. 
Estimation with Large Samples (n₁ ≥ 30 and n₂ ≥ 30) **Point Estimate:** The point estimate of the difference $(\mu_1 - \mu_2)$ is the difference between sample means: $$\bar{X}_1 - \bar{X}_2$$ **Sampling Distribution:** When both $n_1$ and $n_2$ are large, the **sampling distribution of differences** $(\bar{X}_1 - \bar{X}_2)$ follows a **normal distribution** centered at $(\mu_1 - \mu_2)$. ```{python} #| echo: false #| output: true import matplotlib.pyplot as plt import numpy as np from scipy import stats fig, ax = plt.subplots(figsize=(10, 5)) # Generate normal distribution centered at μ₁ - μ₂ x = np.linspace(-4, 4, 1000) y = stats.norm.pdf(x, 0, 1) ax.plot(x, y, 'b-', linewidth=2.5, label='Sampling Distribution of ($\\bar{X}_1 - \\bar{X}_2$)') ax.fill_between(x, y, alpha=0.2, color='blue') # Mark center ax.axvline(0, color='red', linestyle='--', linewidth=1.5, label='Center: $\\mu_1 - \\mu_2$') ax.annotate('$\\mu_1 - \\mu_2$', xy=(0, 0.42), fontsize=13, ha='center', bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.7)) # Add standard error markers ax.annotate('', xy=(-1, 0.1), xytext=(1, 0.1), arrowprops=dict(arrowstyle='<->', color='darkgreen', lw=2)) ax.text(0, 0.12, '$\\sigma_{\\bar{X}_1 - \\bar{X}_2}$', fontsize=11, ha='center', color='darkgreen', bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.6)) ax.set_xlabel('Difference in Sample Means ($\\bar{X}_1 - \\bar{X}_2$)', fontsize=11) ax.set_ylabel('Probability Density', fontsize=11) ax.set_title('Figure 9.1: Sampling Distribution of Differences Between Sample Means', fontsize=13, fontweight='bold') ax.legend(fontsize=10) ax.grid(alpha=0.3) plt.tight_layout() plt.savefig('09-sampling-distribution-differences.png', dpi=150, bbox_inches='tight') plt.show() ``` **Standard Error of the Difference:** The standard error measures how much the differences between sample means tend to vary: ::: {.callout-note icon="📊" appearance="minimal"} ## Standard Error of Difference Between Sample Means 
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$ Where: - $\sigma_1^2$, $\sigma_2^2$ = population variances - $n_1$, $n_2$ = sample sizes ::: In practice, population variances are usually unknown. We estimate them using sample variances: $$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$ **Confidence Interval Formula:** ::: {.callout-important icon="⚡" appearance="simple"} ## Confidence Interval for $(\mu_1 - \mu_2)$ — Large Samples $$\text{C.I. for } (\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm Z \cdot s_{\bar{X}_1 - \bar{X}_2}$$ Where: - $(\bar{X}_1 - \bar{X}_2)$ = point estimate - $Z$ = critical Z-value for desired confidence level - $s_{\bar{X}_1 - \bar{X}_2}$ = estimated standard error ::: **Important Note:** We're not interested in the individual values of $\mu_1$ or $\mu_2$, but only in their **difference**. --- ### Example: Transfer Trucking Route Comparison **Scenario:** Delivery Time Analysis Transfer Trucking transports shipments between Chicago and Kansas City using two different routes. The dispatcher, Delmar, wants to determine if there's a difference in average transit times. **Sample Data:** | Route | Sample Size | Mean Time | Std Dev | |:------|:------------|:----------|:--------| | **North** | n = 100 | $\bar{X}_N = 17.2$ hrs | $s_N = 5.3$ hrs | | **South** | n = 75 | $\bar{X}_S = 19.4$ hrs | $s_S = 4.5$ hrs | **Objective:** Develop a 95% confidence interval for the difference in average transit time. 
**Solution:** **Step 1: Calculate Standard Error** Since population standard deviations are unknown, we use the sample standard deviations: $$s_{\bar{X}_N - \bar{X}_S} = \sqrt{\frac{s_N^2}{n_N} + \frac{s_S^2}{n_S}} = \sqrt{\frac{(5.3)^2}{100} + \frac{(4.5)^2}{75}}$$ $$= \sqrt{\frac{28.09}{100} + \frac{20.25}{75}} = \sqrt{0.2809 + 0.27} = \sqrt{0.5509} = 0.742 \text{ hours}$$ **Step 2: Find Critical Value** For 95% confidence level: $Z = 1.96$ **Step 3: Compute Confidence Interval** $$\text{C.I.} = (\bar{X}_N - \bar{X}_S) \pm Z \cdot s_{\bar{X}_N - \bar{X}_S}$$ $$= (17.2 - 19.4) \pm (1.96)(0.742)$$ $$= -2.2 \pm 1.45$$ $$-3.65 \leq (\mu_N - \mu_S) \leq -0.75$$ ```{python} #| echo: false #| output: true import matplotlib.pyplot as plt import numpy as np fig, ax = plt.subplots(figsize=(10, 4)) # Confidence interval lower = -3.65 upper = -0.75 point_est = -2.2 # Draw confidence interval ax.plot([lower, upper], [0, 0], 'b-', linewidth=3, label='95% Confidence Interval') ax.plot([lower, lower], [-0.05, 0.05], 'b-', linewidth=2) ax.plot([upper, upper], [-0.05, 0.05], 'b-', linewidth=2) ax.plot(point_est, 0, 'ro', markersize=12, label=f'Point Estimate: {point_est} hrs') # Add zero reference line ax.axvline(0, color='red', linestyle='--', linewidth=1, alpha=0.5, label='Zero (no difference)') # Annotations ax.annotate(f'{lower} hrs', xy=(lower, 0), xytext=(lower, -0.15), fontsize=10, ha='center', bbox=dict(boxstyle='round', facecolor='lightblue')) ax.annotate(f'{upper} hrs', xy=(upper, 0), xytext=(upper, -0.15), fontsize=10, ha='center', bbox=dict(boxstyle='round', facecolor='lightblue')) ax.set_xlim(-5, 1) ax.set_ylim(-0.25, 0.25) ax.set_xlabel('Difference in Mean Transit Time ($\\mu_N - \\mu_S$) in hours', fontsize=11) ax.set_title('95% Confidence Interval: North Route vs. 
South Route Transit Time', fontsize=12, fontweight='bold') ax.legend(fontsize=10) ax.grid(alpha=0.3, axis='x') ax.set_yticks([]) plt.tight_layout() plt.savefig('09-transfer-trucking-CI.png', dpi=150, bbox_inches='tight') plt.show() ``` **Interpretation:** The results can be interpreted two ways: 1. **Technical:** Delmar can be 95% confident that $(\mu_N - \mu_S)$ is between **-3.65 hours and -0.75 hours**. 2. **Practical:** Since we subtracted the South route mean from the North route mean and got negative numbers, Delmar can be 95% confident that the **South route takes between 0.75 and 3.65 hours longer** than the North route. **Business Decision:** The North route is consistently faster. If minimizing transit time is important, Transfer Trucking should prioritize the North route. --- ## Section Exercises: Large Sample Confidence Intervals **1.** **Clark Insurance Claims Analysis**: Clark Insurance sells policies to residents throughout the Chicago area. The owner wants to estimate the difference in average claims between people living in urban zones versus those in suburbs. - Urban sample: n = 180, $\bar{X}$ = $2,025, s = $918 - Suburban sample: n = 200, $\bar{X}$ = $1,802, s = $512 What does a 95% confidence interval tell the owner about average claims filed by these two groups? **2.** **Steel Tube Production Comparison**: Two production processes are used to produce steel tubes. - Process 1: n = 100, $\bar{X}$ = 27.3 inches, s = 10.3 inches - Process 2: n = 100, $\bar{X}$ = 30.1 inches, s = 5.2 inches What does a 99% confidence interval reveal about the difference in average lengths of tubes produced by these two methods? **3.** **Chapman Industries Phone System Comparison**: Chuck Chapman wants to determine if customers calling one phone system are kept on hold longer on average than those calling another system. 
- System 1: n = 75, $\bar{X}$ = 25.2 seconds, s = 4.8 seconds - System 2: n = 70, $\bar{X}$ = 21.3 seconds, s = 3.8 seconds What recommendation would you provide to Chuck based on a 90% confidence interval estimating the difference in average wait times if he wants to minimize customer wait time? **4.** **Production Design Time Comparison**: Two production designs are used to manufacture a certain product. - Old design: n = 150, $\bar{X}$ = 3.51 days, s = 0.79 days - New design: n = 150, $\bar{X}$ = 3.32 days, s = 0.73 days What does a 99% confidence interval reveal about the difference between average times required to make the product? Which design should be used? **5.** **Conceptual Question**: Explain exactly what the standard error of the difference between sample means actually measures. --- **End of Stage 1** This completes the first stage covering: - Introduction to two-population comparisons - Opening scenario (U.S. foreign investment strategy) - Large sample confidence intervals for independent samples - Standard error calculation - Complete worked example with interpretation - Section exercises **Coming in Stage 2:** - Small sample methods (t-distribution) - Pooled variance approach (equal variances) - Separate variance approach (unequal variances) - More comprehensive examples # Two-Population Inferences - Stage 2 ## 9.2B Estimation with Small Samples: The t-Distribution When either sample is small (n < 30), we cannot assume that the distribution of differences $(\bar{X}_1 - \bar{X}_2)$ follows a normal distribution. Therefore, we cannot use the Z-distribution. **Requirements for Using t-Distribution:** 1. **Populations are normally distributed** (or approximately normal) 2. **Population variances are unknown** When these conditions are met, we must use the **t-distribution**. ### Critical Question: Are the Variances Equal? An important consideration is whether the two population variances are equal ($\sigma_1^2 = \sigma_2^2$). 
::: {.callout-note icon="🤔" appearance="minimal"} ## Why This Matters **The paradox:** How can we assume variances are equal if we don't know what they are? **The answer:** In many practical situations, there are reasonable grounds for assuming equal variances: - **Assembly line processes:** Machines periodically adjusted may have changing average fill levels, but variance in fill amounts remains constant - **Before-after comparisons:** Training may change average performance, but performance variability stays the same - **Quality control:** Defect rates may shift, but variation in defect patterns remains stable Later in this chapter, we'll present a formal **F-test** to statistically test whether two variances are equal. ::: We'll examine two approaches: 1. **Equal variances** ($\sigma_1^2 = \sigma_2^2$): Use pooled variance method 2. **Unequal variances** ($\sigma_1^2 \neq \sigma_2^2$): Use separate variance method with adjusted degrees of freedom --- ### 1. Equal Variances: Pooled Variance Method When population variances are equal, there exists some **common variance** $\sigma^2$ shared by both populations: $$\sigma_1^2 = \sigma_2^2 = \sigma^2$$ However, due to sampling error, the two sample variances $s_1^2$ and $s_2^2$ will likely differ from each other and from the common $\sigma^2$. **Solution:** **Pool** the data from both samples to obtain a single estimate of $\sigma^2$. ::: {.callout-important icon="⚡" appearance="simple"} ## Pooled Variance Formula $$s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$$ This is a **weighted average** of the two sample variances, where the weights are the degrees of freedom ($n - 1$) for each sample. **Degrees of freedom:** $df = n_1 + n_2 - 2$ ::: **Confidence Interval Formula:** ::: {.callout-note icon="📊" appearance="minimal"} ## C.I. for $(\mu_1 - \mu_2)$ with Equal Variances (Small Samples) $$\text{C.I. 
for } (\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$ Where: - $t$ = critical t-value with $n_1 + n_2 - 2$ degrees of freedom - $s_p^2$ = pooled variance estimate ::: --- ### Example: Vending Machine Beverage Dispenser **Scenario:** Quality Control for Student Cafeteria A vending machine in the student cafeteria dispenses beverages into paper cups. The facilities manager wants to know if a recent machine adjustment changed the average fill amount. **Sample Data:** | Timing | Sample Size | Mean | Variance | |:-------|:------------|:-----|:---------| | **Before adjustment** | n₁ = 15 | $\bar{X}_1 = 15.3$ oz | $s_1^2 = 3.5$ | | **After adjustment** | n₂ = 10 | $\bar{X}_2 = 17.1$ oz | $s_2^2 = 3.9$ | **Assumptions:** - Variance $\sigma^2$ is constant before and after adjustment - Dispensed amounts are normally distributed **Objective:** Construct a 95% confidence interval for the difference in average fill amounts. **Solution:** **Step 1: Calculate Pooled Variance** $$s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$$ $$= \frac{3.5(14) + 3.9(9)}{15 + 10 - 2} = \frac{49 + 35.1}{23} = \frac{84.1}{23} = 3.66$$ **Step 2: Find Critical t-Value** - Confidence level: 95% → α = 0.05 - Degrees of freedom: $df = n_1 + n_2 - 2 = 15 + 10 - 2 = 23$ - From t-table: $t_{0.025, 23} = 2.069$ **Step 3: Calculate Confidence Interval** $$\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$ $$= (15.3 - 17.1) \pm 2.069 \sqrt{3.66 \left(\frac{1}{15} + \frac{1}{10}\right)}$$ $$= -1.8 \pm 2.069 \sqrt{3.66(0.0667 + 0.1)} = -1.8 \pm 2.069\sqrt{3.66 \times 0.1667}$$ $$= -1.8 \pm 2.069\sqrt{0.6103} = -1.8 \pm 2.069(0.781) = -1.8 \pm 1.61$$ $$-3.41 \leq (\mu_1 - \mu_2) \leq -0.19$$ ```{python} #| echo: false #| output: true import matplotlib.pyplot as plt import numpy as np fig, ax = plt.subplots(figsize=(10, 4)) # Confidence interval lower = -3.41 upper = -0.19 point_est = -1.8 # 
Draw confidence interval ax.plot([lower, upper], [0, 0], 'b-', linewidth=3, label='95% Confidence Interval') ax.plot([lower, lower], [-0.05, 0.05], 'b-', linewidth=2) ax.plot([upper, upper], [-0.05, 0.05], 'b-', linewidth=2) ax.plot(point_est, 0, 'ro', markersize=12, label=f'Point Estimate: {point_est} oz') # Add zero reference line ax.axvline(0, color='red', linestyle='--', linewidth=1.5, alpha=0.5, label='Zero (no difference)') # Shade the region showing increase fill_x = [lower, upper] fill_y1 = [-0.02, -0.02] fill_y2 = [0.02, 0.02] ax.fill_between(fill_x, fill_y1, fill_y2, alpha=0.3, color='green', label='Adjustment increased fill') # Annotations ax.annotate(f'{lower} oz', xy=(lower, 0), xytext=(lower, -0.15), fontsize=10, ha='center', bbox=dict(boxstyle='round', facecolor='lightblue')) ax.annotate(f'{upper} oz', xy=(upper, 0), xytext=(upper, -0.15), fontsize=10, ha='center', bbox=dict(boxstyle='round', facecolor='lightblue')) ax.set_xlim(-4, 1) ax.set_ylim(-0.25, 0.25) ax.set_xlabel('Difference in Mean Fill Amount ($\\mu_{before} - \\mu_{after}$) in ounces', fontsize=11) ax.set_title('95% CI: Vending Machine Fill Amount (Before vs. After Adjustment)', fontsize=12, fontweight='bold') ax.legend(fontsize=9, loc='upper left') ax.grid(alpha=0.3, axis='x') ax.set_yticks([]) plt.tight_layout() plt.savefig('09-vending-machine-CI.png', dpi=150, bbox_inches='tight') plt.show() ``` **Interpretation:** Subtracting the after-adjustment mean (17.1 oz) from the before-adjustment mean (15.3 oz) produces negative values for both endpoints. **Conclusion:** We can be 95% confident that the adjustment **increased** the average fill level by between **0.19 and 3.41 ounces**. The interval does **not contain zero**, confirming a real change occurred. --- ### Example 9.2: Labor Negotiations — Atlanta vs. Newport News **Scenario:** Wage Equity Analysis Labor negotiations between your company and the workers' union are on the verge of breaking down. 
There's considerable disagreement about average wage levels between workers at the Atlanta plant and the Newport News, Virginia plant.

**Background:**

- Wages were set by the old labor agreement three years ago
- Wages are based strictly on seniority
- Contract tightly controls wages → variance is same at both plants ✓
- Wages are normally distributed ✓
- Different seniority patterns → different average wages suspected

**Your Task:** The management negotiator wants you to develop a **98% confidence interval** to estimate the difference between average wage levels. If a difference exists, adjustments must be made to bring lower wages up to match higher wages.

**Sample Data:**

| | Atlanta Plant | Newport News Plant |
|:--|:--------------|:-------------------|
| Sample size | $n_A = 23$ | $n_N = 19$ |
| Sample mean | $\bar{X}_A = \$17.53$/hr | $\bar{X}_N = \$15.50$/hr |
| Sample variance | $s_A^2 = 92.10$ | $s_N^2 = 87.10$ |

**Solution:**

**Step 1: Calculate Pooled Variance**

$$s_p^2 = \frac{s_A^2(n_A - 1) + s_N^2(n_N - 1)}{n_A + n_N - 2}$$

$$= \frac{92.10(22) + 87.10(18)}{23 + 19 - 2} = \frac{2026.2 + 1567.8}{40} = \frac{3594}{40} = 89.85$$

**Step 2: Find Critical t-Value**

- α = 0.02 (98% confidence level)
- df = 23 + 19 - 2 = 40
- From t-table: $t_{0.01, 40} = 2.423$

**Step 3: Calculate Confidence Interval**

$$\text{C.I.} = (\bar{X}_A - \bar{X}_N) \pm t \sqrt{s_p^2 \left(\frac{1}{n_A} + \frac{1}{n_N}\right)}$$

$$= (17.53 - 15.50) \pm 2.423 \sqrt{89.85 \left(\frac{1}{23} + \frac{1}{19}\right)}$$

$$= 2.03 \pm 2.423 \sqrt{89.85(0.0435 + 0.0526)} = 2.03 \pm 2.423\sqrt{89.85 \times 0.0961}$$

$$= 2.03 \pm 2.423\sqrt{8.635} = 2.03 \pm 2.423(2.939) = 2.03 \pm 7.12$$

$$-5.09 \leq (\mu_A - \mu_N) \leq 9.15$$

```{python}
#| echo: false
#| output: true
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(11, 4.5))

# Confidence interval
lower = -5.09
upper = 9.15
point_est = 2.03

# Draw confidence interval
ax.plot([lower, upper], [0, 0], 'b-', linewidth=3.5, label='98% Confidence Interval')
ax.plot([lower, lower], [-0.05, 0.05], 'b-', linewidth=2.5)
ax.plot([upper, upper], [-0.05, 0.05], 'b-', linewidth=2.5)
ax.plot(point_est, 0, 'go', markersize=13, label=f'Point Estimate: ${point_est}/hr')

# Add zero reference line - CRITICAL
ax.axvline(0, color='red', linestyle='--', linewidth=2.5, alpha=0.7,
           label='Zero (wages are equal)', zorder=5)

# Shade regions
neg_region = [lower, 0]
pos_region = [0, upper]
ax.fill_between(neg_region, [-0.03]*2, [0.03]*2, alpha=0.25, color='blue', label='Atlanta wages lower')
ax.fill_between(pos_region, [-0.03]*2, [0.03]*2, alpha=0.25, color='orange', label='Atlanta wages higher')

# Annotations
ax.annotate(f'${lower}/hr', xy=(lower, 0), xytext=(lower, -0.15), fontsize=10, ha='center',
            bbox=dict(boxstyle='round', facecolor='lightblue'))
ax.annotate(f'${upper}/hr', xy=(upper, 0), xytext=(upper, -0.15), fontsize=10, ha='center',
            bbox=dict(boxstyle='round', facecolor='lightblue'))
ax.annotate('Interval CONTAINS zero!\nNo guaranteed difference', xy=(0, 0), xytext=(0, 0.17),
            fontsize=10.5, ha='center',
            bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.8),
            arrowprops=dict(arrowstyle='->', color='red', lw=2))

ax.set_xlim(-8, 12)
ax.set_ylim(-0.25, 0.25)
# The trailing dollar sign is escaped so the label keeps an even number of
# unescaped '$' and the \mu terms are still rendered as math text.
ax.set_xlabel('Difference in Mean Wages ($\\mu_{Atlanta} - \\mu_{Newport}$) in \\$/hour', fontsize=11)
ax.set_title('98% CI: Atlanta vs. Newport News Wage Levels', fontsize=13, fontweight='bold')
ax.legend(fontsize=9, loc='upper right')
ax.grid(alpha=0.3, axis='x')
ax.set_yticks([])

plt.tight_layout()
plt.savefig('09-wage-comparison-CI.png', dpi=150, bbox_inches='tight')
plt.show()
```

**Interpretation:** We can be 98% confident that the average Atlanta wage is between:

- **$5.09 less** than Newport News wages, OR
- **$9.15 more** than Newport News wages

::: {.callout-important icon="⚡" appearance="simple"}
## Critical Finding: Interval Contains Zero

Because this interval **contains $0**, the data do not let us conclude, at the 98% confidence level, that average wages differ between the two plants. This is not proof that the wages are equal; it means a difference of zero is among the plausible values.

**Business Recommendation:** The data do not support a wage adjustment. The apparent difference of $2.03/hr could easily be due to sampling variation rather than a true population difference.
:::

---

## 2. Unequal Variances: Separate Variance Method

When population variances are **unequal** ($\sigma_1^2 \neq \sigma_2^2$), or there's no evidence to assume equality, the pooled variance method doesn't apply.

**The Challenge:** The distribution of $(\bar{X}_1 - \bar{X}_2)$ doesn't follow a t-distribution with $n_1 + n_2 - 2$ degrees of freedom. No exact distribution has been found, only approximations.

**The Solution:** Use a **modified t-statistic** ($t'$) with **adjusted degrees of freedom**.

::: {.callout-note icon="📊" appearance="minimal"}
## Adjusted Degrees of Freedom (Welch-Satterthwaite)

$$\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

**Rule:** If df is fractional, **round DOWN** to the nearest integer.
:::

**Confidence Interval Formula:**

::: {.callout-important icon="⚡" appearance="simple"}
## C.I. for $(\mu_1 - \mu_2)$ with Unequal Variances

$$\text{C.I. for } (\mu_1 - \mu_2) = (\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Where:

- $t'$ = critical t-value with adjusted df
- No pooling of variances
:::

---

### Example: IBM Executive Training Programs

**Scenario:** Wall Street Journal Report

The Wall Street Journal described two training programs used by IBM. A comparison is needed to determine if one program is more effective.

**Sample Data:**

| Program | Sample Size | Mean Score | Variance |
|:--------|:------------|:-----------|:---------|
| **Program 1** | n₁ = 12 | $\bar{X}_1 = 73.5$ | $s_1^2 = 100.2$ |
| **Program 2** | n₂ = 15 | $\bar{X}_2 = 79.8$ | $s_2^2 = 121.3$ |

**Note:** Variances appear different → use separate variance method

**Objective:** Construct a 95% confidence interval for the difference in average scores.

**Solution:**

**Step 1: Calculate Adjusted Degrees of Freedom**

$$\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

$$= \frac{\left(\frac{100.2}{12} + \frac{121.3}{15}\right)^2}{\frac{(100.2/12)^2}{11} + \frac{(121.3/15)^2}{14}}$$

$$= \frac{(8.35 + 8.087)^2}{\frac{(8.35)^2}{11} + \frac{(8.087)^2}{14}} = \frac{(16.437)^2}{\frac{69.72}{11} + \frac{65.40}{14}}$$

$$= \frac{270.18}{6.338 + 4.671} = \frac{270.18}{11.009} = 24.54$$

Round down: **df = 24**

**Step 2: Find Critical t'-Value**

- 95% confidence level
- df = 24
- From t-table: $t'_{0.025, 24} = 2.064$

**Step 3: Calculate Confidence Interval**

$$\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

$$= (73.5 - 79.8) \pm 2.064 \sqrt{\frac{100.2}{12} + \frac{121.3}{15}}$$

$$= -6.3 \pm 2.064 \sqrt{8.35 + 8.087} = -6.3 \pm 2.064\sqrt{16.437}$$

$$= -6.3 \pm 2.064(4.054) = -6.3 \pm 8.37$$

$$-14.67 \leq (\mu_1 - \mu_2) \leq 2.07$$

**Interpretation:** Because the interval **contains zero**, there's no strong evidence of a difference in training program effectiveness.
On this evidence, either program appears suitable for training IBM executives.

---

### Example 9.3: Acme Ltd. Rubber Shock Absorbers

**Scenario:** Product Durability Comparison

Acme Ltd. sells two types of rubber shock absorbers for baby carriages. Wear tests measure durability.

**Sample Data:**

| Type | Sample Size | Mean Duration | Std Dev |
|:-----|:------------|:--------------|:--------|
| **Type 1** | n₁ = 13 | $\bar{X}_1 = 11.3$ weeks | s₁ = 3.5 weeks |
| **Type 2** | n₂ = 10 | $\bar{X}_2 = 7.5$ weeks | s₂ = 2.7 weeks |

**Business Context:**

- Type 1 is more expensive to manufacture
- CEO will only use Type 1 if it lasts **at least 8 weeks longer** than Type 2
- CEO tolerates only **2% probability of error** (α = 0.02)
- **No evidence** that variances are equal → use separate variance method

**Solution:**

**Step 1: Calculate Adjusted Degrees of Freedom**

$$\text{df} = \frac{\left[\frac{(3.5)^2}{13} + \frac{(2.7)^2}{10}\right]^2}{\frac{[(3.5)^2/13]^2}{12} + \frac{[(2.7)^2/10]^2}{9}}$$

$$= \frac{\left[\frac{12.25}{13} + \frac{7.29}{10}\right]^2}{\frac{(0.9423)^2}{12} + \frac{(0.7290)^2}{9}}$$

$$= \frac{(0.9423 + 0.7290)^2}{\frac{0.88794}{12} + \frac{0.53144}{9}} = \frac{(1.6713)^2}{0.07400 + 0.05905}$$

$$= \frac{2.7932}{0.13305} = 20.99 \rightarrow 20 \text{ (round down)}$$

The result sits just below 21, so the intermediate values are carried to four or five decimals here; coarser rounding can push the quotient above 21 and change the degrees of freedom.

**Step 2: Find Critical t'-Value**

- 98% confidence level (α = 0.02)
- df = 20
- From t-table: $t'_{0.01, 20} = 2.528$

**Step 3: Calculate Confidence Interval**

$$\text{C.I.} = (11.3 - 7.5) \pm 2.528 \sqrt{\frac{(3.5)^2}{13} + \frac{(2.7)^2}{10}}$$

$$= 3.8 \pm 2.528 \sqrt{0.942 + 0.729} = 3.8 \pm 2.528\sqrt{1.671}$$

$$= 3.8 \pm 2.528(1.293) = 3.8 \pm 3.3$$

$$0.5 \leq (\mu_1 - \mu_2) \leq 7.1 \text{ weeks}$$

```{python}
#| echo: false
#| output: true
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(11, 5))

# Confidence interval
lower = 0.5
upper = 7.1
point_est = 3.8
required = 8.0

# Draw confidence interval
ax.plot([lower, upper], [0, 0], 'b-', linewidth=3.5, label='98% Confidence Interval')
ax.plot([lower, lower], [-0.05, 0.05], 'b-', linewidth=2.5)
ax.plot([upper, upper], [-0.05, 0.05], 'b-', linewidth=2.5)
ax.plot(point_est, 0, 'go', markersize=13, label=f'Point Estimate: {point_est} weeks')

# Add required threshold line
ax.axvline(required, color='red', linestyle='--', linewidth=2.5, alpha=0.8,
           label='Required: 8 weeks minimum', zorder=5)

# Shade regions
below_required = [lower, min(upper, required)]
ax.fill_between(below_required, [-0.03]*len(below_required), [0.03]*len(below_required),
                alpha=0.3, color='orange', label='Below requirement')

# Annotations
ax.annotate(f'{lower} wks', xy=(lower, 0), xytext=(lower, -0.15), fontsize=10, ha='center',
            bbox=dict(boxstyle='round', facecolor='lightblue'))
ax.annotate(f'{upper} wks', xy=(upper, 0), xytext=(upper, -0.15), fontsize=10, ha='center',
            bbox=dict(boxstyle='round', facecolor='lightblue'))
ax.annotate('Required difference\nNOT in interval!', xy=(required, 0), xytext=(required, 0.17),
            fontsize=10, ha='center',
            bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.8),
            arrowprops=dict(arrowstyle='->', color='red', lw=2))

ax.set_xlim(-1, 10)
ax.set_ylim(-0.25, 0.25)
ax.set_xlabel('Difference in Mean Duration ($\\mu_{Type1} - \\mu_{Type2}$) in weeks', fontsize=11)
ax.set_title('98% CI: Type 1 vs. Type 2 Shock Absorber Durability', fontsize=13, fontweight='bold')
ax.legend(fontsize=9, loc='upper left')
ax.grid(alpha=0.3, axis='x')
ax.set_yticks([])

plt.tight_layout()
plt.savefig('09-acme-shock-absorbers-CI.png', dpi=150, bbox_inches='tight')
plt.show()
```

**Interpretation:** Acme can be 98% confident that Type 1 lasts between **0.5 and 7.1 weeks longer** than Type 2.

::: {.callout-warning icon="🛡️" appearance="simple"}
## Business Decision: Do NOT Use Type 1

The required difference of **8 weeks is NOT in the interval**; the entire interval lies below it. Even at the upper limit (7.1 weeks), Type 1 doesn't meet the CEO's criterion.
**Recommendation:** Continue using Type 2 (the less expensive option) since Type 1 doesn't provide sufficient additional durability to justify its higher manufacturing cost.
:::

---

## Section Exercises: Small Sample Confidence Intervals

**6.** **Conceptual Question**: What conditions must be met before the t-distribution can be used for two-population inference?

**7.** **Croc Aid vs. Energy Pro**: Seventeen cans of Croc Aid show a mean of 17.2 ounces with a standard deviation of 3.2 ounces, and 13 cans of Energy Pro produce a mean of 18.1 ounces and s = 2.7 ounces. Assuming equal variances and normal distributions in population weights, what conclusions can be drawn regarding the difference in average weights based on a 98% confidence interval?

**8.** **Grow-rite Fertilizer Complaint**: Grow-rite sells commercial fertilizer produced in two plants (Atlanta and Dallas). Recent customer complaints suggest Atlanta shipments are underweight compared to Dallas shipments. If 10 boxes from Atlanta average 96.3 pounds with s = 12.5, and 15 boxes from Dallas average 101.1 with s = 10.3, does a 99% confidence interval confirm this complaint? Assume equal variances.

**9.** **Opus Gold Extraction**: Opus, Inc. has developed a process for producing gold from seawater. Fifteen gallons from the Pacific Ocean produced a mean of 12.7 ounces of gold per gallon with s = 4.2 ounces, and 12 gallons from the Atlantic Ocean produced corresponding figures of 15.9 and 1.7. Based on a 95% interval, what is your estimate of the difference in average ounces of gold from these two sources? There's no reason to assume variances are equal.

**10.** **Ralphie's Apartment Search**: Ralphie starts college next fall. He samples apartments on the north and south ends of the city to see if there's a difference in average rents.

- North apartments: $600, $650, $530, $800, $750, $700, $750 - South apartments: $500, $450, $800, $650, $500, $500, $450, $400 If there's no evidence that variances are equal, what does a 99% interval tell Ralphie about the difference in average rents? **11.** **Bigelow Products Sales Comparison**: Bigelow Products wants to develop a 95% interval to estimate the difference in average weekly sales in two target markets. A sample of 9 weeks in market 1 produced a mean and standard deviation (in hundreds of dollars) of 5.72 and 1.008 respectively. Comparable figures for market 2, based on a sample of 10 weeks, were 8.72 and 1.208. Assuming equal variances, what results do they report? **12.** **U.S. Manufacturing Supplier Comparison**: U.S. Manufacturing buys raw materials from two suppliers. Management is concerned about production delays from late shipment deliveries. A sample of 10 shipments from Supplier A have an average delivery time of 6.8 days and s = 2.57 days, while 12 shipments from Supplier B average 4.08 days and s = 1.93. If equal variances cannot be assumed, what recommendation would you make based on a 90% interval for the difference in average delivery times? 
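
As a cross-check on the hand calculations in this section, the sketch below recomputes the vending machine (pooled) and IBM (separate variance) intervals from their summary statistics. The helper names `pooled_ci`, `welch_df`, and `welch_ci` are illustrative choices rather than functions from any library, and the critical t-values are typed in from the t-table, as in the worked solutions, instead of being computed.

```python
import math


def pooled_ci(mean1, var1, n1, mean2, var2, n2, t_crit):
    """C.I. for mu1 - mu2 when the population variances are assumed equal."""
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = (var1 * (n1 - 1) + var2 * (n2 - 1)) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin


def welch_df(var1, n1, var2, n2):
    """Adjusted (Welch-Satterthwaite) degrees of freedom, before rounding down."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def welch_ci(mean1, var1, n1, mean2, var2, n2, t_crit):
    """C.I. for mu1 - mu2 without pooling the variances."""
    margin = t_crit * math.sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Vending machine example: df = 23, t = 2.069 from the t-table
vend_low, vend_high = pooled_ci(15.3, 3.5, 15, 17.1, 3.9, 10, t_crit=2.069)

# IBM example: round the adjusted df down, then look up t' = 2.064 for df = 24
ibm_df = math.floor(welch_df(100.2, 12, 121.3, 15))
ibm_low, ibm_high = welch_ci(73.5, 100.2, 12, 79.8, 121.3, 15, t_crit=2.064)

print(f"Vending machine: {vend_low:.2f} to {vend_high:.2f}")
print(f"IBM: df = {ibm_df}, {ibm_low:.2f} to {ibm_high:.2f}")
```

Because the code carries full precision through every step, its endpoints can differ from the hand-rounded values in the last decimal place. The same functions can be pointed at the exercise data, with variances entered as the squares of the given standard deviations.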
---

## 9.3 Paired Samples: Matched Pairs Analysis

In the previous sections, we examined **independent samples**, in which observations from one population are completely unrelated to observations from the second population. Now we examine **paired samples** (also called **matched pairs** or **dependent samples**).

::: {.callout-note icon="👥" appearance="minimal"}
## What Are Paired Samples?

**Paired samples** occur when observations are matched in pairs, creating a natural correspondence between measurements.

**Common pairing scenarios:**

1. **Before-After comparisons**: Same subject measured twice
   - Employee performance before and after training
   - Hospital costs before and after new billing system
   - Product quality before and after process improvement
2. **Matched subjects**: Different subjects paired by characteristics
   - Twins in medical studies
   - Similar stores in different locations
   - Matched competitors in market research
3. **Repeated measurements**: Same unit tested under different conditions
   - Car performance with different fuels
   - Machine output with different settings
   - Customer satisfaction across product versions
:::

### Why Use Paired Samples?

**Key Advantage:** Pairing **reduces variability** by controlling for differences between subjects.
**Example:** Testing a training program's effectiveness - **Independent samples approach**: Compare Group A (trained) vs. Group B (untrained) - Problem: Groups may differ in experience, education, motivation - Result: High variability masks training effect - **Paired samples approach**: Measure each person before AND after training - Benefit: Each person serves as their own control - Result: Lower variability, more powerful test ### The Paired Difference Approach Instead of analyzing two separate samples, we analyze **one sample of differences**. ::: {.callout-important icon="⚡" appearance="simple"} ## Key Transformation: Two Samples → One Sample For each pair $(X_{1i}, X_{2i})$, calculate the difference: $$d_i = X_{1i} - X_{2i}$$ Then analyze the **differences** using single-sample methods! - Sample mean of differences: $\bar{d} = \frac{\sum d_i}{n}$ - Sample standard deviation: $s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}}$ - Sample size: $n$ = number of pairs ::: --- ### Confidence Interval for Paired Differences ::: {.callout-note icon="📊" appearance="minimal"} ## C.I. for $(\mu_1 - \mu_2)$ Using Paired Samples $$\text{C.I. for } (\mu_1 - \mu_2) = \bar{d} \pm t \frac{s_d}{\sqrt{n}}$$ Where: - $\bar{d}$ = mean of the paired differences - $s_d$ = standard deviation of the differences - $t$ = critical t-value with $n-1$ degrees of freedom - $n$ = number of pairs **Degrees of freedom:** $df = n - 1$ (NOT $2n - 2$!) ::: --- ### Example: Employee Training Program Effectiveness **Scenario:** Management Development Assessment A company instituted a training program to improve employee performance scores. To evaluate effectiveness, 10 employees were scored before and after training. 
**Sample Data:** | Employee | Before Training | After Training | Difference (d) | |:---------|:---------------:|:--------------:|:--------------:| | 1 | 72 | 78 | -6 | | 2 | 65 | 71 | -6 | | 3 | 83 | 89 | -6 | | 4 | 91 | 93 | -2 | | 5 | 58 | 68 | -10 | | 6 | 77 | 82 | -5 | | 7 | 69 | 75 | -6 | | 8 | 74 | 79 | -5 | | 9 | 88 | 91 | -3 | | 10 | 81 | 86 | -5 | *Note: $d_i = \text{Before} - \text{After}$ (negative values indicate improvement)* **Objective:** Construct a 95% confidence interval for the mean improvement. **Solution:** **Step 1: Calculate Mean Difference** $$\bar{d} = \frac{\sum d_i}{n} = \frac{-54}{10} = -5.4$$ **Step 2: Calculate Standard Deviation** First, find $\sum d_i^2$: $$\sum d_i^2 = (-6)^2 + (-6)^2 + (-6)^2 + (-2)^2 + (-10)^2 + (-5)^2 + (-6)^2 + (-5)^2 + (-3)^2 + (-5)^2$$ $$= 36 + 36 + 36 + 4 + 100 + 25 + 36 + 25 + 9 + 25 = 332$$ Then: $$s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}} = \sqrt{\frac{332 - 10(-5.4)^2}{9}}$$ $$= \sqrt{\frac{332 - 291.6}{9}} = \sqrt{\frac{40.4}{9}} = \sqrt{4.489} = 2.12$$ **Step 3: Find Critical t-Value** - 95% confidence level - df = n - 1 = 10 - 1 = 9 - From t-table: $t_{0.025, 9} = 2.262$ **Step 4: Calculate Confidence Interval** $$\text{C.I.} = \bar{d} \pm t \frac{s_d}{\sqrt{n}}$$ $$= -5.4 \pm 2.262 \times \frac{2.12}{\sqrt{10}}$$ $$= -5.4 \pm 2.262 \times 0.671 = -5.4 \pm 1.52$$ $$-6.92 \leq (\mu_{\text{before}} - \mu_{\text{after}}) \leq -3.88$$ ```{python} #| echo: false #| output: true import matplotlib.pyplot as plt import numpy as np fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5)) # Left plot: Before-After comparison employees = np.arange(1, 11) before = np.array([72, 65, 83, 91, 58, 77, 69, 74, 88, 81]) after = np.array([78, 71, 89, 93, 68, 82, 75, 79, 91, 86]) ax1.plot(employees, before, 'ro-', linewidth=2, markersize=8, label='Before Training') ax1.plot(employees, after, 'go-', linewidth=2, markersize=8, label='After Training') # Draw lines connecting pairs for i in range(len(employees)): 
ax1.plot([employees[i], employees[i]], [before[i], after[i]], 'b--', alpha=0.3, linewidth=1) # Add improvement arrow if after[i] > before[i]: ax1.annotate('', xy=(employees[i], after[i]), xytext=(employees[i], before[i]), arrowprops=dict(arrowstyle='->', color='green', lw=1.5, alpha=0.6)) ax1.set_xlabel('Employee', fontsize=11) ax1.set_ylabel('Performance Score', fontsize=11) ax1.set_title('Before vs. After Training Scores\n(Paired Observations)', fontsize=12, fontweight='bold') ax1.legend(fontsize=10) ax1.grid(alpha=0.3) ax1.set_xticks(employees) # Right plot: Confidence interval for mean difference lower = -6.92 upper = -3.88 point_est = -5.4 ax2.plot([lower, upper], [0, 0], 'b-', linewidth=3.5, label='95% Confidence Interval') ax2.plot([lower, lower], [-0.05, 0.05], 'b-', linewidth=2.5) ax2.plot([upper, upper], [-0.05, 0.05], 'b-', linewidth=2.5) ax2.plot(point_est, 0, 'go', markersize=13, label=f'Mean Difference: {point_est} points') # Add zero reference line ax2.axvline(0, color='red', linestyle='--', linewidth=2, alpha=0.7, label='Zero (no change)', zorder=5) # Shade improvement region improve_region = [lower, upper] ax2.fill_between(improve_region, [-0.03]*2, [0.03]*2, alpha=0.3, color='green', label='Training improved scores') # Annotations ax2.annotate(f'{lower} pts', xy=(lower, 0), xytext=(lower, -0.15), fontsize=10, ha='center', bbox=dict(boxstyle='round', facecolor='lightblue')) ax2.annotate(f'{upper} pts', xy=(upper, 0), xytext=(upper, -0.15), fontsize=10, ha='center', bbox=dict(boxstyle='round', facecolor='lightblue')) ax2.annotate('Interval does NOT\ncontain zero!\nTraining is effective', xy=(point_est, 0), xytext=(point_est, 0.17), fontsize=10, ha='center', bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.8), arrowprops=dict(arrowstyle='->', color='green', lw=2)) ax2.set_xlim(-8, 1) ax2.set_ylim(-0.25, 0.25) ax2.set_xlabel('Mean Difference (Before - After) in points', fontsize=11) ax2.set_title('95% CI: Training Program Effect', fontsize=12, 
fontweight='bold')
ax2.legend(fontsize=9, loc='upper left')
ax2.grid(alpha=0.3, axis='x')
ax2.set_yticks([])

plt.tight_layout()
plt.savefig('09-training-paired-analysis.png', dpi=150, bbox_inches='tight')
plt.show()
```

**Interpretation:** Converting to positive values (After - Before), we can be 95% confident that training improves average performance by **3.88 to 6.92 points**.

::: {.callout-important icon="✅" appearance="simple"}
## Business Conclusion: Training Is Effective

The interval **does not contain zero**, providing strong evidence that average scores rose after training. The company has statistical support for continuing the program.

**Toward an ROI estimate:** The average improvement of 5.4 points gives management a starting figure for weighing the program's cost against the resulting productivity gains; the interval itself says nothing about cost.
:::

---

### Example 9.4: Hospital Billing Comparison (Vicki Peplow)

**Scenario:** Healthcare Cost Analysis

Vicki Peplow works for a large corporation that self-insures medical costs. The company has negotiated contracts with two hospitals. Vicki wants to determine if there's a difference in average costs between the hospitals for **identical procedures**.

**Strategy:** Use **paired samples** by matching the same procedures at both hospitals.
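
Before turning to Vicki's data, the paired-difference steps from the training example can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `paired_ci` is a hypothetical helper name, the scores are the ten before/after pairs from the training example, and the critical value 2.262 is taken from the t-table for 9 degrees of freedom.

```python
import math

# Before/after scores for the 10 employees in the training example
before = [72, 65, 83, 91, 58, 77, 69, 74, 88, 81]
after = [78, 71, 89, 93, 68, 82, 75, 79, 91, 86]


def paired_ci(x1, x2, t_crit):
    """Reduce two paired samples to one sample of differences, then build the C.I."""
    d = [a - b for a, b in zip(x1, x2)]  # d_i = X_1i - X_2i
    n = len(d)                           # number of pairs
    d_bar = sum(d) / n
    # Definitional form of s_d; algebraically equal to the shortcut formula
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar, s_d, (d_bar - margin, d_bar + margin)


d_bar, s_d, (low, high) = paired_ci(before, after, t_crit=2.262)
print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.2f}")
print(f"95% C.I.: {low:.2f} to {high:.2f}")
```

Passing the two hospital cost columns in place of `before` and `after`, with the critical value for 14 degrees of freedom, applies the same recipe to the example that follows.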
**Sample Data:** 15 common procedures

| Procedure | Hospital 1 Cost | Hospital 2 Cost | Difference (d) |
|:----------|:---------------:|:---------------:|:--------------:|
| Appendectomy | $4,250 | $4,180 | $70 |
| Cholecystectomy | $6,780 | $6,920 | -$140 |
| Hernia repair | $3,450 | $3,610 | -$160 |
| Hysterectomy | $7,200 | $7,050 | $150 |
| Knee arthroscopy | $5,100 | $5,280 | -$180 |
| Cataract surgery | $2,800 | $2,650 | $150 |
| Tonsillectomy | $1,950 | $2,100 | -$150 |
| Colonoscopy | $1,680 | $1,720 | -$40 |
| Hip replacement | $18,500 | $18,700 | -$200 |
| Angioplasty | $12,300 | $12,150 | $150 |
| Mastectomy | $8,900 | $9,200 | -$300 |
| Spinal fusion | $22,100 | $22,350 | -$250 |
| Cesarean section | $6,400 | $6,300 | $100 |
| Cardiac bypass | $35,200 | $35,100 | $100 |
| Total knee | $24,500 | $24,800 | -$300 |

**Calculations** (from the Difference column above):

$$\sum d_i = -1,000$$

$$\sum d_i^2 = 476,600$$

**Objective:** Construct a 95% confidence interval for the difference in average costs.

**Solution:**

**Step 1: Calculate Mean Difference**

$$\bar{d} = \frac{\sum d_i}{n} = \frac{-1,000}{15} = -66.67$$

**Step 2: Calculate Standard Deviation**

$$s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}}$$

$$= \sqrt{\frac{476,600 - 15(-66.67)^2}{14}}$$

$$= \sqrt{\frac{476,600 - 66,673}{14}} = \sqrt{\frac{409,927}{14}}$$

$$= \sqrt{29,280.5} = 171.1$$

**Step 3: Find Critical t-Value**

- 95% confidence level
- df = 15 - 1 = 14
- From t-table: $t_{0.025, 14} = 2.145$

**Step 4: Calculate Confidence Interval**

$$\text{C.I.} = \bar{d} \pm t \frac{s_d}{\sqrt{n}}$$

$$= -66.67 \pm 2.145 \times \frac{171.1}{\sqrt{15}}$$

$$= -66.67 \pm 2.145 \times 44.18 = -66.67 \pm 94.77$$

$$-161.44 \leq (\mu_1 - \mu_2) \leq 28.10$$

```{python}
#| echo: false
#| output: true
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(11, 5))

# Confidence interval
lower = -161.44
upper = 28.10
point_est = -66.67

# Draw confidence interval
ax.plot([lower, upper], [0, 0], 'b-', linewidth=3.5, label='95% Confidence Interval')
ax.plot([lower, lower], [-0.05, 0.05], 'b-', linewidth=2.5)
ax.plot([upper, upper], [-0.05, 0.05], 'b-', linewidth=2.5)
ax.plot(point_est, 0, 'ro', markersize=13, label=f'Mean Difference: ${point_est}')

# Add zero reference line - CRITICAL
ax.axvline(0, color='red', linestyle='--', linewidth=2.5, alpha=0.7,
           label='Zero (equal costs)', zorder=5)

# Shade regions
neg_region = [lower, 0]
pos_region = [0, upper]
ax.fill_between(neg_region, [-0.03]*2, [0.03]*2, alpha=0.25, color='blue', label='Hospital 1 cheaper')
ax.fill_between(pos_region, [-0.03]*2, [0.03]*2, alpha=0.25, color='orange', label='Hospital 2 cheaper')

# Annotations
ax.annotate(f'${lower}', xy=(lower, 0), xytext=(lower, -0.15), fontsize=10, ha='center',
            bbox=dict(boxstyle='round', facecolor='lightblue'))
ax.annotate(f'${upper}', xy=(upper, 0), xytext=(upper, -0.15), fontsize=10, ha='center',
            bbox=dict(boxstyle='round', facecolor='lightblue'))
ax.annotate('Interval CONTAINS zero!\nNo guaranteed difference', xy=(0, 0), xytext=(0, 0.17),
            fontsize=10.5, ha='center',
            bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.8),
            arrowprops=dict(arrowstyle='->', color='red', lw=2))

ax.set_xlim(-180, 60)
ax.set_ylim(-0.25, 0.25)
ax.set_xlabel('Difference in Mean Cost ($\\mu_{Hospital1} - \\mu_{Hospital2}$) in dollars', fontsize=11)
ax.set_title('95% CI: Hospital 1 vs. Hospital 2 Procedure Costs (Paired Data)', fontsize=13, fontweight='bold')
ax.legend(fontsize=9, loc='upper right')
ax.grid(alpha=0.3, axis='x')
ax.set_yticks([])

plt.tight_layout()
plt.savefig('09-hospital-costs-paired-CI.png', dpi=150, bbox_inches='tight')
plt.show()
```

**Interpretation:** We can be 95% confident that the difference in average costs is between:

- **$161.44 less at Hospital 1**, OR
- **$28.10 less at Hospital 2**

::: {.callout-note icon="💼" appearance="minimal"}
## Business Recommendation: No Clear Cost Advantage

Because this interval **contains zero**, there's no statistically significant difference in costs between hospitals at the 95% confidence level.

**Vicki's Report:** "Our analysis of 15 common procedures shows no statistically significant difference in average cost between Hospital 1 and Hospital 2. On cost grounds alone, the data give the company no reason to favor either hospital."

**Additional Consideration:** Other factors (quality ratings, location convenience, specialist availability) should guide hospital contract decisions.
:::

---

## 9.4 Confidence Intervals for Two Proportions

Many business decisions require comparing **proportions** from two populations:

- Defect rates from two production processes
- Customer satisfaction rates across product versions
- Default rates between loan portfolios
- Click-through rates for two ad campaigns

### Point Estimate for Difference

The point estimate for the difference between population proportions is:

$$\text{Point estimate: } p_1 - p_2$$

Where $p_1$ and $p_2$ are the sample proportions.
### Standard Error of the Difference ::: {.callout-note icon="📊" appearance="minimal"} ## Standard Error for $(p_1 - p_2)$ $$s_{p_1 - p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$ **Requirements for using Z-distribution:** - Both $n_1 p_1 \geq 5$ and $n_1(1-p_1) \geq 5$ - Both $n_2 p_2 \geq 5$ and $n_2(1-p_2) \geq 5$ ::: ### Confidence Interval Formula ::: {.callout-important icon="⚡" appearance="simple"} ## C.I. for $(\pi_1 - \pi_2)$ $$\text{C.I. for } (\pi_1 - \pi_2) = (p_1 - p_2) \pm Z \cdot s_{p_1-p_2}$$ Where: - $p_1, p_2$ = sample proportions - $Z$ = critical Z-value for desired confidence level - $s_{p_1-p_2}$ = standard error of the difference ::: --- ### Example: Worker Absenteeism Study **Scenario:** Human Resources Analysis A large manufacturing company suspects that **night shift workers** have higher absenteeism rates than **day shift workers**. HR collects data from both shifts. **Sample Data:** | Shift | Sample Size | Number Absent | Sample Proportion | |:------|:-----------:|:-------------:|:-----------------:| | **Night shift** | n₁ = 150 | 22 | p₁ = 0.147 | | **Day shift** | n₂ = 150 | 14 | p₂ = 0.093 | **Objective:** Construct a 90% confidence interval for the difference in absenteeism rates. 
**Solution:**

**Step 1: Verify Sample Size Requirements**

- Night shift: $n_1 p_1 = 150(0.147) = 22 \geq 5$ ✓ and $n_1(1-p_1) = 150(0.853) = 128 \geq 5$ ✓
- Day shift: $n_2 p_2 = 150(0.093) = 14 \geq 5$ ✓ and $n_2(1-p_2) = 150(0.907) = 136 \geq 5$ ✓

**Step 2: Calculate Standard Error**

$$s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

$$= \sqrt{\frac{0.147(0.853)}{150} + \frac{0.093(0.907)}{150}}$$

$$= \sqrt{\frac{0.1254}{150} + \frac{0.0844}{150}} = \sqrt{0.000836 + 0.000563}$$

$$= \sqrt{0.001399} = 0.0374$$

**Step 3: Find Critical Z-Value**

- 90% confidence level → α = 0.10
- $Z_{0.05} = 1.645$

**Step 4: Calculate Confidence Interval**

$$\text{C.I.} = (p_1 - p_2) \pm Z \cdot s_{p_1-p_2}$$

$$= (0.147 - 0.093) \pm 1.645(0.0374)$$

$$= 0.054 \pm 0.0615$$

$$-0.0075 \leq (\pi_1 - \pi_2) \leq 0.1155$$

Or: **-0.75% ≤ (π₁ - π₂) ≤ 11.55%**

**Interpretation:** Because the interval **contains zero** (-0.75% to +11.55%), we cannot conclude at the 90% confidence level that night shift absenteeism is higher.

**HR Recommendation:** While the data suggest the night shift absenteeism rate may be as much as 11.55 percentage points higher, the difference could also be slightly negative (day shift up to 0.75 percentage points higher). More data would be needed for a definitive conclusion.

---

### Example 9.5: Ice Capades Costume Defects

**Scenario:** Quality Control for Entertainment Production

Ice Capades produces elaborate costumes using two manufacturing methods. Quality control inspects costumes for defects.

**Sample Data:**

| Method | Costumes Inspected | Defective | Defect Rate |
|:-------|:------------------:|:---------:|:-----------:|
| **Method A** | n₁ = 200 | 42 | p₁ = 0.21 |
| **Method B** | n₂ = 250 | 65 | p₂ = 0.26 |

**Objective:** Construct a 95% confidence interval for the difference in defect rates.
**Solution:**

**Step 1: Calculate Standard Error**

$$s_{p_1-p_2} = \sqrt{\frac{0.21(0.79)}{200} + \frac{0.26(0.74)}{250}}$$

$$= \sqrt{\frac{0.1659}{200} + \frac{0.1924}{250}}$$

$$= \sqrt{0.0008295 + 0.0007696} = \sqrt{0.0015991} = 0.04$$

**Step 2: Find Critical Z-Value**

- 95% confidence level
- $Z_{0.025} = 1.96$

**Step 3: Calculate Confidence Interval**

$$\text{C.I.} = (0.21 - 0.26) \pm 1.96(0.04)$$

$$= -0.05 \pm 0.0784$$

$$-0.1284 \leq (\pi_A - \pi_B) \leq 0.0284$$

Or: **-12.84% ≤ (π_A - π_B) ≤ 2.84%**

**Interpretation:** Method A could have a defect rate that is:

- As much as **12.84 percentage points lower** than Method B (Method A better), OR
- As much as **2.84 percentage points higher** than Method B (Method B better)

::: {.callout-tip icon="💡" appearance="simple"}
## Business Decision: Methods Appear Equivalent

The interval contains zero, suggesting no statistically significant difference at 95% confidence. Ice Capades can choose either method based on **cost, production speed, or other non-quality factors**.

If Method A is less expensive, use Method A. If Method B is faster, use Method B. Quality differences are not proven.
:::

---

## 9.5 Sample Size Determination for Two-Population Studies

When planning a study comparing two populations, researchers must determine adequate sample sizes.

### For Comparing Two Means

::: {.callout-note icon="📊" appearance="minimal"}
## Sample Size Formula for Two Means (Equal n)

$$n = \frac{Z^2(\sigma_1^2 + \sigma_2^2)}{E^2}$$

Where:

- $Z$ = critical Z-value for desired confidence level
- $\sigma_1^2, \sigma_2^2$ = population variances (estimate from pilot studies)
- $E$ = desired margin of error (maximum error)
- Both samples use the same size $n$
:::

**Example:** Estimate the difference in average customer wait times between two service centers within ±2 minutes at 95% confidence. From pilot data: $\sigma_1 = 8$ minutes, $\sigma_2 = 10$ minutes.
$$n = \frac{(1.96)^2(64 + 100)}{(2)^2} = \frac{3.8416(164)}{4} = \frac{630.02}{4} = 157.5$$

**Required sample size:** $n = 158$ customers from each center.

### For Comparing Two Proportions

::: {.callout-note icon="📊" appearance="minimal"}
## Sample Size Formula for Two Proportions (Equal n)

$$n = \frac{Z^2[\pi_1(1-\pi_1) + \pi_2(1-\pi_2)]}{E^2}$$

**If no prior estimates:** Use conservative $\pi_1 = \pi_2 = 0.5$

$$n = \frac{Z^2(0.5)}{E^2} = \frac{0.5Z^2}{E^2}$$
:::

**Example:** Estimate the difference in customer satisfaction rates between two product versions within ±5% at 99% confidence. No prior data available.

$$n = \frac{0.5(2.576)^2}{(0.05)^2} = \frac{0.5(6.636)}{0.0025} = \frac{3.318}{0.0025} = 1{,}327.2$$

**Required sample size:** $n = 1{,}328$ customers per product version.

---

## Section Exercises: Paired Samples and Proportions

**13.** **Paired Data Interpretation**: A confidence interval for paired data yields: $-12.5 \leq (\mu_1 - \mu_2) \leq 5.3$. What can you conclude about the relationship between the two population means?

**14.** **Magazine Subscription Prices**: A consumer group wants to compare newsstand prices with mail-subscription prices for the same magazines. They sample 8 magazines:

| Magazine | Newsstand | Mail | Difference |
|:---------|:---------:|:----:|:----------:|
| Time | $4.95 | $3.50 | $1.45 |
| Newsweek | $4.50 | $3.25 | $1.25 |
| Fortune | $5.95 | $4.75 | $1.20 |
| Sports Illustrated | $3.95 | $2.95 | $1.00 |
| Vogue | $4.25 | $3.50 | $0.75 |
| Business Week | $5.50 | $4.50 | $1.00 |
| The Economist | $6.95 | $5.95 | $1.00 |
| National Geographic | $3.50 | $2.75 | $0.75 |

Construct a 90% confidence interval for the average price difference.
**15.** **Diet Program Effectiveness**: A weight-loss clinic measures 12 clients before and after a 6-week program: **Before (lbs):** 185, 220, 198, 175, 210, 195, 188, 203, 225, 192, 178, 207 **After (lbs):** 178, 210, 192, 170, 201, 189, 183, 196, 215, 186, 175, 200 Calculate a 95% confidence interval for the mean weight loss. **16.** **Department Store Credit Usage**: Two samples of credit customers show: - Downtown store: 120 customers, 69 used credit (p₁ = 0.575) - Suburban store: 150 customers, 73 used credit (p₂ = 0.487) Construct a 99% confidence interval for the difference in credit usage rates. **17.** **Manufacturing Defect Rates**: Process A produces 500 units with 28 defects. Process B produces 700 units with 31 defects. Find a 90% confidence interval for the difference in defect rates. **18.** **Sample Size for Salary Comparison**: A consultant wants to estimate the difference in average salaries between two regions within ±$5,000 at 95% confidence. Pilot data suggests $\sigma_1 = \$18,000$ and $\sigma_2 = \$22,000$. What sample size is needed? **19.** **Sample Size for Market Share**: A company wants to estimate the difference in market share between two products within ±3% at 99% confidence. No prior data exists. What sample size is required? **20.** **Advertising Effectiveness (Paired)**: A retailer measures daily sales for 7 days before and after an advertising campaign: **Before:** $12,500, $11,800, $13,200, $12,100, $11,900, $12,700, $13,500 **After:** $13,800, $12,900, $14,100, $13,200, $12,800, $13,900, $14,700 Does a 95% confidence interval suggest the campaign increased sales? **21.** **Employee Turnover Rates**: Company A (n=300) had 42 employees leave last year. Company B (n=250) had 28 employees leave. Construct a 95% confidence interval for the difference in turnover rates and interpret the business implications. 
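
The two-proportion interval from Section 9.4 and the sample-size formulas from Section 9.5 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original analysis: the function names (`two_proportion_ci`, `sample_size_two_means`, `sample_size_two_proportions`) are hypothetical, and the critical Z-values are passed in as table values (1.645, 1.96, 2.576) so that the results line up with the hand calculations in the worked examples.

```python
import math


def two_proportion_ci(p1, n1, p2, n2, z):
    """Return (lower, upper) bounds of the C.I. for (pi_1 - pi_2).

    Uses the unpooled standard error from Section 9.4.
    Name and signature are illustrative assumptions.
    """
    # Large-sample requirement: n*p >= 5 and n*(1-p) >= 5 in both samples
    for p, n in ((p1, n1), (p2, n2)):
        if n * p < 5 or n * (1 - p) < 5:
            raise ValueError("sample too small for the Z-based interval")
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * se
    diff = p1 - p2
    return diff - margin, diff + margin


def sample_size_two_means(sigma1, sigma2, error, z):
    """Equal-n sample size for estimating (mu_1 - mu_2) within +/- error."""
    return math.ceil(z**2 * (sigma1**2 + sigma2**2) / error**2)


def sample_size_two_proportions(error, z, pi1=0.5, pi2=0.5):
    """Equal-n sample size for estimating (pi_1 - pi_2) within +/- error.

    The defaults pi1 = pi2 = 0.5 give the conservative (largest) n.
    """
    return math.ceil(z**2 * (pi1 * (1 - pi1) + pi2 * (1 - pi2)) / error**2)


# Worker absenteeism study: 90% confidence, table value Z = 1.645
low, high = two_proportion_ci(0.147, 150, 0.093, 150, z=1.645)
print(f"Absenteeism C.I.: {low:.4f} to {high:.4f}")

# Wait times: within 2 minutes at 95% confidence (Z = 1.96)
print("n per center:", sample_size_two_means(8, 10, error=2, z=1.96))

# Satisfaction rates: within 5% at 99% confidence (Z = 2.576), no prior data
print("n per version:", sample_size_two_proportions(error=0.05, z=2.576))
```

Rounding the sample size up with `math.ceil` mirrors the text's practice of moving 157.5 to 158: rounding down would leave the margin of error slightly larger than requested.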
---

## 9.6 Hypothesis Testing for Two Population Means

In previous sections, we focused on **interval estimation**: constructing confidence intervals to estimate the difference between population means. Now we turn to **hypothesis testing**: making decisions about whether a claimed difference exists.

::: {.callout-note icon="🎯" appearance="minimal"}
## Hypothesis Testing vs. Confidence Intervals

**Both methods analyze the same data**, but serve different purposes:

**Confidence Intervals:**

- **Estimate** the magnitude of difference
- **Range of plausible values**
- Example: "The difference is between 2.5 and 7.8 units"

**Hypothesis Tests:**

- **Decide** whether a specific difference exists
- **Yes/No decision** at specified significance level
- Example: "Is there evidence that μ₁ > μ₂?"
**Important Connection:** A confidence interval that **contains zero** corresponds to **failing to reject** H₀: μ₁ = μ₂ ::: ### General Framework for Hypothesis Tests The test statistic structure mirrors the confidence interval approach: $$\text{Test Statistic} = \frac{(\text{Sample Difference}) - (\text{Hypothesized Difference})}{\text{Standard Error}}$$ **Common null hypothesis:** $H_0: \mu_1 = \mu_2$ (equivalent to $\mu_1 - \mu_2 = 0$) **Alternative hypotheses (three possibilities):** 1. **Two-tailed:** $H_A: \mu_1 \neq \mu_2$ (difference exists, direction unknown) 2. **Right-tailed:** $H_A: \mu_1 > \mu_2$ (first population mean is greater) 3. **Left-tailed:** $H_A: \mu_1 < \mu_2$ (first population mean is smaller) --- ## A. Hypothesis Tests with Large Samples When both samples are large (n₁ ≥ 30 and n₂ ≥ 30), we use the **Z-test**. ::: {.callout-important icon="⚡" appearance="simple"} ## Z-Test for Two Means (Large Samples) $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$ Where: $$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$ **Decision Rule:** - Two-tailed: Reject H₀ if |Z| > Z_{α/2} - Right-tailed: Reject H₀ if Z > Z_α - Left-tailed: Reject H₀ if Z < -Z_α ::: --- ### Example: Golf Course Playing Times (Men vs. Women) **Scenario:** Course Management Resource Allocation The manager of Pine Valley Golf Course wants to determine if there's a difference in average playing times between **men** and **women**. This information will help schedule tee times and allocate course resources. 
**Sample Data:** | Group | Sample Size | Mean Time | Std Dev | |:------|:-----------:|:---------:|:-------:| | **Men** | n_m = 100 | $\bar{X}_m = 4.2$ hrs | s_m = 0.8 hrs | | **Women** | n_w = 75 | $\bar{X}_w = 4.7$ hrs | s_w = 0.6 hrs | **Hypotheses:** $$H_0: \mu_m = \mu_w \quad \text{(no difference in average playing times)}$$ $$H_A: \mu_m \neq \mu_w \quad \text{(playing times differ)}$$ **Significance Level:** α = 0.01 (1%) **Solution:** **Step 1: Calculate Standard Error** $$s_{\bar{X}_m - \bar{X}_w} = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$$ $$= \sqrt{\frac{(0.8)^2}{100} + \frac{(0.6)^2}{75}}$$ $$= \sqrt{\frac{0.64}{100} + \frac{0.36}{75}} = \sqrt{0.0064 + 0.0048}$$ $$= \sqrt{0.0112} = 0.106$$ **Step 2: Calculate Test Statistic** $$Z = \frac{(\bar{X}_m - \bar{X}_w) - 0}{s_{\bar{X}_m - \bar{X}_w}}$$ $$= \frac{(4.2 - 4.7) - 0}{0.106} = \frac{-0.5}{0.106} = -4.72$$ **Step 3: Determine Critical Values and Decision Rule** For α = 0.01 (two-tailed test): - Critical values: $Z_{0.005} = \pm 2.576$ **Decision Rule:** "Reject H₀ if Z < -2.576 or Z > +2.576" **Step 4: Make Decision** Z = -4.72 < -2.576 → **Reject H₀** **Step 5: Calculate p-value** For Z = -4.72: - Area in left tail ≈ 0.000001 - p-value = 2(0.000001) ≈ 0.000002 (two-tailed) ```{python} #| echo: false #| output: true import matplotlib.pyplot as plt import numpy as np from scipy import stats fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) # Left plot: Test statistic visualization z_vals = np.linspace(-5, 5, 1000) z_pdf = stats.norm.pdf(z_vals, 0, 1) ax1.plot(z_vals, z_pdf, 'b-', linewidth=2, label='Standard Normal Distribution') ax1.fill_between(z_vals[z_vals <= -2.576], 0, stats.norm.pdf(z_vals[z_vals <= -2.576]), alpha=0.3, color='red', label='Rejection Region (left tail)') ax1.fill_between(z_vals[z_vals >= 2.576], 0, stats.norm.pdf(z_vals[z_vals >= 2.576]), alpha=0.3, color='red', label='Rejection Region (right tail)') # Mark critical values ax1.axvline(-2.576, color='red', linestyle='--', 
linewidth=2, alpha=0.7, label='Critical Values (±2.576)') ax1.axvline(2.576, color='red', linestyle='--', linewidth=2, alpha=0.7) # Mark test statistic ax1.axvline(-4.72, color='green', linestyle='-', linewidth=3, alpha=0.9, label='Test Statistic (Z = -4.72)') ax1.plot(-4.72, 0, 'go', markersize=15) # Annotation ax1.annotate('Z = -4.72\nREJECT H₀!', xy=(-4.72, 0.05), xytext=(-4, 0.25), fontsize=11, ha='center', fontweight='bold', bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.8), arrowprops=dict(arrowstyle='->', color='green', lw=2)) ax1.set_xlabel('Z-value', fontsize=11) ax1.set_ylabel('Probability Density', fontsize=11) ax1.set_title('Two-Tailed Z-Test: Golf Playing Times (α = 0.01)', fontsize=12, fontweight='bold') ax1.legend(fontsize=9, loc='upper right') ax1.grid(alpha=0.3) ax1.set_xlim(-5.5, 5.5) # Right plot: Comparative boxplot simulation np.random.seed(42) men_times = np.random.normal(4.2, 0.8, 100) women_times = np.random.normal(4.7, 0.6, 75) bp = ax2.boxplot([men_times, women_times], labels=['Men\n(n=100)', 'Women\n(n=75)'], patch_artist=True, widths=0.6) # Color the boxes colors = ['lightblue', 'lightpink'] for patch, color in zip(bp['boxes'], colors): patch.set_facecolor(color) patch.set_alpha(0.7) # Add mean markers ax2.plot(1, 4.2, 'ro', markersize=12, label='Sample Means', zorder=5) ax2.plot(2, 4.7, 'ro', markersize=12, zorder=5) # Add statistical annotation ax2.axhline(4.2, color='blue', linestyle=':', alpha=0.5) ax2.axhline(4.7, color='red', linestyle=':', alpha=0.5) ax2.annotate('', xy=(2.3, 4.2), xytext=(2.3, 4.7), arrowprops=dict(arrowstyle='<->', color='purple', lw=2.5)) ax2.text(2.5, 4.45, 'Difference:\n0.5 hrs', fontsize=10, ha='left', bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.7)) ax2.set_ylabel('Playing Time (hours)', fontsize=11) ax2.set_title('Sample Distributions: Men vs. 
Women Playing Times', fontsize=12, fontweight='bold') ax2.legend(fontsize=9) ax2.grid(alpha=0.3, axis='y') plt.tight_layout() plt.savefig('09-golf-hypothesis-test.png', dpi=150, bbox_inches='tight') plt.show() ``` **Interpretation:** ::: {.callout-important icon="✅" appearance="simple"} ## Business Conclusion: Significant Difference Exists **Statistical Finding:** At α = 0.01, there's **strong evidence** that average playing times differ between men and women. The p-value ≈ 0.000002 indicates this result is extremely unlikely to occur by chance. **Practical Finding:** Women take an average of **0.5 hours (30 minutes) longer** to complete a round. **Management Recommendations:** 1. **Tee time scheduling:** Build in 30-minute buffers when women's groups follow men's groups 2. **Course pacing:** Post different pace-of-play guidelines for different groups 3. **Resource allocation:** Consider dedicated tee times for women's leagues 4. **Revenue optimization:** Adjust pricing to account for longer course occupation times **Important Note:** This finding reflects averages and should not influence individual golfer policies. Many women play faster than many men. ::: --- ### Alternative Hypothesis Formulation: One-Tailed Test Suppose the golf course manager had a **directional hypothesis**—specifically suspecting that **women take longer** than men. **Revised Hypotheses:** $$H_0: \mu_w \leq \mu_m \quad \text{(women don't take longer)}$$ $$H_A: \mu_w > \mu_m \quad \text{(women take longer)}$$ Equivalently (subtracting in opposite order): $$H_0: \mu_m \geq \mu_w$$ $$H_A: \mu_m < \mu_w$$ **Solution for One-Tailed Test:** The test statistic remains: Z = -4.72 **New Decision Rule (left-tailed test, α = 0.01):** - Critical value: $Z_{0.01} = -2.33$ - Decision Rule: "Reject H₀ if Z < -2.33" **Decision:** Z = -4.72 < -2.33 → **Reject H₀** **p-value (one-tailed):** ≈ 0.000001 **Conclusion:** Same result—strong evidence that women take longer than men. 
The one-tailed test provides even stronger evidence (smaller p-value) because all α is concentrated in one tail. --- ## B. Hypothesis Tests with Small Samples: The t-Distribution When either sample is small (n < 30), we must use the **t-distribution** instead of the Z-distribution. ### 1. Equal Variances: Pooled t-Test When population variances are equal ($\sigma_1^2 = \sigma_2^2$): ::: {.callout-important icon="⚡" appearance="simple"} ## Pooled t-Test for Two Means (Small Samples, Equal Variances) $$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$ Where: $$s_p^2 = \frac{s_1^2(n_1-1) + s_2^2(n_2-1)}{n_1 + n_2 - 2}$$ **Degrees of freedom:** $df = n_1 + n_2 - 2$ ::: --- ### Example: Revisiting Charles Schwab Training (from Example 9.1) **Context:** In Example 9.1, we constructed a **99% confidence interval** for the difference in competency levels between two employee training programs. The result was: $$-8.34 \leq (\mu_1 - \mu_2) \leq 4.40$$ **What if we wanted to TEST the hypothesis** that the programs produce equal competency levels? 
**Sample Data (from Example 9.1):** | Program | Sample Size | Mean Score | Std Dev | |:--------|:-----------:|:----------:|:-------:| | **Program 1** | n₁ = 45 | $\bar{X}_1 = 76.0$ | s₁ = 13.5 | | **Program 2** | n₂ = 40 | $\bar{X}_2 = 77.97$ | s₂ = 9.05 | **Hypotheses:** $$H_0: \mu_1 = \mu_2 \quad \text{(programs equally effective)}$$ $$H_A: \mu_1 \neq \mu_2 \quad \text{(programs differ in effectiveness)}$$ **Significance Level:** α = 0.01 **Solution:** Given the data from Example 9.1, the standard error was calculated as: $$s_{\bar{X}_1 - \bar{X}_2} = 2.47$$ **Step 1: Calculate Test Statistic** $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{s_{\bar{X}_1 - \bar{X}_2}}$$ $$= \frac{(76.0 - 77.97) - 0}{2.47} = \frac{-1.97}{2.47} = -0.79$$ **Step 2: Critical Values and Decision Rule** For α = 0.01 (two-tailed): - Critical values: $Z_{0.005} = \pm 2.58$ **Decision Rule:** "Reject H₀ if |Z| > 2.58" **Step 3: Make Decision** |Z| = |-0.79| = 0.79 < 2.58 → **Do NOT Reject H₀** **Step 4: Calculate p-value** For Z = -0.79: - Area in left tail = 0.5 - 0.2852 = 0.2148 - p-value = 2(0.2148) = **0.4296** **Interpretation:** ::: {.callout-note icon="💼" appearance="minimal"} ## Business Conclusion: No Evidence of Difference At α = 0.01, there's **no statistical evidence** that the two training programs produce different competency levels. **Key Observation:** This conclusion is **confirmed** by the confidence interval from Example 9.1, which **contained zero** (-8.34 to 4.40). When a CI contains zero, the corresponding hypothesis test will fail to reject H₀: μ₁ = μ₂. **Charles Schwab Recommendation:** Either training program is acceptable. Choose based on **cost, time requirements, instructor availability, or employee preferences** rather than effectiveness differences. **p-value interpretation:** There's a 43% probability of observing a difference this large (or larger) purely by chance if the programs are truly equal. 
::: --- ### Example 9.6: Labor Negotiations Revisited (from Example 9.2) In Example 9.2, we constructed a **98% confidence interval** for wage differences between Atlanta and Newport News plants: $$-5.09 \leq (\mu_A - \mu_N) \leq 9.15$$ Now test the hypothesis of equal wages. **Sample Data (from Example 9.2):** | Plant | Sample Size | Mean Wage | Variance | |:------|:-----------:|:---------:|:--------:| | **Atlanta** | n_A = 23 | $\bar{X}_A = \$17.53$/hr | $s_A^2 = 92.10$ | | **Newport News** | n_N = 19 | $\bar{X}_N = \$15.50$/hr | $s_N^2 = 87.10$ | From Example 9.2: Pooled variance $s_p^2 = 89.85$ **Hypotheses:** $$H_0: \mu_A = \mu_N$$ $$H_A: \mu_A \neq \mu_N$$ **Significance Level:** α = 0.02 **Solution:** **Step 1: Calculate Test Statistic** $$t = \frac{(17.53 - 15.50) - 0}{\sqrt{89.85\left(\frac{1}{23} + \frac{1}{19}\right)}}$$ $$= \frac{2.03}{\sqrt{89.85(0.0435 + 0.0526)}} = \frac{2.03}{\sqrt{89.85 \times 0.0961}}$$ $$= \frac{2.03}{\sqrt{8.635}} = \frac{2.03}{2.939} = 0.69$$ **Step 2: Critical Values and Decision Rule** - df = 23 + 19 - 2 = 40 - α = 0.02 (two-tailed) - From t-table: $t_{0.01, 40} = \pm 2.423$ **Decision Rule:** "Reject H₀ if |t| > 2.423" **Step 3: Make Decision** |t| = 0.69 < 2.423 → **Do NOT Reject H₀** **Interpretation:** No evidence of wage difference between plants. This confirms the confidence interval result (which contained zero). The labor negotiator can assure both sides that wages are statistically equivalent. --- ### 2. 
Unequal Variances: Separate Variance t-Test When variances are unequal ($\sigma_1^2 \neq \sigma_2^2$): ::: {.callout-important icon="⚡" appearance="simple"} ## Separate Variance t-Test (Unequal Variances) $$t' = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$ **Adjusted degrees of freedom:** $$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$ ::: --- ### Example: Acme Shock Absorbers Revisited (from Example 9.3) In Example 9.3, we found a 98% confidence interval: $$0.5 \leq (\mu_1 - \mu_2) \leq 7.1 \text{ weeks}$$ Test whether Type 1 shock absorbers are more durable than Type 2. **Sample Data (from Example 9.3):** | Type | Sample Size | Mean Duration | Std Dev | |:-----|:-----------:|:-------------:|:-------:| | **Type 1** | n₁ = 13 | $\bar{X}_1 = 11.3$ wks | s₁ = 3.5 wks | | **Type 2** | n₂ = 10 | $\bar{X}_2 = 7.5$ wks | s₂ = 2.7 wks | **Hypotheses:** $$H_0: \mu_1 = \mu_2$$ $$H_A: \mu_1 \neq \mu_2$$ **Significance Level:** α = 0.02 **Solution:** From Example 9.3: Adjusted df = 20 **Step 1: Calculate Test Statistic** $$t' = \frac{(11.3 - 7.5) - 0}{\sqrt{\frac{(3.5)^2}{13} + \frac{(2.7)^2}{10}}}$$ $$= \frac{3.8}{\sqrt{0.942 + 0.729}} = \frac{3.8}{1.293} = 2.94$$ **Step 2: Critical Values** - df = 20, α = 0.02 (two-tailed) - From t-table: $t'_{0.01, 20} = \pm 2.528$ **Decision Rule:** "Reject H₀ if |t'| > 2.528" **Step 3: Make Decision** |t'| = 2.94 > 2.528 → **Reject H₀** **Interpretation:** ::: {.callout-important icon="✅" appearance="simple"} ## Statistical Finding: Type 1 IS More Durable At α = 0.02, there's **significant evidence** that Type 1 shock absorbers last longer than Type 2. **However, recall from Example 9.3:** The CEO requires **at least 8 weeks** additional durability to justify Type 1's higher cost. The confidence interval (0.5 to 7.1 weeks) shows the true difference is **likely less than 8 weeks**. 
**Business Decision:** Despite statistical significance, Type 1 **does not meet the business requirement**. Continue using Type 2 (less expensive option).

**Key Lesson:** Statistical significance ≠ Practical significance!
:::

---

## 9.7 Hypothesis Testing for Paired Samples

Paired samples hypothesis testing follows the same logic as paired confidence intervals: analyze the **differences** as a single sample.

::: {.callout-important icon="⚡" appearance="simple"}
## t-Test for Paired Samples

$$t = \frac{\bar{d} - (\mu_1 - \mu_2)}{\frac{s_d}{\sqrt{n}}}$$

Where:

- $\bar{d}$ = mean of paired differences
- $s_d$ = standard deviation of differences
- $n$ = number of pairs
- **df = n - 1**

**For testing equality:** $H_0: \mu_1 = \mu_2$ becomes $H_0: \mu_d = 0$
:::

---

### Example: Hospital Billing Revisited (from Example 9.4)

In Example 9.4, Vicki Peplow constructed a 95% confidence interval for hospital cost differences:

$$-\$146.33 \leq (\mu_1 - \mu_2) \leq \$28.47$$

Test the hypothesis of equal average costs.

**Sample Data (from Example 9.4):**

- n = 15 paired procedures
- $\sum d_i = -884$
- $\sum d_i^2 = 400,716$

From Example 9.4 calculations:

- $\bar{d} = -58.93$
- $s_d = 157.8$

**Hypotheses:**

$$H_0: \mu_1 = \mu_2 \quad \text{(equal costs)}$$

$$H_A: \mu_1 \neq \mu_2 \quad \text{(costs differ)}$$

**Significance Level:** α = 0.05

**Solution:**

**Step 1: Calculate Test Statistic**

$$t = \frac{\bar{d} - 0}{\frac{s_d}{\sqrt{n}}} = \frac{-58.93}{\frac{157.8}{\sqrt{15}}}$$

$$= \frac{-58.93}{40.74} = -1.45$$

**Step 2: Critical Values**

- df = 15 - 1 = 14
- α = 0.05 (two-tailed)
- From t-table: $t_{0.025, 14} = \pm 2.145$

**Decision Rule:** "Reject H₀ if |t| > 2.145"

**Step 3: Make Decision**

|t| = 1.45 < 2.145 → **Do NOT Reject H₀**

**Interpretation:** No evidence of cost difference between hospitals. This confirms the confidence interval result (which contained zero). Vicki reports that both hospitals appear equivalent for cost purposes.
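
The small-sample test statistics from Sections 9.6 and 9.7 can be computed directly from summary statistics. The sketch below is illustrative only: the function names are hypothetical, every test assumes a hypothesized difference of zero, and the critical t-values are typed in from the t-table rather than computed. The separate-variance function returns the adjusted degrees of freedom unrounded, which you would round down before consulting the table.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Pooled-variance t statistic and df (equal variances assumed)."""
    df = n1 + n2 - 2
    sp2 = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def separate_variance_t(mean1, var1, n1, mean2, var2, n2):
    """Separate-variance t' statistic and adjusted (unrounded) df."""
    a, b = var1 / n1, var2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return (mean1 - mean2) / se, df


def paired_t(d_bar, s_d, n):
    """Paired-sample t statistic and df from the mean and std dev of differences."""
    se = s_d / math.sqrt(n)
    return d_bar / se, n - 1


def reject_two_tailed(t, t_crit):
    """Two-tailed decision rule: reject H0 when |t| exceeds the table value."""
    return abs(t) > t_crit


# Labor negotiations (pooled), Acme shock absorbers (separate), hospital billing (paired)
t_labor, df_labor = pooled_t(17.53, 92.10, 23, 15.50, 87.10, 19)
t_acme, df_acme = separate_variance_t(11.3, 3.5**2, 13, 7.5, 2.7**2, 10)
t_hosp, df_hosp = paired_t(-58.93, 157.8, 15)

# Critical values below are the t-table entries quoted in the examples
for label, t, t_crit in [
    ("Labor negotiations", t_labor, 2.423),
    ("Acme shock absorbers", t_acme, 2.528),
    ("Hospital billing", t_hosp, 2.145),
]:
    decision = "reject H0" if reject_two_tailed(t, t_crit) else "do not reject H0"
    print(f"{label}: t = {t:.2f}, {decision}")
```

Keeping the critical value as an explicit argument keeps the sketch dependency-free; with SciPy available, a table lookup could be replaced by a call to the t-distribution's inverse CDF.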
--- ## 9.8 Hypothesis Testing for Two Proportions Business problems frequently require comparing proportions between two populations: - Defect rates from two production methods - Default rates between loan portfolios - Customer satisfaction between product versions - Response rates to different marketing campaigns ::: {.callout-important icon="⚡" appearance="simple"} ## Z-Test for Difference Between Two Proportions $$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{s_{p_1 - p_2}}$$ Where: $$s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$ **Requirements:** - $n_1p_1 \geq 5$, $n_1(1-p_1) \geq 5$ - $n_2p_2 \geq 5$, $n_2(1-p_2) \geq 5$ ::: --- ### Example: Retail Credit Usage by Gender **Scenario:** Credit Department Analysis A retail store wants to test whether the proportion of **male customers** who use credit equals the proportion of **female customers** who use credit. **Sample Data:** | Gender | Sample Size | Used Credit | Proportion | |:-------|:-----------:|:-----------:|:----------:| | **Men** | n_m = 100 | 57 | p_m = 0.57 | | **Women** | n_w = 110 | 52 | p_w = 0.473 | **Hypotheses:** $$H_0: \pi_m = \pi_w$$ $$H_A: \pi_m \neq \pi_w$$ **Significance Level:** α = 0.01 **Solution:** **Step 1: Verify Requirements** - Men: $100(0.57) = 57 \geq 5$ ✓ and $100(0.43) = 43 \geq 5$ ✓ - Women: $110(0.473) = 52 \geq 5$ ✓ and $110(0.527) = 58 \geq 5$ ✓ **Step 2: Calculate Standard Error** $$s_{p_m - p_w} = \sqrt{\frac{0.57(0.43)}{100} + \frac{0.473(0.527)}{110}}$$ $$= \sqrt{\frac{0.2451}{100} + \frac{0.2493}{110}}$$ $$= \sqrt{0.002451 + 0.002266} = \sqrt{0.004717} = 0.069$$ **Step 3: Calculate Test Statistic** $$Z = \frac{(0.57 - 0.473) - 0}{0.069} = \frac{0.097}{0.069} = 1.41$$ **Step 4: Critical Values and Decision Rule** For α = 0.01 (two-tailed): - Critical values: $Z_{0.005} = \pm 2.58$ **Decision Rule:** "Reject H₀ if |Z| > 2.58" **Step 5: Make Decision** |Z| = 1.41 < 2.58 → **Do NOT Reject H₀** **Interpretation:** At α = 0.01, there's **no evidence** 
that credit usage proportions differ between men and women. The store should **not** implement gender-specific credit marketing strategies. --- ### Example 9.7: Johnson Manufacturing Defect Rates **Scenario:** Quality Control for Shift Performance Johnson Manufacturing has experienced increased defect rates. The production supervisor suspects the **night shift** produces a higher proportion of defects than the **day shift**. **Sample Data:** | Shift | Units Inspected | Defects | Defect Rate | |:------|:---------------:|:-------:|:-----------:| | **Day shift** | n_D = 500 | 14 | p_D = 0.028 | | **Night shift** | n_N = 700 | 22 | p_N = 0.031 | **Decision Context:** If night shift defect rate is significantly higher, institute a training program for night workers. **Hypotheses:** $$H_0: \pi_N \leq \pi_D \quad \text{(night shift not worse)}$$ $$H_A: \pi_N > \pi_D \quad \text{(night shift has higher defects)}$$ **Significance Level:** α = 0.05 **Solution:** **Step 1: Calculate Standard Error** $$s_{p_N - p_D} = \sqrt{\frac{0.031(0.969)}{700} + \frac{0.028(0.972)}{500}}$$ $$= \sqrt{\frac{0.0300}{700} + \frac{0.0272}{500}}$$ $$= \sqrt{0.0000429 + 0.0000544} = \sqrt{0.0000973} = 0.0099$$ **Step 2: Calculate Test Statistic** $$Z = \frac{(0.031 - 0.028) - 0}{0.0099} = \frac{0.003}{0.0099} = 0.303$$ **Step 3: Critical Value (Right-Tailed Test)** For α = 0.05 (right-tailed): - Critical value: $Z_{0.05} = 1.65$ **Decision Rule:** "Reject H₀ if Z > 1.65" **Step 4: Make Decision** Z = 0.303 < 1.65 → **Do NOT Reject H₀** ```{python} #| echo: false #| output: true import matplotlib.pyplot as plt import numpy as np from scipy import stats fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) # Left plot: Hypothesis test visualization z_vals = np.linspace(-3, 4, 1000) z_pdf = stats.norm.pdf(z_vals, 0, 1) ax1.plot(z_vals, z_pdf, 'b-', linewidth=2, label='Standard Normal Distribution') ax1.fill_between(z_vals[z_vals >= 1.65], 0, stats.norm.pdf(z_vals[z_vals >= 1.65]), alpha=0.3, 
                 color='red', label='Rejection Region (α = 0.05)')

# Mark critical value
ax1.axvline(1.65, color='red', linestyle='--', linewidth=2, alpha=0.7,
            label='Critical Value (Z = 1.65)')

# Mark test statistic
ax1.axvline(0.303, color='green', linestyle='-', linewidth=3, alpha=0.9,
            label='Test Statistic (Z = 0.303)')
ax1.plot(0.303, 0, 'go', markersize=15)

# Annotations
ax1.annotate('Z = 0.303\nDo NOT Reject H₀', xy=(0.303, 0.05), xytext=(-1, 0.3),
             fontsize=11, ha='center', fontweight='bold',
             bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8),
             arrowprops=dict(arrowstyle='->', color='green', lw=2))
ax1.annotate('No training\nprogram needed', xy=(1.65, 0.05), xytext=(2.3, 0.25),
             fontsize=10, ha='left',
             bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.8),
             arrowprops=dict(arrowstyle='->', color='red', lw=1.5))

ax1.set_xlabel('Z-value', fontsize=11)
ax1.set_ylabel('Probability Density', fontsize=11)
ax1.set_title('Right-Tailed Test: Night Shift vs. Day Shift Defects (α = 0.05)',
              fontsize=12, fontweight='bold')
ax1.legend(fontsize=9, loc='upper left')
ax1.grid(alpha=0.3)
ax1.set_xlim(-3, 4)

# Right plot: Defect rate comparison
shifts = ['Day Shift\n(n=500)', 'Night Shift\n(n=700)']
defect_pcts = [2.8, 3.1]
# Observed defect counts from the sample table; recomputing them from the
# rounded rates (0.031 * 700 = 21.7) would mislabel the night shift as 21
defect_counts = [14, 22]
colors_bar = ['skyblue', 'lightcoral']
bars = ax2.bar(shifts, defect_pcts, color=colors_bar, alpha=0.7,
               edgecolor='black', linewidth=2)

# Add value labels on bars
for bar, rate, count in zip(bars, defect_pcts, defect_counts):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
             f'{rate}%\n({count} defects)',
             ha='center', va='bottom', fontsize=11, fontweight='bold')

# Add difference annotation
ax2.annotate('', xy=(1, 3.1), xytext=(1, 2.8),
             arrowprops=dict(arrowstyle='<->', color='purple', lw=2))
ax2.text(1.15, 2.95, 'Difference:\n0.3%', fontsize=10, ha='left',
         bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.7))

ax2.set_ylabel('Defect Rate (%)', fontsize=11)
ax2.set_title('Defect Rates: Day vs. Night Shift', fontsize=12, fontweight='bold')
ax2.set_ylim(0, 4)
ax2.grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('09-johnson-manufacturing-test.png', dpi=150, bbox_inches='tight')
plt.show()
```

**Interpretation:**

::: {.callout-important icon="💼" appearance="simple"}
## Business Decision: Do NOT Institute Training Program

**Statistical Finding:** At α = 0.05, there's **insufficient evidence** to conclude that night shift workers produce a higher defect rate than day shift workers.

**The observed difference (3.1% vs. 2.8%)** could easily be due to random variation rather than a systematic problem with night shift performance.

**Supervisor's Recommendation:**

- **Do not** implement the training program (saves training costs)
- **Continue monitoring** defect rates over time
- **Investigate other factors** if defects remain elevated (equipment maintenance, raw material quality, environmental conditions)

**Cost-Benefit Note:** Skipping the training program saves its cost, but failing to reject H₀ does not prove the shifts perform equally. With a standard error of 0.0099, the observed difference would have to exceed about 1.6 percentage points (1.65 × 0.0099) to be significant, so smaller true differences could go undetected with these sample sizes. This is why continued monitoring is recommended.
:::

---

## Section Exercises: Hypothesis Testing

**22.** Samples of sizes 50 and 60 reveal means of 512 and 587, and standard deviations of 125 and 145 respectively. At α = 0.02, test the hypothesis that μ₁ = μ₂.

**23.** At α = 0.01, test equality of means if samples of size 10 and 8 give means of 36 and 49, and standard deviations of 12 and 18, respectively. Assume variances are NOT equal.

**24.** Repeat problem 23 assuming variances ARE equal.

**25.** Paired samples of size 81 give a mean difference of 36.5 and a standard deviation of differences of 29.1. Test equality of means at α = 0.01.

**26.** Test $H_0: \mu_1 \leq \mu_2$ if samples of sizes 64 and 81 produce means of 65.2 and 58.6, and standard deviations of 21.2 and 25.3. Use α = 0.05.
**27.** Test $H_0: \mu_1 \geq \mu_2$ if two samples of size 100 produce means of 2.3 and 3.1 with standard deviations of 0.26 and 0.31. Use α = 0.01.

**28.** Paired samples of size 25 reported a mean difference of 45.2 and a standard deviation of differences of 21.6. Test equality of means at α = 0.05.

**29.** Samples of sizes 120 and 150 produced proportions of 0.69 and 0.73. Test equality of proportions at α = 0.05.

**30.** Two samples of size 500 each tested $H_0: \pi_1 \leq \pi_2$. Sample proportions are 14% and 11%. At α = 0.10, what is your conclusion?

**31.** Samples of sizes 200 and 250 reveal proportions of 21% and 26%. Test $H_0: \pi_1 \geq \pi_2$ at α = 0.01.

---

## 9.9 Testing for Equality of Variances: The F-Test

Several statistical tests discussed earlier assumed **equal population variances**. We initially accepted this assumption without proof. Now we demonstrate how to formally test whether the assumption of equal variances is reasonable.
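As a preview of the procedure developed in this section, the sketch below runs a two-tailed F-test from summary statistics. The helper name `f_test_equal_variances` is hypothetical, not part of any library; the inputs (standard deviations 12.2 and 15.4, ten observations each) are the values used in the management consultant example later in this section. It assumes both populations are approximately normal and relies on `scipy.stats.f.ppf` for the critical value.

```python
from scipy import stats


def f_test_equal_variances(s1, n1, s2, n2, alpha=0.05):
    """Two-tailed F-test of H0: sigma1^2 = sigma2^2 from summary statistics.

    Hypothetical helper: the larger sample variance goes in the numerator,
    so only the upper tail (area alpha/2) is used for the critical value.
    """
    var1, var2 = s1 ** 2, s2 ** 2
    if var1 >= var2:
        f_ratio, df_num, df_den = var1 / var2, n1 - 1, n2 - 1
    else:
        f_ratio, df_num, df_den = var2 / var1, n2 - 1, n1 - 1
    # Upper-tail critical value with area alpha/2 to its right
    f_crit = stats.f.ppf(1 - alpha / 2, df_num, df_den)
    return f_ratio, f_crit, bool(f_ratio > f_crit)


f_ratio, f_crit, reject = f_test_equal_variances(12.2, 10, 15.4, 10, alpha=0.05)
print(f"F = {f_ratio:.2f}, critical value = {f_crit:.2f}, reject H0: {reject}")
```

Swapping the degrees of freedom along with the variances matters when the two sample sizes differ; with equal sample sizes, as here, both are n - 1 = 9.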
::: {.callout-note icon="🔬" appearance="minimal"}
## Why Test for Equal Variances?

**Many statistical procedures depend on the variance assumption:**

1. **Pooled t-tests** require $\sigma_1^2 = \sigma_2^2$
2. **ANOVA** (next chapter) assumes equal group variances
3. **Some regression techniques** assume homoscedasticity (equal variances)

**The F-test** helps decide whether to use:

- **Pooled methods** (when variances are equal)
- **Separate variance methods** (when variances differ)
:::

### The F-Distribution

The test for comparing variances uses the **F-distribution**, named by George W. Snedecor in honor of **Sir Ronald A. Fisher** (1890-1962), one of the founders of modern statistics, who developed the variance-ratio approach in the 1920s.

**Key Properties of the F-Distribution:**

1. **Right-skewed** (not symmetric)
2. **Bounded by zero** on the left (F values are never negative)
3. **Two degrees of freedom parameters:**
   - df₁ = numerator degrees of freedom = (sample size behind the numerator variance) - 1
   - df₂ = denominator degrees of freedom = (sample size behind the denominator variance) - 1

### The F-Ratio

::: {.callout-important icon="⚡" appearance="simple"}
## F-Ratio for Comparing Two Variances

$$F = \frac{s_L^2}{s_S^2}$$

Where:

- $s_L^2$ = **larger** sample variance
- $s_S^2$ = **smaller** sample variance

**Convention:** Always place the larger variance in the numerator to ensure F ≥ 1
:::

**Logic of the Test:**

- If population variances are truly equal ($\sigma_1^2 = \sigma_2^2$), the two sample variances should be similar and F should be close to 1
- The more $s_L^2$ exceeds $s_S^2$, the larger F becomes
- A sufficiently large F provides evidence that $\sigma_1^2 \neq \sigma_2^2$

### Critical Value Adjustment for Two-Tailed Tests

::: {.callout-warning icon="⚠️" appearance="minimal"}
## Important: Divide α by 2

Because we **force F ≥ 1** by placing the larger variance in the numerator, we can only reject in the **right tail**. This eliminates the left tail rejection region.

**For a two-tailed test of $H_0: \sigma_1^2 = \sigma_2^2$:**

- Look up the upper-tail area **α/2** (not α); the overall significance level of the test is still α
- Critical value: $F_{\alpha/2, df_1, df_2}$

**Example:** For α = 0.10 (10% significance), use $F_{0.05}$ from the F-table
:::

```{python}
#| echo: false
#| output: true
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: F-distribution shape
f_vals = np.linspace(0.01, 6, 1000)

# F-distribution with df1=10, df2=15
f_pdf = stats.f.pdf(f_vals, 10, 15)
ax1.plot(f_vals, f_pdf, 'b-', linewidth=2.5, label='F-distribution (df₁=10, df₂=15)')
ax1.fill_between(f_vals, 0, f_pdf, alpha=0.2, color='blue')

# Mark critical value for α/2 = 0.025
f_critical = stats.f.ppf(0.975, 10, 15)  # approximately 3.06
ax1.fill_between(f_vals[f_vals >= f_critical], 0,
                 stats.f.pdf(f_vals[f_vals >= f_critical], 10, 15),
                 alpha=0.4, color='red', label='Rejection Region (α/2 = 0.025)')
ax1.axvline(f_critical, color='red', linestyle='--', linewidth=2,
            label=f'Critical Value F = {f_critical:.2f}')
ax1.axvline(1, color='green', linestyle=':', linewidth=1.5, alpha=0.7,
            label='F = 1 (equal variances)')

ax1.set_xlabel('F-value', fontsize=11)
ax1.set_ylabel('Probability Density', fontsize=11)
ax1.set_title('F-Distribution: Right-Skewed, Bounded at Zero',
              fontsize=12, fontweight='bold')
ax1.legend(fontsize=9, loc='upper right')
ax1.grid(alpha=0.3)
ax1.set_xlim(0, 6)
ax1.set_ylim(0, ax1.get_ylim()[1])

# Right plot: Effect of degrees of freedom
f_vals2 = np.linspace(0.01, 5, 1000)

# Different df combinations
configs = [(5, 5), (10, 10), (20, 20), (10, 30)]
colors = ['red', 'blue', 'green', 'purple']
labels = ['df₁=5, df₂=5', 'df₁=10, df₂=10', 'df₁=20, df₂=20', 'df₁=10, df₂=30']

for (df1, df2), color, label in zip(configs, colors, labels):
    f_pdf_temp = stats.f.pdf(f_vals2, df1, df2)
    ax2.plot(f_vals2, f_pdf_temp, color=color, linewidth=2, label=label, alpha=0.8)

ax2.axvline(1, color='black', linestyle='--', linewidth=1.5,
            alpha=0.5, label='F = 1')
ax2.set_xlabel('F-value', fontsize=11)
ax2.set_ylabel('Probability Density', fontsize=11)
ax2.set_title('F-Distribution Shape Varies with Degrees of Freedom',
              fontsize=12, fontweight='bold')
ax2.legend(fontsize=9, loc='upper right')
ax2.grid(alpha=0.3)
ax2.set_xlim(0, 5)

plt.tight_layout()
plt.savefig('09-f-distribution-properties.png', dpi=150, bbox_inches='tight')
plt.show()
```

---

### Example: Management Consultant's Variance Test

**Scenario:** Preliminary Analysis Before t-Test

A management consultant wants to test a hypothesis about two population means. Before conducting the t-test, the consultant must decide whether to assume equal variances.

**Sample Data:**

| Sample | Size | Std Dev | Variance |
|:-------|:----:|:-------:|:--------:|
| **Sample 1** | n₁ = 10 | s₁ = 12.2 | $s_1^2 = 148.84$ |
| **Sample 2** | n₂ = 10 | s₂ = 15.4 | $s_2^2 = 237.16$ |

**Hypotheses:**

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{(variances are equal)}$$
$$H_A: \sigma_1^2 \neq \sigma_2^2 \quad \text{(variances differ)}$$

**Significance Level:** α = 0.05

**Solution:**

**Step 1: Calculate F-Ratio (Larger Variance in Numerator)**

$$F = \frac{s_2^2}{s_1^2} = \frac{(15.4)^2}{(12.2)^2} = \frac{237.16}{148.84} = 1.59$$

**Step 2: Determine Degrees of Freedom**

- Numerator (larger variance): df₁ = n₂ - 1 = 10 - 1 = 9
- Denominator (smaller variance): df₂ = n₁ - 1 = 10 - 1 = 9

**Step 3: Find Critical Value**

For α = 0.05 (two-tailed test):

- Use **α/2 = 0.025** in F-table
- Look up $F_{0.025, 9, 9}$ = **4.03**

**Step 4: Decision Rule**

**Decision Rule:** "Do not reject H₀ if F ≤ 4.03.
Reject if F > 4.03"

**Step 5: Make Decision**

F = 1.59 < 4.03 → **Do NOT Reject H₀**

```{python}
#| echo: false
#| output: true
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

fig, ax = plt.subplots(figsize=(10, 5))

# F-distribution with df1=9, df2=9
f_vals = np.linspace(0.01, 7, 1000)
f_pdf = stats.f.pdf(f_vals, 9, 9)
ax.plot(f_vals, f_pdf, 'b-', linewidth=2.5, label='F(9, 9) Distribution')
ax.fill_between(f_vals, 0, f_pdf, alpha=0.15, color='blue')

# Rejection region
f_critical = 4.03
ax.fill_between(f_vals[f_vals >= f_critical], 0,
                stats.f.pdf(f_vals[f_vals >= f_critical], 9, 9),
                alpha=0.4, color='red', label='Rejection Region (α/2 = 0.025)')

# Mark critical value and test statistic
ax.axvline(f_critical, color='red', linestyle='--', linewidth=2.5,
           label=f'Critical Value: F = {f_critical}')
ax.axvline(1.59, color='green', linestyle='-', linewidth=3,
           label='Test Statistic: F = 1.59')
ax.plot(1.59, 0, 'go', markersize=15, zorder=5)

# Annotations
ax.annotate('F = 1.59\nDo NOT Reject H₀\nAssume equal variances',
            xy=(1.59, 0.05), xytext=(2.5, 0.35),
            fontsize=11, ha='center', fontweight='bold',
            bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8),
            arrowprops=dict(arrowstyle='->', color='green', lw=2))
ax.annotate('Not enough evidence\nto conclude\nvariances differ',
            xy=(4.03, 0.02), xytext=(5.2, 0.25), fontsize=10, ha='left',
            bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.8),
            arrowprops=dict(arrowstyle='->', color='red', lw=1.5))

ax.set_xlabel('F-value', fontsize=11)
ax.set_ylabel('Probability Density', fontsize=11)
ax.set_title('F-Test for Equality of Variances (df₁=9, df₂=9, α=0.05)',
             fontsize=12, fontweight='bold')
ax.legend(fontsize=10, loc='upper right')
ax.grid(alpha=0.3)
ax.set_xlim(0, 7)

plt.tight_layout()
plt.savefig('09-f-test-example.png', dpi=150, bbox_inches='tight')
plt.show()
```

**Interpretation:**

::: {.callout-important icon="✅" appearance="simple"}
## Conclusion: Assume Equal Variances

At α = 0.05,
there's **insufficient evidence** to conclude that the population variances differ.

**Practical Implication:** The consultant can proceed with the hypothesis test for means using the **pooled variance method** (Section 9.2B), which assumes $\sigma_1^2 = \sigma_2^2$.

**Statistical Note:** Failing to reject H₀ does not prove that the variances are equal; it only means the sample evidence is not strong enough to conclude they differ.
:::

---

## Solved Problems

The following worked examples demonstrate complete solutions to two-population inference problems, integrating concepts from throughout the chapter.

---

### Solved Problem 1: Yuppies' Work Ethic

**Source:** Fortune magazine (April 1991)

**Context:** Study of workaholic baby boomers (ages 25-43) in administrative positions

A Fortune article compared work hours between young executives on the **corporate fast track** (Group 1) versus those who spent **less time at work** (Group 2). While fast-trackers often reported 70, 80, or even 90 hours per week, approximately 60 hours was typical.

**Sample Data:**

| Group | Mean Hours | Std Dev | Sample Size |
|:------|:----------:|:-------:|:-----------:|
| **Fast track** | $\bar{X}_1 = 62.5$ | s₁ = 23.7 | n₁ = 175 |
| **Less time** | $\bar{X}_2 = 39.7$ | s₂ = 8.9 | n₂ = 168 |

**Tasks:**

1. Construct a **90% confidence interval** for the difference in average work hours
2.
Test the hypothesis of **equal means** at α = 0.10

---

**Solution:**

**Part 1: Confidence Interval**

**Step 1: Calculate Standard Error**

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$= \sqrt{\frac{(23.7)^2}{175} + \frac{(8.9)^2}{168}}$$
$$= \sqrt{\frac{561.69}{175} + \frac{79.21}{168}} = \sqrt{3.210 + 0.472}$$
$$= \sqrt{3.682} = 1.92$$

**Step 2: Find Critical Z-Value**

90% confidence level → α = 0.10 → $Z_{0.05} = 1.65$

**Step 3: Calculate Confidence Interval**

$$\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm Z \cdot s_{\bar{X}_1 - \bar{X}_2}$$
$$= (62.5 - 39.7) \pm 1.65(1.92)$$
$$= 22.8 \pm 3.17$$
$$19.63 \leq (\mu_1 - \mu_2) \leq 25.97 \text{ hours}$$

**Interpretation:** We can be **90% confident** that fast-track executives work an average of **19.63 to 25.97 hours more per week** than their less work-focused counterparts.

---

**Part 2: Hypothesis Test**

**Hypotheses:**

$$H_0: \mu_1 = \mu_2 \quad \text{(equal average work hours)}$$
$$H_A: \mu_1 \neq \mu_2 \quad \text{(work hours differ)}$$

**Test Statistic:**

$$Z = \frac{(62.5 - 39.7) - 0}{1.92} = \frac{22.8}{1.92} = 11.88$$

**Critical Values:** For α = 0.10 (two-tailed): $\pm Z_{0.05} = \pm 1.65$

**Decision Rule:** "Do not reject if -1.65 ≤ Z ≤ 1.65. Otherwise reject."

**Decision:** Z = 11.88 > 1.65 → **Reject H₀**

**Conclusion:** There's **overwhelming evidence** (p-value < 0.0001) that average work hours differ, with fast-track executives working significantly more hours than other administrators. The 22.8-hour difference is far too large to attribute to chance.

---

### Solved Problem 2: Inflation and Market Power

**Context:** Economic Study of Industry Concentration

Economists fear that industries with **high concentration** (market power in few firms' hands) may exploit their dominance. Firms in nine high-concentration industries were paired with firms in nine industries where **economic power was dispersed**. Industries were matched on foreign competition, cost structures, and other price-affecting factors.
**Data:** Average percentage price increases by industry

| Industry Pair | Concentrated (%) | Less Concentrated (%) | Difference (d) | d² |
|:-------------:|:----------------:|:---------------------:|:--------------:|:---:|
| 1 | 3.7 | 3.2 | 0.5 | 0.25 |
| 2 | 4.1 | 3.7 | 0.4 | 0.16 |
| 3 | 2.1 | 2.6 | -0.5 | 0.25 |
| 4 | -0.9 | 0.1 | -1.0 | 1.00 |
| 5 | 4.6 | 4.1 | 0.5 | 0.25 |
| 6 | 5.2 | 4.8 | 0.4 | 0.16 |
| 7 | 6.7 | 5.2 | 1.5 | 2.25 |
| 8 | 3.8 | 3.9 | -0.1 | 0.01 |
| 9 | 4.9 | 4.6 | 0.3 | 0.09 |
| **Totals** | | | **2.0** | **4.42** |

**Question:** At α = 0.10, do concentrated industries show more pronounced inflationary pressures?

---

**Solution:**

**Step 1: Calculate Mean and Standard Deviation of Differences**

$$\bar{d} = \frac{\sum d_i}{n} = \frac{2.0}{9} = 0.22\%$$
$$s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}}$$
$$= \sqrt{\frac{4.42 - 9(0.22)^2}{8}} = \sqrt{\frac{4.42 - 0.436}{8}}$$
$$= \sqrt{\frac{3.984}{8}} = \sqrt{0.498} = 0.706$$

**Step 2: Construct 90% Confidence Interval**

- df = n - 1 = 9 - 1 = 8
- For 90% CI: $t_{0.05, 8} = 1.860$

$$\text{C.I.} = \bar{d} \pm t \frac{s_d}{\sqrt{n}}$$
$$= 0.22 \pm 1.860 \times \frac{0.706}{\sqrt{9}}$$
$$= 0.22 \pm 1.860 \times 0.235 = 0.22 \pm 0.438$$
$$-0.218 \leq \mu_d \leq 0.658$$

**Interpretation:** We're **90% confident** that concentrated industries have price increases that are between **0.218% lower** and **0.658% higher** than less concentrated industries.

---

**Step 3: Hypothesis Test**

**Hypotheses:**

$$H_0: \mu_{\text{conc}} = \mu_{\text{less}} \quad \text{(no inflation difference)}$$
$$H_A: \mu_{\text{conc}} \neq \mu_{\text{less}} \quad \text{(inflation differs)}$$

**Test Statistic:**

$$t = \frac{\bar{d} - 0}{\frac{s_d}{\sqrt{n}}} = \frac{0.22}{\frac{0.706}{\sqrt{9}}} = \frac{0.22}{0.235} = 0.935$$

**Critical Values:** For α = 0.10, df = 8: $t_{0.05, 8} = \pm 1.860$

**Decision Rule:** "Do not reject if -1.860 ≤ t ≤ 1.860. Otherwise reject."
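The paired-sample quantities used in Steps 1 through 3 can be reproduced with a short sketch. It takes the nine industry pairs from the data table, works on the within-pair differences, and uses `scipy.stats.t.ppf` for the critical value. The variable names are illustrative only, and no intermediate rounding is applied.

```python
import math
from scipy import stats

# Price increases (%) for the nine matched industry pairs in the table
concentrated = [3.7, 4.1, 2.1, -0.9, 4.6, 5.2, 6.7, 3.8, 4.9]
less_concentrated = [3.2, 3.7, 2.6, 0.1, 4.1, 4.8, 5.2, 3.9, 4.6]

# Pairing turns the two-sample problem into a one-sample problem on d
diffs = [c - l for c, l in zip(concentrated, less_concentrated)]
n = len(diffs)

d_bar = sum(diffs) / n                                          # mean difference
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))  # std dev of d
std_err = s_d / math.sqrt(n)

t_stat = d_bar / std_err                 # test statistic for H0: mu_d = 0
t_crit = stats.t.ppf(0.95, df=n - 1)     # upper 5% point, df = 8
ci = (d_bar - t_crit * std_err, d_bar + t_crit * std_err)  # 90% interval

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}")
print(f"t critical = {t_crit:.3f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the sketch carries full precision, its test statistic comes out near 0.946 rather than 0.935; the hand calculation rounds the mean difference to 0.22 before dividing. The statistic stays well inside ±1.860 either way.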
**Decision:** t = 0.935 < 1.860 → **Do NOT Reject H₀**

**Conclusion:** At α = 0.10, there's **insufficient evidence** to conclude that concentrated industries have higher inflationary pressures than less concentrated industries. The observed 0.22% difference could easily be due to chance. The same conclusion holds if the question is treated as a right-tailed test, since t = 0.935 is also below $t_{0.10, 8} = 1.397$.

---

### Solved Problem 3: Drilling Rig Bit Comparison

**Context:** Oil Drilling Equipment Testing

A drilling company tests two drill bits by drilling to a maximum depth of 112 feet and recording completion time.

**Sample Data:**

| Bit | Wells Drilled | Mean Time | Std Dev |
|:----|:-------------:|:---------:|:-------:|
| **Bit 1** | n₁ = 12 | $\bar{X}_1 = 27.3$ hrs | s₁ = 8.7 hrs |
| **Bit 2** | n₂ = 10 | $\bar{X}_2 = 31.7$ hrs | s₂ = 8.3 hrs |

**Conditions:**

- **No evidence** that variances are equal → use separate variance method
- α = 0.10
- All wells drilled with same equipment and soil type

**Question:** Does one bit appear more effective?

---

**Solution:**

**Step 1: Calculate Adjusted Degrees of Freedom**

$$df = \frac{\left[\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right]^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$
$$= \frac{\left[\frac{(8.7)^2}{12} + \frac{(8.3)^2}{10}\right]^2}{\frac{[(8.7)^2/12]^2}{11} + \frac{[(8.3)^2/10]^2}{9}}$$
$$= \frac{[6.308 + 6.889]^2}{\frac{(6.308)^2}{11} + \frac{(6.889)^2}{9}}$$
$$= \frac{(13.197)^2}{\frac{39.79}{11} + \frac{47.46}{9}} = \frac{174.16}{3.617 + 5.273}$$
$$= \frac{174.16}{8.890} = 19.59 \approx 19 \text{ (round down)}$$

**Step 2: Construct 90% Confidence Interval**

For df = 19, 90% CI: $t'_{0.05, 19} = 1.729$

$$\text{C.I.} = (\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$= (27.3 - 31.7) \pm 1.729 \sqrt{\frac{(8.7)^2}{12} + \frac{(8.3)^2}{10}}$$
$$= -4.4 \pm 1.729\sqrt{6.308 + 6.889} = -4.4 \pm 1.729(3.633)$$
$$= -4.4 \pm 6.28$$
$$-10.68 \leq (\mu_1 - \mu_2) \leq 1.88 \text{ hours}$$

**Interpretation:** We're **90% confident** that Bit 1 takes between **1.88 hours more** and **10.68
hours less** than Bit 2.

---

**Step 3: Hypothesis Test**

**Hypotheses:**

$$H_0: \mu_1 = \mu_2$$
$$H_A: \mu_1 \neq \mu_2$$

**Test Statistic:**

$$t' = \frac{(27.3 - 31.7) - 0}{\sqrt{\frac{(8.7)^2}{12} + \frac{(8.3)^2}{10}}} = \frac{-4.4}{3.633} = -1.211$$

**Critical Values:** $t'_{0.05, 19} = \pm 1.729$

**Decision Rule:** "Do not reject if -1.729 ≤ t' ≤ 1.729. Otherwise reject."

**Decision:** t' = -1.211 > -1.729 → **Do NOT Reject H₀**

**Conclusion:** No evidence that one bit is more effective. The 4.4-hour difference could be due to chance.

---

**Alternative: If Variances Were Equal**

If equipment/soil similarity justified equal variances:

$$s_p^2 = \frac{(8.7)^2(11) + (8.3)^2(9)}{20} = \frac{832.59 + 620.01}{20} = 72.63$$

With df = n₁ + n₂ - 2 = 20: $t_{0.05, 20} = 1.725$

$$\text{C.I.} = -4.4 \pm 1.725\sqrt{72.63\left(\frac{1}{12} + \frac{1}{10}\right)}$$
$$= -4.4 \pm 6.29$$
$$-10.69 \leq (\mu_1 - \mu_2) \leq 1.89$$

The result is nearly identical, and the conclusion is unchanged.

---

### Solved Problem 4: The Credit Crunch

**Context:** Retail Credit Card Usage by Gender

A *Retail Management* study examined credit card usage patterns.

**Sample Data:**

| Gender | Shoppers | Used Card | Proportion |
|:-------|:--------:|:---------:|:----------:|
| **Women** | n_w = 468 | 131 | p_w = 0.28 |
| **Men** | n_m = 237 | 57 | p_m = 0.24 |

**Question:** At α = 0.05, is there evidence of a difference in credit card usage proportions?

---

**Solution:**

**Part 1: Confidence Interval**

**Step 1: Calculate Standard Error**

$$s_{p_w - p_m} = \sqrt{\frac{p_w(1-p_w)}{n_w} + \frac{p_m(1-p_m)}{n_m}}$$
$$= \sqrt{\frac{0.28(0.72)}{468} + \frac{0.24(0.76)}{237}}$$
$$= \sqrt{\frac{0.2016}{468} + \frac{0.1824}{237}} = \sqrt{0.000431 + 0.000770}$$
$$= \sqrt{0.001201} = 0.035$$

**Step 2: Construct 95% Confidence Interval**

$Z_{0.025} = 1.96$

$$\text{C.I.} = (p_w - p_m) \pm Z \cdot s_{p_w - p_m}$$
$$= (0.28 - 0.24) \pm 1.96(0.035)$$
$$= 0.04 \pm 0.069$$
$$-0.029 \leq (\pi_w - \pi_m) \leq 0.109$$

Or: **-2.9% ≤ difference ≤ 10.9%**

**Interpretation:** The interval contains zero, so at the 95% confidence level the data provide no evidence of a difference.

---

**Part 2: Hypothesis Test**

**Hypotheses:**

$$H_0: \pi_w = \pi_m$$
$$H_A: \pi_w \neq \pi_m$$

**Test Statistic:**

$$Z = \frac{(0.28 - 0.24) - 0}{0.035} = \frac{0.04}{0.035} = 1.14$$

**Critical Values:** For α = 0.05: $\pm Z_{0.025} = \pm 1.96$

**Decision Rule:** "Do not reject if -1.96 ≤ Z ≤ 1.96. Otherwise reject."

**Decision:** Z = 1.14 < 1.96 → **Do NOT Reject H₀**

**Conclusion:** No evidence that credit card usage proportions differ by gender. These data give retailers no statistical basis for gender-specific credit marketing strategies.
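The interval and test in Solved Problem 4 can be checked with a minimal sketch that starts from the raw counts (131 of 468 women, 57 of 237 men). The helper name `two_proportion_z` is hypothetical. It follows the chapter's unpooled standard error for two proportions; a pooled-proportion standard error is another common choice for the test and is not shown here.

```python
import math
from scipy import stats


def two_proportion_z(x1, n1, x2, n2, alpha=0.05):
    """Hypothetical helper: CI and two-tailed Z-test for pi1 - pi2.

    Uses the unpooled standard error
    sqrt(p1(1-p1)/n1 + p2(1-p2)/n2), as in the chapter's formula.
    """
    p1, p2 = x1 / n1, x2 / n2
    std_err = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_stat = (p1 - p2) / std_err
    z_crit = stats.norm.ppf(1 - alpha / 2)
    ci = (p1 - p2 - z_crit * std_err, p1 - p2 + z_crit * std_err)
    return z_stat, z_crit, ci, bool(abs(z_stat) > z_crit)


z_stat, z_crit, ci, reject = two_proportion_z(131, 468, 57, 237, alpha=0.05)
print(f"Z = {z_stat:.2f}, critical value = ±{z_crit:.2f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```

Working from the counts instead of the rounded proportions 0.28 and 0.24 shifts the interval endpoints slightly (the upper limit is about 0.107 rather than 0.109), but the interval still contains zero and the decision is the same.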

---

## Formula Summary

::: {.callout-note icon="📋" appearance="minimal"}
## Complete Formula List for Chapter 9

**CONFIDENCE INTERVALS**

**[9.1] Two Means (Large Samples):**
$$(\bar{X}_1 - \bar{X}_2) \pm Z \sigma_{\bar{X}_1 - \bar{X}_2}$$

**[9.2] Standard Error (Known σ):**
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

**[9.3] Estimated Standard Error:**
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

**[9.4] Two Means (Large Samples, Unknown σ):**
$$(\bar{X}_1 - \bar{X}_2) \pm Z s_{\bar{X}_1 - \bar{X}_2}$$

**[9.5] Pooled Variance:**
$$s_p^2 = \frac{s_1^2(n_1-1) + s_2^2(n_2-1)}{n_1 + n_2 - 2}$$

**[9.6] Two Means (Equal Variances):**
$$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

**[9.7] Adjusted df (Unequal Variances):**
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

**[9.8] Two Means (Unequal Variances):**
$$(\bar{X}_1 - \bar{X}_2) \pm t' \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

**[9.9] Mean of Paired Differences:**
$$\bar{d} = \frac{\sum d_i}{n}$$

**[9.10] Std Dev of Differences:**
$$s_d = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n-1}}$$

**[9.11] Paired Differences CI:**
$$\bar{d} \pm t \frac{s_d}{\sqrt{n}}$$

**[9.12] Standard Error for Two Proportions:**
$$s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

**[9.13] Two Proportions CI:**
$$(p_1 - p_2) \pm Z \cdot s_{p_1-p_2}$$

**SAMPLE SIZE FORMULAS**

**[9.14] Sample Size for Two Means:**
$$n = \frac{Z^2(\sigma_1^2 + \sigma_2^2)}{E^2}$$

**[9.15] Sample Size for Two Proportions:**
$$n = \frac{Z^2[\pi_1(1-\pi_1) + \pi_2(1-\pi_2)]}{E^2}$$

**HYPOTHESIS TESTS**

**[9.16] Z-Test for Two Means (Large):**
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1-\bar{X}_2}}$$

**[9.17] t-Test (Equal Variances):**
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1}
+ \frac{1}{n_2}\right)}}$$

**[9.18] t-Test (Unequal Variances):**
$$t' = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

**[9.19] t-Test for Paired Samples:**
$$t = \frac{\bar{d} - (\mu_1 - \mu_2)}{\frac{s_d}{\sqrt{n}}}$$

**[9.20] Z-Test for Two Proportions:**
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{s_{p_1-p_2}}$$

**[9.21] F-Test for Variances:**
$$F = \frac{s_L^2}{s_S^2}$$
:::

---

## Chapter Summary

### Key Concepts Mastered

**1. Two-Population Framework**

- Comparing means, proportions, and variances across two populations
- Independent vs. paired samples
- Large vs. small sample methods

**2. Interval Estimation**

- Confidence intervals quantify uncertainty about population differences
- Critical connection: **CI contains zero ↔ fail to reject H₀: μ₁ = μ₂**

**3. Hypothesis Testing**

- Formal decision-making about population differences
- One-tailed vs. two-tailed tests
- **Statistical significance ≠ practical significance** (Acme example)

**4. Variance Assumptions Matter**

- Pooled methods (equal variances): More powerful when assumption valid
- Separate variance methods (unequal variances): More robust, wider CIs
- F-test: Formal test for variance equality

**5.
Paired Samples Power**

- Pairing reduces variability by controlling for confounding factors
- Transforms two-sample problem into one-sample problem
- More powerful when pairing is appropriate

### Decision Framework

| Situation | Sample Size | Variances | Method |
|:----------|:------------|:----------|:-------|
| Compare means | Both ≥ 30 | Any | Z-test, Formula [9.4] |
| Compare means | Either < 30 | Equal | Pooled t-test, Formula [9.6] |
| Compare means | Either < 30 | Unequal | Separate variance t-test, Formula [9.8] |
| Matched pairs | Any | N/A | Paired t-test, Formula [9.11] |
| Compare proportions | Large enough* | N/A | Z-test for proportions, Formula [9.13] |
| Compare variances | Any | N/A | F-test, Formula [9.21] |

*Requires: $np \geq 5$ and $n(1-p) \geq 5$ for both samples

### Business Applications

Throughout this chapter, we've seen two-population inference applied to:

- **Human Resources:** Training program effectiveness, wage equity, employee turnover
- **Quality Control:** Production process comparison, defect rates, product durability
- **Healthcare:** Hospital cost analysis, treatment effectiveness
- **Marketing:** Credit usage patterns, customer satisfaction, advertising effectiveness
- **Finance:** Investment strategies, pricing decisions, cost comparison
- **Operations:** Service delivery times, productivity comparisons, equipment efficiency

---

## Closing Scenario: U.S. Foreign Investment Decision

**Revisiting the Opening Scenario**

Recall the Fortune magazine analysis (October 1996) of U.S. foreign investment:

- **Europe:** $364 billion (17% increase)
- **Asia:** $100 billion (16% increase)

**Your Executive Summary (Based on Chapter 9 Methods):**

As the analyst preparing the comparative report, you would apply the two-population methods learned in this chapter:

::: {.callout-important icon="💼" appearance="simple"}
## Investment Strategy Recommendation

**Analysis Approach:**

1.
**Construct confidence intervals** for the difference in average returns between Europe and Asia
2. **Test hypotheses** about risk differences (variance comparison using F-test)
3. **Consider paired analysis** if same companies invested in both regions
4. **Evaluate proportions** of successful investments in each region

**Key Questions Answered:**

- Is the average ROI significantly different? (t-test for means)
- Is investment risk (variance) comparable? (F-test)
- What's the plausible range for the difference? (Confidence interval)

**Strategic Implications:**

- **If the CI for the difference (Europe minus Asia) contains zero:** No clear advantage, so diversify across both regions
- **If that CI lies entirely above zero:** Europe provides superior returns, so increase its allocation
- **If the F-test shows Asia's variance is lower:** Asia offers more stable returns, which favors risk-averse investors

**Final Recommendation:** Use statistical evidence to support data-driven geographic allocation decisions, balancing return expectations with risk tolerance.
:::

---

## Chapter Exercises (Selected)

**32.** **AT&T vs. Sprint:** Phone service comparison

- AT&T: n = 145, $\bar{X}$ = \$4.07, s = \$0.97
- Sprint: n = 102, $\bar{X}$ = \$3.89, s = \$0.85

What does a 95% CI reveal about mean cost difference?

**37.** **Grant Applications:** NSF (n = 14, $\bar{X}$ = 45.7 weeks, s = 12.6) vs. HHS (n = 12, $\bar{X}$ = 32.9 weeks, s = 16.8). Construct 90% CI. If NSF takes >5 weeks more, submit to HHS. What should James do? (Assume equal variances)

**39.** **Quality Control Teams:** Two teams solve 10 problems. Paired data provided. Construct 90% CI for difference in average solution times.

**44.** **Mutual Funds:** Income-oriented funds (n = 12) vs. growth-oriented funds (n = 14). Unequal variances assumed.

a. Construct 80% CI for difference in average returns
b. What sample size needed for 95% confidence with error ≤ \$10?

**45.** **Baldwin Piano Teaching Method:** Your method (n = 100, $\bar{X}$ = 149 hrs, s = 37.7) vs. competitor (n = 130, $\bar{X}$ = 186 hrs, s = 42.2)

a. 99% CI: is your method better?
b.
Sample size for 99% confidence with error ≤ 5 hours?