6 Probability Distributions

Modeling Business Uncertainty with Discrete and Continuous Distributions

7 Chapter 5: Probability Distributions

graph TD
    A[Probability Distributions] --> B[Discrete Distributions]
    A --> C[Continuous Distributions]
    
    B --> B1[Binomial<br/>Distribution]
    B --> B2[Cumulative Binomial<br/>Distribution]
    B --> B3[Hypergeometric<br/>Distribution]
    B --> B4[Poisson<br/>Distribution]
    
    C --> C1[Exponential<br/>Distribution]
    C --> C2[Uniform<br/>Distribution]
    C --> C3[Normal<br/>Distribution]
    
    C3 --> C3a[Standard Normal<br/>Distribution]
    C3 --> C3b[Probability<br/>Calculation]
    C3 --> C3c[Determining the<br/>Value of X]
    C3 --> C3d[Approximation to<br/>Binomial Distribution]
    
    style A fill:#000,stroke:#000,color:#fff
    style B fill:#000,stroke:#000,color:#fff
    style C fill:#000,stroke:#000,color:#fff

Learning Objectives

By the end of this chapter, you will be able to:

Calculate expected values and variances for discrete probability distributions
Apply the binomial distribution to yes/no business scenarios
Use the hypergeometric distribution for sampling without replacement
Model rare events with the Poisson distribution
Analyze time-between-events using the exponential distribution
Work with uniform distributions for equally likely outcomes
Master the normal distribution for continuous business data
Convert problems to standard normal (Z-scores) for probability calculations
Approximate binomial with normal distribution for large samples

7.1 5.1 Introduction: From Probability Principles to Distributions

In Chapter 4, we learned to calculate the probability of individual events. Now we extend those principles to probability distributions - comprehensive models that describe the likelihood of all possible outcomes for a random variable.

Real-World Business Applications:

A Bradley University study of Peoria, Illinois emergency services revealed:
- 911 call response times: Uniformly distributed between 1.2 and 4.6 minutes
- Call arrival rate: Poisson distribution with average of 9 calls per hour
- Home values: Normally distributed with mean $45,750, SD $15,110
- Budget impact: 42,089 homes potentially subject to new property tax

The mayor wanted to reduce average response time to 2 minutes at a cost of $575,000 per 30-second improvement. This chapter provides the statistical toolkit to analyze such decisions.

Key Concept: Random Variables

Random Variable: A variable whose value is determined by a random experiment.

Discrete Random Variable: Can assume only specific values (usually integers) - results from counting
- Examples: Number of defects, customer arrivals, sales calls, coin flips

Continuous Random Variable: Can assume any value within a range - results from measuring
- Examples: Weight, time, temperature, income, response time

Probability Distribution: A listing of all possible outcomes and their associated probabilities.

7.1.1 Visualizing Probability Distributions

Discrete Distribution Example - Rolling a die:

Outcome	1	2	3	4	5	6
P(X)	1/6	1/6	1/6	1/6	1/6	1/6


Properties of ALL probability distributions:
1. 0 \leq P(X = x_i) \leq 1 (each probability between 0 and 1)
2. \sum P(X = x_i) = 1 (all probabilities sum to 1)

7.2 5.2 Mean and Variance of Discrete Distributions

Just as we calculated mean and variance for data sets in Chapter 3, we can compute them for probability distributions.

Expected Value (Mean) of Discrete Distribution

\mu = E(X) = \sum [x_i \cdot P(x_i)]

Interpretation: The long-run average value if we repeat the experiment many times.

Variance of Discrete Distribution

\sigma^2 = \sum [(x_i - \mu)^2 \cdot P(x_i)]

Standard Deviation

\sigma = \sqrt{\sigma^2}

7.2.1 Example 5.1: Ponder Real Estate Monthly Sales

Business Scenario

Ponder Real Estate tracked monthly home sales over 24 months:

Houses Sold (x)	Number of Months	Houses Sold (x)	Number of Months
5	3	12	5
8	7	17	3
10	4	20	2

Mr. Ponder previously averaged 7.3 sales/month with \sigma = 5.7 sales.
He’ll quit and become a rodeo clown if the new data doesn’t show improvement (higher mean, lower variability).

Should Mr. Ponder stick with real estate?

Solution - Step 1: Convert to Probability Distribution

Houses (x)	Months	P(x)	x · P(x)	(x - μ)² · P(x)
5	3	3/24 = 0.125	0.625	(5-10.912)² (0.125) = 4.369
8	7	7/24 = 0.292	2.336	(8-10.912)² (0.292) = 2.476
10	4	4/24 = 0.167	1.670	(10-10.912)² (0.167) = 0.139
12	5	5/24 = 0.208	2.496	(12-10.912)² (0.208) = 0.246
17	3	3/24 = 0.125	2.125	(17-10.912)² (0.125) = 4.633
20	2	2/24 = 0.083	1.660	(20-10.912)² (0.083) = 6.855
Total	24	1.000	10.912	18.718


Step 2: Calculate Statistics

\mu = E(X) = 10.912 \text{ houses/month}

\sigma^2 = 18.718 \text{ houses}^2

\sigma = \sqrt{18.718} = 4.326 \text{ houses}
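The Step 1 and Step 2 arithmetic can be reproduced with a short Python sketch. It assumes the 24-month frequency table above is the complete data set, and the function name `discrete_mean_variance` is only illustrative. Because the probabilities are kept as exact fractions, the code reports μ ≈ 10.917 and σ² ≈ 18.743. The table's 10.912 and 18.718 come from rounding each P(x) to three decimals before multiplying.

```python
from fractions import Fraction
import math

# Ponder Real Estate: houses sold in a month -> number of months (24 months in total)
months_by_sales = {5: 3, 8: 7, 10: 4, 12: 5, 17: 3, 20: 2}

def discrete_mean_variance(freq):
    """Return (mean, variance) of the distribution P(x) = count / total, as exact fractions."""
    total = sum(freq.values())
    probs = {x: Fraction(count, total) for x, count in freq.items()}
    mean = sum(x * p for x, p in probs.items())                    # mu = sum of x * P(x)
    variance = sum((x - mean) ** 2 * p for x, p in probs.items())  # sigma^2 = sum of (x - mu)^2 * P(x)
    return mean, variance

mu, var = discrete_mean_variance(months_by_sales)
print(f"mu = {float(mu):.3f}, sigma^2 = {float(var):.3f}, sigma = {math.sqrt(var):.3f}")
```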

Step 3: Compare to Previous Performance

Metric	Previous	New	Change
Mean (μ)	7.3	10.912	+3.612 ✓
SD (σ)	5.7	4.326	-1.374 ✓


Business Insight

Good news for Mr. Ponder!

✅ Increased sales: Average jumped from 7.3 to 10.9 houses/month (+49.5%)
✅ Reduced variability: Standard deviation dropped from 5.7 to 4.3 (-24.1%)

Translation: More consistent, higher performance. He should stay in real estate and skip the rodeo career!

Strategic implication: Whatever changed in the past 24 months (marketing, market conditions, sales process) is working. Document and replicate the success factors.

7.3 5.3 The Binomial Distribution - Modeling Yes/No Outcomes

Many business situations involve binary outcomes repeated multiple times:
- Will this customer buy? (Yes/No) × 50 sales calls
- Is this product defective? (Yes/No) × 100 units inspected
- Did the student pass? (Yes/No) × 200 test takers

The binomial distribution is perfect for these scenarios.

Binomial Distribution Requirements

Four Properties (Bernoulli Process):

Fixed number of trials (n)
Only two outcomes per trial (success/failure)
Constant probability (\pi) for each trial
Independent trials (one doesn’t affect others)

Binomial Probability Formula

P(X = x) = {}_nC_x \cdot \pi^x \cdot (1-\pi)^{n-x}

Where:
- n = number of trials
- x = number of successes desired
- \pi = probability of success on single trial
- {}_nC_x = \frac{n!}{x!(n-x)!} = combinations

Mean and Variance (Shortcuts)

\mu = n\pi

\sigma^2 = n\pi(1-\pi)

\sigma = \sqrt{n\pi(1-\pi)}
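The binomial formula and its shortcut moments translate directly into code. The sketch below uses only Python's standard library (`math.comb` requires Python 3.8 or later). The helper names are illustrative and do not come from any statistics package. Applied to the summer-job example that follows (n = 7, π = 0.40), it reproduces the table values 0.0774, 0.0280 and 0.0016.

```python
from math import comb, sqrt

def binomial_pmf(x, n, pi):
    """P(X = x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def binomial_mean_sd(n, pi):
    """Shortcut formulas: mu = n * pi and sigma = sqrt(n * pi * (1 - pi))."""
    return n * pi, sqrt(n * pi * (1 - pi))

# n = 7 graduates, pi = 0.40: exactly 5, none, and all 7 with summer jobs
for x in (5, 0, 7):
    print(x, round(binomial_pmf(x, 7, 0.40), 4))
print(binomial_mean_sd(7, 0.40))
```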

7.3.1 Example 5.2: Journal of Higher Education - Summer Jobs

College Student Employment

The Journal of Higher Education reports 40% of high school graduates work summer jobs to earn college tuition.

If we randomly select 7 graduates, find the probability that:
a) Exactly 5 have summer jobs
b) None have summer jobs
c) All 7 have summer jobs

Solution:

Given: n = 7 trials, \pi = 0.40 probability of success

a) Exactly 5 have jobs: P(X = 5)

P(X=5) = {}_7C_5 \cdot (0.40)^5 \cdot (0.60)^2

= \frac{7!}{5!2!} \cdot (0.01024) \cdot (0.36)

= 21 \cdot 0.01024 \cdot 0.36 = 0.0774

From Binomial Table (Appendix III, Table B):
Look up n=7, \pi=0.40, x=5 → 0.0774

b) None have jobs: P(X = 0)

P(X=0) = {}_7C_0 \cdot (0.40)^0 \cdot (0.60)^7 = 1 \cdot 1 \cdot 0.0280 = 0.0280

From table: n=7, \pi=0.40, x=0 → 0.0280

c) All 7 have jobs: P(X = 7)

P(X=7) = {}_7C_7 \cdot (0.40)^7 \cdot (0.60)^0 = 1 \cdot 0.0016 \cdot 1 = 0.0016

From table: n=7, \pi=0.40, x=7 → 0.0016

Interpretation

7.74% chance exactly 5 of 7 work (uncommon - roughly 1 in 13 samples)
2.80% chance none work (rare - less than 3%)
0.16% chance all 7 work (very rare - less than 2 in 1000)

Expected value: E(X) = n\pi = 7(0.40) = 2.8 students working; the single most likely outcome (the mode) is 3 students, with P(X = 3) = 0.2903

The extremes (0 or 7) are both unlikely. We’d typically see 2-4 students with summer jobs in a sample of 7.

7.3.2 Handling π > 0.50: The Complement Trick

Problem: Binomial tables only go up to \pi = 0.50. What if \pi = 0.70?

Solution: Use the complement!

7.3.3 Example 5.3: Internet Connectivity in Flatbush

Scenario

70% of Flatbush residents have internet connectivity.
Of 10 randomly selected residents, what’s the probability exactly 6 are connected?

Challenge: \pi = 0.70 > 0.50 (not in table)

Solution - The Complement Trick:

Key insight: If 70% are connected (success), then 30% are not connected (failure).

Reframe: 6 successes at \pi = 0.70 equals 4 failures at \pi = 0.30

Visual proof:

Connected (successes at π = 0.70)	0	1	2	3	4	5	6	7	8	9	10
Not connected (successes at π = 0.30)	10	9	8	7	6	5	4	3	2	1	0

Therefore:
P(X = 6 | n=10, \pi=0.70) = P(X = 4 | n=10, \pi=0.30)

From Binomial Table: n=10, \pi=0.30, x=4 → 0.2001

Tip

Rule: When \pi > 0.50, find P(X = x) by looking up P(X = n-x) with \pi' = 1 - \pi
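A quick numerical check of the complement rule for the Flatbush numbers follows. It is a sketch that recomputes both sides of the identity from the binomial formula rather than from Table B.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# 6 connected out of 10 when pi = 0.70 ...
direct = binomial_pmf(6, 10, 0.70)
# ... equals 10 - 6 = 4 "not connected" successes when pi' = 1 - 0.70 = 0.30
via_complement = binomial_pmf(10 - 6, 10, 1 - 0.70)
print(round(direct, 4), round(via_complement, 4))
```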

7.4 5.4 Cumulative Binomial Distributions

Often we need probability of a range rather than exact value:
- “At most 3 defects” → P(X \leq 3)
- “At least 5 sales” → P(X \geq 5)
- “Between 2 and 4 complaints” → P(2 \leq X \leq 4)

7.4.1 Using Cumulative Binomial Tables (Table C)

Table C provides: P(X \leq k) = P(X=0) + P(X=1) + ... + P(X=k)

7.4.2 Example 5.4: Student Employment (Continued)

Using the summer job data (n=7, \pi=0.40), find probability that:
a) 3 or fewer have jobs
b) At least 5 have jobs
c) Between 3 and 5 (inclusive) have jobs

Solution:

a) 3 or fewer: P(X \leq 3)

From Cumulative Table C: n=7, \pi=0.40, x=3 → 0.7102

b) At least 5: P(X \geq 5)

Tables give P(X \leq k), not P(X \geq k). Use complement:

P(X \geq 5) = 1 - P(X \leq 4)

From Table C: n=7, \pi=0.40, x=4 → 0.9037

P(X \geq 5) = 1 - 0.9037 = 0.0963

c) Between 3 and 5 (inclusive): P(3 \leq X \leq 5)

Strategy: P(3 \leq X \leq 5) = P(X \leq 5) - P(X \leq 2)

From Table C:
- P(X \leq 5) = 0.9812
- P(X \leq 2) = 0.4199

P(3 \leq X \leq 5) = 0.9812 - 0.4199 = 0.5613
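Table C's cumulative entries are running sums of binomial probabilities. The sketch below (the helper name `binomial_cdf` is illustrative) recomputes all three answers with the same complement and subtraction strategies used above.

```python
from math import comb

def binomial_cdf(k, n, pi):
    """P(X <= k) = P(X = 0) + ... + P(X = k), the quantity Table C lists."""
    return sum(comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(k + 1))

n, pi = 7, 0.40  # summer-job example
at_most_3 = binomial_cdf(3, n, pi)                                  # P(X <= 3)
at_least_5 = 1 - binomial_cdf(4, n, pi)                             # P(X >= 5) = 1 - P(X <= 4)
between_3_and_5 = binomial_cdf(5, n, pi) - binomial_cdf(2, n, pi)   # P(3 <= X <= 5)
print(round(at_most_3, 4), round(at_least_5, 4), round(between_3_and_5, 4))
```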

Summary

71.02% probability 3 or fewer work (high - most samples will be in this range)
9.63% probability at least 5 work (low - only 1 in 10 samples)
56.13% probability between 3-5 work (moderate - slightly better than coin flip)

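Central tendency: The distribution clusters around the expected value \mu = 2.8. The most probable three-value range is actually 2 to 4, with P(2 \leq X \leq 4) = P(X \leq 4) - P(X \leq 1) = 0.9037 - 0.1586 = 0.7451. The range 3 to 5 sits slightly to the right of the center.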

7.5 5.5 The Hypergeometric Distribution

The binomial distribution requires constant probability across trials. But what if we sample without replacement from a small population?

Example: Drawing cards from a deck. First draw: P(Ace) = 4/52. Second draw: P(Ace) = 3/51 (if first was Ace) or 4/51 (if not).

Probability changed → Binomial doesn’t apply → Use Hypergeometric!

Hypergeometric Distribution Formula

P(X = x) = \frac{{}_rC_x \times {}_{N-r}C_{n-x}}{{}_NC_n}

Where:
- N = population size
- r = number of successes in population
- n = sample size
- x = number of successes desired in sample

When to Use:
- Sampling without replacement
- From finite population
- Sample size is large relative to population (typically n/N > 0.05)

7.5.1 Example 5.5: Racehorse Contagious Disease

Veterinary Scenario

A stable contains N = 10 racehorses.
r = 4 horses have a contagious disease.
A veterinarian randomly selects n = 3 horses for testing.

What’s the probability exactly 2 of the 3 tested horses are sick?

Solution:

P(X = 2) = \frac{{}_4C_2 \times {}_{10-4}C_{3-2}}{{}_{10}C_3}

Step 1: Calculate combinations

{}_4C_2 = \frac{4!}{2!2!} = \frac{24}{4} = 6 \text{ (ways to select 2 sick from 4)}

{}_{6}C_1 = \frac{6!}{1!5!} = 6 \text{ (ways to select 1 healthy from 6)}

{}_{10}C_3 = \frac{10!}{3!7!} = \frac{10 \times 9 \times 8}{6} = \frac{720}{6} = 120 \text{ (total ways to select 3 from 10)}

Step 2: Calculate probability

P(X = 2) = \frac{6 \times 6}{120} = \frac{36}{120} = 0.30

Veterinary Insight

30% chance of finding exactly 2 sick horses in the sample of 3.

Expected value: E(X) = n \times \frac{r}{N} = 3 \times \frac{4}{10} = 1.2 sick horses

Finding 2 sick horses is above average but not rare. The vet should:
- Test the remaining 7 horses immediately
- Quarantine the stable
- Begin treatment protocol for confirmed cases

7.5.2 Example 5.6: Employment Discrimination Case - Johnson District

Legal Application of Hypergeometric Distribution

Background: Three women sued a Kansas City utility company for gender discrimination.

Facts:
- N = 9 people eligible for promotion
- r = 4 were women
- n = 3 people actually promoted
- x = 1 woman was promoted

Legal question: If gender played no role (random selection), what’s the probability at most 1 woman would be promoted?

If this probability is high, it suggests random chance could explain the outcome (no discrimination).

Solution:

We need: P(X \leq 1) = P(X=0) + P(X=1)

P(X = 1): Exactly 1 woman promoted

P(X=1) = \frac{{}_4C_1 \times {}_5C_2}{{}_9C_3} = \frac{4 \times 10}{84} = \frac{40}{84} = 0.4762

P(X = 0): No women promoted

P(X=0) = \frac{{}_4C_0 \times {}_5C_3}{{}_9C_3} = \frac{1 \times 10}{84} = \frac{10}{84} = 0.1190

Total probability:

P(X \leq 1) = 0.4762 + 0.1190 = 0.5952

Legal Interpretation

59.52% chance (nearly 60%) that at most 1 woman would be promoted purely by random selection.

Court’s conclusion: This is not unusually low. Random chance could easily produce this outcome without discrimination.

Statistical standard: Courts typically look for probabilities < 5% before inferring discrimination. Here, 59.52% is far above that threshold.

Verdict: Insufficient statistical evidence of gender bias. Case dismissed.

Note: This doesn’t prove no discrimination occurred - only that the statistical evidence is weak.
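Both hypergeometric examples can be checked with a few lines of Python. In this sketch, `hypergeom_pmf` is an illustrative helper built on `math.comb` (Python 3.8+), and its arguments follow the chapter's notation N, r, n, x.

```python
from math import comb

def hypergeom_pmf(x, N, r, n):
    """P(X = x) = rCx * (N - r)C(n - x) / NCn for sampling without replacement."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# Racehorses: N = 10 horses, r = 4 sick, n = 3 tested -> P(exactly 2 sick)
p_two_sick = hypergeom_pmf(2, N=10, r=4, n=3)

# Promotion case: N = 9 eligible, r = 4 women, n = 3 promoted -> P(at most 1 woman)
p_at_most_one_woman = sum(hypergeom_pmf(x, N=9, r=4, n=3) for x in (0, 1))

print(round(p_two_sick, 4), round(p_at_most_one_woman, 4))
```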

7.6 5.6 The Poisson Distribution - Modeling Rare Events

The Poisson distribution models the number of rare events occurring in a fixed interval (time, space, area, volume):
- Customer arrivals per hour
- Defects per 100 units
- Website crashes per month
- Typos per page
- Accidents per quarter

Poisson Distribution Formula

P(X = x) = \frac{\mu^x \cdot e^{-\mu}}{x!}

Where:
- \mu = average number of occurrences in the interval
- x = specific number of occurrences
- e \approx 2.71828 (the base of the natural logarithm)

Requirements:
1. Events are rare (low probability)
2. Events are independent
3. Average rate (\mu) is constant

Mean and Variance (Same value!)

E(X) = \mu, \qquad \text{Var}(X) = \sigma^2 = \mu, \qquad \sigma = \sqrt{\mu}

7.6.1 Example 5.7: University Tutorial Office

Student Arrivals

Students arrive at the statistics tutorial office at an average rate of μ = 5.2 students per hour (Poisson distribution).

Calculate the probability that in any given hour:
a) Exactly 4 students arrive
b) No students arrive
c) Exactly 8 students arrive

Solution:

a) Exactly 4 students: P(X = 4)

P(X=4) = \frac{(5.2)^4 \cdot e^{-5.2}}{4!} = \frac{731.16 \cdot 0.00552}{24} = \frac{4.036}{24} = 0.1681

From Poisson Table (Table D): \mu = 5.2, x = 4 → 0.1681

b) No students: P(X = 0)

P(X=0) = \frac{(5.2)^0 \cdot e^{-5.2}}{0!} = \frac{1 \cdot 0.00552}{1} = 0.0055

From table: \mu = 5.2, x = 0 → 0.0055

c) Exactly 8 students: P(X = 8)

From table: \mu = 5.2, x = 8 → 0.0731

Tutorial Office Staffing Insights

16.81% chance of exactly 4 students (common outcome)
0.55% chance of zero students (very rare - empty office is unusual)
7.31% chance of 8 students (occasional surge)

Operational planning:
- Expect about 5 students in a typical hour: 4 to 6 arrivals occur in roughly half of all hours (about 49%)
- Staff for peak: Plan capacity for 8-10 students to handle surges (95th percentile ≈ 9 students)
- Quiet hours rare: Empty office only about 1 in 180 hours (1/0.0055)

Standard deviation: \sigma = \sqrt{5.2} = 2.28 students (variance \sigma^2 = 5.2) - moderate variability
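The tutorial-office figures, including the staffing percentile, can be recomputed directly from the Poisson formula. In this sketch, `poisson_percentile` is a hypothetical helper that returns the smallest count k with P(X ≤ k) ≥ q. For μ = 5.2 it returns 9, consistent with the 95th-percentile estimate above.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = mu**x * e**(-mu) / x!"""
    return mu**x * exp(-mu) / factorial(x)

def poisson_percentile(q, mu):
    """Smallest k such that P(X <= k) >= q."""
    k, cumulative = 0, poisson_pmf(0, mu)
    while cumulative < q:
        k += 1
        cumulative += poisson_pmf(k, mu)
    return k

mu = 5.2  # students per hour
for x in (4, 0, 8):
    print(x, round(poisson_pmf(x, mu), 4))
print("95th percentile:", poisson_percentile(0.95, mu))
```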

7.6.2 Adjusting μ for Different Time Intervals

Critical skill: The mean must match the time period in the problem!

7.6.3 Example 5.8: Defect Rate Adjustment

A manufacturing process averages μ = 4 defects per 100 units.

Find the probability of:
a) Exactly 10 defects in 200 units
b) No defects in 50 units

Solution:

a) 200 units: Adjust μ

\mu_{200} = 4 \times \frac{200}{100} = 8 \text{ defects}

P(X=10 | \mu=8) = 0.0993 \text{ (from table)}

b) 50 units: Adjust μ

\mu_{50} = 4 \times \frac{50}{100} = 2 \text{ defects}

P(X=0 | \mu=2) = 0.1353 \text{ (from table)}
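Rescaling μ is a single proportion. This sketch assumes, as the Poisson model does, that the defect rate stays constant per unit produced. The name `scaled_mean` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = mu**x * e**(-mu) / x!"""
    return mu**x * exp(-mu) / factorial(x)

def scaled_mean(mu, base_units, target_units):
    """Rescale a Poisson mean from one interval size to another (constant rate assumed)."""
    return mu * target_units / base_units

mu_200 = scaled_mean(4, 100, 200)  # 8 defects expected in 200 units
mu_50 = scaled_mean(4, 100, 50)    # 2 defects expected in 50 units
print(round(poisson_pmf(10, mu_200), 4), round(poisson_pmf(0, mu_50), 4))
```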

7.7 5.7 The Exponential Distribution - Time Between Events

While Poisson models how many events occur, Exponential models how long between events.

If arrivals follow Poisson → time between arrivals follows Exponential.

Exponential Distribution Formula

P(X \leq t) = 1 - e^{-\mu t}

Where:
- t = time period of interest
- \mu = average rate of occurrence
- e = 2.71828

Relationship to Poisson:
- Poisson: “3 customers per hour” → How many?
- Exponential: “Time until next customer” → How long?

Mean and Variance

E(X) = \frac{1}{\mu}, \qquad \sigma^2 = \frac{1}{\mu^2}

7.7.1 Example 5.9: Cross City Cab Company

Airport Taxi Service

Taxis arrive at the local airport following a Poisson distribution with μ = 12 taxis per hour.

You just landed and need to reach downtown for an important business deal. Your boss won’t tolerate failure.

Decision rule: If probability of taxi within 5 minutes < 50%, rent a car instead.

Should you wait for a taxi or rent a car?

Solution:

Time conversion: μ = 12 per 60 minutes. What fraction is 5 minutes?

t = \frac{5}{60} = \frac{1}{12}

Probability taxi arrives within 5 minutes:

P(X \leq 5 \text{ min}) = 1 - e^{-\mu t} = 1 - e^{-(12)(1/12)} = 1 - e^{-1}

From Table D (Poisson table where x=0 gives e^{-\mu}):
e^{-1} = 0.3679

P(X \leq 5) = 1 - 0.3679 = 0.6321

Business Decision

✅ Wait for the taxi!

63.21% probability a taxi arrives within 5 minutes (> 50% threshold).

Additional insights:
- P(X \leq 10 \text{ min}) = 1 - e^{-2} = 1 - 0.1353 = 0.8647 (86.5% within 10 min)
- P(5 < X \leq 10) = 0.8647 - 0.6321 = 0.2326 (23.3% arrive between 5-10 min)

Expected wait time: E(X) = \frac{1}{\mu} = \frac{1}{12} hour = 5 minutes

With 12 taxis/hour on average, the expected wait is 5 minutes - acceptable for most business travelers.
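The taxi calculation only works if the rate and the time share a unit. The sketch below converts 12 taxis per hour to 0.2 per minute and then evaluates the exponential CDF. The helper name `exponential_cdf` is illustrative.

```python
from math import exp

def exponential_cdf(t, rate):
    """P(X <= t) = 1 - e^(-rate * t); t and rate must use the same time unit."""
    return 1 - exp(-rate * t)

rate_per_minute = 12 / 60  # 12 taxis per hour = 0.2 taxis per minute
within_5 = exponential_cdf(5, rate_per_minute)
within_10 = exponential_cdf(10, rate_per_minute)
print(round(within_5, 4), round(within_10, 4))
print("mean wait (minutes):", 1 / rate_per_minute)
```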

7.8 5.8 The Uniform Distribution - Equally Likely Outcomes

The uniform distribution applies when all outcomes in a range are equally likely.

Think of it as a “flat” distribution - constant probability density across the range.

Uniform Distribution Formulas

Continuous Uniform Distribution on [a, b]

Mean: \mu = \frac{a + b}{2}

Variance: \sigma^2 = \frac{(b-a)^2}{12}

Height (probability density): \text{Height} = \frac{1}{b-a}

Probability X falls between X_1 and X_2: P(X_1 \leq X \leq X_2) = \frac{X_2 - X_1}{b - a}

7.8.1 Example 5.10: Dow Chemical Fertilizer Bags

Uniform Weight Distribution

Dow Chemical produces lawn fertilizer in bags with uniformly distributed weight:
- Mean: μ = 25 pounds
- Range: 2.4 pounds

Harry Homeowner’s concerns:
a) He needs at least 23 pounds. Will any bag suffice?
b) What’s the probability a bag weighs more than 25.5 pounds?

Solution:

Step 1: Determine a and b

Range = 2.4 pounds spreads evenly around mean of 25:

a = \mu - \frac{\text{Range}}{2} = 25 - \frac{2.4}{2} = 25 - 1.2 = 23.8 \text{ lbs}

b = \mu + \frac{\text{Range}}{2} = 25 + 1.2 = 26.2 \text{ lbs}

a) Will Harry get at least 23 pounds?

Minimum bag weight = 23.8 pounds

✅ YES - Even the lightest bag (23.8 lbs) exceeds Harry’s 23 lb requirement!

b) Probability of bag > 25.5 pounds:

P(X > 25.5) = P(25.5 < X \leq 26.2) = \frac{26.2 - 25.5}{26.2 - 23.8}

= \frac{0.7}{2.4} = 0.2917
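Here is a sketch of the uniform calculations for the fertilizer bags. It assumes, as the example does, that the 2.4-pound range is centred on the 25-pound mean. Both function names are illustrative. The variance comes out as 2.4²/12 = 0.48.

```python
def uniform_params(mean, total_range):
    """Endpoints a, b of a uniform distribution centred on `mean` with width `total_range`."""
    half = total_range / 2
    return mean - half, mean + half

def uniform_prob(x1, x2, a, b):
    """P(x1 <= X <= x2) for X ~ Uniform(a, b); the interval is clipped to [a, b]."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0) / (b - a)

a, b = uniform_params(25, 2.4)
p_heavy = uniform_prob(25.5, b, a, b)    # P(X > 25.5)
mean, variance = (a + b) / 2, (b - a) ** 2 / 12
print(round(a, 2), round(b, 2), round(p_heavy, 4), round(mean, 2), round(variance, 2))
```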

Consumer Insight

✅ Harry has no worries!
- Guaranteed at least 23 pounds (minimum is 23.8)
- 29.17% chance of getting bonus weight (> 25.5 lbs)
- Average bag: 25 pounds (exactly what’s advertised)

Quality perspective:
- Consistent product: Uniform distribution means predictable range
- No short-weighting: Minimum 23.8 lbs protects consumers
- Bonus potential: Nearly 1 in 3 bags exceed 25.5 lbs

Compared to normal distribution: Uniform has no extreme values - all bags within narrow 2.4 lb range.

7.9 5.9 The Normal Distribution - The Crown Jewel of Statistics

Of all probability distributions, the normal distribution is the most important in statistics. We introduced its bell-shaped, symmetric form in Chapter 3 with the empirical rule. Now we’ll unlock its full analytical power.

Key characteristics:
- Continuous distribution (not discrete)
- Infinite range (theoretically from -∞ to +∞)
- Used for measured variables: heights, weights, temperatures, IQ scores, financial returns
- Completely defined by two parameters: mean (μ) and standard deviation (σ)

7.9.1 Real-World Case: ToppsWear Clothing Manufacturer

ToppsWear recognized the public was constantly changing in physical size and proportions. To produce better-fitting clothing, management commissioned a comprehensive study of customer body measurements.

Findings: Customer heights are normally distributed with:
- Mean: μ = 67 inches
- Standard deviation: σ = 2 inches

This normal distribution allows ToppsWear to:
1. Predict what percentage of customers fall into each size range
2. Optimize inventory allocation across sizes
3. Minimize both stockouts and excess inventory

Properties of the Normal Distribution

Symmetry:
- 50% of observations above the mean
- 50% of observations below the mean
- Mirror image around μ

Area Under the Curve:
- Total area = 1.00 (100%)
- Area = Probability
- 50% of area to right of μ, 50% to left

Empirical Rule (68-95-99.7):
- 68.3% of observations within μ ± 1σ
- 95.5% of observations within μ ± 2σ
- 99.7% of observations within μ ± 3σ

7.9.2 Comparing Different Normal Distributions

Three scenarios for ToppsWear:

Distribution I: μ = 67 inches, σ = 2 inches (adult population)
Distribution II: μ = 79 inches, σ = 2 inches (basketball players)
Distribution III: μ = 67 inches, σ = 4 inches (more diverse population)

Key insights:
- Different means → Shift left/right (Distributions I vs II)
- Different standard deviations → Change width/flatness (Distributions I vs III)
- Same area percentages apply regardless of μ or σ (Empirical Rule)

Distribution III is flatter and more spread out because σ = 4 (double the variability of Distribution I).

Empirical Rule application:
- Distribution I: 68.3% of heights between 65-69 inches (μ ± 1σ)
- Distribution III: 68.3% of heights between 63-71 inches (wider range due to larger σ)

7.10 5.10 The Standard Normal Distribution (Z-Scores)

Since there are infinite possible normal distributions (each with different μ and σ), statisticians created a standard form for all calculations.

Z-Score Transformation Formula

Z = \frac{X - \mu}{\sigma}

Where:
- Z = standard normal deviate (Z-score)
- X = original value
- μ = population mean
- σ = population standard deviation

Standard Normal Distribution:
- Mean = 0
- Standard deviation = 1
- Symmetric around Z = 0

Interpretation of Z:
“The number of standard deviations an observation is above (+) or below (-) the mean.”

7.10.1 Example 5.11: ToppsWear Customer Heights - Z-Score Conversions

Three Customers

Tom Typical: 67 inches tall
Paula Petite: 63 inches tall
Steve Stretch: 70 inches tall

Convert each height to Z-scores given μ = 67 inches, σ = 2 inches.

Solution:

Tom Typical: X = 67

Z = \frac{67 - 67}{2} = \frac{0}{2} = 0

Tom is exactly average → Z = 0 (at the mean)

Paula Petite: X = 63

Z = \frac{63 - 67}{2} = \frac{-4}{2} = -2.00

Paula is 2 standard deviations below average → Z = -2.00

Steve Stretch: X = 70

Z = \frac{70 - 67}{2} = \frac{3}{2} = 1.50

Steve is 1.5 standard deviations above average → Z = 1.50

Z-Score Interpretation Guide

Z = 0: Exactly at the mean (perfectly average)
Z = +1: One standard deviation above average (taller/larger)
Z = -1: One standard deviation below average (shorter/smaller)
Z > +3: Extremely high (top 0.15%)
Z < -3: Extremely low (bottom 0.15%)

ToppsWear application:
- Paula (Z = -2) is shorter than 97.7% of customers → Small/Petite sizes
- Tom (Z = 0) is median customer → Medium/Regular sizes
- Steve (Z = 1.5) is taller than 93.3% of customers → Large/Tall sizes
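The Z-score conversion and the percentile statements above can be checked with Python's `statistics.NormalDist` (Python 3.8+). This sketch hard-codes the ToppsWear parameters μ = 67 and σ = 2. The helper `z_score` is illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 67, 2  # ToppsWear customer heights, inches

def z_score(x, mu=MU, sigma=SIGMA):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist()  # mean 0, standard deviation 1
for name, height in [("Tom", 67), ("Paula", 63), ("Steve", 70)]:
    z = z_score(height)
    below = standard_normal.cdf(z)  # share of customers shorter than this person
    print(f"{name}: Z = {z:+.2f}, taller than {below:.1%}, shorter than {1 - below:.1%}")
```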

7.11 5.11 Calculating Probabilities with the Normal Distribution

The beauty of standardization: If you know the area, you know the probability!

Think of it like a dartboard:
- 2/3 of target painted green
- 1/3 of target painted red
- Equal chance of hitting any point
- P(hitting green) = 2/3 because 2/3 of area is green

Same logic for normal curves: Area under curve = Probability

7.11.1 Using the Standard Normal Table (Table E)

Table E provides: Area from mean (Z=0) to any Z-value

7.11.2 Example 5.12: TelCom Satellite Transmission Times

Business Communication Service

TelCom Satellite provides communication services to Chicago businesses.

Data:
- Mean transmission time: μ = 150 seconds
- Standard deviation: σ = 15 seconds
- Distribution: Normal

The service director needs probability estimates for:
a) Transmission between 125 and 150 seconds
b) Transmission less than 125 seconds
c) Transmission between 145 and 155 seconds
d) Transmission between 160 and 165 seconds

Solution:

a) P(125 ≤ X ≤ 150):

Z = \frac{125 - 150}{15} = \frac{-25}{15} = -1.67

From Table E: Z = 1.67 → Area = 0.4525

Note

Visual: 45.25% of all transmissions fall between 125-150 seconds

b) P(X < 125):

Strategy: Total area below mean = 0.5000
Area between 125-150 = 0.4525

P(X < 125) = 0.5000 - 0.4525 = 0.0475

Only 4.75% of transmissions are shorter than 125 seconds (rare!)

c) P(145 ≤ X ≤ 155):

Step 1: Find area from 145 to 150

Z = \frac{145 - 150}{15} = -0.33

From Table E: Area = 0.1293

Step 2: By symmetry, area from 150 to 155 = 0.1293

P(145 \leq X \leq 155) = 0.1293 + 0.1293 = 0.2586

25.86% of transmissions fall in this narrow 10-second window

d) P(160 ≤ X ≤ 165):

Step 1: Area from 150 to 165

Z = \frac{165 - 150}{15} = 1.00

From Table E: Area = 0.3413

Step 2: Area from 150 to 160

Z = \frac{160 - 150}{15} = 0.67

From Table E: Area = 0.2486

Step 3: Subtract

P(160 \leq X \leq 165) = 0.3413 - 0.2486 = 0.0927
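Software can replace Table E lookups. In this sketch, `table_e_area` mimics the table's "area from the mean to Z" and `prob_between` works directly in seconds. Both names are illustrative. The code uses unrounded Z-values, so its answers (about 0.4522, 0.0478, 0.2611 and 0.0938) differ slightly in the fourth decimal from the table-based results, which round Z to two decimals first.

```python
from statistics import NormalDist

transmission = NormalDist(mu=150, sigma=15)  # transmission time, seconds

def prob_between(dist, x1, x2):
    """Area under the normal curve between x1 and x2 (x1 < x2)."""
    return dist.cdf(x2) - dist.cdf(x1)

def table_e_area(z):
    """Area between the mean (Z = 0) and z, the quantity Table E lists."""
    return abs(NormalDist().cdf(z) - 0.5)

answers = {
    "a) 125-150": prob_between(transmission, 125, 150),
    "b) below 125": transmission.cdf(125),
    "c) 145-155": prob_between(transmission, 145, 155),
    "d) 160-165": prob_between(transmission, 160, 165),
}
for label, p in answers.items():
    print(f"{label}: {p:.4f}")
print("Table E check, Z = 1.67:", round(table_e_area(1.67), 4))
```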

Strategic Business Implications for TelCom

Service demand profile:
- Most transmissions (95%) between 120-180 seconds (μ ± 2σ)
- Very short calls (<125 sec) rare at 4.75% → Premium pricing opportunity
- Mid-range calls (145-155) common at 25.86% → Standard pricing tier
- Longer calls (160-165) moderate at 9.27% → Volume discount potential

Capacity planning:
- Plan for peak at 180 seconds (μ + 2σ) to serve about 97.7% of calls
- Reserve overflow capacity for extreme cases (>180 sec)

Revenue optimization:
- Tiered pricing: <120s (premium), 120-180s (standard), >180s (discounted)
- Bundle packages targeting 145-155 second average usage (26% of market)

7.12 5.12 Finding X-Values from Known Probabilities (Inverse Normal)

Sometimes we know the desired probability and must find the corresponding X-value.

Business applications:
- “What score puts a student in top 10%?” (scholarships)
- “What income level defines the poorest 15%?” (welfare programs)
- “What response time separates best 10% from worst 10%?” (performance evaluation)

7.12.1 Example 5.13: Presidential Economic Policy - Welfare Threshold

Income Distribution Analysis

Presidential economic advisors propose a welfare program for the poorest 15% of the nation.

Question: What income level separates the bottom 15% from the rest?

Data (1996 dollars):
- Mean income: μ = $13,812
- Standard deviation: σ = $3,550
- Distribution: Normal

Solution - The Inverse Process:

Step 1: Visualize the problem
- We know area = 0.15 (left tail)
- We need to find X = income threshold

Step 2: Convert to table lookup area

Table E shows area from mean to Z, not tail area.

\text{Area from mean to Z} = 0.5000 - 0.1500 = 0.3500

Step 3: Find Z from Table E

Look inside table body for area closest to 0.3500
→ Find 0.3508 at Z = 1.04

Step 4: Assign correct sign

We’re working in the left tail (below mean) → Z = -1.04

Step 5: Solve for X

Z = \frac{X - \mu}{\sigma}

-1.04 = \frac{X - 13,812}{3,550}

X = 13,812 + (-1.04)(3,550)

X = 13,812 - 3,692 = \$10,120
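The inverse lookup in Steps 2 to 5 corresponds to a single call to `NormalDist.inv_cdf` (Python 3.8+), shown in the sketch below. With the unrounded Z ≈ -1.0364, the threshold comes out near $10,133. The $10,120 above reflects rounding Z to -1.04 for the table lookup.

```python
from statistics import NormalDist

income = NormalDist(mu=13_812, sigma=3_550)  # 1996 dollars
z = NormalDist().inv_cdf(0.15)               # Z with 15% of the area to its left
threshold = income.inv_cdf(0.15)             # same as mu + z * sigma
print(f"Z = {z:.4f}, threshold = ${threshold:,.0f}")
```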

Policy Implications

Income threshold: Anyone earning $10,120 or less receives government assistance (bottom 15%).

Budget impact:
- U.S. population ≈ 265 million (1996)
- 15% = 39.75 million people qualify
- At $5,000/person/year → $198.75 billion annual cost

Political considerations:
- Threshold creates “welfare cliff” at $10,121
- May discourage earning just above cutoff
- Recommend graduated phase-out from $10,120-$15,000

7.12.2 Example 5.14: Fire Department Response Times - Performance Benchmarking

Urban Fire Prevention Initiative

A state commission identifies:
- Top 10% fastest fire departments (models)
- Bottom 10% slowest departments (needing improvement)

Data:
- Mean response time: μ = 12.8 minutes
- Standard deviation: σ = 3.7 minutes
- Distribution: Normal

Find: Two response times that separate top 10%, middle 80%, bottom 10%

Solution:

Find X₁ (lower-tail cutoff, separating the fastest 10%):

Step 1: Area from mean to Z = 0.5000 - 0.1000 = 0.4000

Step 2: In Table E, find 0.3997 (closest to 0.4000) → Z = 1.28

Step 3: Left tail → Z = -1.28

-1.28 = \frac{X_1 - 12.8}{3.7}

X_1 = 12.8 + (-1.28)(3.7) = 12.8 - 4.74 = 8.06 \text{ minutes}

Find X₂ (upper-tail cutoff, separating the slowest 10%):

Step 1: Right tail, same logic → Z = +1.28

1.28 = \frac{X_2 - 12.8}{3.7}

X_2 = 12.8 + (1.28)(3.7) = 12.8 + 4.74 = 17.54 \text{ minutes}
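Both cutoffs come from the 10th and 90th percentiles of the response-time distribution. The sketch below uses `NormalDist.inv_cdf` and reproduces 8.06 and 17.54 minutes to two decimals.

```python
from statistics import NormalDist

response = NormalDist(mu=12.8, sigma=3.7)  # response time, minutes
fast_cutoff = response.inv_cdf(0.10)  # fastest 10% respond in less than this
slow_cutoff = response.inv_cdf(0.90)  # slowest 10% respond in more than this
print(f"{fast_cutoff:.2f} min, {slow_cutoff:.2f} min")
```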

Performance Classification System

Excellent (Top 10%): Response time < 8.06 minutes
→ Serve as model programs for improvement initiatives

Acceptable (Middle 80%): Response time 8.06 - 17.54 minutes
→ Meet standard performance expectations

Needs Improvement (Bottom 10%): Response time > 17.54 minutes
→ Receive training, resources, and mentorship from excellent departments

Implementation strategy:
1. Pair each “needs improvement” department with “excellent” mentor
2. Analyze best practices: dispatch protocols, routing algorithms, staffing
3. Set 12-month improvement target: reduce times by 20%
4. Monthly progress reviews with state commission

Expected impact: If bottom 10% improve to average (12.8 min), estimated 45 lives saved annually statewide.

7.13 5.13 Normal Approximation to the Binomial Distribution

When n is large, calculating binomial probabilities becomes tedious:
- Tables don’t extend to large n
- Formulas involve massive factorials (100!)
- Computers can still struggle with extreme values

Solution: Use normal distribution as approximation!

When to Use Normal Approximation

Requirements:
- n\pi \geq 5 (at least 5 expected successes)
- n(1-\pi) \geq 5 (at least 5 expected failures)
- π reasonably close to 0.50 (symmetric)

Formulas:
\mu = n\pi, \qquad \sigma = \sqrt{n\pi(1-\pi)}

Continuity Correction Factor:
Because normal is continuous but binomial is discrete:
- P(X = 10) → P(9.5 ≤ X ≤ 10.5)
- P(X ≤ 10) → P(X ≤ 10.5)
- P(X ≥ 10) → P(X ≥ 9.5)

7.13.1 Example 5.15: Labor Union Strike Vote

Union Democracy

40% of union members favor a strike.
15 members selected randomly.

Question: Probability exactly 10 members support strike?

Compare: Binomial (exact) vs Normal (approximation)

Solution:

Binomial (Exact) - From Table B:

P(X = 10 | n = 15, \pi = 0.40) = 0.0245

Normal Approximation:

Step 1: Check requirements

n\pi = 15(0.40) = 6 \geq 5 \quad \checkmark

n(1-\pi) = 15(0.60) = 9 \geq 5 \quad \checkmark

Step 2: Calculate normal parameters

\mu = n\pi = 15(0.40) = 6

\sigma = \sqrt{15(0.40)(0.60)} = \sqrt{3.6} = 1.897

Step 3: Apply continuity correction

P(X = 10) → P(9.5 ≤ X ≤ 10.5)

Step 4: Convert to Z-scores

Z_1 = \frac{9.5 - 6}{1.897} = 1.85 \quad \text{(Area = 0.4678)}

Z_2 = \frac{10.5 - 6}{1.897} = 2.37 \quad \text{(Area = 0.4911)}

Step 5: Calculate probability

P(9.5 \leq X \leq 10.5) = 0.4911 - 0.4678 = 0.0233

Accuracy Comparison

Binomial (exact): 0.0245 (2.45%)
Normal (approx): 0.0233 (2.33%)
Difference: 0.0012 (0.12 percentage points)

Error: Only 4.9% relative error - excellent approximation!

When approximation improves:
- Larger n (n > 30)
- π closer to 0.50
- Example: n = 100, π = 0.40 → error typically < 1%

When to stick with binomial:
- Small n (n < 10)
- Extreme π (π < 0.10 or π > 0.90)
- Software available (use exact calculation)
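The sketch below compares the exact binomial probability with the continuity-corrected normal approximation for the strike vote. The helper names are illustrative, and the rule-of-thumb check simply mirrors the requirements listed earlier. Without table rounding, the approximation is about 0.0237, a relative error of roughly 3%. The 0.0233 above comes from rounding Z₁ = 1.845 up to 1.85.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(x, n, pi):
    """Exact P(X = x) for a binomial distribution."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def normal_approx_point(x, n, pi):
    """Approximate P(X = x) by P(x - 0.5 <= X <= x + 0.5) under a normal curve."""
    if n * pi < 5 or n * (1 - pi) < 5:
        raise ValueError("rule of thumb n*pi >= 5 and n*(1 - pi) >= 5 not met")
    approx = NormalDist(mu=n * pi, sigma=sqrt(n * pi * (1 - pi)))
    return approx.cdf(x + 0.5) - approx.cdf(x - 0.5)  # continuity correction

exact = binomial_pmf(10, 15, 0.40)
approx = normal_approx_point(10, 15, 0.40)
print(f"exact {exact:.4f}, approx {approx:.4f}, relative error {abs(approx - exact) / exact:.1%}")
```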

7.14 Solved Problems

7.14.1 Problem 1: Medical Device Reliability

A medical device manufacturer produces pacemakers with a 0.3% defect rate (π = 0.003).

A hospital orders n = 500 units.

Questions:
a) What’s the probability at most 2 are defective?
b) What’s the expected number of defectives?
c) Should the hospital inspect all units before implantation?

Solution:

a) P(X ≤ 2): Use the Poisson approximation to the binomial, since n = 500 is large and π = 0.003 is small (rare events)

\mu = n\pi = 500(0.003) = 1.5 \text{ defectives}

From Poisson Table (μ = 1.5):
- P(X = 0) = 0.2231
- P(X = 1) = 0.3347
- P(X = 2) = 0.2510

P(X \leq 2) = 0.2231 + 0.3347 + 0.2510 = 0.8088

80.88% probability of 2 or fewer defects

b) Expected defectives:

E(X) = \mu = 1.5 \text{ units}

c) Risk assessment:

P(X ≥ 1) = 1 - P(X = 0) = 1 - 0.2231 = 0.7769 (77.7% chance of at least 1 defect)

Recommendation: ✅ YES, inspect all units
- Pacemaker failure = life-threatening
- 77.7% chance of defect in order is unacceptable risk
- Cost of inspection << cost of patient death/lawsuit
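As a cross-check on part (a), the sketch below compares the Poisson approximation with the exact binomial calculation for n = 500 and π = 0.003. It uses only Python's standard library, and the function names are illustrative. The two cumulative probabilities should agree to within a few ten-thousandths. That agreement is why the Poisson shortcut is acceptable when n is large and π is small.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability P(X = x) = mu^x e^(-mu) / x!"""
    return mu**x * math.exp(-mu) / math.factorial(x)


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Exact binomial probability P(X = x)."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


n, pi = 500, 0.003
mu = n * pi   # expected number of defective pacemakers, 1.5

poisson_at_most_2 = sum(poisson_pmf(x, mu) for x in range(3))
binomial_at_most_2 = sum(binomial_pmf(x, n, pi) for x in range(3))
poisson_at_least_1 = 1 - poisson_pmf(0, mu)

print(f"Poisson  P(X <= 2) = {poisson_at_most_2:.4f}")   # 0.8088
print(f"Binomial P(X <= 2) = {binomial_at_most_2:.4f}")
print(f"Poisson  P(X >= 1) = {poisson_at_least_1:.4f}")  # 0.7769
```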

7.14.2 Problem 2: Quality Control - Hypergeometric Application

A shipment contains N = 50 electronic components.
Unknown to the buyer, r = 8 are defective.

A quality inspector randomly selects n = 5 units for testing, without replacement.

Find: The probability of finding exactly 2 defectives in the sample

Solution:

P(X = 2) = \frac{{}_8C_2 \times {}_{42}C_3}{{}_{50}C_5}

Step 1: Calculate combinations

{}_8C_2 = \frac{8!}{2!6!} = \frac{8 \times 7}{2} = 28

{}_{42}C_3 = \frac{42!}{3!39!} = \frac{42 \times 41 \times 40}{6} = 11,480

{}_{50}C_5 = \frac{50!}{5!45!} = 2,118,760

Step 2: Calculate probability

P(X = 2) = \frac{28 \times 11,480}{2,118,760} = \frac{321,440}{2,118,760} = 0.1517

15.17% chance of finding exactly 2 defectives

Expected defectives in sample:

E(X) = n \times \frac{r}{N} = 5 \times \frac{8}{50} = 0.8 \text{ units}

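Quality decision: The buyer does not know r, but finding 2 defectives in a sample of 5 (a 40% sample defect rate) is a strong warning sign. Here it is 2.5 times the 0.8 defectives expected from this shipment's true 16% defect rate → Reject the entire shipment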
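The hypergeometric arithmetic can be checked with Python's `math.comb` (Python 3.8+). This is a minimal sketch. `hypergeom_pmf` is a hypothetical helper whose arguments are named after the chapter's N, r, n, and x. The last lines sum the probabilities over every possible sample count as a sanity check.

```python
import math


def hypergeom_pmf(x: int, N: int, r: int, n: int) -> float:
    """P(X = x) for a sample of n drawn without replacement from N items, r of them 'successes'."""
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)


N, r, n = 50, 8, 5   # shipment size, defectives in shipment, sample size

print(math.comb(8, 2), math.comb(42, 3), math.comb(50, 5))  # 28 11480 2118760
p_two = hypergeom_pmf(2, N, r, n)
print(f"P(X = 2) = {p_two:.4f}")   # 0.1517
print(f"E(X) = {n * r / N}")       # 0.8

# Sanity check: probabilities for x = 0, 1, ..., n should sum to 1
total = sum(hypergeom_pmf(x, N, r, n) for x in range(n + 1))
print(f"Sum of P(X = x) over all x: {total:.6f}")   # 1.000000
```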

7.14.3 Problem 3: Customer Service Call Center

Calls arrive at an average rate of μ = 18 per hour (Poisson).

Management questions:
a) What is the probability of exactly 25 calls in the next hour?
b) What is the probability of 15 to 20 calls (inclusive)?
c) What is the probability that the next call arrives within 2 minutes? (Exponential)

Solution:

a) P(X = 25 | μ = 18):

From Poisson Table: μ = 18, X = 25 → 0.0237 (2.37%)

b) P(15 ≤ X ≤ 20):

From Cumulative Poisson Table:
- P(X ≤ 20) = 0.7307
- P(X ≤ 14) = 0.2081

P(15 \leq X \leq 20) = 0.7307 - 0.2081 = 0.5226

52.26% probability (slightly more than half of all hours)

c) Exponential - Time until next call:

μ = 18 calls/hour = 18 calls/60 min = 0.30 calls/minute

P(X \leq 2 \text{ min}) = 1 - e^{-\mu t} = 1 - e^{-(0.30)(2)} = 1 - e^{-0.6}

= 1 - 0.5488 = 0.4512

45.12% probability next call within 2 minutes

Average wait: E(X) = \frac{1}{\mu} = \frac{1}{0.30} = 3.33 minutes
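Table lookups for μ = 18 are easy to misread, so computing the probabilities directly is a useful cross-check. This minimal sketch assumes only Python's standard `math` module, and the helper names are illustrative. Computed this way, P(X = 25) comes out near 0.0237 and P(15 ≤ X ≤ 20) near 0.5226, from P(X ≤ 20) ≈ 0.7307 and P(X ≤ 14) ≈ 0.2081. These are the values a μ = 18 table row should show. The exponential part should reproduce 0.4512 and the 3.33-minute mean wait.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) = mu^x e^(-mu) / x!"""
    return mu**x * math.exp(-mu) / math.factorial(x)


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k), the value a cumulative Poisson table lists."""
    return sum(poisson_pmf(x, mu) for x in range(k + 1))


mu_hour = 18.0                                    # calls per hour

p_25 = poisson_pmf(25, mu_hour)
p_15_to_20 = poisson_cdf(20, mu_hour) - poisson_cdf(14, mu_hour)

rate_per_min = mu_hour / 60                       # 0.30 calls per minute
p_within_2_min = 1 - math.exp(-rate_per_min * 2)  # exponential CDF at t = 2
mean_wait_min = 1 / rate_per_min

print(f"P(X = 25)        = {p_25:.4f}")            # 0.0237
print(f"P(15 <= X <= 20) = {p_15_to_20:.4f}")      # 0.5226
print(f"P(wait <= 2 min) = {p_within_2_min:.4f}")  # 0.4512
print(f"Mean wait        = {mean_wait_min:.2f} min")  # 3.33
```

The Poisson rate and the exponential rate are the same μ expressed in different time units (per hour versus per minute). Converting the rate before calling `math.exp` is the step that is most often skipped.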

7.14.4 Problem 4: Manufacturing Tolerance Analysis

Steel bolts have normally distributed lengths:
- Mean: μ = 5.00 cm
- Standard deviation: σ = 0.12 cm

Specifications: 4.85 cm - 5.15 cm (anything outside is scrap)

Calculate:
a) Percentage within specs
b) Scrap rate
c) New σ needed to achieve 99% within specs

Solution:

a) P(4.85 ≤ X ≤ 5.15):

Z_1 = \frac{4.85 - 5.00}{0.12} = -1.25 \quad \text{(Area = 0.3944)}

Z_2 = \frac{5.15 - 5.00}{0.12} = 1.25 \quad \text{(Area = 0.3944)}

P(4.85 \leq X \leq 5.15) = 0.3944 + 0.3944 = 0.7888

78.88% within specifications

b) Scrap rate:

\text{Scrap} = 1 - 0.7888 = 0.2112

21.12% scrap rate (unacceptable! Target: < 1%)

c) Find σ for 99% within specs:

Target: P(4.85 ≤ X ≤ 5.15) = 0.99
→ Each tail = 0.005
→ Area from mean to spec limit = 0.495

From Table E: Area 0.4950 → Z = 2.58

2.58 = \frac{5.15 - 5.00}{\sigma}

\sigma = \frac{0.15}{2.58} = 0.0581 \text{ cm}

Recommendation: Reduce variability from σ = 0.12 cm to σ ≈ 0.058 cm (a reduction of about 52%)
→ Requires process improvement: better machines, training, quality control
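A table-free version of this tolerance analysis can use `statistics.NormalDist` from Python's standard library (Python 3.8+), which provides `cdf` and `inv_cdf`. This sketch assumes the same μ, σ, and specification limits as above, and the variable names are illustrative. Because it uses unrounded Z values, expect small differences from the hand calculation:

- Within specification: roughly 0.7887, versus 0.7888 from the rounded table areas.
- Required σ: about 0.0582 cm, versus 0.0581 cm with Z = 2.58.

```python
from statistics import NormalDist

mu, sigma = 5.00, 0.12          # bolt length in cm
lower, upper = 4.85, 5.15       # specification limits in cm

bolts = NormalDist(mu, sigma)
within_spec = bolts.cdf(upper) - bolts.cdf(lower)
scrap_rate = 1 - within_spec

# Part (c): a centered process with 99% in spec leaves 0.5% in each tail,
# so the upper limit must sit at the 99.5th percentile of the distribution.
z_needed = NormalDist().inv_cdf(0.995)      # about 2.576 (Table E: 2.58)
sigma_needed = (upper - mu) / z_needed

print(f"Within spec: {within_spec:.4f}")
print(f"Scrap rate:  {scrap_rate:.4f}")
print(f"Sigma for 99% in spec: {sigma_needed:.4f} cm")
```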

7.15 Formula Reference

7.15.1 Discrete Distribution Fundamentals

Expected Value (Mean): \mu = E(X) = \sum [x_i \cdot P(x_i)]

Variance: \sigma^2 = \sum [(x_i - \mu)^2 \cdot P(x_i)]

Standard Deviation: \sigma = \sqrt{\sigma^2}
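These three formulas translate directly into code. The sketch below assumes the distribution is supplied as a Python dictionary that maps each value x_i to P(x_i). `discrete_summary` is a hypothetical helper, and it checks the two properties every probability distribution must satisfy before computing anything. A fair die serves as the test case.

```python
import math


def discrete_summary(pmf: dict[float, float]) -> tuple[float, float, float]:
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    if any(p < 0 or p > 1 for p in pmf.values()):
        raise ValueError("each probability must lie between 0 and 1")
    if not math.isclose(sum(pmf.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())                 # sum of x_i * P(x_i)
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())    # sum of (x_i - mu)^2 * P(x_i)
    return mu, var, math.sqrt(var)


die = {face: 1 / 6 for face in range(1, 7)}
mu, var, sd = discrete_summary(die)
print(f"mean = {mu:.4f}, variance = {var:.4f}, sd = {sd:.4f}")
# mean = 3.5000, variance = 2.9167, sd = 1.7078
```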

7.15.2 Binomial Distribution

Probability Formula: P(X = x) = {}_nC_x \cdot \pi^x \cdot (1-\pi)^{n-x}

Combinations: {}_nC_x = \frac{n!}{x!(n-x)!}

Mean: \mu = n\pi

Variance: \sigma^2 = n\pi(1-\pi)

Standard Deviation: \sigma = \sqrt{n\pi(1-\pi)}
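As a sketch (Python 3.8+ for `math.comb`), the binomial formula and a cumulative sum fit in a few lines. They can be checked against the table values quoted for the summer-job examples (Examples 5.2 and 5.4, n = 7, π = 0.40). The helper names are illustrative. The last lines confirm that the shortcut formulas nπ and nπ(1 - π) match the general definitions of mean and variance.

```python
import math


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


def binomial_cdf(k: int, n: int, pi: float) -> float:
    """P(X <= k), the quantity tabulated in a cumulative binomial table."""
    return sum(binomial_pmf(x, n, pi) for x in range(k + 1))


n, pi = 7, 0.40
print(f"P(X = 5)  = {binomial_pmf(5, n, pi):.4f}")   # 0.0774
print(f"P(X = 0)  = {binomial_pmf(0, n, pi):.4f}")   # 0.0280
print(f"P(X <= 3) = {binomial_cdf(3, n, pi):.4f}")   # 0.7102

# The shortcut formulas agree with the definition-based mean and variance
mean = sum(x * binomial_pmf(x, n, pi) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, pi) for x in range(n + 1))
print(f"mean = {mean:.2f} (n*pi = {n * pi:.2f})")
print(f"variance = {variance:.2f} (n*pi*(1-pi) = {n * pi * (1 - pi):.2f})")
```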

7.15.3 Hypergeometric Distribution

Probability Formula: P(X = x) = \frac{{}_rC_x \times {}_{N-r}C_{n-x}}{{}_NC_n}

Mean: \mu = n \cdot \frac{r}{N}

7.15.4 Poisson Distribution

Probability Formula: P(X = x) = \frac{\mu^x \cdot e^{-\mu}}{x!}

Mean: \mu = \lambda t \quad \text{(rate × time)}

Variance: \sigma^2 = \mu

Standard Deviation: \sigma = \sqrt{\mu}

7.15.5 Exponential Distribution

Cumulative Probability: P(X \leq t) = 1 - e^{-\mu t}

Mean: E(X) = \frac{1}{\mu}

Variance: \sigma^2 = \frac{1}{\mu^2}

7.15.6 Uniform Distribution

Mean: \mu = \frac{a + b}{2}

Variance: \sigma^2 = \frac{(b-a)^2}{12}

Probability: P(X_1 \leq X \leq X_2) = \frac{X_2 - X_1}{b - a}

Probability Density: f(x) = \frac{1}{b-a}
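For the continuous uniform distribution, every probability is a ratio of lengths. The sketch below applies the formulas to the Dow Chemical bags of Example 5.10 (a = 23.8, b = 26.2 pounds). The helper names are hypothetical. Clipping the interval to [a, b] is an added convenience and an assumption of this sketch, not one of the formulas above. It lets requests that fall partly outside the range still return a valid probability.

```python
def uniform_summary(a: float, b: float) -> tuple[float, float, float]:
    """Return (mean, variance, density height) for a uniform distribution on [a, b]."""
    return (a + b) / 2, (b - a) ** 2 / 12, 1 / (b - a)


def uniform_prob(x1: float, x2: float, a: float, b: float) -> float:
    """P(x1 <= X <= x2) = (x2 - x1) / (b - a), after clipping [x1, x2] to [a, b]."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0.0) / (b - a)


a, b = 23.8, 26.2                       # bag weights in pounds
mean, variance, height = uniform_summary(a, b)
print(f"mean = {mean:.2f} lb, variance = {variance:.2f} lb^2, height = {height:.4f}")

print(f"P(X > 25.5) = {uniform_prob(25.5, b, a, b):.4f}")              # 0.2917
print(f"P(X < 23.0) = {uniform_prob(float('-inf'), 23.0, a, b):.4f}")  # 0.0000
```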

7.15.7 Normal Distribution

Z-Score Transformation: Z = \frac{X - \mu}{\sigma}

Inverse (Finding X from Z): X = \mu + Z\sigma

Normal Approximation to Binomial: \mu = n\pi, \quad \sigma = \sqrt{n\pi(1-\pi)}

Continuity Correction:
- P(X = a) → P(a - 0.5 ≤ X ≤ a + 0.5)
- P(X ≤ a) → P(X ≤ a + 0.5)
- P(X ≥ a) → P(X ≥ a - 0.5)
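The Z transformation and its inverse are one-line functions, and `statistics.NormalDist` (Python 3.8+ standard library) can stand in for Table E lookups. The sketch reuses the ToppsWear heights from Example 5.11 (μ = 67 inches, σ = 2 inches). The helper names and the 90th-percentile question are illustrative additions.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma: standard deviations above (+) or below (-) the mean."""
    return (x - mu) / sigma


def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Inverse transformation X = mu + Z * sigma."""
    return mu + z * sigma


mu, sigma = 67, 2                      # ToppsWear heights, inches
z_paula = z_score(63, mu, sigma)       # Paula Petite, 63 inches
print(z_paula)                         # -2.0
print(x_from_z(z_paula, mu, sigma))    # 63.0

heights = NormalDist(mu, sigma)
print(f"Share shorter than Paula: {heights.cdf(63):.4f}")                # 0.0228
print(f"Height at the 90th percentile: {heights.inv_cdf(0.90):.2f} in")  # 69.56 in
```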

7.15.8 Empirical Rule (Normal Distributions Only)

68.3% within μ ± 1σ
95.5% within μ ± 2σ
99.7% within μ ± 3σ
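The empirical-rule percentages follow from the standard normal CDF. This short sketch uses `statistics.NormalDist` from Python's standard library. Printing two decimals shows the unrounded shares behind the figures above; the two-sigma value, for example, is about 95.45%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
shares = {}
for k in (1, 2, 3):
    # P(mu - k*sigma <= X <= mu + k*sigma) is the same for every normal distribution
    shares[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within mu ± {k} sigma: {shares[k]:.2%}")
# within mu ± 1 sigma: 68.27%
# within mu ± 2 sigma: 95.45%
# within mu ± 3 sigma: 99.73%
```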

7.16 Chapter Summary

This chapter introduced the major probability distributions used in business statistics:

Discrete Distributions:
- Binomial: Fixed number of independent trials with constant probability of success (credit approvals, quality sampling)
- Hypergeometric: Sampling without replacement from finite population (discrimination cases, shipment inspection)
- Poisson: Rare events over time/space (customer arrivals, defects, accidents)

Continuous Distributions:
- Exponential: Time between events (taxi arrivals, equipment failure, service times)
- Uniform: Equally likely outcomes (random number generation, arrival times)
- Normal: The crown jewel - heights, weights, IQ, financial returns, measurement error

Key Skills Mastered:
✓ Calculate probabilities using distribution formulas and tables
✓ Convert normal distributions to standard normal (Z-scores)
✓ Find probabilities from Z-scores and vice versa
✓ Apply continuity correction for normal approximation to binomial
✓ Select appropriate distribution based on business context

Next Chapter: We build on these foundations to explore sampling distributions - the bridge between probability theory and statistical inference!


--- title: "Probability Distributions" subtitle: "Modeling Business Uncertainty with Discrete and Continuous Distributions" --- # Chapter 5: Probability Distributions {#sec-probability-distributions} ```{mermaid} graph TD A[Probability Distributions] --> B[Discrete Distributions] A --> C[Continuous Distributions] B --> B1[Binomial Distribution] B --> B2[Cumulative Binomial Distribution] B --> B3[Hypergeometric Distribution] B --> B4[Poisson Distribution] C --> C1[Exponential Distribution] C --> C2[Uniform Distribution] C --> C3[Normal Distribution] C3 --> C3a[Standard Normal Distribution] C3 --> C3b[Probability Calculation] C3 --> C3c[Determining the Value of X] C3 --> C3d[Approximation to Binomial Distribution] style A fill:#000,stroke:#000,color:#fff style B fill:#000,stroke:#000,color:#fff style C fill:#000,stroke:#000,color:#fff ``` ::: {.callout-note icon="🎯"} ## Learning Objectives By the end of this chapter, you will be able to: - **Calculate expected values and variances** for discrete probability distributions - **Apply the binomial distribution** to yes/no business scenarios - **Use the hypergeometric distribution** for sampling without replacement - **Model rare events** with the Poisson distribution - **Analyze time-between-events** using the exponential distribution - **Work with uniform distributions** for equally likely outcomes - **Master the normal distribution** for continuous business data - **Convert problems to standard normal** (Z-scores) for probability calculations - **Approximate binomial** with normal distribution for large samples ::: ## 5.1 Introduction: From Probability Principles to Distributions In Chapter 4, we learned to calculate the probability of **individual events**. Now we extend those principles to **probability distributions** - comprehensive models that describe the likelihood of **all possible outcomes** for a random variable. 
**Real-World Business Applications:** A Bradley University study of Peoria, Illinois emergency services revealed: - **911 call response times**: Uniformly distributed between 1.2 and 4.6 minutes - **Call arrival rate**: Poisson distribution with average of 9 calls per hour - **Home values**: Normally distributed with mean $45,750, SD $15,110 - **Budget impact**: 42,089 homes potentially subject to new property tax The mayor wanted to reduce average response time to 2 minutes at a cost of **$575,000 per 30-second improvement**. This chapter provides the statistical toolkit to analyze such decisions. ::: {.callout-tip icon="📚"} ## Key Concept: Random Variables **Random Variable**: A variable whose value is determined by a random experiment. **Discrete Random Variable**: Can assume only specific values (usually integers) - results from **counting** - Examples: Number of defects, customer arrivals, sales calls, coin flips **Continuous Random Variable**: Can assume any value within a range - results from **measuring** - Examples: Weight, time, temperature, income, response time **Probability Distribution**: A listing of all possible outcomes and their associated probabilities. ::: ### Visualizing Probability Distributions **Discrete Distribution Example** - Rolling a die: | Outcome | 1 | 2 | 3 | 4 | 5 | 6 | |---------|--:|--:|--:|--:|--:|--:| | P(X) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | {.striped .hover} **Properties of ALL probability distributions:** 1. $0 \leq P(X = x_i) \leq 1$ (each probability between 0 and 1) 2. $\sum P(X = x_i) = 1$ (all probabilities sum to 1) ## 5.2 Mean and Variance of Discrete Distributions Just as we calculated mean and variance for data sets in Chapter 3, we can compute them for probability distributions. ::: {.callout-tip icon="📚"} ## Expected Value (Mean) of Discrete Distribution $$\mu = E(X) = \sum [x_i \cdot P(x_i)]$$ **Interpretation**: The long-run average value if we repeat the experiment many times. 
**Variance of Discrete Distribution** $$\sigma^2 = \sum [(x_i - \mu)^2 \cdot P(x_i)]$$ **Standard Deviation** $$\sigma = \sqrt{\sigma^2}$$ ::: ### Example 5.1: Ponder Real Estate Monthly Sales ::: {.callout-note icon="💼" appearance="minimal"} ## Business Scenario Ponder Real Estate tracked monthly home sales over 24 months: | Houses Sold (x) | Number of Months | Houses Sold (x) | Number of Months | |----------------:|------------------:|----------------:|------------------:| | 5 | 3 | 12 | 5 | | 8 | 7 | 17 | 3 | | 10 | 4 | 20 | 2 | Mr. Ponder previously averaged 7.3 sales/month with $\sigma = 5.7$ sales. He'll **quit** and become a rodeo clown if the new data doesn't show improvement (higher mean, lower variability). **Should Mr. Ponder stick with real estate?** ::: **Solution - Step 1: Convert to Probability Distribution** | Houses (x) | Months | P(x) | x · P(x) | (x - μ)² · P(x) | |-----------:|-------:|-----:|---------:|----------------:| | 5 | 3 | 3/24 = 0.125 | 0.625 | (5-10.912)² (0.125) = 4.369 | | 8 | 7 | 7/24 = 0.292 | 2.336 | (8-10.912)² (0.292) = 2.476 | | 10 | 4 | 4/24 = 0.167 | 1.670 | (10-10.912)² (0.167) = 0.139 | | 12 | 5 | 5/24 = 0.208 | 2.496 | (12-10.912)² (0.208) = 0.246 | | 17 | 3 | 3/24 = 0.125 | 2.125 | (17-10.912)² (0.125) = 4.633 | | 20 | 2 | 2/24 = 0.083 | 1.660 | (20-10.912)² (0.083) = 6.855 | | **Total** | 24 | 1.000 | **10.912** | **18.718** | {.striped .hover} **Step 2: Calculate Statistics** $$\mu = E(X) = 10.912 \text{ houses/month}$$ $$\sigma^2 = 18.718 \text{ houses}^2$$ $$\sigma = \sqrt{18.718} = 4.326 \text{ houses}$$ **Step 3: Compare to Previous Performance** | Metric | Previous | New | Change | |--------|----------:|----:|-------:| | Mean (μ) | 7.3 | 10.912 | +3.612 ✓ | | SD (σ) | 5.7 | 4.326 | -1.374 ✓ | {.striped .hover} ::: {.callout-important icon="💡"} ## Business Insight **Good news for Mr. 
Ponder!** ✅ **Increased sales**: Average jumped from 7.3 to 10.9 houses/month (+49.5%) ✅ **Reduced variability**: Standard deviation dropped from 5.7 to 4.3 (-24.1%) **Translation**: More consistent, higher performance. He should **stay in real estate** and skip the rodeo career! **Strategic implication**: Whatever changed in the past 24 months (marketing, market conditions, sales process) is working. Document and replicate the success factors. ::: ## 5.3 The Binomial Distribution - Modeling Yes/No Outcomes Many business situations involve **binary outcomes** repeated multiple times: - Will this customer buy? (Yes/No) × 50 sales calls - Is this product defective? (Yes/No) × 100 units inspected - Did the student pass? (Yes/No) × 200 test takers The **binomial distribution** is perfect for these scenarios. ::: {.callout-tip icon="📚"} ## Binomial Distribution Requirements **Four Properties** (Bernoulli Process): 1. **Fixed number of trials** ($n$) 2. **Only two outcomes** per trial (success/failure) 3. **Constant probability** ($\pi$) for each trial 4. **Independent trials** (one doesn't affect others) **Binomial Probability Formula** $$P(X = x) = {}_nC_x \cdot \pi^x \cdot (1-\pi)^{n-x}$$ Where: - $n$ = number of trials - $x$ = number of successes desired - $\pi$ = probability of success on single trial - ${}_nC_x = \frac{n!}{x!(n-x)!}$ = combinations **Mean and Variance (Shortcuts)** $$\mu = n\pi$$ $$\sigma^2 = n\pi(1-\pi)$$ $$\sigma = \sqrt{n\pi(1-\pi)}$$ ::: ### Example 5.2: Journal of Higher Education - Summer Jobs ::: {.callout-note icon="💼" appearance="minimal"} ## College Student Employment The Journal of Higher Education reports **40% of high school graduates** work summer jobs to earn college tuition. 
If we randomly select **7 graduates**, find the probability that: a) Exactly 5 have summer jobs b) None have summer jobs c) All 7 have summer jobs ::: **Solution:** **Given:** $n = 7$ trials, $\pi = 0.40$ probability of success **a) Exactly 5 have jobs:** $P(X = 5)$ $$P(X=5) = {}_7C_5 \cdot (0.40)^5 \cdot (0.60)^2$$ $$= \frac{7!}{5!2!} \cdot (0.01024) \cdot (0.36)$$ $$= 21 \cdot 0.01024 \cdot 0.36 = 0.0774$$ **From Binomial Table (Appendix III, Table B):** Look up $n=7$, $\pi=0.40$, $x=5$ → **0.0774** **b) None have jobs:** $P(X = 0)$ $$P(X=0) = {}_7C_0 \cdot (0.40)^0 \cdot (0.60)^7 = 1 \cdot 1 \cdot 0.0280 = 0.0280$$ **From table:** $n=7$, $\pi=0.40$, $x=0$ → **0.0280** **c) All 7 have jobs:** $P(X = 7)$ $$P(X=7) = {}_7C_7 \cdot (0.40)^7 \cdot (0.60)^0 = 1 \cdot 0.0016 \cdot 1 = 0.0016$$ **From table:** $n=7$, $\pi=0.40$, $x=7$ → **0.0016** ::: {.callout-important icon="💡"} ## Interpretation - **7.74% chance** exactly 5 of 7 work (moderately likely) - **2.80% chance** none work (rare - less than 3%) - **0.16% chance** all 7 work (very rare - less than 2 in 1000) **Most likely outcome:** $E(X) = n\pi = 7(0.40) = 2.8 \approx 3$ students working The **extremes** (0 or 7) are both unlikely. We'd typically see 2-4 students with summer jobs in a sample of 7. ::: ### Handling π > 0.50: The Complement Trick **Problem:** Binomial tables only go up to $\pi = 0.50$. What if $\pi = 0.70$? **Solution:** Use the **complement**! ### Example 5.3: Internet Connectivity in Flatbush ::: {.callout-note icon="💼" appearance="minimal"} ## Scenario 70% of Flatbush residents have internet connectivity. Of 10 randomly selected residents, what's the probability **exactly 6** are connected? **Challenge:** $\pi = 0.70 > 0.50$ (not in table) ::: **Solution - The Complement Trick:** **Key insight:** If 70% are **connected** (success), then 30% are **not connected** (failure). 
**Reframe:** 6 successes at $\pi = 0.70$ **equals** 4 failures at $\pi = 0.30$ **Visual proof:** | 0 | 1 | 2 | 3 | 4 | 5 | **6** | 7 | 8 | 9 | 10 | (π=0.70) | |---|---|---|---|---|---|-------|---|---|---|----|----| | 10 | 9 | 8 | 7 | 6 | 5 | **4** | 3 | 2 | 1 | 0 | (π=0.30) | Therefore: $$P(X = 6 | n=10, \pi=0.70) = P(X = 4 | n=10, \pi=0.30)$$ **From Binomial Table:** $n=10$, $\pi=0.30$, $x=4$ → **0.2001** ::: {.callout-tip icon="💡"} **Rule:** When $\pi > 0.50$, find $P(X = x)$ by looking up $P(X = n-x)$ with $\pi' = 1 - \pi$ ::: ## 5.4 Cumulative Binomial Distributions Often we need probability of a **range** rather than exact value: - "At most 3 defects" → $P(X \leq 3)$ - "At least 5 sales" → $P(X \geq 5)$ - "Between 2 and 4 complaints" → $P(2 \leq X \leq 4)$ ### Using Cumulative Binomial Tables (Table C) **Table C provides:** $P(X \leq k) = P(X=0) + P(X=1) + ... + P(X=k)$ ### Example 5.4: Student Employment (Continued) ::: {.callout-note icon="💼" appearance="minimal"} Using the summer job data ($n=7$, $\pi=0.40$), find probability that: a) **3 or fewer** have jobs b) **At least 5** have jobs c) **Between 3 and 5** (inclusive) have jobs ::: **Solution:** **a) 3 or fewer:** $P(X \leq 3)$ **From Cumulative Table C:** $n=7$, $\pi=0.40$, $x=3$ → **0.7102** **b) At least 5:** $P(X \geq 5)$ Tables give $P(X \leq k)$, not $P(X \geq k)$. 
Use complement: $$P(X \geq 5) = 1 - P(X \leq 4)$$ **From Table C:** $n=7$, $\pi=0.40$, $x=4$ → 0.9037 $$P(X \geq 5) = 1 - 0.9037 = 0.0963$$ **c) Between 3 and 5 (inclusive):** $P(3 \leq X \leq 5)$ **Strategy:** $P(3 \leq X \leq 5) = P(X \leq 5) - P(X \leq 2)$ **From Table C:** - $P(X \leq 5) = 0.9812$ - $P(X \leq 2) = 0.4199$ $$P(3 \leq X \leq 5) = 0.9812 - 0.4199 = 0.5613$$ ::: {.callout-important icon="💡"} ## Summary - **71.02%** probability 3 or fewer work (high - most samples will be in this range) - **9.63%** probability at least 5 work (low - only 1 in 10 samples) - **56.13%** probability between 3-5 work (moderate - slightly better than coin flip) **Central tendency:** The distribution clusters around expected value $\mu = 2.8$, making 3-5 the most probable range. ::: ## 5.5 The Hypergeometric Distribution The binomial distribution requires **constant probability** across trials. But what if we sample **without replacement** from a **small population**? **Example:** Drawing cards from a deck. First draw: P(Ace) = 4/52. Second draw: P(Ace) = 3/51 (if first was Ace) or 4/51 (if not). Probability **changed** → Binomial doesn't apply → Use **Hypergeometric**! ::: {.callout-tip icon="📚"} ## Hypergeometric Distribution Formula $$P(X = x) = \frac{{}_rC_x \times {}_{N-r}C_{n-x}}{{}_NC_n}$$ Where: - $N$ = population size - $r$ = number of successes in population - $n$ = sample size - $x$ = number of successes desired in sample **When to Use:** - Sampling **without replacement** - From **finite population** - Sample size is **large relative to population** (typically $n/N > 0.05$) ::: ### Example 5.5: Racehorse Contagious Disease ::: {.callout-note icon="💼" appearance="minimal"} ## Veterinary Scenario A stable contains **N = 10 racehorses**. **r = 4 horses** have a contagious disease. A veterinarian randomly selects **n = 3 horses** for testing. What's the probability **exactly 2** of the 3 tested horses are sick? 
::: **Solution:** $$P(X = 2) = \frac{{}_4C_2 \times {}_{10-4}C_{3-2}}{{}_10C_3}$$ **Step 1: Calculate combinations** $${}_4C_2 = \frac{4!}{2!2!} = \frac{24}{4} = 6 \text{ (ways to select 2 sick from 4)}$$ $${}_{6}C_1 = \frac{6!}{1!5!} = 6 \text{ (ways to select 1 healthy from 6)}$$ $${}_10C_3 = \frac{10!}{3!7!} = \frac{720}{6} = 120 \text{ (total ways to select 3 from 10)}$$ **Step 2: Calculate probability** $$P(X = 2) = \frac{6 \times 6}{120} = \frac{36}{120} = 0.30$$ ::: {.callout-important icon="💡"} ## Veterinary Insight **30% chance** of finding exactly 2 sick horses in the sample of 3. **Expected value:** $E(X) = n \times \frac{r}{N} = 3 \times \frac{4}{10} = 1.2$ sick horses Finding 2 sick horses is **above average** but not rare. The vet should: - Test the remaining 7 horses immediately - Quarantine the stable - Begin treatment protocol for confirmed cases ::: ### Example 5.6: Employment Discrimination Case - Johnson District ::: {.callout-note icon="�16" appearance="minimal"} ## Legal Application of Hypergeometric Distribution **Background:** Three women sued a Kansas City utility company for gender discrimination. **Facts:** - **N = 9** people eligible for promotion - **r = 4** were women - **n = 3** people actually promoted - **x = 1** woman was promoted **Legal question:** If gender played no role (random selection), what's the probability **at most 1 woman** would be promoted? If this probability is high, it suggests random chance could explain the outcome (no discrimination). 
::: **Solution:** We need: $P(X \leq 1) = P(X=0) + P(X=1)$ **P(X = 1): Exactly 1 woman promoted** $$P(X=1) = \frac{{}_4C_1 \times {}_5C_2}{{}_9C_3} = \frac{4 \times 10}{84} = \frac{40}{84} = 0.4762$$ **P(X = 0): No women promoted** $$P(X=0) = \frac{{}_4C_0 \times {}_5C_3}{{}_9C_3} = \frac{1 \times 10}{84} = \frac{10}{84} = 0.1190$$ **Total probability:** $$P(X \leq 1) = 0.4762 + 0.1190 = 0.5952$$ ::: {.callout-important icon="💡"} ## Legal Interpretation **59.52% chance** (nearly 60%) that at most 1 woman would be promoted **purely by random selection**. **Court's conclusion:** This is **not** unusually low. Random chance could easily produce this outcome without discrimination. **Statistical standard:** Courts typically look for probabilities **< 5%** before inferring discrimination. Here, 59.52% is far above that threshold. **Verdict:** Insufficient statistical evidence of gender bias. Case dismissed. **Note:** This doesn't **prove** no discrimination occurred - only that the statistical evidence is weak. ::: ## 5.6 The Poisson Distribution - Modeling Rare Events The **Poisson distribution** models the number of **rare events** occurring in a fixed interval (time, space, area, volume): - Customer arrivals per hour - Defects per 100 units - Website crashes per month - Typos per page - Accidents per quarter ::: {.callout-tip icon="📚"} ## Poisson Distribution Formula $$P(X = x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$$ Where: - $\mu$ = average number of occurrences in the interval - $x$ = specific number of occurrences - $e = 2.71828$ (natural logarithm base) **Requirements:** 1. Events are **rare** (low probability) 2. Events are **independent** 3. 
Average rate ($\mu$) is **constant** **Mean and Variance (Same value!)** $$E(X) = \mu$$ $$\text{Var}(X) = \mu$$ $$\sigma = \sqrt{\mu}$$ ::: ### Example 5.7: University Tutorial Office ::: {.callout-note icon="💼" appearance="minimal"} ## Student Arrivals Students arrive at the statistics tutorial office at an average rate of **μ = 5.2 students per hour** (Poisson distribution). Calculate the probability that in any given hour: a) **Exactly 4** students arrive b) **No students** arrive c) **Exactly 8** students arrive ::: **Solution:** **a) Exactly 4 students:** $P(X = 4)$ $$P(X=4) = \frac{(5.2)^4 \cdot e^{-5.2}}{4!} = \frac{731.16 \cdot 0.00552}{24} = \frac{4.036}{24} = 0.1681$$ **From Poisson Table (Table D):** $\mu = 5.2$, $x = 4$ → **0.1681** **b) No students:** $P(X = 0)$ $$P(X=0) = \frac{(5.2)^0 \cdot e^{-5.2}}{0!} = \frac{1 \cdot 0.00552}{1} = 0.0055$$ **From table:** $\mu = 5.2$, $x = 0$ → **0.0055** **c) Exactly 8 students:** $P(X = 8)$ **From table:** $\mu = 5.2$, $x = 8$ → **0.0731** ::: {.callout-important icon="💡"} ## Tutorial Office Staffing Insights - **16.81%** chance of exactly 4 students (common outcome) - **0.55%** chance of zero students (very rare - empty office is unusual) - **7.31%** chance of 8 students (occasional surge) **Operational planning:** - **Expect 5-6 students** most hours (around the mean) - **Staff for peak:** Plan capacity for 8-10 students to handle surges (95th percentile ≈ 9 students) - **Quiet hours rare:** Empty office only ~1 in 200 hours **Expected variance:** $\sigma = \sqrt{5.2} = 2.28$ students - moderate variability ::: ### Adjusting μ for Different Time Intervals **Critical skill:** The mean must match the time period in the problem! ### Example 5.8: Defect Rate Adjustment ::: {.callout-note icon="💼" appearance="minimal"} A manufacturing process averages **μ = 4 defects per 100 units**. 
Find the probability of: a) Exactly 10 defects in **200 units** b) No defects in **50 units** ::: **Solution:** **a) 200 units:** Adjust μ $$\mu_{200} = 4 \times \frac{200}{100} = 8 \text{ defects}$$ $$P(X=10 | \mu=8) = 0.0993 \text{ (from table)}$$ **b) 50 units:** Adjust μ $$\mu_{50} = 4 \times \frac{50}{100} = 2 \text{ defects}$$ $$P(X=0 | \mu=2) = 0.1353 \text{ (from table)}$$ ## 5.7 The Exponential Distribution - Time Between Events While **Poisson** models **how many** events occur, **Exponential** models **how long** between events. If arrivals follow Poisson → time between arrivals follows Exponential. ::: {.callout-tip icon="📚"} ## Exponential Distribution Formula $$P(X \leq t) = 1 - e^{-\mu t}$$ Where: - $t$ = time period of interest - $\mu$ = average rate of occurrence - $e = 2.71828$ **Relationship to Poisson:** - Poisson: "3 customers per hour" → How many? - Exponential: "Time until next customer" → How long? **Mean and Variance** $$E(X) = \frac{1}{\mu}$$ $$\sigma^2 = \frac{1}{\mu^2}$$ ::: ### Example 5.9: Cross City Cab Company ::: {.callout-note icon="�16" appearance="minimal"} ## Airport Taxi Service Taxis arrive at the local airport following a **Poisson distribution** with **μ = 12 taxis per hour**. You just landed and need to reach downtown for an important business deal. Your boss won't tolerate failure. **Decision rule:** If probability of taxi within 5 minutes < 50%, rent a car instead. **Should you wait for a taxi or rent a car?** ::: **Solution:** **Time conversion:** μ = 12 per 60 minutes. What fraction is 5 minutes? 
$$t = \frac{5}{60} = \frac{1}{12}$$ **Probability taxi arrives within 5 minutes:** $$P(X \leq 5 \text{ min}) = 1 - e^{-\mu t} = 1 - e^{-(12)(1/12)} = 1 - e^{-1}$$ **From Table D** (Poisson table where x=0 gives $e^{-\mu}$): $e^{-1} = 0.3679$ $$P(X \leq 5) = 1 - 0.3679 = 0.6321$$ ::: {.callout-important icon="💡"} ## Business Decision ✅ **Wait for the taxi!** **63.21% probability** a taxi arrives within 5 minutes **(> 50% threshold)**. **Additional insights:** - $P(X \leq 10 \text{ min}) = 1 - e^{-2} = 1 - 0.1353 = 0.8647$ (86.5% within 10 min) - $P(5 < X \leq 10) = 0.8647 - 0.6321 = 0.2326$ (23.3% arrive between 5-10 min) **Expected wait time:** $E(X) = \frac{1}{\mu} = \frac{1}{12}$ hour = **5 minutes** With 12 taxis/hour on average, you'll likely wait around 5 minutes - acceptable for most business travelers. ::: ## 5.8 The Uniform Distribution - Equally Likely Outcomes The **uniform distribution** applies when all outcomes in a range are **equally likely**. Think of it as a "flat" distribution - constant probability density across the range. ::: {.callout-tip icon="📚"} ## Uniform Distribution Formulas **Continuous Uniform Distribution on [a, b]** **Mean:** $$\mu = \frac{a + b}{2}$$ **Variance:** $$\sigma^2 = \frac{(b-a)^2}{12}$$ **Height (probability density):** $$\text{Height} = \frac{1}{b-a}$$ **Probability X falls between $X_1$ and $X_2$:** $$P(X_1 \leq X \leq X_2) = \frac{X_2 - X_1}{b - a}$$ ::: ### Example 5.10: Dow Chemical Fertilizer Bags ::: {.callout-note icon="�16" appearance="minimal"} ## Uniform Weight Distribution Dow Chemical produces lawn fertilizer in bags with **uniformly distributed** weight: - **Mean:** μ = 25 pounds - **Range:** 2.4 pounds **Harry Homeowner's concerns:** a) He needs **at least 23 pounds**. Will any bag suffice? b) What's the probability a bag weighs **more than 25.5 pounds**? 
::: **Solution:** **Step 1: Determine a and b** Range = 2.4 pounds spreads evenly around mean of 25: $$a = \mu - \frac{\text{Range}}{2} = 25 - \frac{2.4}{2} = 25 - 1.2 = 23.8 \text{ lbs}$$ $$b = \mu + \frac{\text{Range}}{2} = 25 + 1.2 = 26.2 \text{ lbs}$$ **a) Will Harry get at least 23 pounds?** **Minimum bag weight** = 23.8 pounds ✅ **YES** - Even the lightest bag (23.8 lbs) exceeds Harry's 23 lb requirement! **b) Probability of bag > 25.5 pounds:** $$P(X > 25.5) = P(25.5 < X \leq 26.2) = \frac{26.2 - 25.5}{26.2 - 23.8}$$ $$= \frac{0.7}{2.4} = 0.2917$$ ::: {.callout-important icon="💡"} ## Consumer Insight **✅ Harry has no worries!** - **Guaranteed** at least 23 pounds (minimum is 23.8) - **29.17% chance** of getting bonus weight (> 25.5 lbs) - **Average bag:** 25 pounds (exactly what's advertised) **Quality perspective:** - **Consistent product:** Uniform distribution means predictable range - **No short-weighting:** Minimum 23.8 lbs protects consumers - **Bonus potential:** Nearly 1 in 3 bags exceed 25.5 lbs **Compared to normal distribution:** Uniform has **no extreme values** - all bags within narrow 2.4 lb range. ::: ## 5.9 The Normal Distribution - The Crown Jewel of Statistics Of all probability distributions, the **normal distribution** is the most important in statistics. We introduced its bell-shaped, symmetric form in Chapter 3 with the empirical rule. Now we'll unlock its full analytical power. **Key characteristics:** - **Continuous distribution** (not discrete) - **Infinite range** (theoretically from -∞ to +∞) - Used for measured variables: heights, weights, temperatures, IQ scores, financial returns - **Completely defined** by two parameters: mean (μ) and standard deviation (σ) ### Real-World Case: ToppsWear Clothing Manufacturer ToppsWear recognized the public was constantly changing in physical size and proportions. To produce better-fitting clothing, management commissioned a comprehensive study of customer body measurements. 
**Findings:** Customer heights are **normally distributed** with: - **Mean:** μ = 67 inches - **Standard deviation:** σ = 2 inches This normal distribution allows ToppsWear to: 1. Predict what percentage of customers fall into each size range 2. Optimize inventory allocation across sizes 3. Minimize both stockouts and excess inventory ::: {.callout-tip icon="📚"} ## Properties of the Normal Distribution **Symmetry:** - **50%** of observations above the mean - **50%** of observations below the mean - Mirror image around μ **Area Under the Curve:** - Total area = 1.00 (100%) - Area = Probability - 50% of area to right of μ, 50% to left **Empirical Rule (68-95-99.7):** - **68.3%** of observations within μ ± 1σ - **95.5%** of observations within μ ± 2σ - **99.7%** of observations within μ ± 3σ ::: ### Comparing Different Normal Distributions **Three scenarios for ToppsWear:** **Distribution I:** μ = 67 inches, σ = 2 inches (adult population) **Distribution II:** μ = 79 inches, σ = 2 inches (basketball players) **Distribution III:** μ = 67 inches, σ = 4 inches (more diverse population) **Key insights:** - **Different means** → Shift left/right (Distributions I vs II) - **Different standard deviations** → Change width/flatness (Distributions I vs III) - **Same area percentages** apply regardless of μ or σ (Empirical Rule) **Distribution III** is flatter and more spread out because σ = 4 (double the variability of Distribution I). **Empirical Rule application:** - **Distribution I:** 68.3% of heights between 65-69 inches (μ ± 1σ) - **Distribution III:** 68.3% of heights between 63-71 inches (wider range due to larger σ) ## 5.10 The Standard Normal Distribution (Z-Scores) Since there are infinite possible normal distributions (each with different μ and σ), statisticians created a **standard form** for all calculations. 
::: {.callout-tip icon="📚"}
## Z-Score Transformation Formula

$$Z = \frac{X - \mu}{\sigma}$$

Where:

- **Z** = standard normal deviate (Z-score)
- **X** = original value
- **μ** = population mean
- **σ** = population standard deviation

**Standard Normal Distribution:**

- **Mean = 0**
- **Standard deviation = 1**
- **Symmetric around Z = 0**

**Interpretation of Z:** "The number of standard deviations an observation is above (+) or below (-) the mean."

:::

### Example 5.11: ToppsWear Customer Heights - Z-Score Conversions

::: {.callout-note icon="💼" appearance="minimal"}
## Three Customers

**Tom Typical:** 67 inches tall

**Paula Petite:** 63 inches tall

**Steve Stretch:** 70 inches tall

Convert each height to a Z-score given μ = 67 inches, σ = 2 inches.

:::

**Solution:**

**Tom Typical: X = 67**

$$Z = \frac{67 - 67}{2} = \frac{0}{2} = 0$$

Tom is **exactly average** → Z = 0 (at the mean)

**Paula Petite: X = 63**

$$Z = \frac{63 - 67}{2} = \frac{-4}{2} = -2.00$$

Paula is **2 standard deviations below average** → Z = -2.00

**Steve Stretch: X = 70**

$$Z = \frac{70 - 67}{2} = \frac{3}{2} = 1.50$$

Steve is **1.5 standard deviations above average** → Z = 1.50

::: {.callout-important icon="💡"}
## Z-Score Interpretation Guide

- **Z = 0:** Exactly at the mean (perfectly average)
- **Z = +1:** One standard deviation above average (taller/larger)
- **Z = -1:** One standard deviation below average (shorter/smaller)
- **Z > +3:** Extremely high (top 0.13%)
- **Z < -3:** Extremely low (bottom 0.13%)

**ToppsWear application:**

- Paula (Z = -2) is **shorter than 97.7%** of customers → Small/Petite sizes
- Tom (Z = 0) is the **median customer** → Medium/Regular sizes
- Steve (Z = 1.5) is **taller than 93.3%** of customers → Large/Tall sizes

:::

## 5.11 Calculating Probabilities with the Normal Distribution

The beauty of standardization: **if you know the area, you know the probability!**

**Think of it like a dartboard:**

- 2/3 of the target is painted green
- 1/3 of the target is painted red
- Every point on the target is equally likely to be hit
- P(hitting green) = 2/3 **because** 2/3 of the area is green

**Same logic for normal curves:** Area under the curve = Probability

### Using the Standard Normal Table (Table E)

**Table E provides:** the area from the **mean (Z = 0)** to any Z-value

### Example 5.12: TelCom Satellite Transmission Times

::: {.callout-note icon="💼" appearance="minimal"}
## Business Communication Service

TelCom Satellite provides communication services to Chicago businesses.

**Data:**

- **Mean transmission time:** μ = 150 seconds
- **Standard deviation:** σ = 15 seconds
- **Distribution:** Normal

The service director needs probability estimates for:

a) Transmission between **125 and 150 seconds**
b) Transmission **less than 125 seconds**
c) Transmission between **145 and 155 seconds**
d) Transmission between **160 and 165 seconds**

:::

**Solution:**

**a) P(125 ≤ X ≤ 150):**

$$Z = \frac{125 - 150}{15} = \frac{-25}{15} = -1.67$$

**From Table E:** Z = 1.67 → Area = **0.4525** (by symmetry, the area from Z = -1.67 up to the mean equals the area from the mean up to Z = 1.67)

::: {.callout-note icon="📊"}
**Visual:** 45.25% of all transmissions fall between 125 and 150 seconds
:::

**b) P(X < 125):**

**Strategy:**

- Total area below the mean = 0.5000
- Area between 125 and 150 = 0.4525

$$P(X < 125) = 0.5000 - 0.4525 = 0.0475$$

**Only 4.75%** of transmissions are shorter than 125 seconds (rare!)
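Statistical software replaces the Table E lookup with the exact standard normal cumulative area. The sketch below uses only Python's standard library (`math.erf`); the function names are illustrative. It repeats parts (a) and (b) without rounding Z to two decimals, so it gives roughly 0.4522 and 0.0478 rather than the table's 0.4525 and 0.0475.

```python
import math

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def std_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MU, SIGMA = 150.0, 15.0          # TelCom transmission times, in seconds

z = z_score(125, MU, SIGMA)      # about -1.67, as in part (a)

# (a) P(125 <= X <= 150): area between z and the mean (Z = 0)
p_a = std_normal_cdf(0.0) - std_normal_cdf(z)

# (b) P(X < 125): the left tail, i.e. 0.5 minus the area in (a)
p_b = std_normal_cdf(z)

print(f"Z = {z:.2f}, (a) = {p_a:.4f}, (b) = {p_b:.4f}")
```

The same `z_score` helper reproduces the ToppsWear conversions in Example 5.11: `z_score(63, 67, 2)` returns -2.0 for Paula Petite and `z_score(70, 67, 2)` returns 1.5 for Steve Stretch.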
**c) P(145 ≤ X ≤ 155):**

**Step 1:** Find the area from 145 to 150

$$Z = \frac{145 - 150}{15} = -0.33$$

**From Table E:** Area = 0.1293

**Step 2:** By symmetry, the area from 150 to 155 = 0.1293

$$P(145 \leq X \leq 155) = 0.1293 + 0.1293 = 0.2586$$

**25.86%** of transmissions fall in this narrow 10-second window

**d) P(160 ≤ X ≤ 165):**

**Step 1:** Area from 150 to 165

$$Z = \frac{165 - 150}{15} = 1.00$$

**From Table E:** Area = 0.3413

**Step 2:** Area from 150 to 160

$$Z = \frac{160 - 150}{15} = 0.67$$

**From Table E:** Area = 0.2486

**Step 3:** Subtract

$$P(160 \leq X \leq 165) = 0.3413 - 0.2486 = 0.0927$$

::: {.callout-important icon="💡"}
## Strategic Business Implications for TelCom

**Service demand profile:**

- **Most transmissions (about 95%)** last between 120 and 180 seconds (μ ± 2σ)
- **Very short transmissions (<125 sec)** are rare at 4.75% → Premium pricing opportunity
- **Mid-range transmissions (145-155 sec)** are common at 25.86% → Standard pricing tier
- **Longer transmissions (160-165 sec)** are moderate at 9.27% → Volume discount potential

**Capacity planning:**

- Plan for a peak at 180 seconds (μ + 2σ) to serve about 97.7% of transmissions
- Reserve overflow capacity for extreme cases (>180 sec)

**Revenue optimization:**

- Tiered pricing: <120s (premium), 120-180s (standard), >180s (discounted)
- Bundle packages targeting 145-155 second average usage (26% of market)

:::

## 5.12 Finding X-Values from Known Probabilities (Inverse Normal)

Sometimes we know the **desired probability** and must find the **corresponding X-value**.

**Business applications:**

- "What score puts a student in the top 10%?" (scholarships)
- "What income level defines the poorest 15%?" (welfare programs)
- "What response time separates the best 10% from the worst 10%?" (performance evaluation)

### Example 5.13: Presidential Economic Policy - Welfare Threshold

::: {.callout-note icon="💼" appearance="minimal"}
## Income Distribution Analysis

Presidential economic advisors propose a welfare program for the **poorest 15%** of the nation.
**Question:** What income level separates the bottom 15% from the rest?

**Data (1996 dollars):**

- **Mean income:** μ = $13,812
- **Standard deviation:** σ = $3,550
- **Distribution:** Normal

:::

**Solution - The Inverse Process:**

**Step 1: Visualize the problem**

- We know the **area** = 0.15 (left tail)
- We need to find **X** = the income threshold

**Step 2: Convert to a table lookup area**

Table E shows the area from the **mean to Z**, not the tail area.

$$\text{Area from mean to Z} = 0.5000 - 0.1500 = 0.3500$$

**Step 3: Find Z from Table E**

Look **inside** the table body for the area closest to 0.3500 → 0.3508 at **Z = 1.04**

**Step 4: Assign the correct sign**

We're working in the **left tail** (below the mean) → **Z = -1.04**

**Step 5: Solve for X**

$$Z = \frac{X - \mu}{\sigma}$$

$$-1.04 = \frac{X - 13{,}812}{3{,}550}$$

$$X = 13{,}812 + (-1.04)(3{,}550)$$

$$X = 13{,}812 - 3{,}692 = \$10{,}120$$

::: {.callout-important icon="💡"}
## Policy Implications

**Income threshold:** Anyone earning **$10,120 or less** receives government assistance (bottom 15%).

**Budget impact:**

- U.S. population ≈ 265 million (1996)
- 15% = **39.75 million people** qualify
- At $5,000/person/year → **$198.75 billion annual cost**

**Political considerations:**

- The threshold creates a "welfare cliff" at $10,121
- It may discourage earning just above the cutoff
- Recommend a graduated phase-out from $10,120 to $15,000

:::

### Example 5.14: Fire Department Response Times - Performance Benchmarking

::: {.callout-note icon="💼" appearance="minimal"}
## Urban Fire Prevention Initiative

A state commission identifies:

- **Top 10%** fastest fire departments (models)
- **Bottom 10%** slowest departments (needing improvement)

**Data:**

- **Mean response time:** μ = 12.8 minutes
- **Standard deviation:** σ = 3.7 minutes
- **Distribution:** Normal

**Find:** The two response times that separate the top 10%, the middle 80%, and the bottom 10%

:::

**Solution:**

**Find X₁ (cutoff for the fastest 10%, i.e., the lowest response times):**

**Step 1:** Area from mean to Z = 0.5000 - 0.1000 = 0.4000

**Step 2:** In Table E, find 0.3997 (closest to 0.4000) → Z = 1.28

**Step 3:** Left tail → Z = **-1.28**

$$-1.28 = \frac{X_1 - 12.8}{3.7}$$

$$X_1 = 12.8 + (-1.28)(3.7) = 12.8 - 4.74 = 8.06 \text{ minutes}$$

**Find X₂ (cutoff for the slowest 10%, i.e., the highest response times):**

**Step 1:** Right tail, same logic → Z = **+1.28**

$$1.28 = \frac{X_2 - 12.8}{3.7}$$

$$X_2 = 12.8 + (1.28)(3.7) = 12.8 + 4.74 = 17.54 \text{ minutes}$$

::: {.callout-important icon="💡"}
## Performance Classification System

**Excellent (Top 10%):** Response time **< 8.06 minutes** → Serve as model programs for improvement initiatives

**Acceptable (Middle 80%):** Response time **8.06 - 17.54 minutes** → Meet standard performance expectations

**Needs Improvement (Bottom 10%):** Response time **> 17.54 minutes** → Receive training, resources, and mentorship from excellent departments

**Implementation strategy:**

1. Pair each "needs improvement" department with an "excellent" mentor
2. Analyze best practices: dispatch protocols, routing algorithms, staffing
3. Set a 12-month improvement target: reduce response times by 20%
4. Hold monthly progress reviews with the state commission

**Expected impact:** If the bottom 10% improve to the average (12.8 min), an estimated **45 lives would be saved annually** statewide.

:::

## 5.13 Normal Approximation to the Binomial Distribution

When **n is large**, calculating binomial probabilities becomes tedious:

- Tables don't extend to large n
- Formulas involve massive factorials (e.g., 100!)
- Computers can still struggle with extreme values

**Solution:** Use the normal distribution as an approximation!

::: {.callout-tip icon="📚"}
## When to Use Normal Approximation

**Requirements:**

- $n\pi \geq 5$ (at least 5 expected successes)
- $n(1-\pi) \geq 5$ (at least 5 expected failures)
- π reasonably close to 0.50 (symmetric)

**Formulas:**

$$\mu = n\pi$$

$$\sigma = \sqrt{n\pi(1-\pi)}$$

**Continuity Correction Factor:**

Because the normal distribution is continuous but the binomial is discrete:

- P(X = 10) → P(9.5 ≤ X ≤ 10.5)
- P(X ≤ 10) → P(X ≤ 10.5)
- P(X ≥ 10) → P(X ≥ 9.5)

:::

### Example 5.15: Labor Union Strike Vote

::: {.callout-note icon="💼" appearance="minimal"}
## Union Democracy

**40%** of union members favor a strike. **15 members** are selected randomly.

**Question:** What is the probability that exactly **10 members** support the strike?
**Compare:** Binomial (exact) vs Normal (approximation)

:::

**Solution:**

**Binomial (Exact) - From Table B:**

$$P(X = 10 \mid n = 15, \pi = 0.40) = 0.0245$$

**Normal Approximation:**

**Step 1: Check requirements**

$$n\pi = 15(0.40) = 6 \geq 5 \quad \checkmark$$

$$n(1-\pi) = 15(0.60) = 9 \geq 5 \quad \checkmark$$

**Step 2: Calculate normal parameters**

$$\mu = n\pi = 15(0.40) = 6$$

$$\sigma = \sqrt{15(0.40)(0.60)} = \sqrt{3.6} = 1.897$$

**Step 3: Apply continuity correction**

P(X = 10) → P(9.5 ≤ X ≤ 10.5)

**Step 4: Convert to Z-scores**

$$Z_1 = \frac{9.5 - 6}{1.897} = 1.85 \quad \text{(Area = 0.4678)}$$

$$Z_2 = \frac{10.5 - 6}{1.897} = 2.37 \quad \text{(Area = 0.4911)}$$

**Step 5: Calculate probability**

$$P(9.5 \leq X \leq 10.5) = 0.4911 - 0.4678 = 0.0233$$

::: {.callout-important icon="💡"}
## Accuracy Comparison

**Binomial (exact):** 0.0245 (2.45%)

**Normal (approx):** 0.0233 (2.33%)

**Difference:** 0.0012 (0.12 percentage points)

**Error:** Only 4.9% relative error (0.0012 / 0.0245), an excellent approximation!

**When the approximation improves:**

- Larger n (n > 30)
- π closer to 0.50
- Example: n = 100, π = 0.40 → error typically < 1%

**When to stick with the binomial:**

- Small n (n < 10)
- Extreme π (π < 0.10 or π > 0.90)
- Software available (use the exact calculation)

:::

---

## Solved Problems

### Problem 1: Medical Device Reliability

::: {.callout-note icon="🔬" appearance="minimal"}
A medical device manufacturer produces pacemakers with a **0.3% defect rate** (π = 0.003). A hospital orders **n = 500 units**.

**Questions:**

a) What's the probability that **at most 2** are defective?
b) What's the expected number of defectives?
c) Should the hospital inspect all units before implantation?
:::

**Solution:**

**a) P(X ≤ 2):** Use the Poisson approximation to the binomial (rare events: large n, small π)

$$\mu = n\pi = 500(0.003) = 1.5 \text{ defectives}$$

**From Poisson Table (μ = 1.5):**

- P(X = 0) = 0.2231
- P(X = 1) = 0.3347
- P(X = 2) = 0.2510

$$P(X \leq 2) = 0.2231 + 0.3347 + 0.2510 = 0.8088$$

**80.88% probability** of 2 or fewer defects

**b) Expected defectives:**

$$E(X) = \mu = 1.5 \text{ units}$$

**c) Risk assessment:**

**P(X ≥ 1)** = 1 - P(X = 0) = 1 - 0.2231 = **0.7769** (77.7% chance of at least 1 defect)

**Recommendation:** ✅ **YES, inspect all units**

- Pacemaker failure is life-threatening
- A 77.7% chance of at least one defective unit in the order is an unacceptable risk
- Cost of inspection << cost of patient death/lawsuit

---

### Problem 2: Quality Control - Hypergeometric Application

::: {.callout-note icon="🏭" appearance="minimal"}
A shipment contains **N = 50 electronic components**. Unknown to the buyer, **r = 8 are defective**. A quality inspector randomly selects **n = 5 units** for testing.

**Find:** Probability of finding **exactly 2 defectives** in the sample

:::

**Solution:**

$$P(X = 2) = \frac{{}_8C_2 \times {}_{42}C_3}{{}_{50}C_5}$$

**Step 1: Calculate combinations**

$${}_8C_2 = \frac{8!}{2!6!} = \frac{8 \times 7}{2} = 28$$

$${}_{42}C_3 = \frac{42!}{3!39!} = \frac{42 \times 41 \times 40}{6} = 11{,}480$$

$${}_{50}C_5 = \frac{50!}{5!45!} = 2{,}118{,}760$$

**Step 2: Calculate probability**

$$P(X = 2) = \frac{28 \times 11{,}480}{2{,}118{,}760} = \frac{321{,}440}{2{,}118{,}760} = 0.1517$$

**15.17% chance** of finding exactly 2 defectives

**Expected defectives in sample:**

$$E(X) = n \times \frac{r}{N} = 5 \times \frac{8}{50} = 0.8 \text{ units}$$

**Quality decision:** Finding 2 defectives (vs. 0.8 expected) means 40% of the sample is defective, which points to a poor-quality shipment → Reject the entire shipment

---

### Problem 3: Customer Service Call Center

::: {.callout-note icon="📞" appearance="minimal"}
Calls arrive at a rate of **μ = 18 per hour** (Poisson).

**Management questions:**

a) Probability of **exactly 25 calls** in the next hour?
b) Probability of **15-20 calls** (inclusive)?
c) What's the probability that the **next call arrives within 2 minutes**? (Exponential)

:::

**Solution:**

**a) P(X = 25 | μ = 18):**

**From Poisson Table:** μ = 18, X = 25 → **0.0237** (2.37%)

**b) P(15 ≤ X ≤ 20):**

**From Cumulative Poisson Table:**

- P(X ≤ 20) = 0.7307
- P(X ≤ 14) = 0.2081

$$P(15 \leq X \leq 20) = 0.7307 - 0.2081 = 0.5226$$

**52.26%** probability (just over half the time)

**c) Exponential - Time until next call:**

μ = 18 calls/hour = 18 calls/60 min = **0.30 calls/minute**

$$P(X \leq 2 \text{ min}) = 1 - e^{-\mu t} = 1 - e^{-(0.30)(2)} = 1 - e^{-0.6}$$

$$= 1 - 0.5488 = 0.4512$$

**45.12% probability** that the next call arrives within 2 minutes

**Average wait:** $E(X) = \frac{1}{\mu} = \frac{1}{0.30} = 3.33$ minutes

---

### Problem 4: Manufacturing Tolerance Analysis

::: {.callout-note icon="⚙️" appearance="minimal"}
Steel bolts have **normally distributed** lengths:

- **Mean:** μ = 5.00 cm
- **Standard deviation:** σ = 0.12 cm

**Specifications:** 4.85 cm - 5.15 cm (anything outside is scrap)

**Calculate:**

a) Percentage within specs
b) Scrap rate
c) New σ needed to achieve 99% within specs

:::

**Solution:**

**a) P(4.85 ≤ X ≤ 5.15):**

$$Z_1 = \frac{4.85 - 5.00}{0.12} = -1.25 \quad \text{(Area = 0.3944)}$$

$$Z_2 = \frac{5.15 - 5.00}{0.12} = 1.25 \quad \text{(Area = 0.3944)}$$

$$P(4.85 \leq X \leq 5.15) = 0.3944 + 0.3944 = 0.7888$$

**78.88%** within specifications

**b) Scrap rate:**

$$\text{Scrap} = 1 - 0.7888 = 0.2112$$

**21.12% scrap rate** (unacceptable! Target: < 1%)

**c) Find σ for 99% within specs:**

**Target:** P(4.85 ≤ X ≤ 5.15) = 0.99 → Each tail = 0.005 → Area from mean to spec limit = 0.4950

**From Table E:** Area 0.4950 → Z = 2.58

$$2.58 = \frac{5.15 - 5.00}{\sigma}$$

$$\sigma = \frac{0.15}{2.58} = 0.0581 \text{ cm}$$

**Recommendation:** Reduce variability from σ = 0.12 to **σ = 0.058** (about a 52% reduction) → Requires process improvement: better machines, training, quality control

---

## Formula Reference

### Discrete Distribution Fundamentals

**Expected Value (Mean):**

$$\mu = E(X) = \sum [x_i \cdot P(x_i)]$$

**Variance:**

$$\sigma^2 = \sum [(x_i - \mu)^2 \cdot P(x_i)]$$

**Standard Deviation:**

$$\sigma = \sqrt{\sigma^2}$$

### Binomial Distribution

**Probability Formula:**

$$P(X = x) = {}_nC_x \cdot \pi^x \cdot (1-\pi)^{n-x}$$

**Combinations:**

$${}_nC_x = \frac{n!}{x!(n-x)!}$$

**Mean:**

$$\mu = n\pi$$

**Variance:**

$$\sigma^2 = n\pi(1-\pi)$$

**Standard Deviation:**

$$\sigma = \sqrt{n\pi(1-\pi)}$$

### Hypergeometric Distribution

**Probability Formula:**

$$P(X = x) = \frac{{}_rC_x \times {}_{N-r}C_{n-x}}{{}_NC_n}$$

**Mean:**

$$\mu = n \cdot \frac{r}{N}$$

### Poisson Distribution

**Probability Formula:**

$$P(X = x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$$

**Mean:**

$$\mu = \lambda t \quad \text{(rate × time)}$$

**Variance:**

$$\sigma^2 = \mu$$

**Standard Deviation:**

$$\sigma = \sqrt{\mu}$$

### Exponential Distribution

**Cumulative Probability** (μ = mean number of occurrences per unit of time):

$$P(X \leq t) = 1 - e^{-\mu t}$$

**Mean:**

$$E(X) = \frac{1}{\mu}$$

**Variance:**

$$\sigma^2 = \frac{1}{\mu^2}$$

### Uniform Distribution

**Mean:**

$$\mu = \frac{a + b}{2}$$

**Variance:**

$$\sigma^2 = \frac{(b-a)^2}{12}$$

**Probability:**

$$P(X_1 \leq X \leq X_2) = \frac{X_2 - X_1}{b - a}$$

**Probability Density:**

$$f(x) = \frac{1}{b-a}$$

### Normal Distribution

**Z-Score Transformation:**

$$Z = \frac{X - \mu}{\sigma}$$

**Inverse (Finding X from Z):**

$$X = \mu + Z\sigma$$

**Normal Approximation to Binomial:**

$$\mu = n\pi$$

$$\sigma = \sqrt{n\pi(1-\pi)}$$

**Continuity Correction:**

- P(X = a) → P(a - 0.5 ≤ X ≤ a + 0.5)
- P(X ≤ a) → P(X ≤ a + 0.5)
- P(X ≥ a) → P(X ≥ a - 0.5)

### Empirical Rule (Normal Distributions Only)

- **68.3%** within μ ± 1σ
- **95.5%** within μ ± 2σ
- **99.7%** within μ ± 3σ

---

## Chapter Summary

This chapter introduced the major probability distributions used in business statistics:

**Discrete Distributions:**

- **Binomial:** A fixed number of independent trials, each with the same probability of success (credit approvals, quality sampling)
- **Hypergeometric:** Sampling without replacement from a finite population (discrimination cases, shipment inspection)
- **Poisson:** Rare events over time/space (customer arrivals, defects, accidents)

**Continuous Distributions:**

- **Exponential:** Time between events (taxi arrivals, equipment failure, service times)
- **Uniform:** Equally likely outcomes (random number generation, arrival times)
- **Normal:** The crown jewel: heights, weights, IQ, financial returns, measurement error

**Key Skills Mastered:**

- ✓ Calculate probabilities using distribution formulas and tables
- ✓ Convert normal distributions to the standard normal (Z-scores)
- ✓ Find probabilities from Z-scores and vice versa
- ✓ Apply the continuity correction for the normal approximation to the binomial
- ✓ Select the appropriate distribution based on business context

**Looking Ahead:** We build on these foundations to explore **sampling distributions**, the bridge between probability theory and statistical inference!

---

**Next Chapter:** [Sampling Distributions](06-sampling-distributions.qmd)