Statistics Interview Questions - Medium

Dec 15, 2025 · 13 min read · statistics interview medium hypothesis-testing regression inference ·

Medium-level statistics interview questions covering hypothesis testing, regression analysis, and statistical inference.

Q1: Explain hypothesis testing: null hypothesis, alternative hypothesis, p-value, and significance level.

Answer:

Key Concepts

Null Hypothesis (H₀): Statement being tested (usually "no effect" or "no difference")
Alternative Hypothesis (H₁): The claim we are looking for evidence of (there is an effect)
p-value: Probability of observing data at least as extreme as the data actually observed, assuming H₀ is true
Significance Level (α): Threshold for rejecting H₀ (typically 0.05)

Decision Rule

If p-value < α: Reject H₀ (statistically significant)
If p-value ≥ α: Fail to reject H₀ (not significant)

Python Example

import numpy as np
from scipy import stats

# Example: testing whether a new drug shortens recovery time
# H₀: μ_new - μ_old = 0 (no difference)
# H₁: μ_new - μ_old < 0 (shorter recovery time, so the new drug is better)

np.random.seed(42)
# Old drug: mean recovery time 10 days
old_drug = np.random.normal(10, 2, 30)

# New drug: mean recovery time 8.5 days (actually better)
new_drug = np.random.normal(8.5, 2, 30)

# One-sample t-test: is the new drug's mean recovery time significantly less than 10 days?
t_stat, p_value = stats.ttest_1samp(new_drug, 10, alternative='less')

alpha = 0.05
print("Null hypothesis (H₀): Mean recovery time = 10 days")
print("Alternative (H₁): Mean recovery time < 10 days")
print(f"\nSample mean: {np.mean(new_drug):.2f} days")
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print(f"Significance level (α): {alpha}")

if p_value < alpha:
    print("\nDecision: Reject H₀ (p < α)")
    print("Conclusion: New drug is significantly more effective")
else:
    print("\nDecision: Fail to reject H₀ (p ≥ α)")
    print("Conclusion: No significant evidence new drug is better")

# Two-sample t-test: is mean(new_drug) less than mean(old_drug)?
t_stat_2, p_value_2 = stats.ttest_ind(new_drug, old_drug, alternative='less')
print("\nTwo-sample test:")
print(f"t-statistic: {t_stat_2:.3f}")
print(f"p-value: {p_value_2:.4f}")

Thinking Process: Hypothesis testing looks for evidence against the null. A low p-value means the observed data would be unlikely if H₀ were true, which counts as evidence in favor of H₁; it is not the probability that H₀ is true. Never "accept" H₀, only "fail to reject" it: absence of evidence ≠ evidence of absence.
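
To make the p-value definition above concrete, the following sketch computes the one-sample t statistic and its p-value by hand and compares them with `stats.ttest_1samp`. The helper name `one_sample_t_pvalue` and the simulated sample are illustrative assumptions, not part of the original example.

```python
import numpy as np
from scipy import stats

def one_sample_t_pvalue(sample, mu0, alternative="two-sided"):
    """Hypothetical helper: one-sample t statistic and p-value from first principles."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # t = (sample mean - hypothesized mean) / standard error of the mean
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    t_stat = (sample.mean() - mu0) / standard_error
    df = n - 1
    if alternative == "less":
        p_value = stats.t.cdf(t_stat, df)          # lower-tail area
    elif alternative == "greater":
        p_value = stats.t.sf(t_stat, df)           # upper-tail area
    else:
        p_value = 2 * stats.t.sf(abs(t_stat), df)  # both tails
    return t_stat, p_value

rng = np.random.default_rng(0)
sample = rng.normal(8.5, 2, 30)  # simulated recovery times (assumed values)

t_manual, p_manual = one_sample_t_pvalue(sample, 10, alternative="less")
t_scipy, p_scipy = stats.ttest_1samp(sample, 10, alternative="less")
print(f"manual: t = {t_manual:.3f}, p = {p_manual:.4g}")
print(f"scipy:  t = {t_scipy:.3f}, p = {p_scipy:.4g}")
```

The two lines should agree, which shows that the p-value is just a tail area of the t distribution with n - 1 degrees of freedom.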

Q2: What are Type I and Type II errors? How do they relate to statistical power?

Answer:

Error Types

Type I Error (α): Rejecting H₀ when it's true (false positive)
Type II Error (β): Failing to reject H₀ when it's false (false negative)
Power (1-β): Probability of correctly rejecting H₀ when it's false

Trade-offs

For a fixed sample size and effect size, decreasing α (Type I error rate) increases β (Type II error rate)
Increasing sample size increases power
Larger effect sizes are easier to detect (higher power)

Python Example

import numpy as np
from scipy import stats

# Simulate hypothesis testing with known truth
np.random.seed(42)
alpha = 0.05
n_trials = 1000

# Scenario 1: H₀ is true (no effect)
null_true = []
for _ in range(n_trials):
    sample = np.random.normal(0, 1, 30)  # True mean = 0
    _, p_val = stats.ttest_1samp(sample, 0)
    null_true.append(p_val < alpha)

type1_error_rate = np.mean(null_true)
print(f"Type I Error Rate (α = {alpha}): {type1_error_rate:.3f}")
print(f"Expected: ~{alpha}")

# Scenario 2: H₁ is true (effect exists)
alt_true = []
true_effect = 0.5  # Actual difference
for _ in range(n_trials):
    sample = np.random.normal(true_effect, 1, 30)
    _, p_val = stats.ttest_1samp(sample, 0)
    alt_true.append(p_val < alpha)

power = np.mean(alt_true)
type2_error_rate = 1 - power
print(f"\nType II Error Rate (β): {type2_error_rate:.3f}")
print(f"Statistical Power (1-β): {power:.3f}")

# Effect of sample size on power
sample_sizes = [10, 20, 30, 50, 100]
powers = []
for n in sample_sizes:
    rejections = 0
    for _ in range(n_trials):
        sample = np.random.normal(true_effect, 1, n)
        _, p_val = stats.ttest_1samp(sample, 0)
        if p_val < alpha:
            rejections += 1
    powers.append(rejections / n_trials)

print("\nPower vs Sample Size:")
for n, pwr in zip(sample_sizes, powers):
    print(f"  n={n:3d}: Power = {pwr:.3f}")

# Effect of effect size on power
effect_sizes = [0.1, 0.2, 0.5, 1.0]
powers_effect = []
for effect in effect_sizes:
    rejections = 0
    for _ in range(n_trials):
        sample = np.random.normal(effect, 1, 30)
        _, p_val = stats.ttest_1samp(sample, 0)
        if p_val < alpha:
            rejections += 1
    powers_effect.append(rejections / n_trials)

print("\nPower vs Effect Size (n=30):")
for effect, pwr in zip(effect_sizes, powers_effect):
    print(f"  Effect={effect:.1f}: Power = {pwr:.3f}")

Thinking Process: Type I and Type II error rates are inversely related for a fixed sample size. To increase power without increasing the Type I error rate, you need a larger sample size or a larger effect size. Power analysis helps determine an adequate sample size before the study begins.
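
The simulated power figures can be cross-checked against a closed-form approximation. The sketch below assumes a two-sided one-sample test with known σ (a z-test), so it slightly overstates the power of the t-test used in the simulation; the function name `approx_power_one_sample` is an illustrative assumption.

```python
import numpy as np
from scipy import stats

def approx_power_one_sample(effect, sigma, n, alpha=0.05):
    """Hypothetical helper: normal-approximation power of a two-sided one-sample test."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    # How far the true mean shifts the test statistic, in standard-error units
    shift = effect * np.sqrt(n) / sigma
    # Reject when the statistic falls above z_crit or below -z_crit
    return stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)

for n in [10, 20, 30, 50, 100]:
    pwr = approx_power_one_sample(effect=0.5, sigma=1.0, n=n)
    print(f"n={n:3d}: approximate power = {pwr:.3f}")
```

With an effect of zero the expression reduces to α, which ties the Type I error rate and power together in a single formula.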

Q3: Explain linear regression assumptions and how to check them.

Answer:

Assumptions

Linearity: Relationship between X and Y is linear
Independence: Errors are independent across observations (no autocorrelation)
Homoscedasticity: Constant variance of residuals across X values
Normality: Residuals are normally distributed

Diagnostics

Residual plots: Check linearity and homoscedasticity
Q-Q plots: Check normality
Durbin-Watson test: Check independence
Leverage/Cook's distance: Check for influential points

Python Example

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import chi2
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

np.random.seed(42)

# Generate data that violates an assumption
# True relationship: y = 2 + 3x + error
x = np.linspace(0, 10, 100)
# Heteroscedastic errors: the standard deviation grows with x
error = np.random.normal(0, 0.5 + 0.3*x, len(x))
y_true = 2 + 3*x
y = y_true + error

# Fit linear regression
X = x.reshape(-1, 1)
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Calculate residuals
residuals = y - y_pred

print(f"R²: {r2_score(y, y_pred):.3f}")
print(f"Coefficients: β₀={model.intercept_:.2f}, β₁={model.coef_[0]:.2f}")

# Check assumptions

# 1. Linearity: Plot residuals vs fitted
plt.figure(figsize=(12, 8))

plt.subplot(2, 2, 1)
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted (Linearity & Homoscedasticity)')
plt.grid(True, alpha=0.3)

# 2. Normality: Q-Q plot
plt.subplot(2, 2, 2)
stats.probplot(residuals, dist="norm", plot=plt)
plt.title('Q-Q Plot (Normality Check)')
plt.grid(True, alpha=0.3)

# 3. Homoscedasticity: Scale-location plot
plt.subplot(2, 2, 3)
# Standardize by the residual standard error (ddof=2: intercept and slope were estimated)
standardized_residuals = residuals / np.std(residuals, ddof=2)
sqrt_abs_residuals = np.sqrt(np.abs(standardized_residuals))
plt.scatter(y_pred, sqrt_abs_residuals, alpha=0.6)
plt.xlabel('Fitted Values')
plt.ylabel('√|Standardized Residuals|')
plt.title('Scale-Location Plot (Homoscedasticity)')
plt.grid(True, alpha=0.3)

# 4. Statistical tests
# Normality test (Shapiro-Wilk)
shapiro_stat, shapiro_p = stats.shapiro(residuals)
print("\nNormality test (Shapiro-Wilk):")
print(f"  p-value: {shapiro_p:.4f}")
if shapiro_p < 0.05:
    print("  Violation: Residuals not normally distributed")
else:
    print("  OK: Residuals appear normally distributed")

# Homoscedasticity: Breusch-Pagan test (LM form)
# Regress the squared residuals on X; under H₀, n * R² of that auxiliary
# regression is approximately chi-square with 1 degree of freedom (one predictor)
n = len(residuals)
aux_model = LinearRegression().fit(X, residuals**2)
bp_stat = n * aux_model.score(X, residuals**2)
bp_p = chi2.sf(bp_stat, df=1)
print("\nHomoscedasticity test (Breusch-Pagan):")
print(f"  p-value: {bp_p:.4f}")
if bp_p < 0.05:
    print("  Violation: Heteroscedasticity detected")
else:
    print("  OK: No evidence against homoscedasticity")

plt.tight_layout()
plt.show()

# Remedies
print("\nPotential remedies:")
print("- Non-linearity: Transform variables, add polynomial terms")
print("- Heteroscedasticity: Transform Y, use weighted least squares")
print("- Non-normality: Transform Y, use robust regression")
print("- Dependencies: Use time series methods, mixed models")

Thinking Process: Violating assumptions affects the validity of p-values and confidence intervals. Always check diagnostics after fitting a model. Transformations or alternative methods may be needed if assumptions are violated.
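
The Durbin-Watson test appears in the diagnostics list but not in the example. Here is a minimal sketch of the statistic computed directly with NumPy. The helper name `durbin_watson_stat` and the two simulated error series are illustrative assumptions; in practice you would pass the residuals of a fitted model, ordered by time. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.

```python
import numpy as np

def durbin_watson_stat(residuals):
    """Hypothetical helper: sum of squared successive differences over sum of squares."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(0)
n = 500

# Independent errors: the statistic should land close to 2
independent_errors = rng.normal(size=n)

# AR(1) errors with positive autocorrelation (rho = 0.8, an assumed value)
shocks = rng.normal(size=n)
ar1_errors = np.zeros(n)
for t in range(1, n):
    ar1_errors[t] = 0.8 * ar1_errors[t - 1] + shocks[t]

dw_independent = durbin_watson_stat(independent_errors)
dw_ar1 = durbin_watson_stat(ar1_errors)
print(f"Durbin-Watson, independent errors: {dw_independent:.2f}")
print(f"Durbin-Watson, AR(1) errors:       {dw_ar1:.2f}")
```

The statistic is roughly 2(1 - ρ), where ρ is the lag-1 autocorrelation of the residuals, which is why strongly autocorrelated errors push it toward 0.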

Q4: What is the Central Limit Theorem and why is it important?

Answer:

Statement

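As the sample size increases, the distribution of the mean of independent, identically distributed observations approaches a normal distribution, regardless of the shape of the population distribution, provided the population has finite variance.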

Key Points

Mean of sampling distribution = Population mean (μ)
Standard error = σ/√n (standard deviation of sample means)
Works even if population is not normal (with large enough n)

Why It Matters

Justifies using normal distribution for inference
Allows calculation of confidence intervals
Foundation for many statistical tests

Python Example

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

# Non-normal population: Exponential distribution
population = np.random.exponential(scale=2, size=10000)
true_mean = 2  # Mean of an exponential distribution equals its scale parameter

print(f"Population distribution: Exponential (mean={true_mean})")
print("Population is NOT normal (right-skewed)")

# Simulate sampling distribution of means
sample_sizes = [5, 10, 30, 50]
n_samples = 1000

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()

for idx, n in enumerate(sample_sizes):
    sample_means = []
    for _ in range(n_samples):
        sample = np.random.choice(population, size=n, replace=False)
        sample_means.append(np.mean(sample))

    sample_means = np.array(sample_means)

    # Theoretical CLT predictions
    theoretical_mean = true_mean
    theoretical_std = np.std(population) / np.sqrt(n)  # Standard error σ/√n

    # Plot histogram
    axes[idx].hist(sample_means, bins=30, density=True, alpha=0.7, label='Sample Means')

    # Overlay normal distribution
    x = np.linspace(min(sample_means), max(sample_means), 100)
    normal_pdf = stats.norm.pdf(x, loc=theoretical_mean, scale=theoretical_std)
    axes[idx].plot(x, normal_pdf, 'r-', linewidth=2, label='Normal (CLT)')

    axes[idx].axvline(theoretical_mean, color='green', linestyle='--', label=f'True Mean={theoretical_mean}')
    axes[idx].set_title(f'n = {n}')
    axes[idx].set_xlabel('Sample Mean')
    axes[idx].set_ylabel('Density')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

    # Compare empirical vs theoretical
    empirical_mean = np.mean(sample_means)
    empirical_std = np.std(sample_means)

    print(f"\nn = {n}:")
    print(f"  Empirical mean: {empirical_mean:.3f} (theoretical: {theoretical_mean:.3f})")
    print(f"  Empirical std: {empirical_std:.3f} (theoretical: {theoretical_std:.3f})")

    # Normality test (with 1000 sample means, even mild skew can be detected)
    stat, p_val = stats.shapiro(sample_means)
    print(f"  Normality test p-value: {p_val:.4f}", end="")
    if p_val > 0.05:
        print(" (no evidence against normality ✓)")
    else:
        print(" (skewness still detectable)")

plt.suptitle('Central Limit Theorem: Sample Means Approach Normal Distribution', fontsize=14)
plt.tight_layout()
plt.show()

# Practical application: Confidence interval
n = 30
sample = np.random.choice(population, size=n, replace=False)
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1)

# 95% CI using CLT (t-distribution because σ is estimated from the sample)
t_critical = stats.t.ppf(0.975, df=n-1)
margin_error = t_critical * (sample_std / np.sqrt(n))
ci_lower = sample_mean - margin_error
ci_upper = sample_mean + margin_error

print("\nPractical application: 95% Confidence Interval")
print(f"Sample mean: {sample_mean:.3f}")
print(f"95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
print(f"True mean ({true_mean:.3f}) is in CI: {ci_lower <= true_mean <= ci_upper}")

Thinking Process: The CLT is fundamental because it allows inference about means even when the population distribution is unknown. Larger samples give a better normal approximation. For small samples from non-normal populations, non-parametric methods or the bootstrap may be needed.
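
As a sketch of the bootstrap alternative mentioned above, the following computes a percentile bootstrap confidence interval for the mean of a small skewed sample. The helper name `bootstrap_mean_ci`, the number of resamples, and the simulated data are assumptions for illustration. The percentile method is only one of several bootstrap intervals and can undercover for very small samples.

```python
import numpy as np

def bootstrap_mean_ci(sample, n_boot=5000, level=0.95, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Draw n_boot resamples of the same size as the sample, with replacement
    idx = rng.integers(0, sample.size, size=(n_boot, sample.size))
    boot_means = sample[idx].mean(axis=1)
    # Take the middle `level` fraction of the bootstrap distribution
    tail = (1 - level) / 2
    lower, upper = np.quantile(boot_means, [tail, 1 - tail])
    return lower, upper

rng = np.random.default_rng(42)
sample = rng.exponential(scale=2, size=30)  # small right-skewed sample (assumed data)

ci_low, ci_high = bootstrap_mean_ci(sample)
print(f"Sample mean: {sample.mean():.3f}")
print(f"95% bootstrap CI for the mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

Unlike the t-based interval, this interval is not forced to be symmetric around the sample mean, so it can reflect skew in the data.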

Q5: Explain Bayes' theorem and its application in statistics.

Answer:

Formula

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$$

Where:

P(A|B): Posterior probability (what we want)
P(B|A): Likelihood (probability of evidence given hypothesis)
P(A): Prior probability (initial belief)
P(B): Evidence/Normalizing constant
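
In practice the normalizing constant is rarely known directly and is expanded with the law of total probability over A and its complement:

$$P(B) = P(B|A) \times P(A) + P(B|\neg A) \times P(\neg A)$$

Substituting this into the formula gives the form that is usually computed:

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B|A) \times P(A) + P(B|\neg A) \times P(\neg A)}$$

As an illustration with a disease prevalence of 1%, a test sensitivity of 95%, and a specificity of 90% (so a 10% false positive rate):

$$P(\text{disease}|\text{positive}) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.10 \times 0.99} = \frac{0.0095}{0.1085} \approx 0.088$$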

Applications

Medical diagnosis
Spam filtering
A/B testing
Machine learning (Naive Bayes)

Python Example

import numpy as np
from scipy.stats import beta

# Example: Medical diagnosis
# Disease prevalence: 1% of population
# Test sensitivity: 95% (P(positive|disease))
# Test specificity: 90% (P(negative|no disease))

# Prior
P_disease = 0.01
P_no_disease = 1 - P_disease

# Likelihoods
P_positive_given_disease = 0.95
P_negative_given_disease = 1 - P_positive_given_disease
P_negative_given_no_disease = 0.90
P_positive_given_no_disease = 1 - P_negative_given_no_disease

# Evidence (marginal probability, via the law of total probability)
P_positive = (P_positive_given_disease * P_disease +
              P_positive_given_no_disease * P_no_disease)

# Bayes' theorem: P(disease|positive)
P_disease_given_positive = (P_positive_given_disease * P_disease) / P_positive

print("Medical Diagnosis Example:")
print(f"Prior: P(disease) = {P_disease:.3f} (1%)")
print(f"Likelihood: P(positive|disease) = {P_positive_given_disease:.3f}")
print(f"Evidence: P(positive) = {P_positive:.3f}")
print(f"\nPosterior: P(disease|positive) = {P_disease_given_positive:.3f} ({P_disease_given_positive*100:.1f}%)")
print(f"\nKey insight: Despite 95% sensitivity, only {P_disease_given_positive*100:.1f}%")
print("of positive tests indicate actual disease (low prevalence)!")

# Bayesian updating with multiple tests
# (assumes the two test results are conditionally independent given disease status)
print("\n" + "="*50)
print("Bayesian Updating: Multiple Tests")

# First test positive
prior = P_disease
posterior_1 = (P_positive_given_disease * prior) / P_positive
print(f"After 1st positive test: {posterior_1:.3f}")

# Second test positive (using first posterior as new prior)
P_positive_2 = (P_positive_given_disease * posterior_1 +
                P_positive_given_no_disease * (1 - posterior_1))
posterior_2 = (P_positive_given_disease * posterior_1) / P_positive_2
print(f"After 2nd positive test: {posterior_2:.3f}")

# Compare to naive approach (wrong)
naive_estimate = 1 - (1 - P_positive_given_disease)**2
print(f"\nNaive approach (wrong): {naive_estimate:.3f}")
print("This ignores prior probability!")

# A/B Testing Example
print("\n" + "="*50)
print("Bayesian A/B Testing Example")

# Observed data: A converts 45/100, B converts 55/100
conversions_A = 45
trials_A = 100
conversions_B = 55
trials_B = 100

# Beta distribution (conjugate prior for binomial)
# Posterior is Beta(alpha + conversions, beta + non-conversions)

# Prior: Beta(1,1) - uniform over conversion rates (no preference)
alpha_prior, beta_prior = 1, 1

# Posteriors
alpha_A = alpha_prior + conversions_A
beta_A = beta_prior + (trials_A - conversions_A)
alpha_B = alpha_prior + conversions_B
beta_B = beta_prior + (trials_B - conversions_B)

# Probability that B > A
# Sample from posteriors and compare (seeded generator for reproducibility)
rng = np.random.default_rng(42)
n_samples = 10000
samples_A = beta.rvs(alpha_A, beta_A, size=n_samples, random_state=rng)
samples_B = beta.rvs(alpha_B, beta_B, size=n_samples, random_state=rng)
prob_B_better = np.mean(samples_B > samples_A)

print(f"Probability that B is better than A: {prob_B_better:.3f}")
print(f"Expected conversion rate A: {alpha_A/(alpha_A+beta_A):.3f}")
print(f"Expected conversion rate B: {alpha_B/(alpha_B+beta_B):.3f}")

# Credible interval (Bayesian analogue of a confidence interval)
ci_A = beta.interval(0.95, alpha_A, beta_A)
ci_B = beta.interval(0.95, alpha_B, beta_B)
print(f"\n95% Credible Interval A: [{ci_A[0]:.3f}, {ci_A[1]:.3f}]")
print(f"95% Credible Interval B: [{ci_B[0]:.3f}, {ci_B[1]:.3f}]")

Thinking Process: Bayes' theorem quantifies how evidence updates beliefs. The prior matters: base rates are crucial. The Bayesian approach yields the probability of a hypothesis given the data (often more intuitive than a p-value) and naturally incorporates prior knowledge.

Q6: What is statistical power and how do you calculate sample size needed?

Answer:

Definition

Statistical Power = Probability of correctly rejecting H₀ when H₁ is true (1 - Type II error rate)

Factors Affecting Power

Effect size: Larger effects → higher power
Sample size: Larger n → higher power
Significance level: Higher α → higher power (but more Type I errors)
Variability: Lower σ → higher power

Sample Size Calculation

For a two-sided, two-sample t-test with equal group sizes, the required sample size per group is approximately: $$n = \frac{2(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{\delta^2}$$

Where:

δ = effect size (difference in means)
σ = standard deviation
α = significance level
β = Type II error rate (power = 1-β)
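
As a worked instance of the formula, take δ = 5, σ = 10, α = 0.05, and power = 0.80. The standard normal quantiles are $z_{\alpha/2} = z_{0.025} \approx 1.960$ and $z_{\beta} = z_{0.20} \approx 0.842$, so:

$$n = \frac{2(1.960 + 0.842)^2 \times 10^2}{5^2} = 2 \times 7.851 \times 4 \approx 62.8$$

Rounding up gives 63 participants per group. Because the formula uses normal rather than t quantiles, calculations based on the t distribution typically return a slightly larger number.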

Python Example

import numpy as np
from scipy import stats

np.random.seed(42)  # Make the simulations reproducible

# Sample size calculation function
def calculate_sample_size(effect_size, std_dev, alpha=0.05, power=0.80):
    """
    Calculate the per-group sample size for a two-sided, two-sample t-test
    (normal approximation).

    Parameters:
    - effect_size: Minimum detectable difference (δ)
    - std_dev: Standard deviation (σ)
    - alpha: Significance level (default 0.05)
    - power: Desired power (default 0.80, so β=0.20)
    """
    z_alpha = stats.norm.ppf(1 - alpha/2)  # Two-tailed
    z_beta = stats.norm.ppf(power)

    n = 2 * ((z_alpha + z_beta)**2 * std_dev**2) / effect_size**2
    return int(np.ceil(n))

# Example: Clinical trial
# Want to detect 5-point difference in test scores
# Standard deviation: 10 points
# Significance: 0.05, Power: 0.80

effect_size = 5
std_dev = 10
alpha = 0.05
power = 0.80

n_required = calculate_sample_size(effect_size, std_dev, alpha, power)
print("Sample size calculation:")
print(f"Effect size (δ): {effect_size}")
print(f"Standard deviation (σ): {std_dev}")
print(f"Significance level (α): {alpha}")
print(f"Power (1-β): {power}")
print(f"\nRequired sample size per group: {n_required}")
print(f"Total sample size: {n_required * 2}")

# Verify with simulation (two-sided test, matching the formula above)
print("\nVerifying with simulation:")
n_simulations = 1000
rejections = 0

for _ in range(n_simulations):
    group1 = np.random.normal(100, std_dev, n_required)
    group2 = np.random.normal(100 + effect_size, std_dev, n_required)  # True effect exists
    t_stat, p_val = stats.ttest_ind(group2, group1)
    if p_val < alpha:
        rejections += 1

empirical_power = rejections / n_simulations
print(f"Empirical power: {empirical_power:.3f} (target: {power})")

# Power analysis: How power changes with sample size
sample_sizes = [10, 20, 30, 50, 64, 100]
powers = []

for n in sample_sizes:
    rejections = 0
    for _ in range(n_simulations):
        group1 = np.random.normal(100, std_dev, n)
        group2 = np.random.normal(100 + effect_size, std_dev, n)
        _, p_val = stats.ttest_ind(group2, group1)
        if p_val < alpha:
            rejections += 1
    powers.append(rejections / n_simulations)

print(f"\nPower vs Sample Size (effect={effect_size}, σ={std_dev}):")
for n, pwr in zip(sample_sizes, powers):
    print(f"  n={n:3d} per group: Power = {pwr:.3f}")

# Power analysis: How power changes with effect size
effect_sizes_test = [2, 3, 5, 7, 10]
powers_effect = []
n_fixed = 30

for effect in effect_sizes_test:
    rejections = 0
    for _ in range(n_simulations):
        group1 = np.random.normal(100, std_dev, n_fixed)
        group2 = np.random.normal(100 + effect, std_dev, n_fixed)
        _, p_val = stats.ttest_ind(group2, group1)
        if p_val < alpha:
            rejections += 1
    powers_effect.append(rejections / n_simulations)

print(f"\nPower vs Effect Size (n={n_fixed} per group):")
for effect, pwr in zip(effect_sizes_test, powers_effect):
    print(f"  Effect={effect:2d}: Power = {pwr:.3f}")

# Using statsmodels for power analysis
try:
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    effect_size_cohen = effect_size / std_dev  # Cohen's d
    n_required_statsmodels = analysis.solve_power(
        effect_size=effect_size_cohen,
        alpha=alpha,
        power=power,
        ratio=1.0,
        alternative='two-sided'  # Same alternative as the formula and simulations
    )
    print(f"\nUsing statsmodels: n = {int(np.ceil(n_required_statsmodels))} per group")
except ImportError:
    print("\nInstall statsmodels for additional power analysis functions")

Thinking Process: Power analysis is crucial before a study to ensure an adequate sample size. Low power wastes resources and may miss real effects. Balance statistical power against practical constraints (cost, time). Target an effect size that is clinically or practically meaningful, not just statistically detectable.

These concepts are essential for designing studies, analyzing data, and making valid statistical inferences.
