P-Values Explained

Visual guide to understanding p-values and what they really mean.


What is a P-Value?

P-value: The probability of observing data at least as extreme as what we observed, assuming the null hypothesis is true.

$$ p = P(\text{data as extreme or more} \mid H_0 \text{ is true}) $$
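As a minimal numeric sketch of this definition (the observed statistic of 2.1 is made up for illustration, and the null distribution is assumed to be standard normal):

import numpy as np
from scipy import stats

# Hypothetical observed test statistic (illustrative value only)
z_observed = 2.1

# Under H₀ the statistic is assumed ~ N(0, 1); the two-sided p-value is the
# probability of seeing something at least as extreme in either tail.
p_two_sided = 2 * stats.norm.sf(abs(z_observed))  # sf(x) = 1 - cdf(x)
print(f"P(|Z| >= {z_observed}) = {p_two_sided:.4f}")  # ≈ 0.0357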


Visual Intuition

Picture the sampling distribution of the test statistic when H₀ is true: the p-value is the tail area of that distribution beyond the value you actually observed.

Conceptual View:

  • Null Distribution: The expected distribution if H₀ is true
  • Observed Value: Your actual measurement
  • P-value: The area in the tail beyond your observed value
  • α threshold: Commonly 0.05 (5%)

Interpretation:

  • Small p-value (< 0.05): Data is unlikely under H₀ → Evidence against H₀
  • Large p-value (≥ 0.05): Data is consistent with H₀ → Insufficient evidence against H₀
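A static sketch of this picture (assuming a standard normal null distribution and an illustrative observed value of 2.1; only the upper tail is shaded here for simplicity):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

z_observed = 2.1                      # illustrative observed statistic
x = np.linspace(-4, 4, 500)
density = stats.norm.pdf(x)           # null distribution under H₀

fig, ax = plt.subplots()
ax.plot(x, density, label="Null distribution (H₀ true)")
ax.fill_between(x, density, where=(x >= z_observed), alpha=0.4,
                label="P-value area (upper tail)")
ax.axvline(z_observed, linestyle="--", label=f"Observed value = {z_observed}")
ax.axvline(stats.norm.ppf(0.95), color="gray", linestyle=":",
           label="α = 0.05 cutoff (one-tailed)")
ax.set_xlabel("Test statistic")
ax.set_ylabel("Density")
ax.legend()
plt.show()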



Common Misconceptions

❌ Wrong: "P-value is the probability that H₀ is true"

No! The p-value is computed assuming H₀ is true; it describes the probability of the data, not the probability of the hypothesis.

❌ Wrong: "P < 0.05 means the result is important"

No! Statistical significance ≠ practical significance.

❌ Wrong: "P > 0.05 proves H₀ is true"

No! Failing to reject H₀ doesn't prove it's true (absence of evidence ≠ evidence of absence).

✅ Correct: "P-value measures compatibility of data with H₀"

Small p-value = data is surprising if H₀ were true.
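One way to see this (a simulation sketch, assuming a one-sample t-test on normally distributed data where H₀ really is true): when the null hypothesis holds, p-values are uniformly distributed, so a p-value below 0.05 occurs only about 5% of the time.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations, n = 10_000, 30

# Simulate many experiments in which H₀ is true (the true mean really is 0)
p_values = np.array([
    stats.ttest_1samp(rng.normal(loc=0.0, size=n), popmean=0.0).pvalue
    for _ in range(n_simulations)
])

# Under a true H₀, p-values are roughly Uniform(0, 1), so small p-values
# are genuinely surprising rather than routine.
print(f"Fraction of p-values below 0.05: {np.mean(p_values < 0.05):.3f}")  # ≈ 0.05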


Significance Levels

Common choices for the threshold α at which a p-value is called "significant":

  • α = 0.05: Conventional default in many fields
  • α = 0.01: Stricter; stronger evidence required
  • α = 0.001: Very strict; used when false positives are especially costly


Example: Coin Flip Test

from scipy import stats

# Observed: 65 heads out of 100 flips
# H₀: Coin is fair (p = 0.5)
# H₁: Coin is biased (p ≠ 0.5)

n = 100
observed_heads = 65
expected = 0.5

# Exact binomial test (scipy.stats.binomtest; binom_test was removed from SciPy)
result = stats.binomtest(observed_heads, n, expected, alternative='two-sided')
p_value = result.pvalue
print(f"P-value: {p_value:.4f}")

# Interpretation
if p_value < 0.05:
    print("Reject H₀: Coin appears biased")
else:
    print("Fail to reject H₀: Coin appears fair")

P-Value vs Effect Size

Statistical significance (p-value) and practical significance (effect size) are different concepts:

Effect Size | P-value < 0.05                                            | P-value ≥ 0.05
Large       | ✅ Significant & important (best case)                    | ⚠️ Not significant but important (may need more data)
Small       | ⚠️ Significant but not important (large-sample artifact)  | ❌ Not significant & not important (no evidence)

Key Insight: With large enough samples, even tiny (unimportant) effects can be statistically significant!
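A quick simulation sketch of that insight (the group sizes and the tiny true difference are invented for illustration): with a very large sample, a negligible effect still yields a vanishingly small p-value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000                                   # very large sample per group

# Two groups whose true means differ by a trivial 0.05 standard deviations
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.05, scale=1.0, size=n)

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d: mean difference in units of the pooled standard deviation
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p-value:   {p_value:.2e}")   # tiny: statistically significant
print(f"Cohen's d: {cohens_d:.3f}")  # ≈ 0.05: practically negligible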


Key Takeaways

  1. P-value ≠ Probability H₀ is true

    • It's P(data | H₀), not P(H₀ | data)
  2. Threshold α = 0.05 is arbitrary

    • Not a magical boundary
    • Consider context and field standards
  3. Statistical ≠ Practical significance

    • Small effects can be "significant" with large samples
    • Large effects can be "non-significant" with small samples
  4. P-values are continuous

    • Don't just report "p < 0.05"
    • Report exact p-value
  5. Multiple comparisons problem

    • More tests = higher chance of false positives
    • Use corrections (Bonferroni, FDR)
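A rough sketch of that last point (simulated data in which every one of 100 null hypotheses is actually true, using a simple Bonferroni adjustment):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, n, alpha = 100, 30, 0.05

# 100 independent tests where H₀ is true in every single one
p_values = np.array([
    stats.ttest_1samp(rng.normal(size=n), popmean=0.0).pvalue
    for _ in range(n_tests)
])

# Uncorrected: expect about 5 false positives purely by chance
print(f"Uncorrected 'significant' results: {np.sum(p_values < alpha)}")

# Bonferroni correction: compare each p-value to α divided by the number of tests
print(f"Bonferroni 'significant' results:  {np.sum(p_values < alpha / n_tests)}")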

Relationship to Confidence Intervals

If the 95% CI excludes the H₀ value → p < 0.05
If the 95% CI includes the H₀ value → p ≥ 0.05

Confidence intervals provide more information than p-values alone!
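A small sketch of this duality (a one-sample t-test of H₀: mean = 0 on illustrative simulated data; the 95% CI is built from the same t distribution the test uses):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=0.3, scale=1.0, size=40)   # illustrative sample

# One-sample t-test of H₀: mean = 0
result = stats.ttest_1samp(data, popmean=0.0)

# 95% confidence interval for the mean
mean, sem = data.mean(), stats.sem(data)
t_crit = stats.t.ppf(0.975, df=len(data) - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

print(f"p-value: {result.pvalue:.4f}")
print(f"95% CI:  ({ci_low:.3f}, {ci_high:.3f})")
# If the interval excludes 0, the p-value is below 0.05, and vice versa.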

