Percentiles and Quantiles
Visual guide to percentiles, quartiles, and quantiles for understanding data distribution.
What are Percentiles?
Percentile: The value below which a given percentage of observations fall.
- 50th percentile = Median (half the data is below)
- 25th percentile = First quartile (Q1)
- 75th percentile = Third quartile (Q3)
Key Concept: The shaded area under a distribution curve up to a percentile line represents the percentage of data below that value.
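The key concept can be checked empirically. The sketch below assumes normally distributed sample data and a fixed seed (both illustrative choices, not part of the original guide) and compares each percentile with the share of observations that actually fall below it.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable (assumption)
data = rng.normal(100, 15, 10_000)  # illustrative sample: mean 100, std 15

shares = {}
for p in (25, 50, 75, 90):
    cutoff = np.percentile(data, p)                      # value at the p-th percentile
    shares[p] = float(np.mean(data < cutoff) * 100)      # % of observations below it
    print(f"{p}th percentile = {cutoff:.1f}, share below = {shares[p]:.1f}%")
```

The share below each cutoff should land very close to p itself, which is the "shaded area" reading of a percentile.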
Box Plot Representation
A box plot (box-and-whisker plot) visualizes the five-number summary and quartiles.
```
Min         Q1       Median        Q3        Max
|           |           |          |           |
|-----------|===========|==========|-----------|
            ←         IQR          →
                  (middle 50%)
```
Components:
- Box: Contains middle 50% of data (from Q1 to Q3)
- Line inside box: Median (Q2)
- Whiskers: Extend to the minimum and maximum, or (when outliers are plotted separately) to the most extreme values within 1.5×IQR of the box
- IQR (Interquartile Range): Q3 - Q1 (measure of spread)
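As a rough sketch of how these components are computed, the example below uses a small made-up data set and the 1.5×IQR whisker convention. The variable names and values are illustrative assumptions.

```python
import numpy as np

data = np.array([2, 4, 5, 7, 8, 9, 11, 12, 14, 30])  # illustrative values

# Five-number summary: min, Q1, median, Q3, max
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
iqr = q3 - q1  # height of the box

# Whiskers under the 1.5 x IQR convention: the most extreme points inside the fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points outside the fences would be drawn individually
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Box: {q1} to {q3}, median {median}, IQR {iqr}")
print(f"Whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```

With this data the upper whisker stops at 14 rather than at the maximum of 30, because 30 lies beyond the upper fence.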
Common Percentiles
| Percentile | Name | Meaning |
|---|---|---|
| 0th | Minimum | Smallest value |
| 25th | Q1 (First Quartile) | 25% below, 75% above |
| 50th | Q2 (Median) | Middle value |
| 75th | Q3 (Third Quartile) | 75% below, 25% above |
| 100th | Maximum | Largest value |
Special Percentiles
- Deciles: 10th, 20th, ..., 90th (divide into 10 parts)
- Quintiles: 20th, 40th, 60th, 80th (divide into 5 parts)
- Quartiles: 25th, 50th, 75th (divide into 4 parts)
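These cut points can all be computed in one call by passing a list of percentiles. The sketch below assumes the data is simply the integers 0 to 100, chosen so that each cut point lands on (approximately) its own percentile number.

```python
import numpy as np

data = np.arange(101)  # illustrative data: 0, 1, ..., 100

deciles = np.percentile(data, np.arange(10, 100, 10))  # 10th, 20th, ..., 90th
quintiles = np.percentile(data, [20, 40, 60, 80])      # 4 cut points, 5 parts
quartiles = np.percentile(data, [25, 50, 75])          # 3 cut points, 4 parts

print("Deciles:  ", deciles)
print("Quintiles:", quintiles)
print("Quartiles:", quartiles)
```

Note that k equal parts always need k - 1 cut points, which is why there are nine deciles but ten groups.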
Interquartile Range (IQR)
$$ \text{IQR} = Q_3 - Q_1 $$
Use: Measure of spread that's robust to outliers.
Outlier Detection:
- Lower fence: $Q_1 - 1.5 \times \text{IQR}$
- Upper fence: $Q_3 + 1.5 \times \text{IQR}$
```python
import numpy as np

data = np.random.normal(100, 15, 1000)  # 1000 draws, mean 100, std 15

# Calculate quartiles
q1 = np.percentile(data, 25)
q2 = np.percentile(data, 50)  # median
q3 = np.percentile(data, 75)

iqr = q3 - q1

# Outlier detection using the 1.5 × IQR fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]
print(f"Found {len(outliers)} outliers")
```
Percentile Rank
Percentile Rank: The percentage of scores that fall below a given value.
$$ \text{Percentile Rank} = \frac{\text{Number of values below}}{\text{Total number of values}} \times 100 $$
```python
def percentile_rank(data, value):
    """Return the percentage of values in data that are strictly below value."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(x < value for x in data)
    return (below / len(data)) * 100

# Example: 5 of the 10 values are below 55
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
rank = percentile_rank(data, 55)
print(f"55 is at the {rank:.0f}th percentile")  # 55 is at the 50th percentile
```
Quantiles
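Quantile: A generalization of percentiles that indexes positions by a proportion p between 0 and 1 instead of a percentage.
- p-quantile: The value that divides the data so that a proportion p lies at or below it
- Percentiles are quantiles expressed as percentages (the 25th percentile is the 0.25-quantile)
$$ Q(p) = \text{value such that } P(X \leq Q(p)) = p $$
This equality can always be satisfied for a continuous distribution. For discrete or sample data an exact solution may not exist, so $Q(p)$ is usually taken as the smallest value with $P(X \leq Q(p)) \geq p$ or estimated by interpolating between neighbouring observations.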
```python
import numpy as np

data = np.random.normal(100, 15, 1000)

# Quantiles (0 to 1)
q_25 = np.quantile(data, 0.25)  # Same as 25th percentile
q_50 = np.quantile(data, 0.50)  # Median
q_75 = np.quantile(data, 0.75)  # Same as 75th percentile

# Percentiles (0 to 100)
p_25 = np.percentile(data, 25)
p_50 = np.percentile(data, 50)
p_75 = np.percentile(data, 75)

# They're the same!
assert q_25 == p_25
```
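For sample data, a quantile usually falls between two observations and has to be estimated. The sketch below shows one common convention, linear interpolation between neighbouring sorted values, which is also NumPy's default method. The helper name `quantile_linear` is hypothetical, and other conventions exist and give slightly different answers.

```python
import math
import numpy as np

def quantile_linear(values, p):
    """Estimate the p-quantile (0 <= p <= 1) by linear interpolation (illustrative helper)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    h = (len(ordered) - 1) * p           # fractional position in the sorted data
    lo, hi = math.floor(h), math.ceil(h)
    frac = h - lo                        # how far h sits between the two neighbours
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Position 9 * 0.25 = 2.25, so the estimate is 30 + 0.25 * (40 - 30) = 32.5
print(quantile_linear(data, 0.25))

# Agrees with NumPy's default (linear) method
assert np.isclose(quantile_linear(data, 0.25), np.quantile(data, 0.25))
```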
Applications
1. Growth Charts
Children's height/weight percentiles show where a child falls relative to peers.
2. Test Scores
SAT/GRE scores are often reported as percentiles.
3. Income Distribution
Median income = 50th percentile of income distribution.
4. Performance Metrics
Website load time: "95th percentile < 2 seconds" means 95% of requests complete in under 2 seconds; the slowest 5% may take longer.
5. Outlier Detection
Values more than 1.5×IQR below Q1 or above Q3 are potential outliers.
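The performance-metrics application above can be sketched as follows. The load times are simulated from a log-normal distribution, which is purely an assumption made to get a long right tail, and the 2-second target is taken from the example wording.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for repeatability (assumption)
# Hypothetical load times in seconds; the distribution and its parameters are made up
load_times = rng.lognormal(mean=-0.5, sigma=0.5, size=5_000)

p50, p95, p99 = np.percentile(load_times, [50, 95, 99])
share_under_p95 = float(np.mean(load_times < p95))  # fraction of requests faster than p95
meets_target = bool(p95 < 2.0)                      # the "95th percentile < 2 seconds" check

print(f"p50 = {p50:.2f}s, p95 = {p95:.2f}s, p99 = {p99:.2f}s")
print(f"Share of requests faster than p95: {share_under_p95:.1%}")
print(f"Meets 2-second p95 target: {meets_target}")
```

Tail percentiles such as p95 and p99 are often tracked alongside the median because a small share of very slow requests barely moves the median.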
Percentiles vs Mean
| Measure | Pros | Cons | Best For |
|---|---|---|---|
| Mean | Uses all data; mathematically convenient | Sensitive to outliers; can be misleading for skewed data | Symmetric distributions; no outliers |
| Median (50th percentile) | Robust to outliers; represents "typical" value | Ignores magnitude of extremes; less efficient statistically | Skewed distributions; data with outliers |
Key Insight:
- Right-skewed data: typically Mean > Median (the mean is pulled right by the long upper tail)
- Left-skewed data: typically Mean < Median (the mean is pulled left by the long lower tail)
- Symmetric data: Mean ≈ Median
Example: Income distribution is right-skewed, so median income is more representative than mean income.
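A small worked example makes the gap concrete. The incomes below are made-up values with one very high earner, chosen only to illustrate right skew.

```python
import statistics

# Hypothetical annual incomes; the last value is a deliberate high outlier
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 1_000_000]

mean_income = statistics.mean(incomes)      # 1,200,000 / 6 = 200,000
median_income = statistics.median(incomes)  # average of 40,000 and 45,000 = 42,500

print(f"Mean:   {mean_income:,.0f}")
print(f"Median: {median_income:,.0f}")
```

Five of the six people earn far less than the mean, while the median sits in the middle of the typical incomes.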