Percentiles and Quantiles

Dec 12, 2024 · 4 min read · statistics percentiles quantiles quartiles descriptive-statistics

Visual guide to percentiles, quartiles, and quantiles for understanding data distribution.

What are Percentiles?

Percentile: The value below which a given percentage of observations fall.

50th percentile = Median (half the data is below)
25th percentile = First quartile (Q1)
75th percentile = Third quartile (Q3)

Key Concept: The shaded area under a distribution curve up to a percentile line represents the percentage of data below that value.
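
As a rough illustration of this idea, the sketch below converts between a percentile and its value using `statistics.NormalDist` from Python's standard library. The normal distribution with mean 100 and standard deviation 15 is an assumption made purely for the example.

```python
from statistics import NormalDist

# Assumed distribution for illustration: normal, mean 100, std dev 15
dist = NormalDist(mu=100, sigma=15)

# Value at the 75th percentile: the point with 75% of the area to its left
value_at_p75 = dist.inv_cdf(0.75)

# Going the other way: the area under the curve to the left of that value
area_below = dist.cdf(value_at_p75)

print(f"75th percentile value: {value_at_p75:.2f}")
print(f"Area below that value: {area_below:.2%}")
```

`inv_cdf` maps a proportion to a value and `cdf` maps a value back to a proportion, so the two calls undo each other.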

Box Plot Representation

A box plot (box-and-whisker plot) visualizes the five-number summary and quartiles.

Min         Q1        Median       Q3          Max
 |           |           |          |           |
 |-----------|===========|==========|-----------|
                     ← IQR →
                   (middle 50%)

Components:

Box: Contains middle 50% of data (from Q1 to Q3)
Line inside box: Median (Q2)
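Whiskers: Extend to the minimum and maximum, or, in the common outlier-aware variant, to the most extreme data points within 1.5×IQR of the box (points beyond that are drawn individually as outliers)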
IQR (Interquartile Range): Q3 - Q1 (measure of spread)
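
The numbers behind a box plot can be computed directly. The sketch below is one possible way to do it with NumPy's default (linear interpolation) percentile method; the helper name `five_number_summary` and the sample data are made up for illustration.

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using NumPy's default linear interpolation."""
    minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])
    return minimum, q1, median, q3, maximum

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data
minimum, q1, median, q3, maximum = five_number_summary(sample)
iqr = q3 - q1  # width of the box

print(minimum, q1, median, q3, maximum, iqr)
```

For this sample the box runs from 3 to 7 with the median line at 5, so the IQR is 4.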

Common Percentiles

Percentile	Name	Meaning
0th	Minimum	Smallest value
25th	Q1 (First Quartile)	25% below, 75% above
50th	Q2 (Median)	Middle value
75th	Q3 (Third Quartile)	75% below, 25% above
100th	Maximum	Largest value

Special Percentiles

Deciles: 10th, 20th, ..., 90th (divide into 10 parts)
Quintiles: 20th, 40th, 60th, 80th (divide into 5 parts)
Quartiles: 25th, 50th, 75th (divide into 4 parts)
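
All of these are the same operation with a different number of parts. A minimal sketch, assuming NumPy's default interpolation and a hypothetical helper called `cut_points`:

```python
import numpy as np

def cut_points(values, parts):
    """Return the (parts - 1) values that split the data into `parts` equal-sized groups."""
    probs = np.arange(1, parts) / parts  # e.g. parts=4 -> 0.25, 0.5, 0.75
    return np.quantile(values, probs)

data = np.arange(101)  # 0, 1, ..., 100 (chosen so the cut points are easy to read)

quartiles = cut_points(data, 4)
quintiles = cut_points(data, 5)
deciles = cut_points(data, 10)

print("Quartiles:", quartiles)
print("Quintiles:", quintiles)
print("Deciles:  ", deciles)
```

Note that dividing data into k parts always takes k - 1 cut points, which is why there are three quartiles and nine deciles.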

Interquartile Range (IQR)

$$ \text{IQR} = Q_3 - Q_1 $$

Use: Measure of spread that's robust to outliers.

Outlier Detection:

Lower fence: $Q_1 - 1.5 \times \text{IQR}$
Upper fence: $Q_3 + 1.5 \times \text{IQR}$

import numpy as np

data = np.random.normal(100, 15, 1000)

# Calculate percentiles
q1 = np.percentile(data, 25)
q2 = np.percentile(data, 50)  # median
q3 = np.percentile(data, 75)

iqr = q3 - q1

# Outlier detection
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]
print(f"Found {len(outliers)} outliers")

Percentile Rank

Percentile Rank: The percentage of scores that fall below a given value.

$$ \text{Percentile Rank} = \frac{\text{Number of values below}}{\text{Total number of values}} \times 100 $$

def percentile_rank(data, value):
    """Return the percentage of values in data that are strictly below value."""
    below = sum(x < value for x in data)
    return (below / len(data)) * 100

# Example
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
rank = percentile_rank(data, 55)
print(f"55 is at the {rank:.0f}th percentile")  # 5 of 10 values are below 55, so the 50th
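
The formula above counts only values strictly below the score, which matters when the data contains ties. The sketch below contrasts that with a "less than or equal" variant. Both function names are hypothetical, and neither convention is claimed to be the standard one; sources differ on this.

```python
import numpy as np

def percentile_rank_strict(values, score):
    """Percentage of values strictly below score (the formula used above)."""
    return float(np.mean(np.asarray(values) < score) * 100)

def percentile_rank_weak(values, score):
    """Percentage of values less than or equal to score."""
    return float(np.mean(np.asarray(values) <= score) * 100)

scores = [10, 20, 20, 30]  # made-up data with a tie at 20

strict = percentile_rank_strict(scores, 20)  # 1 of 4 values is below 20
weak = percentile_rank_weak(scores, 20)      # 3 of 4 values are at or below 20

print(f"strict: {strict}, weak: {weak}")
```

With tied values the two conventions can disagree substantially (25 versus 75 here), so it is worth stating which one a reported percentile rank uses.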

Quantiles

Quantile: Generalization of percentiles.

p-quantile: Value that divides data so proportion p is below it
Percentiles are quantiles expressed as percentages

$$ Q(p) = \inf\{x : P(X \leq x) \geq p\}, \quad 0 < p < 1 $$

For a continuous distribution this reduces to the value satisfying $P(X \leq Q(p)) = p$.

import numpy as np

data = np.random.normal(100, 15, 1000)

# Quantiles take a proportion between 0 and 1
q_25 = np.quantile(data, 0.25)  # Same as 25th percentile
q_50 = np.quantile(data, 0.50)  # Median
q_75 = np.quantile(data, 0.75)  # Same as 75th percentile

# Percentiles take a percentage between 0 and 100
p_25 = np.percentile(data, 25)
p_50 = np.percentile(data, 50)
p_75 = np.percentile(data, 75)

# They're the same! (isclose avoids relying on exact float equality)
assert np.isclose(q_25, p_25)
assert np.isclose(q_50, p_50)
assert np.isclose(q_75, p_75)
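
On a finite sample, the quantile definition can be applied literally (pick the smallest data point with at least a proportion p of the data at or below it) or smoothed by interpolating between neighbouring points, which is what `np.quantile` does by default. The sketch below compares the two. `empirical_quantile` is a hypothetical helper written for this example, not a NumPy function.

```python
import math
import numpy as np

def empirical_quantile(values, p):
    """Smallest sample value x such that at least a proportion p of the data is <= x."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    k = max(math.ceil(len(ordered) * p), 1)  # 1-based rank of the chosen value
    return ordered[k - 1]

sample = [1, 2, 3, 4]  # made-up data with an even number of points

step_median = empirical_quantile(sample, 0.5)     # picks an actual data point
interp_median = float(np.quantile(sample, 0.5))   # interpolates between 2 and 3

print(step_median, interp_median)
```

Here the literal definition returns 2 while the interpolated result is 2.5. Recent NumPy versions also expose alternative conventions through a `method` argument on `np.quantile`, but check the documentation for your installed version before relying on it.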

Applications

1. Growth Charts

Children's height/weight percentiles show where a child falls relative to peers.

2. Test Scores

SAT/GRE scores often reported as percentiles.

3. Income Distribution

Median income = 50th percentile of income distribution.

4. Performance Metrics

Website load time: "95th percentile < 2 seconds" means 95% of requests complete in under 2 seconds; the slowest 5% may take longer.
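
A check like this could be sketched as follows. The load times are simulated from a log-normal distribution, and both that choice and the 2-second target are assumptions for illustration rather than real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated load times in seconds (assumed log-normal, not real measurements)
load_times = rng.lognormal(mean=-0.5, sigma=0.6, size=10_000)

p50, p95, p99 = np.percentile(load_times, [50, 95, 99])

slo_seconds = 2.0  # hypothetical target
meets_slo = p95 < slo_seconds
share_under_slo = np.mean(load_times < slo_seconds)

print(f"p50={p50:.2f}s  p95={p95:.2f}s  p99={p99:.2f}s")
print(f"Meets '95th percentile < {slo_seconds}s': {meets_slo}")
print(f"Share of requests under {slo_seconds}s: {share_under_slo:.1%}")
```

Tail percentiles such as p95 and p99 are often reported alongside the median because a small share of very slow requests barely moves the median.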

5. Outlier Detection

Values more than 1.5×IQR below Q1 or above Q3 are potential outliers.

Percentiles vs Mean

Measure	Pros	Cons	Best For
Mean	Uses all data; mathematically convenient	Sensitive to outliers; can be misleading for skewed data	Symmetric distributions; no outliers
Median (50th percentile)	Robust to outliers; represents "typical" value	Ignores magnitude of extremes; less efficient statistically	Skewed distributions; data with outliers

Key Insight:

Right-skewed data: typically Mean > Median (the long right tail pulls the mean up)
Left-skewed data: typically Mean < Median (the long left tail pulls the mean down)
Symmetric data: Mean ≈ Median

Example: Income distribution is right-skewed, so median income is more representative than mean income.
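
A small made-up example shows the effect: adding a single very high income shifts the mean sharply while the median barely moves.

```python
from statistics import mean, median

# Made-up incomes in thousands, for illustration only
incomes = [30, 35, 40, 45, 50]
with_outlier = incomes + [1000]  # add one very high earner

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"Without outlier: mean={mean_before}, median={median_before}")
print(f"With outlier:    mean={mean_after}, median={median_after}")
```

The mean jumps from 40 to 200, while the median only moves from 40 to 42.5.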
