Percentiles and Quantiles

Visual guide to percentiles, quartiles, and quantiles for understanding data distribution.


What are Percentiles?

Percentile: The value below which a given percentage of observations fall.

  • 50th percentile = Median (half the data is below)
  • 25th percentile = First quartile (Q1)
  • 75th percentile = Third quartile (Q3)

Key Concept: The shaded area under a distribution curve up to a percentile line represents the percentage of data below that value.
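
The key concept can be checked numerically. The sketch below is only an illustration: the sample, the seed, and the helper `fraction_at_or_below` are assumptions made for the example. It draws values from a normal distribution and confirms that roughly p% of them lie at or below the p-th percentile.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
# Hypothetical sample: 10,000 draws from a normal distribution
data = [random.gauss(100, 15) for _ in range(10_000)]

# With n=100, statistics.quantiles returns the 99 cut points
# that correspond to the 1st through 99th percentiles
cuts = statistics.quantiles(data, n=100)

def fraction_at_or_below(values, threshold):
    """Share of observations less than or equal to threshold."""
    return sum(v <= threshold for v in values) / len(values)

for p in (25, 50, 75, 90):
    threshold = cuts[p - 1]  # cuts[0] is the 1st percentile
    share = fraction_at_or_below(data, threshold)
    print(f"{p}th percentile = {threshold:.1f}, share at or below = {share:.3f}")
```

Each printed share should be close to p/100, which is the "shaded area" read off a sample rather than a curve.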


Box Plot Representation

A box plot (box-and-whisker plot) visualizes the five-number summary and quartiles.

     Min          Q1        Median      Q3          Max
      |           |           |          |           |
      |-----------|===========|==========|-----------|
                     ← IQR →
                  (middle 50%)

Components:

  • Box: Contains middle 50% of data (from Q1 to Q3)
  • Line inside box: Median (Q2)
  • Whiskers: Extend to the minimum and maximum, or, when outliers are shown separately, to the most extreme data points within 1.5×IQR of the box
  • IQR (Interquartile Range): Q3 - Q1 (measure of spread)
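
As a rough sketch of how these components can be computed, the example below uses Python's standard `statistics` module on a made-up sample. It assumes the convention in which whiskers stop at the most extreme points within 1.5×IQR of the box; the data and variable names are hypothetical.

```python
import statistics

# Hypothetical sample, chosen so one value lies far from the rest
data = sorted([12, 15, 17, 18, 20, 21, 23, 24, 26, 60])

# method="inclusive" interpolates linearly between data points and
# treats the sample minimum and maximum as the 0th and 100th percentiles
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points still inside the fences
whisker_low = min(x for x in data if x >= lower_fence)
whisker_high = max(x for x in data if x <= upper_fence)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Box: {q1} to {q3}, median {median}, IQR {iqr}")
print(f"Whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Here the box spans 17.25 to 23.75, the whiskers reach 12 and 26, and 60 is drawn as a separate outlier point.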


Common Percentiles

| Percentile | Name | Meaning |
|---|---|---|
| 0th | Minimum | Smallest value |
| 25th | Q1 (First Quartile) | 25% below, 75% above |
| 50th | Q2 (Median) | Middle value |
| 75th | Q3 (Third Quartile) | 75% below, 25% above |
| 100th | Maximum | Largest value |

Special Percentiles

  • Deciles: 10th, 20th, ..., 90th (divide into 10 parts)
  • Quintiles: 20th, 40th, 60th, 80th (divide into 5 parts)
  • Quartiles: 25th, 50th, 75th (divide into 4 parts)
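
One way to see that these are all the same operation with a different number of parts is to compute them with `statistics.quantiles`, as in the sketch below. The sample is hypothetical, and the choice of `method="inclusive"` is an assumption made for the example.

```python
import statistics

data = list(range(1, 101))  # hypothetical sample: the integers 1..100

# Dividing into n parts needs n - 1 cut points
quartiles = statistics.quantiles(data, n=4, method="inclusive")
quintiles = statistics.quantiles(data, n=5, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")

print(len(quartiles), len(quintiles), len(deciles))  # 3 4 9

# The 2nd quartile and the 5th decile are both the median
print(quartiles[1], deciles[4])
```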

Interquartile Range (IQR)

$$ \text{IQR} = Q_3 - Q_1 $$

Use: Measure of spread that's robust to outliers.

Outlier Detection: Values outside the following fences are flagged as potential outliers.

  • Lower fence: $Q_1 - 1.5 \times \text{IQR}$
  • Upper fence: $Q_3 + 1.5 \times \text{IQR}$

import numpy as np

rng = np.random.default_rng(42)   # seeded so the run is reproducible
data = rng.normal(100, 15, 1000)  # 1,000 draws: mean 100, standard deviation 15

# Quartiles
q1 = np.percentile(data, 25)
q2 = np.percentile(data, 50)  # median
q3 = np.percentile(data, 75)

iqr = q3 - q1

# Fences for outlier detection
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Boolean mask keeps only the values outside the fences
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(f"Found {len(outliers)} outliers")

Percentile Rank

Percentile Rank: The percentage of scores that fall below a given value.

$$ \text{Percentile Rank} = \frac{\text{Number of values below}}{\text{Total number of values}} \times 100 $$

def percentile_rank(data, value):
    """Return the percentage of values in data strictly below value."""
    below = sum(x < value for x in data)
    return (below / len(data)) * 100

# Example: five of the ten values (10 through 50) are below 55
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
rank = percentile_rank(data, 55)
print(f"55 is at the {rank:.0f}th percentile")  # 55 is at the 50th percentile
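
The formula above counts only values strictly below the target, so tied values do not contribute. A common alternative, shown here purely as an illustrative sketch, counts each tie as half below; the function name `percentile_rank_midpoint` and the scores are hypothetical.

```python
def percentile_rank_midpoint(data, value):
    """Percentile rank that counts each tied value as half below."""
    below = sum(x < value for x in data)
    equal = sum(x == value for x in data)
    return (below + 0.5 * equal) / len(data) * 100

scores = [70, 80, 80, 80, 90]  # hypothetical scores with a three-way tie

# Strictly-below convention (the formula above): only 70 counts
strict = sum(x < 80 for x in scores) / len(scores) * 100
# Midpoint convention: 70 plus half of the three 80s
midpoint = percentile_rank_midpoint(scores, 80)

print(strict, midpoint)  # 20.0 50.0
```

With heavily tied data the two conventions can differ a lot, so it helps to state which one a reported percentile rank uses.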

Quantiles

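Quantile: A generalization of percentiles that indexes a position in the distribution by a proportion p between 0 and 1 rather than by a percentage.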

  • p-quantile: Value that divides data so proportion p is below it
  • Percentiles are quantiles expressed as percentages

$$ Q(p) = \inf\{x : P(X \leq x) \geq p\} $$

For a continuous distribution, this is the smallest value satisfying $P(X \leq Q(p)) = p$.
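
For a finite sample, the p-quantile usually falls between two data points, so some interpolation rule is needed. The sketch below shows one such rule, linear interpolation between neighbouring sorted values, which corresponds to the default `linear` method of `np.quantile`. The function name `quantile_linear` and the sample are hypothetical, and other conventions give slightly different answers.

```python
import math

def quantile_linear(data, p):
    """Empirical p-quantile via linear interpolation between sorted values."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(data)
    h = (len(ordered) - 1) * p  # fractional index into the sorted data
    lo, hi = math.floor(h), math.ceil(h)
    # Blend the two neighbouring values by the fractional part of h
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])

data = [10, 20, 30, 40, 50]  # hypothetical sample
print(quantile_linear(data, 0.5))            # 30.0
print(quantile_linear(data, 0.25))           # 20.0
print(f"{quantile_linear(data, 0.9):.1f}")   # 46.0
```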

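import numpy as np

rng = np.random.default_rng(42)   # seeded so the run is reproducible
data = rng.normal(100, 15, 1000)

# Quantiles take a proportion from 0 to 1
q_25 = np.quantile(data, 0.25)  # same as the 25th percentile
q_50 = np.quantile(data, 0.50)  # median
q_75 = np.quantile(data, 0.75)  # same as the 75th percentile

# Percentiles take a percentage from 0 to 100
p_25 = np.percentile(data, 25)
p_50 = np.percentile(data, 50)
p_75 = np.percentile(data, 75)

# They're the same! Compare with a tolerance rather than exact float equality
assert np.isclose(q_25, p_25)
assert np.isclose(q_50, p_50)
assert np.isclose(q_75, p_75)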

Applications

1. Growth Charts

Children's height/weight percentiles show where a child falls relative to peers.

2. Test Scores

SAT/GRE scores are often reported as percentiles.

3. Income Distribution

Median income is the 50th percentile of the income distribution.

4. Performance Metrics

Website load time: "95th percentile < 2 seconds" means 95% of requests complete in under 2 seconds, while the slowest 5% may take longer.

5. Outlier Detection

Values more than 1.5×IQR below Q1 or above Q3 are potential outliers.
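
To make the performance-metrics reading concrete, here is a small sketch that checks a 95th-percentile target against made-up latencies. The data, the 2-second target, and the variable names are all assumptions for illustration.

```python
import statistics

# Hypothetical request latencies in seconds
latencies = [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.6, 0.7,
             0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.1, 1.2, 1.5, 4.8]

target = 2.0  # hypothetical service-level target in seconds

# n=20 yields 19 cut points at 5%, 10%, ..., 95%; the last one is p95
p95 = statistics.quantiles(latencies, n=20, method="inclusive")[-1]
meets_target = p95 < target

# Share of requests that individually finished under the target
share_under_target = sum(t < target for t in latencies) / len(latencies)

print(f"p95 = {p95:.3f}s, meets target: {meets_target}")
print(f"Share under {target}s: {share_under_target:.0%}")
```

The single 4.8-second request does not break the target, because a percentile target deliberately tolerates a small share of slow requests.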


Percentiles vs Mean

| Measure | Pros | Cons | Best For |
|---|---|---|---|
| Mean | Uses all data; mathematically convenient | Sensitive to outliers; can be misleading for skewed data | Symmetric distributions; no outliers |
| Median (50th percentile) | Robust to outliers; represents "typical" value | Ignores magnitude of extremes; less efficient statistically | Skewed distributions; data with outliers |

Key Insight:

  • Right-skewed data: Mean > Median, typically (the long right tail pulls the mean up)
  • Left-skewed data: Mean < Median, typically (the long left tail pulls the mean down)
  • Symmetric data: Mean ≈ Median

Example: Income distribution is right-skewed, so median income is more representative than mean income.
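
A tiny made-up dataset illustrates the income example. The figures below are hypothetical and chosen only to show how one very large value moves the mean but not the median.

```python
import statistics

# Hypothetical annual incomes in thousands; one very high earner skews the data right
incomes = [30, 35, 40, 45, 50, 55, 60, 1000]

mean_income = statistics.mean(incomes)      # 164.375
median_income = statistics.median(incomes)  # 47.5

# How many people earn less than the mean?
below_mean = sum(x < mean_income for x in incomes)

print(f"Mean: {mean_income}, median: {median_income}")
print(f"{below_mean} of {len(incomes)} incomes are below the mean")
```

Seven of the eight incomes sit below the mean, so the median is the better description of a typical earner here.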

