Database Performance & SQL vs NoSQL

Common database performance issues, their solutions, and a SQL vs NoSQL comparison.


Common Performance Issues

1. Missing Indexes

```sql
-- ❌ Problem: Full table scan
SELECT * FROM users WHERE email = 'test@example.com';

-- ✅ Solution: Add an index
CREATE INDEX idx_email ON users(email);

-- Verify index usage
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
```

2. N+1 Queries

```python
# ❌ Problem: N+1 queries (1 query for users, then 1 per user)
users = User.query.all()
for user in users:
    orders = Order.query.filter_by(user_id=user.id).all()  # N queries

# ✅ Solution: Use a JOIN or eager loading
from sqlalchemy.orm import joinedload

users = User.query.options(joinedload(User.orders)).all()

# SQL equivalent:
# SELECT u.*, o.*
# FROM users u
# LEFT JOIN orders o ON u.id = o.user_id;
```

3. SELECT *

```sql
-- ❌ Problem: Fetching unnecessary data
SELECT * FROM users;

-- ✅ Solution: Select only needed columns
SELECT id, username, email FROM users;
```

4. Lack of Query Optimization

```sql
-- ❌ Problem: Applying a function to the column defeats any index on it
SELECT * FROM orders 
WHERE YEAR(created_at) = 2024;

-- ✅ Solution: Use a range query (can use an index on created_at)
SELECT * FROM orders 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2025-01-01';
```

5. No Connection Pooling

```python
import psycopg2
from psycopg2 import pool

# ❌ Problem: Creating a new connection on every call
def get_data():
    conn = psycopg2.connect(...)
    ...  # run query
    conn.close()

# ✅ Solution: Reuse connections from a pool
connection_pool = pool.SimpleConnectionPool(1, 20, ...)

def get_data():
    conn = connection_pool.getconn()
    try:
        ...  # run query
    finally:
        connection_pool.putconn(conn)
```

Index Strategies

Types of Indexes

```sql
-- PostgreSQL index types

-- B-Tree index (default; good for equality and range queries)
CREATE INDEX idx_age ON users(age);

-- Hash index (equality only; can beat B-Tree for simple lookups)
CREATE INDEX idx_email_hash ON users USING HASH(email);

-- GIN index (for arrays, JSONB, full-text search)
CREATE INDEX idx_tags ON posts USING GIN(tags);

-- GiST index (for geometric data, full-text search)
CREATE INDEX idx_location ON places USING GIST(location);

-- Partial index (index only a subset of rows)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- Expression index (matches queries that filter on LOWER(email))
CREATE INDEX idx_lower_email ON users(LOWER(email));

-- Composite index (column order matters: queries must use the leading columns)
CREATE INDEX idx_name_email ON users(last_name, first_name, email);
```

When to Use Indexes

✅ Use indexes for:
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
- Foreign keys
- Columns used in GROUP BY

❌ Don't over-index:
- Small tables (< 1000 rows)
- Columns with low cardinality (few distinct values)
- Frequently updated columns
- Wide columns (large text/blob)

Query Optimization

Use EXPLAIN

```sql
-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.username, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.username;

-- Look for:
-- - Seq Scan (often bad on large tables) vs Index Scan (good)
-- - High cost numbers
-- - Row estimates that diverge from actual rows
```

Optimize JOINs

```sql
-- ❌ Inefficient: chained LEFT JOINs multiply rows when you only need aggregates
SELECT *
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
LEFT JOIN order_items oi ON o.id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.id;

-- ✅ Better: pre-aggregate in a CTE or subquery
WITH user_orders AS (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
)
SELECT u.*, uo.order_count
FROM users u
LEFT JOIN user_orders uo ON u.id = uo.user_id;
```

Use Materialized Views

```sql
-- Create materialized view
CREATE MATERIALIZED VIEW user_stats AS
SELECT 
    u.id,
    u.username,
    COUNT(o.id) AS order_count,
    SUM(o.total) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.username;

-- Refresh periodically (add CONCURRENTLY, with a unique index, to avoid blocking reads)
REFRESH MATERIALIZED VIEW user_stats;

-- Queries against the view are now fast
SELECT * FROM user_stats WHERE order_count > 10;
```

SQL vs NoSQL Comparison

Feature Comparison

| Feature | SQL | NoSQL |
|---|---|---|
| Schema | Fixed, predefined | Flexible, dynamic |
| Scalability | Vertical (scale up) | Horizontal (scale out) |
| Transactions | ACID guaranteed | Eventually consistent (usually) |
| Joins | Native support | Limited or application-level |
| Query language | SQL (standardized) | Database-specific |
| Use case | Complex queries, relationships | High throughput, flexible schema |
| Data integrity | Strong | Weaker (trade-off for performance) |
| Learning curve | Moderate | Varies by database |

When to Use SQL

✅ Use SQL when:
- Data has clear relationships
- You need ACID transactions
- Queries are complex, with JOINs
- Data integrity is critical
- The schema is stable
- You need strong consistency

Examples:
- Banking systems
- E-commerce transactions
- CRM systems
- ERP systems

When to Use NoSQL

✅ Use NoSQL when:
- The schema changes frequently
- You need horizontal scaling
- Write throughput is high
- Eventual consistency is acceptable
- Data is document-oriented or graph-based
- You need a flexible data model

Examples:
- Real-time analytics
- Content management
- Social networks
- IoT data
- Caching layers

Hybrid Approach

Many applications use both:

PostgreSQL (SQL):
- User accounts
- Orders
- Transactions
- Financial data

MongoDB (NoSQL):
- Product catalog
- User preferences
- Session data
- Logs

Redis (Cache):
- Session storage
- Rate limiting
- Real-time data

Neo4j (Graph):
- Social connections
- Recommendations
- Fraud detection

Database-Specific Optimizations

PostgreSQL

```sql
-- Vacuum (reclaim space and refresh planner statistics)
VACUUM ANALYZE users;

-- Increase work_mem for complex queries (per session)
SET work_mem = '256MB';

-- Allow parallel query workers
SET max_parallel_workers_per_gather = 4;

-- Connection pooling: use an external tool such as pgBouncer

-- Partitioning
CREATE TABLE orders (
    id SERIAL,
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);
```

MySQL

```sql
-- Optimize table
OPTIMIZE TABLE users;

-- Analyze table
ANALYZE TABLE users;

-- Query cache (deprecated in MySQL 5.7, removed in MySQL 8.0)
SET GLOBAL query_cache_size = 1048576;

-- InnoDB buffer pool
SET GLOBAL innodb_buffer_pool_size = 2147483648;  -- 2 GB
```

MongoDB

```javascript
// Create indexes
db.users.createIndex({ email: 1 })

// Compound index
db.users.createIndex({ last_name: 1, first_name: 1 })

// Use the aggregation pipeline efficiently
db.orders.aggregate([
    { $match: { status: 'completed' } },  // filter early
    { $group: { _id: '$userId', total: { $sum: '$amount' } } }
])

// Use projection to limit returned fields
db.users.find({}, { name: 1, email: 1, _id: 0 })

// Enable profiling
db.setProfilingLevel(2)  // log all operations
db.system.profile.find().sort({ ts: -1 }).limit(5)
```

Monitoring & Tools

PostgreSQL

```sql
-- Active queries
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE state = 'active';

-- Table sizes
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Index usage (low idx_scan suggests an unused index)
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Cache hit ratio (NULLIF avoids division by zero on idle databases)
SELECT 
    sum(heap_blks_read) AS heap_read,
    sum(heap_blks_hit)  AS heap_hit,
    sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
FROM pg_statio_user_tables;
```

MySQL

```sql
-- Show processlist
SHOW FULL PROCESSLIST;

-- Slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

-- Table status
SHOW TABLE STATUS;

-- Index statistics
SHOW INDEX FROM users;
```

MongoDB

```javascript
// Current operations
db.currentOp()

// Database stats
db.stats()

// Collection stats
db.users.stats()

// Explain query
db.users.find({ email: 'test@example.com' }).explain('executionStats')
```

Best Practices Summary

✅ DO:
- Use indexes on frequently queried columns
- Use connection pooling
- Select only the columns you need
- Use EXPLAIN to analyze queries
- Use appropriate data types
- Normalize (SQL) or denormalize (NoSQL) as the model requires
- Monitor slow queries
- Use caching (Redis, Memcached)
- Use read replicas for read-heavy workloads
- Partition large tables

❌ DON'T:
- Use SELECT *
- Create too many indexes
- Ignore query performance
- Store large BLOBs in the database
- Use an ORM without understanding the SQL it generates
- Ignore connection limits
- Skip backups
- Store sessions in the primary database (use Redis instead)
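"Monitor slow queries" can start in the application layer before any database-side log is configured. A minimal sketch of a timing decorator (the threshold and function names are illustrative, not a specific library's API):

```python
import functools
import logging
import time

logger = logging.getLogger("slow_query")

SLOW_THRESHOLD = 0.5  # seconds; tune per application

def log_if_slow(func):
    """Log a warning whenever the wrapped query function exceeds the threshold."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            if elapsed > SLOW_THRESHOLD:
                logger.warning("slow query %s took %.3fs", func.__name__, elapsed)
    return wrapper

@log_if_slow
def fetch_report():
    time.sleep(0.6)  # stands in for a slow database call
    return "report"
```

Database-side slow-query logs (e.g. MySQL's `slow_query_log` above) remain the authoritative source; this only catches what the application itself issues.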
