Database Performance & SQL vs NoSQL
Database performance issues, solutions, and SQL vs NoSQL comparison.
Common Performance Issues
1. Missing Indexes
-- ❌ Problem: Full table scan
SELECT * FROM users WHERE email = 'test@example.com';

-- ✅ Solution: Add an index
CREATE INDEX idx_email ON users(email);

-- Verify index usage
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
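If the query only needs a few columns, a covering index can answer it entirely from the index (an index-only scan). A minimal sketch, assuming PostgreSQL 11+ and the same users table; the index name is illustrative:

-- Covering index: store username alongside the email key
CREATE INDEX idx_email_covering ON users (email) INCLUDE (username);

-- This query can now be served from the index alone (index-only scan)
EXPLAIN SELECT email, username FROM users WHERE email = 'test@example.com';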
2. N+1 Queries
# ❌ Problem: N+1 queries
users = User.query.all()
for user in users:
    orders = Order.query.filter_by(user_id=user.id).all()  # one query per user

# ✅ Solution: Use a JOIN or eager loading
from sqlalchemy.orm import joinedload

users = User.query.options(joinedload(User.orders)).all()

# SQL equivalent:
#   SELECT u.*, o.*
#   FROM users u
#   LEFT JOIN orders o ON u.id = o.user_id;
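A join is not the only fix: many ORMs also offer "select in" style eager loading, which issues one extra batched query instead of one query per row. A rough sketch of the underlying SQL, assuming the same users and orders tables (the id list is illustrative):

-- One query for the users...
SELECT id, username FROM users;

-- ...then one batched query for all of their orders (instead of N separate queries)
SELECT * FROM orders WHERE user_id IN (1, 2, 3, 4, 5);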
3. SELECT *
-- ❌ Problem: Fetching unnecessary data
SELECT * FROM users;

-- ✅ Solution: Select only needed columns
SELECT id, username, email FROM users;
4. Lack of Query Optimization
-- ❌ Problem: Inefficient query (a function on the indexed column prevents index use)
SELECT * FROM orders
WHERE YEAR(created_at) = 2024;

-- ✅ Solution: Use a range query (can use an index)
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at < '2025-01-01';
5. No Connection Pooling
# ❌ Problem: Creating a new connection for every request
import psycopg2

def get_data():
    conn = psycopg2.connect(...)
    # run query
    conn.close()

# ✅ Solution: Use a connection pool
from psycopg2 import pool

connection_pool = pool.SimpleConnectionPool(1, 20, ...)

def get_data():
    conn = connection_pool.getconn()
    try:
        ...  # run query
    finally:
        connection_pool.putconn(conn)
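Pool sizing also has to respect the server's connection limit. In PostgreSQL you can check both sides with a couple of queries (a quick sketch; the numbers you see depend on your configuration):

-- Current connections grouped by state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

-- Server-wide limit the pool must stay under
SHOW max_connections;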
Index Strategies
Types of Indexes
-- B-Tree Index (default; good for equality and range queries)
CREATE INDEX idx_age ON users(age);

-- Hash Index (equality only; can be faster than B-Tree for lookups)
CREATE INDEX idx_email_hash ON users USING HASH(email);

-- GIN Index (for arrays, JSONB, full-text search)
CREATE INDEX idx_tags ON posts USING GIN(tags);

-- GiST Index (for geometric data, full-text search)
CREATE INDEX idx_location ON places USING GIST(location);

-- Partial Index (index a subset of rows)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- Expression Index
CREATE INDEX idx_lower_email ON users(LOWER(email));

-- Composite Index (multiple columns)
CREATE INDEX idx_name_email ON users(last_name, first_name, email);
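Column order in a composite index matters: a B-Tree index is only used efficiently when the query filters on a leftmost prefix of its columns. A sketch using the idx_name_email index above (names and values are illustrative):

-- ✅ Uses idx_name_email (leftmost prefix: last_name, or last_name + first_name)
SELECT * FROM users WHERE last_name = 'Smith';
SELECT * FROM users WHERE last_name = 'Smith' AND first_name = 'Anna';

-- ❌ Cannot use the index efficiently (skips the leading column)
SELECT * FROM users WHERE first_name = 'Anna';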
When to Use Indexes
✅ Use indexes for:
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
- Foreign keys
- Columns used in GROUP BY

❌ Don't over-index:
- Small tables (< 1000 rows)
- Columns with low cardinality (few distinct values)
- Frequently updated columns
- Wide columns (large text/blob)
Query Optimization
Use EXPLAIN
-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.username, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.username;

-- Look for:
-- - Seq Scan (bad on large tables) vs Index Scan (good)
-- - High cost numbers
-- - Large row estimates
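In PostgreSQL, adding the BUFFERS option shows how much data came from shared buffers versus disk, which helps separate slow I/O from a genuinely bad plan. The same query as above, as a sketch:

EXPLAIN (ANALYZE, BUFFERS)
SELECT u.username, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.username;
-- "Buffers: shared hit=..." is pages served from cache; "read=..." is pages fetched from disk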
Optimize JOINs
-- ❌ Inefficient: chaining LEFT JOINs and fetching every column (row count explodes)
SELECT *
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
LEFT JOIN order_items oi ON o.id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.id;

-- ✅ Better: aggregate first in a subquery or CTE
WITH user_orders AS (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
)
SELECT u.*, uo.order_count
FROM users u
LEFT JOIN user_orders uo ON u.id = uo.user_id;
Use Materialized Views
-- Create materialized view
CREATE MATERIALIZED VIEW user_stats AS
SELECT
    u.id,
    u.username,
    COUNT(o.id) AS order_count,
    SUM(o.total) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.username;

-- Refresh periodically
REFRESH MATERIALIZED VIEW user_stats;

-- Queries against the precomputed view are now fast
SELECT * FROM user_stats WHERE order_count > 10;
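A plain REFRESH locks the materialized view against reads while it rebuilds. PostgreSQL can refresh without blocking readers if the view has a unique index; a sketch for the user_stats view above (the index name is illustrative):

-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_user_stats_id ON user_stats (id);

-- Rebuild in the background while readers keep seeing the old data
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;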
SQL vs NoSQL Comparison
Feature Comparison
| Feature | SQL | NoSQL |
|---|---|---|
| Schema | Fixed, predefined | Flexible, dynamic |
| Scalability | Primarily vertical (scale up) | Horizontal (scale out) |
| Transactions | ACID guaranteed | Often eventually consistent |
| Joins | Native support | Limited or application-level |
| Query Language | SQL (standardized) | Database-specific |
| Use Case | Complex queries, relationships | High throughput, flexible schema |
| Data Integrity | Strong | Weaker (trade-off for performance) |
| Learning Curve | Moderate | Varies by database |
When to Use SQL
✅ Use SQL when:
- Data has clear relationships
- Need ACID transactions (see the sketch after this list)
- Complex queries with JOINs
- Data integrity is critical
- Schema is stable
- Need strong consistency

Examples:
- Banking systems
- E-commerce transactions
- CRM systems
- ERP systems
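The "need ACID transactions" case boils down to multi-statement updates that must succeed or fail as a unit. A minimal sketch, assuming a hypothetical accounts table:

-- Transfer 100 between two accounts: both updates commit, or neither does
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- or ROLLBACK; on error, leaving both balances untouched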
When to Use NoSQL
✅ Use NoSQL when:
- Schema changes frequently
- Need horizontal scaling
- High write throughput
- Eventual consistency is acceptable
- Data is document-oriented or graph-based
- Need a flexible data model

Examples:
- Real-time analytics
- Content management
- Social networks
- IoT data
- Caching layers
Hybrid Approach
Many applications use both:

PostgreSQL (SQL):
- User accounts
- Orders
- Transactions
- Financial data

MongoDB (NoSQL):
- Product catalog
- User preferences
- Session data
- Logs

Redis (Cache):
- Session storage
- Rate limiting
- Real-time data

Neo4j (Graph):
- Social connections
- Recommendations
- Fraud detection
Database-Specific Optimizations
PostgreSQL
-- Vacuum (reclaim space and update statistics)
VACUUM ANALYZE users;

-- Increase work_mem for complex queries (per session)
SET work_mem = '256MB';

-- Use parallel queries
SET max_parallel_workers_per_gather = 4;

-- Connection pooling: use an external pooler such as PgBouncer

-- Partitioning
CREATE TABLE orders (
    id SERIAL,
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);
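The parent table above only declares the partitioning scheme; rows are stored in child partitions you create per range. A sketch, assuming PostgreSQL 11+ (the DEFAULT partition catches rows outside any declared range):

-- Yearly partitions: the upper bound is exclusive
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Catch-all for anything outside the declared ranges
CREATE TABLE orders_default PARTITION OF orders DEFAULT;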
MySQL
-- Optimize table
OPTIMIZE TABLE users;

-- Analyze table (refresh index statistics)
ANALYZE TABLE users;

-- Query cache (MySQL 5.7 and earlier only; removed in MySQL 8.0)
SET GLOBAL query_cache_size = 1048576;

-- InnoDB buffer pool
SET GLOBAL innodb_buffer_pool_size = 2147483648; -- 2GB
MongoDB
// Create indexes
db.users.createIndex({ email: 1 })

// Compound index
db.users.createIndex({ last_name: 1, first_name: 1 })

// Use the aggregation pipeline efficiently
db.orders.aggregate([
  { $match: { status: 'completed' } },  // Filter early
  { $group: { _id: '$userId', total: { $sum: '$amount' } } }
])

// Use projection to limit fields
db.users.find({}, { name: 1, email: 1, _id: 0 })

// Enable profiling
db.setProfilingLevel(2)  // Log all operations
db.system.profile.find().sort({ ts: -1 }).limit(5)
Monitoring & Tools
PostgreSQL
-- Active queries
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE state = 'active';

-- Table sizes
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Index usage (a low idx_scan suggests an unused index)
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Cache hit ratio
SELECT
    sum(heap_blks_read) AS heap_read,
    sum(heap_blks_hit) AS heap_hit,
    sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;
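For monitoring slow queries over time, the pg_stat_statements extension aggregates statistics per normalized query. A sketch, assuming the extension is available and preloaded; the column names shown are for PostgreSQL 13+, where total_time was renamed total_exec_time:

-- One-time setup (also requires pg_stat_statements in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by cumulative execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;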
MySQL
-- Show processlist
SHOW FULL PROCESSLIST;

-- Slow queries
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

-- Table status
SHOW TABLE STATUS;

-- Index statistics
SHOW INDEX FROM users;
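Beyond the slow query log, MySQL's performance_schema aggregates statement statistics by digest, which helps find the heaviest query patterns. A sketch (timer columns are in picoseconds, hence the division):

-- Heaviest statement patterns by total execution time
SELECT digest_text,
       count_star AS calls,
       sum_timer_wait / 1e12 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;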
MongoDB
// Current operations
db.currentOp()

// Database stats
db.stats()

// Collection stats
db.users.stats()

// Explain query
db.users.find({ email: 'test@example.com' }).explain('executionStats')
Best Practices Summary
✅ DO:
- Use indexes on frequently queried columns
- Use connection pooling
- Select only needed columns
- Use EXPLAIN to analyze queries
- Use appropriate data types
- Normalize data (SQL) or denormalize (NoSQL) appropriately
- Monitor slow queries
- Use caching (Redis, Memcached)
- Use read replicas for read-heavy workloads
- Partition large tables

❌ DON'T:
- Use SELECT *
- Create too many indexes
- Ignore query performance
- Store large BLOBs in the database
- Use an ORM without understanding the SQL it generates
- Ignore connection limits
- Skip backups
- Use the main database for session storage (use Redis instead)
Related Snippets
- MongoDB Essentials: MongoDB queries, aggregation, and Docker setup for document-based NoSQL …
- MySQL Essentials: MySQL commands, optimization, and Docker setup …
- Neo4j Graph Database: Neo4j graph database queries, algorithms, and vector search capabilities …
- PostgreSQL Essentials & pgvector: PostgreSQL advanced features including procedures, views, pgvector for vector …
- SQL Language Basics: SQL language fundamentals, essential commands and patterns for relational …
- SQL vs NoSQL Comparison: Comprehensive comparison of SQL and NoSQL databases to help choose the right …
- SQLite Essentials: SQLite commands for embedded database applications …