Architecture Interview Questions - Medium

Medium-level software architecture interview questions covering distributed systems, event-driven architecture, and advanced patterns.

Q1: Explain Event-Driven Architecture and its benefits.

Answer:

Definition: Architecture where components communicate through events (state changes) rather than direct calls.

Core Concepts:

Components

Event: Immutable fact about something that happened

{
  "eventType": "OrderPlaced",
  "timestamp": "2025-12-13T10:30:00Z",
  "data": {
    "orderId": "12345",
    "userId": "user-789",
    "amount": 99.99
  }
}

Patterns

1. Pub/Sub (Publish-Subscribe):

Publisher → Topic → [Subscriber1, Subscriber2, Subscriber3]
  • Publishers don't know subscribers
  • Multiple subscribers can listen to same event
  • Loose coupling
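
As a rough illustration of the pub/sub idea, here is a minimal in-memory sketch. The `EventBus` class and the "orders" topic name are hypothetical, not part of any real library; a production system would typically put a message broker in this role.

```javascript
// Hypothetical in-memory event bus for illustration only; a real system
// would use a broker so that delivery survives process restarts.
class EventBus {
  constructor() {
    this.subscribers = new Map(); // topic -> array of handler functions
  }

  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) {
      this.subscribers.set(topic, []);
    }
    this.subscribers.get(topic).push(handler);
  }

  // The publisher only names the topic; it never references a subscriber.
  publish(topic, event) {
    const handlers = this.subscribers.get(topic) || [];
    for (const handler of handlers) {
      handler(event);
    }
    return handlers.length; // number of subscribers notified
  }
}

const bus = new EventBus();
const received = [];
bus.subscribe("orders", (e) => received.push(`inventory:${e.orderId}`));
bus.subscribe("orders", (e) => received.push(`email:${e.orderId}`));
const notified = bus.publish("orders", { eventType: "OrderPlaced", orderId: "12345" });
```

Adding a third subscriber requires no change to the publishing code, which is the loose coupling the list above refers to.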

2. Event Streaming:

Producer → Stream (Kafka, Kinesis) → Consumers
  • Events stored in order
  • Consumers can replay events
  • Event sourcing possible
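
The ordered, replayable log can be sketched as follows. `EventStream` is a hypothetical in-memory stand-in; Kafka and Kinesis offer durable, partitioned versions of the same idea with their own APIs.

```javascript
// Hypothetical append-only log used only to illustrate offsets and replay.
class EventStream {
  constructor() {
    this.log = []; // events in arrival order
  }

  append(event) {
    this.log.push(event);
    return this.log.length - 1; // offset of the stored event
  }

  // Consumers track their own offset, so any of them can re-read history.
  readFrom(offset) {
    return this.log.slice(offset);
  }
}

const stream = new EventStream();
stream.append({ eventType: "OrderPlaced", orderId: "1" });
stream.append({ eventType: "OrderPaid", orderId: "1" });
stream.append({ eventType: "OrderShipped", orderId: "1" });

// A consumer that already processed offset 0 resumes from offset 1.
const resumed = stream.readFrom(1).map((e) => e.eventType);
// A new consumer replays the whole stream from the beginning.
const replayed = stream.readFrom(0).map((e) => e.eventType);
```

Because the stream keeps events after delivery, a consumer added later can rebuild its state by reading from offset 0.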

3. Event Notification:

Service A → Event: "User Created" → Service B reacts
  • Minimal data in event
  • Consumers fetch details if needed

4. Event-Carried State Transfer:

Event contains full state:
{
  "eventType": "UserUpdated",
  "user": {complete user object}
}
  • No need to query for details
  • Larger events

Benefits

Loose Coupling:

  • Services don't need to know about each other
  • Add/remove consumers without affecting producers
  • Independent deployment

Scalability:

  • Async processing
  • Consumers scale independently
  • Buffer spikes with message queue

Flexibility:

  • Easy to add new features (new consumers)
  • Change business logic without modifying producers

Resilience:

  • Failures isolated
  • Messages persisted (retry possible)
  • Graceful degradation

Audit Trail:

  • All events logged
  • Replay for debugging
  • Event sourcing for complete history

Challenges

Complexity:

  • Harder to trace flow
  • Debugging distributed events
  • Need monitoring/observability

Eventual Consistency:

  • Not immediately consistent
  • Must handle out-of-order events
  • Idempotency required
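
One possible way for a consumer to cope with duplicates and late arrivals is sketched below. It assumes the producer attaches an `eventId` and a monotonically increasing `version` to each event; both fields, and the `AccountProjection` class, are hypothetical. Dropping stale events is only one strategy and suits "latest state wins" data.

```javascript
// Hypothetical consumer; `eventId` and `version` are assumed event fields.
class AccountProjection {
  constructor() {
    this.seen = new Set(); // event IDs already handled
    this.version = 0;      // version of the last applied event
    this.email = null;
  }

  apply(event) {
    if (this.seen.has(event.eventId)) return "duplicate"; // redelivered event
    this.seen.add(event.eventId);
    if (event.version <= this.version) return "stale";    // arrived out of order
    this.version = event.version;
    this.email = event.email;
    return "applied";
  }
}

const projection = new AccountProjection();
const outcomes = [
  projection.apply({ eventId: "e2", version: 2, email: "new@example.com" }),
  projection.apply({ eventId: "e1", version: 1, email: "old@example.com" }),
  projection.apply({ eventId: "e2", version: 2, email: "new@example.com" }),
];
```

Here the older event arrives second and is ignored, and the redelivered event has no effect, so the projection ends with the newest email.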

Event Schema Evolution:

  • Backward/forward compatibility needed
  • Versioning strategy required

Example Use Case

E-commerce Order System:

1. User places order
   → OrderPlaced event

2. Multiple services react:
   - Inventory Service: Reserve items
   - Payment Service: Process payment
   - Notification Service: Send confirmation email
   - Analytics Service: Track metrics
   - Shipping Service: Create shipment

3. Each service emits own events:
   - PaymentProcessed
   - ItemsReserved
   - EmailSent
   - ShipmentCreated

Traditional (synchronous):

OrderService calls:
  → InventoryService.reserve()
  → PaymentService.process()
  → NotificationService.send()
  → ...
(If any fails, entire operation fails)

Event-Driven (asynchronous):

OrderService emits OrderPlaced event
Each service independently:
  - Listens for event
  - Processes asynchronously
  - Emits own events
(Failures isolated, can retry)

Q2: What is CQRS (Command Query Responsibility Segregation)?

Answer:

Definition: Separate read and write operations into different models.

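Traditional Approach: A single model, backed by one database, serves both reads and writes.

CQRS Approach: Commands go to a write model, and queries go to one or more separate read models that are kept in sync with the write side.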

Commands (Write Side)

Purpose: Modify state

Characteristics:

  • Validates business rules
  • Emits domain events
  • Optimized for writes
  • Normalized schema

Example:

Command: PlaceOrder
{
  "userId": "123",
  "items": [...],
  "shippingAddress": {...}
}

Processing:
1. Validate user exists
2. Check inventory
3. Calculate total
4. Create order
5. Emit OrderPlaced event

Queries (Read Side)

Purpose: Retrieve data

Characteristics:

  • No business logic
  • Optimized for reads
  • Denormalized (pre-joined)
  • Can have multiple read models

Example:

Query: GetUserOrderHistory
{
  "userId": "123"
}

Returns:
{
  "orders": [
    {
      "orderId": "...",
      "date": "...",
      "total": 99.99,
      "items": [...],  // Pre-joined
      "status": "..."
    }
  ]
}

Synchronization

Eventual Consistency:

1. Command updates Write DB
2. Emits event
3. Event handler updates Read DB
4. Small delay (milliseconds to seconds)

Strategies:

  • Event-driven (recommended)
  • Database triggers
  • Change data capture (CDC)
  • Scheduled sync
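
The event-driven strategy can be sketched roughly as below. The in-memory `writeDb`, `readDb`, and `pendingEvents` objects are hypothetical stand-ins for the write database, the read database, and a message broker; the function names are illustrative.

```javascript
// Hypothetical in-memory stores standing in for the write and read databases.
const writeDb = { orders: new Map() };
const readDb = { orderHistoryByUser: new Map() };
const pendingEvents = []; // stands in for a message broker

// Command side: validate, update the write model, emit an event.
function placeOrder({ orderId, userId, items }) {
  if (items.length === 0) throw new Error("Order must contain items");
  const total = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  writeDb.orders.set(orderId, { orderId, userId, items, status: "Placed" });
  pendingEvents.push({ eventType: "OrderPlaced", orderId, userId, items, total });
}

// Event handler: keep the denormalized read model up to date.
function projectOrderPlaced(event) {
  const history = readDb.orderHistoryByUser.get(event.userId) || [];
  history.push({ orderId: event.orderId, total: event.total, items: event.items });
  readDb.orderHistoryByUser.set(event.userId, history);
}

// Query side: a plain lookup with no business logic.
function getUserOrderHistory(userId) {
  return readDb.orderHistoryByUser.get(userId) || [];
}

placeOrder({
  orderId: "o-1",
  userId: "123",
  items: [{ productId: "p-1", quantity: 2, price: 10 }],
});
const beforeSync = getUserOrderHistory("123").length; // read model not updated yet
while (pendingEvents.length > 0) projectOrderPlaced(pendingEvents.shift());
const afterSync = getUserOrderHistory("123");
```

The gap between `beforeSync` and `afterSync` is the eventual-consistency window: the write has succeeded, but the read model only reflects it once the event has been processed.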

Benefits

Performance:

  • Optimize read and write separately
  • Scale independently
  • Different databases (SQL for writes, NoSQL for reads)

Flexibility:

  • Multiple read models for different views
  • Add new read models without affecting writes

Scalability:

  • Read replicas
  • Caching strategies
  • Different scaling strategies

Simplicity:

  • Simpler models (not trying to serve both)
  • Clear separation of concerns

When to Use

✅ Use CQRS when:

  • Complex domain logic
  • Read/write patterns very different
  • Need to scale reads and writes independently
  • Multiple views of same data needed

❌ Don't use CQRS when:

  • Simple CRUD application
  • Read/write patterns similar
  • Team unfamiliar with pattern
  • Eventual consistency unacceptable

Example: E-commerce

Write Model (Orders):

Orders Table:
- order_id
- user_id
- status
- created_at

OrderItems Table:
- order_item_id
- order_id
- product_id
- quantity
- price

Read Model (Order History View):

OrderHistoryView (Denormalized):
{
  "orderId": "...",
  "userName": "Alice",
  "userEmail": "alice@example.com",
  "orderDate": "...",
  "totalAmount": 99.99,
  "items": [
    {
      "productName": "Widget",
      "quantity": 2,
      "price": 49.99
    }
  ],
  "shippingAddress": "...",
  "status": "Shipped"
}

Read Model (Admin Dashboard):

OrderStatistics:
{
  "totalOrders": 1000,
  "totalRevenue": 50000,
  "averageOrderValue": 50,
  "topProducts": [...]
}

Q3: Explain the Saga pattern for distributed transactions.

Answer:

Problem: A business operation spans several microservices, each with its own database, so a single ACID transaction (2PC/XA) is unavailable or impractical.

Definition: A sequence of local transactions in which each one updates its own service's database and publishes an event or message that triggers the next step.

Saga Types

1. Choreography (Event-Driven)

How it works: Each service listens for events and decides what to do.

Pros:

  • No central coordinator
  • Loose coupling
  • Simple for simple workflows

Cons:

  • Hard to track overall state
  • Cyclic dependencies possible
  • Difficult to understand flow
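
A choreographed saga might look roughly like the sketch below, where each service is reduced to an event handler on a hypothetical in-process bus. The event names `ReservationFailed`, `PaymentRefunded`, and `OrderCancelled`, and the stock limit of 5, are assumptions made for the example.

```javascript
// Hypothetical in-process bus; each "service" is just a subscriber here.
const handlers = {};
const trace = []; // records every emitted event type, in order

function on(eventType, handler) {
  (handlers[eventType] = handlers[eventType] || []).push(handler);
}

function emit(eventType, payload) {
  trace.push(eventType);
  for (const handler of handlers[eventType] || []) handler(payload);
}

// Payment service reacts to OrderPlaced.
on("OrderPlaced", (order) => emit("PaymentProcessed", order));
// Inventory service reacts to PaymentProcessed; fails when stock is short.
on("PaymentProcessed", (order) =>
  emit(order.quantity <= 5 ? "ItemsReserved" : "ReservationFailed", order)
);
// Compensation is choreographed too: each service undoes its own step.
on("ReservationFailed", (order) => emit("PaymentRefunded", order));
on("PaymentRefunded", (order) => emit("OrderCancelled", order));

emit("OrderPlaced", { orderId: "o-1", quantity: 10 });
```

No component holds the whole workflow: the sequence only becomes visible by reading `trace`, which illustrates why the overall state is hard to track with this style.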

2. Orchestration (Command-Driven)

How it works: Central orchestrator tells each service what to do.

Pros:

  • Clear workflow
  • Easy to track state
  • Centralized error handling

Cons:

  • Single point of failure
  • Orchestrator can become complex

Compensating Transactions

Problem: How do you roll back when a step fails?

Solution: Each step has a compensating transaction that semantically undoes it.

Example - Order Saga:

Happy Path:

1. Create Order → Success
2. Process Payment → Success
3. Reserve Inventory → Success
4. Create Shipment → Success

Failure Scenario:

1. Create Order → Success
2. Process Payment → Success
3. Reserve Inventory → FAIL

Compensate:
3. Unreserve Inventory (N/A - failed)
2. Refund Payment ← Compensate
1. Cancel Order ← Compensate

Implementation Example (Orchestration)

Saga Definition:

// orderService, paymentService, inventoryService, shippingService, and
// logger are assumed to be defined elsewhere.
class OrderSaga {
  // Each step pairs an action with the compensation that undoes it
  steps = [
    {
      name: "CreateOrder",
      action: (data) => orderService.create(data),
      compensate: (data) => orderService.cancel(data.orderId)
    },
    {
      name: "ProcessPayment",
      action: (data) => paymentService.charge(data),
      compensate: (data) => paymentService.refund(data.paymentId)
    },
    {
      name: "ReserveInventory",
      action: (data) => inventoryService.reserve(data),
      compensate: (data) => inventoryService.release(data.reservationId)
    },
    {
      name: "CreateShipment",
      action: (data) => shippingService.create(data),
      compensate: (data) => shippingService.cancel(data.shipmentId)
    }
  ];

  async execute(orderData) {
    const completedSteps = [];

    try {
      for (const step of this.steps) {
        const result = await step.action(orderData);
        completedSteps.push({ step, result });

        // Update orderData with results for next steps
        Object.assign(orderData, result);
      }

      return { success: true, data: orderData };

    } catch (error) {
      // Compensate in reverse order; the failed step itself is not in the list
      for (const { step, result } of completedSteps.reverse()) {
        try {
          await step.compensate({ ...orderData, ...result });
        } catch (compensateError) {
          // Log and alert - manual intervention may be needed
          logger.error("Compensation failed", compensateError);
        }
      }

      return { success: false, error };
    }
  }
}

Saga State Management

Track saga state:

SagaInstance:
{
  "sagaId": "saga-12345",
  "type": "OrderSaga",
  "status": "InProgress",
  "currentStep": "ProcessPayment",
  "completedSteps": ["CreateOrder"],
  "data": {...},
  "startedAt": "...",
  "updatedAt": "..."
}

Handling Failures

Idempotency: Each step must be idempotent (can be retried safely)

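// results maps requestId -> stored result (in production, a durable store)
const results = new Map();

function processOnce(requestId, process) {
  // Check if already processed
  if (results.has(requestId)) {
    return results.get(requestId);
  }

  // Process
  const result = process();

  // Store result with requestId
  results.set(requestId, result);
  return result;
}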

Timeout Handling:

try {
  result = await withTimeout(step.action(data), 30000);
} catch (error) {
  // Only handle timeouts here; rethrow anything else
  if (!(error instanceof TimeoutError)) throw error;
  // Decide: retry or compensate
}
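
`withTimeout` and `TimeoutError` are not built into JavaScript. One plausible way to write them, offered as a sketch rather than as the helper the snippet above assumes, is to race the operation against a timer. `runStep` is an illustrative wrapper that returns a decision string.

```javascript
// Hypothetical helpers: neither TimeoutError nor withTimeout is built in.
class TimeoutError extends Error {
  constructor(ms) {
    super(`Operation timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Resolves to a decision string instead of throwing, for illustration.
async function runStep(action, ms) {
  try {
    await withTimeout(action(), ms);
    return "completed";
  } catch (error) {
    if (error instanceof TimeoutError) return "timed-out"; // retry or compensate
    throw error;
  }
}
```

Note that the race does not cancel the underlying operation, so a step that timed out may still complete later. That is one more reason steps and compensations need to be idempotent.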

When to Use Saga

✅ Use when:

  • Distributed transactions needed
  • Long-running processes
  • Need to maintain consistency across services

❌ Don't use when:

  • Single database (use local transactions)
  • Immediate consistency required
  • Simple workflows

Q4: What is API Gateway pattern and what problems does it solve?

Answer:

Definition: Single entry point for all client requests to microservices.

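Architecture: Clients (web, mobile, third-party) → API Gateway → backend microservices (e.g., User, Order, Product)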

Problems It Solves

1. Multiple Client Types:

  • Mobile needs different data than web
  • Different protocols (HTTP, WebSocket, gRPC)
  • Gateway adapts responses per client

2. Cross-Cutting Concerns:

  • Authentication/Authorization
  • Rate limiting
  • Logging/Monitoring
  • SSL termination
  • Caching

3. Service Discovery:

  • Clients don't need to know service locations
  • Gateway routes to appropriate service
  • Load balancing

4. Protocol Translation:

  • External: REST/HTTP
  • Internal: gRPC, message queues
  • Gateway translates

Key Responsibilities

1. Request Routing

GET /api/users/123 → User Service
GET /api/orders/456 → Order Service
GET /api/products/789 → Product Service
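
Prefix-based routing of this kind can be sketched as a lookup table. The upstream URLs below are placeholders, and `resolveUpstream` is a hypothetical helper, not the API of any particular gateway.

```javascript
// Hypothetical route table; the upstream URLs are placeholders.
const routes = [
  { prefix: "/api/users", upstream: "http://user-service:8080" },
  { prefix: "/api/orders", upstream: "http://order-service:8080" },
  { prefix: "/api/products", upstream: "http://product-service:8080" },
];

// Returns the upstream URL for a request path, or null when no route matches.
function resolveUpstream(path) {
  const route = routes.find(
    // Match the prefix exactly or followed by "/", so "/api/usersfoo" is rejected.
    (r) => path === r.prefix || path.startsWith(r.prefix + "/")
  );
  return route ? route.upstream + path : null;
}

const userTarget = resolveUpstream("/api/users/123");
const unknownTarget = resolveUpstream("/api/payments/1");
```

A gateway would typically answer an unmatched path with a 404 instead of forwarding it.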

2. Request Aggregation

Client requests: GET /api/user-dashboard/123

Gateway:
1. GET /users/123 → User Service
2. GET /orders?userId=123 → Order Service
3. GET /recommendations/123 → Recommendation Service

Combine results:
{
  "user": {...},
  "recentOrders": [...],
  "recommendations": [...]
}
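
The aggregation step could be sketched as follows. The three service clients are hypothetical stubs (a real gateway would make HTTP or gRPC calls), and treating recommendations and orders as optional is a design assumption made for the example.

```javascript
// Hypothetical service clients; real ones would make HTTP or gRPC calls.
const userService = { get: async (id) => ({ id, name: "Alice" }) };
const orderService = {
  listByUser: async (userId) => [{ orderId: "456", userId }],
};
const recommendationService = {
  // Simulates an outage to show partial results.
  forUser: async () => {
    throw new Error("service unavailable");
  },
};

async function getUserDashboard(userId) {
  // Fan out in parallel; allSettled lets one failure degrade instead of fail.
  const [user, orders, recs] = await Promise.allSettled([
    userService.get(userId),
    orderService.listByUser(userId),
    recommendationService.forUser(userId),
  ]);
  if (user.status === "rejected") throw user.reason; // user data is essential
  return {
    user: user.value,
    recentOrders: orders.status === "fulfilled" ? orders.value : [],
    recommendations: recs.status === "fulfilled" ? recs.value : [],
  };
}
```

Running the calls in parallel keeps the dashboard latency close to that of the slowest service instead of the sum of all three.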

3. Authentication/Authorization

1. Client sends request with token
2. Gateway validates token
3. If valid, forwards to service with user context
4. If invalid, returns 401

4. Rate Limiting

User tier limits:
- Free: 100 requests/hour
- Premium: 1000 requests/hour
- Enterprise: Unlimited

Gateway tracks and enforces limits
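
A fixed-window counter is one simple way to enforce limits like these. The sketch below is illustrative: the `RateLimiter` class is hypothetical, it keeps counts in memory (a multi-instance gateway would need a shared store), and other algorithms such as token bucket or sliding window are common alternatives.

```javascript
// Limits per hour, taken from the tiers above; Infinity means unlimited.
const TIER_LIMITS = { free: 100, premium: 1000, enterprise: Infinity };
const WINDOW_MS = 60 * 60 * 1000;

// Hypothetical fixed-window limiter; `now` is injectable for testing.
class RateLimiter {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.windows = new Map(); // userId -> { windowStart, count }
  }

  allow(userId, tier) {
    const limit = TIER_LIMITS[tier];
    if (limit === undefined) throw new Error(`Unknown tier: ${tier}`);
    const t = this.now();
    let entry = this.windows.get(userId);
    if (!entry || t - entry.windowStart >= WINDOW_MS) {
      entry = { windowStart: t, count: 0 }; // start a new window
      this.windows.set(userId, entry);
    }
    if (entry.count >= limit) return false; // gateway would answer HTTP 429
    entry.count += 1;
    return true;
  }
}

let clock = 0;
const limiter = new RateLimiter(() => clock);
let allowed = 0;
for (let i = 0; i < 150; i++) {
  if (limiter.allow("user-789", "free")) allowed += 1;
}
clock += WINDOW_MS; // an hour later the window resets
const allowedAfterReset = limiter.allow("user-789", "free");
```

Of the 150 requests in the first hour, only the first 100 pass for a free-tier user; the counter starts over once the window has elapsed.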

5. Response Transformation

Internal service response:
{
  "user_id": 123,
  "first_name": "Alice",
  "last_name": "Smith",
  "created_timestamp": 1702468800
}

Gateway transforms for mobile:
{
  "id": 123,
  "name": "Alice Smith",
  "memberSince": "2023-12-13"
}
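
The transformation above amounts to a small mapping function. `toMobileUser` is a hypothetical name, and the sketch assumes `created_timestamp` is a Unix timestamp in seconds and that dates are rendered in UTC.

```javascript
// Maps the internal snake_case record to the mobile-facing shape shown above.
function toMobileUser(internal) {
  return {
    id: internal.user_id,
    name: `${internal.first_name} ${internal.last_name}`,
    // created_timestamp is in seconds; Date expects milliseconds.
    memberSince: new Date(internal.created_timestamp * 1000)
      .toISOString()
      .slice(0, 10), // keep only the YYYY-MM-DD part (UTC)
  };
}

const mobileUser = toMobileUser({
  user_id: 123,
  first_name: "Alice",
  last_name: "Smith",
  created_timestamp: 1702468800,
});
```

Keeping transforms this shallow is in line with the advice to keep the gateway thin; anything involving business rules belongs in the service.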

Implementation Patterns

Backend for Frontend (BFF):

Each client type (web, mobile, etc.) has a dedicated gateway optimized for its needs.

Cloud:

  • AWS API Gateway
  • Azure API Management
  • Google Cloud API Gateway

Self-Hosted:

  • Kong
  • Tyk
  • Apigee
  • Ambassador

Lightweight:

  • Nginx
  • Traefik
  • Envoy

Example Configuration (Kong)

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-route
        paths:
          - /api/users
    plugins:
      - name: rate-limiting
        config:
          minute: 100
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: cors
        config:
          origins:
            - "*"

Challenges

Single Point of Failure:

  • Solution: Multiple gateway instances with load balancer
  • Health checks and auto-scaling

Performance Bottleneck:

  • Solution: Horizontal scaling
  • Caching
  • Async processing where possible

Complexity:

  • Can become bloated with too much logic
  • Keep gateway thin (routing, auth, basic transforms only)
  • Complex logic belongs in services

Best Practices

  1. Keep it thin: Routing and cross-cutting concerns only
  2. Don't put business logic: Belongs in services
  3. Cache aggressively: Reduce backend load
  4. Monitor closely: Gateway is critical path
  5. Version APIs: Support multiple API versions
  6. Use circuit breakers: Prevent cascade failures

Q5: Explain Circuit Breaker pattern.

Answer:

Problem: When a downstream service is down, callers block waiting on it and the failure cascades through the system.

Definition: Monitor failures and "open the circuit" to fail fast instead of waiting for timeouts.

States

CLOSED (normal) → OPEN (failing) → HALF-OPEN (testing) → CLOSED or OPEN

How It Works

CLOSED State (Normal):

1. All requests pass through
2. Track failures
3. If failures ≥ threshold → OPEN

OPEN State (Failing):

1. Immediately return error (fail fast)
2. Don't call service
3. After timeout → HALF-OPEN

HALF-OPEN State (Testing):

1. Allow limited requests through
2. If successful → CLOSED
3. If failed → OPEN

Implementation Example

class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.timeout = options.timeout || 60000; // 60 seconds
    // Stored for a rolling failure window; not used by this simple version
    this.monitoringPeriod = options.monitoringPeriod || 10000; // 10 seconds

    this.state = 'CLOSED';
    this.failureCount = 0;
    this.nextAttempt = Date.now();
    this.successCount = 0;
  }

  async execute(operation) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      // Timeout expired, try again
      this.state = 'HALF-OPEN';
      this.successCount = 0;
    }

    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;

    // Close the circuit again after 3 consecutive successful trial calls
    if (this.state === 'HALF-OPEN') {
      this.successCount++;
      if (this.successCount >= 3) {
        this.state = 'CLOSED';
        console.log('Circuit breaker CLOSED');
      }
    }
  }

  onFailure() {
    this.failureCount++;

    // Any failure during a trial reopens the circuit immediately
    if (this.state === 'HALF-OPEN') {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
      console.log('Circuit breaker OPEN (from HALF-OPEN)');
      return;
    }

    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
      console.log('Circuit breaker OPEN');
    }
  }

  getState() {
    return {
      state: this.state,
      failureCount: this.failureCount,
      nextAttempt: new Date(this.nextAttempt)
    };
  }
}

// Usage
const breaker = new CircuitBreaker({
  failureThreshold: 5,
  timeout: 60000
});

async function callExternalService() {
  try {
    return await breaker.execute(async () => {
      const response = await fetch('https://api.example.com/data');
      if (!response.ok) throw new Error('Service error');
      return await response.json();
    });
  } catch (error) {
    if (error.message === 'Circuit breaker is OPEN') {
      // Return fallback or cached data
      return getFallbackData();
    }
    throw error;
  }
}

Benefits

Prevent Cascade Failures:

Without Circuit Breaker:
Service A → Service B (down)
  ↓ waits for timeout (30s)
  ↓ all threads blocked
  ↓ Service A becomes unresponsive
  ↓ Services calling A also block
  ↓ Entire system fails

With Circuit Breaker:
Service A → Circuit Breaker → Service B (down)
  ↓ Circuit opens after 5 failures
  ↓ Fail fast (immediate response)
  ↓ Service A remains responsive
  ↓ System degraded but functional

Faster Recovery:

  • Don't overwhelm failing service
  • Give it time to recover
  • Gradually test recovery (HALF-OPEN)

Better User Experience:

  • Fast failures (no waiting)
  • Can return cached/fallback data
  • Clear error messages

Advanced Features

Fallback:

async function callWithFallback() {
  try {
    return await breaker.execute(() => fetchFromAPI());
  } catch (error) {
    return await fetchFromCache();
  }
}

Monitoring:

// Assumes a breaker that emits state-change events (for example, one that
// extends EventEmitter); the CircuitBreaker class above does not do this.
breaker.on('open', () => {
  metrics.increment('circuit_breaker.open');
  alerting.notify('Circuit breaker opened for service X');
});

breaker.on('half-open', () => {
  metrics.increment('circuit_breaker.half_open');
});

breaker.on('closed', () => {
  metrics.increment('circuit_breaker.closed');
});

Per-Endpoint Breakers:

const breakers = {
  userService: new CircuitBreaker({...}),
  orderService: new CircuitBreaker({...}),
  paymentService: new CircuitBreaker({...})
};

When to Use

✅ Use when:

  • Calling external services
  • Service failures expected
  • Want to prevent cascade failures
  • Need graceful degradation

❌ Don't use when:

  • Internal method calls
  • Database queries (use connection pooling instead)
  • Critical operations that must succeed

Summary

Medium architecture topics:

  • Event-Driven Architecture: Async, decoupled communication
  • CQRS: Separate read/write models
  • Saga Pattern: Distributed transactions
  • API Gateway: Single entry point, cross-cutting concerns
  • Circuit Breaker: Prevent cascade failures

These patterns enable building scalable, resilient distributed systems.
