- [[#The Four Axes of Scaling|The Four Axes of Scaling]]
    - [[#The Four Axes of Scaling#1. Vertical Scaling|1. Vertical Scaling]]
    - [[#The Four Axes of Scaling#2. Horizontal Duplication|2. Horizontal Duplication]]
    - [[#The Four Axes of Scaling#3. Data Partitioning (Sharding)|3. Data Partitioning (Sharding)]]
    - [[#The Four Axes of Scaling#4. Functional Decomposition|4. Functional Decomposition]]
- [[#Combining Models|Combining Models]]
- [[#Caching|Caching]]
    - [[#Caching#Cache Locations|Cache Locations]]
    - [[#Caching#Cache Invalidation Strategies|Cache Invalidation Strategies]]
    - [[#Caching#Caching in MusicCorp|Caching in MusicCorp]]
- [[#Autoscaling|Autoscaling]]
    - [[#Autoscaling#Types of Autoscaling|Types of Autoscaling]]
    - [[#Autoscaling#Kubernetes Horizontal Pod Autoscaler|Kubernetes Horizontal Pod Autoscaler]]
    - [[#Autoscaling#Autoscaling Best Practices|Autoscaling Best Practices]]
- [[#Scaling Considerations for MusicCorp|Scaling Considerations for MusicCorp]]
    - [[#Scaling Considerations for MusicCorp#Current Bottlenecks|Current Bottlenecks]]
    - [[#Scaling Considerations for MusicCorp#Scaling by Service|Scaling by Service]]
    - [[#Scaling Considerations for MusicCorp#Kafka Scaling|Kafka Scaling]]
- [[#Key Principles|Key Principles]]
    - [[#Key Principles#Start Small|Start Small]]
    - [[#Key Principles#Scaling for Load vs Robustness|Scaling for Load vs Robustness]]
    - [[#Key Principles#Rearchitecture Is a Sign of Success|Rearchitecture Is a Sign of Success]]
    - [[#Key Principles#Experimentation and Measurement|Experimentation and Measurement]]
- [[#How MusicCorp Compares to Chapter 13 Recommendations|How MusicCorp Compares to Chapter 13 Recommendations]]
- [[#Action Items for MusicCorp|Action Items for MusicCorp]]
    - [[#Action Items for MusicCorp#Low Priority (When Needed)|Low Priority (When Needed)]]
    - [[#Action Items for MusicCorp#Future Considerations|Future Considerations]]
- [[#Discussion Questions|Discussion Questions]]
- [[#Key Quotes|Key Quotes]]
- [[#Recommended Reading|Recommended Reading]]

## The Four Axes of Scaling

### 1. Vertical Scaling

**What:** Get a bigger machine (more CPU, RAM, disk).

**Characteristics:**
- Quick and easy on cloud platforms
- Limited by available hardware
- Doesn't improve robustness
- Good first step for quick wins

**When to use:**
- Database servers (often easier than sharding)
- Compute-bound workloads
- When horizontal scaling is complex

```yaml
# Kubernetes: Request more resources
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2"
    memory: "2Gi"
```

### 2. Horizontal Duplication

**What:** Run multiple copies doing the same work.

**Characteristics:**
- Load balancers distribute requests
- Stateless services scale easily
- Read replicas for databases
- Relatively straightforward

**In MusicCorp:**

```yaml
# Scale to 3 replicas
spec:
  replicas: 3
```

**Patterns:**
- Kubernetes Deployments with multiple pods
- Competing consumers for Kafka partitions
- Read replicas for PostgreSQL

### 3. Data Partitioning (Sharding)

**What:** Distribute data based on a key.

**Characteristics:**
- Great for write-heavy workloads
- Partition key selection is critical
- Cross-partition queries are complex
- Can combine with horizontal duplication

**Sharding strategies:**

| Strategy | Example | Trade-off |
|----------|---------|-----------|
| **Hash-based** | Hash(order_id) % N | Even distribution, range queries hard |
| **Range-based** | Orders by date | Good for time queries, hotspots possible |
| **Geographic** | US, EU, APAC | Data locality, cross-region queries hard |

### 4. Functional Decomposition

**What:** Extract functionality to scale independently.
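As a sketch of what independent scaling buys, each extracted service can carry its own resource profile in its own Deployment. The values below are illustrative, not MusicCorp's actual settings:

```yaml
# Illustrative only (hypothetical values): decomposition lets each service
# be sized for its own workload instead of one shared profile.

# Catalog: read-heavy, many cheap requests
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
---
# Payment: transaction-heavy, fewer but costlier requests
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2"
    memory: "1Gi"
```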
**Characteristics:**
- The microservices approach
- Rightsize infrastructure per service
- Most complex but most flexible
- Enables organizational scaling too

**In MusicCorp:**
- Catalog (read-heavy) can scale differently than Payment (transaction-heavy)
- Each service has appropriate resource limits

---

## Combining Models

Real systems use multiple scaling axes:

```
┌───────────────────────────────────────────────────────────┐
│                     MusicCorp Scaling                     │
├───────────────────────────────────────────────────────────┤
│  Functional Decomposition (5 services)                    │
│                                                           │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │    Catalog    │  │     Order     │  │    Payment    │  │
│  │ (3 replicas)  │  │ (2 replicas)  │  │ (2 replicas)  │  │
│  │  Horizontal   │  │  Horizontal   │  │  Horizontal   │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │                     PostgreSQL                      │  │
│  │                 (Vertical scaling)                  │  │
│  │             + Read replicas (Horizontal)            │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
└───────────────────────────────────────────────────────────┘
```

---

## Caching

### Cache Locations

| Location | Pros | Cons |
|----------|------|------|
| **Client-side** | Best latency, reduces server load | Hard to invalidate |
| **Server-side** | Easier invalidation, transparent | Still hits server |
| **CDN** | Global distribution | Only for static content |

### Cache Invalidation Strategies

| Strategy | How It Works | When to Use |
|----------|--------------|-------------|
| **TTL** | Expire after time | Simple, eventual consistency OK |
| **Event-driven** | Invalidate on write events | Kafka events trigger invalidation |
| **Write-through** | Update cache on write | Strong consistency needed |
| **Write-behind** | Async cache update | Performance critical, some risk |

### Caching in MusicCorp

**Current state:** No explicit caching implemented.
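If caching is added later, the event-driven strategy from the invalidation table above could look like this minimal in-memory sketch. The `AlbumCache` class and the `album.updated` event are hypothetical, not existing MusicCorp code:

```python
# Minimal sketch of event-driven cache invalidation.
# Hypothetical names: neither this cache nor the "album.updated"
# event exists in MusicCorp yet.

class AlbumCache:
    """In-memory cache invalidated by write events rather than TTL."""

    def __init__(self, loader):
        self._loader = loader          # fetches an album from the database
        self._entries: dict[str, dict] = {}

    def get(self, sku: str) -> dict:
        if sku not in self._entries:   # cache miss: load and remember
            self._entries[sku] = self._loader(sku)
        return self._entries[sku]

    def on_event(self, event: dict) -> None:
        # Called for each consumed event; drop the stale entry so the
        # next read reloads fresh data from the database.
        if event.get("type") == "album.updated":
            self._entries.pop(event["sku"], None)


if __name__ == "__main__":
    db = {"AL-001": {"sku": "AL-001", "price": 9.99}}
    cache = AlbumCache(lambda sku: dict(db[sku]))

    print(cache.get("AL-001")["price"])   # 9.99 (loaded, now cached)
    db["AL-001"]["price"] = 12.99         # a write happens in the DB
    print(cache.get("AL-001")["price"])   # still 9.99 (cache is stale)
    cache.on_event({"type": "album.updated", "sku": "AL-001"})
    print(cache.get("AL-001")["price"])   # 12.99 (reloaded after event)
```

The same handler shape works whether the events come from a Kafka consumer loop or anywhere else; the cache only needs to see each write event once.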
**Opportunities:**
- Cache album prices (rarely change)
- Cache stock counts (with short TTL)
- HTTP caching headers on Catalog API

```python
# Example: Add caching headers (assumes `app` and `db` are defined elsewhere)
from fastapi import Response

@app.get("/albums/{sku}")
def get_album(sku: str):
    album = db.get_album(sku)
    response = Response(album.json(), media_type="application/json")
    response.headers["Cache-Control"] = "public, max-age=300"  # 5 minutes
    return response
```

---

## Autoscaling

### Types of Autoscaling

| Type | Trigger | Example |
|------|---------|---------|
| **Predictive** | Known patterns | Scale up before Black Friday |
| **Reactive** | Observed metrics | Scale on CPU > 70% |
| **Event-driven** | Queue depth | Scale on Kafka lag |

### Kubernetes Horizontal Pod Autoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

### Autoscaling Best Practices

1. **Start with failure handling** before load-based scaling
2. **Scale up fast, scale down slow** to avoid flapping
3. **Set appropriate min/max** to control costs
4. **Test autoscaling** before you need it

---

## Scaling Considerations for MusicCorp

### Current Bottlenecks

| Component | Scaling Approach | Notes |
|-----------|------------------|-------|
| **PostgreSQL** | Vertical first, then read replicas | Single point of failure |
| **Kafka** | Add partitions | Currently single partition |
| **Application services** | Horizontal (replicas) | Already stateless |

### Scaling by Service

| Service | Current | Scale Strategy |
|---------|---------|----------------|
| **Catalog** | 1 pod | Horizontal (read-heavy, cacheable) |
| **Inventory** | 1 pod | Horizontal (with care for stock updates) |
| **Order** | 1 pod | Horizontal (saga state in DB) |
| **Payment** | 1 pod | Horizontal (external provider handles load) |
| **Shipping** | 1 pod | Horizontal (async processing) |

### Kafka Scaling

**Current:** Single partition per topic = at most one active consumer per topic within a consumer group.

**Scaling path:**
1. Add partitions to topics
2. Partition by order_id (related events to same partition)
3. Add consumer instances (up to one per partition)

```
                  order.placed topic

 ┌────────────┐      ┌────────────┐      ┌────────────┐
 │ Partition 0│      │ Partition 1│      │ Partition 2│
 └─────┬──────┘      └─────┬──────┘      └─────┬──────┘
       │                   │                   │
       ▼                   ▼                   ▼
  ┌──────────┐        ┌──────────┐        ┌──────────┐
  │Consumer 0│        │Consumer 1│        │Consumer 2│
  │(Payment) │        │(Payment) │        │(Payment) │
  └──────────┘        └──────────┘        └──────────┘
```

---

## Key Principles

### Start Small

> Avoid premature optimization. You probably don't need to scale yet.

### Scaling for Load vs Robustness

| Goal | Approach |
|------|----------|
| **Handle more load** | Add replicas, caching, CDN |
| **Improve robustness** | Multi-AZ, replicas, circuit breakers |

Sometimes the same technique serves both; sometimes not.
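Circling back to the Kafka scaling path: step 2 (partition by order_id) relies on a deterministic key-to-partition mapping, so every event for one order lands on one partition and is consumed in sequence. A dependency-free sketch of that idea (Kafka's real default partitioner hashes the key bytes with murmur2; MD5 is used here only to keep the sketch self-contained):

```python
import hashlib

NUM_PARTITIONS = 3  # e.g. order.placed after adding partitions

def partition_for(order_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically.

    Same order_id -> same partition, so related events stay ordered.
    (Illustration only: real Kafka producers use murmur2 on the key bytes.)
    """
    digest = hashlib.md5(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

if __name__ == "__main__":
    # All events for the same order map to the same partition,
    # regardless of event type.
    for event_type in ("order.placed", "payment.taken", "order.shipped"):
        print(event_type, "-> partition", partition_for("order-1234"))
```

In practice you get this behavior simply by setting the message key to the order_id when producing; the client library applies its own hash for you.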
### Rearchitecture Is a Sign of Success

> Needing to redesign your architecture at certain thresholds is a sign of growth, not failure.

### Experimentation and Measurement

- Load test before production
- Measure actual bottlenecks (don't guess)
- Profile before optimizing

---

## How MusicCorp Compares to Chapter 13 Recommendations

| Book Recommendation | Our Implementation | Status |
|---------------------|--------------------|--------|
| **Horizontal scaling** | Kubernetes Deployments | Ready |
| **Autoscaling** | Not configured | Gap |
| **Caching** | Not implemented | Gap |
| **Database scaling** | Single instance | Gap |
| **Kafka partitioning** | Single partition | Gap |
| **Load testing** | Not implemented | Gap |

---

## Action Items for MusicCorp

### Low Priority (When Needed)

1. **Add HPA for services**
    - Start with CPU-based autoscaling
    - Set min=1, max=5 for dev
2. **Add caching layer**
    - Redis for session/frequently accessed data
    - HTTP cache headers for Catalog API
3. **Scale Kafka**
    - Add partitions to topics
    - Partition key by order_id
4. **Add PostgreSQL read replica**
    - Route reads to replica
    - Writes to primary

### Future Considerations

1. **Load testing**
    - k6, Locust, or Gatling
    - Test before scaling decisions
2. **CDN for static assets**
    - If we add a frontend

---

## Discussion Questions

1. **Premature optimization**: We have 5 services handling demo traffic. When should we actually start thinking about scaling?
2. **Database scaling**: PostgreSQL is the likely bottleneck. Would you choose vertical scaling, read replicas, or sharding first?
3. **Kafka partitioning**: If we add partitions, how do we ensure related events (same order) go to the same partition?
4. **Caching trade-offs**: What would we cache? How do we handle cache invalidation when prices change?
5. **Cost vs performance**: How do you balance scaling costs against performance requirements?

---

## Key Quotes

> "Start small. Avoid premature optimization. You probably don't need to scale yet."

> "Scaling for load and scaling for robustness may require different approaches."

> "Needing to rearchitect at certain thresholds is a sign of success, not failure."

> "Each scaling technique adds complexity. Choose carefully."

---

## Recommended Reading

- "The Art of Scalability" by Martin Abbott and Michael Fisher
- "Designing Data-Intensive Applications" by Martin Kleppmann
- "High Performance MySQL" by Baron Schwartz et al.
- Google SRE Book (free online)