- [[#The Four Axes of Scaling|The Four Axes of Scaling]]
- [[#The Four Axes of Scaling#1. Vertical Scaling|1. Vertical Scaling]]
- [[#The Four Axes of Scaling#2. Horizontal Duplication|2. Horizontal Duplication]]
- [[#The Four Axes of Scaling#3. Data Partitioning (Sharding)|3. Data Partitioning (Sharding)]]
- [[#The Four Axes of Scaling#4. Functional Decomposition|4. Functional Decomposition]]
- [[#Combining Models|Combining Models]]
- [[#Caching|Caching]]
- [[#Caching#Cache Locations|Cache Locations]]
- [[#Caching#Cache Invalidation Strategies|Cache Invalidation Strategies]]
- [[#Caching#Caching in MusicCorp|Caching in MusicCorp]]
- [[#Autoscaling|Autoscaling]]
- [[#Autoscaling#Types of Autoscaling|Types of Autoscaling]]
- [[#Autoscaling#Kubernetes Horizontal Pod Autoscaler|Kubernetes Horizontal Pod Autoscaler]]
- [[#Autoscaling#Autoscaling Best Practices|Autoscaling Best Practices]]
- [[#Scaling Considerations for MusicCorp|Scaling Considerations for MusicCorp]]
- [[#Scaling Considerations for MusicCorp#Current Bottlenecks|Current Bottlenecks]]
- [[#Scaling Considerations for MusicCorp#Scaling by Service|Scaling by Service]]
- [[#Scaling Considerations for MusicCorp#Kafka Scaling|Kafka Scaling]]
- [[#Key Principles|Key Principles]]
- [[#Key Principles#Start Small|Start Small]]
- [[#Key Principles#Scaling for Load vs Robustness|Scaling for Load vs Robustness]]
- [[#Key Principles#Rearchitecture Is a Sign of Success|Rearchitecture Is a Sign of Success]]
- [[#Key Principles#Experimentation and Measurement|Experimentation and Measurement]]
- [[#How MusicCorp Compares to Chapter 13 Recommendations|How MusicCorp Compares to Chapter 13 Recommendations]]
- [[#Action Items for MusicCorp|Action Items for MusicCorp]]
- [[#Action Items for MusicCorp#Low Priority (When Needed)|Low Priority (When Needed)]]
- [[#Action Items for MusicCorp#Future Considerations|Future Considerations]]
- [[#Discussion Questions|Discussion Questions]]
- [[#Key Quotes|Key Quotes]]
- [[#Recommended Reading|Recommended Reading]]
## The Four Axes of Scaling
### 1. Vertical Scaling
**What:** Get a bigger machine (more CPU, RAM, disk).
**Characteristics:**
- Quick and easy on cloud platforms
- Limited by available hardware
- Doesn't improve robustness
- Good first step for quick wins
**When to use:**
- Database servers (often easier than sharding)
- Compute-bound workloads
- When horizontal scaling is complex
```yaml
# Kubernetes: Request more resources
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2"
    memory: "2Gi"
```
### 2. Horizontal Duplication
**What:** Run multiple copies doing the same work.
**Characteristics:**
- Load balancers distribute requests
- Stateless services scale easily
- Read replicas for databases
- Relatively straightforward
**In MusicCorp:**
```yaml
# Scale to 3 replicas
spec:
  replicas: 3
```
**Patterns:**
- Kubernetes Deployments with multiple pods
- Competing consumers for Kafka partitions
- Read replicas for PostgreSQL
### 3. Data Partitioning (Sharding)
**What:** Distribute data based on a key.
**Characteristics:**
- Great for write-heavy workloads
- Partition key selection is critical
- Cross-partition queries are complex
- Can combine with horizontal duplication
**Sharding strategies:**
| Strategy | Example | Trade-off |
|----------|---------|-----------|
| **Hash-based** | Hash(order_id) % N | Even distribution, hard to query range |
| **Range-based** | Orders by date | Good for time queries, hotspots possible |
| **Geographic** | US, EU, APAC | Data locality, cross-region queries hard |
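The hash-based row above can be sketched in a few lines. This is illustrative only (the `shard_for` helper and `NUM_SHARDS` are not from the MusicCorp codebase); it uses MD5 rather than Python's built-in `hash()`, which is salted per process and therefore not stable across restarts.

```python
# Hash-based sharding sketch: route each order to one of N shards
# by hashing its key, so the same order always lands on the same shard.
import hashlib

NUM_SHARDS = 4

def shard_for(order_id: str) -> int:
    """Map an order_id to a shard index deterministically."""
    digest = hashlib.md5(order_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Same key, same shard -- every time:
shard = shard_for("order-123")
```

Note the trade-off from the table: distribution is even, but a range query ("all orders from last week") now has to fan out across every shard.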
### 4. Functional Decomposition
**What:** Extract functionality to scale independently.
**Characteristics:**
- The microservices approach
- Rightsize infrastructure per service
- Most complex but most flexible
- Enables organizational scaling too
**In MusicCorp:**
- Catalog (read-heavy) can scale differently than Payment (transaction-heavy)
- Each service has appropriate resource limits
---
## Combining Models
Real systems use multiple scaling axes:
```
┌─────────────────────────────────────────────────────────────────┐
│ MusicCorp Scaling │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Functional Decomposition (5 services) │
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Catalog │ │ Order │ │ Payment │ │
│ │ (3 replicas) │ │ (2 replicas) │ │ (2 replicas) │ │
│ │ Horizontal │ │ Horizontal │ │ Horizontal │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL │ │
│ │ (Vertical scaling) │ │
│ │ + Read replicas (Horizontal) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
---
## Caching
### Cache Locations
| Location | Pros | Cons |
|----------|------|------|
| **Client-side** | Best latency, reduces server load | Hard to invalidate |
| **Server-side** | Easier invalidation, transparent | Still hits server |
| **CDN** | Global distribution, offloads origin | Best suited to static or rarely changing content |
### Cache Invalidation Strategies
| Strategy | How It Works | When to Use |
|----------|--------------|-------------|
| **TTL** | Expire after time | Simple, eventual consistency OK |
| **Event-driven** | Invalidate on write events | Kafka events trigger invalidation |
| **Write-through** | Update cache on write | Strong consistency needed |
| **Write-behind** | Async cache update | Performance critical, some risk |
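The TTL row from the table can be sketched as a minimal in-process cache (illustrative only; a real deployment would more likely use Redis with per-key expiry):

```python
# Minimal TTL cache: entries expire ttl seconds after being set,
# trading strict freshness for simplicity (eventual consistency).
import time

class TTLCache:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value
```

A short TTL (seconds) would suit stock counts; a longer one (minutes) would suit album prices.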
### Caching in MusicCorp
**Current state:** No explicit caching implemented.
**Opportunities:**
- Cache album prices (rarely change)
- Cache stock counts (with short TTL)
- HTTP caching headers on Catalog API
```python
# Example: add caching headers on the Catalog API (FastAPI)
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/albums/{sku}")
def get_album(sku: str, response: Response):
    album = db.get_album(sku)  # db: the service's data-access layer (assumed)
    response.headers["Cache-Control"] = "public, max-age=300"  # cache for 5 minutes
    return album
```
---
## Autoscaling
### Types of Autoscaling
| Type | Trigger | Example |
|------|---------|---------|
| **Predictive** | Known patterns | Scale up before Black Friday |
| **Reactive** | Observed metrics | Scale on CPU > 70% |
| **Event-driven** | Queue depth | Scale on Kafka lag |
### Kubernetes Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
### Autoscaling Best Practices
1. **Start with failure handling** before load-based scaling
2. **Scale up fast, scale down slow** to avoid flapping
3. **Set appropriate min/max** to control costs
4. **Test autoscaling** before you need it
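Point 2 ("scale up fast, scale down slow") can be sketched as a decision function. The thresholds, step sizes, and cooldown below are illustrative, not Kubernetes defaults:

```python
# "Scale up fast, scale down slow": react to high CPU immediately,
# but only shrink after the metric has stayed low for a cooldown window.
def desired_replicas(current: int, cpu: float, low_since_s: float,
                     min_r: int = 2, max_r: int = 10,
                     cooldown_s: float = 300.0) -> int:
    if cpu > 0.70:                       # overloaded: double immediately
        return min(current * 2, max_r)
    if cpu < 0.30 and low_since_s >= cooldown_s:
        return max(current - 1, min_r)   # quiet for a while: step down by one
    return current                       # otherwise hold steady
```

The asymmetry is what prevents flapping: a brief dip in load never triggers a scale-down, but a spike always triggers a scale-up.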
---
## Scaling Considerations for MusicCorp
### Current Bottlenecks
| Component | Scaling Approach | Notes |
|-----------|-----------------|-------|
| **PostgreSQL** | Vertical first, then read replicas | Single point of failure |
| **Kafka** | Add partitions | Currently single partition |
| **Application services** | Horizontal (replicas) | Already stateless |
### Scaling by Service
| Service | Current | Scale Strategy |
|---------|---------|----------------|
| **Catalog** | 1 pod | Horizontal (read-heavy, cacheable) |
| **Inventory** | 1 pod | Horizontal (with care for stock updates) |
| **Order** | 1 pod | Horizontal (saga state in DB) |
| **Payment** | 1 pod | Horizontal (external provider handles load) |
| **Shipping** | 1 pod | Horizontal (async processing) |
### Kafka Scaling
**Current:** A single partition per topic, which allows at most one active consumer per consumer group.
**Scaling path:**
1. Add partitions to topics
2. Partition by order_id (related events to same partition)
3. Add consumer instances (up to one active consumer per partition)
```
┌─────────────────────────────────────────────────────────────────┐
│ order.placed topic │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Partition 0│ │ Partition 1│ │ Partition 2│ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Consumer 0│ │Consumer 1│ │Consumer 2│ │
│ │(Payment) │ │(Payment) │ │(Payment) │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
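Step 2 of the scaling path works because Kafka assigns a message to a partition by hashing its key, so all events for one `order_id` land on the same partition and are consumed in order. The sketch below shows the modulo-hash idea only; real Kafka clients use a broker-compatible murmur2 hash, not MD5, and the names here are illustrative:

```python
# Conceptual key-based partitioning: hash the key, mod the partition
# count. Every event with the same order_id maps to the same partition.
import hashlib

NUM_PARTITIONS = 3

def partition_for(key: bytes) -> int:
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_PARTITIONS

events = [("order-42", "order.placed"),
          ("order-42", "payment.taken"),
          ("order-7", "order.placed")]

# All of order-42's events share a single partition:
placements = {partition_for(k.encode()) for k, _ in events if k == "order-42"}
```

The corollary is the diagram above: adding partitions only helps if the keys spread evenly, and ordering is guaranteed only within a partition, never across the topic.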
---
## Key Principles
### Start Small
> Avoid premature optimization. You probably don't need to scale yet.
### Scaling for Load vs Robustness
| Goal | Approach |
|------|----------|
| **Handle more load** | Add replicas, caching, CDN |
| **Improve robustness** | Multi-AZ, replicas, circuit breakers |
Sometimes the same technique serves both; sometimes not.
### Rearchitecture Is a Sign of Success
> Needing to redesign your architecture at certain thresholds is a sign of growth, not failure.
### Experimentation and Measurement
- Load test before production
- Measure actual bottlenecks (don't guess)
- Profile before optimizing
---
## How MusicCorp Compares to Chapter 13 Recommendations
| Book Recommendation | Our Implementation | Status |
|---------------------|-------------------|--------|
| **Horizontal scaling** | Kubernetes Deployments | Ready |
| **Autoscaling** | Not configured | Gap |
| **Caching** | Not implemented | Gap |
| **Database scaling** | Single instance | Gap |
| **Kafka partitioning** | Single partition | Gap |
| **Load testing** | Not implemented | Gap |
---
## Action Items for MusicCorp
### Low Priority (When Needed)
1. **Add HPA for services**
- Start with CPU-based autoscaling
- Set min=1, max=5 for dev
2. **Add caching layer**
- Redis for session/frequently accessed data
- HTTP cache headers for Catalog API
3. **Scale Kafka**
- Add partitions to topics
- Partition key by order_id
4. **Add PostgreSQL read replica**
- Route reads to replica
- Writes to primary
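Item 4's read/write split can be sketched as a routing rule. Everything here is hypothetical (the DSNs and `dsn_for` helper are not from the MusicCorp codebase); a real setup would likely route via a pooler such as pgbouncer, and must tolerate replica lag on reads:

```python
# Hypothetical read/write routing: SELECTs go to the replica,
# everything else (INSERT/UPDATE/DELETE/DDL) goes to the primary.
PRIMARY_DSN = "postgresql://primary:5432/musiccorp"   # hypothetical
REPLICA_DSN = "postgresql://replica:5432/musiccorp"   # hypothetical

def dsn_for(statement: str) -> str:
    """Pick a connection target based on the statement type."""
    is_read = statement.lstrip().upper().startswith("SELECT")
    return REPLICA_DSN if is_read else PRIMARY_DSN
```

One caveat worth noting up front: a read issued immediately after a write may not see that write on the replica, so read-your-own-writes paths should pin to the primary.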
### Future Considerations
1. **Load testing**
- k6, Locust, or Gatling
- Test before scaling decisions
2. **CDN for static assets**
- If we add a frontend
---
## Discussion Questions
1. **Premature optimization**: We have 5 services handling demo traffic. When should we actually start thinking about scaling?
2. **Database scaling**: PostgreSQL is the likely bottleneck. Would you choose vertical scaling, read replicas, or sharding first?
3. **Kafka partitioning**: If we add partitions, how do we ensure related events (same order) go to the same partition?
4. **Caching trade-offs**: What would we cache? How do we handle cache invalidation when prices change?
5. **Cost vs performance**: How do you balance scaling costs against performance requirements?
---
## Key Quotes
> "Start small. Avoid premature optimization. You probably don't need to scale yet."
> "Scaling for load and scaling for robustness may require different approaches."
> "Needing to rearchitect at certain thresholds is a sign of success, not failure."
> "Each scaling technique adds complexity. Choose carefully."
---
## Recommended Reading
- "The Art of Scalability" by Martin Abbott and Michael Fisher
- "Designing Data-Intensive Applications" by Martin Kleppmann
- "High Performance MySQL" by Baron Schwartz et al.
- Google SRE Book (free online)