- [[#The Big Idea|The Big Idea]]
- [[#Looking for the Ideal Technology|Looking for the Ideal Technology]]
- [[#Technology Options Overview|Technology Options Overview]]
- [[#Remote Procedure Calls (RPC)|Remote Procedure Calls (RPC)]]
- [[#Remote Procedure Calls (RPC)#Advantages|Advantages]]
- [[#Remote Procedure Calls (RPC)#Challenges|Challenges]]
- [[#Remote Procedure Calls (RPC)#Author's Recommendation|Author's Recommendation]]
- [[#REST|REST]]
- [[#REST#HTTP Gives You for Free|HTTP Gives You for Free]]
- [[#REST#HATEOAS (Hypermedia)|HATEOAS (Hypermedia)]]
- [[#REST#Where REST Excels|Where REST Excels]]
- [[#GraphQL|GraphQL]]
- [[#GraphQL#The Problem It Solves|The Problem It Solves]]
- [[#GraphQL#Challenges|Challenges]]
- [[#GraphQL#Where to Use It|Where to Use It]]
- [[#Message Brokers|Message Brokers]]
- [[#Message Brokers#Queues vs Topics|Queues vs Topics]]
- [[#Message Brokers#Guaranteed Delivery|Guaranteed Delivery]]
- [[#Message Brokers#Kafka Special Features|Kafka Special Features]]
- [[#Schemas: The Author's Strong Opinion|Schemas: The Author's Strong Opinion]]
- [[#Schemas: The Author's Strong Opinion#Schema Types by Technology|Schema Types by Technology]]
- [[#Schemas: The Author's Strong Opinion#Structural vs Semantic Breakages|Structural vs Semantic Breakages]]
- [[#Schemas: The Author's Strong Opinion#Schema Comparison Tools|Schema Comparison Tools]]
- [[#Handling Breaking Changes|Handling Breaking Changes]]
- [[#Handling Breaking Changes#The Goal: Independent Deployability|The Goal: Independent Deployability]]
- [[#Handling Breaking Changes#Five Strategies to Avoid Breaking Changes|Five Strategies to Avoid Breaking Changes]]
- [[#Handling Breaking Changes#When Breaking Changes Are Unavoidable|When Breaking Changes Are Unavoidable]]
- [[#Handling Breaking Changes#Expand and Contract Pattern|Expand and Contract Pattern]]
- [[#Handling Breaking Changes#Semantic Versioning|Semantic Versioning]]
- [[#Client Libraries: A Double-Edged Sword|Client Libraries: A Double-Edged Sword]]
- [[#Client Libraries: A Double-Edged Sword#The Problem|The Problem]]
- [[#Client Libraries: A Double-Edged Sword#The AWS Model (Recommended)|The AWS Model (Recommended)]]
- [[#Client Libraries: A Double-Edged Sword#Netflix's Approach|Netflix's Approach]]
- [[#Service Discovery|Service Discovery]]
- [[#Service Discovery#DNS (Simple but Limited)|DNS (Simple but Limited)]]
- [[#Service Discovery#Dynamic Registries|Dynamic Registries]]
- [[#Service Discovery#Kubernetes Service Discovery|Kubernetes Service Discovery]]
- [[#API Gateways vs Service Meshes|API Gateways vs Service Meshes]]
- [[#API Gateways vs Service Meshes#API Gateway: Do's and Don'ts|API Gateway: Do's and Don'ts]]
- [[#API Gateways vs Service Meshes#Service Mesh Features|Service Mesh Features]]
- [[#API Gateways vs Service Meshes#How Service Meshes Work|How Service Meshes Work]]
- [[#API Gateways vs Service Meshes#Do You Need a Service Mesh?|Do You Need a Service Mesh?]]
- [[#Documenting Services|Documenting Services]]
- [[#Documenting Services#Explicit Schemas Help, But Aren't Enough|Explicit Schemas Help, But Aren't Enough]]
- [[#Documenting Services#Tools|Tools]]
- [[#Documenting Services#The "Humane Registry"|The "Humane Registry"]]
- [[#How MusicCorp Compares to Chapter 5 Recommendations|How MusicCorp Compares to Chapter 5 Recommendations]]
- [[#Discussion Questions|Discussion Questions]]
- [[#Key Quotes|Key Quotes]]
- [[#Recommended Reading|Recommended Reading]]
## The Big Idea
This chapter is about the **practical technology choices** for implementing the communication styles from Chapter 4. The key principle: let your communication style guide technology selection, not the other way around.
## Looking for the Ideal Technology
Five criteria for evaluating communication technology:
| Criterion | Why It Matters |
|-----------|---------------|
| **Backward Compatibility** | Adding fields shouldn't break clients |
| **Explicit Interface** | Clear contract between service and consumers |
| **Technology Agnostic** | Don't lock yourself into one stack |
| **Simple for Consumers** | Easy adoption without tight coupling |
| **Hide Implementation** | Internal changes shouldn't break clients |
---
## Technology Options Overview
```
┌───────────────────────────────────────────────────────────────────────┐
│                       Communication Technology                        │
├─────────────────┬─────────────────┬─────────────────┬─────────────────┤
│ RPC             │ REST            │ GraphQL         │ Brokers         │
│ (gRPC, SOAP)    │ (HTTP + JSON)   │ (Queries)       │ (Kafka,         │
│                 │                 │                 │ RabbitMQ)       │
├─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ Sync req-resp   │ Sync req-resp   │ Sync req-resp   │ Async           │
│ Often binary    │ Text protocol   │ Query language  │ Events/messages │
│ Schema required │ Schema optional │ Schema required │ Queues/topics   │
└─────────────────┴─────────────────┴─────────────────┴─────────────────┘
```
---
## Remote Procedure Calls (RPC)
Makes remote calls look like local calls. Examples: gRPC, SOAP, Thrift.
### Advantages
- Automatic client stub generation from schema
- Binary protocols (e.g., gRPC with Protocol Buffers) mean smaller payloads and faster serialization; SOAP, by contrast, uses XML
- Strong typing and IDE support
### Challenges
- **Technology coupling**: Some (like Java RMI) lock you into a platform
- **Local ≠ Remote**: Network failures, latency, and marshaling costs are hidden
- **Brittleness**: Adding/removing fields can break client stubs (especially Java RMI)
### Author's Recommendation
> "If I was looking at options in this space, **gRPC would be at the top of my list**."
gRPC excels when you control both client and server. For wide interoperability, prefer REST.
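To make the "looks local, but isn't" point concrete, here is a minimal sketch using Python's standard-library XML-RPC modules. XML-RPC is an older RPC protocol, not gRPC; it is used here only because it needs no code generation or extra packages. The `get_customer` function and its data are hypothetical. The call reads exactly like a local function call, yet once the server goes away it fails with a network error that a genuinely local call never would.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_customer(customer_id: int) -> dict:
    # Hypothetical server-side implementation.
    return {"id": customer_id, "name": "Alice"}


# Bind to port 0 so the OS picks a free port, then serve in a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_customer, "get_customer")
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
proxy = ServerProxy(f"http://{host}:{port}/")

# Looks like a local call, but is an HTTP round trip with XML marshaling.
customer = proxy.get_customer(42)
print(customer)  # {'id': 42, 'name': 'Alice'}

# Take the server away: the same local-looking call now raises a network error.
server.shutdown()
server.server_close()
try:
    proxy.get_customer(42)
    failed_after_shutdown = False
except OSError:  # e.g. ConnectionRefusedError
    failed_after_shutdown = True
print(failed_after_shutdown)  # True
```

The generated-stub convenience is the same idea gRPC offers; the failure path is what the "Local ≠ Remote" bullet warns about.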
---
## REST
Architectural style built on resources, representations, and HTTP verbs.
### HTTP Gives You for Free
- Caching (Varnish, CDNs)
- Load balancing (nginx, HAProxy)
- Security (TLS, auth mechanisms)
- Well-understood error codes (4xx, 5xx)
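Much of that "free" caching rests on standard HTTP headers. As a sketch of the server side, assuming a JSON representation and a strong ETag derived from a SHA-256 hash of the body (a common choice, not the only one), a handler can answer a conditional `GET` with `304 Not Modified` so a cache or client reuses its copy. The function name and the `max-age` value are illustrative.

```python
import hashlib
import json
from typing import Optional


def conditional_get(resource: dict, if_none_match: Optional[str]) -> tuple[int, dict, bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match.

    Simplified: a real If-None-Match header may list several ETags or be "*".
    """
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    # Strong ETag: a quoted hash of the exact bytes of the representation.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # The client's cached copy is still current, so send no body.
        return 304, headers, b""
    headers["Content-Type"] = "application/json"
    return 200, headers, body


album = {"name": "Give Blood", "artist": "The Brakes"}
status, headers, body = conditional_get(album, None)
status2, _, body2 = conditional_get(album, headers["ETag"])
```

Intermediaries such as CDNs and Varnish understand these same headers, which is why REST gets caching almost for free.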
### HATEOAS (Hypermedia)
The theory: clients discover endpoints via links, not hardcoded URLs.
```xml
<album>
  <name>Give Blood</name>
  <link rel="/artist" href="/artist/theBrakes" />
  <link rel="/instantpurchase" href="/instantPurchase/1234" />
</album>
```
**Reality check**: Author admits HATEOAS is "rarely practiced" and hasn't seen evidence it delivers enough value for the effort.
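For reference, a client written in this style looks links up by their `rel` value instead of building URLs itself. A minimal sketch with the standard library's `xml.etree.ElementTree`, using the album document above; the `link_for` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ALBUM_XML = """
<album>
  <name>Give Blood</name>
  <link rel="/artist" href="/artist/theBrakes" />
  <link rel="/instantpurchase" href="/instantPurchase/1234" />
</album>
"""


def link_for(document: str, rel: str) -> Optional[str]:
    """Return the href of the first <link> whose rel matches, or None."""
    root = ET.fromstring(document.strip())
    for link in root.findall("link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None  # The server didn't offer this control, so the client can't use it.


purchase_url = link_for(ALBUM_XML, "/instantpurchase")
```

If the server later moves the purchase endpoint, only the `href` changes; the client keeps working as long as the `rel` stays the same.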
### Where REST Excels
- External APIs (wide client compatibility)
- Caching-heavy workloads
- When you need maximum interoperability
---
## GraphQL
Client-defined queries that aggregate data from multiple services.
### The Problem It Solves
Mobile app needs customer info + last 5 orders. Without GraphQL:
- 2 API calls (Customer + Orders)
- Over-fetching (gets all fields, only needs a few)
- Wastes bandwidth and battery
With GraphQL: One query, exactly the fields needed.
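The over-fetching point can be quantified with a toy sketch. The query string shows the shape a mobile client might send; the `select` function only mimics GraphQL's "return exactly the requested fields" behavior over plain dictionaries and is not a GraphQL implementation. All field names and data are hypothetical.

```python
import json

# The query a mobile client might send (illustrative; not executed here).
QUERY = """
{
  customer(id: "42") {
    name
    orders(last: 5) { id total }
  }
}
"""

# Full REST representations: every field comes back whether needed or not.
customer_resource = {"id": "42", "name": "Alice", "email": "alice@example.com",
                     "address": "1 Main St", "loyaltyTier": "gold"}
orders_resource = [{"id": f"o{i}", "total": 10 * i, "status": "shipped",
                    "shippingAddress": "1 Main St", "lines": 3} for i in range(1, 6)]


def select(data, selection: dict):
    """Keep only requested fields; a nested dict selects sub-fields, None takes the value."""
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    return {field: (select(data[field], sub) if sub else data[field])
            for field, sub in selection.items()}


# Without GraphQL: two calls, full payloads.
rest_bytes = len(json.dumps(customer_resource)) + len(json.dumps(orders_resource))

# With GraphQL-style selection: one response, only the needed fields.
graphql_result = select({**customer_resource, "orders": orders_resource},
                        {"name": None, "orders": {"id": None, "total": None}})
graphql_bytes = len(json.dumps(graphql_result))
```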
### Challenges
- Expensive queries can hammer the server (no query planner like SQL)
- Caching is complex (can't use HTTP caching easily)
- Works better for reads than writes
- Can reinforce "microservices as database wrappers" mindset
### Where to Use It
- Mobile clients (constrained bandwidth)
- External APIs that need flexibility (e.g., GitHub)
- **NOT** for general microservice-to-microservice communication
---
## Message Brokers
Middleware for asynchronous communication. Examples: RabbitMQ, Kafka, AWS SQS/SNS.
### Queues vs Topics
| Queues | Topics |
|--------|--------|
| Point-to-point | Pub/sub |
| One consumer group | Multiple consumer groups |
| Load distribution | Event broadcast |
| Sender knows destination | Sender doesn't know who's listening |
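The difference in delivery semantics can be sketched with an in-memory toy (no real broker involved; class names are hypothetical, and real brokers use their own distribution strategies). A queue hands each message to one of its competing consumers, here round-robin; a topic copies every message to each subscribed consumer group.

```python
from collections import defaultdict
from itertools import cycle


class ToyQueue:
    """Point-to-point: each message goes to exactly one competing consumer."""

    def __init__(self, consumers: list[str]):
        self._next = cycle(consumers)  # Simple round-robin load distribution.
        self.received = defaultdict(list)

    def send(self, message: str) -> None:
        self.received[next(self._next)].append(message)


class ToyTopic:
    """Pub/sub: every subscribed group gets its own copy of each message."""

    def __init__(self):
        self.groups: dict[str, list[str]] = {}

    def subscribe(self, group: str) -> None:
        self.groups[group] = []

    def publish(self, message: str) -> None:
        # The publisher doesn't know (or care) who is listening.
        for inbox in self.groups.values():
            inbox.append(message)


queue = ToyQueue(["worker-1", "worker-2"])
for order in ["order-1", "order-2", "order-3"]:
    queue.send(order)

topic = ToyTopic()
topic.subscribe("loyalty")
topic.subscribe("warehouse")
topic.publish("order-placed:1")
```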
### Guaranteed Delivery
The killer feature: broker holds messages until delivered, even if downstream is unavailable.
> **Warning**: "Guaranteed delivery" means different things to different brokers. Read the docs carefully!
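One common meaning of "guaranteed" is *at-least-once* delivery: the broker keeps a message until a consumer acknowledges it and redelivers it otherwise, so a consumer that crashes after doing its work but before acknowledging will see the message again. The toy below (hypothetical names, in-memory only) sketches that behavior, together with a consumer that de-duplicates by message ID so a redelivery is harmless.

```python
from collections import deque


class AtLeastOnceBroker:
    """Keeps each message until it is acknowledged; unacked messages are redelivered."""

    def __init__(self):
        self._pending = deque()

    def publish(self, msg_id: str, body: str) -> None:
        self._pending.append((msg_id, body))

    def deliver(self, handler) -> None:
        msg_id, body = self._pending.popleft()
        try:
            handler(msg_id, body)  # Returning normally counts as the ack.
        except Exception:
            # No ack: put the message back so it is delivered again.
            self._pending.appendleft((msg_id, body))
            raise

    def has_pending(self) -> bool:
        return bool(self._pending)


processed_ids: set[str] = set()  # In production this would need durable storage.
side_effects: list[str] = []
attempts: list[str] = []
crash_once = {"pending": True}


def handle(msg_id: str, body: str) -> None:
    attempts.append(msg_id)
    if msg_id in processed_ids:  # Idempotent consumer: ignore duplicates.
        return
    side_effects.append(body)
    processed_ids.add(msg_id)
    if crash_once["pending"]:  # Simulate crashing after the work but before the ack.
        crash_once["pending"] = False
        raise RuntimeError("consumer crashed before ack")


broker = AtLeastOnceBroker()
broker.publish("m1", "charge card for order 1")
while broker.has_pending():
    try:
        broker.deliver(handle)
    except RuntimeError:
        pass  # The broker still holds m1, so the loop delivers it again.
```

The card is charged once even though the message arrives twice; without the `processed_ids` check it would be charged twice.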
### Kafka Special Features
- **Message permanence**: Messages are retained for a configurable period (which can be forever), not deleted once consumed
- **Massive scale**: 50,000+ producers/consumers on one cluster (Netflix)
- **Stream processing**: KSQL for real-time transformations
- **Ordering**: Guaranteed within a partition (not across partitions)
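Permanence and per-partition ordering both fall out of Kafka's model of a partitioned, append-only log. The toy below is an in-memory sketch, not Kafka's client API: records with the same key always land in the same partition (here via `zlib.crc32`, an assumption; real Kafka clients have their own partitioners), reading never deletes anything, and a consumer can replay from any offset.

```python
import zlib


class ToyPartitionedLog:
    def __init__(self, num_partitions: int):
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple[int, int]:
        # Same key -> same partition, so per-key order is preserved.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition: int, from_offset: int = 0) -> list[tuple[str, str]]:
        # Reading never removes records; any consumer can replay from any offset.
        return self.partitions[partition][from_offset:]


log = ToyPartitionedLog(num_partitions=3)
positions = [log.append("order-17", event) for event in ("placed", "paid", "shipped")]
log.append("order-42", "placed")  # May land in a different partition.

partition = positions[0][0]
order_17_events = [v for k, v in log.read(partition) if k == "order-17"]
replayed = log.read(partition, from_offset=1)
```

Keying events by order ID is what makes "guaranteed within a partition" useful: all events for one order stay in sequence, while events for different orders can be spread across partitions.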
---
## Schemas: The Author's Strong Opinion
> "I think that having an explicit schema more than offsets any perceived benefit of having schemaless communication."
### Schema Types by Technology
| Technology | Schema Format |
|------------|---------------|
| REST (JSON) | JSON Schema, OpenAPI |
| REST (XML) | XSD |
| gRPC | Protocol Buffers |
| SOAP | WSDL |
| Kafka | Avro (often), Protocol Buffers |
| Events | CloudEvents, AsyncAPI |
### Structural vs Semantic Breakages
| Type | Example | How to Catch |
|------|---------|--------------|
| **Structural** | Remove a field | Schema comparison tools |
| **Semantic** | `calculate(a,b)` changes from add to multiply | Testing only |
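The `calculate` example shows why schema tools can't catch everything. In this sketch (hypothetical functions), the two versions have identical signatures, so a purely structural comparison via `inspect.signature` sees no change; only a test that pins down expected behavior notices the break.

```python
import inspect


def calculate_v1(a: int, b: int) -> int:
    return a + b


def calculate_v2(a: int, b: int) -> int:
    return a * b  # Same "shape", different meaning.


# Structural check: compares parameter names, kinds and annotations only.
structurally_equal = inspect.signature(calculate_v1) == inspect.signature(calculate_v2)


def behaves_like_v1(fn) -> bool:
    # Semantic check: a contract test with known inputs and outputs.
    # (2, 3) is chosen on purpose: 2 + 2 == 2 * 2 would hide the change.
    return fn(2, 3) == 5


v2_passes_contract = behaves_like_v1(calculate_v2)
```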
### Schema Comparison Tools
- **Protolock**: Protocol buffers
- **json-schema-diff-validator**: JSON Schema
- **openapi-diff**: OpenAPI
- **Confluent Schema Registry**: JSON Schema, Avro, Protocol Buffers
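To see what these tools look for, here is a deliberately tiny, hypothetical structural diff over JSON-Schema-style dictionaries (top-level `properties` and `required` only); the real tools above handle far more cases. It assumes the schema describes data consumers *read* (a response body or an event payload such as `order.placed`), so it flags removed fields, changed types, and fields that are no longer guaranteed to be present, while treating new optional fields as safe expansion changes.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Breaking changes for consumers that READ data described by these schemas."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name, spec in old_props.items():
        if name not in new_props:
            problems.append(f"removed field '{name}'")
        elif spec.get("type") != new_props[name].get("type"):
            problems.append(f"'{name}' changed type: {spec.get('type')} -> {new_props[name].get('type')}")
    # A field readers could rely on is no longer guaranteed to be present.
    for name in sorted(set(old.get("required", [])) - set(new.get("required", []))):
        if name in new_props:  # Removed fields were already reported above.
            problems.append(f"'{name}' is no longer required")
    return problems


order_placed_v1 = {
    "type": "object",
    "properties": {"orderId": {"type": "string"}, "total": {"type": "number"}},
    "required": ["orderId", "total"],
}
order_placed_v2 = {  # Adds an optional field: an expansion change.
    "type": "object",
    "properties": {"orderId": {"type": "string"}, "total": {"type": "number"},
                   "currency": {"type": "string"}},
    "required": ["orderId", "total"],
}
order_placed_v3 = {  # Changes a type and drops a field.
    "type": "object",
    "properties": {"orderId": {"type": "integer"}},
    "required": ["orderId"],
}
```

In CI, a check like this would compare the schema on the branch against the one on the main branch and fail the build if the list is non-empty.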
---
## Handling Breaking Changes
### The Goal: Independent Deployability
Never force consumers to upgrade in lockstep with you.
### Five Strategies to Avoid Breaking Changes
1. **Expansion changes**: Only add, never remove
2. **Tolerant reader**: Consumers ignore unknown fields
3. **Right technology**: gRPC's field numbers handle additions gracefully
4. **Explicit interface**: Clear schema = clear boundaries
5. **Catch breaks early**: Schema comparison in CI
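A tolerant reader in Python can be as simple as extracting only the fields it needs, defaulting the optional ones, and never validating the rest. A sketch, assuming a hypothetical `order.placed` payload; the field names and the `"GBP"` default are illustrative.

```python
from dataclasses import dataclass


@dataclass
class OrderSummary:
    order_id: str
    total: float
    currency: str


def read_order_placed(payload: dict) -> OrderSummary:
    # Take only what this consumer needs; unknown fields are simply ignored.
    return OrderSummary(
        order_id=str(payload["orderId"]),          # Genuinely required: fail loudly if absent.
        total=float(payload["total"]),             # Also required for this consumer.
        currency=payload.get("currency", "GBP"),   # Optional: default if the producer omits it.
    )


old_event = {"orderId": "o-1", "total": 12.5}
new_event = {"orderId": "o-2", "total": 20, "currency": "EUR", "giftWrap": True}
```

Both the older event (no `currency`) and the newer one (an extra `giftWrap` field) are read without error, which is what lets the producer make expansion changes freely.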
### When Breaking Changes Are Unavoidable
| Option | Description | Author's Take |
|--------|-------------|---------------|
| **Lockstep deployment** | Everyone upgrades together | "Flies in the face of independent deployability" |
| **Coexist versions** | Run V1 and V2 simultaneously | Problematic (branched code, shared state) |
| **Emulate old interface** | V2 service exposes both V1 and V2 endpoints | **Preferred approach** |
### Expand and Contract Pattern
```
Phase 1: Expand (add V2 endpoint, keep V1)
    └── Consumers migrate at their own pace
Phase 2: Contract (remove V1 when no longer used)
    └── Track usage to know when safe
```
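A sketch of the expand phase, assuming a hypothetical customer endpoint whose V2 splits `name` into `firstName` and `lastName`. The V1 handler is implemented by translating the V2-shaped data (emulating the old interface rather than running old code side by side), and a usage counter supplies the evidence needed before contracting. Handler names and the zero-traffic rule are assumptions.

```python
from collections import Counter

usage = Counter()  # In practice: metrics or access logs, not an in-process counter.


def _load_customer(customer_id: str) -> dict:
    # The V2 shape is the single internal model.
    return {"id": customer_id, "firstName": "Ada", "lastName": "Lovelace"}


def get_customer_v2(customer_id: str) -> dict:
    usage["v2"] += 1
    return _load_customer(customer_id)


def get_customer_v1(customer_id: str) -> dict:
    # Emulate the old interface on top of the new model.
    usage["v1"] += 1
    c = _load_customer(customer_id)
    return {"id": c["id"], "name": f"{c['firstName']} {c['lastName']}"}


def safe_to_contract(v1_calls_in_window: int) -> bool:
    # Remove V1 only after it has seen no traffic over an agreed observation window.
    return v1_calls_in_window == 0


legacy_view = get_customer_v1("c1")
new_view = get_customer_v2("c1")
```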
### Semantic Versioning
`MAJOR.MINOR.PATCH`
- MAJOR: Breaking changes
- MINOR: New backward-compatible features
- PATCH: Bug fixes
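Under these rules a consumer built against one version can decide mechanically whether another version should be safe: same MAJOR, and not older than the version it needs. A minimal sketch that handles plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build tags, and ignoring the special meaning of `0.x` versions):

```python
def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required: str, available: str) -> bool:
    """True if `available` should satisfy a consumer written against `required`."""
    req, avail = parse(required), parse(available)
    # Same MAJOR (no breaking changes) and not older than what the consumer needs.
    return avail[0] == req[0] and avail >= req
```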
---
## Client Libraries: A Double-Edged Sword
### The Problem
If the same team writes server AND client library, logic leaks into the client.
### The AWS Model (Recommended)
- AWS exposes raw SOAP/REST APIs
- SDKs are written by **different teams** (or community)
- Clients control when to upgrade
### Netflix's Approach
Client libraries handle:
- Service discovery
- Failure modes
- Logging
- Retry logic
But even Netflix admits this has led to "problematic coupling."
---
## Service Discovery
How do microservices find each other?
### DNS (Simple but Limited)
```
accounts.musiccorp.net → 192.168.1.10
accounts-uat.musiccorp.net → 192.168.2.10
```
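**Problem**: DNS entries have a TTL, so clients can keep using a stale IP address after an instance changes. Solution: Point the DNS entry at a load balancer, which keeps track of the instances itself.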
### Dynamic Registries
| Tool | Key Features |
|------|--------------|
| **Consul** | HTTP API, built-in DNS, health checks |
| **etcd** | Bundled with Kubernetes |
| **ZooKeeper** | "Better solutions exist nowadays" |
### Kubernetes Service Discovery
- Pods carry labels (key/value metadata)
- Services use label selectors to find matching pods and expose them under a stable name
- Built-in, no extra tools needed
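The matching rule behind a Service's `selector` is equality on labels: a pod is selected if it carries every key/value pair in the selector, and any extra labels on the pod are ignored. A toy Python model of that rule (not the Kubernetes API; pod names, labels and IPs are hypothetical):

```python
def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    # Every selector label must be present on the pod with the same value;
    # extra pod labels don't matter.
    return all(labels.get(key) == value for key, value in selector.items())


pods = [
    {"name": "accounts-7d9f", "ip": "10.0.0.11", "labels": {"app": "accounts", "track": "stable"}},
    {"name": "accounts-x2k4", "ip": "10.0.0.12", "labels": {"app": "accounts", "track": "canary"}},
    {"name": "orders-5b8c", "ip": "10.0.0.21", "labels": {"app": "orders"}},
]

accounts_service_selector = {"app": "accounts"}
endpoints = [p["ip"] for p in pods if selects(accounts_service_selector, p["labels"])]
```

Because matching is by label rather than by address, pods can come and go (or be rescheduled onto new IPs) without callers changing anything.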
---
## API Gateways vs Service Meshes
```
                  ┌───────────────────────┐
  External        │                       │
  Clients ───────▶│      API Gateway      │───── North-South
                  │      (perimeter)      │
                  └───────────┬───────────┘
                              │
                  ┌───────────┴───────────┐
                  │                       │
          ┌───────▼───────┐       ┌───────▼───────┐
          │ Service Mesh  │◀─────▶│ Service Mesh  │──── East-West
          │    (proxy)    │       │    (proxy)    │
          └───────┬───────┘       └───────┬───────┘
                  │                       │
          ┌───────▼───────┐       ┌───────▼───────┐
          │ Microservice  │       │ Microservice  │
          │       A       │       │       B       │
          └───────────────┘       └───────────────┘
```
### API Gateway: Do's and Don'ts
**Do:**
- Route external requests to internal services
- Handle API keys, rate limiting, logging
- Expose developer portal
**Don't:**
- Aggregate calls (use GraphQL or BFF pattern instead)
- Rewrite protocols ("turn SOAP into REST")
- Put business logic in the gateway
> "Keeping smarts in our microservices helps [independent deployability]. If we now also have to make changes in intermediate layers, things become more problematic."
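Rate limiting at the gateway is usually configured rather than coded, but the idea is easy to sketch. A token bucket is one common algorithm (an assumption here; gateways differ): each API key gets a bucket that refills at a fixed rate, and each request spends one token. The clock is injectable so the behavior is deterministic; names and limits are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # Start full so short bursts are allowed.
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # A gateway would typically answer 429 Too Many Requests.


buckets: dict[str, TokenBucket] = {}


def allow_request(api_key: str) -> bool:
    # One bucket per API key, created on first use.
    if api_key not in buckets:
        buckets[api_key] = TokenBucket(rate_per_sec=1, capacity=2)
    return buckets[api_key].allow()


# Deterministic demo with a fake clock: a burst of 3 at t=0, then one more at t=1.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]
fake_now[0] = 1.0
after_refill = bucket.allow()
```

This is the kind of generic, cross-cutting concern the "Do" list above assigns to the gateway; nothing in it knows about MusicCorp's business rules.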
### Service Mesh Features
- Mutual TLS (mTLS)
- Correlation IDs
- Service discovery
- Load balancing
- Consistent behavior across languages
### How Service Meshes Work
Local proxy (often Envoy) runs alongside each microservice instance:
```
Order Processor → Local Proxy → Network → Local Proxy → Payment
```
The proxy handles retries, TLS, and tracing; the microservice largely doesn't know it's there.
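What the proxy does on the caller's behalf can be modeled in a few lines. This is a conceptual sketch in plain Python, not how Envoy is configured or implemented: it attaches a correlation ID if the request lacks one, forwards the request via a hypothetical `send` callable, and retries connection failures with exponential backoff. Retry counts, delays and the header name (borrowed from MusicCorp's `X-Correlation-ID`) are assumptions.

```python
import time
import uuid
from typing import Callable

Request = dict  # e.g. {"path": "/payments", "headers": {...}}


def proxy_call(send: Callable[[Request], dict], request: Request,
               retries: int = 2, backoff_s: float = 0.05) -> dict:
    headers = dict(request.get("headers", {}))
    # Propagate an existing correlation ID, or start a new one.
    headers.setdefault("X-Correlation-ID", str(uuid.uuid4()))
    outgoing = {**request, "headers": headers}
    for attempt in range(retries + 1):
        try:
            # Retrying is only safe for idempotent requests.
            return send(outgoing)
        except ConnectionError:
            if attempt == retries:
                raise
            time.sleep(backoff_s * (2 ** attempt))  # Exponential backoff.


calls = []


def flaky_payment_service(req: Request) -> dict:
    calls.append(req["headers"]["X-Correlation-ID"])
    if len(calls) == 1:
        raise ConnectionError("connection reset")  # The first attempt fails.
    return {"status": 200}


response = proxy_call(flaky_payment_service,
                      {"path": "/payments", "headers": {"X-Correlation-ID": "abc-123"}})
```

The calling service just sees a successful response; the retry and the consistent correlation ID happened outside its own code.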
### Do You Need a Service Mesh?
Author's advice for years was: "If you can wait 6 months, wait 6 months."
Now (2024): Space has matured. Consider if:
- Running on Kubernetes
- Have many microservices (not just 5)
- Multiple programming languages
- Need consistent cross-cutting behavior
---
## Documenting Services
### Explicit Schemas Help, But Aren't Enough
Schemas show structure. Documentation explains behavior.
### Tools
| Type | Options |
|------|---------|
| REST APIs | OpenAPI + portal (Ambassador, Swagger UI) |
| Events | AsyncAPI, CloudEvents |
| Service Catalog | Spotify Backstage, Ambassador Service Catalog |
### The "Humane Registry"
More than a wiki—pull in live data:
- Service discovery info
- Health status
- API documentation
- Team ownership
**Example**: Financial Times' Biz Ops calculates a "System Operability Score" based on completeness of metadata, health checks, etc.
---
## How MusicCorp Compares to Chapter 5 Recommendations
| Book Recommendation | Our Implementation | Status |
|---------------------|-------------------|--------|
| **REST for sync calls** | Flask REST APIs (JSON) | Done |
| **Message broker for events** | Kafka (confluent-kafka) | Done |
| **Explicit schemas** | No formal schema (JSON by convention) | Gap |
| **OpenAPI documentation** | Swagger UI at /docs | Done |
| **Schema comparison in CI** | Not implemented | Gap |
| **Kubernetes service discovery** | K8s Services with DNS | Done |
| **Correlation IDs** | X-Correlation-ID header | Done |
| **Tolerant reader pattern** | Implicit (dict.get()) | Partial |
| **Service mesh** | Not implemented | Not yet |
| **API Gateway** | nginx Ingress | Done |
| **Service catalog/registry** | Not implemented | Gap |
---
## Discussion Questions
1. **Schema first or code first?** We have working REST APIs with OpenAPI specs generated from code. Should we write specs first (more work upfront) or continue generating from code?
2. **Kafka benefits**: We migrated from Redis pub/sub to Kafka, which gives us message persistence, replay capability, and better scalability. Should schema registry integration and dead letter queues be our next improvements?
3. **Breaking change handling**: If we need to add a required field to `order.placed` event, how do we handle it? What's our strategy for versioning events?
4. **The API gateway question**: We now have nginx Ingress routing external traffic. What additional gateway features might we need as we scale?
5. **Service mesh ROI**: The book says 5 microservices don't justify a service mesh. We have 5. What would need to change for it to make sense?
6. **Consumer-driven contracts**: The book mentions Pact for testing contracts. Should we implement this? How would it work with our event-driven architecture?
7. **The GraphQL question**: We don't have a BFF or GraphQL. If we built a mobile app that needed data from multiple services, would we add GraphQL or create a BFF?
---
## Key Quotes
> "I think that having an explicit schema more than offsets any perceived benefit of having schemaless communication."
> "Keep middleware dumb, smarts in endpoints."
> "Lockstep deployment flies in the face of independent deployability."
> "If you're having to support a wide variety of other applications that might need to talk to your microservices, [REST] would likely be a better fit [than gRPC]."
> "Do you need a service mesh? ...If you have five microservices, I don't think you can easily justify a service mesh."
---
## Recommended Reading
- *REST in Practice* by Jim Webber et al.
- *Designing Event-Driven Systems* by Ben Stopford (Kafka deep dive)
- *Kafka: The Definitive Guide* by Neha Narkhede et al.