- [[#The Big Idea|The Big Idea]]
- [[#Inter-Process vs In-Process: The Three Big Differences|Inter-Process vs In-Process: The Three Big Differences]]
- [[#Inter-Process vs In-Process: The Three Big Differences#1. Performance|1. Performance]]
- [[#Inter-Process vs In-Process: The Three Big Differences#2. Changing Interfaces|2. Changing Interfaces]]
- [[#Inter-Process vs In-Process: The Three Big Differences#3. Error Handling|3. Error Handling]]
- [[#The Communication Styles Model|The Communication Styles Model]]
- [[#Pattern: Synchronous Blocking|Pattern: Synchronous Blocking]]
- [[#Pattern: Synchronous Blocking#Advantages|Advantages]]
- [[#Pattern: Synchronous Blocking#Disadvantages|Disadvantages]]
- [[#Pattern: Synchronous Blocking#MusicCorp Example: The Fraud Detection Problem|MusicCorp Example: The Fraud Detection Problem]]
- [[#Pattern: Asynchronous Nonblocking|Pattern: Asynchronous Nonblocking]]
- [[#Pattern: Asynchronous Nonblocking#Advantages|Advantages]]
- [[#Pattern: Asynchronous Nonblocking#Disadvantages|Disadvantages]]
- [[#Pattern: Asynchronous Nonblocking#The async/await Trap|The async/await Trap]]
- [[#Pattern: Communication Through Common Data|Pattern: Communication Through Common Data]]
- [[#Pattern: Communication Through Common Data#Examples|Examples]]
- [[#Pattern: Communication Through Common Data#Advantages|Advantages]]
- [[#Pattern: Communication Through Common Data#Disadvantages|Disadvantages]]
- [[#Pattern: Communication Through Common Data#When to Use|When to Use]]
- [[#Pattern: Request-Response|Pattern: Request-Response]]
- [[#Pattern: Request-Response#Sync vs Async Implementation|Sync vs Async Implementation]]
- [[#Pattern: Request-Response#Request vs Command|Request vs Command]]
- [[#Pattern: Request-Response#Parallel vs Sequential Calls|Parallel vs Sequential Calls]]
- [[#Pattern: Event-Driven Communication|Pattern: Event-Driven Communication]]
- [[#Pattern: Event-Driven Communication#The Fundamental Inversion|The Fundamental Inversion]]
- [[#Pattern: Event-Driven Communication#MusicCorp Example|MusicCorp Example]]
- [[#Pattern: Event-Driven Communication#Events vs Messages|Events vs Messages]]
- [[#Pattern: Event-Driven Communication#What Goes in an Event?|What Goes in an Event?]]
- [[#Pattern: Event-Driven Communication#Implementation: Message Brokers vs Atom|Implementation: Message Brokers vs Atom]]
- [[#Cautionary Tale: The Pricing System Disaster|Cautionary Tale: The Pricing System Disaster]]
- [[#Key Takeaways|Key Takeaways]]
- [[#How This Connects to Our MusicCorp Demo|How This Connects to Our MusicCorp Demo]]
- [[#Discussion Questions|Discussion Questions]]
- [[#Recommended Further Reading|Recommended Further Reading]]
## The Big Idea
Don't pick technology first, then figure out your communication style. Instead:
1. Understand the **style** of communication you need
2. **Then** pick technology that fits
## Inter-Process vs In-Process: The Three Big Differences
### 1. Performance
- In-process: compiler can inline, optimize, pass pointers
- Inter-process: serialization, network round-trip (measured in milliseconds), payload size matters
- **Implication**: APIs that make sense in-process may not make sense across a network
### 2. Changing Interfaces
- In-process: IDE refactoring, atomic deployment
- Inter-process: Either lockstep deployment OR phased rollout with backward compatibility
- **Implication**: Interface changes require more coordination
### 3. Error Handling
Five failure modes (from Tanenbaum and van Steen):
| Failure Type | What Happened |
|--------------|---------------|
| **Crash** | Server crashed and has to be restarted |
| **Omission** | Sent something, got nothing back |
| **Timing** | Response came too late (or too early!) |
| **Response** | Got a response, but it's wrong/incomplete |
| **Byzantine** | Something's wrong, but parties can't agree whether a failure occurred (or why) |
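Many errors are **transient** - short-lived problems that might resolve on their own. Rich error semantics help clients decide whether to retry: an HTTP 5xx signals a server-side problem that may clear up, while most 4xx codes mean the request itself is at fault, so resending it unchanged will not help.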
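A minimal sketch of how a client might act on those error semantics. The helper name `shouldRetry` and the limit of three attempts are assumptions made for illustration, and the rule is deliberately simplified: some 4xx codes, such as 429 Too Many Requests, are also worth retrying after a delay.

```javascript
// Hypothetical helper: decide whether a failed HTTP call is worth retrying.
// Assumption: 5xx may be transient; 4xx means the request itself is at fault.
function shouldRetry(statusCode, attempt, maxAttempts = 3) {
  if (attempt >= maxAttempts) return false; // bounded number of tries
  if (statusCode >= 500 && statusCode <= 599) return true; // server-side, possibly transient
  return false; // 4xx and anything else: resending unchanged won't help
}

console.log(shouldRetry(503, 1)); // true
console.log(shouldRetry(404, 1)); // false
console.log(shouldRetry(503, 3)); // false
```

The attempt cap matters as much as the status check: without it, a client retrying against a struggling server only adds load.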
## The Communication Styles Model
```
                ┌─────────────────────┐
                │    Communication    │
                │        Style        │
                └──────────┬──────────┘
       ┌───────────────────┼───────────────────┐
       │                   │                   │
┌──────┴──────┐     ┌──────┴──────┐     ┌──────┴──────┐
│ Synchronous │     │ Asynchronous│     │   Common    │
│  Blocking   │     │ Nonblocking │     │    Data     │
└──────┬──────┘     └──────┬──────┘     └─────────────┘
       │                   │
       │            ┌──────┴──────┐
       │            │             │
┌──────┴──────┐ ┌───┴────┐  ┌─────┴─────┐
│  Request-   │ │Request-│  │  Event-   │
│  Response   │ │Response│  │  Driven   │
└─────────────┘ └────────┘  └───────────┘
```
**Key insight**: Start by deciding request-response vs event-driven, THEN sync vs async.
---
## Pattern: Synchronous Blocking
The microservice makes a call and **waits** for a response.
### Advantages
- Simple and familiar
- Matches most developers' mental models
- Good starting point when learning distributed systems
### Disadvantages
- **Temporal coupling**: Both services must be available simultaneously
- **Cascading failures**: Slow downstream = slow upstream
- **Long call chains** are problematic (A→B→C→D)
### MusicCorp Example: The Fraud Detection Problem
```
Bad: Long synchronous chain
  Order → Payment → Fraud Detection → Customer
  (If any link fails or is slow, the whole order fails)

Better: Move fraud detection out of critical path
  Order → Payment (check pre-computed fraud flag)
             ↑
  Background job: Fraud Detection → Customer
  (Updates fraud flags asynchronously)
```
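The "better" shape can be sketched roughly as follows. Everything here is hypothetical: the in-memory `fraudFlags` map stands in for whatever store Payment would read, and the function names are invented for illustration.

```javascript
// Hypothetical store of pre-computed fraud flags (customerId -> boolean).
const fraudFlags = new Map();

// Background job: runs outside the order's critical path and refreshes flags.
function refreshFraudFlag(customerId, looksFraudulent) {
  fraudFlags.set(customerId, looksFraudulent);
}

// Payment's check in the critical path: a local lookup, no call to Fraud Detection.
function canTakePayment(customerId) {
  // Assumption: customers with no flag yet are allowed through.
  return fraudFlags.get(customerId) !== true;
}

refreshFraudFlag("customer-1", true);
console.log(canTakePayment("customer-1")); // false
console.log(canTakePayment("customer-2")); // true
```

The price of the shorter call chain is staleness: Payment acts on the flag as of the last background run, so a customer who turned fraudulent since then could still get through.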
---
## Pattern: Asynchronous Nonblocking
The microservice sends a call and **continues processing** without waiting.
Three main styles:
1. **Communication through common data**
2. **Async request-response**
3. **Event-driven**
### Advantages
- No temporal coupling - services don't need to be available simultaneously
- Great for long-running processes (e.g., packaging and shipping could take days)
- Queues can buffer requests during load spikes
### Disadvantages
- More complex mental model
- More technology choices to navigate
- New failure modes (dead letter queues, message ordering, etc.)
### The async/await Trap
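Just because code uses `async/await` doesn't mean the interaction is truly asynchronous! If you `await` a call as soon as you make it, the calling function still stops and waits for the result before doing anything else, so it behaves like a synchronous, blocking call:
```javascript
// eurToGbp is a Promise for the exchange rate
let rate = await eurToGbp; // This function pauses here until the Promise settles
process(rate);             // Won't run until the line above completes
```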
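For contrast, a sketch of the same kind of call written so that other work overlaps with it. `fetchEurToGbp` is a hypothetical stand-in for the remote exchange-rate lookup, and the rate it returns is made up.

```javascript
// Hypothetical stand-in for a remote exchange-rate call; resolves after a short delay.
function fetchEurToGbp() {
  return new Promise((resolve) => setTimeout(() => resolve(0.85), 10));
}

async function priceInGbp(eurAmount) {
  const ratePromise = fetchEurToGbp();   // start the call, but do not await it yet
  const rounded = Math.round(eurAmount); // other work proceeds while the call is in flight
  const rate = await ratePromise;        // wait only at the point the value is needed
  return rounded * rate;
}

priceInGbp(100).then((gbp) => console.log(gbp));
```

The only change is where the `await` sits: the request is started first, and the function waits for it as late as possible.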
---
## Pattern: Communication Through Common Data
One service writes data to a shared location; others read from it.
### Examples
- File dropped on filesystem
- **Data Lake**: Raw data in any format, consumers figure it out
- **Data Warehouse**: Structured store, producers must conform to schema
### Advantages
- Simple, widely understood technology
- Great for interoperability (even old mainframes can read files)
- Handles large data volumes well
### Disadvantages
- Usually relies on polling - not good for low latency
- Shared data store becomes coupling point
- If several services both read AND write the same data, that is common coupling (bad!)
### When to Use
- Legacy system integration
- Large batch data transfers (multi-GB files)
- When real-time isn't required
---
## Pattern: Request-Response
Service sends a request, expects a response with the result.
### Sync vs Async Implementation
**Synchronous**: Open connection → send request → wait → receive response on same connection
**Asynchronous**:
```
Order Processor → [Request Queue]  → Inventory
                                         │
Order Processor ← [Response Queue] ←─────┘
```
With async, the receiver needs to know where to send the response, and the requester needs to correlate responses with original requests (often via database state).
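One common way to handle both concerns is a reply-to address plus a correlation ID carried on every message. The sketch below uses plain arrays as stand-ins for the two queues and an in-memory map for the requester's state; all names are hypothetical, and a real system would more likely keep the pending state in a database.

```javascript
// Hypothetical in-memory stand-ins for the request and response queues.
const queues = { requests: [], responses: [] };

// Requester-side state: correlation ID -> the order waiting on that answer.
const pending = new Map();
let nextId = 1;

function requestStockReservation(orderId, sku) {
  const correlationId = `req-${nextId++}`;
  pending.set(correlationId, { orderId, sku });
  // replyTo tells the receiver where to send its response.
  queues.requests.push({ correlationId, replyTo: "responses", sku });
  return correlationId;
}

// Inventory side: consume one request and answer on the queue named in replyTo.
function inventoryHandleNext() {
  const request = queues.requests.shift();
  if (!request) return;
  queues[request.replyTo].push({ correlationId: request.correlationId, reserved: true });
}

// Requester side: match a response to its original request via the correlation ID.
function orderProcessorHandleNext() {
  const response = queues.responses.shift();
  if (!response) return null;
  const original = pending.get(response.correlationId);
  pending.delete(response.correlationId);
  return { ...original, reserved: response.reserved };
}

requestStockReservation("order-7", "sku-123");
inventoryHandleNext();
console.log(orderProcessorHandleNext());
```

Because the state lives outside the call stack, the response can be handled by a different instance of the requester than the one that sent the request, provided the pending state is shared.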
### Request vs Command
Book prefers "request" over "command":
- **Command** implies a directive that must be obeyed
- **Request** implies something that can be rejected
- Microservices should always be able to reject invalid requests
### Parallel vs Sequential Calls
If you need results from 3 independent services:
- **Sequential**: Total latency = sum of all latencies
- **Parallel**: Total latency = slowest single call
**Always prefer parallel when calls are independent.**
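A small sketch of the difference, using timers in place of real network calls. `callService`, the service names, and the 30 ms delays are all made up for illustration.

```javascript
// Hypothetical stand-in for a remote call that takes `ms` milliseconds.
function callService(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

// Each call starts only after the previous one has finished: roughly 30 + 30 + 30 ms.
async function sequential() {
  const a = await callService("catalog", 30);
  const b = await callService("inventory", 30);
  const c = await callService("pricing", 30);
  return [a, b, c];
}

// All three calls start before any is awaited: roughly 30 ms, the slowest single call.
async function parallel() {
  return Promise.all([
    callService("catalog", 30),
    callService("inventory", 30),
    callService("pricing", 30),
  ]);
}

parallel().then((results) => console.log(results));
```

`Promise.all` rejects as soon as any one call fails, so a caller that can work with partial results may want `Promise.allSettled` instead.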
---
## Pattern: Event-Driven Communication
A microservice **emits events** (facts about what happened). Other services react if interested.
### The Fundamental Inversion
| Request-Response | Event-Driven |
|-----------------|--------------|
| Sender knows what recipient should do | Sender just broadcasts what happened |
| Sender depends on recipient's capabilities | Sender doesn't know who's listening |
| Tight domain coupling | Loose coupling |
### MusicCorp Example
```
Warehouse emits: "Order Packaged"
│
├── Notifications (sends email to customer)
└── Inventory (updates stock levels)
Warehouse doesn't know or care who's listening!
```
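The same flow as a rough in-process sketch. The `subscribe`/`emit` pair is a hypothetical stand-in for a broker topic, and the event name and payload are assumptions.

```javascript
// Minimal in-process stand-in for a broker topic (illustration only).
const subscribers = new Map(); // event name -> list of handler functions

function subscribe(eventName, handler) {
  if (!subscribers.has(eventName)) subscribers.set(eventName, []);
  subscribers.get(eventName).push(handler);
}

function emit(eventName, payload) {
  // The emitter never names a recipient; it only announces what happened.
  for (const handler of subscribers.get(eventName) ?? []) handler(payload);
}

const log = [];
// Each interested service registers itself; Warehouse is not involved.
subscribe("order.packaged", (e) => log.push(`Notifications: email for order ${e.orderId}`));
subscribe("order.packaged", (e) => log.push(`Inventory: stock updated for order ${e.orderId}`));

emit("order.packaged", { orderId: "42" }); // Warehouse side
console.log(log);
```

Adding a third consumer means one more `subscribe` call and no change to the emitting code, which is the inversion described above.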
### Events vs Messages
- **Event**: A fact (payload) - "Customer registered"
- **Message**: The transport medium that carries events
- You put events INTO messages
### What Goes in an Event?
**Option 1: Just an ID**
```json
{"event": "customer.registered", "customer_id": "123"}
```
- Small payload
- Receivers must call back for details (domain coupling!)
- Barrage of requests if many consumers
**Option 2: Fully Detailed (Recommended)**
```json
{
  "event": "customer.registered",
  "customer_id": "123",
  "name": "John Doe",
  "email": "[email protected]"
}
```
- Receivers are self-sufficient
- Events become historical record (useful for audit/event sourcing)
- Larger payloads (though Kafka's default maximum message size is 1 MB, and the book cites a theoretical limit of 512 MB for RabbitMQ)
- Data becomes part of contract (can't easily remove fields)
- May leak data to services that don't need it (PII concerns)
**Author's rule**: Put info in events if you'd share it via API anyway.
### Implementation: Message Brokers vs Atom
**Message Brokers** (RabbitMQ, Kafka):
- Handle pub/sub, subscription management, consumer state
- Add operational complexity (another system to run)
- **"Keep middleware dumb, smarts in endpoints"**
**Atom** (HTTP-based feeds):
- REST-compliant, reuses HTTP scaling
- Consumers must manage their own state and polling
- Can become complex reimplementing broker features
---
## Cautionary Tale: The Pricing System Disaster
> "A bug caused workers to crash. Dead worker released the message. Next worker picked it up. Crashed. Repeat. All workers died."
Lessons learned:
1. Set **maximum retry limits** on queues
2. Implement a **dead letter queue** (message hospital)
3. Build UI to view and replay failed messages
4. These problems aren't obvious if you only know synchronous communication
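The first two lessons can be sketched as below. This is an in-memory illustration under stated assumptions, not how any particular broker does it: a thrown exception stands in for a crashing worker, the attempt count travels with the message the way a broker would track redeliveries, and `MAX_ATTEMPTS` is an arbitrary choice.

```javascript
const MAX_ATTEMPTS = 3;     // hypothetical retry limit
const queue = [];           // main work queue
const deadLetterQueue = []; // the "message hospital"

function enqueue(body) {
  queue.push({ body, attempts: 0 });
}

// Process one message; on failure, requeue it until the retry limit is reached.
function processNext(handler) {
  const message = queue.shift();
  if (!message) return;
  try {
    handler(message.body);
  } catch (err) {
    message.attempts += 1;
    if (message.attempts >= MAX_ATTEMPTS) {
      // Park the poison message instead of handing it to yet another worker.
      deadLetterQueue.push({ ...message, lastError: String(err) });
    } else {
      queue.push(message);
    }
  }
}

enqueue("bad-pricing-request");
while (queue.length > 0) {
  processNext(() => {
    throw new Error("worker bug");
  });
}
console.log(deadLetterQueue.length); // 1
```

Keeping the last error alongside the parked message is what makes the third lesson practical: a UI can show why the message failed and offer to replay it once the bug is fixed.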
---
## Key Takeaways
1. **Inter-process ≠ in-process**: Performance, error handling, and interface changes are fundamentally different
2. **Style before technology**: Decide request-response vs event-driven first
3. **Sync is simpler but couples temporally**: Both parties must be available
4. **Async decouples but adds complexity**: Dead letters, correlation, ordering
5. **Events invert responsibility**: Emitter broadcasts, receivers decide what to do
6. **Fat events > skinny events**: Avoid callbacks, reduce coupling
7. **Failure handling is hard everywhere**: Even sync calls have "did it work?" ambiguity
---
## How This Connects to Our MusicCorp Demo
Our implementation demonstrates several concepts from this chapter:
| Concept | Our Implementation |
|---------|-------------------|
| Request-Response (sync) | Order Service → Catalog/Inventory REST calls |
| Event-Driven | `order.placed` → Payment → `payment.received` → Shipping |
| Fat Events | Events include all needed data (order_id, amount, sku, etc.) |
| Temporal Decoupling | Services can process events when ready |
| Mix and Match | REST for queries, events for state changes |
---
## Discussion Questions
1. **The callback dilemma**: When an event only contains an ID, every consumer must call back for details. At what scale does this become a problem? How would you detect it?
2. **Event content evolution**: The book says data in events becomes "part of the contract." How do you handle schema evolution? What if you need to add required fields?
3. **The fraud detection refactor**: The book shows moving fraud detection out of the critical path. What are the trade-offs? Could a fraudulent order slip through?
4. **Dead letter queues**: Our MusicCorp demo doesn't have one. What would happen if the Payment service consistently failed for a specific order? How would we know?
5. **Sync vs async decision**: The book says request-response can be either sync or async. When would you choose async request-response over sync? Give a concrete example.
6. **The "keep middleware dumb" advice**: What does this mean in practice? What's an example of "smart middleware" that violates this principle?
7. **Byzantine failures**: The book mentions these as the worst kind. Can you think of a scenario in MusicCorp where services might disagree about whether something happened?
8. **Event ordering**: If the Shipping service receives `payment.received` before the Order service has transitioned to PAID state, what happens? Is this a problem in our demo?
---
## Recommended Further Reading
- *Enterprise Integration Patterns* by Gregor Hohpe and Bobby Woolf
- *Distributed Systems* by Andrew S. Tanenbaum and Maarten van Steen (for failure modes)