04 - Microservice Communication Styles

- [[#The Big Idea|The Big Idea]]
- [[#Inter-Process vs In-Process: The Three Big Differences|Inter-Process vs In-Process: The Three Big Differences]]
    - [[#Inter-Process vs In-Process: The Three Big Differences#1. Performance|1. Performance]]
    - [[#Inter-Process vs In-Process: The Three Big Differences#2. Changing Interfaces|2. Changing Interfaces]]
    - [[#Inter-Process vs In-Process: The Three Big Differences#3. Error Handling|3. Error Handling]]
- [[#The Communication Styles Model|The Communication Styles Model]]
- [[#Pattern: Synchronous Blocking|Pattern: Synchronous Blocking]]
    - [[#Pattern: Synchronous Blocking#Advantages|Advantages]]
    - [[#Pattern: Synchronous Blocking#Disadvantages|Disadvantages]]
    - [[#Pattern: Synchronous Blocking#MusicCorp Example: The Fraud Detection Problem|MusicCorp Example: The Fraud Detection Problem]]
- [[#Pattern: Asynchronous Nonblocking|Pattern: Asynchronous Nonblocking]]
    - [[#Pattern: Asynchronous Nonblocking#Advantages|Advantages]]
    - [[#Pattern: Asynchronous Nonblocking#Disadvantages|Disadvantages]]
    - [[#Pattern: Asynchronous Nonblocking#The async/await Trap|The async/await Trap]]
- [[#Pattern: Communication Through Common Data|Pattern: Communication Through Common Data]]
    - [[#Pattern: Communication Through Common Data#Examples|Examples]]
    - [[#Pattern: Communication Through Common Data#Advantages|Advantages]]
    - [[#Pattern: Communication Through Common Data#Disadvantages|Disadvantages]]
    - [[#Pattern: Communication Through Common Data#When to Use|When to Use]]
- [[#Pattern: Request-Response|Pattern: Request-Response]]
    - [[#Pattern: Request-Response#Sync vs Async Implementation|Sync vs Async Implementation]]
    - [[#Pattern: Request-Response#Request vs Command|Request vs Command]]
    - [[#Pattern: Request-Response#Parallel vs Sequential Calls|Parallel vs Sequential Calls]]
- [[#Pattern: Event-Driven Communication|Pattern: Event-Driven Communication]]
    - [[#Pattern: Event-Driven Communication#The Fundamental Inversion|The Fundamental Inversion]]
    - [[#Pattern: Event-Driven Communication#MusicCorp Example|MusicCorp Example]]
    - [[#Pattern: Event-Driven Communication#Events vs Messages|Events vs Messages]]
    - [[#Pattern: Event-Driven Communication#What Goes in an Event?|What Goes in an Event?]]
    - [[#Pattern: Event-Driven Communication#Implementation: Message Brokers vs Atom|Implementation: Message Brokers vs Atom]]
- [[#Cautionary Tale: The Pricing System Disaster|Cautionary Tale: The Pricing System Disaster]]
- [[#Key Takeaways|Key Takeaways]]
- [[#How This Connects to Our MusicCorp Demo|How This Connects to Our MusicCorp Demo]]
- [[#Discussion Questions|Discussion Questions]]
- [[#Recommended Further Reading|Recommended Further Reading]]

## The Big Idea

Don't pick technology first, then figure out your communication style. Instead:

1. Understand the **style** of communication you need
2. **Then** pick technology that fits

## Inter-Process vs In-Process: The Three Big Differences

### 1. Performance

- In-process: compiler can inline, optimize, pass pointers
- Inter-process: serialization, network round-trip (measured in milliseconds), payload size matters
- **Implication**: APIs that make sense in-process may not make sense across a network

### 2. Changing Interfaces

- In-process: IDE refactoring, atomic deployment
- Inter-process: Either lockstep deployment OR phased rollout with backward compatibility
- **Implication**: Interface changes require more coordination

### 3. Error Handling

Five failure modes (from Tanenbaum & Steen):

| Failure Type | What Happened |
|--------------|---------------|
| **Crash** | Server died, reboot |
| **Omission** | Sent something, got nothing back |
| **Timing** | Response came too late (or too early!) |
| **Response** | Got a response, but it's wrong/incomplete |
| **Byzantine** | Something's wrong, but parties can't agree what |

Many errors are **transient** - short-lived problems that might resolve.
Rich error semantics (like HTTP 4xx vs 5xx) help clients decide whether to retry.

## The Communication Styles Model

```
                 ┌───────────────────┐
                 │   Communication   │
                 │       Style       │
                 └─────────┬─────────┘
       ┌───────────────────┼───────────────────┐
       │                   │                   │
┌──────┴──────┐     ┌──────┴──────┐     ┌──────┴──────┐
│ Synchronous │     │ Asynchronous│     │   Common    │
│  Blocking   │     │ Nonblocking │     │    Data     │
└──────┬──────┘     └──────┬──────┘     └─────────────┘
       │                   │
       │              ┌────┴────────┐
       │              │             │
┌──────┴──────┐  ┌────┴────┐  ┌─────┴─────┐
│  Request-   │  │ Request-│  │  Event-   │
│  Response   │  │ Response│  │  Driven   │
└─────────────┘  └─────────┘  └───────────┘
```

**Key insight**: Start by deciding request-response vs event-driven, THEN sync vs async.

---

## Pattern: Synchronous Blocking

The microservice makes a call and **waits** for a response.

### Advantages

- Simple and familiar
- Matches most developers' mental models
- Good starting point when learning distributed systems

### Disadvantages

- **Temporal coupling**: Both services must be available simultaneously
- **Cascading failures**: Slow downstream = slow upstream
- **Long call chains** are problematic (A→B→C→D)

### MusicCorp Example: The Fraud Detection Problem

```
Bad: Long synchronous chain
Order → Payment → Fraud Detection → Customer
(If any link fails or is slow, the whole order fails)

Better: Move fraud detection out of critical path
Order → Payment (check pre-computed fraud flag)
           ↑
Background job: Fraud Detection → Customer
(Updates fraud flags asynchronously)
```

---

## Pattern: Asynchronous Nonblocking

The microservice sends a call and **continues processing** without waiting.

Three main styles:

1. **Communication through common data**
2. **Async request-response**
3. **Event-driven**

### Advantages

- No temporal coupling - services don't need to be available simultaneously
- Great for long-running processes (e.g., packaging and shipping could take days)
- Queues can buffer requests during load spikes

### Disadvantages

- More complex mental model
- More technology choices to navigate
- New failure concerns to handle (dead letter queues, message ordering, etc.)

### The async/await Trap

Just because code uses `async/await` doesn't mean the interaction is truly asynchronous! If you `await` a call immediately, the code that follows still waits for the result, so the style is effectively blocking:

```javascript
let rate = await eurToGbp;  // Execution pauses here until the promise resolves
process(rate);              // Won't run until the line above completes
```

---

## Pattern: Communication Through Common Data

One service writes data to a shared location; others read from it.

### Examples

- File dropped on filesystem
- **Data Lake**: Raw data in any format, consumers figure it out
- **Data Warehouse**: Structured store, producers must conform to schema

### Advantages

- Simple, widely understood technology
- Great for interoperability (even old mainframes can read files)
- Handles large data volumes well

### Disadvantages

- Usually relies on polling - not good for low latency
- Shared data store becomes coupling point
- Bidirectional updates (both read AND write) = common coupling (bad!)

### When to Use

- Legacy system integration
- Large batch data transfers (multi-GB files)
- When real-time isn't required

---

## Pattern: Request-Response

Service sends a request, expects a response with the result.

### Sync vs Async Implementation

**Synchronous**: Open connection → send request → wait → receive response on same connection

**Asynchronous**:

```
Order Processor → [Request Queue] → Inventory
                                        ↓
Order Processor ← [Response Queue] ←────┘
```

With async, the receiver needs to know where to send the response, and the requester needs to correlate responses with original requests (often via database state).
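
The correlation step can be sketched roughly as follows. This is a minimal illustration, not the book's or the demo's implementation: the arrays stand in for broker queues, the `Map` stands in for the database state mentioned above, and the queue names (`inventory.requests`, `orders.responses`), the `replyTo` and `correlationId` fields, and all function names are hypothetical.

```javascript
// Hypothetical in-memory stand-ins for two broker queues (assumption: a real
// system would use a message broker, not arrays).
const queues = {
  "inventory.requests": [],
  "orders.responses": [],
};

// Requests still awaiting a reply, keyed by correlation ID. A Map keeps the
// sketch short; durable storage (e.g. a database) would survive restarts.
const pendingRequests = new Map();
let nextCorrelationId = 1;

// Order Processor side: record the request, enqueue it, and return at once
// instead of waiting for Inventory to answer.
function requestStockReservation(orderId, sku) {
  const correlationId = String(nextCorrelationId++);
  pendingRequests.set(correlationId, { orderId, sku });
  queues["inventory.requests"].push({
    correlationId,
    replyTo: "orders.responses", // tells the receiver where to send the response
    sku,
  });
  return correlationId;
}

// Inventory side: handle one queued request and reply on the queue it names.
// Returns false when there is nothing left to handle.
function inventoryHandleNext(stock) {
  const request = queues["inventory.requests"].shift();
  if (request === undefined) return false;
  const reserved = (stock[request.sku] ?? 0) > 0;
  if (reserved) stock[request.sku] -= 1;
  queues[request.replyTo].push({ correlationId: request.correlationId, reserved });
  return true;
}

// Order Processor side: match each response to the request that caused it.
function handleResponses() {
  const completed = [];
  let response;
  while ((response = queues["orders.responses"].shift()) !== undefined) {
    const pending = pendingRequests.get(response.correlationId);
    if (pending === undefined) continue; // unknown or duplicate response
    pendingRequests.delete(response.correlationId);
    completed.push({ ...pending, reserved: response.reserved });
  }
  return completed;
}

// Example run: two orders compete for the last copy of one SKU.
const stock = { "cd-123": 1 };
requestStockReservation("order-1", "cd-123");
requestStockReservation("order-2", "cd-123");
while (inventoryHandleNext(stock)) {
  // keep going until the request queue is empty
}
const results = handleResponses();
console.log(results);
```

Because the pending state lives outside the call stack, the instance that handles a response does not have to be the one that sent the request, provided the state is kept somewhere shared rather than in process memory as it is in this sketch.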
### Request vs Command

The book prefers "request" over "command":

- **Command** implies a directive that must be obeyed
- **Request** implies something that can be rejected
- Microservices should always be able to reject invalid requests

### Parallel vs Sequential Calls

If you need results from 3 independent services:

- **Sequential**: Total latency = sum of all latencies
- **Parallel**: Total latency = latency of the slowest single call

**Always prefer parallel when calls are independent.**

---

## Pattern: Event-Driven Communication

A microservice **emits events** (facts about what happened). Other services react if interested.

### The Fundamental Inversion

| Request-Response | Event-Driven |
|-----------------|--------------|
| Sender knows what recipient should do | Sender just broadcasts what happened |
| Sender depends on recipient's capabilities | Sender doesn't know who's listening |
| Tight domain coupling | Loose coupling |

### MusicCorp Example

```
Warehouse emits: "Order Packaged"
    │
    ├── Notifications (sends email to customer)
    └── Inventory (updates stock levels)

Warehouse doesn't know or care who's listening!
```

### Events vs Messages

- **Event**: A fact (payload) - "Customer registered"
- **Message**: The transport medium that carries events
- You put events INTO messages

### What Goes in an Event?

**Option 1: Just an ID**

```json
{"event": "customer.registered", "customer_id": "123"}
```

- Small payload
- Receivers must call back for details (domain coupling!)
- Barrage of requests if many consumers

**Option 2: Fully Detailed (Recommended)**

```json
{
  "event": "customer.registered",
  "customer_id": "123",
  "name": "John Doe",
  "email": "[email protected]"
}
```

- Receivers are self-sufficient
- Events become historical record (useful for audit/event sourcing)
- Larger payloads (but Kafka's default maximum message size is 1 MB, and the book cites a theoretical limit of 512 MB for RabbitMQ)
- Data becomes part of contract (can't easily remove fields)
- May leak data to services that don't need it (PII concerns)

**Author's rule**: Put info in events if you'd share it via API anyway.

### Implementation: Message Brokers vs Atom

**Message Brokers** (RabbitMQ, Kafka):

- Handle pub/sub, subscription management, consumer state
- Add operational complexity (another system to run)
- **"Keep middleware dumb, smarts in endpoints"**

**Atom** (HTTP-based feeds):

- REST-compliant, reuses HTTP scaling
- Consumers must manage their own state and polling
- Can become complex reimplementing broker features

---

## Cautionary Tale: The Pricing System Disaster

> "A bug caused workers to crash. Dead worker released the message. Next worker picked it up. Crashed. Repeat. All workers died."

Lessons learned:

1. Set **maximum retry limits** on queues
2. Implement a **dead letter queue** (message hospital)
3. Build UI to view and replay failed messages
4. These problems aren't obvious if you only know synchronous communication

---

## Key Takeaways

1. **Inter-process ≠ in-process**: Performance, error handling, and interface changes are fundamentally different
2. **Style before technology**: Decide request-response vs event-driven first
3. **Sync is simpler but couples temporally**: Both parties must be available
4. **Async decouples but adds complexity**: Dead letters, correlation, ordering
5. **Events invert responsibility**: Emitter broadcasts, receivers decide what to do
6. **Fat events > skinny events**: Avoid callbacks, reduce coupling
7. **Failure handling is hard everywhere**: Even sync calls have "did it work?" ambiguity

---

## How This Connects to Our MusicCorp Demo

Our implementation demonstrates several concepts from this chapter:

| Concept | Our Implementation |
|---------|-------------------|
| Request-Response (sync) | Order Service → Catalog/Inventory REST calls |
| Event-Driven | `order.placed` → Payment → `payment.received` → Shipping |
| Fat Events | Events include all needed data (order_id, amount, sku, etc.) |
| Temporal Decoupling | Services can process events when ready |
| Mix and Match | REST for queries, events for state changes |

---

## Discussion Questions

1. **The callback dilemma**: When an event only contains an ID, every consumer must call back for details. At what scale does this become a problem? How would you detect it?
2. **Event content evolution**: The book says data in events becomes "part of the contract." How do you handle schema evolution? What if you need to add required fields?
3. **The fraud detection refactor**: The book shows moving fraud detection out of the critical path. What are the trade-offs? Could a fraudulent order slip through?
4. **Dead letter queues**: Our MusicCorp demo doesn't have one. What would happen if the Payment service consistently failed for a specific order? How would we know?
5. **Sync vs async decision**: The book says request-response can be either sync or async. When would you choose async request-response over sync? Give a concrete example.
6. **The "keep middleware dumb" advice**: What does this mean in practice? What's an example of "smart middleware" that violates this principle?
7. **Byzantine failures**: The book mentions these as the worst kind. Can you think of a scenario in MusicCorp where services might disagree about whether something happened?
8. **Event ordering**: If the Shipping service receives `payment.received` before the Order service has transitioned to PAID state, what happens? Is this a problem in our demo?

---

## Recommended Further Reading

- *Enterprise Integration Patterns* by Gregor Hohpe and Bobby Woolf
- *Distributed Systems* by Tanenbaum and Steen (for failure modes)