5. How do you choose between synchronous and asynchronous communication? 

Synchronous (REST, gRPC): Use when the caller needs the result immediately to proceed, the operation is fast and bounded, and failure of the dependency is a hard failure (i.e., the caller can't continue without it).

Asynchronous (queues, events, message buses): Use when the caller doesn't need an immediate result, operations are long-running, the system must tolerate downstream unavailability, or you need to decouple producers from consumers for independent scaling.

Decision signals:

Warning: Mixing sync and async carelessly produces the worst of both. A synchronous call that blocks while waiting for an async operation to complete pays the latency of async and keeps the coupling of sync.
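
As a rough illustration of the split described above, here is a minimal in-process sketch using only the Python standard library. The names `get_price_sync`, `send_receipts`, and `place_order` are hypothetical, and the `queue.Queue` stands in for a real message broker. The price lookup is a hard dependency, so it is called synchronously. The receipt is not needed to finish the order, so it is handed to a queue and the caller never waits on it.

```python
import queue
import threading


def get_price_sync(sku):
    """Synchronous call: the caller blocks because it cannot continue without the price."""
    prices = {"ticket": 50.0}  # stand-in for a fast, bounded lookup (e.g. REST or gRPC)
    return prices[sku]


def send_receipts(jobs, sent):
    """Asynchronous consumer: drains the queue independently of the producer."""
    while True:
        order_id = jobs.get()
        if order_id is None:  # sentinel value tells the worker to stop
            break
        sent.append(f"receipt for {order_id}")


def place_order(order_id, jobs):
    price = get_price_sync("ticket")  # hard dependency: result needed right now
    jobs.put(order_id)                # enqueue and move on; do not wait for the receipt
    return price


jobs = queue.Queue()
sent = []
worker = threading.Thread(target=send_receipts, args=(jobs, sent))
worker.start()

total = place_order("order-1", jobs)

jobs.put(None)   # shut the worker down for this demo
worker.join()
```

Note that `place_order` returns as soon as the job is enqueued. Blocking inside `place_order` until the receipt was sent would recreate the anti-pattern from the warning above.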

6. How do you design idempotent operations for systems with retries? 

An operation is idempotent if calling it N times leaves the system in the same state, and returns the same result, as calling it once.

Techniques:

Critical for: payment processing, order creation, email sending, inventory mutations. The idempotency key TTL must exceed your maximum retry window.
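
One common way to get this behavior is an idempotency key: the client sends the same key on every retry, and the server replays the stored result instead of repeating the side effect. The sketch below is a minimal in-memory version. `IdempotencyStore`, `run_once`, and `charge_card` are hypothetical names, and a production system would more likely use a shared store such as a database or cache with an atomic insert. This version is not thread-safe.

```python
import time


class IdempotencyStore:
    """Remembers the result of an operation per idempotency key until the TTL expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._results = {}           # key -> (expires_at, result)

    def run_once(self, key, operation):
        now = self._clock()
        entry = self._results.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # retry: replay the stored result, no side effect
        result = operation()         # first call (or expired key): run the operation
        self._results[key] = (now + self._ttl, result)
        return result


charges = []


def charge_card():
    charges.append(100)              # the side effect we must not repeat
    return f"charge-{len(charges)}"


# The TTL (24 hours here, an assumed value) should exceed the maximum retry window.
store = IdempotencyStore(ttl_seconds=24 * 60 * 60)
first = store.run_once("order-42", charge_card)
retry = store.run_once("order-42", charge_card)  # same key: card is not charged again
```

If the key expired before the last retry arrived, the retry would be treated as a new request and the charge would run twice, which is why the TTL has to outlast the retry window.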


7. How would you prevent race conditions in a high-throughput workflow? 

Race conditions occur when correctness depends on execution order and that order isn't guaranteed. 

Imagine the last concert ticket is available. Two people click “buy” at the exact same time. The program checks the stock for each request and sees one ticket left both times, so both users get a “Purchase Confirmed” email, even though there was only one ticket.

Our logic wasn’t wrong, but timing can betray us. Each request is handled correctly in isolation; the bug appears only when the two requests interleave between the stock check and the stock update.

More precisely, a race condition happens when two or more operations read and modify the same shared data concurrently, and the final outcome depends on the exact order in which those operations run.

Think of two people editing the same document at the same time. Without coordination, changes can be lost or overwritten.
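
The ticket scenario above can be sketched in a single process as follows. This is a minimal illustration, not a distributed solution: `TicketInventory` and `click_buy` are hypothetical names, and a `threading.Lock` only coordinates threads inside one process. The fix shown is to make the check and the decrement one critical section, so no second buyer can read the stock between them.

```python
import threading


class TicketInventory:
    def __init__(self, stock):
        self._stock = stock
        self._lock = threading.Lock()

    def buy(self):
        # Check and decrement happen under one lock, so they cannot interleave
        # with another buyer's check and decrement.
        with self._lock:
            if self._stock > 0:
                self._stock -= 1
                return True
            return False


inventory = TicketInventory(stock=1)
results = []
results_lock = threading.Lock()


def click_buy():
    ok = inventory.buy()
    with results_lock:
        results.append(ok)


# Two buyers race for the last ticket.
buyers = [threading.Thread(target=click_buy) for _ in range(2)]
for buyer in buyers:
    buyer.start()
for buyer in buyers:
    buyer.join()

confirmed = results.count(True)  # exactly one purchase succeeds
```

Without the lock in `buy`, both threads could pass the `self._stock > 0` check before either one decrements, which is the double-confirmation bug described above.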

Approaches by scope:

High-throughput key insight: Locking reduces throughput. Design workflows so that entities that need to be atomically updated are co-located (same DB row, same aggregate) rather than spanning services.
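
When the contested state lives in a single row, the check and the update can often be folded into one conditional statement, which avoids holding an application-level lock. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `tickets` table, its columns, and the `try_buy` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (event_id TEXT PRIMARY KEY, remaining INTEGER NOT NULL)"
)
conn.execute("INSERT INTO tickets VALUES ('concert', 1)")
conn.commit()


def try_buy(conn, event_id):
    """Decrement stock only if some remains; the database applies this atomically."""
    cur = conn.execute(
        "UPDATE tickets SET remaining = remaining - 1 "
        "WHERE event_id = ? AND remaining > 0",
        (event_id,),
    )
    conn.commit()
    # rowcount is 1 if the row matched the condition and was updated, else 0.
    return cur.rowcount == 1


first = try_buy(conn, "concert")    # takes the last ticket
second = try_buy(conn, "concert")   # condition fails, nothing is updated
remaining = conn.execute(
    "SELECT remaining FROM tickets WHERE event_id = 'concert'"
).fetchone()[0]
```

Because the condition `remaining > 0` is evaluated as part of the same `UPDATE`, there is no gap between the check and the write for another request to slip into. The same idea does not carry over directly when the data spans multiple services, which is the reason for co-locating it.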