16. What are the signals that a system needs vertical scaling vs horizontal scaling?
Vertical scaling (bigger machine): More CPU, RAM, faster disk on the same node.
Signals for vertical:
Single-threaded workloads that can't parallelise (some DB operations, in-memory computation)
Working set doesn't fit in memory — adding RAM is cheaper than sharding
Quick fix when horizontal scaling requires significant re-architecture
Stateful systems (databases) where horizontal scaling is complex
Horizontal scaling (more machines): Add more nodes, distribute load.
Signals for horizontal:
CPU-bound workloads that parallelise well (stateless APIs, workers)
You're at the ceiling of available vertical options
Need geographic distribution
Fault tolerance requires redundancy across machines
Traffic is spiky — you want to scale out and scale back in
Practical note: Most stateless application tiers should always be horizontally scaled. Scale databases vertically first (it's simpler), then add read replicas, and shard only when forced to.
17. How do you design read-heavy workloads for maximum throughput?
Reads are much easier to scale than writes because reads are (usually) side-effect free.
Layered strategy:
Edge caching (CDN): For cacheable content (static assets, public API responses), cache at the CDN. Zero origin load for cache hits.
Application cache (Redis/Memcached): Cache expensive query results. Key design matters — use content-addressable keys so invalidation is precise.
Read replicas: Route reads to read replicas; writes go to the primary. Most relational databases support this natively. Adds replication lag (usually milliseconds, though it can spike under load), so replica reads are eventually consistent.
CQRS (Command Query Responsibility Segregation): Separate the read model from the write model. Reads query a materialised view optimised for query patterns (denormalised, pre-joined). Writes update the normalised write model and propagate changes asynchronously.
Materialised views / pre-computation: For expensive aggregations, compute results on a schedule or on write and store them. Reads are then O(1).
Connection pooling and efficient query patterns (proper indexes, avoiding N+1) are table stakes before any of the above.
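The application-cache layer above can be sketched as cache-aside with content-addressable keys. This is a minimal illustration, not a production implementation: a plain dict stands in for Redis, and the key is a hash of the query text plus its parameters, so the same (query, params) pair always maps to the same key and invalidation can target exactly the entries a write affects.

```python
import hashlib
import json

# Stand-in for Redis: a dict with the get/set semantics we need here.
cache: dict[str, str] = {}

def cache_key(query: str, params: dict) -> str:
    # Content-addressable key: deterministic hash of query + parameters.
    payload = json.dumps({"q": query, "p": params}, sort_keys=True)
    return "q:" + hashlib.sha256(payload.encode()).hexdigest()

def cached_query(query: str, params: dict, run_query) -> str:
    key = cache_key(query, params)
    if key in cache:                    # cache hit: zero database load
        return cache[key]
    result = run_query(query, params)   # cache miss: one database round trip
    cache[key] = result
    return result

# Usage: the second identical call never reaches the "database".
calls = []
def fake_db(query, params):
    calls.append(query)
    return f"rows for {params['user_id']}"

r1 = cached_query("SELECT * FROM orders WHERE user_id = :user_id", {"user_id": 7}, fake_db)
r2 = cached_query("SELECT * FROM orders WHERE user_id = :user_id", {"user_id": 7}, fake_db)
```

With a real Redis client the set would carry a TTL, but the key scheme is the point: because keys are derived from content rather than ad hoc strings, a write to a user's orders can invalidate precisely the keys for that user's queries.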
18. What patterns help reduce database load without harming consistency?
Read replicas — Offload reads to replicas. Slight lag is acceptable for most reads; route reads that require the latest state (e.g., read-your-writes immediately after an update) to the primary.
Caching with write-through or TTL — Cache database results. Write-through keeps consistency; TTL-based cache accepts a staleness window.
Denormalisation — Pre-join data for frequent read patterns. Reduces query complexity at cost of write complexity and storage.
Partial indexes and covering indexes — The database does less work if queries are satisfied entirely from an index (no heap fetch). Profile query plans regularly.
Query batching and DataLoader pattern — Prevent N+1 queries. Batch multiple single-row fetches into one WHERE id IN (...) query. Standard in GraphQL ecosystems (DataLoader).
Async writes with eventual consistency — For non-critical writes (analytics, audit logs, metrics), use an async pipeline (Kafka → consumer → DB). Prevents write spikes from hitting the primary.
Database connection pooling (PgBouncer) — Reduces connection overhead. Postgres spawns a backend process per connection, so a database handling 10,000 direct connections is far less efficient than 100 pooled connections through PgBouncer.
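The write-through-vs-TTL trade-off above can be shown in a few lines. This is a sketch under stated assumptions: dicts stand in for the cache and the database, and `TTL_SECONDS` is an illustrative knob, not a recommended value.

```python
import time

cache: dict[str, tuple[str, float]] = {}   # key -> (value, expiry timestamp)
db: dict[str, str] = {}                    # stand-in for the primary database

TTL_SECONDS = 60.0   # illustrative staleness bound for TTL-based entries

def write_through(key: str, value: str) -> None:
    # Write-through: update the database and the cache in the same step,
    # so a subsequent read never sees a value older than the last write.
    db[key] = value
    cache[key] = (value, time.monotonic() + TTL_SECONDS)

def read(key: str) -> str:
    entry = cache.get(key)
    if entry is not None:
        value, expires = entry
        if time.monotonic() < expires:     # TTL bounds worst-case staleness
            return value
    value = db[key]                        # miss or expired: reload from DB
    cache[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

write_through("user:1", "ada")
first = read("user:1")       # served from cache, consistent with the DB
write_through("user:1", "ada2")
second = read("user:1")      # write-through kept the cache in sync
```

A pure TTL cache would skip the cache update inside `write_through` and simply accept up to `TTL_SECONDS` of staleness; write-through pays a small cost on every write to remove that window.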
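The DataLoader pattern above can be sketched with an in-memory SQLite table. The `UserLoader` name and its two-phase load/dispatch API are illustrative (real DataLoader implementations dispatch automatically at the end of an event-loop tick); the point is that N requested ids turn into one WHERE id IN (...) query instead of N queries.

```python
import sqlite3

# In-memory SQLite stands in for the primary; a small users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ada"), (2, "grace"), (3, "alan")])

class UserLoader:
    """DataLoader-style batcher: collect requested ids, then issue a
    single WHERE id IN (...) query instead of one query per id."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.pending: set[int] = set()
        self.queries_issued = 0

    def load(self, user_id: int) -> None:
        self.pending.add(user_id)          # record the id; don't query yet

    def dispatch(self) -> dict[int, str]:
        ids = sorted(self.pending)
        placeholders = ",".join("?" * len(ids))
        self.queries_issued += 1           # one round trip for the batch
        rows = self.conn.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
        ).fetchall()
        self.pending.clear()
        return dict(rows)

loader = UserLoader(conn)
for uid in (1, 2, 3):        # the naive N+1 version would run 3 queries here
    loader.load(uid)
users = loader.dispatch()    # one query: ... WHERE id IN (1,2,3)
```

In a GraphQL resolver, each field resolution calls `load`, and the framework calls `dispatch` once per tick, which is how DataLoader collapses a resolver fan-out into a single database hit.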