16. What are the signals that a system needs vertical scaling vs horizontal scaling? 

Vertical scaling (bigger machine): More CPU, RAM, faster disk on the same node.

Signals for vertical:

Horizontal scaling (more machines): Add more nodes, distribute load.

Signals for horizontal:

Practical note: Most stateless application tiers should always be horizontally scaled. Databases should be vertically scaled first (it's simpler), then add read replicas, then shard only when forced to.

17. How do you design read-heavy workloads for maximum throughput? 

Reads are much easier to scale than writes because reads are (usually) side-effect free.

Layered strategy:

Connection pooling and efficient query patterns (proper indexes, avoiding N+1) are table stakes before any of the above.

18. What patterns help reduce database load without harming consistency? 

Read replicas — Offload reads to replicas. Slight lag acceptable for most reads; use primary for reads that require latest state (e.g., post-write reads).

Caching with write-through or TTL — Cache database results. Write-through keeps consistency; TTL-based cache accepts a staleness window.

Denormalisation — Pre-join data for frequent read patterns. Reduces query complexity at cost of write complexity and storage.

Partial indexes and covering indexes — The database does less work if queries are satisfied entirely from an index (no heap fetch). Profile query plans regularly.

Query batching and DataLoader pattern — Prevent N+1 queries. Batch multiple single-row fetches into one WHERE id IN (...) query. Standard in GraphQL ecosystems (DataLoader).

Async writes with eventual consistency — For non-critical writes (analytics, audit logs, metrics), use an async pipeline (Kafka → consumer → DB). Prevents write spikes from hitting the primary.

Database connection pooling (PgBouncer) — Reduces connection overhead. A database handling 10,000 connections is far less efficient than 100 pooled connections through PgBouncer.