16. What are the signals that a system needs vertical scaling vs horizontal scaling?
Vertical scaling (bigger machine): More CPU, RAM, faster disk on the same node.
Signals for vertical:
Single-threaded workloads that can't parallelise (some DB operations, in-memory computation)
Working set doesn't fit in memory — adding RAM is cheaper than sharding
Quick fix when horizontal scaling requires significant re-architecture
Stateful systems (databases) where horizontal scaling is complex
Horizontal scaling (more machines): Add more nodes, distribute load.
Signals for horizontal:
CPU-bound workloads that parallelise well (stateless APIs, workers)
You're at the ceiling of available vertical options
Need geographic distribution
Fault tolerance requires redundancy across machines
Traffic is spiky — you want to scale out and scale back in
Practical note: Most stateless application tiers should always be horizontally scaled. Scale databases vertically first (it's simpler), then add read replicas, and shard only when forced to.
17. How do you design read-heavy workloads for maximum throughput?
Reads are much easier to scale than writes because reads are (usually) side-effect free.
Layered strategy:
Edge caching (CDN): For cacheable content (static assets, public API responses), cache at the CDN. Zero origin load for cache hits.
Application cache (Redis/Memcached): Cache expensive query results. Key design matters — use content-addressable keys so invalidation is precise.
Read replicas: Route reads to read replicas; writes go to the primary. Most relational databases support this natively. Adds replication lag (usually milliseconds, though it can spike under load), so replica reads are eventually consistent.
CQRS (Command Query Responsibility Segregation): Separate the read model from the write model. Reads query a materialised view optimised for query patterns (denormalised, pre-joined). Writes update the normalised write model and propagate changes asynchronously.
Materialised views / pre-computation: For expensive aggregations, compute results on a schedule or on write and store them. Reads are then O(1).
Connection pooling and efficient query patterns (proper indexes, avoiding N+1) are table stakes before any of the above.
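The application-cache layer above can be sketched as cache-aside with content-addressable keys. This is a minimal illustration, not a production implementation: a plain dict stands in for Redis, and the key is a hash of the query text plus its parameters, so the same (query, params) pair always maps to the same key and invalidation can target exactly the entries a write affects.

```python
import hashlib
import json

# Stand-in for Redis: a dict with the get/set semantics we need here.
cache: dict[str, str] = {}

def cache_key(query: str, params: dict) -> str:
    # Content-addressable key: deterministic hash of query + parameters.
    payload = json.dumps({"q": query, "p": params}, sort_keys=True)
    return "q:" + hashlib.sha256(payload.encode()).hexdigest()

def cached_query(query: str, params: dict, run_query) -> str:
    key = cache_key(query, params)
    if key in cache:                    # cache hit: zero database load
        return cache[key]
    result = run_query(query, params)   # cache miss: one database round trip
    cache[key] = result
    return result

# Usage: the second identical call never reaches the "database".
calls = []
def fake_db(query, params):
    calls.append(query)
    return f"rows for {params['user_id']}"

r1 = cached_query("SELECT * FROM orders WHERE user_id = :user_id", {"user_id": 7}, fake_db)
r2 = cached_query("SELECT * FROM orders WHERE user_id = :user_id", {"user_id": 7}, fake_db)
```

With a real Redis client the set would carry a TTL, but the key scheme is the point: because keys are derived from content rather than ad hoc strings, a write to a user's orders can invalidate precisely the keys for that user's queries.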
18. What patterns help reduce database load without harming consistency?
Read replicas — Offload reads to replicas. Slight lag is acceptable for most reads; route reads that require the latest state (e.g., read-your-writes immediately after an update) to the primary.
Caching with write-through or TTL — Cache database results. Write-through keeps consistency; TTL-based cache accepts a staleness window.
Denormalisation — Pre-join data for frequent read patterns. Reduces query complexity at cost of write complexity and storage.
Partial indexes and covering indexes — The database does less work if queries are satisfied entirely from an index (no heap fetch). Profile query plans regularly.
Query batching and DataLoader pattern — Prevent N+1 queries. Batch multiple single-row fetches into one WHERE id IN (...) query. Standard in GraphQL ecosystems (DataLoader).
Async writes with eventual consistency — For non-critical writes (analytics, audit logs, metrics), use an async pipeline (Kafka → consumer → DB). Prevents write spikes from hitting the primary.
Database connection pooling (PgBouncer) — Reduces connection overhead. Postgres spawns a backend process per connection, so a database handling 10,000 direct connections is far less efficient than 100 pooled connections through PgBouncer.
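The write-through-vs-TTL trade-off above can be shown in a few lines. This is a sketch under stated assumptions: dicts stand in for the cache and the database, and `TTL_SECONDS` is an illustrative knob, not a recommended value.

```python
import time

cache: dict[str, tuple[str, float]] = {}   # key -> (value, expiry timestamp)
db: dict[str, str] = {}                    # stand-in for the primary database

TTL_SECONDS = 60.0   # illustrative staleness bound for TTL-based entries

def write_through(key: str, value: str) -> None:
    # Write-through: update the database and the cache in the same step,
    # so a subsequent read never sees a value older than the last write.
    db[key] = value
    cache[key] = (value, time.monotonic() + TTL_SECONDS)

def read(key: str) -> str:
    entry = cache.get(key)
    if entry is not None:
        value, expires = entry
        if time.monotonic() < expires:     # TTL bounds worst-case staleness
            return value
    value = db[key]                        # miss or expired: reload from DB
    cache[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

write_through("user:1", "ada")
first = read("user:1")       # served from cache, consistent with the DB
write_through("user:1", "ada2")
second = read("user:1")      # write-through kept the cache in sync
```

A pure TTL cache would skip the cache update inside `write_through` and simply accept up to `TTL_SECONDS` of staleness; write-through pays a small cost on every write to remove that window.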
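The DataLoader pattern above can be sketched with an in-memory SQLite table. The `UserLoader` name and its two-phase load/dispatch API are illustrative (real DataLoader implementations dispatch automatically at the end of an event-loop tick); the point is that N requested ids turn into one WHERE id IN (...) query instead of N queries.

```python
import sqlite3

# In-memory SQLite stands in for the primary; a small users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ada"), (2, "grace"), (3, "alan")])

class UserLoader:
    """DataLoader-style batcher: collect requested ids, then issue a
    single WHERE id IN (...) query instead of one query per id."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.pending: set[int] = set()
        self.queries_issued = 0

    def load(self, user_id: int) -> None:
        self.pending.add(user_id)          # record the id; don't query yet

    def dispatch(self) -> dict[int, str]:
        ids = sorted(self.pending)
        placeholders = ",".join("?" * len(ids))
        self.queries_issued += 1           # one round trip for the batch
        rows = self.conn.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
        ).fetchall()
        self.pending.clear()
        return dict(rows)

loader = UserLoader(conn)
for uid in (1, 2, 3):        # the naive N+1 version would run 3 queries here
    loader.load(uid)
users = loader.dispatch()    # one query: ... WHERE id IN (1,2,3)
```

In a GraphQL resolver, each field resolution calls `load`, and the framework calls `dispatch` once per tick, which is how DataLoader collapses a resolver fan-out into a single database hit.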