This blog is written as per the codekarle system design series.
NFRs to ask:
Length of the shortened URL, or how many unique shortening requests arrive per second, minute, hour, day, or month.
Retention period of a short URL before it expires and a new one is generated.
Allowed characters.
Assumption:
The volume will be in the range of billions of URLs, like Google or YouTube, but the length of the short URL is a fixed number.
The retention period is 10 years
Allowed characters are a-z, A-Z, 0-9
Let X be the number of requests arriving per second.
X * 60 * 60 * 24 * 365 * 10 = Y, the total number of URLs over 10 years
Total number of allowed characters
Lowercase letters (a-z): 26 + Uppercase letters (A-Z): 26 + Digits (0-9): 10 = 62
If the length of the URL is 1 then we can support 62 URLs, if 2 then 62^2 URLs, if n then 62^n.
We have to make sure that 62^n >= Y, so n = log(base 62) Y rounded up to the next whole number. That n is the length of the URL.
Now, 62^6 is about 56.8 billion and 62^7 is about 3.5 trillion.
For our use case, we will go with 7 characters.
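The sizing above can be checked with a few lines of Python. This is a minimal sketch: the function names and the sample value of 1000 requests per second are assumptions for illustration, not figures from the series.

```python
ALPHABET_SIZE = 62                      # a-z, A-Z, 0-9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def total_urls(requests_per_second: int, years: int = 10) -> int:
    """Y: total URLs created over the retention period."""
    return requests_per_second * SECONDS_PER_YEAR * years


def min_length(total: int) -> int:
    """Smallest n such that 62**n >= total.

    Integer arithmetic is used instead of math.log to avoid
    floating-point rounding at exact powers of 62.
    """
    n = 1
    while ALPHABET_SIZE ** n < total:
        n += 1
    return n


if __name__ == "__main__":
    x = 1000  # assumed requests per second, for illustration only
    y = total_urls(x)
    print(f"Y = {y}, required length = {min_length(y)}")
```

With the assumed 1000 requests per second, Y is larger than 62^6 but smaller than 62^7, which is consistent with choosing 7 characters.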
NFRs we know
The system should be highly available and have low latency.
Potential Architecture:
A token service backed by a MySQL DB hands out a range of numbers, and the short URL service keeps that range in memory so that no two instances generate the same value, which avoids collisions. When the service generates a short URL, it saves the mapping to the Cassandra database and returns the short URL as the output of the service.
The other flow is the lookup: given a short URL, get the long URL from the database.
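The two flows above can be sketched roughly as follows. Everything here is a stand-in: `TokenService` plays the role of the MySQL-backed token service, a plain dict plays the role of Cassandra, and all class names, method names, and the range size are assumptions made for illustration.

```python
import string
import threading

# 62 allowed characters: a-z, A-Z, 0-9
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
SHORT_LEN = 7


def encode_base62(number: int, length: int = SHORT_LEN) -> str:
    """Encode a non-negative integer as a fixed-length base-62 string."""
    if number < 0 or number >= len(ALPHABET) ** length:
        raise ValueError("number does not fit in the requested length")
    chars = []
    for _ in range(length):
        number, rem = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class TokenService:
    """Hypothetical stand-in for the token service: hands out disjoint ranges."""

    def __init__(self, range_size: int = 1000):
        self._next_start = 0
        self._range_size = range_size
        self._lock = threading.Lock()

    def next_range(self) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += self._range_size
        return range(start, start + self._range_size)


class ShortUrlService:
    """Hypothetical service instance; `store` stands in for Cassandra."""

    def __init__(self, tokens: TokenService, store: dict):
        self._tokens = tokens
        self._store = store
        self._current = iter(())  # no range held yet
        self._lock = threading.Lock()

    def shorten(self, long_url: str) -> str:
        with self._lock:
            number = next(self._current, None)
            if number is None:  # range exhausted, ask for a new one
                self._current = iter(self._tokens.next_range())
                number = next(self._current)
        short = encode_base62(number)
        self._store[short] = long_url
        return short

    def resolve(self, short: str):
        """Lookup flow: return the long URL, or None if unknown."""
        return self._store.get(short)
```

Because each instance only consumes numbers from ranges it was handed, two instances never produce the same short URL, even though they do not coordinate with each other on every request. If an instance dies, the unused part of its range is simply lost, which is affordable given how large 62^7 is.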
For metrics and analytics, such as the origin of the requests (to decide the primary DC and increase capacity), user agent, header information, etc., use Short URL Service -> event grid -> async Kafka -> Hadoop, and run Hive queries on the data. The async Kafka call can lose some events when the Kafka service is down, which is acceptable in this use case.
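The "lossy is acceptable" idea can be illustrated without a real Kafka client. The sketch below is an assumption-heavy stand-in: a bounded in-process queue takes the place of the async producer, and the `AnalyticsEmitter` name and its methods are hypothetical. The point is only that emitting an event never blocks or fails the redirect request; when the buffer is full, the event is dropped and counted.

```python
import queue


class AnalyticsEmitter:
    """Hypothetical fire-and-forget emitter; the queue stands in for Kafka."""

    def __init__(self, max_pending: int = 1000):
        self._events = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def emit(self, event: dict) -> bool:
        """Queue an event without blocking; drop it if the buffer is full."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # acceptable loss for analytics data
            return False

    def drain(self) -> list:
        """Return and remove all pending events (what a consumer would read)."""
        items = []
        while True:
            try:
                items.append(self._events.get_nowait())
            except queue.Empty:
                return items
```

A request handler would call `emit({"short": ..., "user_agent": ..., "origin": ...})` after serving the redirect and ignore the return value, so the user-facing latency does not depend on the analytics pipeline.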