Design eCommerce Website (Part II)
This is the second post in the Design eCommerce Website series. If you haven't read the first post, it's worth checking it out first, as we'll continue the discussion here.
To briefly recap the previous post: we started with the data model design for an eCommerce website. Although a relational database is the most common approach, we noticed that a NoSQL database like MongoDB offers a lot of advantages and flexibility when building an eCommerce website. To scale the system, concurrency is one of the key factors to consider.
In this post, I'll mostly focus on the scalability of eCommerce websites. Building a single-machine system can be simple, but once we decide to scale the website to support millions or even billions of requests across multiple servers, a whole range of scalability issues needs to be considered.
Concurrency (continued)
One common scenario is that there's only one book left in the store and two people try to buy it simultaneously. Without any concurrency mechanism, it's entirely possible that both purchases succeed.
From the previous post, we know that one approach is to place a lock on the row whenever someone accesses the resource (the book), so that at most one person can read/write the data at a time. This solution is called pessimistic concurrency control (PCC). Although it prevents two people from accessing the same data simultaneously, placing locks is quite costly. How would you solve this problem more efficiently?
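Before looking at a more efficient alternative, here is a minimal sketch of the pessimistic approach in Python with psycopg2. The `products(id, stock)` table and the connection string are assumptions made up for the example; the key idea is that `SELECT ... FOR UPDATE` locks the row, so a second buyer blocks until the first transaction commits:

```python
import psycopg2

# Minimal sketch of pessimistic concurrency control (PCC).
# Assumed schema: products(id, stock); connection details are placeholders.
conn = psycopg2.connect("dbname=shop user=app")

def buy_book_pessimistic(product_id):
    with conn:                      # commit on success, roll back on exception
        with conn.cursor() as cur:
            # Lock the row: a concurrent buyer blocks here until we commit.
            cur.execute(
                "SELECT stock FROM products WHERE id = %s FOR UPDATE",
                (product_id,),
            )
            (stock,) = cur.fetchone()
            if stock < 1:
                raise RuntimeError("out of stock")
            cur.execute(
                "UPDATE products SET stock = stock - 1 WHERE id = %s",
                (product_id,),
            )
```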
Optimistic concurrency control is another way to support concurrency. The idea is very simple: instead of using locks, each process/thread can access the data freely. However, before committing its changes, each transaction checks whether the data is still in the same state as when it was last read. In other words, you check the data at the beginning of the transaction and check again before committing to see if it has changed.
If the data hasn't been modified, you can safely commit. Otherwise, roll back and retry until there's no conflict. It's important to compare the two approaches here. With OCC (optimistic concurrency control), reads and writes are clearly more efficient as long as there are no conflicts. With that in mind, for systems that are unlikely to have conflicts, OCC is the better option. However, when resources are frequently accessed by multiple clients, restarting transactions becomes costly in OCC, and it's better to place locks in each transaction (PCC).
In an application like Amazon, there are so many products that it's relatively rare for multiple people to access the same product simultaneously. As a result, OCC is the better choice.
// How would OCC (optimistic concurrency control) be implemented in code?
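As a rough answer to the note above, here is a minimal OCC sketch under the same assumptions as before, plus a hypothetical `version` column on the `products` table. The transaction reads the row without locking it, and the final UPDATE only succeeds if the version it read is still current; otherwise the purchase is retried:

```python
import psycopg2

# Minimal sketch of optimistic concurrency control (OCC).
# Assumed schema: products(id, stock, version); connection details are placeholders.
conn = psycopg2.connect("dbname=shop user=app")

def buy_book_optimistic(product_id, max_retries=5):
    for _ in range(max_retries):
        with conn:
            with conn.cursor() as cur:
                # Read the data without taking any lock.
                cur.execute(
                    "SELECT stock, version FROM products WHERE id = %s",
                    (product_id,),
                )
                stock, version = cur.fetchone()
                if stock < 1:
                    raise RuntimeError("out of stock")
                # The update only applies if nobody changed the row since we read it.
                cur.execute(
                    "UPDATE products SET stock = stock - 1, version = version + 1 "
                    "WHERE id = %s AND version = %s",
                    (product_id, version),
                )
                if cur.rowcount == 1:
                    return  # no conflict, the purchase succeeded
        # Someone else committed first; restart the transaction with fresh data.
    raise RuntimeError("too many conflicts, giving up")
```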
Availability in eCommerce
It's a big loss if the Amazon website is down for even one minute. To achieve high availability in a distributed system, the usual way is to have hundreds or thousands of replicas so that the system can tolerate many failures. However, it's worth noting that availability and consistency go hand in hand: improving one typically comes at the cost of the other.
If you have many replicas, it's definitely harder to guarantee that every replica has the same data. On the flip side, if you want strong consistency, you'd better have fewer replicas, which in turn makes the system more prone to failure.
The idea here is not to try to achieve both. Instead, based on the nature of the product, you should be able to make the trade-off. For an eCommerce website, latency and downtime usually mean losing revenue, which can be a significant amount. As a result, we might care more about availability than consistency; the latter can be improved through other approaches.
Consistency in eCommerce
Given that we have hundreds or thousands of replicas, how would you guarantee that each replica keeps the same data? To explain the problem in detail, suppose data D is replicated across multiple servers. When a process tries to update D to D1, it starts from one server and follows a particular order when propagating the change. At the same time, another process tries to update D to D2 and may start from a different server. As a result, some servers end up with D1 and others with D2.
Strong consistency
One approach is to force all updates to happen in the same order atomically. More specifically, when someone is updating the resource, it's locked across all servers until every one of them holds the same value (after the update). As a result, an application built on top of a system with strong consistency behaves exactly as if it were running on a single machine. Obviously, this is the most costly approach: not only is placing locks expensive, but it also blocks the system for every update.
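This is easier to picture with a toy sketch (not a real replication protocol): a coordinator marks the key as locked on every replica, applies the update everywhere, and only then releases the locks, so no replica serves a mix of old and new values. The `Replica` class below is invented purely for illustration:

```python
class Replica:
    """Toy stand-in for one server holding a copy of the data."""
    def __init__(self):
        self.data = {}
        self.locked = set()

    def lock(self, key):
        # A real system would block other readers/writers on this key here.
        self.locked.add(key)

    def write(self, key, value):
        self.data[key] = value

    def unlock(self, key):
        self.locked.discard(key)


def strongly_consistent_write(replicas, key, value):
    # Phase 1: mark the key as locked on every replica before touching any data.
    for r in replicas:
        r.lock(key)
    try:
        # Phase 2: apply the update everywhere while the key is locked.
        for r in replicas:
            r.write(key, value)
    finally:
        # Phase 3: release the locks; only now is the new value served again.
        for r in replicas:
            r.unlock(key)


# Usage: afterwards every replica holds the same value for "book-42".
replicas = [Replica(), Replica(), Replica()]
strongly_consistent_write(replicas, "book-42", {"stock": 1})
```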
Weak consistency
At the other extreme, we can coordinate replicas as little as possible. Every replica will see every update, but replicas may apply the updates in different orders. This makes update operations extremely lightweight, but the downside is that it provides only a minimal guarantee of consistency.
Note: we are not giving the precise definitions of each consistency model here. Instead, we'd like to illustrate the ideas with examples, which are more helpful for preparing for system design interviews.
Eventual consistency
As you can imagine, a more practical approach lies somewhere in between. In a nutshell, the system only guarantees that every replica will converge to the same value eventually. At any given point in time the data might be inconsistent, but in the long term the system will resolve the conflicts.
Let's take Amazon's Dynamo as an example. It's possible that replicas hold different versions of the data at a particular time, so when the client reads the data, it may get multiple versions back. At this point, the client (instead of the database) is responsible for resolving the conflicts and writing the result back to the server.
You might wonder how the client resolves those conflicts. It's mostly a product decision. Take the shopping cart as an example: it's very important not to lose any additions, since losing additions means losing revenue. So when the client sees conflicting versions of a cart, it can merge them, for example by taking the union of the items, so that nothing a customer added is dropped.
// Shouldn't we simply pick the most recent version instead? This strategy isn't entirely clear to me and needs some further reading.
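To make the cart example concrete, here is a minimal sketch of a client-side merge, assuming each replica returns its version of the cart as a list of item IDs. Taking the union, rather than the "latest" or the largest version, is what guarantees that no addition is lost:

```python
def merge_carts(versions):
    """Resolve conflicting cart versions by taking the union of their items,
    so an item added on any replica survives the merge."""
    merged = set()
    for cart in versions:
        merged.update(cart)
    return sorted(merged)


# Two replicas diverged after concurrent additions to the same cart.
print(merge_carts([["book-1", "mug-7"], ["book-1", "lamp-3"]]))
# ['book-1', 'lamp-3', 'mug-7']
```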
Summary
As you can see, a lot of these techniques are common across distributed systems. What's important is to understand the trade-offs between them and select the approach that best fits the product.
By the way, if you want more guidance from experienced interviewers, you can check out Gainlo, which allows you to have mock interviews with engineers from Google, Facebook, etc.
This post is written by Gainlo - a platform that allows you to have mock interviews with employees from Google, Amazon, etc.