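Many distributed systems today generate sequence numbers in different ways. UUIDs are among the more expensive options, both in effort and in storage space. When handing out sequence numbers, the numbers should really be allocated to clients in order to get good data distribution and balance. Here are several other ways to generate sequence numbers: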
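1. Coordinated generation through a config server (a separate server that manages metadata; different systems use different names for it, such as master server or root server).
First, the field that needs sequence numbers must be registered with the config server. When a client then needs that field for an insert, it requests numbers from the config server, one step (range) at a time, for example 1-100. Once the client has used up that range, it has to request another.
The advantage of this approach is centralized management: it is fairly simple and generally does not fail. The drawback is that the config server must not be down when a client exhausts a range and requests a new one, so the approach does not suit systems that are required to keep running well without a config server during normal operation.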
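The range-based allocation described above can be sketched roughly as follows. This is a minimal in-memory illustration, not any particular system's implementation: `ConfigServer`, `SequenceClient`, `register`, `allocate`, and the field name `order_id` are all hypothetical names, and a real config server would persist its counter and be reached over the network.

```python
import threading


class ConfigServer:
    """Hypothetical in-memory stand-in for the config server."""

    def __init__(self):
        self._next = {}                 # field name -> next unallocated number
        self._lock = threading.Lock()

    def register(self, field, start=1):
        # A field must be registered before ranges can be requested for it.
        with self._lock:
            self._next.setdefault(field, start)

    def allocate(self, field, step=100):
        """Reserve the inclusive range [first, last] and advance the counter."""
        with self._lock:
            first = self._next[field]
            self._next[field] = first + step
            return first, first + step - 1


class SequenceClient:
    """Hands out numbers locally and contacts the server only when a range runs out."""

    def __init__(self, server, field, step=100):
        self._server = server
        self._field = field
        self._step = step
        self._current = 0
        self._last = -1                 # empty range forces a request on first use

    def next_id(self):
        if self._current > self._last:
            # Range exhausted: this is the only point where the server must be up.
            self._current, self._last = self._server.allocate(self._field, self._step)
        value = self._current
        self._current += 1
        return value
```

With two clients sharing one server, the first client draws from 1-100 and the second from 101-200; when the first client runs out, its next range is 201-300. The numbers are unique but not globally ordered across clients.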
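2. Managed by the data servers themselves
Here the sequence number is treated like an ordinary field in a table, and clients compete to update it. For example, with three data servers, 1-100 can be pre-assigned to S1, 101-200 to S2, and 201-300 to S3. When several clients compete for S1, the range 1-100 goes to one of them and the others fail; the next client to ask S1 is given 301-400.
The advantage of this approach is that the sequence can be handled like an ordinary table field. The drawbacks are that it is hard to scale and that the single point of failure has to be addressed.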
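One way to picture the competition for a data server's range is a compare-and-set on the stored range start. The sketch below is only an assumption about how this could look: `DataServer`, `read_start`, and `try_claim` are hypothetical names, and the lock stands in for whatever row-level atomic update the real storage layer provides.

```python
import threading


class DataServer:
    """Hypothetical data server holding one sequence row, updated by compare-and-set."""

    def __init__(self, index, num_servers, step=100):
        self._step = step
        self._stride = num_servers * step   # distance to this server's next range
        self._start = index * step + 1      # index 0 -> 1, index 1 -> 101, index 2 -> 201
        self._lock = threading.Lock()       # stands in for row-level atomicity

    def read_start(self):
        with self._lock:
            return self._start

    def try_claim(self, expected_start):
        """Claim the current range only if no other client claimed it first."""
        with self._lock:
            if self._start != expected_start:
                return None                 # lost the race; caller re-reads and retries
            self._start += self._stride
            return expected_start, expected_start + self._step - 1
```

Two clients that both read a start of 1 from S1 will see exactly one `try_claim(1)` succeed with `(1, 100)`; the loser gets `None`, re-reads the start, and claims `(301, 400)`. Adding a fourth server changes the stride, which is one concrete form of the scaling difficulty mentioned above.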
3. A simple auto-increment column in the database
4. Twitter's approach: http://engineering.twitter.com/2010/06/announcing-snowflake.html
Problem
We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we’re working to replace many of these systems with the Cassandra distributed database or horizontally sharded MySQL (using gizzard).
Unlike MySQL, Cassandra has no built-in way of generating unique ids – nor should it, since at the scale where Cassandra becomes interesting, it would be difficult to provide a one-size-fits-all solution for ids. Same goes for sharded MySQL.
Our requirements for this system were pretty simple, yet demanding:
We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach.
These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets.[1]
Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.
Options
We considered a number of approaches: MySQL-based ticket servers (like flickr uses), but those didn’t give us the ordering guarantees we needed without building some sort of re-syncing routine. We also considered various UUIDs, but all the schemes we could find required 128 bits. After that we looked at Zookeeper sequential nodes, but were unable to get the performance characteristics we needed and we feared that the coordinated approach would lower our availability for no real payoff.
Solution
To generate the roughly-sorted 64 bit ids in an uncoordinated manner, we settled on a composition of: timestamp, worker number and sequence number.
Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper (though that’s overridable via a config file).
We encourage you to peruse and play with the code: you’ll find it on github. Please remember, however, that it is currently alpha-quality software that we aren’t yet running in production and is very likely to change.
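To make the timestamp / worker number / sequence number composition concrete, here is a rough sketch of how such a 64-bit id could be packed. It is not Twitter's code. The bit widths (41 for the timestamp, 10 for the worker, 12 for the sequence), the custom epoch, and the names `IdGenerator` and `next_id` are assumptions chosen for illustration; the post itself does not state them. The sketch also shares one sequence per generator behind a lock rather than keeping per-thread sequences.

```python
import threading
import time

# Assumed layout, not taken from the post: 41 bits of milliseconds,
# 10 bits of worker number, 12 bits of sequence number.
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1262304000000  # arbitrary custom epoch (2010-01-01 UTC), an assumption


class IdGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock              # injectable so the logic can be tested
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the timestamp occupies the high bits, ids from different workers sort roughly by creation time, and no coordination is needed at generation time once each worker has its number.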
Feedback
If you find bugs, please report them on github. If you are having trouble understanding something, come ask in the #twinfra IRC channel on freenode. If you find anything that you think may be a security problem, please email security@twitter.com (and cc myself: ryan@twitter.com).
[1] In mathematical terms, although the tweets will no longer be sorted, they will be k-sorted. We’re aiming to keep our k below 1 second, meaning that tweets posted within a second of one another will be within a second of one another in the id space too.