HDFS Essay 2 - Clarifying NameNode / Checkpoint Node / Backup Node
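Why write in English? Most of the knowledge and technology I learn comes to me in English, so it is easier to write it down in English as I understand it. Besides, many terms simply cannot be translated. If you want to keep learning technology and knowledge, you have to get used to not only reading and listening but also writing and speaking. Unfortunately, since I left IBM I have had few chances to speak English with people, so writing is what I have left. Consider it practice for my English.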
In this article I am going to explain how the Secondary NameNode (SNN) works. The same role is now also called the Checkpoint Node (CPN), which is a more precise name for what it does.
FS Image & Edit log
The edit log is an append-only file that records metadata changes for durability. When a client issues a command such as 'create file', 'chmod' or 'rename', NN writes the operation (the command itself and its data) to the edit log as an edit entry. Each operation is treated as a transaction, so every entry is uniquely identified by a transaction ID, which is an auto-incrementing counter. NN also applies the change to its in-memory metadata structure. During runtime, NN does not write directly to the fsimage, because writing the fsimage is a time-consuming operation. Conceptually, the entries look like this (simplified; the real edit log is a binary file):
tranID: 7
put
'/xx/xx/log.txt',rwx------, ricky, ricky, 1024, $timestamp.....
tranID: 8
chmod 777
'/xx/xx/log.txt'
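The write path described above can be modelled in a few lines. This is a minimal Python sketch under heavy simplifying assumptions: the `NameNodeModel` and `EditEntry` classes, the operation names and the dict-based namespace are hypothetical and are not HDFS APIs. It only shows the ordering the text describes: assign the next transaction ID, append the entry to the log, apply it to the in-memory metadata, and leave the fsimage alone.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EditEntry:
    txid: int   # unique, auto-incremented transaction ID
    op: str     # operation name, e.g. "create" or "chmod" (hypothetical names)
    args: dict  # operation payload (path, permission, owner, ...)


def apply_edit(namespace: dict, entry: EditEntry) -> None:
    """Apply one edit to the in-memory namespace (path -> metadata dict)."""
    if entry.op == "create":
        namespace[entry.args["path"]] = {"perm": entry.args["perm"],
                                         "owner": entry.args["owner"]}
    elif entry.op == "chmod":
        namespace[entry.args["path"]]["perm"] = entry.args["perm"]
    elif entry.op == "rename":
        namespace[entry.args["dst"]] = namespace.pop(entry.args["src"])
    else:
        raise ValueError(f"unknown op: {entry.op}")


@dataclass
class NameNodeModel:
    """Toy NN: in-memory namespace plus an append-only edit log."""
    last_txid: int = 0
    edit_log: list = field(default_factory=list)   # stands in for the on-disk edit log
    namespace: dict = field(default_factory=dict)  # in-memory metadata

    def execute(self, op: str, **args: Any) -> int:
        self.last_txid += 1                          # next transaction ID
        entry = EditEntry(self.last_txid, op, args)
        self.edit_log.append(entry)                  # append only, never rewritten
        apply_edit(self.namespace, entry)            # update memory; fsimage untouched
        return entry.txid


nn = NameNodeModel()
nn.execute("create", path="/xx/xx/log.txt", perm="rwx------", owner="ricky")
nn.execute("chmod", path="/xx/xx/log.txt", perm="rwxrwxrwx")  # chmod 777
print([e.txid for e in nn.edit_log])           # [1, 2]
print(nn.namespace["/xx/xx/log.txt"]["perm"])  # rwxrwxrwx
```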
The FS image is the file that stores the HDFS namespace metadata on NN. It is almost always out of date, because NN does not write it frequently. In fact, when NN starts up, it reads the FS image and merges the edit logs into it to generate a new FS image; the older image and edit logs can then be purged. Imagine edit entries accumulating over months: the next NN startup will take a long time, because there are so many entries to merge into the fsimage. Or consider this situation: NN goes down due to a power outage, and on restart it has to catch up to the newest state by merging the fsimage with a huge pile of edit logs, while users are expecting it to be back up ASAP.
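Startup replay can be sketched the same way. This toy version assumes an image is a `(txid, {path: permission})` pair and edits are `(txid, op, args)` tuples; `load_namespace` and these structures are hypothetical, not the real on-disk formats. It illustrates that the work grows with the number of edits newer than the image, and it refuses to replay across a gap in transaction IDs.

```python
def load_namespace(image_txid, image_ns, edits):
    """Replay edit entries on top of an fsimage, as NN does at startup (toy model).

    image_txid: last transaction ID in the image (7 for fsimage_7)
    image_ns:   namespace from the image, {path: permission string}
    edits:      (txid, op, args) tuples; entries already in the image are skipped
    Returns (last_txid, namespace, number_of_replayed_edits).
    """
    ns = dict(image_ns)               # work on a copy; the loaded image stays as-is
    expected = image_txid + 1
    replayed = 0
    for txid, op, args in sorted(edits, key=lambda e: e[0]):
        if txid <= image_txid:
            continue                  # already reflected in the image
        if txid != expected:          # transaction IDs must be contiguous
            raise RuntimeError(f"gap in edit log: expected txid {expected}, got {txid}")
        if op in ("create", "chmod"):
            ns[args["path"]] = args["perm"]
        elif op == "delete":
            del ns[args["path"]]
        else:
            raise ValueError(f"unknown op: {op}")
        expected += 1
        replayed += 1                 # startup cost grows with this number
    return expected - 1, ns, replayed


edits = [
    (8, "create", {"path": "/a", "perm": "rw-------"}),
    (9, "chmod", {"path": "/a", "perm": "rwxrwxrwx"}),
    (10, "delete", {"path": "/old"}),
]
print(load_namespace(7, {"/old": "rw-r--r--"}, edits))
# (10, {'/a': 'rwxrwxrwx'}, 3)
```

Months of accumulated edits mean a very large `replayed` count, which is exactly the slow-startup problem that checkpointing addresses.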
Why SNN / Checkpoint Node and how it works
I already pointed out one of the reasons above. Another reason is that merging the fsimage with the edit logs is an expensive operation (I/O and CPU intensive), which may slow down clients accessing NN. So we use a separate machine as the Checkpoint Node and hand that load over to it, so that NN stays available to clients all the time.
How does it work?
The fsimage file name contains a transaction ID. For example, fsimage_7 means this image file contains all transactions up to and including transaction ID 7. Similarly, the edit log file currently in use is named edits_inprogress_$tranID (e.g. edits_inprogress_21), which means transactions with ID 21 and above are written to this file. An older edit log file is finalized, renamed to something like edits_8-20, and never written to again. (Real file names zero-pad the IDs to 19 digits, e.g. fsimage_0000000000000000007; I use the short form here.) So what happens during checkpointing?
- CPN gets the most recent known transaction ID from the edit log currently in use, which is 21.
- CPN gets the transaction ID of the fsimage, which is 7.
- CPN downloads the finalized edit logs that come after the image, here edits_8-20.
- CPN checks whether it already has the fsimage whose transaction ID is 7 (it usually does); otherwise it downloads it from NN.
- CPN performs the checkpoint by merging fsimage_7 and edits_8-20, generating a new fsimage_20.
- CPN sends fsimage_20 back to NN, so both of them now have the newly generated fsimage_20. Next time, CPN will not need to download it from NN.
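The six steps can be traced in a toy simulation. It assumes the NN and CPN storage directories are plain dicts from file name to content, and that every edit simply sets one path's permission; `checkpoint`, `image_name` and `segment_name` are hypothetical helpers. Only the file-name patterns follow real HDFS, which zero-pads transaction IDs to 19 digits (`fsimage_<txid>`, `edits_<start>-<end>`, `edits_inprogress_<start>`). The real daemons move these files over HTTP, which is left out here.

```python
import re

IMAGE_RE = re.compile(r"^fsimage_(\d{19})$")
SEGMENT_RE = re.compile(r"^edits_(\d{19})-(\d{19})$")
INPROGRESS_RE = re.compile(r"^edits_inprogress_(\d{19})$")


def image_name(txid):
    return f"fsimage_{txid:019d}"


def segment_name(start, end):
    return f"edits_{start:019d}-{end:019d}"


def _ids(names, pattern):
    """Yield the txid group(s) of every file name matching pattern, as int tuples."""
    for name in names:
        m = pattern.match(name)
        if m:
            yield tuple(int(g) for g in m.groups())


def checkpoint(nn_files, cpn_files):
    """One checkpoint round; returns (new_image_txid, image_was_downloaded)."""
    # Step 1: the in-progress log starts at e.g. 21, so 20 is the last finalized txid.
    next_txid = max(t for (t,) in _ids(nn_files, INPROGRESS_RE))
    # Step 2: the newest image on NN, e.g. txid 7.
    image_txid = max(t for (t,) in _ids(nn_files, IMAGE_RE))
    # Step 3: "download" the finalized segments between the image and the in-progress log.
    segments = sorted(s for s in _ids(nn_files, SEGMENT_RE) if image_txid < s[1] < next_txid)
    edits, expected = [], image_txid + 1
    for start, end in segments:
        if start != expected:
            raise RuntimeError(f"missing edits before txid {start}")
        edits += nn_files[segment_name(start, end)]
        expected = end + 1
    if expected != next_txid:
        raise RuntimeError("finalized edits do not reach the in-progress log")
    # Step 4: fetch the image only if CPN does not already have it.
    downloaded = image_name(image_txid) not in cpn_files
    if downloaded:
        cpn_files[image_name(image_txid)] = dict(nn_files[image_name(image_txid)])
    # Step 5: merge image + edits; each toy edit just sets one path's permission.
    merged = dict(cpn_files[image_name(image_txid)])
    for _txid, path, perm in sorted(edits):
        merged[path] = perm
    new_txid = next_txid - 1
    cpn_files[image_name(new_txid)] = merged
    # Step 6: "upload" the new image so NN and CPN both hold it.
    nn_files[image_name(new_txid)] = dict(merged)
    return new_txid, downloaded


nn = {
    image_name(7): {"/a": "rw-------"},
    segment_name(8, 20): [(8, "/a", "rwxrwxrwx")]
    + [(t, f"/f{t}", "rw-r--r--") for t in range(9, 21)],
    f"edits_inprogress_{21:019d}": [],
    "seen_txid": "21",
}
cpn = {}
print(checkpoint(nn, cpn))  # (20, True)

# NN keeps working: segment 21-30 gets finalized and a new in-progress log starts at 31.
del nn[f"edits_inprogress_{21:019d}"]
nn[segment_name(21, 30)] = [(t, f"/g{t}", "rw-r--r--") for t in range(21, 31)]
nn[f"edits_inprogress_{31:019d}"] = []
print(checkpoint(nn, cpn))  # (30, False)
```

The second round reports `False` for the download flag because CPN kept fsimage_20 from the first round, matching the last step above.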
When does a checkpoint get triggered? According to the official docs, checkpointing is triggered when either of the following two conditions is met:
- dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
- dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
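The rule is a simple OR of the two conditions. A minimal Python sketch: `should_checkpoint` is a hypothetical helper, while the two property names and their defaults come from the docs quoted above. In the real daemons the transaction count is only polled every dfs.namenode.checkpoint.check.period seconds (60 by default), so the count-based trigger can fire a little after the threshold is crossed.

```python
# Defaults for the two properties above.
DEFAULT_PERIOD_S = 3600    # dfs.namenode.checkpoint.period (seconds)
DEFAULT_TXNS = 1_000_000   # dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last: float, uncheckpointed_txns: int,
                      period: float = DEFAULT_PERIOD_S,
                      txns: int = DEFAULT_TXNS) -> bool:
    """Return True when either trigger fires, whichever comes first."""
    time_due = seconds_since_last >= period
    count_due = uncheckpointed_txns >= txns
    return time_due or count_due


print(should_checkpoint(600, 1_200_000))  # True: too many uncheckpointed txns
print(should_checkpoint(3600, 42))        # True: an hour has passed
print(should_checkpoint(600, 42))         # False: neither limit reached
```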
Please note that CPN does not keep an up-to-date in-memory copy of the metadata, and it does not have the block locations, which come from the block reports sent by DataNodes (DNs).
Backup Node
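See here. It looks to me that the Backup Node is better than CPN: according to the official docs, it receives a stream of edits from NN and keeps an up-to-date in-memory copy of the namespace, so it can create checkpoints without downloading the fsimage and edit logs from NN.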