MongoDB journal file format
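If journaling is enabled, a journal directory is created under the directory given by the dbpath option to hold the journal files, which are named like j._<n>. Journal files are used to recover data after the database exits abnormally. Example code for parsing journal files is available here.
The size of a journal file is defined as follows. The smallfiles option reduces it to 128 MB.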
```cpp
// Rotate after reaching this data size in a journal (j._<n>) file
// We use a smaller size for 32 bit as the journal is mmapped during recovery (only)
// Note if you take a set of datafiles, including journal files, from 32->64 or vice-versa, it must
// work. (and should as-is)
// --smallfiles makes the limit small.
#if defined(_DEBUG)
unsigned long long DataLimitPerJournalFile = 128 * 1024 * 1024;
#elif defined(__APPLE__)
// assuming a developer box if OS X
unsigned long long DataLimitPerJournalFile = 256 * 1024 * 1024;
#else
unsigned long long DataLimitPerJournalFile = (sizeof(void*)==4) ? 256 * 1024 * 1024 : 1 * 1024 * 1024 * 1024;
#endif
```
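The preprocessor chain above picks the limit at compile time. To make the resulting sizes explicit, the following sketch restates the same choices as a runtime function and adds the 128 MB --smallfiles override mentioned in the text. `journalFileLimit` and its parameters are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical runtime restatement of the #if/#elif chain above,
// plus the --smallfiles override (128 MB) described in the text.
unsigned long long journalFileLimit(bool smallFiles, bool debugBuild,
                                    bool osx, std::size_t pointerSize) {
    const unsigned long long MB = 1024ULL * 1024ULL;
    if (smallFiles) return 128 * MB;   // --smallfiles
    if (debugBuild) return 128 * MB;   // _DEBUG builds
    if (osx) return 256 * MB;          // assumed developer box on OS X
    return pointerSize == 4 ? 256 * MB // 32-bit: journal is mmapped during recovery
                            : 1024 * MB; // 64-bit: 1 GB
}
```

For example, a 64-bit release build without --smallfiles yields 1073741824 bytes (1 GB).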
When are journal files deleted?
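- When the database shuts down cleanly, all journal files are deleted.
- An old journal file is deleted if lastEventTimeMs < _lastFlushTime + ExtraKeepTimeMs, where lastEventTimeMs is the time the journal file was closed, _lastFlushTime is the time the data files were flushed to disk, and ExtraKeepTimeMs is an extra retention period.
- When the fsync command is run.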
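The rule for old journal files in the second bullet is a single comparison. The minimal sketch below only restates that condition; the function name `canRemoveOldJournal` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate restating the old-journal-file rule exactly:
//   lastEventTimeMs : time the journal file was closed
//   lastFlushTime   : time the data files were flushed to disk
//   extraKeepTimeMs : extra retention period
bool canRemoveOldJournal(std::uint64_t lastEventTimeMs,
                         std::uint64_t lastFlushTime,
                         std::uint64_t extraKeepTimeMs) {
    return lastEventTimeMs < lastFlushTime + extraKeepTimeMs;
}
```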
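Journal files store a log of modifications to the database files (dbname.ns and the dbname.<#> series), covering both write operations and file-creation operations. A write to a database file is recorded as a WriteIntent, and creating a database file is recorded as a DurOp. A WriteIntent records the pointer and length of the write, which locate the position and length of the modified region in the data file. A DurOp carries an opcode that identifies the operation, and each operation has its own log format. Every WriteIntent or DurOp becomes one JEntry.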
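The statement that a pointer and a length are enough to locate the modified region rests on the data files being memory-mapped. The sketch below assumes the file is mapped at a known base address, so the file offset is just the pointer difference. `WriteIntentSketch` and `toFileOffset` are hypothetical simplifications, not MongoDB's actual WriteIntent type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a WriteIntent: start pointer and length of a write
// into a memory-mapped data file.
struct WriteIntentSketch {
    const char* start;
    unsigned len;
};

// Assumption: the data file is mapped at mapBase with size mapLen, so the
// file offset of the write is (start - mapBase). Returns false if the
// write does not lie entirely inside that mapping.
bool toFileOffset(const WriteIntentSketch& wi, const char* mapBase,
                  std::size_t mapLen, std::size_t& ofsOut) {
    if (wi.start < mapBase) return false;
    std::size_t ofs = static_cast<std::size_t>(wi.start - mapBase);
    if (ofs > mapLen || wi.len > mapLen - ofs) return false;
    ofsOut = ofs;  // becomes JEntry::ofs
    return true;
}
```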
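A journal file consists of a JHeader followed by many Sections, and each groupCommit produces one Section. A Section consists of a JSectHeader, many JEntry records, and a JSectFooter, where each JEntry is one modification log record. A JEntry may also be preceded by a JDbContext, which declares the database (data file path prefix) that the following entries modify. When several consecutive JEntry records apply to the same database, they share a single JDbContext. Everything in a Section except the JSectHeader and JSectFooter is compressed.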
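The JDbContext sharing rule is easiest to see in the order records are emitted. The simplified sketch below lists the logical records of one Section body as strings. It writes a JDbContext only when the database path prefix changes, then closes with the footer. `PendingWrite` and `layoutSection` are hypothetical, and compression is ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one pending write in a group commit.
struct PendingWrite {
    std::string dbPathPrefix;  // e.g. "test" for test.0, test.1, test.ns
    int fileNo;                // which dbname.<#> file
    unsigned ofs;              // offset within that file
};

// Emit a JDbContext only when the database prefix changes; consecutive
// entries for the same database share one context. Ends with the footer.
std::vector<std::string> layoutSection(const std::vector<PendingWrite>& writes) {
    std::vector<std::string> records;
    std::string lastDbPath;
    bool haveContext = false;
    for (const auto& w : writes) {
        if (!haveContext || w.dbPathPrefix != lastDbPath) {
            records.push_back("JDbContext " + w.dbPathPrefix);
            lastDbPath = w.dbPathPrefix;
            haveContext = true;
        }
        records.push_back("JEntry file=" + std::to_string(w.fileNo) +
                          " ofs=" + std::to_string(w.ofs));
    }
    records.push_back("JSectFooter");
    return records;
}
```

With writes to test.0, test.1 and then local.0, only two JDbContext records appear, one for "test" and one for "local".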
JHeader
beginning header for a journal/j._<n> file
There is nothing important in this header at this time, except perhaps the version number.
| Field | Description |
|---|---|
| char magic[2] | "j\n". j means journal, then a linefeed, fwiw if you were to run "less" on the file or something... |
| unsigned short _version | 0x4149 |
| char n1 | '\n' |
| char ts[20] | ASCII timestamp of file generation. For user reading, not used by code. For example: "Jan 29 14:14:23" |
| char n2 | '\n' |
| char dbpath[128] | Path/filename of this file for human reading and diagnostics. Not used by code |
| char n3, n4 | '\n', '\n' |
| unsigned long long fileId | Unique identifier that will be in each JSectHeader. Important as we recycle preallocated files |
| char reserved3[8026] | 8KB total for the file header |
| char txt2[2] | "\n\n" at the end |
Most header fields are human-readable and self-explanatory, so you can see them by opening the file in a text editor.
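The field sizes in the table add up to exactly 8192 bytes, so the header can be mirrored as a packed struct. The sketch assumes `#pragma pack` support (GCC, Clang, MSVC) and that the file is read on a little-endian machine of the same kind that wrote it. `readJHeader` is a hypothetical helper and checks only the magic and the version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// JHeader layout from the table, packed so offsets match the on-disk bytes.
#pragma pack(push, 1)
struct JHeader {
    char magic[2];              // "j\n"
    unsigned short _version;    // 0x4149
    char n1;                    // '\n'
    char ts[20];                // ASCII timestamp, informational only
    char n2;                    // '\n'
    char dbpath[128];           // informational only
    char n3, n4;                // '\n', '\n'
    unsigned long long fileId;  // repeated in every JSectHeader
    char reserved3[8026];
    char txt2[2];               // "\n\n"
};
#pragma pack(pop)

static_assert(sizeof(JHeader) == 8192, "JHeader must be 8KB");

// Hypothetical check on the first 8 KB of a j._<n> file.
// Assumes native little-endian byte order for _version.
bool readJHeader(const unsigned char* buf, std::size_t len, JHeader& out) {
    if (len < sizeof(JHeader)) return false;
    std::memcpy(&out, buf, sizeof(JHeader));
    return out.magic[0] == 'j' && out.magic[1] == '\n' &&
           out._version == 0x4149;
}
```

The first Section therefore starts at byte offset 8192, and its JSectHeader::fileId should equal JHeader::fileId.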
JSectHeader
"Section" header. A section corresponds to a group commit. len is the length of the entire section, including header and footer. The header and footer are not compressed, just the stuff in between.
| Field | Description |
|---|---|
| unsigned _sectionLen | Unpadded length in bytes of the whole section |
| unsigned long long seqNumber | Sequence number that can be used on recovery to not do too much work |
| unsigned long long fileId | Matches JHeader::fileId |
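_sectionLen is the unpadded length; every Section is padded for alignment.
Each Section has a seqNumber whose value is the server time at which the database files were last synced to disk, so it keeps increasing. The LSNFile (file name lsn) also stores a seqNumber with the same meaning. During recovery, a Section whose seqNumber is smaller than the LSNFile's seqNumber does not need to be recovered.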
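To step from one Section to the next, a reader has to round _sectionLen up to the padding boundary. The text does not state that boundary. The sketch assumes 8192 bytes, which is the value I recall from the MongoDB 2.x sources, so please verify it against your version. `paddedSectionLen` is hypothetical, and the mask trick only works for power-of-two alignments.

```cpp
#include <cassert>

#pragma pack(push, 1)
struct JSectHeader {
    unsigned _sectionLen;          // unpadded length of the whole section
    unsigned long long seqNumber;
    unsigned long long fileId;     // matches JHeader::fileId
};
#pragma pack(pop)

static_assert(sizeof(JSectHeader) == 20, "packed JSectHeader is 20 bytes");

// ASSUMPTION: sections are padded to 8 KB; verify for your MongoDB version.
const unsigned kAssumedAlignment = 8192;

// Round the unpadded length up to the alignment (power of two required).
unsigned paddedSectionLen(unsigned unpadded,
                          unsigned alignment = kAssumedAlignment) {
    return (unpadded + (alignment - 1)) & ~(alignment - 1);
}
```

Under this assumption, the next Section starts at the current Section's start plus paddedSectionLen(_sectionLen).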
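The recovery shortcut compares each Section's seqNumber with the value stored in the lsn file. The sketch below filters a list of section sequence numbers the same way. `needsReplay` and `sectionsToReplay` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A section is skipped when its seqNumber is smaller than the lsn file's.
bool needsReplay(std::uint64_t sectionSeq, std::uint64_t lsnFromFile) {
    return sectionSeq >= lsnFromFile;
}

// Indices of the sections that would still be recovered.
std::vector<std::size_t> sectionsToReplay(const std::vector<std::uint64_t>& seqs,
                                          std::uint64_t lsn) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < seqs.size(); ++i)
        if (needsReplay(seqs[i], lsn)) out.push_back(i);
    return out;
}
```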
JEntry
an individual write operation within a group commit section. Either the entire section should be applied, or nothing. (We check the md5 for the whole section before doing anything on recovery.)
| Field | Description |
|---|---|
| unsigned len | Length in bytes of the data of the JEntry, not including the JEntry header; in other words, the length of data |
| OpCodes opcode | Shares storage with len above. For a modification to a database file it holds the length len; otherwise it holds the operation type |
| unsigned ofs | Offset in file, i.e. the offset within the modified data file |
| int _fileNo | The # in dbname.#; 0x7fffffff for the dbname.ns file. The high bit is set to indicate it should be the <dbpath>/local database |
| char data[] | The updated data |
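The fixed part of a JEntry is 12 bytes, and `_fileNo` packs two facts into one int. The sketch below decodes that field as the table describes: the high bit marks the local database, the low 31 bits give the file number, and 0x7fffffff means dbname.ns. `isLocalDb`, `fileNumber` and `fileSuffix` are hypothetical helpers, and the packed layout assumes `#pragma pack`.

```cpp
#include <cassert>
#include <string>

#pragma pack(push, 1)
struct JEntry {
    unsigned len;    // length of data[]; the same 4 bytes double as an opcode
    unsigned ofs;    // offset in the target data file
    int _fileNo;     // dbname.<#>; 0x7fffffff = dbname.ns; high bit = local db
    // char data[len] follows on disk
};
#pragma pack(pop)

static_assert(sizeof(JEntry) == 12, "JEntry header is 12 bytes");

bool isLocalDb(const JEntry& e) {
    return (static_cast<unsigned>(e._fileNo) & 0x80000000u) != 0;
}

int fileNumber(const JEntry& e) {
    return static_cast<int>(static_cast<unsigned>(e._fileNo) & 0x7fffffffu);
}

// Suffix appended to the JDbContext path prefix ("a/b/c" -> "a/b/c.3").
std::string fileSuffix(const JEntry& e) {
    int n = fileNumber(e);
    return n == 0x7fffffff ? ".ns" : "." + std::to_string(n);
}
```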
OpCodes values
| Value | Meaning |
|---|---|
| OpCode_Footer | Value of the JSectFooter sentinel field |
| OpCode_DbContext | Value of the JDbContext sentinel field |
| OpCode_FileCreated | Creation of a pdfile |
| OpCode_DropDb | A dropDb operation; currently unused |
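These values are all very large: they share storage with JEntry's len, so they must be far larger than any possible length.
In fact, the first 4 bytes of JEntry, JDbContext and JSectFooter are interpreted the same way, which makes parsing easy.
When a JEntry is an Op, its format depends on the Op. For example, OpCode_FileCreated has this format: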
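Because all three record types begin with the same 4-byte slot, a parser can decide what comes next by reading just those bytes. The text gives no numeric opcode values. The ones below are the values I recall from MongoDB 2.x `dur_journalformat.h` and are an assumption, as is the `OpCode_Min` threshold. `classify` and `RecordKind` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// ASSUMED opcode values (verify against your MongoDB version). They sit at
// the top of the 32-bit range so they cannot collide with a data length.
enum OpCodes : unsigned {
    OpCode_Footer      = 0xffffffffu,
    OpCode_DbContext   = 0xfffffffeu,
    OpCode_FileCreated = 0xfffffffdu,
    OpCode_DropDb      = 0xfffffffcu,
    OpCode_Min         = 0xfffff000u  // assumed: values >= this are opcodes
};

enum class RecordKind { Footer, DbContext, FileCreated, DropDb, WriteEntry, Unknown };

// Classify a record by its first 4 bytes, read in host byte order
// (assumes the file was written on a little-endian machine, as is the reader).
RecordKind classify(const unsigned char* p) {
    unsigned first;
    std::memcpy(&first, p, sizeof first);
    if (first < OpCode_Min) return RecordKind::WriteEntry;  // a plain JEntry len
    switch (first) {
        case OpCode_Footer:      return RecordKind::Footer;
        case OpCode_DbContext:   return RecordKind::DbContext;
        case OpCode_FileCreated: return RecordKind::FileCreated;
        case OpCode_DropDb:      return RecordKind::DropDb;
        default:                 return RecordKind::Unknown;
    }
}
```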
| Field | Description |
|---|---|
| OpCodes opcode | OpCode_FileCreated |
| unsigned long long reserved1 | |
| unsigned long long reserved2 | |
| unsigned long long _len | Size of file, not length of name |
| RelativePath _p | Relative path of the file, as a '\0'-terminated string |
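Following the table, an OpCode_FileCreated record has 28 fixed bytes (opcode, two reserved words, size) followed by a NUL-terminated path. The decoding sketch below returns the number of bytes consumed, or 0 for malformed input. It assumes native little-endian byte order and an opcode value of 0xfffffffd, which is not given in the text and should be verified. `FileCreatedRecord` and `parseFileCreated` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct FileCreatedRecord {
    unsigned long long fileSize;  // _len: size of file, not length of name
    std::string relativePath;     // _p
};

// ASSUMPTION: OpCode_FileCreated == 0xfffffffd (verify for your version).
const unsigned kAssumedOpCodeFileCreated = 0xfffffffdu;

// Layout: opcode(4) reserved1(8) reserved2(8) _len(8) path '\0'.
std::size_t parseFileCreated(const unsigned char* p, std::size_t avail,
                             FileCreatedRecord& out) {
    const std::size_t fixed = 4 + 8 + 8 + 8;
    if (avail < fixed + 1) return 0;
    unsigned opcode;
    std::memcpy(&opcode, p, 4);
    if (opcode != kAssumedOpCodeFileCreated) return 0;
    std::memcpy(&out.fileSize, p + 20, 8);  // skip reserved1 and reserved2
    const unsigned char* s = p + fixed;
    const void* nul = std::memchr(s, '\0', avail - fixed);
    if (!nul) return 0;
    std::size_t n = static_cast<std::size_t>(static_cast<const unsigned char*>(nul) - s);
    out.relativePath.assign(reinterpret_cast<const char*>(s), n);
    return fixed + n + 1;
}
```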
JDbContext
declares "the next entry(s) are for this database / file path prefix"
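| Field | Description |
|---|---|
| sentinel | Sentinel value OpCode_DbContext, to make parsing easier |
| file path | Relative path of the database (relative to dbpath), as a '\0'-terminated string. For a filename a/b/c.3, file path() is "a/b/c" |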
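A reader remembers the prefix from the most recent JDbContext and appends each JEntry's file suffix to rebuild the full data file name. The sketch assumes the sentinel value is 0xfffffffe and uses native byte order. `parseDbContext` and `dataFileName` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Parse sentinel(4) + '\0'-terminated path prefix. Returns bytes consumed,
// or 0 if this is not a JDbContext.
// ASSUMPTION: OpCode_DbContext == 0xfffffffe (verify for your version).
std::size_t parseDbContext(const unsigned char* p, std::size_t avail,
                           std::string& pathPrefix) {
    if (avail < 5) return 0;
    unsigned sentinel;
    std::memcpy(&sentinel, p, 4);
    if (sentinel != 0xfffffffeu) return 0;
    const void* nul = std::memchr(p + 4, '\0', avail - 4);
    if (!nul) return 0;
    std::size_t n = static_cast<std::size_t>(
        static_cast<const unsigned char*>(nul) - (p + 4));
    pathPrefix.assign(reinterpret_cast<const char*>(p + 4), n);
    return 4 + n + 1;
}

// "a/b/c" + file number 3 -> "a/b/c.3"; 0x7fffffff selects the .ns file.
std::string dataFileName(const std::string& prefix, int fileNo) {
    return fileNo == 0x7fffffff ? prefix + ".ns"
                                : prefix + "." + std::to_string(fileNo);
}
```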
JSectFooter
group commit section footer. md5 is a key field.
| Field | Description |
|---|---|
| unsigned sentinel | Sentinel value OpCode_Footer, to make parsing easier |
| unsigned char hash[16] | Checksum of the whole Section excluding the JSectFooter |
| unsigned long long reserved | |
| char magic[4] | "\n\n\n\n" |
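The footer is a fixed 32-byte struct. The sketch below mirrors it and performs only the structural checks (sentinel and magic). Verifying `hash` would additionally require computing MD5 over the preceding Section bytes, which is not shown. The sentinel value 0xffffffff is an assumption, and `footerLooksValid` is hypothetical.

```cpp
#include <cassert>
#include <cstring>

#pragma pack(push, 1)
struct JSectFooter {
    unsigned sentinel;            // OpCode_Footer
    unsigned char hash[16];       // checksum of the Section excluding this footer
    unsigned long long reserved;
    char magic[4];                // "\n\n\n\n"
};
#pragma pack(pop)

static_assert(sizeof(JSectFooter) == 32, "packed JSectFooter is 32 bytes");

// Structural check only; the MD5 comparison is not implemented here.
// ASSUMPTION: OpCode_Footer == 0xffffffff (verify for your version).
bool footerLooksValid(const JSectFooter& f) {
    return f.sentinel == 0xffffffffu &&
           std::memcmp(f.magic, "\n\n\n\n", 4) == 0;
}
```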
LSNFile
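Last sequence number, stored in the file named lsn.
| Field | Description |
|---|---|
| unsigned ver | 0; used to check validity |
| unsigned reserved2 | |
| unsigned long long lsn | Last sequence number: the time the database files were last synced to disk |
| unsigned long long checkbytes | Bitwise complement of lsn; used to check validity |
| unsigned long long reserved[8] | |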
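The two validity fields of the lsn file make a cheap integrity check: ver must be 0 and checkbytes must equal ~lsn. The sketch below mirrors the layout as a packed struct. `setLsn` and `lsnFileValid` are hypothetical helpers.

```cpp
#include <cassert>

#pragma pack(push, 1)
struct LSNFile {
    unsigned ver;                    // must be 0
    unsigned reserved2;
    unsigned long long lsn;          // last sequence number
    unsigned long long checkbytes;   // ~lsn
    unsigned long long reserved[8];
};
#pragma pack(pop)

static_assert(sizeof(LSNFile) == 88, "packed LSNFile is 88 bytes");

// Fill a fresh record: zero everything, then store lsn and its complement.
void setLsn(LSNFile& f, unsigned long long lsn) {
    f = LSNFile{};
    f.lsn = lsn;
    f.checkbytes = ~lsn;
}

bool lsnFileValid(const LSNFile& f) {
    return f.ver == 0 && f.checkbytes == ~f.lsn;
}
```

A corrupted or partially written lsn file will almost certainly fail the complement check.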