MongoDB GridFS Specification

The default GridFS chunk size is being changed for 2.4.10 and 2.6.0-rc3. Tyler Brock's explanation:

Now that the server uses power of 2 by default, if the default chunk size for gridfs is 256k we will almost always be throwing away some storage space. This is because if the bindata field of a chunk will occupy 256k (an exact power of 2), then _id and foreign key reference to the files collection, etc will take up additional space that will cause the document's allocated storage to be rounded up to 512k (the next power of 2). This would be a huge waste.

Instead, if we make the default chunk size 255k then we have an extra 1k to store the _id and other metadata so that when the document is persisted we round up to 256k and not 512k upon persisting the document.

 

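Starting with 2.4.10, MongoDB changed the default chunkSize to 255 KB; before that it was 256 KB. The passage above explains why. The MongoDB server allocates space for a document in sizes of 2^n bytes. With the default chunkSize set to 256K, the binary data alone consumes 256K, and the other fields (_id, files_id, and n) take up a few dozen extra bytes. The document then exceeds 256K, so the server allocates 512K for every chunk, which wastes a great deal of space.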
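The rounding argument can be checked with a few lines of arithmetic. This is a minimal sketch, not the server's allocator: `next_power_of_two` and the 64-byte `OVERHEAD` are illustrative assumptions standing in for the "few dozen bytes" taken by _id, files_id, and n.

```python
KIB = 1024
OVERHEAD = 64  # assumed: rough size of _id, files_id, n and BSON framing


def next_power_of_two(n: int) -> int:
    """Return the smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


for chunk_kib in (256, 255):
    # Document size = chunk payload plus the small metadata overhead.
    doc_size = chunk_kib * KIB + OVERHEAD
    allocated = next_power_of_two(doc_size)
    print(f"{chunk_kib}k chunk -> {allocated // KIB}k allocated")
```

With these assumptions the 256k chunk lands in a 512k allocation, while the 255k chunk fits in 256k.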

 

The chunks Collection

Each document in the chunks collection represents a distinct chunk of a file as represented in the GridFS store. The following is a prototype document from the chunks collection:

{
  "_id" : <ObjectId>,
  "files_id" : <ObjectId>,
  "n" : <num>,
  "data" : <binary>
}

A document from the chunks collection contains the following fields:

chunks._id

The unique ObjectId of the chunk.

chunks.files_id

The _id of the “parent” document, as specified in the files collection.

chunks.n

The sequence number of the chunk. GridFS numbers all chunks, starting with 0.

chunks.data

The chunk’s payload as a BSON binary type.

The chunks collection uses a compound index on files_id and n, as described in GridFS Index.
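To make the chunk layout concrete, here is a rough sketch of how a payload could be split into documents shaped like the prototype above and then reassembled by sorting on n. It uses plain dicts rather than a real driver, and the helper names `make_chunks` and `reassemble` are hypothetical. A uuid hex string stands in for the ObjectId, which is an assumption made only to keep the example free of dependencies.

```python
import uuid

DEFAULT_CHUNK_SIZE = 255 * 1024  # the 255k default described earlier


def make_chunks(files_id, payload: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Split payload into chunk documents numbered from 0."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        {
            "_id": uuid.uuid4().hex,  # stand-in for a real ObjectId
            "files_id": files_id,     # _id of the parent files document
            "n": n,                   # sequence number, starting with 0
            "data": payload[offset:offset + chunk_size],
        }
        for n, offset in enumerate(range(0, len(payload), chunk_size))
    ]


def reassemble(chunks) -> bytes:
    """Rebuild the payload by concatenating chunk data in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Sorting on n within a single files_id mirrors what the compound index on files_id and n makes cheap on the server side.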

 

The files Collection

Each document in the files collection represents a file in the GridFS store. Consider the following prototype of a document in the files collection:

{
  "_id" : <ObjectId>,
  "length" : <num>,
  "chunkSize" : <num>,
  "uploadDate" : <timestamp>,
  "md5" : <hash>,

  "filename" : <string>,
  "contentType" : <string>,
  "aliases" : <string array>,
  "metadata" : <dataObject>
}

Documents in the files collection contain some or all of the following fields. Applications may create additional arbitrary fields:

files._id

The unique ID for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectId.

files.length

The size of the document in bytes.

files.chunkSize

The size of each chunk. GridFS divides the document into chunks of the size specified here. The default size is 255 kilobytes.

Changed in version 2.4.10: The default chunk size changed from 256k to 255k.

files.uploadDate

The date the document was first stored by GridFS. This value has the Date type.

files.md5

An MD5 hash returned by the filemd5 command. This value has the String type.

files.filename

Optional. A human-readable name for the document.

files.contentType

Optional. A valid MIME type for the document.

files.aliases

Optional. An array of alias strings.

files.metadata

Optional. Any additional information you want to store.
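The fields above can be tied together in a short sketch that builds a files document for an in-memory payload and derives how many chunks it implies. This is an illustration under stated assumptions, not a driver implementation: `make_files_doc` and `expected_chunk_count` are hypothetical helpers, and the MD5 is computed locally with hashlib as a stand-in for the value the filemd5 command would return.

```python
import hashlib
from datetime import datetime, timezone

DEFAULT_CHUNK_SIZE = 255 * 1024  # default chunkSize since 2.4.10


def make_files_doc(file_id, payload: bytes, filename=None,
                   chunk_size: int = DEFAULT_CHUNK_SIZE) -> dict:
    """Build a dict shaped like a document in the files collection."""
    doc = {
        "_id": file_id,
        "length": len(payload),                   # size in bytes
        "chunkSize": chunk_size,                  # size of each chunk in bytes
        "uploadDate": datetime.now(timezone.utc),
        "md5": hashlib.md5(payload).hexdigest(),  # local stand-in for filemd5
    }
    if filename is not None:
        doc["filename"] = filename                # optional field
    return doc


def expected_chunk_count(doc: dict) -> int:
    """Number of chunks needed to hold doc['length'] bytes (ceiling division)."""
    return (doc["length"] + doc["chunkSize"] - 1) // doc["chunkSize"]
```

For example, a 600,000-byte payload with the default chunkSize needs three chunks: two full chunks of 261,120 bytes and one partial chunk.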

posted @ 2014-07-18 10:00  clivelee