4. MongoDB_03 2024-08-04

Attribution: MongoDB: The Definitive Guide, Third Edition by Shannon Bradshaw, Eoin Brazil, and Kristina Chodorow (O’Reilly). Copyright 2020 Shannon Bradshaw and Eoin Brazil, 978-1-491-95446-1.

Dates

In JavaScript, the Date class is used for MongoDB’s date type.

When creating a new Date object, always call new Date(), not just Date().

Calling the constructor as a function (i.e., Date()) returns a string representation of the date, not an actual Date object. If you are not careful to always use new Date(), you can end up with a mix of strings and dates, and this can cause problems with removing, updating, querying…pretty much everything.
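The difference is easy to check in plain JavaScript. The following is a minimal sketch that runs in Node as well as in the shell; the variable names are only illustrative.

```javascript
// Calling Date() as a plain function yields a string; new Date() yields a Date object.
const asString = Date();
const asObject = new Date();

console.log(typeof asString);          // "string"
console.log(asObject instanceof Date); // true

// A string never compares equal to a Date object, so mixing the two breaks matching.
console.log(asString === asObject);    // false
```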

Dates in the shell are displayed using local time zone settings. However, dates in the database are just stored as milliseconds since the epoch, so they have no time zone information associated with them. (Time zone information could, of course, be stored as the value for another key.)
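As a rough illustration of "milliseconds since the epoch", the sketch below builds a date from an explicit UTC instant and reads back the underlying number. The field names createdAt and createdAtZone are made up for the example; they are just one way to keep zone information under another key.

```javascript
// Build a date from an explicit UTC instant (July 31, 2024, 12:00:00 UTC).
const when = new Date(Date.UTC(2024, 6, 31, 12, 0, 0)); // months are zero-based

// The underlying value is a plain count of milliseconds since the epoch.
const millis = when.getTime();
console.log(millis);

// Rebuilding from the number gives the same instant; no zone is stored.
const roundTrip = new Date(millis);
console.log(roundTrip.toISOString()); // 2024-07-31T12:00:00.000Z

// Time zone information, if needed, has to live under a separate key
// (hypothetical field names).
const doc = { createdAt: when, createdAtZone: "America/New_York" };
console.log(doc.createdAtZone);
```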

Arrays

Arrays are values that can be used interchangeably for both ordered operations (as though they were lists, stacks, or queues) and unordered operations (as though they were sets).

In the following document, the key "things" has an array value:

{"things" : ["pie", 3.14]}

As you can see from this example, arrays can contain different data types as values (in this case, a string and a floating-point number). In fact, array values can be any of the supported value types for normal key/value pairs, even nested arrays.

One of the great things about arrays in documents is that MongoDB “understands” their structure and knows how to reach inside of arrays to perform operations on their contents. This allows us to query on arrays and build indexes using their contents.

For instance, in the previous example, MongoDB can query for all documents where 3.14 is an element of the "things" array. If this is a common query, you can even create an index on the "things" key to improve the query’s speed.
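In the shell, such a query would look something like db.<collection>.find({"things" : 3.14}), and the index would be created with createIndex({"things" : 1}). The sketch below does not talk to a server; it only mimics the matching rule in plain JavaScript, using a made-up in-memory array and a hypothetical hasElement helper.

```javascript
// Hypothetical in-memory stand-in for a collection.
const docs = [
  { _id: 1, things: ["pie", 3.14] },
  { _id: 2, things: ["cake", 2.71] },
  { _id: 3, things: [] },
];

// Mimics the match a query such as {"things": 3.14} performs:
// a document matches when any element of the array equals the value.
function hasElement(doc, key, value) {
  const field = doc[key];
  return Array.isArray(field) && field.includes(value);
}

const matches = docs.filter((doc) => hasElement(doc, "things", 3.14));
console.log(matches.map((doc) => doc._id)); // [ 1 ]
```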

MongoDB also allows atomic updates that modify the contents of arrays, such as reaching into the array and changing the value "pie" to pi. We’ll see more examples of these types of operations throughout future blogs.

Embedded Documents

A document can be used as the value for a key. This is called an embedded document.

Embedded documents can be used to organize data in a more natural way than just a flat structure of key/value pairs.

For example, if we have a document representing a person and want to store that person’s address, we can nest this information in an embedded "address" document:

{
    "name" : "Saya sakisaka",
    "address" : {
        "street" : "123 Park Street",
        "city" : "Anytown",
        "state" : "NY"
    }
}

The value for the "address" key in this example is an embedded document with its own key/value pairs for "street", "city", and "state".

As with arrays, MongoDB “understands” the structure of embedded documents and is able to reach inside them to build indexes, perform queries, or make updates.
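MongoDB addresses fields inside embedded documents with dot notation, for example a query on "address.city". The following plain-JavaScript sketch imitates that kind of path lookup; getPath is a hypothetical helper written for this post, not part of MongoDB.

```javascript
const person = {
  name: "Saya sakisaka",
  address: { street: "123 Park Street", city: "Anytown", state: "NY" },
};

// Hypothetical helper: walk a dotted path such as "address.city" one key at a time.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value !== null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

console.log(getPath(person, "address.city")); // Anytown
console.log(getPath(person, "address.zip"));  // undefined
```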

In a relational database, the previous document would probably be modeled as two separate rows in two different tables (people and addresses).

With MongoDB we can embed the "address" document directly within the "person" document. Thus, when used properly, embedded documents can provide a more natural representation of information.

The flip side of this is that there can be more data repetition with MongoDB.

Suppose addresses was a separate table in a relational database and we needed to fix a typo in an address.

When we did a join with people and addresses, we’d get the updated address for everyone who shares it. With MongoDB, we’d need to fix the typo in each person’s document.
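The trade-off can be sketched with plain objects. In this made-up example (the names and the in-memory Map are assumptions, not a MongoDB API), the embedded layout needs one write per person, while the referenced layout needs a single write.

```javascript
// Embedded: each person carries a private copy of the address.
const embedded = [
  { name: "A", address: { street: "123 Prak Street", city: "Anytown" } },
  { name: "B", address: { street: "123 Prak Street", city: "Anytown" } },
];

// Fixing the typo means touching every document that repeats it.
let embeddedWrites = 0;
for (const person of embedded) {
  if (person.address.street === "123 Prak Street") {
    person.address.street = "123 Park Street";
    embeddedWrites += 1;
  }
}

// Referenced: people point at one shared address record by id.
const addresses = new Map([[1, { street: "123 Prak Street", city: "Anytown" }]]);
const referenced = [
  { name: "A", addressId: 1 },
  { name: "B", addressId: 1 },
];

// One write fixes the address for everyone who shares it.
addresses.get(1).street = "123 Park Street";
const referencedWrites = 1;

console.log(embeddedWrites, referencedWrites); // 2 1
```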

Suggestions for practice

Mix embedding and references:

In MongoDB, you can mix embedding and references according to specific needs. For data that is frequently queried and rarely modified, you can choose embedding; for data that is frequently modified and needs to be consistent, you can choose references.

Use schema design tools:

Use MongoDB's schema design tools and documented best practices to help you design a schema that fits how your data is queried and updated.

Batch update:

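MongoDB can update multiple documents in a single operation (for example, with updateMany), which helps when a repeated value such as an embedded address has to be corrected in every document that contains it.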

Reasonable index:

Create indexes for commonly used query conditions and update conditions to improve query and update efficiency.

_id and ObjectIds

Every document stored in MongoDB must have an "_id" key. The "_id" key’s value can be any type, but it defaults to an ObjectId.

In a single collection, every document must have a unique value for "_id", which ensures that every document in a collection can be uniquely identified.

That is, if you had two collections, each one could have a document where the value for "_id" was 123. However, neither collection could contain more than one document with an "_id" of 123.
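The rule can be modeled in a few lines of plain JavaScript. This is only a toy model: the Collection class below is hypothetical and keeps documents in a Map keyed by "_id", which is an assumption made for the sketch and says nothing about how MongoDB stores data.

```javascript
// Hypothetical in-memory model: each collection is a Map keyed by _id.
class Collection {
  constructor() {
    this.docs = new Map();
  }

  insert(doc) {
    if (this.docs.has(doc._id)) {
      throw new Error(`duplicate _id: ${doc._id}`);
    }
    this.docs.set(doc._id, doc);
  }
}

const first = new Collection();
const second = new Collection();

first.insert({ _id: 123, note: "in first" });
second.insert({ _id: 123, note: "in second" }); // fine: different collection

let rejected = false;
try {
  first.insert({ _id: 123, note: "again" }); // same collection, same _id
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```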

OBJECTIDS

ObjectId is the default type for "_id". The ObjectId class is designed to be lightweight, while still being easy to generate in a globally unique way across different machines.

MongoDB’s distributed nature is the main reason why it uses ObjectIds as opposed to something more traditional, like an autoincrementing primary key: it is difficult and time-consuming to synchronize autoincrementing primary keys across multiple servers.

Because MongoDB was designed to be a distributed database, it was important to be able to generate unique identifiers in a sharded environment.

ObjectIds use 12 bytes of storage, which gives them a string representation that is 24 hexadecimal digits: 2 digits for each byte. This causes them to appear larger than they are, which makes some people nervous. It’s important to note that even though an ObjectId is often represented as a giant hexadecimal string, the string is actually twice as long as the data being stored.
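The two-digits-per-byte relationship can be checked with Node's Buffer. The hexadecimal value below is made up for the example and is not a real ObjectId from any database.

```javascript
// An arbitrary 24-digit hexadecimal string in the shape of an ObjectId (made-up value).
const hex = "66aa27400123456789abcdef";

// Each byte is written as two hex digits, so decoding halves the length.
const bytes = Buffer.from(hex, "hex");

console.log(hex.length);   // 24
console.log(bytes.length); // 12
```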

If you create multiple new ObjectIds in rapid succession, you can see that only the last few digits change each time. In addition, a couple of digits in the middle of the ObjectId will change if you space the creations out by a couple of seconds. This is because of the manner in which ObjectIds are created.

The 12 bytes of an ObjectId are generated as follows:

Bytes 0-3	Bytes 4-8	Bytes 9-11
Timestamp	Random value	Counter
4 bytes	5 bytes	3 bytes

Timestamp

The first four bytes of an ObjectId are a timestamp in seconds since the epoch. This provides a couple of useful properties:

The timestamp, when combined with the next five bytes (which will be described in a moment), provides uniqueness at the granularity of a second. This means that the first four bytes of all ObjectIds generated in the same second are identical.

For example, if two ObjectIds were generated at 12:00:00 on July 31, 2024, their timestamp portion will be identical.

Because the timestamp comes first, ObjectIds will sort in rough insertion order. This is not a strong guarantee but does have some nice properties, such as making ObjectIds efficient to index.

In these four bytes exists an implicit timestamp of when each document was created. Most drivers expose a method for extracting this information from an ObjectId.
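In the shell, for instance, an ObjectId has a getTimestamp() method for this. The sketch below shows roughly what such a method has to do, working on the hexadecimal string form; timestampOf is a hypothetical helper and the example id is made up.

```javascript
// Hypothetical helper: read the creation time out of an ObjectId's hex form.
function timestampOf(objectIdHex) {
  // First 8 hex digits = first 4 bytes = seconds since the epoch.
  const seconds = parseInt(objectIdHex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Build a made-up id whose timestamp is 12:00:00 UTC on July 31, 2024.
const noonSeconds = Date.UTC(2024, 6, 31, 12, 0, 0) / 1000;
const exampleId = noonSeconds.toString(16).padStart(8, "0") + "0123456789abcdef";

console.log(exampleId);
console.log(timestampOf(exampleId).toISOString()); // 2024-07-31T12:00:00.000Z
```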

Because the current time is used in ObjectIds, some users worry that their servers will need to have synchronized clocks. Although synchronized clocks are a good idea for other reasons, the actual timestamp doesn’t matter to ObjectIds, only that it is often new (once per second) and increasing.

An important feature of the ObjectId design is that values increase over time. Because the timestamp occupies the leading bytes, an ObjectId generated in a later second compares greater than one generated earlier, so creation order is roughly preserved even in a highly concurrent environment.

Random Value & Counter

The next five bytes of an ObjectId are a random value. (Older MongoDB versions used a 3-byte machine identifier followed by a 2-byte process ID in these positions.)

The final three bytes are a counter that starts with a random value to avoid generating colliding ObjectIds on different machines.

The uniqueness of an ObjectId relies on the combination of the timestamp, the random value, and the counter. The timestamp provides the increasing order, while uniqueness within the same second comes from the other two parts.

Even if the timestamp part is repeated in the same second, the last three bytes are simply an incrementing counter that is responsible for uniqueness within a second in a single process. This allows for up to 256^3 (16,777,216) unique ObjectIds to be generated per process in a single second.
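Putting the pieces together, the layout described above (4-byte timestamp, 5 random bytes, 3-byte counter) can be sketched in Node. This is an illustration under stated assumptions, not a driver's actual implementation: makeObjectIdHex, processRandom, and counter are names invented for the example.

```javascript
const { randomBytes } = require("crypto");

// Sketch only: 5 random bytes chosen once per process, and a 3-byte counter
// that starts at a random value. Names here are illustrative, not a driver API.
const processRandom = randomBytes(5);
let counter = randomBytes(3).readUIntBE(0, 3);

function makeObjectIdHex(now = Date.now()) {
  const id = Buffer.alloc(12);
  id.writeUInt32BE(Math.floor(now / 1000), 0); // bytes 0-3: seconds since the epoch
  processRandom.copy(id, 4);                   // bytes 4-8: per-process random value
  id.writeUIntBE(counter, 9, 3);               // bytes 9-11: counter
  counter = (counter + 1) % 256 ** 3;          // wrap after 16,777,216 values
  return id.toString("hex");
}

const t = Date.UTC(2024, 6, 31, 12, 0, 0);
const a = makeObjectIdHex(t);
const b = makeObjectIdHex(t);        // same second: only the counter differs
const c = makeObjectIdHex(t + 2000); // two seconds later: timestamp differs too

console.log(a, b, c);
console.log(256 ** 3); // 16777216
```

Ids a and b share their first 18 hexadecimal digits (timestamp plus random value) and differ only in the counter, while c has a larger leading timestamp and therefore sorts after both.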

AUTOGENERATION OF _ID

As stated earlier, if there is no "_id" key present when a document is inserted, one will be automatically added to the inserted document. This can be handled by the MongoDB server but will generally be done by the driver on the client side.
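A client-side version of this step might look roughly like the sketch below. Both withId and newId are hypothetical names, and newId is only a stand-in that returns 12 random bytes as hex; a real driver would generate a proper ObjectId instead.

```javascript
const { randomBytes } = require("crypto");

// Stand-in id generator: 12 random bytes as hex. A real driver would build a
// proper ObjectId instead; this placeholder only keeps the sketch self-contained.
const newId = () => randomBytes(12).toString("hex");

// Hypothetical client-side step run before a document is sent to the server:
// keep an existing "_id", otherwise add a generated one.
function withId(doc, generate = newId) {
  return "_id" in doc ? doc : { _id: generate(), ...doc };
}

const generated = withId({ name: "no id yet" });
const kept = withId({ _id: 123, name: "already has one" });

console.log(typeof generated._id); // "string"
console.log(kept._id);             // 123
```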