MongoDB and C#
Introduction
Most likely you have used a relational database and been fairly happy with it. I know I have. Be it SQL Server or MySQL, I know how to use my tools efficiently to push, pull, and transform the data I need. When we sit down to analyze a project, we'll debate the language, the servers, and so on, but we never talk about what type of database fits the problem. We just assume we'll use a relational database. That choice brings its own problems. I need to know not only my preferred language (C#, Ruby) but also SQL. In addition, there is a well-known impedance mismatch between a relational structure and the domain models I so carefully craft. An OR/M (NHibernate, LINQ to SQL, etc.) does not remove this problem; it simply lets me ignore it most of the time.
So what am I getting at? Well, there are other types of databases. Object databases like db4o and other non-relational databases like Cassandra or Amazon's SimpleDB provide some relief, but none really handles the problems I need fixed. I need something that is cross-language, but that also supports an object model. I need something that is highly scalable, but still fast.
As it turns out, there is another type of database: the document database. As I looked around, a number of them fit the mold. CouchDB, RavenDB, and MongoDB are the most notable ones. I settled on MongoDB because it let me keep the dynamic query capabilities I was used to, which the other two do not provide.
In this article, I'd like to help you get started with MongoDB, from install to your first project. In addition, I'll show a little bit about the LINQ provider and let you play with it on your own.
Getting MongoDB
The MongoDB site has a downloads link sitting right at the top. You'll want to download and install the one that is pertinent to your system. I am using version 1.4.2 while writing this article.
After you have downloaded it, you can simply unzip it anywhere on your box. I unzipped mine to c:\Program Files\MongoDB. In addition, you'll need to go and manually create a directory at c:\data\db. This is the default location where MongoDB stores its files. Finally, if you go back to the MongoDB directory, you can run the server by executing mongod.exe (for Windows users).
At this point, you are running the MongoDB server on localhost, port 27017. These are the defaults, and will work just fine for our test project.
There is a shell you can run to play around with the database if you'd like. I will not get into that here, but there is a tutorial on MongoDB's site if you click "Try It Out". Note that the syntax here is not C#, but rather JavaScript.
MongoDB Introduction
We are going to create a simple little test project that simulates a blogging system. This is a fairly well-understood paradigm, so I won't go into the details about the object model. Shown below is the entity model we'll be using.
public class Post
{
    public Oid Id { get; private set; }
    public string Title { get; set; }
    public string Body { get; set; }
    public int CharCount { get; set; }
    public IList<Comment> Comments { get; set; }
}

public class Comment
{
    public DateTime TimePosted { get; set; }
    public string Email { get; set; }
    public string Body { get; set; }
}
Experienced OR/M users will notice that there are no identifiers on Comment. That is because these entities do not have their own table. They are part of the Post document and stored as an array. In other words, when we fetch a Post, we pull back all of its related comments without doing anything special. This is where the impedance mismatch mentioned in the introduction goes away. There is now no difference between my entity model here and the one stored in MongoDB.
MongoDB stores its data in BSON (binary JSON). Each server has a number of databases, and each database has a number of collections. You can think of collections like you think of tables in a relational store. In our example above, we only need a single collection to model our data. We'll go ahead and call that collection "Post" for now, after our class name.
If we were to query the Post collection from the shell (after inserting some data), we'd see JSON come back representing our data. For our example, blog will be our database name. Shown below is an example of this data.
//blog is our database name.
//shell commands
use blog
db.Post.find()
//results
{ _id: ObjectId("4be05365340d000000002554"),
  Title: "My First Post",
  Body: "This isn't a very long post.",
  CharCount: 28,
  Comments: [
    { TimePosted: "Fri Jan 01 2010 00:00:00 GMT-0600 (Central Standard Time)",
      Email: "bob_mcbob@gmail.com",
      Body: "This article is too short!"
    },
    { TimePosted: "Sat Jan 02 2010 00:00:00 GMT-0600 (Central Standard Time)",
      Email: "Jane.McJane@gmail.com",
      Body: "I agree with Bob."
    }
  ]
}
A couple of things to note about the above results. First, _id is our identifier. While you probably figured that out, you may not know some of the ins and outs. _id will be automatically generated if you don't provide one. Since I did not provide one when I created this record, a type of object called an ObjectId was used. This type is documented on MongoDB's site, and is fully supported by all the client-side implementations, including MongoDB-CSharp. MongoDB-CSharp also lets you specify your own identifier, or use other types of auto-generated identifiers like a GUID. Second, notice how comments are stored as an array and embedded right within the Post document. As I mentioned before, we do not need to perform a join to get all the information about a post; it is already part of the document.
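As a small illustration of what an ObjectId carries, the sketch below reads the creation time out of the identifier shown above. It assumes the standard ObjectId layout, in which the first four bytes are a big-endian count of seconds since the Unix epoch. ObjectIdInspector and GetCreationTimeUtc are hypothetical names made up for this example; they are not part of MongoDB-CSharp, and the code uses nothing but the .NET base class library.

```csharp
using System;

// Hypothetical helper for illustration only; not part of MongoDB-CSharp.
public static class ObjectIdInspector
{
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Reads the creation time from the first 4 bytes (8 hex digits)
    // of a 24-character ObjectId string.
    public static DateTime GetCreationTimeUtc(string objectIdHex)
    {
        if (objectIdHex == null || objectIdHex.Length != 24)
            throw new ArgumentException("Expected 24 hex characters.", "objectIdHex");

        // Assumption: big-endian seconds since the Unix epoch.
        uint seconds = Convert.ToUInt32(objectIdHex.Substring(0, 8), 16);
        return UnixEpoch.AddSeconds(seconds);
    }

    public static void Main()
    {
        DateTime created = GetCreationTimeUtc("4be05365340d000000002554");
        Console.WriteLine(created.ToString("u")); // 2010-05-04 17:03:33Z
    }
}
```

Because the timestamp comes first, identifiers generated this way sort roughly by creation time, which is handy when you have no explicit date field.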
The C# Project
OK, enough chit-chat. Let's build our project. The included sample project contains MongoDB.dll to reference. If you'd like to download it yourself, the project is located at http://github.com/samus/mongodb-csharp/downloads. I've included the 0.90 beta 1 DLL in the libs folder. There are quite a few other C# drivers out there, but so far, I believe the functionality delivered by this driver is further along and more mature. In full disclosure, I am biased because I helped write some of the project. Prior versions do not have the LINQ support or configuration support that exists in this one. In addition, you can consult the wiki at the location above for examples and other documentation.
So, at this point, I've created a new VS2008 solution called MongoDB Blog, with a project I call MongoDBBlog.Tester. Tester is a console application. In addition, I added a reference to the MongoDB.dll that we downloaded earlier from GitHub.
Finally, Some Code!!!
First things first. We need to save some posts.
using MongoDB;
using MongoDB.Linq;
//etc...
//Create a default mongo object. This handles our connections to the database.
//By default, this will connect to localhost,
//port 27017 which we already have running from earlier.
var mongo = new Mongo();
mongo.Connect();
//Get the blog database. If it doesn't exist, that's ok because MongoDB will create it
//for us when we first use it. Awesome!!!
var db = mongo.GetDatabase("blog");
//Get the Post collection. By default, we'll use
//the name of the class as the collection name. Again,
//if it doesn't exist, MongoDB will create it when we first use it.
var collection = db.GetCollection<Post>();
//this deletes everything out of the collection so we can run this over and over again.
collection.Delete(p => true);
//Create a Post to enter into the database.
var post = new Post()
{
    Title = "My First Post",
    Body = "This isn't a very long post.",
    CharCount = 27,
    Comments = new List<Comment>
    {
        new Comment() { TimePosted = new DateTime(2010,1,1),
                        Email = "bob_mcbob@gmail.com",
                        Body = "This article is too short!" },
        new Comment() { TimePosted = new DateTime(2010,1,2),
                        Email = "Jane.McJane@gmail.com",
                        Body = "I agree with Bob." }
    }
};
//Save the post. This will perform an upsert. As in, if the post
//already exists, update it, otherwise insert it.
collection.Save(post);
Great. Now we have a post in our database, but the astute readers among you will notice I can't count. I shouldn't have hardcoded that value, but then I wouldn't have a reason to show you how to update. The actual character count above is 28, not 27. That's OK; it is easy to update.
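//Get the first post whose stored count does not match its body.
//(Named differently from the earlier 'post' variable so both snippets can share a scope.)
var mismatchedPost = collection.Linq().First(x => x.CharCount != x.Body.Length);
mismatchedPost.CharCount = mismatchedPost.Body.Length;
//this will perform an update this time because we have already inserted it.
collection.Save(mismatchedPost);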
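The insert-or-update behaviour of Save can be pictured with a tiny in-memory stand-in. This is only a sketch of upsert semantics under simplifying assumptions: InMemoryCollection is a hypothetical class invented here, documents are reduced to a string keyed by a Guid, and nothing in it talks to MongoDB or reflects how the driver is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a collection; not part of MongoDB-CSharp.
public class InMemoryCollection
{
    private readonly Dictionary<Guid, string> _documents =
        new Dictionary<Guid, string>();

    public int Count { get { return _documents.Count; } }

    // Upsert: inserts when the id is unknown, replaces when it already exists.
    // Returns true for an insert and false for an update.
    public bool Save(Guid id, string document)
    {
        bool isInsert = !_documents.ContainsKey(id);
        _documents[id] = document;
        return isInsert;
    }

    public string Find(Guid id)
    {
        return _documents[id];
    }
}

public static class UpsertDemo
{
    public static void Main()
    {
        var collection = new InMemoryCollection();
        Guid id = Guid.NewGuid();

        Console.WriteLine(collection.Save(id, "CharCount = 27")); // True  (inserted)
        Console.WriteLine(collection.Save(id, "CharCount = 28")); // False (updated)
        Console.WriteLine(collection.Count);                      // 1
        Console.WriteLine(collection.Find(id));                   // CharCount = 28
    }
}
```

The point to take away is that saving the same identifier twice leaves one document, not two, which is why the second Save above corrects the post rather than duplicating it.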
OK, good. All is right with the world again. At this point, we can go ahead and query our posts. In the attached code, I have entered three posts at this point so our queries have a little substance to them.
LINQ is supported up to the limits of MongoDB. Projections, Where clauses, and ordering are all part of our LINQ support. Joins, however, are not: MongoDB itself does not support joins (for a lot of reasons that go back to consistency), so the LINQ provider does not either. Below, we'll see some simple queries.
//count all the Posts
var totalNumberOfPosts = collection.Count();

//count only the Posts that have 2 comments
var numberOfPostsWith2Comments =
    collection.Count(p => p.Comments.Count == 2);

//find the titles of the posts that Jane commented on...
var postsThatJaneCommentedOn =
    from p in collection.Linq()
    where p.Comments.Any(c => c.Email.StartsWith("Jane"))
    select p.Title;

//find the titles and comments of the posts
//that have comments after January First.
var postsWithCommentsAfterJanuary1st =
    from p in collection.Linq()
    where p.Comments.Any(c => c.TimePosted > new DateTime(2010, 1, 1))
    select new { Title = p.Title, Comments = p.Comments };

//find posts with fewer than 40 characters
var postsWithLessThan40Chars =
    from p in collection.Linq()
    where p.CharCount < 40
    select p;
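If you want to see what these query shapes return without a server running, the same expressions can be run with LINQ to Objects over an in-memory list. The sketch below does that. It is not the MongoDB LINQ provider: the Post and Comment classes are trimmed copies of the entity model, the second post is invented sample data, and InMemoryQueries with its methods are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Comment
{
    public DateTime TimePosted { get; set; }
    public string Email { get; set; }
}

public class Post
{
    public string Title { get; set; }
    public string Body { get; set; }
    public int CharCount { get; set; }
    public IList<Comment> Comments { get; set; }
}

// Hypothetical helper; runs LINQ to Objects, not the MongoDB LINQ provider.
public static class InMemoryQueries
{
    // Sample data: the first post mirrors the article, the second is invented.
    public static List<Post> SamplePosts()
    {
        var posts = new List<Post>
        {
            new Post
            {
                Title = "My First Post",
                Body = "This isn't a very long post.",
                Comments = new List<Comment>
                {
                    new Comment { TimePosted = new DateTime(2010, 1, 1), Email = "bob_mcbob@gmail.com" },
                    new Comment { TimePosted = new DateTime(2010, 1, 2), Email = "Jane.McJane@gmail.com" }
                }
            },
            new Post
            {
                Title = "My Second Post",
                Body = "This post is a bit longer than the first one was.",
                Comments = new List<Comment>
                {
                    new Comment { TimePosted = new DateTime(2010, 2, 1), Email = "bob_mcbob@gmail.com" }
                }
            }
        };

        // Derive the count from the body instead of hardcoding it.
        foreach (var post in posts) post.CharCount = post.Body.Length;
        return posts;
    }

    public static List<string> TitlesCommentedOnBy(IEnumerable<Post> posts, string emailPrefix)
    {
        return (from p in posts
                where p.Comments.Any(c => c.Email.StartsWith(emailPrefix, StringComparison.Ordinal))
                select p.Title).ToList();
    }

    public static List<string> TitlesShorterThan(IEnumerable<Post> posts, int maxChars)
    {
        return (from p in posts
                where p.CharCount < maxChars
                select p.Title).ToList();
    }

    public static void Main()
    {
        var posts = SamplePosts();
        Console.WriteLine(string.Join(", ", TitlesCommentedOnBy(posts, "Jane").ToArray())); // My First Post
        Console.WriteLine(string.Join(", ", TitlesShorterThan(posts, 40).ToArray()));       // My First Post
        Console.WriteLine(posts.Count(p => p.Comments.Count == 2));                           // 1
    }
}
```

The difference with the real provider is where the work happens: here the filtering runs in your process, whereas collection.Linq() is meant to translate the expression into a MongoDB query that runs on the server.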
As you can see, the query capabilities of MongoDB are satisfactory. We can get what we want when we want it. We can even do some aggregation using the Map-Reduce capabilities of MongoDB.
MongoDB uses Map-Reduce to perform scalable aggregation and rollups of data. This generally involves writing some JavaScript to get it working. Below is an example that sums up our character counts, first with hand-written JavaScript, and then with our LINQ provider's automatic transformation.
//Manual map-reduce
var sum = Convert.ToInt32(collection.MapReduce()
    .Map(new Code(@"
        function() {
            emit(1, this.CharCount);
        }"))
    .Reduce(new Code(@"
        function(key, values) {
            var sum = 0;
            values.forEach(function(prev) {
                sum += prev;
            });
            return sum;
        }"))
    .Documents.Single()["value"]);

//Using Linq to automatically build the above query. Awesome!!!
var linqSum = collection.Linq().Sum(p => p.CharCount);
//Now imagine doing this by hand...
var stats = from p in collection.Linq()
            where p.Comments.Any(c => c.Email.StartsWith("bob"))
            group p by p.CharCount < 40 into g
            select new
            {
                LessThan40 = g.Key,
                Sum = g.Sum(x => x.CharCount),
                Count = g.Count(),
                Average = g.Average(x => x.CharCount),
                Min = g.Min(x => x.CharCount),
                Max = g.Max(x => x.CharCount)
            };
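To make the map and reduce steps less mysterious, here is a rough in-memory imitation of the manual map-reduce above, written in plain C#. It is a sketch of the idea only, not of how MongoDB executes it: MapReduceSketch and its methods are hypothetical names, the three character counts are made-up inputs, and everything runs in a single pass in your own process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory imitation of map-reduce; not MongoDB's implementation.
public static class MapReduceSketch
{
    // Map step: emit one (key, value) pair per document.
    // Mirrors "emit(1, this.CharCount)" from the JavaScript map function.
    public static IEnumerable<KeyValuePair<int, int>> Map(IEnumerable<int> charCounts)
    {
        foreach (int charCount in charCounts)
            yield return new KeyValuePair<int, int>(1, charCount);
    }

    // Reduce step: collapse every value emitted under one key into a single value.
    public static int Reduce(int key, IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int value in values) sum += value;
        return sum;
    }

    // Groups the emitted pairs by key, then reduces each group.
    public static Dictionary<int, int> Run(IEnumerable<int> charCounts)
    {
        return Map(charCounts)
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .ToDictionary(group => group.Key, group => Reduce(group.Key, group));
    }

    public static void Main()
    {
        // Made-up character counts for three posts.
        Dictionary<int, int> results = Run(new[] { 28, 49, 35 });
        Console.WriteLine(results[1]); // 112
    }
}
```

Every document emits under the same key (1), so there is a single group and a single total. On a real server the reduce function may be called more than once on partial results, so it should give the same answer however the values are batched; a plain sum satisfies that.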
Summary
I hope you've seen how easy it is to get started. There is a whole lot more we can talk about, from the proper way to use a document database to the CAP theorem and how it applies. I encourage you to do some digging on your own to find out about some of these ideas. Also, play around with the sample project. Add some fields and try stuff out. It is still a work in progress and so some LINQ functionality may not work as expected. Let us know and we'll do our best to get it resolved.
We have a Google group for users to come and ask questions, at http://groups.google.com/group/mongodb-csharp. Come and participate and, if you so desire, pull down the source and start helping us out. We truly want this to be a great library for the .NET community that functions off feedback and contributions.
References
- MongoDB: http://www.mongodb.org
- MongoDB-CSharp: http://github.com/samus/mongodb-csharp/downloads
- MongoDB-CSharp Group: http://groups.google.com/group/mongodb-csharp
Author
Craig G. Wilson (United States)