Getting Started with MongoDB Map&Reduce: A Walkthrough
-
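Requirements
Use Map&Reduce to count, for each of several classes, the students whose age is strictly between 10 and 20 (the query uses $gt and $lt, so ages 11 through 19):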
-
Requirements analysis
-
Fields of the students collection:
db.students.insert({classid:1, age:14, name:'Tom'})
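classid is randomly 1 or 2, age is a random integer from 8 to 24, and name is a random string of 3 to 6 lowercase letters (these are the ranges the generator code actually produces).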
-
Loading the data
-
Java program for loading the data
Insert 10 million documents into the students collection of the mrtask database:
package org.test;

import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

public class TestMongoDBReplSet {

    public static void main(String[] args) {
        try {
            List<ServerAddress> addresses = new ArrayList<ServerAddress>();
            ServerAddress address1 = new ServerAddress("172.16.16.89", 27017);
            addresses.add(address1);

            MongoClient client = new MongoClient(addresses);
            DB db = client.getDB("mrtask");
            DBCollection coll = db.getCollection("students");

            // Insert the documents, reusing one object and clearing it after each insert
            BasicDBObject object = new BasicDBObject();
            for (int i = 1; i <= 10000000; i++) {
                object.append("classid", 1 + (int) (Math.random() * 2)); // 1 or 2
                object.append("age", 8 + (int) (Math.random() * 17));    // 8 to 24
                object.append("name", getName());
                coll.insert(object);
                object.clear();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Build a random name of 3 to 6 lowercase letters
    public static String getName() {
        List<Character> list = new ArrayList<Character>();
        for (char c = 'a'; c <= 'z'; c++) {
            list.add(c);
        }
        String str = "";
        int end = 3 + (int) (Math.random() * 4);
        for (int i = 0; i < end; i++) {
            int num = (int) (Math.random() * 26);
            str = str + list.get(num);
        }
        return str;
    }
}
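The ranges produced by the generator's `base + (int) (Math.random() * span)` expressions are easy to misread, so here is a minimal JavaScript sketch that samples the same arithmetic. The helper names `randomInt` and `observedRange` are hypothetical and exist only for this illustration; `Math.floor` stands in for Java's `(int)` cast, which behaves the same for non-negative values.

```javascript
// Mirrors the Java expression: base + (int) (Math.random() * span)
function randomInt(base, span) {
  return base + Math.floor(Math.random() * span);
}

// Sample the expression many times and record the smallest and largest value seen
function observedRange(base, span, trials) {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < trials; i++) {
    const v = randomInt(base, span);
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return { min: min, max: max };
}

const classidRange = observedRange(1, 2, 100000);    // 1 + (int) (Math.random() * 2)
const ageRange = observedRange(8, 17, 100000);       // 8 + (int) (Math.random() * 17)
const nameLengthRange = observedRange(3, 4, 100000); // 3 + (int) (Math.random() * 4)

console.log("classid", classidRange, "age", ageRange, "name length", nameLengthRange);
```

Because `Math.random()` never returns 1, the upper bound is always `base + span - 1`: the age never exceeds 24 and the name length never exceeds 6.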
-
Verifying the load
A check confirms that the students collection in the mrtask database holds 10 million documents:
[root@localhost bin]# ./mongo
MongoDB shell version: 2.6.11
connecting to: test
> show dbs
admin (empty)
local 0.078GB
mrtask 3.952GB
test 0.453GB
> use mrtask
switched to db mrtask
> db.students.find().count()
10000000
-
Map&Reduce computation
-
Map function
> mapfun = function(){emit(this.classid,1)}
-
Reduce and finalize functions
> reducefun=function (key, values) { var count = 0; values.forEach(function (v) {count += v;}); return count; }
> ff = function (key, value) { return {classid:key, count:value}; }
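To see how the three functions fit together, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB executes the job; the `groups` map, the stand-in `emit`, and the five sample documents are assumptions made for illustration, and the age filter imitates the query `{age:{$gt:10,$lt:20}}`.

```javascript
// Stand-in for MongoDB's emit(): collect every emitted value under its key
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// The same three functions as in the shell session
const mapfun = function () { emit(this.classid, 1); };
const reducefun = function (key, values) {
  let count = 0;
  values.forEach(function (v) { count += v; });
  return count;
};
const ff = function (key, value) { return { classid: key, count: value }; };

// Hypothetical sample documents
const sample = [
  { classid: 1, age: 14, name: "tom" },
  { classid: 1, age: 10, name: "abc" },   // filtered out: age is not > 10
  { classid: 2, age: 19, name: "defg" },
  { classid: 2, age: 20, name: "hij" },   // filtered out: age is not < 20
  { classid: 2, age: 11, name: "klmno" }
];

// Query step, then map: each matching document is bound to `this`
sample
  .filter(function (d) { return d.age > 10 && d.age < 20; })
  .forEach(function (d) { mapfun.call(d); });

// Reduce each key's values, then finalize. MongoDB skips reduce for a key with
// a single emitted value; calling it anyway gives the same count here.
const results = [];
groups.forEach(function (values, key) {
  results.push({ _id: key, value: ff(key, reducefun(key, values)) });
});

console.log(JSON.stringify(results));
```

MongoDB may call the reduce function more than once for the same key, feeding earlier outputs back in as values. Summing is safe under that behaviour, since `reducefun(2, [reducefun(2, [1, 1]), 1])` gives the same result as `reducefun(2, [1, 1, 1])`.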
-
Running the job and writing the output
> classid_res = db.runCommand({
mapreduce:"students",
map:mapfun,
reduce:reducefun,
out:"students_classid_res",
finalize:ff,
query:{age:{$gt:10,$lt:20}}
});
-
Results
> db.students_classid_res.find()
{ "_id" : 1, "value" : { "classid" : 1, "count" : 2643128 } }
{ "_id" : 2, "value" : { "classid" : 2, "count" : 2650870 } }
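As a rough sanity check on these counts, the sketch below compares their sum with the number expected from the generator. It assumes ages are uniformly distributed over the 17 integer values 8 through 24 (which follows from `8 + (int) (Math.random() * 17)` in the loader), of which the 9 values 11 through 19 satisfy the query. The variable names are chosen for this illustration only.

```javascript
const totalDocuments = 10000000; // documents inserted by the loader
const possibleAges = 17;         // integer ages 8..24
const matchingAges = 9;          // integer ages 11..19, i.e. $gt:10 and $lt:20

// Expected number of matching documents under the uniform assumption
const expected = totalDocuments * matchingAges / possibleAges;

// Sum of the two per-class counts reported by the map-reduce job
const observed = 2643128 + 2650870;

const relativeError = Math.abs(observed - expected) / expected;

console.log("expected ~", Math.round(expected));
console.log("observed  ", observed);
console.log("relative error", relativeError);
```

The observed total of 5293998 sits within a small fraction of a percent of the expected value of about 5294118, which is consistent with random sampling noise.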