mrjob 使用 mongodb 作为数据源

  • When using a mongoDB collection as input, add the arguments -jobconf mongo.input.uri=<input mongo URI> and -inputformat com.mongodb.hadoop.mapred.MongoInputFormat
  • When using a mongoDB collection as output, add the arguments -jobconf mongo.output.uri=<input mongo URI> and -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat

Examples:

  • mongodb://joe:12345@weyland-yutani:27017/analytics.users?readPreference=secondary Authenticate as "joe" with the password "12345" and read from only SECONDARY nodes from the "users" collection in the database "analytics".
  • mongodb://joe:12345@weyland-yutani:27017/production.customers?readPreferenceTags=dc:tokyo,type:hadoop Authenticate "joe" with the password "12345" and read the "users" collection in database "analytics" only on nodes tagged with "dc:tokyo" and "type:hadoop".

参考:

https://github.com/mongodb/mongo-hadoop/wiki/Streaming-Usage

https://github.com/mongodb/mongo-hadoop/wiki/Configuration-Reference

https://docs.mongodb.org/manual/reference/connection-string/

posted @ 2015-11-04 15:35  高山流的不是水  阅读(250)  评论(0编辑  收藏  举报