Syncing MongoDB to Elasticsearch with Logstash
Environment:
OS: CentOS 7
Note:
Logstash does not ship with the logstash-input-mongodb plugin; it has to be installed manually.
Project page:
https://github.com/phutchins/logstash-input-mongodb
1. Install the build tools
yum install git
yum install rubygems    # provides the gem command; there is no yum package literally named "gem"
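A quick way to confirm both tools are on the PATH before building (an optional check, not part of the original steps):
git --version
gem --version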
2. Build from source
[root@localhost]# git clone https://github.com/phutchins/logstash-input-mongodb.git
[root@localhost]# cd logstash-input-mongodb
[root@localhost logstash-input-mongodb]# gem build *.gemspec
Successfully built RubyGem
Name: logstash-input-mongodb
Version: 0.4.1
File: logstash-input-mongodb-0.4.1.gem
This produces the logstash-input-mongodb-0.4.1.gem file:
[root@localhost logstash-input-mongodb]# ls -al
total 40
drwxr-xr-x. 6 root root 234 Nov 2 04:13 .
drwxr-xr-x. 3 root root 36 Nov 2 04:12 ..
-rw-r--r--. 1 root root 720 Nov 2 04:12 DEVELOPER.md
-rw-r--r--. 1 root root 38 Nov 2 04:12 Gemfile
-rw-r--r--. 1 root root 2335 Nov 2 04:12 Gemfile.lock
drwxr-xr-x. 8 root root 163 Nov 2 04:12 .git
drwxr-xr-x. 3 root root 22 Nov 2 04:12 lib
-rw-r--r--. 1 root root 594 Nov 2 04:12 LICENSE
-rw-r--r--. 1 root root 11776 Nov 2 04:13 logstash-input-mongodb-0.4.1.gem
-rw-r--r--. 1 root root 1255 Nov 2 04:12 logstash-input-mongodb.gemspec
-rw-r--r--. 1 root root 33 Nov 2 04:12 Rakefile
-rw-r--r--. 1 root root 3453 Nov 2 04:12 README.md
drwxr-xr-x. 3 root root 20 Nov 2 04:12 spec
drwxr-xr-x. 2 root root 31 Nov 2 04:12 test
3. List the currently installed plugins
[root@localhost bin]# cd /opt/logstash-6.8.5/bin
[root@localhost bin]# ./logstash-plugin list
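To check specifically for the mongodb input, the list output can be filtered (an optional convenience, assuming grep is available):
[root@localhost bin]# ./logstash-plugin list --verbose | grep mongodb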
4. Install the plugin
[root@localhost bin]# cd /opt/logstash-6.8.5/bin
[root@localhost bin]# ./logstash-plugin install /soft/mongo2es/logstash-input-mongodb/logstash-input-mongodb-0.4.1.gem
The install failed with errors like:
ERROR: Something went wrong when installing /soft/mongo2es/logstash-input-mongodb/logstash-input-mongodb-0.4.1.gem, message: execution expired
ERROR: Something went wrong when installing /soft/mongo2es/logstash-input-mongodb/logstash-input-mongodb-0.4.1.gem, message: Socket closed
Workaround:
Re-run the command. It takes a very long time to finish, at least an hour.
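Once the command finally completes, it is worth confirming that the plugin is registered; the expected output below assumes the 0.4.1 gem built earlier:
[root@localhost bin]# ./logstash-plugin list --verbose | grep logstash-input-mongodb
logstash-input-mongodb (0.4.1)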
5. Logstash sync configuration file
[root@localhost config]# more sync_mongo_es.conf
input {
  mongodb {
    codec => "json"
    parse_method => "simple"
    uri => 'mongodb://192.168.1.108:29001/db_pushmsg'
    placeholder_db_dir => '/opt/logstash-6.8.5/db_dir'
    placeholder_db_name => 'app_message_all.db'
    collection => 'app_message_all'
  }
}
# The filter section below is optional
filter {
  mutate {
    remove_field => ["host","@version","logdate","log_entry","@timestamp","mongo_id"]
  }
  mutate {
    rename => { "_id" => "uid" }
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.1.109:19200"]
    user => "elastic"
    password => "elastic123"
    index => "index_app_message_all"
    ##document_type => "%{[@metadata][_type]}"
    ##document_id => "%{[@metadata][_id]}"
  }
}
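Before starting a full sync, the configuration syntax can be validated with Logstash's test-and-exit flag (paths assumed to match the layout above):
[root@localhost config]# /opt/logstash-6.8.5/bin/logstash -f /opt/logstash-6.8.5/config/sync_mongo_es.conf --config.test_and_exit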
Issues:
1. The first record is lost during sync, because incremental sync pulls documents with _id greater than the stored placeholder value.
The first document will be missing (see the count check after the parameter list below):
db.app_message_all.find().sort({_id: 1}).limit(1)
2. There is no need to explicitly set a schedule keyword such as schedule => "*/5 * * * * *"; incremental sync is the default behavior.
3. Configuration parameters
Name                 Type      Description
uri                  [String]  A MongoDB URI for your database or cluster (see the MongoDB documentation for details) [No Default, Required]
placeholder_db_dir   [String]  Path where the placeholder database will be stored locally on disk [No Default, Required]. It is created by the plugin, so the directory must be writable by the user Logstash runs as.
placeholder_db_name  [String]  Name of the database file that will be created [Default: logstash_sqlite.db]
collection           [String]  A regex that will be used to find the desired collections [No Default, Required]
generateId           [Boolean] If true, adds a field '_id' that contains the MongoDB document id
batch_size           [Int]     Size of the batch of MongoDB documents to pull at a time [Default: 30]
parse_method         [String]  Built-in parsing of the MongoDB document object [Default: 'flatten']
dig_fields           [Array]   An array of fields that should employ the dig method
dig_dig_fields       [Array]   Provides a second level of hash flattening after the initial dig has been done
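To confirm that exactly one document is missing (issue 1 above), the document counts on both sides can be compared; this is a sketch assuming the collection, index, and credentials shown earlier:
s1:PRIMARY> db.app_message_all.count()
[root@localhost ~]# curl -u elastic:elastic123 'http://192.168.1.109:19200/index_app_message_all/_count?pretty'
If the Elasticsearch count is exactly one less than the MongoDB count, only the first record was skipped.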
############# Workaround for the first record not being synced from MongoDB to ES #############
1. Find the smallest _id in the MongoDB collection being synced
s1:PRIMARY> db.app_message_all.find().sort({_id: 1}).limit(1)
{ "_id" : ObjectId("65445964cfc901aef5a80dda"), "message_id" : NumberLong("6797283726620357"), "user_id" : 24681552, "sender_seq_no" : "fbaf70cc-cc80-4bbe-a927-74f173c6c672", "create_time" : "2023-11-03 10:22:28.363511" }
The _id here is 65445964cfc901aef5a80dda; pick a value smaller than this _id to use as the starting point for the sync,
e.g. (changing the last character):
65445964cfc901aef5a80dd0
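To double-check that the chosen value precedes every existing document, a query like the following should return 0 (a sanity check, not part of the original steps):
s1:PRIMARY> db.app_message_all.find({_id: {$lt: ObjectId("65445964cfc901aef5a80dd0")}}).count()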
2. Update the sqlite3 placeholder database
The sqlite3 version used here is 3.42.0. The version that ships with the OS is too old and has to be upgraded before it can be used; upgrade guide: https://www.cnblogs.com/hxlasky/p/17807431.html
[root@localhost db_dir]# cd /opt/logstash-6.8.5/db_dir
[root@localhost db_dir]# sqlite3
SQLite version 3.42.0 2023-05-16 12:36:15
sqlite>.open logstash_sqlite.db
sqlite> select * from since_table;
logstash_since_app_message_all|65445a021d992ec438987ead
sqlite> .schema since_table
CREATE TABLE `since_table` (`table` varchar(255), `place` Int);
sqlite> update since_table set place = '65445964cfc901aef5a80dd0' where `table` = 'logstash_since_app_message_all';
sqlite> select * from since_table;
logstash_since_app_message_all|65445964cfc901aef5a80dd0
3. Re-run the sync
[root@localhost config]# /opt/logstash-6.8.5/bin/logstash -f /opt/logstash-6.8.5/config/sync_mongo_es.conf
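Once the resync has run, the previously missing first document can be looked up in Elasticsearch. The query below is a sketch; it assumes the filter above renamed _id to uid and that the uid field is searchable in the index mapping:
[root@localhost ~]# curl -u elastic:elastic123 'http://192.168.1.109:19200/index_app_message_all/_search?q=uid:65445964cfc901aef5a80dda&pretty'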