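Re-running a Logstash sync of MySQL data into es

Scenario:
Logstash has already been pulling MySQL data into es: the index was created and holds data. Suppose that index is then deleted. How do you run the sync again so that it starts from the very first row, rather than picking up only newly added rows?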

Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-tracking_column

From the official documentation:
The plugin will persist the sql_last_value parameter in the form of a metadata file stored in the configured last_run_metadata_path. Upon query execution, this file will be updated with the current value of sql_last_value. Next time the pipeline starts up, this value will be updated by reading from the file. If clean_run is set to true, this value will be ignored and sql_last_value will be set to Jan 1, 1970, or 0 if use_column_value is true, as if no query has ever been executed.

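In other words:
The plugin saves sql_last_value to a metadata file at the location given by last_run_metadata_path. After each query, the file is updated with the current sql_last_value, and the next time the pipeline starts, the value is read back from that file. If clean_run is set to true, the saved value is ignored and sql_last_value is reset to Jan 1, 1970, or to 0 when use_column_value is true, as if no query had ever run.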
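
As a rough sketch of the clean_run route described in the documentation, the two options could be added to the jdbc input like this. The metadata path is a hypothetical example, not taken from the original setup, and all other settings are elided.

```
input {
  jdbc {
    # ... connection, schedule and statement settings as in the existing pipeline ...

    # Ignore any saved sql_last_value and start again from 0
    # (or from Jan 1, 1970 when use_column_value is false).
    clean_run => true

    # Optional: keep the metadata file at an explicit location.
    # Hypothetical path, chosen only for illustration.
    last_run_metadata_path => "/var/lib/logstash/jdbc_last_run_es_table"
  }
}
```

Since the saved value is ignored whenever the pipeline starts with clean_run set to true, it is probably worth switching it back to false once the full re-sync has finished.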

In practice:

  jdbc {
    jdbc_connection_string => "jdbc:mysql://192.168.0.145:3306/db_example?useUnicode=true&characterEncoding=UTF-8&serverTimezone=UTC"
    jdbc_user => "root"
    jdbc_password => "root"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # Left empty here; normally the path to the MySQL Connector/J jar
    jdbc_driver_library => ""
    jdbc_paging_enabled => true
    # Track the numeric column computed in the statement below
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    tracking_column_type => "numeric"
    # Six-field cron expression: run every 5 seconds
    schedule => "*/5 * * * * *"
    # Fetch only rows modified after the last tracked value
    statement => "SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_value AND modification_time < NOW()) ORDER BY modification_time ASC"
  }

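In the jdbc {} block, tracking_column shows that the tracked field is unix_ts_in_secs, which the statement computes as UNIX_TIMESTAMP(modification_time).

When last_run_metadata_path is not set, the plugin version used here keeps the last tracked value in $HOME/.logstash_jdbc_last_run (newer plugin releases may default to a different location, so check the documentation for your version). The value is loaded from this file when the pipeline starts and substituted for :sql_last_value on each run, so only rows whose UNIX_TIMESTAMP(modification_time) is greater than the stored value are fetched; rows at or below it are skipped.

In my case, the file is /root/.logstash_jdbc_last_run: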

[root@bogon ~]# cat /root/.logstash_jdbc_last_run 
--- 1589189560

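The file stores a Unix timestamp, written as a one-line YAML document, and it matches the modification_time of the last row in the table.

So to re-sync from the start, stop Logstash and then either change the timestamp in this file to a value smaller than the earliest modification_time in the table, or delete the file. Setting it to 0 works. Setting it to exactly the earliest value would skip the rows carrying that timestamp, because the statement compares with a strict >. Then run the sync again, and all rows are pulled into es from the beginning.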
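
The two options above could be sketched as a pair of small shell helpers. This assumes Logstash is stopped before the file is touched and that the file holds a single line of the form `--- <number>`, as in the output shown earlier. The function names `reset_last_run` and `delete_last_run` and the `LAST_RUN_FILE` variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: reset the jdbc input's saved sql_last_value so the next run
# starts from the beginning. Stop Logstash before using these helpers.

# Default location seen in this post; override LAST_RUN_FILE if
# last_run_metadata_path points somewhere else.
LAST_RUN_FILE="${LAST_RUN_FILE:-$HOME/.logstash_jdbc_last_run}"

# Option 1: overwrite the stored value (hypothetical helper).
# $1 = metadata file, $2 = new numeric value (defaults to 0).
reset_last_run() {
    file="$1"
    value="${2:-0}"
    # Keep a backup of the old value, if there is one.
    if [ -f "$file" ]; then
        cp "$file" "$file.bak"
    fi
    # Write the same one-line format the plugin uses: "--- <number>".
    printf '%s %s\n' '---' "$value" > "$file"
}

# Option 2: delete the file so the plugin starts as if it had never run
# (hypothetical helper).
delete_last_run() {
    rm -f "$1"
}
```

For example, calling `reset_last_run "$LAST_RUN_FILE"` and then starting Logstash again should make the first query run with :sql_last_value equal to 0. The backup copy ending in `.bak` makes it possible to restore the previous position if the reset turns out to be a mistake.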

posted @ 2020-05-11 18:22 哈喽哈喽111111
