Importing Data from MySQL into Elasticsearch on Windows (Part 2)
For the environment setup, see my previous post:
https://blog.csdn.net/qq_24265945/article/details/81168158
With the environment in place, let's get to the hands-on part.
Download mysql-connector-java-5.1.46.zip. This archive contains the JDBC driver that lets Logstash (and other platforms) connect to MySQL. Many re-uploaded copies require download points, so use the official link:
https://dev.mysql.com/downloads/file/?id=476197
Create jdbc.conf in the bin directory (the same file name you will pass to logstash -f later).
Adjust the connection string, user, password, database name, port and so on to match your own setup:
input {
  stdin { }
  jdbc {
    # MySQL JDBC connection string to our database
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/zhangjiang?characterEncoding=UTF-8&useSSL=false"
    # the user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "111111"
    # the path to our downloaded JDBC driver
    jdbc_driver_library => "D:/logstash-6.3.1/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar"
    # the name of the driver class for MySQL
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    jdbc_default_timezone => "UTC"
    statement_filepath => "D:/logstash-6.3.1/logstash-6.3.1/bin/jdbc.sql"
    schedule => "* * * * *"
    type => "patent"
  }
}

filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "patent"
    document_id => "%{id}"
  }
  stdout {
    codec => json_lines
  }
}
Create jdbc.sql (again, adjust this to your own schema):
select
  Patent_ID as id,
  Patent_Num as pnum,
  Patent_Name as pname,
  Patent_Link as link,
  Patent_Applicant as applicant,
  Patent_Summary as summary,
  Patent_Date as pdate,
  Patent_ClassNum as classnum,
  Patent_Update as pupdate,
  Patent_Status as pstatus
from patentinfor
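Before wiring this file into Logstash, it is worth running the query once by hand: if the SQL itself is broken, the import later fails silently (see the error notes further down). A minimal check from the cmd window, assuming the mysql client is on your PATH and using the same database name as in the connection string:

rem run the statement once against the zhangjiang database; -p will prompt for the password
mysql -u root -p -e "select Patent_ID as id, Patent_Name as pname from zhangjiang.patentinfor limit 5"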
In the bin directory, run:
logstash -f jdbc.conf
and the import kicks off at full speed~~~
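Once a schedule tick has run, you can sanity-check that documents are actually landing in the index. A minimal check, assuming Elasticsearch on localhost:9200 and the index name patent from the config above (paste the URLs into a browser if curl is not available):

rem how many documents have been indexed so far?
curl -X GET "localhost:9200/patent/_count?pretty"
rem peek at one indexed document
curl -X GET "localhost:9200/patent/_search?size=1&pretty"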
Here are a few errors I ran into, for reference:
1.
D:\logstash-6.3.1\logstash-6.3.1\bin>logstash -f jdbc.conf
Sending Logstash's logs to D:/logstash-6.3.1/logstash-6.3.1/logs which is now configured via log4j2.properties
[2018-07-23T09:40:23,744][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
2018-07-23 09:40:23,822 LogStash::Runner ERROR Unable to move file D:\logstash-6.3.1\logstash-6.3.1\logs\logstash-plain.log to D:\logstash-6.3.1\logstash-6.3.1\logs\logstash-plain-2018-07-20-1.log: java.nio.file.FileSystemException D:\logstash-6.3.1\logstash-6.3.1\logs\logstash-plain.log -> D:\logstash-6.3.1\logstash-6.3.1\logs\logstash-plain-2018-07-20-1.log: 另一个程序正在使用此文件,进程无法访问。(The process cannot access the file because it is being used by another process.)
The log file is still locked by another Logstash instance — usually another cmd window that is already running Logstash. Find and close the other instance (a quick check is shown below) and re-run the command.
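A rough way to track down the stale instance on Windows (a sketch: 9600 is the Logstash API port shown in the startup logs below, and 12345 is just a placeholder PID):

rem which PID is holding Logstash's API port?
netstat -ano | findstr 9600
rem list running Java processes (Logstash runs on the JVM)
tasklist /FI "IMAGENAME eq java.exe"
rem kill the stale instance, replacing 12345 with the PID found above
taskkill /PID 12345 /F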
2.
D:\logstash-6.3.1\logstash-6.3.1\bin>logstash -f jdbc.conf
Sending Logstash's logs to D:/logstash-6.3.1/logstash-6.3.1/logs which is now configured via log4j2.properties
[2018-07-23T09:44:41,964][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-07-23T09:44:42,995][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.3.1"}
[2018-07-23T09:44:46,715][ERROR][logstash.outputs.elasticsearch] Unknown setting 'host' for elasticsearch
[2018-07-23T09:44:46,715][ERROR][logstash.outputs.elasticsearch] Unknown setting 'port' for elasticsearch
[2018-07-23T09:44:46,715][ERROR][logstash.outputs.elasticsearch] Unknown setting 'protocol' for elasticsearch
[2018-07-23T09:44:46,715][ERROR][logstash.outputs.elasticsearch] Unknown setting 'cluster' for elasticsearch
[2018-07-23T09:44:46,746][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Something is wrong with your configuration.", :backtrace=>["D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/config/mixin.rb:89:in `config_init'", "D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/outputs/base.rb:62:in `initialize'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:202:in `initialize'", "org/logstash/config/ir/compiler/OutputDelegatorExt.java:68:in `initialize'", "D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/plugins/plugin_factory.rb:93:in `plugin'", "D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/pipeline.rb:110:in `plugin'", "(eval):39:in `<eval>'", "org/jruby/RubyKernel.java:994:in `eval'", "D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/pipeline.rb:82:in `initialize'", "D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/pipeline.rb:167:in `initialize'", "D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/pipeline_action/create.rb:40:in `execute'", "D:/logstash-6.3.1/logstash-6.3.1/logstash-core/lib/logstash/agent.rb:305:in `block in converge_state'"]}
[2018-07-23T09:44:47,215][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
This output section was originally copied from another blog post that also set port, protocol and cluster; those settings no longer exist in this version of the elasticsearch output, so simply delete them, and note that the correct setting is hosts, not host. Also remember to double-check your SQL file: if the SQL is wrong, the import fails just as quietly — you can still write documents into ES by hand from the cmd window, but nothing from MySQL ever arrives. The corrected output section:
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "patent"
    document_id => "%{id}"
  }
  stdout {
    codec => json_lines
  }
}
Finally, a screenshot of the imported data in Kibana after a successful run:
A follow-up one week later: when I added a Chinese analyzer I discovered a problem — the mapping had been created automatically by Elasticsearch, and once a field's mapping is created it cannot be changed. So I recommend creating the mapping yourself, with the analyzer already configured, before importing any data. For Chinese word segmentation I am using the IK Analysis plugin for Elasticsearch, and may compare other analyzers later; a sketch of such a mapping follows the link below. Here is my follow-up post on installing and using IK Analysis:
https://blog.csdn.net/qq_24265945/article/details/81355504
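A minimal sketch of creating the index with IK analyzers before the first import, runnable in Kibana's Dev Tools console (or sent with curl). Assumptions: Elasticsearch 6.x with the IK plugin installed, the index name patent from the config above, and the document type doc (Logstash 6.x's default when writing to ES 6.x — verify with GET patent/_mapping after a trial run); which fields are text vs keyword here is purely illustrative:

PUT patent
{
  "mappings": {
    "doc": {
      "properties": {
        "pname":     { "type": "text",    "analyzer": "ik_max_word" },
        "summary":   { "type": "text",    "analyzer": "ik_max_word" },
        "applicant": { "type": "text",    "analyzer": "ik_max_word" },
        "pnum":      { "type": "keyword" },
        "classnum":  { "type": "keyword" },
        "link":      { "type": "keyword" }
      }
    }
  }
}

Fields not listed here (id, pdate, pupdate, pstatus, ...) will still be mapped dynamically on first import; the point is only that the Chinese text fields get the IK analyzer before any documents arrive.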