ElasticSearch (11): Batch CRUD with bulk
1. bulk syntax
POST /_bulk
{ "delete": { "_index": "test_index", "_type": "test_type", "_id": "3" }}
{ "create": { "_index": "test_index", "_type": "test_type", "_id": "12" }}
{ "test_field": "test12" }
{ "index": { "_index": "test_index", "_type": "test_type", "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }
Each operation takes two JSON strings, with the following syntax:
{"action": {"metadata"}}
{"data"}
For example, to create a document inside a bulk request, it would look like this:
{"index": {"_index": "test_index", "_type": "test_type", "_id": "1"}}
{"test_field1": "test1", "test_field2": "test2"}
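Which types of operations can be executed?
(1) delete: deletes a document; it needs only one JSON string
(2) create: equivalent to PUT /index/type/id/_create; forces creation
(3) index: a normal PUT operation; it either creates the document or fully replaces it
(4) update: performs a partial update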
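The bulk API is strict about JSON syntax: each JSON string must sit on a single line with no line breaks inside it, and consecutive JSON strings must be separated by a newline.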
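The one-line-per-JSON rule is easy to satisfy when the body is generated rather than typed by hand. Below is a minimal Python sketch, assuming operations are supplied as (action, metadata, data) tuples; the helper name `build_bulk_body` is hypothetical and not part of any Elasticsearch client. It also ends the body with a newline, which the bulk API expects after the last line.

```python
import json

# Actions that must be followed by a data line; "delete" takes none.
ACTIONS_WITH_BODY = {"create", "index", "update"}


def build_bulk_body(operations):
    """Serialize (action, metadata, data) tuples into a bulk request body.

    build_bulk_body is a hypothetical helper for illustration. Each JSON
    object is written on exactly one line, and every line, including the
    last one, ends with a newline.
    """
    lines = []
    for action, metadata, data in operations:
        if action != "delete" and action not in ACTIONS_WITH_BODY:
            raise ValueError(f"unknown bulk action: {action}")
        # json.dumps without indent never emits a raw line break.
        lines.append(json.dumps({action: metadata}))
        if action in ACTIONS_WITH_BODY:
            if data is None:
                raise ValueError(f"{action} requires a data line")
            lines.append(json.dumps(data))
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("delete", {"_index": "test_index", "_type": "test_type", "_id": "3"}, None),
    ("create", {"_index": "test_index", "_type": "test_type", "_id": "12"},
     {"test_field": "test12"}),
    ("update", {"_index": "test_index", "_type": "test_type", "_id": "1"},
     {"doc": {"test_field2": "bulk test1"}}),
])
print(body, end="")
```

The delete operation contributes one line and the other two contribute two lines each, so this body has five lines in total.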
If the request is formatted like this, with one JSON string split across two lines:
POST /_bulk
{"index": {"_index": "test_index", "_type": "test_type","_id": "1"}
}
{"test_field1": "test1", "test_field2": "test2"}
Result:
{
  "error": {
    "root_cause": [
      {
        "type": "json_e_o_f_exception",
        "reason": "Unexpected end-of-input within/between Object entries\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@7da6bde0; line: 1, column: 21]"
      }
    ],
    "type": "json_e_o_f_exception",
    "reason": "Unexpected end-of-input within/between Object entries\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@7da6bde0; line: 1, column: 21]"
  },
  "status": 500
}
If the request is formatted like this, with each JSON string on its own line:
POST /_bulk
{"index": {"_index": "test_index", "_type": "test_type","_id": "1"}}
{"test_field1": "test1", "test_field2": "test2"}
Result:
{
  "took" : 19,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test_index",
        "_type" : "test_type",
        "_id" : "1",
        "_version" : 3,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 5,
        "status" : 200
      }
    }
  ]
}
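Within a bulk request, the failure of any single operation does not affect the other operations; the response reports the error for each item that failed.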
POST /test_index/_bulk
{ "delete": { "_type": "test_type", "_id": "3" }}
{ "create": { "_type": "test_type", "_id": "12" }}
{ "test_field": "test12" }
{ "index": { "_type": "test_type" }}
{ "test_field": "auto-generate id test" }
{ "index": { "_type": "test_type", "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_type": "test_type", "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }
#! Deprecation: Deprecated field [_retry_on_conflict] used, expected [retry_on_conflict] instead
{
  "took" : 64,
  "errors" : true,
  "items" : [
    {
      "delete" : {
        "_index" : "test_index",
        "_type" : "test_type",
        "_id" : "3",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 4,
        "_primary_term" : 5,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "test_index",
        "_type" : "test_type",
        "_id" : "12",
        "status" : 409,
        "error" : {
          "type" : "version_conflict_engine_exception",
          "reason" : "[test_type][12]: version conflict, document already exists (current version [2])",
          "index_uuid" : "P8_8FJpGSgW8HglInNvZYQ",
          "shard" : "1",
          "index" : "test_index"
        }
      }
    },
    {
      "index" : {
        "_index" : "test_index",
        "_type" : "test_type",
        "_id" : "Baq9WWgBjIP9BXE3vrJ2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 5,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test_index",
        "_type" : "test_type",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 4,
        "_primary_term" : 5,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "test_index",
        "_type" : "test_type",
        "_id" : "1",
        "_version" : 4,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 4,
        "_primary_term" : 5,
        "status" : 200
      }
    }
  ]
}
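Because the other operations in the request still succeed, client code has to inspect the per-item results instead of relying on the HTTP status alone. The following Python sketch shows one way to pull the failures out of a parsed response; `failed_items` is a hypothetical helper, and the sample dict is a trimmed copy of the response above. As an assumption of this sketch, an item counts as failed only when it carries an "error" object, so the delete that returned 404 with result "not_found" is not reported.

```python
def failed_items(response):
    """Return (action, _id, status, error_type) for each failed bulk item.

    failed_items is a hypothetical helper; `response` is the bulk response
    already parsed into a dict.
    """
    failures = []
    # The top-level "errors" flag is false when every item succeeded.
    if not response.get("errors"):
        return failures
    for item in response.get("items", []):
        # Each item is a one-key dict: {"<action>": {...result...}}.
        for action, result in item.items():
            if "error" in result:
                failures.append((
                    action,
                    result.get("_id"),
                    result.get("status"),
                    result["error"].get("type"),
                ))
    return failures


# Trimmed copy of the response shown above.
sample = {
    "took": 64,
    "errors": True,
    "items": [
        {"delete": {"_id": "3", "result": "not_found", "status": 404}},
        {"create": {"_id": "12", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
        {"index": {"_id": "2", "result": "created", "status": 201}},
    ],
}
print(failed_items(sample))
```

Whether a "not_found" delete should also be treated as a failure depends on the application; the check on the "error" key can be widened to a status check if needed.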
When all operations target the same index and type:
POST /test_index/test_type/_bulk
{ "delete": { "_id": "3" }}
{ "create": { "_id": "12" }}
{ "test_field": "test12" }
{ "index": { }}
{ "test_field": "auto-generate id test" }
{ "index": { "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }
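2. Optimal bulk size
A bulk request is loaded into memory, so if it is too large, performance actually drops. You therefore need to experiment repeatedly to find the best bulk size. A common starting point is 1,000 to 5,000 documents per request, increasing gradually from there. Measured by size, 5 to 15 MB per request is a good range.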
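One way to experiment with these limits is to cut a stream of operations into batches capped by both operation count and byte size. This is a minimal Python sketch under stated assumptions: `iter_bulk_bodies` is a hypothetical helper, each input string is the complete one- or two-line text of a single operation ending in a newline, and the defaults (1,000 operations, 5 MB) are simply the low end of the ranges above, meant to be tuned.

```python
import json


def iter_bulk_bodies(operations, max_ops=1000, max_bytes=5 * 1024 * 1024):
    """Yield bulk bodies with at most max_ops operations and max_bytes bytes.

    iter_bulk_bodies is a hypothetical helper. An operation that is larger
    than max_bytes on its own is still emitted, alone in its own batch.
    """
    batch, batch_bytes = [], 0
    for op in operations:
        op_bytes = len(op.encode("utf-8"))
        # Flush before adding if either limit would be exceeded.
        if batch and (len(batch) >= max_ops or batch_bytes + op_bytes > max_bytes):
            yield "".join(batch)
            batch, batch_bytes = [], 0
        batch.append(op)
        batch_bytes += op_bytes
    if batch:
        yield "".join(batch)


# 2,500 small index operations, two lines each.
ops = [
    json.dumps({"index": {"_index": "test_index", "_type": "test_type", "_id": str(i)}})
    + "\n"
    + json.dumps({"test_field": f"test{i}"})
    + "\n"
    for i in range(2500)
]
bodies = list(iter_bulk_bodies(ops, max_ops=1000))
print([body.count("\n") // 2 for body in bodies])  # prints [1000, 1000, 500]
```

Each yielded string can be sent as the body of one POST /_bulk request; raising max_ops step by step while measuring throughput follows the tuning approach described above.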