Elasticsearch: error when running a large number of frequent UpdateByQuery script updates on field values
Here is the error output:
WARNING:elasticsearch:POST http://es-cn-09k1o69vj0006jcz9.public.elasticsearch.aliyuncs.com:9200/crawl_basis_pn/_update_by_query [status:500 request:0.015s]
DEBUG:elasticsearch:> {"query":{"term":{"_id":"bQlgboYBwWirVBbOLVBj"}},"script":{"source":"ctx._source.ProductUrl='https://www.bom2buy.com/partIntelligence/TL431AIYDT/';ctx._source.SubStatus=1"}}
DEBUG:elasticsearch:< {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting","bytes_wanted":0,"bytes_limit":0,"durability":"TRANSIENT"}],"type":"general_script_exception","reason":"Failed to compile inline script [ctx._source.ProductUrl='https://www.bom2buy.com/partIntelligence/TL431AIYDT/';ctx._source.SubStatus=1] using lang [painless]","caused_by":{"type":"circuit_breaking_exception","reason":"[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting","bytes_wanted":0,"bytes_limit":0,"durability":"TRANSIENT"}},"status":500}
ERROR:scrapy.core.engine:Error while obtaining start requests
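Elasticsearch compiled more than 75 scripts within 5 minutes, hit the limit, and refused to compile any more. Script compilation is expensive, so this limit is a self-protection mechanism in ES. Here is the source code:
This function is called constantly; there are 2 million documents to update.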
def update_producturl(self, item):
    time.sleep(0.5)
    productUrl = "https://www.bom2buy.com/partIntelligence/" + urllib.parse.quote(item['PN'], safe='') + "/"
    # The URL is interpolated into the script text, so every call sends a different script source.
    ubq = UpdateByQuery(using=esclient(), index=index_name) \
        .query("term", _id=item['Id']) \
        .script(source=f"ctx._source.ProductUrl='{productUrl}';ctx._source.SubStatus=1")
    res = ubq.execute()
    r = res
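To make the limit concrete, here is a toy model of the behaviour described in the error message. It is only a sketch and not how Elasticsearch is actually implemented: `ToyScriptCache` and `count_successes` are made-up names, the 5-minute window is ignored (all calls are assumed to fall inside one window), and real cache eviction is not modelled. It only shows that a script whose text changes on every call uses up one compilation each time, while a script with constant text compiles once.

```python
class ToyScriptCache:
    """Toy model: each distinct source is compiled once, up to a fixed budget."""

    def __init__(self, max_compilations=75):
        self.max_compilations = max_compilations
        self.compiled = set()    # sources already compiled; reusing them is free
        self.compilations = 0    # compilations used in the (single) window

    def run(self, source, params=None):
        if source not in self.compiled:
            if self.compilations >= self.max_compilations:
                raise RuntimeError("Too many dynamic script compilations")
            self.compilations += 1
            self.compiled.add(source)
        return True


def count_successes(cache, scripts):
    """Run scripts in order and return how many succeed before the first rejection."""
    ok = 0
    for source, params in scripts:
        try:
            cache.run(source, params)
        except RuntimeError:
            break
        ok += 1
    return ok


# Value embedded in the script text: 200 different sources.
inline = [
    (f"ctx._source.ProductUrl='u{i}';ctx._source.SubStatus=1", None)
    for i in range(200)
]
# Value passed separately: one source, 200 different params.
parameterized = [
    ("ctx._source.ProductUrl=params.productUrl;ctx._source.SubStatus=1",
     {"productUrl": f"u{i}"})
    for i in range(200)
]

inline_ok = count_successes(ToyScriptCache(), inline)
param_ok = count_successes(ToyScriptCache(), parameterized)
print(inline_ok, param_ok)
```

In this model the inline variant stops after 75 calls, and the parameterized variant completes all 200 with a single compilation.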
Attempted fix:
Pass the values through params, so the script source stays identical across calls and does not need to be recompiled.
def update_producturl(self, item):
    time.sleep(0.5)
    productUrl = "https://www.bom2buy.com/partIntelligence/" + urllib.parse.quote(item['PN'], safe='') + "/"
    # The script text is now constant; only params changes between calls.
    ubq = UpdateByQuery(using=esclient(), index=index_name) \
        .query("term", _id=item['Id']) \
        .script(source="ctx._source.ProductUrl=params.productUrl;ctx._source.SubStatus=1",
                params={'productUrl': productUrl})
    res = ubq.execute()
    r = res
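One way to check that the fix has the intended effect is to build the request bodies without sending them and confirm that the script source is the same for every document. The sketch below does that with plain dicts shaped like the `_update_by_query` request shown in the log. `build_update_body`, `SCRIPT_SOURCE`, the sample part numbers and the `subStatus` parameter are illustrative assumptions, not part of the original code.

```python
import urllib.parse

# Constant script text (assumed layout): every per-document value goes into params.
SCRIPT_SOURCE = (
    "ctx._source.ProductUrl = params.productUrl; "
    "ctx._source.SubStatus = params.subStatus"
)


def build_update_body(doc_id, pn):
    """Build an _update_by_query body for one document (hypothetical helper)."""
    product_url = (
        "https://www.bom2buy.com/partIntelligence/"
        + urllib.parse.quote(pn, safe="")
        + "/"
    )
    return {
        "query": {"term": {"_id": doc_id}},
        "script": {
            "lang": "painless",
            "source": SCRIPT_SOURCE,
            "params": {"productUrl": product_url, "subStatus": 1},
        },
    }


# Sample part numbers, made up for illustration.
bodies = [
    build_update_body(f"id-{i}", pn)
    for i, pn in enumerate(["TL431AIYDT", "LM358/NOPB", "A B"])
]
distinct_sources = {b["script"]["source"] for b in bodies}
print(len(distinct_sources))
```

However many documents are processed, there is only one distinct source, so the script should need compiling only once as long as it stays in the script cache. The error message also mentions indexed (stored) scripts as an alternative; that would similarly keep the script text fixed on the server side.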