Uncle's Troubleshooting Notes (51): an HBase region stuck in RIT state (timeout)

One HBase region was stuck in the RIT (region-in-transition) state. Running move/assign/unassign against the region had no effect, and neither did running assigns/unassigns through hbck2.

Checking HBase's current lock state showed the following:

hbase(main):003:0> list_locks
NAMESPACE(default)                                                                                                                                                                                                                                      
Lock type: SHARED, count: 1                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                        
TABLE(apache_atlas_janus)                                                                                                                                                                                                                               
Lock type: SHARED, count: 1                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                        
REGION(05021681c404140ffcee58ea06f6c7d1)                                                                                                                                                                                                                
Lock type: EXCLUSIVE, procedure: {"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}
                                                                                                                                                                                                                                                        
Took 0.0737 seconds                                                                                                                                                                                                                                     
=> [{"resourceType"=>"NAMESPACE", "resourceName"=>"default", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"TABLE", "resourceName"=>"apache_atlas_janus", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"REGION", "resourceName"=>"05021681c404140ffcee58ea06f6c7d1", "lockType"=>"EXCLUSIVE", "exclusiveLockOwnerProcedure"=>{"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}, "sharedLockCount"=>0}]
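The namespace and qualifier fields in the output above are base64-encoded, and the submittedTime/lastUpdate fields are epoch milliseconds. A minimal sketch for reading them, assuming bash with GNU coreutils (`base64 -d`, `date -d`); the helper names `decode_b64` and `ms_to_utc` are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they are not part of HBase or hbck2.

# Decode a base64 field copied from the procedure's stateMessage.
decode_b64() {
  printf '%s' "$1" | base64 -d
}

# Convert an epoch-milliseconds value to a UTC timestamp (GNU date assumed).
ms_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
}

decode_b64 'ZGVmYXVsdA=='; echo              # namespace -> default
decode_b64 'YXBhY2hlX2F0bGFzX2phbnVz'; echo  # qualifier -> apache_atlas_janus
ms_to_utc 1655288642895                      # submittedTime -> 2022-06-15 10:24:02 UTC
ms_to_utc 1655350334877                      # lastUpdate    -> 2022-06-16 03:32:14 UTC
```

The decoded values match the NAMESPACE(default) and TABLE(apache_atlas_janus) lines of the same output. The timestamps are printed in UTC, so add 8 hours to compare them with the +0800 times that the HBase shell shows.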

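The region holds an EXCLUSIVE lock taken by the procedure with procId=2, an UnassignProcedure sitting in WAITING_TIMEOUT with attempt=112. Next, list all procedures: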

hbase(main):001:0> list_procedures
 PID Name State Submitted Last_Update Parameters
 2 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT 2022-06-15 18:24:02 +0800 2022-06-16 11:32:14 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}]
 3 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:25:02 +0800 2022-06-15 18:25:02 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 4 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:26:03 +0800 2022-06-15 18:26:03 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 37 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:43:25 +0800 2022-06-16 10:43:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 38 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:44:25 +0800 2022-06-16 10:44:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 39 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:45:25 +0800 2022-06-16 10:45:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 40 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 10:45:48 +0800 2022-06-16 10:45:48 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
 41 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:46:25 +0800 2022-06-16 10:46:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 42 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:54:54 +0800 2022-06-16 10:54:54 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 43 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:11:45 +0800 2022-06-16 11:11:45 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
 44 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:13:14 +0800 2022-06-16 11:13:14 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "override"=>true}]
 45 org.apache.hadoop.hbase.master.procedure.DisableTableProcedure RUNNABLE 2022-06-16 11:17:20 +0800 2022-06-16 11:17:20 +0800 [{}, {"userInfo"=>{"effectiveUser"=>"root"}, "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "skipTableStateCheck"=>false}]
 1556 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:30:11 +0800 2022-06-16 11:30:11 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
13 row(s)
Took 0.6656 seconds

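The earlier commands had spawned many procedures that were all trying to operate on this region, and all of them were stuck behind the first procedure (PID 2), because that procedure holds the lock.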

hbase hbck -j hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar bypass -o -r $PROCEDURE_PID

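Bypassing these procedures with hbck2 resolved the problem. Here -o (override) forces the bypass even though the procedure is running or stuck, and -r (recursive) also bypasses its child procedures.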
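When several procedures have to be bypassed, it can help to generate the commands first and review them before running anything. The following is only a sketch under assumptions: the jar path is copied from the command above, the function name `print_bypass_cmds` is made up, and the choice of PIDs 2, 3 and 4 is illustrative (the post does not say exactly which PIDs were bypassed).

```shell
#!/usr/bin/env bash
# Jar path taken from the command above; adjust it to your installation.
HBCK2_JAR="hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar"

# Hypothetical helper: print (not run) one bypass command per PID,
# so the list can be reviewed before anything touches the cluster.
print_bypass_cmds() {
  local pid
  for pid in "$@"; do
    # Skip anything that is not a plain non-negative integer.
    case "$pid" in
      ''|*[!0-9]*) echo "skip invalid pid: $pid" >&2; continue ;;
    esac
    echo "hbase hbck -j $HBCK2_JAR bypass -o -r $pid"
  done
}

# Illustrative choice: PID 2 holds the region lock in the list_locks output,
# and PIDs 3 and 4 are the first procedures queued behind it.
print_bypass_cmds 2 3 4
```

After checking the printed lines, they can be executed by piping them to a shell, for example `print_bypass_cmds 2 3 4 | sh`. Starting with the lock holder (PID 2) and re-running list_procedures before bypassing more keeps the intervention as small as possible.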

Reference:
https://stackoverflow.com/questions/56321514/how-to-abort-kill-a-procedure-in-hbase

Posted @ 2022-06-16 14:12 by 匠人先生