hbase迁移快照ExportSnapshot时遇到的错

1、Cannot allocate memory

报错信息:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000005c5330000, 8502706176, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 8502706176 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /root/hs_err_pid9168.log

  

 

 

 日志

上面报错信息提示,查看更多,去/root/hs_err_pid9168.log里面查看。

#查看
vim /root/hs_err_pid9168.log
#内容
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 8502706176 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2743), pid=9168, tid=0x00007f22fdcce700
#
# JRE version:  (8.0_191-b12) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /root/core or core.9168
#

---------------  T H R E A D  ---------------

Current thread (0x00007f22f4016000):  JavaThread "Unknown thread" [_thread_in_vm, id=9255, stack(0x00007f22fdbcf000,0x00007f22fdccf000)]

Stack: [0x00007f22fdbcf000,0x00007f22fdccf000],  sp=0x00007f22fdccd4c0,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xace425]  VMError::report_and_die()+0x2c5
V  [libjvm.so+0x4deb77]  report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0x67
V  [libjvm.so+0x90c570]  os::pd_commit_memory(char*, unsigned long, unsigned long, bool)+0x100
V  [libjvm.so+0x903eaf]  os::commit_memory(char*, unsigned long, unsigned long, bool)+0x1f
V  [libjvm.so+0xaca93c]  VirtualSpace::initialize(ReservedSpace, unsigned long)+0x20c
V  [libjvm.so+0x5ea477]  CardGeneration::CardGeneration(ReservedSpace, unsigned long, int, GenRemSet*)+0xc7
V  [libjvm.so+0x5eb842]  GenerationSpec::init(ReservedSpace, int, GenRemSet*)+0x182
V  [libjvm.so+0x5d699f]  GenCollectedHeap::initialize()+0x20f
V  [libjvm.so+0xa922ba]  Universe::initialize_heap()+0x16a
V  [libjvm.so+0xa92593]  universe_init()+0x33
V  [libjvm.so+0x62f0f0]  init_globals()+0x50
V  [libjvm.so+0xa74c57]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x257
V  [libjvm.so+0x6d49ff]  JNI_CreateJavaVM+0x4f
C  [libjli.so+0x7e74]  JavaMain+0x84
C  [libpthread.so.0+0x7dd5]  start_thread+0xc5


---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )

Other Threads:

=>0x00007f22f4016000 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=9255, stack(0x00007f22fdbcf000,0x00007f22fdccf000)]

VM state:not at safepoint (not fully initialized)

VM Mutex/Monitor currently owned by a thread: None

GC Heap History (0 events):
No events

Deoptimization events (0 events):
No events

Classes redefined (0 events):
No events

Internal exceptions (0 events):
No events

Events (0 events):
No events


Dynamic libraries:
00400000-00401000 r-xp 00000000 fd:00 202099235                          /usr/local/soft/jdk/jdk1.8.0_191/bin/java
00600000-00601000 r--p 00000000 fd:00 202099235                          /usr/local/soft/jdk/jdk1.8.0_191/bin/java
00601000-00602000 rw-p 00001000 fd:00 202099235                          /usr/local/soft/jdk/jdk1.8.0_191/bin/java
00dc0000-00dfa000 rw-p 00000000 00:00 0                                  [heap]
5c0000000-5c5330000 rw-p 00000000 00:00 0
7f22e5000000-7f22e5270000 rwxp 00000000 00:00 0
7f22e5270000-7f22f4000000 ---p 00000000 00:00 0
7f22f4000000-7f22f4043000 rw-p 00000000 00:00 0
7f22f4043000-7f22f8000000 ---p 00000000 00:00 0
7f22f9b03000-7f22f9ec1000 rw-p 00000000 00:00 0
7f22f9ec1000-7f22fae97000 ---p 00000000 00:00 0
7f22fae97000-7f22fae98000 rw-p 00000000 00:00 0
7f22fae98000-7f22fae99000 ---p 00000000 00:00 0
7f22fae99000-7f22fafa3000 rw-p 00000000 00:00 0
7f22fafa3000-7f22fb359000 ---p 00000000 00:00 0
7f22fb359000-7f22fb373000 r-xp 00000000 fd:00 134320156                  /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb373000-7f22fb573000 ---p 0001a000 fd:00 134320156                  /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb573000-7f22fb574000 r--p 0001a000 fd:00 134320156                  /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb574000-7f22fb575000 rw-p 0001b000 fd:00 134320156                  /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb575000-7f22fb581000 r-xp 00000000 fd:00 1048275                    /usr/lib64/libnss_files-2.17.so
7f22fb581000-7f22fb780000 ---p 0000c000 fd:00 1048275                    /usr/lib64/libnss_files-2.17.so
7f22fb780000-7f22fb781000 r--p 0000b000 fd:00 1048275                    /usr/lib64/libnss_files-2.17.so
7f22fb781000-7f22fb782000 rw-p 0000c000 fd:00 1048275                    /usr/lib64/libnss_files-2.17.so
7f22fcdb2000-7f22fcfb2000 ---p 00ce2000 fd:00 19060                      /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
7f22fcfb2000-7f22fd048000 r--p 00ce2000 fd:00 19060                      /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
7f22fd048000-7f22fd079000 rw-p 00d78000 fd:00 19060                      /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
7f22fd079000-7f22fd0b4000 rw-p 00000000 00:00 0
7f22fd476000-7f22fd47a000 r--p 001c2000 fd:00 34193                      /usr/lib64/libc-2.17.so
7f22fd47a000-7f22fd47c000 rw-p 001c6000 fd:00 34193                      /usr/lib64/libc-2.17.so
7f22fd47c000-7f22fd481000 rw-p 00000000 00:00 0
7f22fd481000-7f22fd483000 r-xp 00000000 fd:00 34199                      /usr/lib64/libdl-2.17.so
7f22fd483000-7f22fd683000 ---p 00002000 fd:00 34199                      /usr/lib64/libdl-2.17.so
7f22fd683000-7f22fd684000 r--p 00002000 fd:00 34199                      /usr/lib64/libdl-2.17.so
7f22fd684000-7f22fd685000 rw-p 00003000 fd:00 34199                      /usr/lib64/libdl-2.17.so
7f22fd685000-7f22fd69c000 r-xp 00000000 fd:00 67296956                   /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd69c000-7f22fd89b000 ---p 00017000 fd:00 67296956                   /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd89b000-7f22fd89c000 r--p 00016000 fd:00 67296956                   /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd89c000-7f22fd89d000 rw-p 00017000 fd:00 67296956                   /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd89d000-7f22fd8b4000 r-xp 00000000 fd:00 1051052                    /usr/lib64/libpthread-2.17.so
7f22fd8b4000-7f22fdab3000 ---p 00017000 fd:00 1051052                    /usr/lib64/libpthread-2.17.so
7f22fdab3000-7f22fdab4000 r--p 00016000 fd:00 1051052                    /usr/lib64/libpthread-2.17.so
7f22fdab4000-7f22fdab5000 rw-p 00017000 fd:00 1051052                    /usr/lib64/libpthread-2.17.so
7f22fdab5000-7f22fdab9000 rw-p 00000000 00:00 0
7f22fdab9000-7f22fdadb000 r-xp 00000000 fd:00 33871                      /usr/lib64/ld-2.17.so
7f22fdbc6000-7f22fdbce000 rw-s 00000000 fd:00 503948                     /tmp/hsperfdata_root/9168
7f22fdbce000-7f22fdbd2000 ---p 00000000 00:00 0
7f22fdbd2000-7f22fdcd3000 rw-p 00000000 00:00 0
7f22fdcd4000-7f22fdcd8000 rw-p 00000000 00:00 0
7f22fdcd8000-7f22fdcd9000 r--p 00000000 00:00 0
7f22fdcd9000-7f22fdcda000 rw-p 00000000 00:00 0
7f22fdcda000-7f22fdcdb000 r--p 00021000 fd:00 33871                      /usr/lib64/ld-2.17.so
7f22fdcdb000-7f22fdcdc000 rw-p 00022000 fd:00 33871                      /usr/lib64/ld-2.17.so
7f22fdcdc000-7f22fdcdd000 rw-p 00000000 00:00 0
7fff013af000-7fff013d3000 rw-p 00000000 00:00 0                          [stack]
7fff013ec000-7fff013ee000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

VM Arguments:
Launcher Type: SUN_STANDARD

Environment Variables:
JAVA_HOME=/usr/local/soft/jdk/jdk1.8.0_191
LD_LIBRARY_PATH=:/opt/hadoop-3.1.2/lib:/opt/hadoop-3.1.2/lib/native
SHELL=/bin/bash

Signal Handlers:
SIGSEGV: [libjvm.so+0xaced60], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGBUS: [libjvm.so+0xaced60], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGFPE: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGPIPE: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGXFSZ: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGILL: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGUSR1: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGUSR2: [libjvm.so+0x907b70], sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
SIGHUP: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGINT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGTERM: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGQUIT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none


---------------  S Y S T E M  ---------------

OS:CentOS Linux release 7.5.1804 (Core)

uname:Linux 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE infinity, NPROC 71318, NOFILE 102400, AS infinity
load average:4.18 4.20 4.11

/proc/meminfo:
MemTotal:       18281924 kB
MemFree:          199928 kB
MemAvailable:    1747332 kB
Buffers:               0 kB
Cached:          1806360 kB
SwapCached:         2488 kB
Active:         15057324 kB
Inactive(anon):  1729572 kB
Active(file):     865640 kB
Inactive(file):   864956 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       5238780 kB
SwapFree:        5196284 kB
Dirty:             11104 kB
Writeback:             0 kB
AnonPages:      15843208 kB
Mapped:            70700 kB
Shmem:             75776 kB
Slab:             202108 kB
SReclaimable:     156948 kB
SUnreclaim:        45160 kB
KernelStack:       14960 kB
PageTables:        38372 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    14379740 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2625536 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       81792 kB
DirectMap2M:     5111808 kB
DirectMap1G:    13631488 kB

container (cgroup) information:
container_type: cgroupv1
cpu_cpuset_cpus: 0
cpu_memory_nodes: 0
active_processor_count: 1
cpu_quota: -1
cpu_period: 100000
cpu_shares: -1
memory_limit_in_bytes: -1
memory_and_swap_limit_in_bytes: -1
memory_soft_limit_in_bytes: -1
memory_usage_in_bytes: 18073210880
memory_max_usage_in_bytes: 0


CPU:total 1 (initial active 1) (1 cores per cpu, 1 threads per core) family 6 model 85 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse
4.2, popcnt, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, tsc, tscinvbit, bmi1, bmi2, adx

/proc/cpuinfo:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
stepping        : 4
microcode       : 0x200004d
cpu MHz         : 2294.738
cache size      : 16896 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant
_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer ae
s xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm mpx avx512f avx512dq rdseed adx smap cl
flushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec ibpb ibrs stibp arat pku ospke spec_ctrl intel_stibp arch_capabilities
bogomips        : 4589.47
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:



Memory: 4k page, physical 18281924k(199928k free), swap 5238780k(5196284k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (25.191-b12) for linux-amd64 JRE (1.8.0_191-b12), built on Oct  6 2018 05:43:09 by "java_re" with gcc 7.3.0

time: Tue Sep 17 09:54:53 2019
elapsed time: 0 seconds (0d 0h 0m 0s)

原因分析

明显是由于内存不够,查看内存占用

df -h

 

 

 

解决

#重启hbase、hadoop
stop-hbase.sh
stop-all.sh

start-all.sh
start-hbase.sh
#清除缓存
sync ;
echo 1 >/proc/sys/vm/drop_caches 
echo 2 >/proc/sys/vm/drop_caches 
echo 3 >/proc/sys/vm/drop_caches 
#再次查看内存占用
df -h

 

 

 

2、Application is added to the scheduler and is not yet activated

再次迁移快照时,任务一直停留着不动

 

 

 查看web,显示如下:

Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details : 
AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:0, vCores:0>;
User AM Resource Limit of the queue = <memory:0, vCores:0>; Queue AM Resource Usage = <memory:0, vCores:0>;

hbase shell能正常进入,但是输入命令,报错

ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
        at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2977)
        at org.apache.hadoop.hbase.master.MasterRpcServices.getCompletedSnapshots(MasterRpcServices.java:949)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)

 

 

 

jps查看,HRegionServer没有起来

 

 

 

分析HRegionServer的日志

tailf /opt/hbase-2.1.4/logs/hbase-root-regionserver-hbase2.log -n 500
#报错信息

2019-09-17 10:59:22,539 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-0] coordination.ZkSplitLogWorkerCoordination: successfully transitioned task /hbase/splitWAL/WALs%2Fhbase2%2C16020%2C1568617738890-splitting%2Fhbase2%252C16020%252C1568617738890.1568684725860 to final state ERR hbase2,16020,1568688376805
2019-09-17 10:59:22,539 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-0] handler.WALSplitterHandler: Worker hbase2,16020,1568688376805 done with task org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@9885422 in 200ms. Status = ERR
2019-09-17 10:59:23,158 INFO [SplitLogWorker-hbase2:16020] coordination.ZkSplitLogWorkerCoordination: worker hbase2,16020,1568688376805 acquired task /hbase/splitWAL/WALs%2Fhbase2%2C16020%2C1568617738890-splitting%2Fhbase2%252C16020%252C1568617738890.1568683832472
2019-09-17 10:59:23,179 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] wal.WALSplitter: Splitting WAL=hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472, length=138158040
2019-09-17 10:59:23,183 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] util.FSHDFSUtils: Recover lease on dfs file hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472
2019-09-17 10:59:23,183 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472 after 0ms
2019-09-17 10:59:23,201 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter:
Found old edits file. It could be the result of a previous failed split attempt.
Deleting hdfs://hbase2:9000/hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,222 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-0] wal.WALSplitter: Found old edits file.
It could be the result of a previous failed split attempt.
Deleting hdfs://hbase2:9000/hbase/default/tsdb/2501a70608674eab4974e7f8006dac12/recovered.edits/0000000000007214923-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,235 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-2] wal.WALSplitter: Found old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hbase2:9000/hbase/default/tsdb/8d5fe84d54f4170f35d33dff7b830444/recovered.edits/0000000000006173823-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,411 WARN [Thread-8138] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1388)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
2019-09-17 10:59:23,412 ERROR [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter: Got while writing log entry to log
java.io.IOException: File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1601)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1559)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1084)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1076)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1046)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1388)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
2019-09-17 10:59:23,412 ERROR [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter: Exiting thread

分析日志,发现,在反复切割1568617738890文件,那我们将这个文件删除,再重启

#停止hbase
stop-hbase.sh


#查看WALs
hdfs dfs -ls  -R /hbase/WALs
#删除WALs
hdfs dfs -rm -R /hbase/WALs
#清空zk里面的hbase
zkCli.sh
rmr /hbase 
#启动hbase start-hbase.sh

 

 

 

 

再次jps查看

 

 发现HRegionServer成功启动,去hbase shell里面输命令

 

 3、Unexpected error starting NodeStatusUpdater

问题描述

 

 每次迁移快照时,就停留在2019-09-20 14:08:39,184 INFO  [main] mapreduce.Job: Running job: job_1568959466252_0001,去web上查看任务,可看到其诊断提示:

Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details :
AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>;
Queue Resource Limit for AM = <memory:0, vCores:0>; User AM Resource Limit of the queue = <memory:0, vCores:0>;
Queue AM Resource Usage = <memory:0, vCores:0>;

 

 

 

 

 

 

 退回8088页面,查看到Active Nodes为0:

 

 去50070页面,可以看到Live Nodes为2:

 

到这里,我估计是两个子节点,出现了问题,jps查看每个节点:                                                         

  

 

 看到ResourceManager上面的在hbase0上时,我想起,yarn.resourcemanager.hostname这个配置,当时集群搭建时,我全部给的是本机的hostname,这样肯定是不对的,于是去查两个子节点的日志:

tailf /opt/sfot/hadoop/hadoop-3.1.3/hadoop-root-nodemanager-hbase1.log -n 500

果不其然,看到了以下报错:

2019-09-20 11:53:52,458 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
java.net.ConnectException: Call From hbase1/192.168.0.211 to hbase1:8031 failed on connection exception: java.net.ConnectException: 
Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.GeneratedConstructorAccessor28.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1367) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(P

在主节点上查看8031端口:

netstat -ntual |grep 8031

 

解决

 去2个子节点将配置文件中有关yarn.resourcemanager.hostname的配置修改过来:

vim /opt/soft/hadoop/hadoop-3.1.2/etc/hadoop/yarn-site.xml

 

验证 

重启hadoop、hbase之后,再在主节点上查看8031端口

 

 再次迁移快照:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_tsdb_212 -copy-from hdfs://192.168.0.210:9000/hbase -copy-to hdfs://192.168.0.212:9000/hbase -mappers 20 -bandwidth 1024

发现成功

 

 并且在web上查看Active Nodes为2

 PS

到这里,想起要提醒一下,与伪分布式不同的是,要注意hbase的配置文件hbase-site.xml,在指定hbase.rootdir时,保持一致,我这里主节点是hbase0,3个节点的配置都如下:

<property>
              <name>hbase.rootdir</name>
              <value>hdfs://hbase0:9000/hbase</value>
        </property>
       <property>
              <name>hbase.cluster.distributed</name>
              <value>true</value>
        </property>
        <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
        </property>

        <property>
              <name>hbase.zookeeper.quorum</name>
              <value>hbase0,hbase1,hbase3</value>
        </property>
        <property>
                <name>hbase.unsafe.stream.capability.enforce</name>
                <value>false</value>
        </property>

        <property>
               <name>hbase.wal.provider</name>
              <value>filesystem</value>
        </property>

        <property>
              <name>hbase.tmp.dir</name>
              <value>/opt/soft/hbase/hbase-2.1.4/tmpdata</value>
        </property>
        <property>
                <name>hfile.block.cache.size</name>
                <value>0.2</value>
        </property>
        <property>
                <name>hbase.snapshot.enabled</name>
                <value>true</value>
        </property>
        <property>
                <name>zookeeper.session.timeout</name>
                <value>180000</value>
        </property>

 

 

 

posted @ 2019-09-17 14:27  落泪秋  阅读(1943)  评论(0编辑  收藏  举报