Otter Notes

 

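A: Fault: node not found.
Cause: the nid is the unique identifier of a node. After adding the node machine, go to the machine list page and take the corresponding machine sequence number (nid).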

B: Fault: Unsupported BinlogFormat MIXED
    show variables like '%log%';
    binlog_format still shows MIXED.
    Cause: my.cnf contains an extra binlog_format assignment that sets it to MIXED.
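
One way to spot the stray assignment is to list every binlog_format line in my.cnf. The sketch below is a minimal illustration: find_binlog_format is a hypothetical helper (not part of MySQL or otter), it assumes plain key = value lines, and it does not follow !include / !includedir directives. When an option appears more than once in an option file, MySQL uses the last occurrence, so the last hit is the one to check.

```python
import re

def find_binlog_format(cnf_text):
    """Return (line_number, value) for every binlog_format assignment."""
    # MySQL accepts both binlog_format and binlog-format in option files.
    pattern = re.compile(r"^\s*binlog[_-]format\s*=\s*(\S+)", re.IGNORECASE)
    hits = []
    for lineno, line in enumerate(cnf_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out settings
        match = pattern.match(line)
        if match:
            hits.append((lineno, match.group(1).upper()))
    return hits

# Hypothetical my.cnf content with a duplicate assignment.
sample = """[mysqld]
binlog_format = ROW
server-id = 1
# binlog_format = STATEMENT
binlog_format = MIXED
"""
hits = find_binlog_format(sample)
```

Here hits contains two entries, and the second one (MIXED) is the assignment that has to be removed or changed to ROW.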

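One point needs special attention: canal can currently parse several binlog formats (mixed, row, statement), but when it is used with otter for database synchronization, only the row format is supported.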
https://github.com/alibaba/otter/wiki/QuickStart

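C: Fault: the channel is suspended and the slave does not sync table data from the master.

 (Same fix as for the "DDL statement cannot be executed" fault, Exception: no support ddl for)
Solution: in the otter management console, click Sync Management and disable the Channel, then go to Pipeline ==> Manage ==> Advanced (at the bottom) ==> skip DDL exceptions. Restart the Channel and it works again.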

 

 


Error message while suspended:

pid:9 nid:4 exception:setl:com.alibaba.otter.node.etl.load.exception.LoadException: java.util.concurrent.ExecutionException: com.alibaba.otter.node.etl.load.exception.LoadException: com.alibaba.otter.node.etl.load.exception.LoadException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar []; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'CompanyName' doesn't exist in table
Caused by: java.util.concurrent.ExecutionException: com.alibaba.otter.node.etl.load.exception.LoadException: com.alibaba.otter.node.etl.load.exception.LoadException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar []; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'CompanyName' doesn't exist in table
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.alibaba.otter.node.etl.load.loader.db.DataBatchLoader.load(DataBatchLoader.java:107)
    at com.alibaba.otter.node.etl.load.loader.OtterLoaderFactory.load(OtterLoaderFactory.java:50)
    at com.alibaba.otter.node.etl.load.LoadTask$1.run(LoadTask.java:85)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.alibaba.otter.node.etl.load.exception.LoadException: com.alibaba.otter.node.etl.load.exception.LoadException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar []; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'CompanyName' doesn't exist in table
Caused by: com.alibaba.otter.node.etl.load.exception.LoadException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar []; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'CompanyName' doesn't exist in table
Caused by: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar []; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'CompanyName' doesn't exist in table
    at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:98)
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:407)
    at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction.doDdl(DbLoadAction.java:357)
    at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction.load(DbLoadAction.java:135)
    at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$$FastClassByCGLIB$$d932a4cb.invoke()
    at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:191)
    at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:618)
    at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$$EnhancerByCGLIB$$80fd23c2.load()
    at com.alibaba.otter.node.etl.load.loader.db.DataBatchLoader$2.call(DataBatchLoader.java:198)
    at com.alibaba.otter.node.etl.load.loader.db.DataBatchLoader$2.call(DataBatchLoader.java:189)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'CompanyName' doesn't exist in table
    at sun.reflect.GeneratedConstructorAccessor91.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:943)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3970)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3906)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2545)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2503)
    at com.mysql.jdbc.StatementImpl.executeInternal(StatementImpl.java:839)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:739)
    at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
    at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
    at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$1.doInStatement(DbLoadAction.java:369)
    at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$1.doInStatement(DbLoadAction.java:357)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:396)
    ... 14 more


https://blog.csdn.net/weixin_36567485/java/article/details/78021636





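1. What is the relationship between canal and otter?
Answer: Before answering, first look at a diagram comparing canal & otter with MySQL replication.

MySQL's built-in replication can be divided into three steps:

The master records changes in its binary log (these records are called binary log events and can be viewed with show binlog events);
The slave copies the master's binary log events to its relay log; this is the I/O thread.
The slave replays the events in the relay log, applying the changes to its own data; this is the SQL thread.
Replication based on canal & otter is analogous to MySQL replication.

Canal corresponds to the I/O thread: it receives the master's binary log.
Otter corresponds to the SQL thread: it obtains binary log data through canal and applies it to the target database.
The differences between the two are:

otter currently embeds canal and runs in the same JVM; by design it produces no relay log, so the data is never written to disk.
otter allows custom synchronization logic to meet various requirements.
a. ETL transformation. For example, the target table on the slave may differ in table name, column names, column types, or number of columns.
b. Heterogeneous databases. For example, the slave can be oracle or another kind of storage, such as nosql.
c. M-M deployment, solving data consistency problems.
d. Manager-based deployment, which makes it easy to monitor sync status and manage sync tasks.
2. Which database versions does canal currently support?
Answer: MySQL 5.1 ~ 5.6/5.7 and mariadb 5/10. (Parsing of the ROW/STATEMENT/MIXED binlog formats is fully supported.)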

 

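3. Which databases does otter currently support?
Answer: in summary:

As the diagram in question 1 shows, otter relies on canal for incremental database logs, so it is subject to canal's version limits: only the MySQL family is supported, and oracle cannot be parsed as the master.
With MySQL as the master, otter only supports synchronization in ROW mode; the other two modes are not supported. (Only ROW mode can guarantee eventual consistency of the data.)
The target database, i.e. the slave, can be mysql or oracle. In other words, MySQL data can be synced to an oracle database, but not the other way round.
4. What synchronization limits does otter currently have?
Answer: in summary:

Tables without a primary key are not supported yet. (Synced tables must have a primary key; on a table without one, every update is a full table scan, which is inefficient.)
Partial DDL sync is supported (create table / drop table / alter table / truncate table / rename table / create index / drop index; other types such as grant, create user, and trigger are not supported yet). DDL statements are not idempotent, so a repeated sync causes the sync to be suspended; this can be worked around with the advanced parameter "skip DDL exceptions".
Syncing records with foreign keys is not supported. (The load algorithm breaks up transactions and processes them in parallel, so foreign key constraints cannot be satisfied.)
Be careful with triggers on the database. (For example, the source has a trigger on table A that records changes to A into table B, and B is also synced. If the target database has the same trigger, a sync inserts into A once and into B twice, because the synced insert into A also fires the trigger and inserts into B, on top of the synced insert into B itself.)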

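5. What are the advantages of otter sync over MySQL replication?
Answer:

Easy management and operations. otter is written purely in Java and provides a web console for managing all of a company's database sync tasks in one place.
Higher sync efficiency. While preserving data consistency, it breaks up the master's original transactions and syncs concurrently based on a pk hash, which can improve sync efficiency by more than 5x.
Custom sync logic. On top of incremental replication, ETL transformation logic can be defined for special requirements.
Cross-datacenter sync. Compared with MySQL replication between distant data centers, for example Alibaba's Hangzhou and US data centers, replication can be more than 20x faster. Over long distances, the master sending the binary log to the slave can become a bottleneck.
Active-active (dual-A) data centers. MySQL's M-M deployment does not solve data consistency; otter's bidirectional replication plus its consistency algorithm addresses this and makes true active-active data centers possible.
Special features.
a. Image sync. A database record, such as a product record, may store the path of an image; rules can be defined so that the image is synced to the target along with the data.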
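
The "pk hash" idea mentioned above can be illustrated with a small sketch. This is not otter's actual load algorithm; partition_by_pk and the sample changes are hypothetical. It only shows why hashing on the primary key allows parallelism without reordering changes to the same row: every change for a given pk lands in the same queue, in its original order.

```python
import zlib
from collections import defaultdict

def partition_by_pk(changes, workers):
    """Group row changes into per-worker queues keyed by a hash of the primary key."""
    queues = defaultdict(list)
    for change in changes:
        # crc32 gives a stable hash across runs (unlike Python's str hash).
        bucket = zlib.crc32(str(change["pk"]).encode("utf-8")) % workers
        queues[bucket].append(change)
    return dict(queues)

# Hypothetical change stream: three changes to row 1, one to row 2.
changes = [
    {"pk": 1, "op": "insert"},
    {"pk": 2, "op": "insert"},
    {"pk": 1, "op": "update"},
    {"pk": 1, "op": "delete"},
]
queues = partition_by_pk(changes, workers=4)
```

Each queue could then be applied by its own worker. The original transaction boundaries are lost in the process, which matches the foreign key limitation described in question 4.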

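6. The node JVM runs out of memory. How can this be fixed?
The node reports java.lang.OutOfMemoryError: GC overhead limit exceeded.

Answer:

It is recommended to keep the sync load of a single node below roughly 10,000 to 20,000 TPS, otherwise memory will run short. When it does, the options are:

Increase the node's -Xms/-Xmx settings. The default is 3G, of which the heap is about 2GB.
Reduce the memory configured for each sync task.
a. The canal configuration has a parameter for the number of records in the in-memory buffer. The default is 32768, representing 32MB of binlog, which takes about 100MB of memory once parsed.
b. The pipeline configuration has a batch size setting. The default is 6000, meaning about 6MB per fetch; memory usage after parsing = parallelism * 6MB * 3, also roughly 100MB.

So with default parameters at full speed, a single channel uses about 200MB, from which you can estimate how many channels 2GB can run.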
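
The estimate above can be written out as arithmetic. This is a rough sketch: the function names are hypothetical, the 100MB canal figure and the parallelism * 6MB * 3 formula come from the text, and parallelism=5 is an assumption chosen so that the pipeline share comes out near the quoted 100MB.

```python
def channel_memory_mb(parallelism=5, batch_mb=6, canal_buffer_mb=100):
    """Rough per-channel memory: parsed canal buffer + parallelism * batch * 3."""
    return canal_buffer_mb + parallelism * batch_mb * 3

def max_channels(heap_mb=2048, **kwargs):
    """How many channels fit in the heap at full speed, ignoring other overhead."""
    return heap_mb // channel_memory_mb(**kwargs)

per_channel = channel_memory_mb()   # 100 + 5 * 6 * 3 = 190
capacity = max_channels()           # 2048 // 190 = 10
```

Under these assumptions a 2GB heap holds about ten channels running flat out. In practice some heap is needed for everything else, so the safe number is lower.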

 

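7. The source binlog is incorrect. How do I reset sync to start from a new position?
Scenarios:

The source binlog has been deleted, e.g. the error: Could not find first log file name in binary log index file
The source binlog hits a fatal parse error, e.g. a destructive DDL such as drop table was run, so that later binlog events cannot be parsed because the table structure can no longer be fetched.
Answer:

First, understand how canal manages positions. There are two positions: the start position and the run position (the last position that was consumed successfully).
The run position is loaded first. On first startup there is no run position, so the start position is used for initialization; once the client returns its first ack, the run position is recorded.

So resetting the position takes these steps:
(1) Delete the run position. (on the pipeline sync progress page)

(2) Configure the start position. (on the canal configuration page)

(3) Check that it took effect. (in the pipeline's log)

Notes:

If the log contains "prepare to find start position just last position", startup was based on the run position and the newly configured position did not take effect. (In that case, check whether the position was actually deleted, or whether the canal is referenced by several pipelines, so that after deletion another pipeline wrote the position back.)

When canal is used in otter, pipelines must not share a canal. A canal configured in otter is one instance, and otter is its single client. canal does not currently support multiple clients per instance; sharing causes data loss, so be careful.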
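
The position rules described above can be sketched as follows. This is a simplified model of the documented behaviour, not canal's code: choose_position and started_from_run_position are hypothetical helpers, and the positions are made-up values. The only string taken from the source is the log marker.

```python
def choose_position(run_position, start_position):
    """Prefer the recorded run position; fall back to the configured start position."""
    return run_position if run_position is not None else start_position

def started_from_run_position(log_line):
    """True when the parser log shows startup from the last recorded (run) position."""
    return "prepare to find start position just last position" in log_line

# Hypothetical positions for illustration.
start = {"journalName": "mysql-bin.000001", "position": 4}
run = {"journalName": "mysql-bin.000007", "position": 1024}

first_boot = choose_position(None, start)   # no run position yet
later_boot = choose_position(run, start)    # run position wins
# After a reset: run position deleted, new start position configured.
after_reset = choose_position(None, {"journalName": "mysql-bin.000009", "position": 4})
```

If step (1) is skipped, or another pipeline writes the run position back, the second case applies and the newly configured start position is ignored.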

Log after clicking Enable:
2020-05-22 18:54:34.094 [pipelineId = 1,taskName = SelectTask] WARN  c.a.o.shared.arbitrate.impl.setl.monitor.MainstemMonitor - mainstem is not run any in node
2020-05-22 18:54:34.153 [destination = crm_canal , address = mysql.zkh360.com/120.27.222.64:13306 , EventParser] WARN  c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - ---> begin to find start position, it will be long time for reset or first position
2020-05-22 18:54:34.153 [destination = crm_canal , address = mysql.zkh360.com/120.27.222.64:13306 , EventParser] WARN  c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - prepare to find start position mysql-uat-new-mysqlha-0-bin.002301:443951933:null
2020-05-22 18:54:34.153 [destination = crm_canal , address = mysql.zkh360.com/120.27.222.64:13306 , EventParser] WARN  c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-uat-new-mysqlha-0-bin.002301,position=443951933,serverId=<null>,gtid=<null>,timestamp=<null>] cost : 0ms , the next step is binlog dump

 

2020-05-20 18:21:54.686 [pipelineId = 4,taskName = SelectTask] WARN  c.a.o.shared.arbitrate.impl.setl.monitor.MainstemMonitor - mainstem check is interrupt
2020-05-20 18:56:24.826 [pipelineId = 4,taskName = SelectTask] WARN  c.a.o.shared.arbitrate.impl.setl.monitor.MainstemMonitor - mainstem is not run any in node
2020-05-20 18:56:24.872 [pipelineId = 4,taskName = SelectTask] WARN  c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Could not load properties from class path resource [canal.properties]: class path resource [canal.properties] cannot be opened because it does not exist
2020-05-20 18:56:24.872 [pipelineId = 4,taskName = SelectTask] WARN  c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Could not load properties from class path resource [instance.properties]: class path resource [instance.properties] cannot be opened because it does not exist
2020-05-20 18:56:24.929 [destination = _channel_new , address = rds-test.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2020-05-20 18:56:24.933 [destination = _channel_new , address = rds-test.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just last position
 {"identity":{"slaveId":-1,"sourceAddress":{"address":"rds-test.mysql.rds.aliyuncs.com","port":3306}},"postion":{"gtid":"","included":false,"journalName":"mysql-bin.005447","position":508370854,"serverId":2339357115,"timestamp":1589970112000}}
2020-05-20 18:56:24.944 [destination = _channel_new , address = rds-test.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-bin.005447,position=508370854,serverId=2339357115,gtid=,timestamp=1589970112000] cost : 15ms , the next step is binlog dump

 

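An observation that needs investigation: why did startup use the start position rather than the run position?

Run position:
{"journalName":"mysql-uat-new-mysqlha-0-bin.002534","position":314056714,"timestamp":1591076810000}

Start position [configured in canal]:
{"journalName":"mysql-uat-new-mysqlha-0-bin.002532","position":597087918,"timestamp":1591066862000};

After clicking Disable and then Enable in the console, otter used the position configured in canal rather than the position it was at when it stopped.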
2020-06-02 13:51:03.063 [destination = crm_canal , address = mysql.zkh360.com/120.27.222.64:13306 , EventParser] WARN c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - prepare to find start positionmysql-uat-new-mysqlha-0-bin.002532:597087918:1591066862000
2020-06-02 13:51:03.064 [destination = crm_canal , address = mysql.zkh360.com/120.27.222.64:13306 , EventParser] WARN c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-uat-new-mysqlha-0-bin.002532,position=597087918,serverId=<null>,gtid=<null>,timestamp=1591066862000] cost : 0ms , the next step is binlog dump

 


 

 com.alibaba.otter.canal.parse.inbound.mysql.MysqlEventParser#findStartPositionInternal

 

 



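8. After POSITIONTIMEOUT appears in the log list, data is synced again?
Answer: First, understand the Position timeout monitor. It watches the difference between the last update time of the position in the sync progress and the current time. Put simply, it checks how long the position has gone without an update, and fires the alarm rule once the threshold is exceeded.

Also note that the position in the sync progress only records a transaction's BEGIN/COMMIT positions, to keep transaction handling complete; it never records a position in the middle of a transaction.

The position can go a long time without updating in these cases:

(1) A large transaction on the source, e.g. load data / delete from xxx touching millions or tens of millions of rows. While that transaction is being synced the position is not updated, so once the timeout (e.g. the default of 10 minutes) is exceeded, the Position timeout monitor fires. This is a false positive: it triggers auto-recovery, which restarts the sync, and the cycle repeats indefinitely.
(2) An unknown bug in otter deadlocks the scheduling algorithm, so data cannot be synced and the progress cannot be updated, which triggers the Position timeout monitor.
        In this case, the auto-recovery's disable + enable of the sync task restores normal operation.
        P.S. The Position timeout monitor is mainly a safety net for otter. Weigh the two: if the impact of false positives outweighs the likelihood of hitting a system bug, consider turning the monitor off. The timeout monitor normally also sends an alert.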
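
The check the monitor performs is a simple time difference. A minimal sketch, assuming timestamps in seconds and the 10-minute threshold mentioned above; position_timed_out is a hypothetical name, not an otter API:

```python
def position_timed_out(last_update_ts, now_ts, threshold_s=600):
    """True when the sync position has not moved for longer than the threshold."""
    return now_ts - last_update_ts > threshold_s

# A transaction that takes 15 minutes to sync: the position only moves at
# BEGIN/COMMIT, so the monitor fires even though sync is healthy.
big_transaction = position_timed_out(last_update_ts=0, now_ts=15 * 60)
# Normal traffic: the position was updated 30 seconds ago.
normal_traffic = position_timed_out(last_update_ts=0, now_ts=30)
```

The first case is the false positive from scenario (1): nothing is wrong, but the position has been still for longer than the threshold.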


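9. A "miss data with keys" exception appears in the log list, and sync is suspended and then recovers automatically?
Exception message:
pid:2 nid:2 exception:setl:load miss data with keys:[MemoryPipeKey[identity=Identity[channelId=1,pipelineId=2,processId=4991953],time=1383190001987,dataType=DB_BATCH]]
Answer: To understand this exception, you first need the otter scheduling model. In its SEDA design, stages exchange data through pipes. For example, when the T module finishes it stores its data in a pipe and notifies the SEDA center; the center then tells the L module to start, and the L module fetches the data using the pipeKey that T passed to the center. This exception is raised when the L module looks up the pipeKey and the data is gone. Main cause: by design, for single-machine transfer the pipe stores data behind a SoftReference, so when JVM memory is low the data is garbage-collected and the lookup finds nothing.
P.S. If miss data with keys exceptions are very frequent, consider whether the node is overloaded and short of memory, and migrate some of its sync tasks elsewhere. If they are only occasional they can be ignored; the exception is handled by an automatic RESTART of the sync task.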
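
The failure mode can be imitated in Python, with one important caveat: Python has no soft references. The sketch below uses weakref.WeakValueDictionary as an analogy only. A weak reference is dropped as soon as no strong reference remains, whereas a Java SoftReference is cleared only under memory pressure. Batch and the key name are hypothetical.

```python
import gc
import weakref

class Batch:
    """Stand-in for a batch of row data produced by the T stage."""
    def __init__(self, rows):
        self.rows = rows

# The "pipe": holds values only weakly, so the collector may reclaim them.
pipe = weakref.WeakValueDictionary()

batch = Batch(rows=[1, 2, 3])
pipe["process-4991953"] = batch          # T stage stores data under a pipe key
found_before = pipe.get("process-4991953") is not None

del batch                                # the last strong reference goes away
gc.collect()                             # make collection explicit
missing_after = pipe.get("process-4991953") is None   # L stage finds nothing
```

The key is still known to the consumer, but the value behind it has been reclaimed, which is the situation the "miss data with keys" message reports.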

10. Manager exceptions appear in the log list?
Exception message:
pid:2 nid:null exception:channel:can't restart by no select live node
This means that for pipelineId = 2, no live node is available for the select module.

Exception message:
pid:-1 nid:null exception:cid:2 restart recovery successful for rid:-1
This means channelId = 2 successfully initiated a restart of the sync task.

Exception message:
pid:-1 nid:null exception:nid:2 is dead and restart cids:[1,2]
This means node id = 2 died, which triggered a restart of the sync tasks for channelId = 1 and 2. (A failover mechanism.)
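
The messages above share a "pid:... nid:... exception:..." prefix, so they are easy to split for filtering or alerting. A small sketch based only on the samples in these notes; parse_log_title is a hypothetical helper and the pattern is inferred from those samples, not from otter's source.

```python
import re

LOG_RE = re.compile(
    r"^pid:(?P<pid>-?\d+) nid:(?P<nid>null|-?\d+) exception:(?P<message>.*)$",
    re.DOTALL,
)

def parse_log_title(line):
    """Split an otter log entry into pipeline id, node id and message."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    nid = match.group("nid")
    return {
        "pid": int(match.group("pid")),
        "nid": None if nid == "null" else int(nid),
        "message": match.group("message"),
    }

a = parse_log_title("pid:2 nid:null exception:channel:can't restart by no select live node")
b = parse_log_title("pid:-1 nid:null exception:nid:2 is dead and restart cids:[1,2]")
c = parse_log_title("pid:9 nid:4 exception:setl:load miss data with keys")
```

A pid of -1 with nid null marks the manager-level entries, while entries with a real nid come from a node.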

https://github.com/alibaba/otter/wiki/Faq

 

 

Channel delay can be monitored from the otter.delay_stat table.
https://blog.csdn.net/woson_wang/article/details/89011647

Monitoring logs can be queried from the otter.log_record table.

 

 



[TODO]

pid:8 nid:4 exception:canal:espro:com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`retl`.`xdual`
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`retl`.`xdual`
Caused by: java.net.SocketException: Connection timed out (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:143)
at com.alibaba.otter.canal.parse.driver.mysql.socket.BioSocketChannel.write(BioSocketChannel.java:36)
at com.alibaba.otter.canal.parse.driver.mysql.utils.PacketManager.writeBody0(PacketManager.java:42)
at com.alibaba.otter.canal.parse.driver.mysql.utils.PacketManager.writeBody(PacketManager.java:35)
at com.alibaba.otter.canal.parse.driver.mysql.MysqlQueryExecutor.query(MysqlQueryExecutor.java:55)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.query(MysqlConnection.java:105)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.TableMetaCache.getTableMeta(TableMetaCache.java:170)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.getTableMeta(LogEventConvert.java:915)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parseRowsEventForTableMeta(LogEventConvert.java:475)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parseRowsEvent(LogEventConvert.java:496)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parseRowsEvent(LogEventConvert.java:487)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parse(LogEventConvert.java:125)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parse(LogEventConvert.java:67)
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser.parseAndProfilingIfNecessary(AbstractEventParser.java:409)
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3$1.sink(AbstractEventParser.java:209)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:168)
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3.run(AbstractEventParser.java:271)
at java.lang.Thread.run(Thread.java:748)



Errors reported repeatedly when privileges are missing:

pid:10 nid:3 exception:canal:Canal名字:com.alibaba.otter.canal.parse.exception.CanalParseException: command : 'show master status' has an error!
Caused by: java.io.IOException: ErrorPacket [errorNumber=1227, fieldCount=-1, message=Access denied; you need (at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operation, sqlState=42000, sqlStateMarker=#]
with command: show master status
at com.alibaba.otter.canal.parse.driver.mysql.MysqlQueryExecutor.query(MysqlQueryExecutor.java:61)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.query(MysqlConnection.java:105)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlEventParser.findEndPosition(MysqlEventParser.java:653)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlEventParser.findEndPositionWithMasterIdAndTimestamp(MysqlEventParser.java:389)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlEventParser.findStartPositionInternal(MysqlEventParser.java:435)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlEventParser.findStartPosition(MysqlEventParser.java:366)
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3.run(AbstractEventParser.java:186)
at java.lang.Thread.run(Thread.java:748)

Solution:
Grant the MySQL account that canal connects with the privileges required to act as a MySQL slave. If the account already exists, just run the grant.

CREATE USER canal IDENTIFIED BY 'canal';  
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;
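
To check an existing account, the output of SHOW GRANTS can be compared against the privileges used in the statement above. The sketch below is a hypothetical helper that inspects a single GRANT line as text. It deliberately ignores the scope (it does not verify that the grant is ON *.*) and is only meant to show the comparison.

```python
REQUIRED = {"SELECT", "REPLICATION SLAVE", "REPLICATION CLIENT"}

def missing_privileges(grant_statement):
    """Return the required privileges not listed in one GRANT line."""
    upper = grant_statement.upper()
    if not upper.startswith("GRANT ") or " ON " not in upper:
        return set(REQUIRED)
    # Text between "GRANT" and "ON" is the comma-separated privilege list.
    priv_list = upper[len("GRANT "):upper.index(" ON ")]
    granted = {p.strip() for p in priv_list.split(",")}
    if "ALL PRIVILEGES" in granted or "ALL" in granted:
        return set()
    return REQUIRED - granted

ok = missing_privileges(
    "GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%'"
)
bad = missing_privileges("GRANT SELECT ON *.* TO 'canal'@'%'")
```

An account missing REPLICATION CLIENT produces the errorNumber=1227 failure on show master status shown above.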

https://github.com/alibaba/canal/wiki/QuickStart

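In MySQL, user privileges live in three places:
(1) In established database connections. grant, revoke, or direct edits to the tables do not change the global privileges of connections that are already open.
(2) In the acl_users array in MySQL server memory.
(3) In the mysql.user table.

flush privileges clears the acl_users array, then reads mysql.user again and builds a new acl_users array.
In other words, the table data is treated as authoritative and the in-memory global privilege array is reloaded from it.

grant modifies both the table and the in-memory data; privilege checks use the in-memory data.
So when grant and revoke are used properly, there is no need to follow them with flush privileges.
flush privileges rebuilds the in-memory privilege data from the tables, so use it only when the two may be inconsistent.
Such inconsistency usually comes from modifying the system privilege tables directly with DML, so avoid doing that.


Table and column privileges: besides db-level privileges, MySQL supports finer-grained table and column privileges.
Table privileges are stored in mysql.tables_priv and column privileges in mysql.columns_priv.
Both are combined in the in-memory hash structure column_priv_hash.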
https://time.geekbang.org/column/article/82231?utm_source=pinpaizhuanqu&utm_medium=geektime&utm_campaign=guanwang&utm_term=guanwang&utm_content=0511


pid:8 nid:4 exception:canal:espro:com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`retl`.`xdual`
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`retl`.`xdual`
Caused by: java.io.IOException: ErrorPacket [errorNumber=1142, fieldCount=-1, message=SHOW command denied to user 'admin'@'10.0.2.3' for table 'xdual', sqlState=42000, sqlStateMarker=#]
with command: show create table `retl`.`xdual`
at com.alibaba.otter.canal.parse.driver.mysql.MysqlQueryExecutor.query(MysqlQueryExecutor.java:61)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.query(MysqlConnection.java:105)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.TableMetaCache.getTableMeta(TableMetaCache.java:170)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.getTableMeta(LogEventConvert.java:915)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parseRowsEventForTableMeta(LogEventConvert.java:475)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parseRowsEvent(LogEventConvert.java:496)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parseRowsEvent(LogEventConvert.java:487)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parse(LogEventConvert.java:125)
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.LogEventConvert.parse(LogEventConvert.java:67)
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser.parseAndProfilingIfNecessary(AbstractEventParser.java:409)
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3$1.sink(AbstractEventParser.java:209)
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:168)
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3.run(AbstractEventParser.java:271)
at java.lang.Thread.run(Thread.java:748)

The error above is resolved: granting the user admin the privilege to run show create table fixed it.


https://blog.csdn.net/u014355034/article/details/87974990


otter overall architecture

The node module embeds Canal; Canal listens for changes in the database binlog and passes them to the node's SETL module.

 

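otter depends heavily on canal and places some constraints on canal's configuration. Because of these constraints, canal is integrated into the node and runs as a thread inside it, so setting up MySQL sync with otter does not require installing canal by hand first. Before starting the setup, it is worth reading the terminology below; unless you already know it well, you will end up coming back to it.

Channel: a sync channel; consists of one Pipeline for one-way sync and two Pipelines for two-way sync
Pipeline: describes the whole process from source to target, made up mainly of sync mappings
DataMediaPair: a mapping defined per business table, e.g. source and target table, column mapping, column groups
DataMedia: an abstract data medium, which can be thought of as a table or MQ queue definition
DataMediaSource: abstract source information for a data medium, supplementing DataMedia
ColumnPair: defines a column mapping
ColumnGroup: defines a column mapping group
Node: a worker node that performs synchronization; corresponds to one JVM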


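otter's S/E/T/L stage model

Note: to better support extensibility and flexibility, the whole sync flow is abstracted into four stages: Select/Extract/Transform/Load.
Select stage: abstracts over differences in data sources, e.g. it can pull incremental data from canal, or other data from other systems.

Extract/Transform/Load stages: similar to the ETL model in data warehousing; concretely, data join, data transformation, and data load.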

https://github.com/alibaba/otter/wiki/Introduction

Their relationship is:

 

 

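otter consists of two parts: the manager (otter's configuration center and management console) and the node (the worker that actually performs synchronization).

Both online articles and the otter docs say the manager must be installed first. Looking closely, this is because the manager is connected to passively (many applications' consoles actively connect to their services, whereas otter stores all configuration in the manager): when a node starts, it connects to the manager to fetch its sync configuration.
Generating the nid is not critical at this step; if it turns out not to match, it can be corrected afterwards.

manager configuration
First, run the otter-manager-schema.sql script on the MySQL database that will store otter's configuration.

The manager's main configuration file is manager/conf/otter.properties, shown below (with notes on the settings that should or must be changed):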

[root@v-03-01-00223 conf]# cat otter.properties 
## otter manager domain name
## Recommended: set this to the server's IP instead of the default 127.0.0.1; otherwise every link generated after startup points at localhost. otter usually runs on Linux, often without a GUI, so the default feels like a bug.
otter.domainName = 172.28.1.97
## otter manager http port
## If the server is shared or already hosts other web apps, pick another port to avoid conflicts; it must match the port in jetty.xml. A non-8080 default is friendlier anyway (compare the WebLogic console on 7001, the ES console on 9200, the RabbitMQ console on 15672).
otter.port = 8088
## jetty web config xml
otter.jetty = jetty.xml

## otter manager database config
## The URL points at the database that stores otter's configuration; the schema is usually named otter/otter_manager/manager.
otter.database.driver.class.name = com.mysql.jdbc.Driver
otter.database.driver.url = jdbc:mysql://127.0.0.1:3308/otter
otter.database.driver.username = root
otter.database.driver.password = 123456

## otter communication port
## Port used for communication between node and manager; usually left unchanged.
otter.communication.manager.port = 1099

## otter communication pool size
otter.communication.pool.size = 10

## default zookeeper address
## ZooKeeper address
otter.zookeeper.cluster.default = 127.0.0.1:2181
## default zookeeper sesstion timeout = 60s
otter.zookeeper.sessionTimeout = 60000

## otter arbitrate connect manager config
otter.manager.address = ${otter.domainName}:${otter.communication.manager.port}

## should run in product mode , true/false
otter.manager.productionMode = true

## self-monitor enable or disable
otter.manager.monitor.self.enable = true
## self-montir interval , default 120s
otter.manager.monitor.self.interval = 120
## auto-recovery paused enable or disable
otter.manager.monitor.recovery.paused = true
# manager email user config
otter.manager.monitor.email.host = smtp.gmail.com
otter.manager.monitor.email.username = 
otter.manager.monitor.email.password = 
otter.manager.monitor.email.stmp.port = 465
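
The otter.manager.address line in this file is built from two other keys through ${...} placeholders. The sketch below shows how such a value expands. It is an illustration only: load_properties and resolve are hypothetical helpers written for this note (the real expansion is done by the application itself), and the parser handles only simple key = value lines with comments on their own lines.

```python
import re

def load_properties(text):
    """Parse simple key = value lines, ignoring blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def resolve(props, key):
    """Expand ${name} references using other keys in the same file."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: resolve(props, m.group(1)), props[key])

text = """
## otter manager domain name
otter.domainName = 172.28.1.97
otter.communication.manager.port = 1099
otter.manager.address = ${otter.domainName}:${otter.communication.manager.port}
"""
props = load_properties(text)
address = resolve(props, "otter.manager.address")
```

The resolved value is the address a node needs in its own otter.manager.address setting in order to reach this manager.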

After changing the settings above, the manager can be started.


[root@v-03-01-00223 bin]# pwd
/usr/local/app/manager/bin

./startup.sh

Check the log

tail -fn 100 ../logs/manager.log

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=96m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
2018-07-03 14:59:49.002 [] INFO com.alibaba.otter.manager.deployer.OtterManagerLauncher - ## start the manager server.
2018-07-03 14:59:57.420 [] INFO com.alibaba.otter.manager.deployer.JettyEmbedServer - ##Jetty Embed Server is startup!
2018-07-03 14:59:57.420 [] INFO com.alibaba.otter.manager.deployer.OtterManagerLauncher - ## the manager server is running now ......

Next, verify the manager.

Open http://172.18.1.97:8088/ in a browser

 

 

 

It is best not to change the ports.

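Once the machine has been added, the first column of the machine management list is the nid (the value to be saved into the node/conf/nid file later), as shown below: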

 

https://github.com/alibaba/otter/wiki/Node_Quickstart

 

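Once the three kinds of entries above are configured, the manager's initial configuration is done.

After the manager is configured, start the corresponding node first; once the node is running, the actual sync tasks can be configured.

node configuration
First

cd NODE_HOME/conf
echo 1 > nid    # 1 is the sequence number generated when the node was added in the manager

The node configuration file is otter.properties (defaults are fine for everything except otter.manager.address)
## otter arbitrate & node connect manager config: change this to the correct manager address
otter.manager.address = 127.0.0.1:1099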

 

 

Start the node

cd NODE_HOME/bin

./startup.sh

[root@v-03-01-00223 node]# more node.log
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=96m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
2018-07-03 15:13:09.364 [main] INFO com.alibaba.otter.node.deployer.OtterLauncher - INFO ## the otter server is running now ......

Now check Machine Management in the manager console again; the machine status shows as started, as below:

 


  

 

The position can be obtained by running show master status and select unix_timestamp() on the master.
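
Putting those two results into the JSON that canal's position field expects can be sketched as below. build_position is a hypothetical helper. The field names come from the position samples in these notes, and the conversion to milliseconds is an inference from the 13-digit timestamps in those samples (unix_timestamp() returns seconds), so treat it as an assumption.

```python
import json

def build_position(journal_name, position, unix_ts_seconds):
    """Build the JSON text for canal's custom position field."""
    return json.dumps(
        {
            "journalName": journal_name,            # File column of show master status
            "position": position,                   # Position column of show master status
            "timestamp": unix_ts_seconds * 1000,    # assumed to be milliseconds
        },
        separators=(",", ":"),
    )

pos = build_position("mysql-bin.005447", 508370854, 1589970112)
```

The result has the same shape as the positions that appear in the pipeline logs earlier in these notes.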

 


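otter high availability
The open-source release has no really complete HA story. For canal handling a DB master/slave switchover, see https://www.cnblogs.com/f-zhao/p/7681960.html, which covers it thoroughly.
With semi-synchronous replication or GTID, there is no need to roll back 60s.

Master/slave switchover for a canal configured in otter relies on groupKey; I have not tested it yet and will add notes once I have.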

https://www.cnblogs.com/zhjh256/p/9261725.html

 

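Unpack the node, edit its configuration, and start it

Machine name: any name that is easy for you to remember
Machine IP: the IP of the machine where the node will be deployed; with multiple IPs, choose one to expose. (This IP is the entry point for cluster communication. Never use 127.0.0.1 in practice, or nodes on different machines cannot find each other.)
Machine port: the data communication port the node will listen on; suggested value: 2088
Download port: the data download port the node will listen on; suggested value: 9090
External IP: an external IP of the machine where the node will be deployed, allowing communication over the public network.
zookeeper cluster: to improve communication efficiency, machines in different data centers can choose the nearest zookeeper cluster.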

2020-12-08 10:19:59.881 [] WARN  com.alibaba.dubbo.remoting.transport.AbstractClient -  [DUBBO] client reconnect to 172.26.9.117:2088 find error . url: dubbo://172.26.9.117:2088/endpoint?acceptEvent.timeout=50000&client=netty&codec=dubbo&connections=30&heartbeat=60000&iothreads=4&lazy=true&payload=8388608&send.reconnect=true&serialization=java&threads=50, dubbo version: 2.5.3, current host: 10.10.30.121
com.alibaba.dubbo.remoting.RemotingException: Failed connect to server /172.26.9.117:2088 from NettyClient 10.10.30.121 using dubbo version 2.5.3, cause: Connect wait timeout: 1000ms.
        at com.alibaba.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:282) ~[dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.transport.AbstractClient$1.run(AbstractClient.java:145) ~[dubbo-2.5.3.jar:2.5.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_181]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
2020-12-08 10:20:03.880 [] WARN  com.alibaba.dubbo.remoting.transport.AbstractClient -  [DUBBO] client reconnect to 172.26.9.117:2088 find error . url: dubbo://172.26.9.117:2088/endpoint?acceptEvent.timeout=50000&client=netty&codec=dubbo&connections=30&heartbeat=60000&iothreads=4&lazy=true&payload=8388608&send.reconnect=true&serialization=java&threads=50, dubbo version: 2.5.3, current host: 10.10.30.121

 

 


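This node design exists to support multiple instances on one machine: several nodes on the same host can each be given different ports.

# nid configuration (save the sequence number obtained after adding the machine during environment preparation into the nid file under conf; e.g. the machine I added has sequence number 1)
echo 1 > conf/nid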

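manager overview and parameters

otter ships with its own manager, which reduces administrative overhead: for example, sync task configuration can be published through the manager, and status feedback from sync tasks is received there.

Sync configuration management
  1. Add data sources
  2. canal parsing configuration
  3. Add data tables
  4. Sync tasks
Sync status queries
  1. Query delay
  2. Query throughput
  3. Query sync progress
  4. Query alarms & exception logs
User permissions:
  1. ADMIN: super administrator
  2. OPERATOR: regular user; manages the sync configuration under a given sync task, adds data tables, modifies canal configuration, etc.
  3. ANONYMOUS: anonymous user; can only perform sync status queries.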
https://github.com/alibaba/otter/wiki/Adminguide

Detailed configuration parameters

 

 

 


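channel parameters
  1. Sync consistency. ==>

    Based on database re-query (put simply, force a re-query: take the pk from the binlog and read the current row from the database to sync. Useful when rewinding to a binlog from several days ago, to avoid syncing stale versions of the data)
    Based on current log changes (sync the changed column values parsed from the binlog/redolog, without re-querying the database; recommended)

    Summary:
    Database re-query looks rows up in the database based on the binlog; current log changes uses the binlog data itself. Re-query is effective when the delay is large, since it quickly brings the latest version to the target, but it puts load on the source database.

  2. Sync mode. ==> row mode, column mode.

    Row mode (compatible with otter3's approach: changing any column of a record triggers a sync of the whole row, executed on the target as a merge sql)
    Column mode (syncs only the columns that actually changed according to the log)

    Summary:
    Row mode: if the record does not exist in the target, it is inserted.
    Column mode: only the column that changed is updated. For active-active sync, column mode is recommended to reduce data conflicts.
    Active-active (dual-A): two masters that may modify the same record at the same time.

  3. Enable data consistency. ==> see the data consistency document: Otter Data Consistency
    If set to yes, two more options appear:

             a.  Consistency algorithm: one-way loopback remedy [currently selected], time-intersection remedy [not yet implemented in the open-source version]

             b.  Consistency re-query delay threshold (s)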
https://github.com/alibaba/otter/wiki/Manager%E9%85%8D%E7%BD%AE%E4%BB%8B%E7%BB%8D
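
The difference between row mode and column mode is easiest to see in the statements each would run on the target. The sketch below is illustrative only: the SQL shapes are assumptions written in MySQL syntax (an upsert for the row-mode "merge sql", a targeted UPDATE for column mode), not statements taken from otter, and the function names and the table t_user are hypothetical.

```python
def row_mode_sql(table, pk, row):
    """Whole-row upsert: insert if absent, otherwise overwrite every non-key column."""
    cols = list(row)
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join(f"{c} = VALUES({c})" for c in cols if c != pk)
    return (
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )

def column_mode_sql(table, pk, changed_columns):
    """Update only the columns that actually changed."""
    sets = ", ".join(f"{c} = %s" for c in changed_columns)
    return f"UPDATE {table} SET {sets} WHERE {pk} = %s"

row = {"id": 7, "name": "otter", "city": "Hangzhou"}
row_sql = row_mode_sql("t_user", "id", row)
col_sql = column_mode_sql("t_user", "id", ["city"])
```

With two sites writing different columns of the same row, the column-mode statement leaves the other site's column untouched, which is why it reduces conflicts in active-active setups.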

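Addendum:

Business scenarios that need consistency handling:

(1) Writes in multiple locations (active-active data centers)
(2) The same record changed at the same time

Same record: a specific table, a specific pk, a specific column
Changed at the same time: the window during which data written at site A is not yet visible at site B

Why the consistency re-query delay threshold (s) is set when the one-way loopback remedy strategy is used for data consistency:
Solution:
Re-query sync (sync the latest version from the database to resolve alternating values.
For example, with the threshold set to 60 seconds, when the sync detects a delay of more than 60 seconds it re-queries the database by PK and syncs the current latest value, reducing the alternation problem)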
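
The threshold rule described above amounts to a single comparison. A minimal sketch, assuming the delay is known in seconds; pick_value and the requery callback are hypothetical stand-ins for otter's internal logic:

```python
def pick_value(binlog_value, delay_s, requery, threshold_s=60):
    """Use the binlog value when fresh; re-query by PK once the delay exceeds the threshold."""
    if delay_s > threshold_s:
        return requery()
    return binlog_value

current_db_value = "v3"   # what the source row holds right now
fresh = pick_value("v1", delay_s=5, requery=lambda: current_db_value)
stale = pick_value("v1", delay_s=90, requery=lambda: current_db_value)
```

With a 90-second delay the old binlog value "v1" is skipped in favour of the current row, which is what keeps the two sites from alternating between versions.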

https://github.com/alibaba/otter/wiki/Otter%E6%95%B0%E6%8D%AE%E4%B8%80%E8%87%B4%E6%80%A7


Sync summary (click the Channel name link in the sync list)

 

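Notes:
Delay = time the change was successfully synced to the target database - time the change was made on the source database, in seconds. (Pushed periodically by the corresponding node.)
Last sync time = time of the most recent successful sync to the target database (for the tables this sync task covers)
Last position time = time the binlog consumption position was last updated (difference from sync time: changes to other tables in the same database do not update the sync time but do update the position time)

Normally, last position time >= last sync time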

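pipeline parameters
  1. Parallelism. ==> see the Otter scheduling model document; this is the main parallel scheduling parameter (the sliding window size).

  2. Re-query threads. ==> number of concurrent threads used for database re-queries when re-query consistency is selected

  3. Data load threads. ==> number of concurrent threads used by the parallel load algorithm on the target database

  4. File load threads. ==> number of concurrent threads used when syncing data with attached files

  5. Main site. ==> the main site in active-active sync

  6. Consume batch size. ==> the batchSize parameter for fetching data from canal

  7. Batch fetch timeout. ==> the timeout parameter for fetching data from canal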

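Pipeline advanced settings
  1. [Enabled] Use batch. ==> Whether to use jdbc batch to improve efficiency; some distributed database systems may not support the batch protocol

  2. [Disabled] Skip Select exceptions
  3. [Disabled] Skip load exceptions. ==> For example, if a primary key conflict occurs on the target database during synchronization, enabling this parameter skips the database execution exception

  4. [ZOOKEEPER] Arbiter scheduling mode. ==> See the document: Otter调度模型 (Otter scheduling model)

  5. [Stick] Load balancing algorithm. ==> See the document: Otter调度模型 (Otter scheduling model)

  6. [Auto select] Transfer mode. ==> The transfer method between nodes, RPC or HTTP. HTTP mainly relies on aria2c; if aria2c is not installed in a test environment, RPC can be forced

  7. [Enabled] Record selector log. ==> Whether to record a brief log of canal fetching the binlog

  8. [Enabled] Record detailed selector log. ==> Whether to record the detailed content of the binlog data fetched by canal

  9. [Enabled] Record load log. ==> Whether to record the detailed content of the data otter synchronizes

  10. [Disabled] dryRun mode. ==> Only record the load log, without actually applying the changes to the database

  11. [Disabled] Support ddl synchronization. ==> Whether to synchronize ddl statements

  12. [Enabled] Skip ddl exceptions. ==> Whether to skip automatically when ddl synchronization fails

  13. [Disabled] File duplicate-sync comparison ==> When data is synchronized together with files, whether to compare the file information of source and target; if the file is unchanged it is not synchronized, reducing network traffic.

  14. [Disabled] File transfer encryption ==> When transferring over HTTP, whether the file data should be encrypted

  15. [Disabled] Enable public-network synchronization ==> Each node defines an external ip; if public-network synchronization is enabled, data transfer during synchronization relies on the external ip.

  16. [Enabled] ==> Customize the content of the data being synchronized

  17. [Disabled] Skip data with no record on lookup ==> Whether to ignore the case where the looked-up record does not exist; enabling this is not recommended.

  18. [Enabled] Enable table type conversion ==> When column types differ between source and target, enabling this converts the column types automatically

  19. [Enabled] Compatible column-addition synchronization ==> If a column (which must have no default value) is added on the source during synchronization while the target has not added it yet, whether to handle this compatibly

  20. Custom synchronization mark ==> Used to block synchronization in cascaded synchronization.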

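Canal parameters [Note: these mainly manage how canal connects to mysql/oracle to fetch logs; business users need not focus on them]
  1. Data source information
    Single database: 10.20.144.34:3306;
    Multiple databases merged: 10.20.144.34:3306,10.20.144.35:3306; (comma-separated)
    Master/standby databases: 10.20.144.34:3306;10.20.144.34:3307; (semicolon-separated)

  2. connectionCharset ==> The encoding specified when fetching the binlog [this differs from the database character set: for example, when the database character set is UTF8MB4, this value is still UTF-8]

  3. Custom position setting ==> Format: {"journalName":"","position":0,"timestamp":0};
    By position: {"journalName":"","position":0};
    By time: {"timestamp":0};

  4. In-memory store batch fetch mode ==> MEMSIZE/ITEMSIZE; the former limits by memory, the latter by item count. For MEMSIZE mode, memory size = record count * record unit size
    In-memory store buffer record count
    In-memory store buffer record unit size

  5. Heartbeat SQL configuration ==> A heartbeat SQL can be configured; if "enable heartbeat HA" is turned on, canal automatically performs a master/standby switch when the heartbeat SQL check fails.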
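
To make items 3 and 4 above concrete, here is a minimal sketch that builds the custom position strings and computes the MEMSIZE buffer size. The helper names are hypothetical, the binlog file name and numbers are example values only, and the millisecond unit for `timestamp` is an assumption based on the 13-digit timestamps that appear in the canal log later in these notes.

```python
import json


def position_by_file(journal_name, position):
    """Build a 'by position' setting: {"journalName":"...","position":N}."""
    return json.dumps({"journalName": journal_name, "position": position},
                      separators=(",", ":"))


def position_by_timestamp(timestamp_ms):
    """Build a 'by time' setting: {"timestamp":N} (assumed to be epoch milliseconds)."""
    return json.dumps({"timestamp": timestamp_ms}, separators=(",", ":"))


def memsize_bytes(buffer_records, record_unit_bytes):
    """MEMSIZE mode: memory size = record count * record unit size."""
    return buffer_records * record_unit_bytes


print(position_by_file("mysql-bin.000235", 4))
print(position_by_timestamp(1606133836000))
print(memsize_bytes(32768, 1024))  # example values, not necessarily your defaults
```

Using `json.dumps` rather than typing the string by hand also avoids the curly "smart quotes" that break the setting when it is copied from a web page or word processor.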

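Node parameters
  1. Machine name ==> A custom name, for easy recall

  2. Machine ip ==> An ip reachable from outside the machine; 127.0.0.1 cannot be used

  3. Machine port ==> The port for RPC communication with the manager/node

  4. Download port ==> The port for HTTP communication with other nodes

  5. External ip ==> A node machine can specify multiple IPs; the pipeline configuration decides whether it is used

  6. zookeeper cluster ==> Choose the nearest zookeeper cluster

Zookeeper cluster parameters
  1. Cluster name ==> A custom name, for easy recall

  2. zookeeper cluster ==> The list of zookeeper cluster machines, comma-separated and terminated with a semicolon

https://github.com/alibaba/otter/wiki/Manager%E9%85%8D%E7%BD%AE%E4%BB%8B%E7%BB%8D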
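
For illustration, the zookeeper cluster format described above (comma-separated, terminated with a semicolon) could be parsed like this. `parse_zk_cluster` is a hypothetical helper, the addresses are made-up examples, and the sketch assumes every entry has the form host:port.

```python
def parse_zk_cluster(text):
    """Parse 'host:port,host:port;' into a list of (host, port) tuples."""
    servers = []
    for item in text.strip().rstrip(";").split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray comma
        host, _, port = item.rpartition(":")
        servers.append((host, int(port)))
    return servers


print(parse_zk_cluster("10.20.144.51:2181,10.20.144.52:2181;"))
```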



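Mapping list [tables with lower weight are synchronized first]

Click "view" to open the mapping information page:

Notes:
Defines the source and target table information for synchronization (note: the table names may differ, and database sharding / table sharding can be defined)
Push weight (the larger the number, the later the table is synchronized; data with lower weight is synchronized first)
FileResolver (the class that resolves files associated with the data; dynamic source push is currently supported, compiled dynamically in the target jvm, so the synchronization task no longer needs to be stopped and started)
EventProcessor (a business-defined data processing class; for example, it can exclude records with status='ENABLE' from synchronization, or change the synchronized column information according to business needs. A simple business extension, new in otter4)
Column synchronization (defines the column mapping between source and target; both column names and column types can be mapped, similar to a database view definition. New in otter4)
Group synchronization (the concept of a column group: when one column in the group changes, all columns in the group, e.g. 3 columns, are synchronized to the target together. New in otter4.
When several columns determine one image path, a change to any of the file columns triggers the FileResolver class, which ensures that FileResolver can still resolve the file correctly under column-based synchronization. An important otter4 optimization)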

 

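View mapping
How to enter the view editor:

After clicking "next", the view editing page opens:

Notes:
On the mapping rule configuration page, the view mode can be set to include or exclude, meaning positive matching or exclusion.
The view configuration page only supports existing tables (the table structure must be fetched, so tables specified by a regex such as .* cannot use this feature)
In the view configuration list, the selected items on the left and right are paired in order, so columns must be selected in order when mapping.
An example:
To exclude column A of a table from synchronization, select exclude mode, then on the view editing page select column A on both the left and the right, and click save.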
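
The include/exclude behaviour above can be sketched as a small function. This is an illustration only: `columns_to_sync` is a hypothetical name, and mapping the remaining columns by identical name in exclude mode is an assumption of this sketch rather than something stated in the notes.

```python
def columns_to_sync(all_columns, selected_pairs, mode):
    """Return the (source, target) column pairs that will be synchronized.

    all_columns:    source column names of the table
    selected_pairs: (source, target) pairs chosen in the view editor, in order
    mode:           "include" or "exclude"
    """
    if mode == "include":
        # Only the selected pairs are synchronized.
        return list(selected_pairs)
    if mode == "exclude":
        # Everything except the selected source columns is synchronized.
        excluded = {source for source, _ in selected_pairs}
        return [(c, c) for c in all_columns if c not in excluded]
    raise ValueError("mode must be 'include' or 'exclude'")


# Excluding column A, as in the example above.
print(columns_to_sync(["id", "A", "B"], [("A", "A")], "exclude"))
```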


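Column group mapping

First, the requirements that call for column group synchronization.

(1) File synchronization. The image for a record may be determined by one or more columns, e.g. image_path and image_version together determine the image, so these two columns can be defined as a group; a change to any column in the group is treated as requiring file synchronization.
(2) Group synchronization of data. For example country, province and city may be three columns in the database. With dual-A synchronization, both sites may modify these columns at the same time: site A changes the country to the United States while site B changes the province to Zhejiang, and after synchronization the record ends up as United States, Zhejiang. This can be solved with a group: put country, province and city in one group, and when any column in the group changes, the remaining columns are updated together as a whole.
Now the configuration: (click "next" on the view editing page to get there)

Notes:
The view can also be left unconfigured and only the column group configured; in that case the selectable columns are all current columns (mapped by identical names).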
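
The group rule described above (a change to any column of a group pulls in the whole group) can be sketched as follows. `expand_with_groups` is a hypothetical helper, not an otter API.

```python
def expand_with_groups(changed, groups):
    """Return the set of columns to synchronize.

    changed: names of the columns that actually changed
    groups:  list of column groups; if any column of a group changed,
             every column of that group is synchronized together
    """
    changed = set(changed)
    to_sync = set(changed)
    for group in groups:
        if changed & set(group):
            to_sync |= set(group)
    return to_sync


groups = [["country", "province", "city"], ["image_path", "image_version"]]
print(sorted(expand_with_groups({"country"}, groups)))
```

In the country/province/city example, a change to country alone makes the sketch return all three columns, so the target receives a consistent triple instead of a mix of two sites' edits.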


https://github.com/alibaba/otter/wiki/%E6%98%A0%E5%B0%84%E8%A7%84%E5%88%99%E9%85%8D%E7%BD%AE





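Throughput [1 point represents 1 minute of data]

Notes:
Data record statistics (the total of insert/update/delete changes, not broken down by table; for per-table statistics, see the mapping list -> the "behavior curve" link to the right of each mapping)
File record statistics

Delay time

Notes:
Delay time = time the change was successfully synchronized to the target database - time the change was produced in the source database, in seconds. (Pushed periodically by the corresponding node)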


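Synchronization progress

Notes:
mainstem status: the node where the canal module is currently running (i.e. the node that parses the binlog; parsing is relatively heavy on jvm memory)
position status: the last binlog position successfully synchronized (includes the ip/port of the connected database, the binlog position, and the binlog change time; this time is the source change time used to compute the delay time)
Synchronization progress: each synchronization batch has a unique identifier, which can be used to locate data and to view the running time of each batch in order to find performance bottlenecks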


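Monitoring management

Notes:
Monitored items
Synchronization delay, position timeout (how long the position has gone without an update); business users usually only need to care about these
Exceptions (exceptions raised while synchronization is running; e.g. an oracle DBA cares about oracle ORA- errors, a mysql DBA cares about mysql-related exceptions)
process timeout (how long a batch of data has been executing), sync time timeout (how long data has gone without a successful synchronization)

Threshold settings
1800@09:00-18:00 : this example specifies an alarm threshold of 1800 between 9:00 and 18:00.
Recipients
otterteam is the identifier of the otter team. Alibaba uses the dragoon system internally for monitoring and alarm notification; external deployments can implement their own alarm notification mechanism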
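
To show how a threshold rule such as 1800@09:00-18:00 breaks down, here is a rough parsing sketch. The function names are hypothetical, and treating both ends of the time window as inclusive is an assumption of this sketch, not documented otter behaviour.

```python
from datetime import datetime, time


def parse_threshold(rule):
    """Split 'VALUE@HH:MM-HH:MM' into (value, start_time, end_time)."""
    value_text, _, window = rule.partition("@")
    start_text, _, end_text = window.partition("-")
    start = datetime.strptime(start_text.strip(), "%H:%M").time()
    end = datetime.strptime(end_text.strip(), "%H:%M").time()
    return int(value_text), start, end


def threshold_applies(rule, now):
    """True if the time of day 'now' falls inside the rule's window."""
    _, start, end = parse_threshold(rule)
    return start <= now <= end


print(parse_threshold("1800@09:00-18:00"))
print(threshold_applies("1800@09:00-18:00", time(12, 30)))
```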

 

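Log records

Notes:
The log title is the name defined by the corresponding monitoring rule, so log records can be searched by monitoring rule
The log content is the alarm message that was sent
Note: otter4 pushes alarms actively, which guarantees timely alarms and complete logs (compared with a log-file scanning mechanism)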

 https://github.com/alibaba/otter/wiki/Manager%E4%BD%BF%E7%94%A8%E4%BB%8B%E7%BB%8D

 
Data merge
  1. insert + insert -> insert (data migration + incremental data scenario)
  2. insert + update -> insert (the update's columns are merged into the insert)
  3. insert + delete -> delete
  4. update + insert -> insert (data migration + incremental data scenario)
  5. update + update -> update
  6. update + delete -> delete
  7. delete + insert -> insert
  8. delete + update -> update (data migration + incremental data scenario)
  9. delete + delete -> delete
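
The nine merge rules above can be written as a lookup table. The sketch below is an illustration, not otter's implementation: the event dictionaries and the `merge` function are hypothetical, and the way columns are combined (later values overwrite earlier ones, a delete carries no columns) is an assumption that generalizes rule 2.

```python
# (first type, second type) -> merged type, exactly as listed above.
MERGE_RULES = {
    ("insert", "insert"): "insert",
    ("insert", "update"): "insert",
    ("insert", "delete"): "delete",
    ("update", "insert"): "insert",
    ("update", "update"): "update",
    ("update", "delete"): "delete",
    ("delete", "insert"): "insert",
    ("delete", "update"): "update",
    ("delete", "delete"): "delete",
}


def merge(first, second):
    """Merge two consecutive events for the same table + pk into one."""
    merged_type = MERGE_RULES[(first["type"], second["type"])]
    if merged_type == "delete":
        columns = {}
    else:
        # Later column values overwrite earlier ones.
        columns = {**first["columns"], **second["columns"]}
    return {"type": merged_type, "pk": second["pk"], "columns": columns}


ins = {"type": "insert", "pk": 1, "columns": {"a": 1, "b": 2}}
upd = {"type": "update", "pk": 1, "columns": {"b": 3}}
print(merge(ins, upd))
```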
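Data load algorithm

The load algorithm uses parallel loading by pk hash plus batch merging

    1. Break up the original database transactions, preprocess the data, and merge insert/update/delete records (see the merge algorithm), then run in parallel by table + pk (for the same table, deletes run first, then insert/update, serialized to handle changes to uniquely constrained data); sql for the same table is merged into batches

    2. Table weights can be defined, and different weights support a "business-level transaction-like feature": within the parallel load there is serial control by weight.
      Business-level transaction example: in a user's payment flow, a trade record is created first and then the order status is changed to paid. The user perceives the event through the order status being paid, and then queries the trade record.
      So the synchronization can be orchestrated: synchronize the trade record first, then the order status. (Assign weights to the synchronized tables; tables with higher weight are relatively more important and are synchronized later. The result preserves business-transaction visibility: the fast wait for the slow.)

https://github.com/alibaba/otter/wiki/Otter%E6%95%B0%E6%8D%AE%E5%85%A5%E5%BA%93%E7%AE%97%E6%B3%95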
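
The ordering described above (lower weight first; within a table, deletes before insert/update) can be illustrated with a small planning function. This is a simplified sketch under stated assumptions: `plan_load` and the event dictionaries are hypothetical, the table names are made up, and the real algorithm additionally runs each step in parallel by pk hash and batches the sql, which is not modelled here.

```python
from collections import defaultdict


def plan_load(events, table_weight):
    """Order already-merged events into serial load steps.

    Returns a list of (weight, table, phase, [pks]) tuples, sorted so that
    lower-weight tables come first and, per table, deletes come first.
    """
    grouped = defaultdict(list)
    for event in events:
        phase = 0 if event["type"] == "delete" else 1
        weight = table_weight.get(event["table"], 0)
        grouped[(weight, event["table"], phase)].append(event["pk"])
    return [
        (weight, table, "delete" if phase == 0 else "insert/update", pks)
        for (weight, table, phase), pks in sorted(grouped.items())
    ]


events = [
    {"table": "orders", "type": "update", "pk": 7},
    {"table": "trade", "type": "insert", "pk": 1},
    {"table": "trade", "type": "delete", "pk": 2},
]
for step in plan_load(events, {"trade": 1, "orders": 5}):
    print(step)
```

Giving the orders table the higher weight means the trade record is loaded before the order status, matching the payment example above.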

 

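Data is kept on-the-fly and avoids hitting disk where possible, for faster synchronization. (With the node loadBalancer algorithm enabled, if the S and ETL stages land on different nodes, the data goes through a network transfer);

Nodes support failover / loadBalancer.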
https://github.com/alibaba/otter/wiki/Introduction

 

otter study notes (9): handling common errors

I. binlog files purged: Could not find first log file name in binary log index file
1. Error log

2020-11-30 15:23:07.663 [destination = pathfinderpro , address = rm-bp1lc206u47icpvi5.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2020-11-30 15:23:07.665 [destination = pathfinderpro , address = rm-bp1lc206u47icpvi5.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just last position
 {"identity":{"slaveId":-1,"sourceAddress":{"address":"rm-bp1lc206u47icpvi5.mysql.rds.aliyuncs.com","port":3306}},"postion":{"gtid":"","included":false,"journalName":"mysql-bin.009472","position":95437745,"serverId":2339357115,"timestamp":1606133836000}}
2020-11-30 15:23:07.712 [destination = pathfinderpro , address = rm-bp1lc206u47icpvi5.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-bin.009472,position=95437745,serverId=2339357115,gtid=,timestamp=1606133836000] cost : 49ms , the next step is binlog dump
2020-11-30 15:23:07.736 [destination = pathfinderpro , address = rm-bp1lc206u47icpvi5.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] ERROR c.a.o.canal.parse.inbound.mysql.dbsync.DirectLogFetcher - I/O error while reading from client socket
java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = Could not find first log file name in binary log index file
        at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.DirectLogFetcher.fetch(DirectLogFetcher.java:102) ~[canal.parse-1.1.3.jar:na]
        at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:235) [canal.parse-1.1.3.jar:na]
        at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3.run(AbstractEventParser.java:257) [canal.parse-1.1.3.jar:na]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
2020-11-30 15:23:07.736 [destination = pathfinderpro , address = rm-bp1lc206u47icpvi5.mysql.rds.aliyuncs.com/10.10.0.238:3306 , EventParser] ERROR c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - dump address rm-bp1lc206u47icpvi5.mysql.rds.aliyuncs.com/10.10.0.238:3306 has an error, retrying. caused by
java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = Could not find first log file name in binary log index file

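2. Symptom
On the synchronization progress tab, the pipeline's mainstem status stays in the "locating" state

3. Troubleshooting
This error usually occurs because an operations engineer purged the database binlog files by mistake. Follow these steps to confirm whether the binlog files were purged:
First, check the binlog position currently being synchronized:
Next, log in to the database and check the binlog file information (query: show master logs;):
Comparing the two shows that the database binlog was purged. Before the purge, the position recorded by otter was in the file mysql-bin.000245; after the purge, mysql started generating binlog files from 1 again, so otter synchronization failed

4. Resolution:
1. Clear otter's synchronization information
2. Check canal's synchronization position configuration
3. Restart the otter synchronization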
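
The comparison in the troubleshooting step above boils down to checking whether the binlog file recorded in otter's position still appears in the output of show master logs. A minimal sketch, assuming you have already collected the Log_name column into a list (the function name and the sample file names are hypothetical):

```python
def binlog_still_available(recorded_journal, master_logs):
    """True if the binlog file otter last recorded is still listed on the server.

    recorded_journal: journalName from otter's position, e.g. "mysql-bin.000245"
    master_logs:      Log_name values returned by `show master logs;`
    """
    return recorded_journal in master_logs


# After a purge the server only lists freshly generated files.
print(binlog_still_available("mysql-bin.000245",
                             ["mysql-bin.000001", "mysql-bin.000002"]))
```

If this returns False, the recorded position can no longer be served by the database, which is consistent with the errno = 1236 error in the log above.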


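II. A large mysql transaction makes otter appear hung
1. Error log
No error log

2. Symptom
The channel status is normal and the mainstem status is "working", but the position information has not been updated for a long time (more than half an hour)

3. How to confirm it is a large transaction
First, log in to the corresponding database and check which binlog file the database is currently writing
Query: show master logs;
Then inspect the binlog file where the position is stuck, to determine whether a large transaction was produced
Query: show binlog events in 'name of the binlog to inspect'; (for example: show binlog events in 'mysql-bin.000235';)
(Querying the binlog file mysql-bin.000235 showed that a table switch on the source database had produced a large transaction)

4. Solution
Starting from the stuck position, query the binlog forward until you find the position where the large transaction ends. Then manually update canal's position information, delete the synchronization records, and restart the channel
a. Clear otter's synchronization information
b. Check canal's synchronization position configuration
c. Restart the otter synchronization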
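
Finding where the large transaction ends can be done by scanning the rows returned by show binlog events for the first commit at or after the stuck position. The sketch below assumes the rows have been loaded as dictionaries keyed by the standard column names (Pos, Event_type, End_log_pos, Info); `find_transaction_end` is a hypothetical helper and the sample rows are invented.

```python
def find_transaction_end(events, stalled_pos):
    """Return the End_log_pos of the first commit at or after stalled_pos.

    A transaction ends with an Xid event (InnoDB) or a Query event whose
    Info is COMMIT. Returns None if the transaction does not end in this
    binlog file, in which case the next file has to be scanned as well.
    """
    for event in events:
        if event["Pos"] < stalled_pos:
            continue
        is_xid = event["Event_type"] == "Xid"
        is_commit_query = (event["Event_type"] == "Query"
                           and event["Info"].strip().upper() == "COMMIT")
        if is_xid or is_commit_query:
            return event["End_log_pos"]
    return None


rows = [
    {"Pos": 4, "Event_type": "Format_desc", "End_log_pos": 123, "Info": ""},
    {"Pos": 123, "Event_type": "Query", "End_log_pos": 200, "Info": "BEGIN"},
    {"Pos": 200, "Event_type": "Write_rows", "End_log_pos": 900, "Info": ""},
    {"Pos": 900, "Event_type": "Xid", "End_log_pos": 931, "Info": "COMMIT /* xid=42 */"},
]
print(find_transaction_end(rows, 200))
```

The returned value is a candidate for the position field of canal's custom position setting; verify it against the actual binlog before restarting the channel.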

https://blog.csdn.net/u014355034/article/details/87974990

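otter exception: zookeeper re-initialization

# Confirmed with the operations team that zookeeper in the test environment did have a problem the day before (it restarted abnormally); and after logging in to zookeeper and looking for the path from the exception, the path indeed did not exist
At this point the idea was: can the mysql binlog file and position that otter replicates from be reconfigured so that it resumes synchronization?
But the web management page could not be opened, so nothing could be configured. Eventually a similar problem was found among the issues of the open-source otter project on github:
https://github.com/alibaba/otter/issues/88

The otter developer replied:
otter stores some node information in zookeeper. After replacing zookeeper, the node data must be copied over, or the contents of the channel, pipeline and related tables in the database must be deleted
Alternatively, visit http://域名/system_reduction.htm (域名 = the manager's domain name) and click the one-click repair button

Trying the repair via the link below:
Visit http://otter_manager_ip:port/system_reduction.htm; a "一键补全" (one-click completion) button appears on the page; click it

Then change the journalName and position of every canal to a relatively recent position from before the problem occurred
That completes replacing or re-initializing zookeeper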

http://blog.itpub.net/27000195/viewspace-2099256/

webx issue collection
https://www.iteye.com/blog/welision-998812

otter issue #88: "After I switched zookeeper to a cluster, it no longer starts; please take a look"
https://github.com/alibaba/otter/issues/88



[todo] Error in the manager log:

2020-05-18 21:47:13.098 [] WARN  org.eclipse.jetty.servlet.ServletHandler - /URL=channelList.htm
com.alibaba.citrus.service.pipeline.PipelineException: Failed to invoke Valve[#3/3, level 3]: com.alibaba.citrus.turbine.pipeline.valve.RenderTemplateValve#747f6c5a:RenderTemplateValve
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:161) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.PerformScreenValve.invoke(PerformScreenValve.java:80) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.PerformActionValve.invoke(PerformActionValve.java:73) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invoke(PipelineImpl.java:210) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.valve.ChooseValve.invoke(ChooseValve.java:98) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invoke(PipelineImpl.java:210) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.valve.LoopValve.invokeBody(LoopValve.java:105) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.valve.LoopValve.invoke(LoopValve.java:83) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.CheckCsrfTokenValve.invoke(CheckCsrfTokenValve.java:123) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.otter.manager.web.webx.valve.AuthContextValve.invoke(AuthContextValve.java:117) ~[manager.web-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.AnalyzeURLValve.invoke(AnalyzeURLValve.java:126) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.SetLoggingContextValve.invoke(SetLoggingContextValve.java:66) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.PrepareForTurbineValve.invoke(PrepareForTurbineValve.java:52) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invoke(PipelineImpl.java:210) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.impl.WebxControllerImpl.service(WebxControllerImpl.java:43) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.impl.WebxRootControllerImpl.handleRequest(WebxRootControllerImpl.java:53) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.support.AbstractWebxRootController.service(AbstractWebxRootController.java:165) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.servlet.WebxFrameworkFilter.doFilter(WebxFrameworkFilter.java:152) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.servlet.FilterBean.doFilter(FilterBean.java:147) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) ~[jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at com.alibaba.citrus.webx.servlet.SetLoggingContextFilter.doFilter(SetLoggingContextFilter.java:61) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.servlet.FilterBean.doFilter(FilterBean.java:147) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) ~[jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) [jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:559) [jetty-security-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) [jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.Server.handle(Server.java:365) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) [jetty-http-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) [jetty-http-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627) [jetty-io-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51) [jetty-io-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) [jetty-util-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) [jetty-util-8.1.7.v20120910.jar:8.1.7.v20120910]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: com.alibaba.citrus.service.template.TemplateNotFoundException: Could not find template "/screen/url=channelList"
        at com.alibaba.citrus.service.template.impl.TemplateServiceImpl.findTemplate(TemplateServiceImpl.java:279) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.template.impl.TemplateServiceImpl.writeTo(TemplateServiceImpl.java:224) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.RenderTemplateValve.renderTemplate(RenderTemplateValve.java:104) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.RenderTemplateValve.invoke(RenderTemplateValve.java:67) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        ... 53 common frames omitted

 

 

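A strange alarm: pid:321 delay_time:56792 seconds, but delay 1815 seconds no update #816
For a database like this that is updated infrequently, the problem did not recur after canal's heartbeat was enabled. So what conditions are required to enable canal's heartbeat?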
https://github.com/alibaba/otter/issues/816



[todo] 

2020-05-21 13:28:07.501 [] ERROR c.a.o.m.biz.config.pipeline.impl.PipelineServiceImpl - ERROR ## query pipelines has an exception!
2020-05-21 14:14:01.734 [] WARN  c.a.o.s.a.i.setl.zookeeper.termin.WarningTerminProcess - nid:null[1:channel:java.util.concurrent.ExecutionException: java.lang.StackOverflowError
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at com.alibaba.otter.shared.arbitrate.impl.manage.ChannelArbitrateEvent.termin(ChannelArbitrateEvent.java:273)
        at com.alibaba.otter.shared.arbitrate.impl.manage.ChannelArbitrateEvent.stop(ChannelArbitrateEvent.java:147)
        at com.alibaba.otter.shared.arbitrate.impl.manage.ChannelArbitrateEvent.stop(ChannelArbitrateEvent.java:133)
        at com.alibaba.otter.manager.biz.config.channel.impl.ChannelServiceImpl$3.doInTransactionWithoutResult(ChannelServiceImpl.java:433)
        at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
        at com.alibaba.otter.manager.biz.config.channel.impl.ChannelServiceImpl.switchChannelStatus(ChannelServiceImpl.java:378)
        at com.alibaba.otter.manager.biz.config.channel.impl.ChannelServiceImpl.stopChannel(ChannelServiceImpl.java:458)
        at com.alibaba.otter.manager.biz.monitor.impl.RestartAlarmRecovery.processRecovery(RestartAlarmRecovery.java:102)
        at com.alibaba.otter.manager.biz.monitor.impl.RestartAlarmRecovery.access$100(RestartAlarmRecovery.java:44)
        at com.alibaba.otter.manager.biz.monitor.impl.RestartAlarmRecovery$1.run(RestartAlarmRecovery.java:137)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.StackOverflowError
        at com.alibaba.fastjson.JSONException.<init>(JSONException.java:33)
        at com.alibaba.fastjson.parser.DefaultJSONParser.parseObject(DefaultJSONParser.java:646)
        at com.alibaba.fastjson.JSON.parseObject(JSON.java:350)
        at com.alibaba.fastjson.JSON.parseObject(JSON.java:318)
        at com.alibaba.fastjson.JSON.parseObject(JSON.java:281)
        at com.alibaba.fastjson.JSON.parseObject(JSON.java:381)
        at com.alibaba.fastjson.JSON.parseObject(JSON.java:361)
        at com.alibaba.otter.shared.common.utils.JsonUtils.unmarshalFromByte(JsonUtils.java:55)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.NormalTerminProcess.processDelete(NormalTerminProcess.java:153)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.NormalTerminProcess.doProcess(NormalTerminProcess.java:135)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.NormalTerminProcess.process(NormalTerminProcess.java:63)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.ErrorTerminProcess.processChain(ErrorTerminProcess.java:86)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.ErrorTerminProcess.processChain(ErrorTerminProcess.java:96)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.ErrorTerminProcess.processChain(ErrorTerminProcess.java:96)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.ErrorTerminProcess.processChain(ErrorTerminProcess.java:96)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.ErrorTerminProcess.processChain(ErrorTerminProcess.java:96)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.ErrorTerminProcess.processChain(ErrorTerminProcess.java:96)
        at com.alibaba.otter.shared.arbitrate.impl.setl.zookeeper.termin.ErrorTerminProcess.processChain(ErrorTerminProcess.java:96)

 

[todo]

2020-05-22 09:07:54.358 [DubboClientReconnectTimer-thread-1] WARN  com.alibaba.dubbo.remoting.transport.AbstractClient -  [DUBBO] client reconnect to 192.168.105.4:1099 find error . url: dubbo://192.168.105.4:1099/endpoint?acceptEvent.timeout=50000&client=netty&codec=dubbo&connections=30&heartbeat=60000&iothreads=4&lazy=true&payload=8388608&send.reconnect=true&serialization=java&threads=50, dubbo version: 2.5.3, current host: 172.16.2.103
com.alibaba.dubbo.remoting.RemotingException: client(url: dubbo://192.168.105.4:1099/endpoint?acceptEvent.timeout=50000&client=netty&codec=dubbo&connections=30&heartbeat=60000&iothreads=4&lazy=true&payload=8388608&send.reconnect=true&serialization=java&threads=50) failed to connect to server /192.168.105.4:1099, error message is:Connection refused
        at com.alibaba.dubbo.remoting.transport.netty.NettyClient.doConnect(NettyClient.java:124) ~[dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:280) ~[dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.transport.AbstractClient$1.run(AbstractClient.java:145) ~[dubbo-2.5.3.jar:2.5.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_181]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_181]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[na:1.8.0_181]
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384) ~[netty-3.2.2.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354) ~[netty-3.2.2.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276) ~[netty-3.2.2.Final.jar:na]
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) ~[netty-3.2.2.Final.jar:na]
        ... 3 common frames omitted
2020-05-22 09:07:54.358 [DubboClientReconnectTimer-thread-2] WARN  com.alibaba.dubbo.remoting.transport.AbstractClient -  [DUBBO] client reconnect to 192.168.105.4:1099 find error . url: dubbo://192.168.105.4:1099/endpoint?acceptEvent.timeout=50000&client=netty&codec=dubbo&connections=30&heartbeat=60000&iothreads=4&lazy=true&payload=8388608&send.reconnect=true&serialization=java&threads=50, dubbo version: 2.5.3, current host: 172.16.2.103
com.alibaba.dubbo.remoting.RemotingException: client(url: dubbo://192.168.105.4:1099/endpoint?acceptEvent.timeout=50000&client=netty&codec=dubbo&connections=30&heartbeat=60000&iothreads=4&lazy=true&payload=8388608&send.reconnect=true&serialization=java&threads=50) failed to connect to server /192.168.105.4:1099, error message is:Connection refused
        at com.alibaba.dubbo.remoting.transport.netty.NettyClient.doConnect(NettyClient.java:124) ~[dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:280) ~[dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.transport.AbstractClient$1.run(AbstractClient.java:145) ~[dubbo-2.5.3.jar:2.5.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_181]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_181]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_181]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[na:1.8.0_181]
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384) ~[netty-3.2.2.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354) ~[netty-3.2.2.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276) ~[netty-3.2.2.Final.jar:na]
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) ~[netty-3.2.2.Final.jar:na]
        ... 3 common frames omitted
2020-05-22 09:30:28.339 [DubboServerHandler-127.0.0.1:2088-thread-48] WARN  c.a.o.shared.arbitrate.impl.setl.monitor.MainstemMonitor - mainstem is not run any in node
2020-05-22 10:26:55.574 [DubboServerHandler-127.0.0.1:2088-thread-48] WARN  c.a.o.shared.arbitrate.impl.setl.monitor.MainstemMonitor - mainstem is not run any in node
2020-05-22 10:55:00.020 [DubboServerHandler-127.0.0.1:2088-thread-48] WARN  c.a.o.shared.arbitrate.impl.setl.monitor.MainstemMonitor - mainstem is not run any in node


[todo]

2020-05-22 08:47:39.147 [] WARN  org.eclipse.jetty.servlet.ServletHandler - /log_record_list.htm
com.alibaba.citrus.service.pipeline.PipelineException: Failed to invoke Valve[#2/3, level 3]: com.alibaba.citrus.turbine.pipeline.valve.PerformTemplateScreenValve#686cf8ad:PerformTemplateScreenValve
       at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:161) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.PerformActionValve.invoke(PerformActionValve.java:73) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invoke(PipelineImpl.java:210) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.valve.ChooseValve.invoke(ChooseValve.java:98) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invoke(PipelineImpl.java:210) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.valve.LoopValve.invokeBody(LoopValve.java:105) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.valve.LoopValve.invoke(LoopValve.java:83) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.CheckCsrfTokenValve.invoke(CheckCsrfTokenValve.java:123) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.otter.manager.web.webx.valve.AuthContextValve.invoke(AuthContextValve.java:117) ~[manager.web-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.AnalyzeURLValve.invoke(AnalyzeURLValve.java:126) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.SetLoggingContextValve.invoke(SetLoggingContextValve.java:66) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.PrepareForTurbineValve.invoke(PrepareForTurbineValve.java:52) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invoke(PipelineImpl.java:210) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.impl.WebxControllerImpl.service(WebxControllerImpl.java:43) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.impl.WebxRootControllerImpl.handleRequest(WebxRootControllerImpl.java:53) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.support.AbstractWebxRootController.service(AbstractWebxRootController.java:165) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.servlet.WebxFrameworkFilter.doFilter(WebxFrameworkFilter.java:152) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.servlet.FilterBean.doFilter(FilterBean.java:147) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) ~[jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at com.alibaba.citrus.webx.servlet.SetLoggingContextFilter.doFilter(SetLoggingContextFilter.java:61) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.webx.servlet.FilterBean.doFilter(FilterBean.java:147) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) ~[jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) [jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:559) [jetty-security-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) [jetty-servlet-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.Server.handle(Server.java:365) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) [jetty-http-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) [jetty-http-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) [jetty-server-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627) [jetty-io-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51) [jetty-io-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) [jetty-util-8.1.7.v20120910.jar:8.1.7.v20120910]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) [jetty-util-8.1.7.v20120910.jar:8.1.7.v20120910]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: com.alibaba.citrus.webx.WebxException: Failed to execute screen: LogRecordList
        at com.alibaba.citrus.turbine.pipeline.valve.PerformScreenValve.performScreenModule(PerformScreenValve.java:126) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.PerformScreenValve.invoke(PerformScreenValve.java:74) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.pipeline.impl.PipelineImpl$PipelineContextImpl.invokeNext(PipelineImpl.java:157) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        ... 51 common frames omitted
Caused by: org.springframework.dao.DataAccessResourceFailureException: SqlMapClient operation; SQL [];
--- The error occurred while applying a parameter map.
--- Check the getLogRecordCountWithPIdAndSearchKey-InlineParameterMap.
--- Check the statement (query failed).
--- Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
    

The last packet successfully received from the server was 32,759 milliseconds ago.  The last packet sent successfully to the server was 4,486 milliseconds ago.; nested exception is com.ibatis.common.jdbc.exception.NestedSQLException:
--- The error occurred while applying a parameter map.
--- Check the getLogRecordCountWithPIdAndSearchKey-InlineParameterMap.
--- Check the statement (query failed).
--- Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 32,759 milliseconds ago.  The last packet sent successfully to the server was 4,486 milliseconds ago.
        at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:104) ~[spring-jdbc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) ~[spring-jdbc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80) ~[spring-jdbc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:203) ~[spring-orm-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.orm.ibatis.SqlMapClientTemplate.queryForObject(SqlMapClientTemplate.java:268) ~[spring-orm-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at com.alibaba.otter.manager.biz.config.record.dal.ibatis.IbatisLogRecordDAO.getCount(IbatisLogRecordDAO.java:81) ~[manager.biz-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.otter.manager.biz.config.record.impl.LogRecordServiceImpl.getCount(LogRecordServiceImpl.java:125) ~[manager.biz-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.otter.manager.web.home.module.screen.LogRecordList.execute(LogRecordList.java:53) ~[manager.web-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.otter.manager.web.home.module.screen.LogRecordList$$FastClassByCGLIB$$288abc70.invoke(<generated>) ~[cglib-nodep-2.2.jar:na]
        at net.sf.cglib.reflect.FastMethod.invoke(FastMethod.java:53) ~[cglib-nodep-2.2.jar:na]
        at com.alibaba.citrus.service.moduleloader.impl.adapter.MethodInvoker.invoke(MethodInvoker.java:70) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.service.moduleloader.impl.adapter.DataBindingAdapter.executeAndReturn(DataBindingAdapter.java:41) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        at com.alibaba.citrus.turbine.pipeline.valve.PerformScreenValve.performScreenModule(PerformScreenValve.java:111) ~[citrus-webx-all-3.2.0.jar:3.2.0]
        ... 53 common frames omitted
Caused by: com.ibatis.common.jdbc.exception.NestedSQLException:
--- The error occurred while applying a parameter map.
--- Check the getLogRecordCountWithPIdAndSearchKey-InlineParameterMap.
--- Check the statement (query failed).
--- Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 32,759 milliseconds ago.  The last packet sent successfully to the server was 4,486 milliseconds ago.    
        at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryWithCallback(MappedStatement.java:201) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryForObject(MappedStatement.java:120) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:518) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:493) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(SqlMapSessionImpl.java:106) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        at org.springframework.orm.ibatis.SqlMapClientTemplate$1.doInSqlMapClient(SqlMapClientTemplate.java:270) ~[spring-orm-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:200) ~[spring-orm-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        ... 62 common frames omitted
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 32,759 milliseconds ago.  The last packet sent successfully to the server was 4,486 milliseconds ago.
        at sun.reflect.GeneratedConstructorAccessor54.newInstance(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_181]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_181]
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3556) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3897) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1192) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) ~[commons-dbcp-1.4.jar:1.4]
        at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) ~[commons-dbcp-1.4.jar:1.4]
        at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:185) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.sqlExecuteQuery(MappedStatement.java:221) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryWithCallback(MappedStatement.java:189) ~[ibatis-sqlmap-2.3.4.726.jar:na]
        ... 68 common frames omitted
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3008) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3466) ~[mysql-connector-java-5.1.40.jar:5.1.40]
        ... 80 common frames omitted

 

 

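Exception and solution:
Background: a node with nid=5 was started, but no node with that nid had been configured on the manager.
The manager logged the following error: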

2020-05-28 15:54:24.851 [] ERROR c.a.otter.manager.biz.config.node.impl.NodeServiceImpl - ERROR ## couldn't query any node by nodeIds:[5]
2020-05-28 15:54:24.852 [] ERROR c.a.otter.manager.biz.config.node.impl.NodeServiceImpl - ERROR ## query nodes has an exception!
2020-05-28 15:54:24.853 [] ERROR c.a.o.m.b.r.interceptor.RemoteExceptionLoggerInterceptor - log exception message:
com.alibaba.otter.manager.biz.common.exceptions.ManagerException: com.alibaba.otter.manager.biz.common.exceptions.ManagerException: couldn't query any node by nodeIds:[5]
        at com.alibaba.otter.manager.biz.config.node.impl.NodeServiceImpl.listByIds(NodeServiceImpl.java:189) ~[manager.biz-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.otter.manager.biz.config.node.impl.NodeServiceImpl.findById(NodeServiceImpl.java:147) ~[manager.biz-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.otter.manager.biz.config.node.impl.NodeServiceImpl.findById(NodeServiceImpl.java:47) ~[manager.biz-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.otter.manager.biz.remote.impl.ConfigRemoteServiceImpl.onFindNode(ConfigRemoteServiceImpl.java:157) ~[manager.biz-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.otter.manager.biz.remote.impl.ConfigRemoteServiceImpl$$FastClassByCGLIB$$3f77feba.invoke(<generated>) ~[cglib-nodep-2.2.jar:na]
        at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:191) ~[cglib-nodep-2.2.jar:na]
        at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.aop.framework.adapter.ThrowsAdviceInterceptor.invoke(ThrowsAdviceInterceptor.java:124) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
        at com.alibaba.otter.manager.biz.remote.impl.ConfigRemoteServiceImpl$$EnhancerByCGLIB$$88ecacc9.onFindNode(<generated>) ~[cglib-nodep-2.2.jar:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_181]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_181]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_181]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_181]
        at com.alibaba.otter.shared.communication.core.impl.AbstractCommunicationEndpoint.acceptEvent(AbstractCommunicationEndpoint.java:72) ~[shared.communication-4.2.18-SNAPSHOT.jar:na]
        at com.alibaba.dubbo.common.bytecode.Wrapper0.invokeMethod(Wrapper0.java) [na:2.5.3]
        at com.alibaba.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:46) [dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:72) [dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.rpc.protocol.dubbo.DubboProtocol$1.reply(DubboProtocol.java:108) [dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.handleRequest(HeaderExchangeHandler.java:84) [dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:170) [dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:52) [dubbo-2.5.3.jar:2.5.3]
        at com.alibaba.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:82) [dubbo-2.5.3.jar:2.5.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: com.alibaba.otter.manager.biz.common.exceptions.ManagerException: couldn't query any node by nodeIds:[5]
        at com.alibaba.otter.manager.biz.config.node.impl.NodeServiceImpl.listByIds(NodeServiceImpl.java:173) ~[manager.biz-4.2.18-SNAPSHOT.jar:na]
        ... 27 common frames omitted

Solution:
On the manager configuration page, add a node whose sequence number (nid) is 5.

Code:

    public List<Node> listByIds(Long... identities) {
        List<Node> nodes = new ArrayList<Node>();
        try {
            List<NodeDO> nodeDos = null;
            if (identities.length < 1) {
                nodeDos = nodeDao.listAll();
                if (nodeDos.isEmpty()) {
                    logger.debug("DEBUG ## couldn't query any node, maybe hasn't create any channel.");
                    return nodes;
                }
            } else {
                nodeDos = nodeDao.listByMultiId(identities);
                if (nodeDos.isEmpty()) {
                    String exceptionCause = "couldn't query any node by nodeIds:" + Arrays.toString(identities);
                    logger.error("ERROR ## " + exceptionCause);
                    throw new ManagerException(exceptionCause);
                }
            }
            // Check each node against the live nodes registered in ZooKeeper
            List<Long> nodeIds = arbitrateManageService.nodeEvent().liveNodes();
            for (NodeDO nodeDo : nodeDos) {
                if (nodeIds.contains(nodeDo.getId())) {
                    nodeDo.setStatus(NodeStatus.START);
                } else {
                    nodeDo.setStatus(NodeStatus.STOP);
                }
            }

            nodes = doToModel(nodeDos);
        } catch (Exception e) {
            logger.error("ERROR ## query nodes has an exception!");
            throw new ManagerException(e);
        }

        return nodes;
    }

com.alibaba.otter.manager.biz.config.node.impl.NodeServiceImpl#listByIds
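
The manager log shows the message wrapped twice (ManagerException: ManagerException: couldn't query any node by nodeIds:[5]) and two ERROR lines. That appears to follow from the try/catch in listByIds: the exception thrown for the empty result is caught by the outer catch block, logged again, and wrapped in a second ManagerException. Below is a minimal, self-contained sketch of that control flow. The in-memory CONFIGURED map stands in for nodeDao, and the ManagerException here is a simplified stand-in; both are assumptions for illustration, not Otter's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeLookupSketch {

    // Hypothetical stand-in for the manager's node table: nid -> node name.
    public static final Map<Long, String> CONFIGURED = new HashMap<>();

    // Simplified stand-in for Otter's ManagerException (assumption).
    public static class ManagerException extends RuntimeException {
        public ManagerException(String message) { super(message); }
        public ManagerException(Throwable cause) { super(cause); }
    }

    public static List<String> listByIds(Long... ids) {
        List<String> nodes = new ArrayList<>();
        try {
            if (ids.length < 1) {
                // No ids given: return every configured node.
                nodes.addAll(CONFIGURED.values());
                return nodes;
            }
            for (Long id : ids) {
                String name = CONFIGURED.get(id);
                if (name != null) {
                    nodes.add(name);
                }
            }
            if (nodes.isEmpty()) {
                // First exception: none of the requested nids is configured.
                throw new ManagerException(
                    "couldn't query any node by nodeIds:" + Arrays.toString(ids));
            }
        } catch (Exception e) {
            // The outer catch also catches the exception thrown just above
            // and wraps it again, which yields the nested message in the log.
            throw new ManagerException(e);
        }
        return nodes;
    }

    public static void main(String[] args) {
        CONFIGURED.put(4L, "node-4");
        System.out.println(listByIds(4L));
        try {
            listByIds(5L); // nid 5 is not configured
        } catch (ManagerException e) {
            System.out.println("outer: " + e.getMessage());
            System.out.println("inner: " + e.getCause().getMessage());
        }
    }
}
```

With only nid 4 configured, the lookup for nid 5 takes the error branch. Once a node with that nid exists on the manager, as in the solution above, the same call returns normally.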
At the same time, the node itself also fails on startup with:

Exception in thread "main" java.lang.ExceptionInInitializerError
        at com.alibaba.otter.node.deployer.OtterLauncher.main(OtterLauncher.java:39)
Caused by: com.alibaba.otter.shared.common.model.config.ConfigException: ERROR ##
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'otterController' defined in URL [jar:file:/home/admin/node/lib/node.etl-4.2.17.jar!/spring/otter-node-common.xml]: Initialization of bean failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'arbitrateEventService' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Initialization of bean failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'extractEvent' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Cannot resolve reference to bean 'extractRpcEvent' while setting bean property 'delegate' with key [TypedStringValue: value [RPC], target type [null]]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'extractRpcEvent' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [com.alibaba.otter.shared.arbitrate.impl.setl.rpc.ExtractRpcArbitrateEvent]: Constructor threw exception; nested exception is com.google.common.collect.ComputationException: com.alibaba.otter.shared.common.model.config.ConfigException: nid:5 in manager[otter-manager.zkh-uat.svc.cluster.local:1099]is not found!
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:527)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:294)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:225)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:291)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:609)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:469)
        at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:139)
        at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:83)
        at com.alibaba.otter.node.etl.OtterContextLocator$1.<init>(OtterContextLocator.java:39)
        at com.alibaba.otter.node.etl.OtterContextLocator.<clinit>(OtterContextLocator.java:39)
        at com.alibaba.otter.node.deployer.OtterLauncher.main(OtterLauncher.java:39)
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'arbitrateEventService' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Initialization of bean failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'extractEvent' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Cannot resolve reference to bean 'extractRpcEvent' while setting bean property 'delegate' with key [TypedStringValue: value [RPC], target type [null]]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'extractRpcEvent' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [com.alibaba.otter.shared.arbitrate.impl.setl.rpc.ExtractRpcArbitrateEvent]: Constructor threw exception; nested exception is com.google.common.collect.ComputationException: com.alibaba.otter.shared.common.model.config.ConfigException: nid:5 in manager[otter-manager.zkh-uat.svc.cluster.local:1099]is not found!
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:527)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:294)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:225)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:291)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireByName(AbstractAutowireCapableBeanFactory.java:1136)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1086)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:517)
        ... 13 more
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'extractEvent' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Cannot resolve reference to bean 'extractRpcEvent' while setting bean property 'delegate' with key [TypedStringValue: value [RPC], target type [null]]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'extractRpcEvent' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [com.alibaba.otter.shared.arbitrate.impl.setl.rpc.ExtractRpcArbitrateEvent]: Constructor threw exception; nested exception is com.google.common.collect.ComputationException: com.alibaba.otter.shared.common.model.config.ConfigException: nid:5 in manager[otter-manager.zkh-uat.svc.cluster.local:1099]is not found!
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:106)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveManagedMap(BeanDefinitionValueResolver.java:378)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:161)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1360)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1118)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:517)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:294)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:225)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:291)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireByName(AbstractAutowireCapableBeanFactory.java:1136)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1086)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:517)
        ... 21 more
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'extractRpcEvent' defined in URL [jar:file:/home/admin/node/lib/shared.arbitrate-4.2.17.jar!/spring/otter-arbitrate-event.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [com.alibaba.otter.shared.arbitrate.impl.setl.rpc.ExtractRpcArbitrateEvent]: Constructor threw exception; nested exception is com.google.common.collect.ComputationException: com.alibaba.otter.shared.common.model.config.ConfigException: nid:5 in manager[otter-manager.zkh-uat.svc.cluster.local:1099]is not found!
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateBean(AbstractAutowireCapableBeanFactory.java:997)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:943)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:485)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:294)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:225)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:291)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:322)
        ... 35 more
Caused by: org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [com.alibaba.otter.shared.arbitrate.impl.setl.rpc.ExtractRpcArbitrateEvent]: Constructor threw exception; nested exception is com.google.common.collect.ComputationException: com.alibaba.otter.shared.common.model.config.ConfigException: nid:5 in manager[otter-manager.zkh-uat.svc.cluster.local:1099]is not found!
        at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:162)
        at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:76)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateBean(AbstractAutowireCapableBeanFactory.java:990)
        ... 43 more
Caused by: com.google.common.collect.ComputationException: com.alibaba.otter.shared.common.model.config.ConfigException: nid:5 in manager[otter-manager.zkh-uat.svc.cluster.local:1099]is not found!
        at com.google.common.collect.MapMaker$ComputingMapAdapter.get(MapMaker.java:889)
        at com.alibaba.otter.shared.arbitrate.impl.zookeeper.ZooKeeperClient.getInstance(ZooKeeperClient.java:61)
        at com.alibaba.otter.shared.arbitrate.impl.zookeeper.ZooKeeperClient.getInstance(ZooKeeperClient.java:54)
        at com.alibaba.otter.shared.arbitrate.impl.setl.rpc.ExtractRpcArbitrateEvent.<init>(ExtractRpcArbitrateEvent.java:45)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:147)
        ... 45 more
Caused by: com.alibaba.otter.shared.common.model.config.ConfigException: nid:5 in manager[otter-manager.zkh-uat.svc.cluster.local:1099]is not found!

 

 

2020-05-29 09:05:07.002 [DbLoadAction] ERROR com.alibaba.otter.node.etl.load.loader.db.DbLoadAction - ##load phase two failed!
com.alibaba.otter.node.etl.load.exception.LoadException: org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [insert into `zaf-security-admin2`.`t_customer_sales_force`(`f_partner` , `f_customer_id` , `f_vkorg` , `f_vtweg` , `f_spart` , `f_customer_service_staff_id` , `f_saler_id` , `f_state_code` , `f_csf_id`) values (? , ? , ? , ? , ? , ? , ? , ? , ?) on duplicate key update `f_partner`=values(`f_partner`) , `f_customer_id`=values(`f_customer_id`) , `f_vkorg`=values(`f_vkorg`) , `f_vtweg`=values(`f_vtweg`) , `f_spart`=values(`f_spart`) , `f_customer_service_staff_id`=values(`f_customer_service_staff_id`) , `f_saler_id`=values(`f_saler_id`) , `f_state_code`=values(`f_state_code`) , `f_csf_id`=values(`f_csf_id`)]; SQL state [HY000]; error code [1364]; Field 'f_cso_id' doesn't have a default value; nested exception is java.sql.SQLException: Field 'f_cso_id' doesn't have a default value
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:83)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:603)
        at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:812)
        at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:868)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$DbLoadWorker$2.doInTransaction(DbLoadAction.java:625)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$DbLoadWorker.doCall(DbLoadAction.java:617)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$DbLoadWorker.call(DbLoadAction.java:545)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction.doTwoPhase(DbLoadAction.java:462)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction.doLoad(DbLoadAction.java:275)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction.load(DbLoadAction.java:161)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$$FastClassByCGLIB$$d932a4cb.invoke(<generated>)
        at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:191)
        at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:618)
        at com.alibaba.otter.node.etl.load.loader.db.DbLoadAction$$EnhancerByCGLIB$$80fd23c2.load(<generated>)
        at com.alibaba.otter.node.etl.load.loader.db.DataBatchLoader$2.call(DataBatchLoader.java:198)
        at com.alibaba.otter.node.etl.load.loader.db.DataBatchLoader$2.call(DataBatchLoader.java:189)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Field 'f_cso_id' doesn't have a default value
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3970)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3906)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
        at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
        at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2009)
        at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5098)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1994)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.springframework.jdbc.core.JdbcTemplate$2.doInPreparedStatement(JdbcTemplate.java:818)
        at org.springframework.jdbc.core.JdbcTemplate$2.doInPreparedStatement(JdbcTemplate.java:1)
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:587)
        ... 21 more
:-----------------
- PairId: 104 , TableId: 125 , EventType : I , Time : 1590667984000
- Consistency :  , Mode :
-----------------
---Pks
        EventColumn[index=5,columnType=-5,columnName=f_csf_id,columnValue=244012,isNull=false,isKey=true,isUpdate=true]
---oldPks

---Columns
        EventColumn[index=7,columnType=12,columnName=f_partner,columnValue=A79347,isNull=false,isKey=false,isUpdate=true]
        EventColumn[index=8,columnType=-5,columnName=f_customer_id,columnValue=2138596,isNull=false,isKey=false,isUpdate=true]
        EventColumn[index=9,columnType=12,columnName=f_vkorg,columnValue=1000,isNull=false,isKey=false,isUpdate=true]
        EventColumn[index=10,columnType=12,columnName=f_vtweg,columnValue=01,isNull=false,isKey=false,isUpdate=true]
        EventColumn[index=11,columnType=12,columnName=f_spart,columnValue=02,isNull=false,isKey=false,isUpdate=true]
        EventColumn[index=12,columnType=12,columnName=f_customer_service_staff_id,columnValue=8168,isNull=false,isKey=false,isUpdate=true]
        EventColumn[index=13,columnType=12,columnName=f_saler_id,columnValue=2533,isNull=false,isKey=false,isUpdate=true]
        EventColumn[index=17,columnType=4,columnName=f_state_code,columnValue=1,isNull=false,isKey=false,isUpdate=true]
---Sql
        insert into `zaf-security-admin2`.`t_customer_sales_force`(`f_partner` , `f_customer_id` , `f_vkorg` , `f_vtweg` , `f_spart` , `f_customer_service_staff_id` ,`f_saler_id` , `f_state_code` , `f_csf_id`) values (? , ? , ? , ? , ? , ? , ? , ? , ?) on duplicate key update `f_partner`=values(`f_partner`) , `f_customer_id`=values(`f_customer_id`) , `f_vkorg`=values(`f_vkorg`) , `f_vtweg`=values(`f_vtweg`) , `f_spart`=values(`f_spart`) , `f_customer_service_staff_id`=values(`f_customer_service_staff_id`) , `f_saler_id`=values(`f_saler_id`) , `f_state_code`=values(`f_state_code`) , `f_csf_id`=values(`f_csf_id`)
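
In the log above, MySQL rejects the load with error code 1364: the generated INSERT lists nine columns, `f_cso_id` is not among them, and that column has no default value on the target table. A minimal sketch of a check that compares the column list of such a statement against a set of required columns might look like this. `InsertColumnCheck` and the `requiredColumns` argument are hypothetical helpers, not part of Otter; the required-column list is assumed to be taken by hand from the target table definition.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Otter.
final class InsertColumnCheck {

    // Returns the backquoted column names found in the first parenthesized
    // list of the statement, i.e. the column list of the INSERT.
    static List<String> insertColumns(String sql) {
        List<String> columns = new ArrayList<>();
        int open = sql.indexOf('(');
        int close = open < 0 ? -1 : sql.indexOf(')', open);
        if (open < 0 || close < 0) {
            return columns; // no column list present
        }
        Matcher m = Pattern.compile("`([^`]+)`").matcher(sql.substring(open + 1, close));
        while (m.find()) {
            columns.add(m.group(1));
        }
        return columns;
    }

    // Returns the required columns (assumed: NOT NULL without a default on the
    // target table) that the INSERT does not supply.
    static Set<String> missing(String sql, Collection<String> requiredColumns) {
        Set<String> missing = new TreeSet<>(requiredColumns);
        missing.removeAll(insertColumns(sql));
        return missing;
    }
}
```

Calling `InsertColumnCheck.missing(sql, List.of("f_csf_id", "f_cso_id"))` with the statement from the log leaves only `f_cso_id`, the column named in the error.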

 

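Channel Management  >  Pipeline Management  >  Mapping List
Within one mapping list, a single table from a given data source must not be mapped to tables in more than one data source. Otherwise the data goes wrong: rows are not synchronized correctly, yet neither the manager nor the node reports any error.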
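
Because this misconfiguration fails silently, it may be worth checking the mapping list before enabling a channel. The following is a minimal sketch of such a check, assuming the mappings have been copied out of the manager by hand. `MappingCheck`, `Mapping` and the data source names are hypothetical and do not correspond to Otter classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical checker, not part of Otter.
final class MappingCheck {

    // One row of the mapping list, reduced to the fields this check needs.
    static final class Mapping {
        final String sourceDs;
        final String sourceTable;
        final String targetDs;

        Mapping(String sourceDs, String sourceTable, String targetDs) {
            this.sourceDs = sourceDs;
            this.sourceTable = sourceTable;
            this.targetDs = targetDs;
        }
    }

    // Returns every source table ("dataSource.table") that is mapped into
    // more than one target data source, with the targets it is mapped to.
    static Map<String, Set<String>> fanOut(List<Mapping> mappings) {
        Map<String, Set<String>> targetsBySource = new TreeMap<>();
        for (Mapping m : mappings) {
            targetsBySource
                .computeIfAbsent(m.sourceDs + "." + m.sourceTable, k -> new TreeSet<>())
                .add(m.targetDs);
        }
        // Keep only the offending entries.
        targetsBySource.values().removeIf(targets -> targets.size() <= 1);
        return targetsBySource;
    }

    public static void main(String[] args) {
        List<Mapping> mappings = List.of(
            new Mapping("ds1", "t_order", "ds2"),
            new Mapping("ds1", "t_order", "ds3"), // same source table, second target
            new Mapping("ds1", "t_user", "ds2"));
        System.out.println(fanOut(mappings));
    }
}
```

An empty result means no source table fans out to several data sources; any entry in the map is a mapping to split into its own pipeline or remove.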

2020-07-31 15:59:51.251 [pipelineId = 4,taskName = ExtractWorker] ERROR com.alibaba.otter.node.etl.extract.ExtractTask - [4] extractwork executor is error! data:EtlEventData[currNid=1,nextNid=1,desc=[MemoryPipeKey[identity=Identity[channelId=4,pipelineId=4,processId=1],time=1596182391249,dataType=DB_BATCH]],processId=1,startTime=1596182391249,endTime=<null>,firstTime=1596182391000,batchId=56,number=1,size=<null>,exts=<null>,pipelineId=4]
com.alibaba.otter.node.etl.extract.exceptions.ExtractException: eventData after viewExtractor has no pks , pls check! identity:Identity[channelId=4,pipelineId=4,processId=1], new eventData:EventData[tableId=29,tableName=product,schemaName=pathfinderdb,eventType=UPDATE,executeTime=1596182391000,oldKeys=[],keys=[],columns=[EventColumn[index=118,columnType=4,columnName=isNationalSales,columnValue=1,isNull=false,isKey=false,isUpdate=true]],size=1636,pairId=-1,sql=<null>,ddlSchemaName=<null>,syncMode=<null>,syncConsistency=<null>,remedy=false,hint=<null>,withoutSchema=false]
2020-07-31 15:59:51.255 [pipelineId = 4,taskName = ExtractWorker] WARN  c.a.o.s.a.i.setl.zookeeper.termin.WarningTerminProcess - nid:1[4:setl:com.alibaba.otter.node.etl.extract.exceptions.ExtractException: eventData after viewExtractor has no pks , pls check! identity:Identity[channelId=4,pipelineId=4,processId=1], new eventData:EventData[tableId=29,tableName=product,schemaName=pathfinderdb,eventType=UPDATE,executeTime=1596182391000,oldKeys=[],keys=[],columns=[EventColumn[index=118,columnType=4,columnName=isNationalSales,columnValue=1,isNull=false,isKey=false,isUpdate=true]],size=1636,pairId=-1,sql=<null>,ddlSchemaName=<null>,syncMode=<null>,syncConsistency=<null>,remedy=false,hint=<null>,withoutSchema=false]
]

 


Otter reports an error when a table being synchronized has no primary key

                if (CollectionUtils.isEmpty(eventData.getKeys())) { // no primary key: raise an error
                    throw new ExtractException(
                                               String.format("eventData after viewExtractor has no pks , pls check! identity:%s, new eventData:%s",
                                                             dbBatch.getRowBatch().getIdentity().toString(),
                                                             eventData.toString()));
                }
com.alibaba.otter.node.etl.extract.extractor.ViewExtractor#extract


Otter's storage paths in ZooKeeper

com.alibaba.otter.shared.arbitrate.impl.ArbitrateConstants

public interface ArbitrateConstants {

    /**
     * Root node of otter
     */
    public String NODE_OTTER_ROOT         = "/otter";

    /**
     * Root node for otter node machines
     */
    public String NODE_NID_ROOT           = NODE_OTTER_ROOT + "/node";

    /**
     * Format of an otter node's path; takes nid as the parameter
     */
    public String NODE_NID_FORMAT         = NODE_NID_ROOT + "/{0}";

    /**
     * Root node for otter channels
     */
    public String NODE_CHANNEL_ROOT       = NODE_OTTER_ROOT + "/channel";

    /**
     * Format of a channel node's path; takes channelId as the parameter
     */
    public String NODE_CHANNEL_FORMAT     = NODE_CHANNEL_ROOT + "/{0}";

    /**
     * Format of a pipeline node's path; takes channelId and pipelineId as parameters
     */
    public String NODE_PIPELINE_FORMAT    = NODE_CHANNEL_FORMAT + "/{1}";

    /**
     * Root node for otter remedy
     */
    public String NODE_REMEDY_ROOT        = NODE_PIPELINE_FORMAT + "/remedy";

    /**
     * Root node for otter processes
     */
    public String NODE_PROCESS_ROOT       = NODE_PIPELINE_FORMAT + "/process";

    /**
     * Format of a process node's path; takes channelId, pipelineId and processId as parameters
     */
    public String NODE_PROCESS_FORMAT     = NODE_PROCESS_ROOT + "/{2}";

    /**
     * Root node for otter termin signals
     */
    public String NODE_TERMIN_ROOT        = NODE_PIPELINE_FORMAT + "/termin";

    /**
     * Format of a termin node's path; takes channelId, pipelineId and processId as parameters
     */
    public String NODE_TERMIN_FORMAT      = NODE_TERMIN_ROOT + "/{2}";

    /**
     * Root node for otter locks
     */
    public String NODE_LOCK_ROOT          = NODE_PIPELINE_FORMAT + "/lock";

    /**
     * Lock node for otter load
     */
    public String NODE_LOCK_LOAD          = "load";

    /**
     * Status node of the mainstem (leading) thread; a child of the pipeline node
     */
    public String NODE_MAINSTEM           = "mainstem";

    /**
     * Node marking select as completed; a child of the process node
     */
    public String NODE_SELECTED           = "selected";

    /**
     * Node marking extract as completed; a child of the process node
     */
    public String NODE_EXTRACTED          = "extracted";

    /**
     * Node marking transform as completed; a child of the process node
     */
    public String NODE_TRANSFORMED        = "transformed";

    /**
     * Node marking load as completed; a child of the process node
     */
    public String NODE_LOADED             = "loaded";

    /**
     * Key defined in the logback configuration file for writing a separate log file per pipeline.
     */
    public String splitPipelineLogFileKey = "otter";
}
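
The `{0}`, `{1}` and `{2}` placeholders in the constants above follow the `java.text.MessageFormat` convention. A minimal sketch of expanding them into concrete ZooKeeper paths, assuming `MessageFormat` is how the templates are filled in, might look like this. `OtterZkPaths` and its methods are hypothetical helpers written for illustration; only the template strings are taken from the constants.

```java
import java.text.MessageFormat;

// Hypothetical helper, not part of Otter: expands the path templates
// copied from ArbitrateConstants.
final class OtterZkPaths {

    static final String NODE_NID_FORMAT      = "/otter/node/{0}";
    static final String NODE_PIPELINE_FORMAT = "/otter/channel/{0}/{1}";
    static final String NODE_PROCESS_FORMAT  = NODE_PIPELINE_FORMAT + "/process/{2}";
    static final String NODE_MAINSTEM        = "mainstem";

    // Ids are passed as strings on purpose: MessageFormat applies locale
    // number formatting to numeric arguments (1000 can become "1,000").
    static String nodePath(long nid) {
        return MessageFormat.format(NODE_NID_FORMAT, String.valueOf(nid));
    }

    static String pipelinePath(long channelId, long pipelineId) {
        return MessageFormat.format(NODE_PIPELINE_FORMAT,
                                    String.valueOf(channelId),
                                    String.valueOf(pipelineId));
    }

    static String processPath(long channelId, long pipelineId, long processId) {
        return MessageFormat.format(NODE_PROCESS_FORMAT,
                                    String.valueOf(channelId),
                                    String.valueOf(pipelineId),
                                    String.valueOf(processId));
    }

    // The mainstem node is described above as a child of the pipeline node.
    static String mainstemPath(long channelId, long pipelineId) {
        return pipelinePath(channelId, pipelineId) + "/" + NODE_MAINSTEM;
    }

    public static void main(String[] args) {
        System.out.println(nodePath(5));          // /otter/node/5
        System.out.println(processPath(4, 4, 1)); // /otter/channel/4/4/process/1
        System.out.println(mainstemPath(4, 4));   // /otter/channel/4/4/mainstem
    }
}
```

Paths built this way can be inspected with any ZooKeeper client when checking whether a node or process is registered.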

 

 

Remote debugging of Otter
https://www.cnblogs.com/yanshaoshuai/p/12060938.html

Cloning and building otter
https://www.cnblogs.com/yanshaoshuai/p/12060468.html

Introduction to and usage of Canal and Otter
https://www.cnblogs.com/yanshaoshuai/p/11987253.html

Canal and Otter discussion, part 2 (principles and practice)
https://www.cnblogs.com/yanshaoshuai/p/11987281.html




 

posted @ 2020-05-15 22:10  沧海一滴