[Storm] java.io.FileNotFoundException: File '../stormconf.ser' does not exist

Contents

Background
Root cause
Possible fixes

This bug will kill supervisors

Affects Version/s: 0.9.2-incubating, 0.9.3, 0.9.4

Fix Version/s: 0.10.0, 0.9.5

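Background

Recently I noticed that on a freshly built Storm cluster, more than half of the Supervisors had quietly died not long after startup. The logs of the dead Supervisors showed the exception java.io.FileNotFoundException: File '../stormconf.ser' does not exist. Most of the answers online say:

Empty the directory configured as storm.local.dir and restart.

But this treats the symptom, not the cause. A restart gets things running again, yet it does not explain why the problem occurs in the first place.

I later found that the JIRA issue STORM-130 addresses this problem. Its steps to reproduce are: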

1) Run a Storm cluster with at least 2 supervisors, each with 4 slots.
2) Deploy a topology that uses 4 workers; it will be distributed so that each supervisor runs two workers.
3) Kill one of the supervisors, say supervisor1.
4) Wait until the topology rebalances to occupy 4 workers on supervisor2.
5) Bring supervisor1 back up; it goes through the cycle of cleaning up old topology code.
6) Nimbus rebalances the topology, which triggers the supervisor's sync-processes function.
7) sync-processes tries to launch a worker for the topology whose code was deleted when the supervisor started, causing it to throw the FileNotFoundException shown in the title.


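Root cause

The sync-processes function mentioned in the scenario above is one of the functions the supervisor runs. The Supervisor runs these two functions in the background: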

synchronize-supervisor: This is called whenever assignments in Zookeeper change and also every 10 seconds.
- Downloads code from Nimbus for topologies assigned to this machine for which it doesn't have the code yet.
- Writes into local filesystem what this node is supposed to be running. It writes a map from port -> LocalAssignment. LocalAssignment contains a topology id as well as the list of task ids for that worker.
sync-processes: Reads from the LFS what synchronize-supervisor wrote and compares that to what's actually running on the machine. It then starts/stops worker processes as necessary to synchronize.
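
To make the bookkeeping concrete, here is a minimal Python sketch of the comparison that sync-processes is described as performing. The `LocalAssignment` fields follow the description above; the `plan_sync` helper, the port numbers, and the topology ids are made up for illustration, and this is not Storm's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class LocalAssignment:
    """What one worker slot (port) is supposed to run."""
    topology_id: str
    task_ids: Tuple[int, ...]


def plan_sync(assigned: Dict[int, LocalAssignment],
              running: Dict[int, LocalAssignment]) -> Tuple[Set[int], Set[int]]:
    """Compare the port -> LocalAssignment map written by synchronize-supervisor
    with what is actually running, and return (ports_to_start, ports_to_stop).
    A port whose assignment changed shows up in both sets (stop, then start)."""
    to_stop = {port for port, la in running.items() if assigned.get(port) != la}
    to_start = {port for port, la in assigned.items() if running.get(port) != la}
    return to_start, to_stop


# Hypothetical state: ports and topology ids are illustrative only.
assigned = {
    6700: LocalAssignment("topo-1", (1, 2)),
    6701: LocalAssignment("topo-1", (3, 4)),
}
running = {
    6701: LocalAssignment("topo-1", (3, 4)),
    6702: LocalAssignment("topo-0", (1,)),
}

to_start, to_stop = plan_sync(assigned, running)
print(sorted(to_start), sorted(to_stop))  # [6700] [6702]
```

Note that the plan is computed from a snapshot of the local state, so by the time a worker is actually launched the state may already have moved on.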

As the description shows, synchronize-supervisor and sync-processes coordinate through the local filesystem (LFS). The key reason for the bug is that the two run asynchronously in separate threads: synchronize-supervisor is responsible for downloading and removing files, while sync-processes is responsible for starting worker processes.

synchronize-supervisor reads the assignment information from ZooKeeper, downloads the necessary files from Nimbus, and writes the local state. In another thread, sync-processes reads that local state in order to launch a worker process. If synchronize-supervisor is called again before the worker process has started, and the topology's assignment has changed in the meantime (because of a rebalance, a worker timeout, etc.) so that the worker assigned to this supervisor has moved to another supervisor, synchronize-supervisor removes the now-unnecessary files (the jar file, the ser file, etc.). When the worker launched by sync-processes then starts, the ser file no longer exists, and the exception occurs.
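
The interleaving described above can be replayed deterministically in a few lines of Python. This is only a sketch under stated assumptions: the directory layout, the in-memory `state` dict standing in for the local state, and all function names are hypothetical stand-ins, and the two "threads" are simply called in the unlucky order rather than actually run concurrently.

```python
import os
import shutil
import tempfile


def synchronize_supervisor(local_dir, assigned_topologies, state):
    """Stand-in for synchronize-supervisor: fetch code for assigned topologies,
    delete code for topologies no longer assigned, then record local state."""
    code_root = os.path.join(local_dir, "code")  # hypothetical layout
    os.makedirs(code_root, exist_ok=True)
    for topo in assigned_topologies:
        topo_dir = os.path.join(code_root, topo)
        if not os.path.isdir(topo_dir):
            os.makedirs(topo_dir)
            # Stand-in for downloading the topology files from Nimbus.
            with open(os.path.join(topo_dir, "stormconf.ser"), "wb") as f:
                f.write(b"serialized-conf")
    for topo in os.listdir(code_root):
        if topo not in assigned_topologies:
            shutil.rmtree(os.path.join(code_root, topo))
    state["assigned"] = set(assigned_topologies)


def launch_worker(local_dir, topo):
    """Stand-in for starting a worker: it needs the topology's stormconf.ser."""
    path = os.path.join(local_dir, "code", topo, "stormconf.ser")
    with open(path, "rb") as f:
        return f.read()


def simulate_race():
    local_dir = tempfile.mkdtemp()
    state = {}
    try:
        # 1. The assignment arrives; code is downloaded and state is written.
        synchronize_supervisor(local_dir, {"topo-1"}, state)
        # 2. sync-processes takes a snapshot of the local state...
        to_launch = set(state["assigned"])
        # 3. ...but before the worker starts, the assignment moves elsewhere
        #    and synchronize-supervisor removes the topology's files.
        synchronize_supervisor(local_dir, set(), state)
        # 4. The worker launch now works from a stale snapshot.
        try:
            for topo in to_launch:
                launch_worker(local_dir, topo)
        except FileNotFoundError:
            return "FileNotFoundError"
        return "launched"
    finally:
        shutil.rmtree(local_dir, ignore_errors=True)


print(simulate_race())  # FileNotFoundError
```

If step 3 is dropped, or moved after step 4, the same code reports `launched`, which is why the failure only shows up when a reassignment lands inside that window.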


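Possible fixes

Switch to a Storm release that includes the fix (0.9.5 or 0.10.0, per the Fix Version/s above)
Tune parameters
- Change the synchronize-supervisor thread's loop interval to something longer than the default 10 sec, such as 30 sec.
- supervisor.worker.timeout.secs: 30 -> 5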
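
For the second parameter, the change would presumably go into storm.yaml on each supervisor node, followed by a supervisor restart. The fragment below is a sketch only: the value is the one suggested above, and whether it suits a given cluster has not been verified here.

```yaml
# storm.yaml on each supervisor node
# Lowered from 30 to 5, as suggested above.
supervisor.worker.timeout.secs: 5
```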

References:

https://issues.apache.org/jira/browse/STORM-130
http://storm.apache.org/documentation/Lifecycle-of-a-topology.html

posted @ 2015-11-26 12:05 by 看起来很好吃
