[Erlang29] What should a process do when it receives a message it doesn't expect?
I recently upgraded a third-party library in a project, and it broke process A, a gen_server in our application, which contained:
handle_info({add,X},Sum) ->
    {noreply,Sum+X};
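Something this simple suddenly stopped working. After some digging, I found that the message sent by process B in the third-party library had changed from {add,X} to {plus,X}.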
Process A ended up holding a huge number of {plus,X} messages, and its memory usage grew abnormally large.
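For illustration only: in a plain gen_server, a handle_info/2 with no clause for {plus,X} and no catch-all would typically crash with a function_clause error rather than hoard messages. Messages pile up like this when a process's receive patterns never match them, because Erlang's selective receive leaves unmatched messages in the mailbox. A minimal sketch, assuming a hand-written loop in a hypothetical module sum_loop (not the project's actual code):

```erlang
-module(sum_loop).
-export([start/0, sum/1]).

%% Spawn a summing process that only understands {add, X} (hypothetical).
start() ->
    spawn(fun() -> loop(0) end).

%% Synchronously ask the process for its current sum.
sum(Pid) ->
    Ref = make_ref(),
    Pid ! {get, self(), Ref},
    receive
        {Ref, Sum} -> Sum
    after 1000 -> timeout
    end.

%% Selective receive: a message matching neither pattern (e.g. {plus, X})
%% is never removed, so it stays in the mailbox for the life of the process.
loop(Sum) ->
    receive
        {add, X} -> loop(Sum + X);
        {get, From, Ref} -> From ! {Ref, Sum}, loop(Sum)
    end.
```

Every {plus, X} sent to such a process increases erlang:process_info(Pid, message_queue_len), which is the kind of memory growth described above.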
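After a quick fix, I found myself wondering:
How should we actually handle messages we never expected? (In this kind of error, the other side should never have sent me such a message in the first place; it isn't my fault at all.)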
1. Log these messages:
1a.) as info
1b.) as warning
1c.) as error
2. Count the messages, and then do what with the count?
3. Simply ignore them, without any trace?
4. Let the server crash?
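The four options above can be sketched as one hypothetical helper that a gen_server catch-all clause delegates to. The module name unexpected, the policy values ({log, Level}, count, ignore, crash), and the assumption that the server state is a map holding an unexpected counter are all illustrative choices; logger requires OTP 21 or later.

```erlang
-module(unexpected).
-export([handle/3]).

%% Hypothetical helper: apply one policy to a stray Msg.
%% State is assumed to be a map; the count policy keeps a tally in it.
handle({log, Level}, Msg, State) ->
    %% Option 1: log at info, warning, or error level.
    logger:log(Level, "unexpected message: ~p", [Msg]),
    {noreply, State};
handle(count, _Msg, State) ->
    %% Option 2: count stray messages; what to do with the count is left open.
    N = maps:get(unexpected, State, 0),
    {noreply, State#{unexpected => N + 1}};
handle(ignore, _Msg, State) ->
    %% Option 3: drop the message silently, leaving no trace.
    {noreply, State};
handle(crash, Msg, State) ->
    %% Option 4: stop the server; its supervisor decides what happens next.
    {stop, {unexpected_message, Msg}, State}.
```

In a gen_server this would be used as a last clause such as handle_info(Msg, State) -> unexpected:handle(count, Msg, State).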
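Googling turned up a very interesting poll on a mailing list: how do you handle messages that have nothing to do with your process?
Most of the answers boiled down to one thing: it depends on the nature of the process.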
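1. By lifetime: for short-lived processes (created, used, and destroyed right away), you can log these messages at different log levels, which helps uncover the real problem. Don't pin the problem on the receiver; look harder at who sent the message, why it reached the wrong process, and what that leads to.
2. By number: if there are a huge number of such processes, just ignore the messages without tracing them, to avoid putting pressure on log I/O.
3. By importance: for processes that handle calls/casts/monitors/exits, we should record which messages made them misbehave; for most other processes, just ignore stray messages without tracing them.
4. The "Let it crash" philosophy: whatever happens, let it crash.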
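One way to read these guidelines is as a decision function that maps a description of the process to a handling choice. Everything here is hypothetical: the module name stray_policy, the map keys (let_it_crash, count, lifetime, handles), the cut-off of 10000 processes for "a huge number", the info level for short-lived processes, and the order in which the rules are checked are assumptions, since the poll answers do not rank them.

```erlang
-module(stray_policy).
-export([choose/1]).

%% Hypothetical encoding of the mailing-list advice. Traits is a map such as
%% #{lifetime => short, count => 50000, handles => [calls, casts]}.
%% The 10000 threshold and the clause order are assumptions.
choose(#{let_it_crash := true}) ->
    crash;                          % 4. "Let it crash", no matter what
choose(#{count := N}) when N > 10000 ->
    ignore;                         % 2. too many processes: spare the log I/O
choose(#{lifetime := short}) ->
    {log, info};                    % 1. short-lived: log and chase the sender
choose(#{handles := Kinds}) when Kinds =/= [] ->
    {log, warning};                 % 3. handles calls/casts/monitors/exits: record it
choose(_) ->
    ignore.                         % 3. most other processes: ignore
```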
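The "Let it crash" view above strikes me as a bit of a stretch: the error doesn't come from my process, so why should someone else's mistake take it down? The other views are quite reasonable, but Fred (author of Learn You Some Erlang) sums it up brilliantly, so I'm sharing it here: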
I tend to go the log route. There isn't a super good reason, but the way I think about it is a bit of probability. When do I send messages to the wrong process? A few ideas are:
- Manual debugging
- Typoes
- A Refactoring gone bad
- Initial design got messed up
- Erroneous third-party code that doesn't come from my precise development right away.
Then the question is what are the consequences I want.
- Manual debugging: do nothing, I'm poking around
- Typoes: I have to know about these ASAP
- Refactoring gone bad: I have to know about these ASAP
- Initial design got messed up: Something has to be loud and bad
- Third party code: I want the third party to suffer.
For these reasons, I tend to take the following approach:
- In handle_call/3, I log the event with a string a bit like "mod=MYMOD at=handle_call warning=unexpected_call msg=~p" and then return `{noreply, State}` to force the caller to crash after a timeout. It's their fault, not mine.
- In other callbacks, just log similar messages, replacing at=handle_call with at=handle_cast|info and warning=unexpected_cast|info.
I can, from time to time, look at logs for 'warning=unexpected_*' in logs and see if something is going weird. If it's something happening rarely, I'm gonna have traces, but without the weird failures (unless it's a call). If it's something frequent, bugs will either show themselves differently, the log volume will be very high, and so on.
It tends to give me what I need given the circumstances. It's not always as loud as I'd expect (except for calls, which is ideal), but it tends to give me enough visibility for the occasional stray message, without compromising service levels.
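A minimal gen_server sketch of the approach Fred describes, assuming OTP 21+ for logger. The module name stray_demo, the ping request, and the empty map state are hypothetical. Unknown calls are logged and deliberately never answered, so the caller's gen_server:call exits with timeout (after 5000 ms by default); unknown casts and infos are only logged.

```erlang
-module(stray_demo).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, #{}}.

%% A known request; real ones go above the catch-all clause.
handle_call(ping, _From, State) ->
    {reply, pong, State};
handle_call(Msg, _From, State) ->
    logger:warning("mod=~p at=handle_call warning=unexpected_call msg=~p",
                   [?MODULE, Msg]),
    %% Never reply: the caller times out and crashes. Their fault, not ours.
    {noreply, State}.

handle_cast(Msg, State) ->
    logger:warning("mod=~p at=handle_cast warning=unexpected_cast msg=~p",
                   [?MODULE, Msg]),
    {noreply, State}.

handle_info(Msg, State) ->
    logger:warning("mod=~p at=handle_info warning=unexpected_info msg=~p",
                   [?MODULE, Msg]),
    {noreply, State}.
```

Because every catch-all uses the same warning=unexpected_ prefix, a single log search finds all three kinds, while the server itself keeps running.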
Unexpected messages typically come from:
-- Manual debugging
-- Typos
-- Mistakes made while refactoring
-- A muddled initial design
-- Bugs in third-party libraries
One concrete approach:
In handle_call/3, log the message and then return {noreply, State}:
%% Catch-all clause: log the unknown request and never reply (logger needs OTP 21+).
handle_call(Msg, _From, State) ->
    logger:warning("mod=~p at=handle_call warning=unexpected_call msg=~p", [?MODULE, Msg]),
    {noreply, State}.
This forces the caller to crash once its call times out; it's their fault, not mine. Then check the logs regularly for 'warning=unexpected_*' entries.
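Checking the logs is usually a job for grep, but to stay in Erlang here is a hypothetical helper, stray_scan:tally/1, that counts warning=unexpected_call, warning=unexpected_cast, and warning=unexpected_info lines per kind. It assumes the log has already been split into lines; the log file location and any format beyond the warning= key are assumptions left to the caller.

```erlang
-module(stray_scan).
-export([tally/1]).

%% Count lines containing warning=unexpected_call|cast|info, per kind.
%% Returns a map such as #{call => 2, info => 1}; kinds never seen are absent.
tally(Lines) ->
    lists:foldl(fun count_line/2, #{}, Lines).

count_line(Line, Acc) ->
    case re:run(Line, "warning=unexpected_(call|cast|info)",
                [{capture, all_but_first, list}]) of
        {match, [Kind]} ->
            %% Kind is one of "call", "cast", "info", so the atom set is bounded.
            maps:update_with(list_to_atom(Kind), fun(N) -> N + 1 end, 1, Acc);
        nomatch ->
            Acc
    end.
```

For example, a file read with file:read_file/1 can be split with string:split(Bin, "\n", all) and passed to tally/1; re:run/3 accepts binaries as well as strings.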