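While sorting through Erlang/OTP material, I found there was simply no way around the proc_lib module; calling it a foundational module of OTP is no exaggeration.
What proc_lib does: "This module is used to start processes adhering to the OTP Design Principles." In other words, proc_lib is used to start processes that follow the OTP design principles. What are the OTP design principles? See here: http://www.cnblogs.com/me-sa/archive/2011/11/20/erlang0015.html The OTP behaviours all create their new processes through proc_lib, which is why this module can fairly be called the cornerstone of OTP. As mentioned in the earlier post, we can also use this module directly to create processes that conform to the OTP principles.
proc_lib exports relatively few functions; let's take a quick look first to get an overall impression: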
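One simple way to get that overview on your own node is to ask the module for its export list. This is only a small sketch: the module name proc_lib_exports and the function list/0 are made up for illustration, while module_info(exports) is the standard function the compiler generates for every module. The exact list depends on your OTP release, so no output is shown here.

```erlang
-module(proc_lib_exports).
-export([list/0]).

%% Hypothetical helper: returns the sorted {Function, Arity} pairs
%% exported by proc_lib on the running OTP release.
%% The result also contains the compiler-generated module_info/0,1.
list() ->
    lists:sort(proc_lib:module_info(exports)).
```

Calling proc_lib_exports:list() in the shell prints pairs such as {spawn,3} and {init_ack,2}, which are the functions discussed in the rest of this post.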
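What is different about processes created with proc_lib
1. They carry some extra information
Compared with creating a process directly with spawn (called an "ordinary Erlang process" from here on), a process initialized through proc_lib carries some extra information, such as the registered name, information about its parent process, and the initial function call. Below is the runtime metadata of a process created in each of the two ways: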
=PROGRESS REPORT==== 21-Nov-2011::21:18:49 ===
application: sasl
started_at: 'demo@192.168.1.123'
Eshell V5.8.4 (abort with ^G)
(demo@192.168.1.123)1> Fun = fun() -> receive a-> 1/0 after infinity -> ok end end .
#Fun<erl_eval.20.21881191>
(demo@192.168.1.123)2> P = spawn(Fun).
<0.48.0>
(demo@192.168.1.123)3> erlang:process_info(P).
[{current_function,{erl_eval,receive_clauses,8}},
{initial_call,{erlang,apply,2}},
{status,waiting}, {message_queue_len,0},
{messages,[]}, {links,[]},
{dictionary,[]}, {trap_exit,false},
{error_handler,error_handler}, {priority,normal},
{group_leader,<0.29.0>}, {total_heap_size,233},
{heap_size,233}, {stack_size,10}, {reductions,18},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,0}]},
{suspending,[]}]
(demo@192.168.1.123)4> P2 = proc_lib:spawn(Fun).
<0.51.0>
(demo@192.168.1.123)5> erlang:process_info(P2).
[{current_function,{erl_eval,receive_clauses,8}},
{initial_call,{proc_lib,init_p,3}},
{status,waiting},
{message_queue_len,0}, {messages,[]},
{links,[]}, {dictionary,[{'$ancestors',[<0.45.0>]},
{'$initial_call',{erl_eval,'-expr/5-fun-1-',0}}]},
{trap_exit,false}, {error_handler,error_handler},
{priority,normal}, {group_leader,<0.29.0>},
{total_heap_size,233}, {heap_size,233},
{stack_size,14}, {reductions,25},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,0}]},
{suspending,[]}]
(demo@192.168.1.123)6>
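We can also pick a process from a supervision tree and look at its metadata, for example with erlang:process_info(whereis(rex)). Compare the results: what information has been added? The next question is when and how this information gets attached to the process. Let's look at a typical piece of proc_lib code: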
spawn(M,F,A) when is_atom(M), is_atom(F), is_list(A) ->
    Parent = get_my_name(),
    Ancestors = get_ancestors(),
    erlang:spawn(?MODULE, init_p, [Parent,Ancestors,M,F,A]).

% Implementations of the helper functions used above
get_my_name() ->
    case proc_info(self(),registered_name) of
        {registered_name,Name} -> Name;
        _ -> self()
    end.

get_ancestors() ->
    case get('$ancestors') of
        A when is_list(A) -> A;
        _ -> []
    end.

proc_info(Pid,Item) when node(Pid) =:= node() ->
    process_info(Pid,Item);
proc_info(Pid,Item) ->
    case lists:member(node(Pid),nodes()) of
        true ->
            check(rpc:call(node(Pid), erlang, process_info, [Pid, Item]));
        _ ->
            hidden
    end.

check({badrpc,nodedown}) -> undefined;
check({badrpc,Error}) -> Error;
check(Res) -> Res.
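To see the effect of this code without typing into the shell, here is a minimal sketch that starts a process with proc_lib:spawn/1 and reads the two dictionary keys that proc_lib writes at startup. The module and function names are hypothetical. The child sends a ready message before we inspect it, because the dictionary is only filled in once the new process has actually started running.

```erlang
-module(proc_lib_dict_demo).
-export([startup_info/0]).

%% Hypothetical helper: returns {Ancestors, InitialCall} as stored in the
%% process dictionary of a process started through proc_lib.
startup_info() ->
    Parent = self(),
    Pid = proc_lib:spawn(fun() ->
                                 %% By the time this fun runs, proc_lib has
                                 %% already stored its bookkeeping keys.
                                 Parent ! {ready, self()},
                                 receive stop -> ok end
                         end),
    receive {ready, Pid} -> ok end,
    {dictionary, Dict} = erlang:process_info(Pid, dictionary),
    Pid ! stop,
    {proplists:get_value('$ancestors', Dict),
     proplists:get_value('$initial_call', Dict)}.
```

The first element is the ancestor list (the caller's pid or registered name comes first), and the second is the {Module, Function, Arity} of the fun that was passed in.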
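2. Different handling when a process exits
An ordinary Erlang process is considered to have exited normally only when its exit reason is normal. For a process started with proc_lib, the exit reasons shutdown and {shutdown,Term} are also treated as normal exits. When a process terminates because its application (supervision tree) is being stopped, the exit reason is marked as shutdown. When a process created with proc_lib exits with a reason other than normal, shutdown or {shutdown,Term}, a crash report is generated and sent to the default SASL event handler; the report is only visible when SASL has been started. How do you start SASL? See here: http://www.cnblogs.com/me-sa/archive/2011/11/20/erlang0016.html The crash report includes the information written when the process was initialized.
Come on, let's crash a process right now and see what happens: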
=PROGRESS REPORT==== 21-Nov-2011::20:47:56 ===
application: sasl    % for convenience SASL is started here and its output goes straight to the shell
started_at: 'demo@192.168.1.123'
Eshell V5.8.4 (abort with ^G)
(demo@192.168.1.123)1> Fun = fun() -> receive a-> 1/0 after infinity -> ok end end .    % on receiving message a it evaluates 1/0 and the process crashes
#Fun<erl_eval.20.21881191>
(demo@192.168.1.123)2> P= spawn(Fun).    % first create an ordinary Erlang process
<0.48.0>
(demo@192.168.1.123)3> P!a.    % send the message that crashes it
a
(demo@192.168.1.123)4>    % the shell prints the error below
=ERROR REPORT==== 21-Nov-2011::20:48:50 ===
Error in process <0.48.0> on node 'demo@192.168.1.123' with exit value: {badarith,[{erlang,'/',[1,0]}]}
(demo@192.168.1.123)4> P2= proc_lib:spawn(Fun).    % create a process with proc_lib
<0.51.0>
(demo@192.168.1.123)5> P2!a.    % send the message that crashes it
a
(demo@192.168.1.123)6>
=CRASH REPORT==== 21-Nov-2011::20:49:09 ===    % this is the CRASH REPORT, which carries more information
crasher:
initial call: erl_eval:-expr/5-fun-1-/0
pid: <0.51.0>
registered_name: []
exception error: bad argument in an arithmetic expression
in operator '/'/2
called as 1 / 0
ancestors: [<0.45.0>]
messages: []
links: []
dictionary: []
trap_exit: false
status: running
heap_size: 233
stack_size: 24
reductions: 114
neighbours:
(demo@192.168.1.123)6>
Examples of process exits, from http://learnyousomeerlang.com/errors-and-processes:
- Exception source: spawn_link(fun() -> ok end)
- Untrapped Result: - nothing -
- Trapped Result: {'EXIT', <0.61.0>, normal}
- The process exited normally, without a problem. Note that this looks a bit like the result of catch exit(normal), except a PID is added to the tuple to know what process failed.
- Note: the process exits right after it is created. Without trapping exits nothing shows up at all; with trapping, you receive a message saying the process exited normally.
- Exception source: spawn_link(fun() -> exit(reason) end)
- Untrapped Result: ** exception exit: reason
- Trapped Result: {'EXIT', <0.55.0>, reason}
- The process has terminated for a custom reason. In this case, if there is no trapped exit, the process crashes. Otherwise, you get the above message.
- Note: the process exits for a specific reason; when exits are trapped, the message carries the PID of the process that exited.
- Exception source: spawn_link(fun() -> exit(normal) end)
- Untrapped Result: - nothing -
- Trapped Result: {'EXIT', <0.58.0>, normal}
- This successfully emulates a process terminating normally. In some cases, you might want to kill a process as part of the normal flow of a program, without anything exceptional going on. This is the way to do it.
- Note: calling exit does not necessarily mean an abnormal exit; exit can also produce a normal exit, and that is the case shown here.
- Exception source: spawn_link(fun() -> 1/0 end)
- Untrapped Result: Error in process <0.44.0> with exit value: {badarith, [{erlang, '/', [1,0]}]}
- Trapped Result: {'EXIT', <0.52.0>, {badarith, [{erlang, '/', [1,0]}]}}
- The error ({badarith, Reason}) is never caught by a try ... catch block and bubbles up into an 'EXIT'. At this point, it behaves exactly the same as exit(reason) did, but with a stack trace giving more details about what happened.
- Note: the process hits a runtime error. Trapped or not, the result is much the same; only the way it is presented differs.
- Exception source: spawn_link(fun() -> erlang:error(reason) end)
- Untrapped Result: Error in process <0.47.0> with exit value: {reason, [{erlang, apply, 2}]}
- Trapped Result: {'EXIT', <0.74.0>, {reason, [{erlang, apply, 2}]}}
- Pretty much the same as with 1/0. That's normal, erlang:error/1 is meant to allow you to do just that.
- Note: do you still remember the difference between erlang:error, exit and throw?
- Exception source: spawn_link(fun() -> throw(rocks) end)
- Untrapped Result: Error in process <0.51.0> with exit value: {{nocatch, rocks}, [{erlang, apply, 2}]}
- Trapped Result: {'EXIT', <0.79.0>, {{nocatch, rocks}, [{erlang, apply, 2}]}}
- Because the throw is never caught by a try ... catch, it bubbles up into an error, which in turn bubbles up into an EXIT. Without trapping exit, the process fails. Otherwise it deals with it fine.
- Note: the value was thrown, but there is no catch logic to handle it.
And that's about it for usual exceptions. Things are normal: everything goes fine. Exceptional stuff happens: processes die, different signals are sent around.
Then there's exit/2. This one is the Erlang process equivalent of a gun. It allows a process to kill another one from a distance, safely. Here are some of the possible calls:
- Exception source: exit(self(), normal)
- Untrapped Result: ** exception exit: normal
- Trapped Result: {'EXIT', <0.31.0>, normal}
- When not trapping exits, exit(self(), normal) acts the same as exit(normal). Otherwise, you receive a message with the same format you would have had by listening to links from foreign processes dying.
- Exception source: exit(spawn_link(fun() -> timer:sleep(50000) end), normal)
- Untrapped Result: - nothing -
- Trapped Result: - nothing -
- This basically is a call to exit(Pid, normal). This command doesn't do anything useful, because a process can not be remotely killed with the reason normal as an argument.
- Exception source: exit(spawn_link(fun() -> timer:sleep(50000) end), reason)
- Untrapped Result: ** exception exit: reason
- Trapped Result: {'EXIT', <0.52.0>, reason}
- This is the foreign process terminating for reason itself. Looks the same as if the foreign process called exit(reason) on itself.
- Exception source: exit(spawn_link(fun() -> timer:sleep(50000) end), kill)
- Untrapped Result: ** exception exit: killed
- Trapped Result: {'EXIT', <0.58.0>, killed}
- Surprisingly, the message gets changed from the dying process to the spawner. The spawner now receives killed instead of kill. That's because kill is a special exit signal. More details on this later.
- Exception source: exit(self(), kill)
- Untrapped Result: ** exception exit: killed
- Trapped Result: ** exception exit: killed
- Oops, look at that. It seems like this one is actually impossible to trap. Let's check something.
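- Exception source: spawn_link(fun() -> exit(kill) end)
- Untrapped Result: ** exception exit: killed
- Trapped Result: {'EXIT', <0.67.0>, kill}
- Now that's getting confusing. When another process kills itself with exit(kill) and we don't trap exits, our own process dies with the reason killed. However, when we trap exits, things don't happen that way.
What if the process you want to get rid of is stuck in an infinite loop and never gets a chance to receive messages? kill is designed for exactly this situation: it is a special signal that cannot be trapped, which guarantees that the target process really is terminated. kill is the last resort for terminating a process when nothing else works.
Because kill cannot be trapped by design, the reason is converted to killed when it is passed on to other processes; otherwise every linked process would be killed unconditionally as well.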
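The kill to killed conversion is easy to check in code. Below is a minimal sketch (module and function names are hypothetical): the caller traps exits, links to a child that also tries to trap exits, and then sends it an exit signal with reason kill through exit/2.

```erlang
-module(kill_demo).
-export([reason_seen_by_linker/0]).

%% Hypothetical helper: returns the exit reason that a linked,
%% exit-trapping process observes after exit(Pid, kill).
reason_seen_by_linker() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() ->
                             %% Trapping exits does not protect the child:
                             %% a kill signal from exit/2 cannot be trapped.
                             process_flag(trap_exit, true),
                             receive after infinity -> ok end
                     end),
    exit(Pid, kill),
    Reason = receive {'EXIT', Pid, R} -> R end,
    %% Restore the caller's original trap_exit setting.
    process_flag(trap_exit, Old),
    Reason.
```

kill_demo:reason_seen_by_linker() returns killed: the child dies even though it traps exits, and the linked caller sees the converted reason rather than kill.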
2014-8-26 8:49:22 Adding some test code here:
19> self().
<0.81.0>
20> [ spawn_link(fun()-> receive die->exit(order_to_die) end end) || P<-lists:seq(1,10)].
[<0.84.0>,<0.85.0>,<0.86.0>,<0.87.0>,<0.88.0>,<0.89.0>,
<0.90.0>,<0.91.0>,<0.92.0>,<0.93.0>]
21> process_info(self()).
[{current_function,{erl_eval,do_apply,6}},
{initial_call,{erlang,apply,2}},
{status,running},
{message_queue_len,0},
{messages,[]},
{links,[<0.86.0>,<0.90.0>,<0.92.0>,<0.93.0>,<0.91.0>,
<0.88.0>,<0.89.0>,<0.87.0>,<0.84.0>,<0.85.0>,<0.30.0>]},
{dictionary,[]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.26.0>},
{total_heap_size,3573},
{heap_size,2586},
{stack_size,24},
{reductions,9040},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,23}]},
{suspending,[]}]
22> exit(pid(0,92,0),normal).
true
23> process_info(pid(0,92,0)).
[{current_function,{prim_eval,'receive',2}},
{initial_call,{erlang,apply,2}},
{status,waiting},
{message_queue_len,0},
{messages,[]},
{links,[<0.81.0>]},
{dictionary,[]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.26.0>},
{total_heap_size,233},
{heap_size,233},
{stack_size,9},
{reductions,17},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,0}]},
{suspending,[]}]
24> pid(0,92,0)!die.
** exception exit: order_to_die
25> self().
<0.98.0>
Eshell V6.0 (abort with ^G)
1> self().
<0.33.0>
2> process_flag(trap_exit,true).
false
3> process_info(self()).
[{current_function,{erl_eval,do_apply,6}},
{initial_call,{erlang,apply,2}},
{status,running},
{message_queue_len,0},
{messages,[]},
{links,[<0.27.0>]},
{dictionary,[]},
{trap_exit,true},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.26.0>},
{total_heap_size,987},
{heap_size,987},
{stack_size,24},
{reductions,1557},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,0}]},
{suspending,[]}]
4> [ spawn_link(fun()-> receive die->exit(order_to_die) end end) || P<-lists:seq(1,10)].
[<0.38.0>,<0.39.0>,<0.40.0>,<0.41.0>,<0.42.0>,<0.43.0>,
<0.44.0>,<0.45.0>,<0.46.0>,<0.47.0>]
5> exit(pid(0,41,0),over).
true
6> self().
<0.33.0>
7> flush().
Shell got {'EXIT',<0.41.0>,over}
ok
8> is_process_alive(pid(0,44,0)).
true
9> process_info(pid(0,44,0)).
[{current_function,{prim_eval,'receive',2}},
{initial_call,{erlang,apply,2}},
{status,waiting},
{message_queue_len,0},
{messages,[]},
{links,[<0.33.0>]},
{dictionary,[]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.26.0>},
{total_heap_size,233},
{heap_size,233},
{stack_size,9},
{reductions,17},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,0}]},
{suspending,[]}]
10>
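Starting processes with proc_lib: start/start_link
The documentation for gen_server's start functions says:
The gen_server process calls Module:init/1 to initialize. To ensure a synchronized start-up procedure, start_link/3,4 does not return until Module:init/1 has returned.
When gen_server runs start/start_link it does so synchronously. Underneath, it uses proc_lib to create a process and then waits for that process to finish starting. Let's first look at a typical piece of proc_lib code: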
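start(M, F, A, Timeout) when is_atom(M), is_atom(F), is_list(A) ->
    Pid = ?MODULE:spawn(M, F, A),
    sync_wait(Pid, Timeout).
As you can see, after the process is created a function called sync_wait is run to wait synchronously. Its implementation is easy to guess:
sync_wait(Pid, Timeout) ->
    receive
        {ack, Pid, Return} ->
            Return;
        % If the process was created with start_link and died before calling
        % init_ack, and the calling process traps exits (trap_exit),
        % {error,Reason} is returned.
        {'EXIT', Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        % If a Time argument was given, this function waits Time milliseconds
        % for the new process to call init_ack. If it has not done so by then,
        % the new process is killed and {error,timeout} is returned.
        unlink(Pid),
        exit(Pid, kill),
        flush(Pid),
        {error, timeout}
    end.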
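The timeout branch can be exercised with a process that never acknowledges its start. The sketch below is only an illustration (module and function names are hypothetical, and 100 ms is an arbitrary value); it relies on the documented behaviour of proc_lib:start/4.

```erlang
-module(start_timeout_demo).
-export([run/0, never_acks/0]).

%% Hypothetical helper: starts a process that never calls init_ack
%% and waits at most 100 ms for the acknowledgement.
run() ->
    proc_lib:start(?MODULE, never_acks, [], 100).

%% Entry point of the new process: blocks forever without acknowledging.
never_acks() ->
    receive after infinity -> ok end.
```

start_timeout_demo:run() returns {error, timeout} after roughly 100 ms, and the process that failed to acknowledge is killed.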
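As you would expect, once the new process has finished starting it has to send a reply message to end this wait. There is a ready-made function for that too: init_ack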
init_ack(Parent, Return) ->
    Parent ! {ack, self(), Return},
    ok.

-spec init_ack(term()) -> 'ok'.
init_ack(Return) ->
    [Parent|_] = get('$ancestors'),
    init_ack(Parent, Return).
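Putting start_link and init_ack together, a hand-written process with a synchronous start could look roughly like the sketch below. The module ack_demo and its trivial loop are assumptions for illustration; only proc_lib:start_link/3 and proc_lib:init_ack/2 come from proc_lib.

```erlang
-module(ack_demo).
-export([start_link/0, init/1]).

%% Blocks until init/1 has called init_ack, then returns whatever value
%% was passed to init_ack (here {ok, Pid}).
start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

%% Entry point of the new process.
init(Parent) ->
    %% Any setup work would go here, before the acknowledgement.
    proc_lib:init_ack(Parent, {ok, self()}),
    loop().

%% Minimal receive loop so the process stays alive until told to stop.
loop() ->
    receive
        stop -> ok;
        _Other -> loop()
    end.
```

A call such as {ok, Pid} = ack_demo:start_link() only returns once the new process has acknowledged its start; sending Pid ! stop lets it exit normally.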
2012-3-31 12:26:35 Update
Here is an example from hotwheels, tolbrino-hotwheels-8dca95a\src\janus_acceptor.erl:
acceptor_init(Parent, Port, Module) ->
    State = #state{
        parent = Parent,
        port = Port,
        module = Module
    },
    error_logger:info_msg("Listening on port ~p~n", [Port]),
    case (catch do_init(State)) of
        {ok, ListenSocket} ->
            proc_lib:init_ack(State#state.parent, {ok, self()}),
            acceptor_loop(State#state{listener = ListenSocket});
        Error ->
            proc_lib:init_ack(Parent, Error),
            error
    end.
Inspecting a process's initial call and formatting crash reports
proc_lib provides two functions for looking up the initial function of a process:
initial_call(Process) -> {Module,Function,Args} | false
translate_initial_call(Process) -> {Module, Function, Arity}
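Running proc_lib:initial_call(whereis(rex)). to look at the initial function of the rpc module gives: {rpc,init,['Argument__1']}
To save memory, the actual argument values are not stored here; the atom 'Argument__1' stands in for them. If the initial arguments include a fun, the result only tells you how many arguments the fun takes, and the fun itself is not stored, because keeping it would interfere with code upgrades and waste memory. See the code below: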
(demo@192.168.1.123)57> Fun =fun() -> receive X -> X after infinity -> ok end end.
#Fun<erl_eval.20.67289768>
(demo@192.168.1.123)58> P =spawn(Fun).
<0.11746.25>
(demo@192.168.1.123)59> proc_lib:initial_call(P).
false
(demo@192.168.1.123)60> P2 =proc_lib:spawn(Fun).
<0.11749.25>
(demo@192.168.1.123)61> proc_lib:initial_call(P2).
{erl_eval,'-expr/5-fun-1-',[]}
(demo@192.168.1.123)63> proc_lib:translate_initial_call(P).
{proc_lib,init_p,5}
(demo@192.168.1.123)64> proc_lib:translate_initial_call(P2).
{erl_eval,'-expr/5-fun-1-',0}
proc_lib also provides the format function for formatting process crash reports; feel free to try it out yourself.
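Process hibernation
I will cover process hibernation when I write about gen_fsm, so I'm skipping it in this post.
I have to stay up late tomorrow, so I'm turning in early today. Good night, everyone!
P.S. @淘宝褚霸 told me on Weibo yesterday: "Skill in C and in systems is what matters most; once you have that down, Erlang comes easily." I'm recording it here to keep it firmly in mind.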