系統技術非業餘研究 » Clarifying "An Extremely Idiotic Bug in Erlang supervisor"

On 2008-05-26 the well-known Trustno1 published this post, http://www.iteye.com/topic/197097, complaining about an extremely idiotic bug in Erlang's supervisor.

Today @淘李福 brought the matter up again:

I dug up an old thread: http://www.iteye.com/topic/197097
We are on R14 now and the code is still the same. I wonder whether we misunderstood it: shutdown counts as a normal exit.

Comments on that thread are closed, so I will clarify here: this is not a bug!

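A few days ago I reread the code in init.erl. The behavior is a deliberate design: when the system runs init:stop, it gives the kernel processes, including the supervisor tree, a chance to exit cleanly.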

Let's look at the code behind init:stop, in erts/preloaded/src/init.erl:

...
%%% -------------------------------------------------
%%% Stop the system.
%%% Reason is: restart | reboot | stop
%%% According to reason terminate emulator or restart
%%% system using the same init process again.
%%% -------------------------------------------------

stop(Reason,State) ->
    BootPid = State#state.bootpid,
    {_,Progress} = State#state.status,
    State1 = State#state{status = {stopping, Progress}},
    clear_system(BootPid,State1),
    do_stop(Reason,State1).

do_stop(restart,#state{start = Start, flags = Flags, args = Args}) ->
    boot(Start,Flags,Args);
do_stop(reboot,_) ->
    halt();
do_stop(stop,State) ->
    stop_heart(State),
    halt();
do_stop({stop,Status},State) ->
    stop_heart(State),
    halt(Status).

clear_system(BootPid,State) ->
    Heart = get_heart(State#state.kernel),
    shutdown_pids(Heart,BootPid,State),
    unload(Heart).
stop_heart(State) ->
    case get_heart(State#state.kernel) of
        false ->
            ok;
        Pid ->
            %% As heart survives a restart the Parent of heart is init.
            BootPid = self(),
            %% ignore timeout
            shutdown_kernel_pid(Pid, BootPid, self(), State)
    end.

shutdown_pids(Heart,BootPid,State) ->
    Timer = shutdown_timer(State#state.flags),
    catch shutdown(State#state.kernel,BootPid,Timer,State),
    kill_all_pids(Heart), % Even the shutdown timer.
    kill_all_ports(Heart),
    flush_timout(Timer).

get_heart([{heart,Pid}|_Kernel]) -> Pid;
get_heart([_|Kernel])           -> get_heart(Kernel);
get_heart(_)                    -> false.

shutdown([{heart,_Pid}|Kernel],BootPid,Timer,State) ->
    shutdown(Kernel, BootPid, Timer, State);
shutdown([{_Name,Pid}|Kernel],BootPid,Timer,State) ->
    shutdown_kernel_pid(Pid, BootPid, Timer, State),
    shutdown(Kernel,BootPid,Timer,State);
shutdown(_,_,_,_) ->
    true.


%%
%% A kernel pid must handle the special case message
%% {'EXIT',Parent,Reason} and terminate upon it!
%%
shutdown_kernel_pid(Pid, BootPid, Timer, State) ->
    Pid ! {'EXIT',BootPid,shutdown},
    shutdown_loop(Pid, Timer, State, []).

...

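As shutdown_pids shows, the system first goes through the kernel processes and asks each one to stop by sending it Pid ! {'EXIT',BootPid,shutdown} from shutdown_kernel_pid. Only after that does kill_all_pids kill whatever is left, the non-kernel processes, with exit(Pid,kill).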
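To make the difference between the two mechanisms concrete, here is a minimal sketch. The module name exit_msg_demo and the loop/1 function are made up for illustration and are not part of OTP. It assumes a "kernel-style" process that treats {'EXIT',Parent,Reason} as an ordinary message and terminates on it, and compares that with a process that is simply killed.

```erlang
-module(exit_msg_demo).
-export([run/0]).

%% Hypothetical kernel-style process: it treats {'EXIT', Parent, Reason}
%% as a request to clean up and terminate, as the comment in init.erl demands.
loop(Parent) ->
    receive
        {'EXIT', Parent, Reason} ->
            %% Any cleanup work would go here.
            exit(Reason);
        _Other ->
            loop(Parent)
    end.

%% Returns the exit reasons observed for the two shutdown styles.
run() ->
    Parent = self(),

    %% Style 1: a plain message, the same shape shutdown_kernel_pid/4 sends.
    {Pid1, Ref1} = spawn_monitor(fun() -> loop(Parent) end),
    Pid1 ! {'EXIT', Parent, shutdown},
    Reason1 = wait_down(Ref1, Pid1),

    %% Style 2: an untrappable kill signal; the process gets no chance to clean up.
    {Pid2, Ref2} = spawn_monitor(fun() -> loop(Parent) end),
    exit(Pid2, kill),
    Reason2 = wait_down(Ref2, Pid2),

    {Reason1, Reason2}.

wait_down(Ref, Pid) ->
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

Calling exit_msg_demo:run() should give {shutdown, killed}: the first process chose to exit with the reason it was handed, while the second was terminated from outside.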

This sentence is the key: A kernel pid must handle the special case message and terminate upon it!
So what is a kernel process?

Take a look at bin/start.script:

...
{kernelProcess,heart,{heart,start,[]}},
     {kernelProcess,error_logger,{error_logger,start_link,[]}},
     {kernelProcess,application_controller,
         {application_controller,start,
             [{application,kernel,
...

Every process tagged kernelProcess is one, and the important one here is application (started as application_controller)!
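If you want to list the kernel processes of a boot script yourself, something like the following sketch would do. It assumes the usual {script, {Name, Vsn}, Commands} layout of a .script file. The module name boot_script_demo, the sample/0 term, and the placeholder_kernel_app_spec atom are invented for illustration; the real application_controller arguments are elided in the excerpt above and are not reproduced here.

```erlang
-module(boot_script_demo).
-export([kernel_processes/1, sample/0]).

%% Pick out the names of all entries tagged kernelProcess.
%% Commands that do not match the pattern are skipped by the comprehension.
kernel_processes({script, _NameVsn, Commands}) ->
    [Name || {kernelProcess, Name, _MFA} <- Commands].

%% Hypothetical, heavily shortened boot script term, for illustration only.
sample() ->
    {script, {"demo", "1.0"},
     [{progress, preloaded},
      {kernelProcess, heart, {heart, start, []}},
      {kernelProcess, error_logger, {error_logger, start_link, []}},
      {kernelProcess, application_controller,
       {application_controller, start, [placeholder_kernel_app_spec]}},
      {progress, init_kernel_started}]}.
```

For a real script you would read the term with file:consult/1 and pass it to kernel_processes/1; with the sample term above the result is [heart, error_logger, application_controller].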

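With that in mind, these two clauses in supervisor.erl make good sense. A child that exits with reason shutdown is taking part in an orderly stop, so the supervisor drops it instead of restarting it: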

do_restart(_, shutdown, Child, State) ->   
    NState = state_del_child(Child, State),   
    {ok, NState};   
do_restart(transient, Reason, Child, State) ->   
    report_error(child_terminated, Reason, Child, State#state.name),   
    restart(Child, State);  
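The effect of these clauses can be observed from the outside. Below is a minimal sketch, assuming a one_for_one supervisor with a single transient child. The module name shutdown_demo and the worker, which just waits to be told what reason to exit with, are made up for illustration.

```erlang
-module(shutdown_demo).
-behaviour(supervisor).
-export([run/0, init/1, start_worker/0]).

%% Hypothetical worker: blocks until told which reason to exit with.
start_worker() ->
    Pid = spawn_link(fun() ->
                             receive {exit_with, Reason} -> exit(Reason) end
                     end),
    {ok, Pid}.

%% One transient child; the tuple-style spec also works on old releases.
init([]) ->
    {ok, {{one_for_one, 5, 10},
          [{worker, {?MODULE, start_worker, []},
            transient, 1000, worker, [?MODULE]}]}}.

run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    Pid1 = child_pid(Sup),

    %% Abnormal exit: a transient child is restarted.
    Pid1 ! {exit_with, crashed},
    Pid2 = wait_for_change(Sup, Pid1),

    %% Exit with reason shutdown: the child is not restarted.
    Pid2 ! {exit_with, shutdown},
    Final = wait_for_change(Sup, Pid2),

    {is_pid(Pid2), Final}.

child_pid(Sup) ->
    [{worker, Pid, worker, _Modules}] = supervisor:which_children(Sup),
    Pid.

%% Poll until the supervisor reports something other than the old pid.
wait_for_change(Sup, Old) ->
    case child_pid(Sup) of
        Old -> timer:sleep(10), wait_for_change(Sup, Old);
        New -> New
    end.
```

shutdown_demo:run() should return {true, undefined}: after the crash the supervisor reports a fresh pid, and after the shutdown exit it reports the child as undefined, meaning it was not restarted. The crash also produces a supervisor error report in the log, which is expected. The sketch leaves the supervisor running and linked to the caller.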

Takeaway: don't be quick to doubt other people's work, especially a system that has been running for more than 20 years!

Have fun!
