系統技術非業餘研究 » Factors That Severely Affect Erlang open_port Performance

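Erlang ports are the system's I/O: they open a channel from the Erlang world to the outside and make it easy to run external programs. The performance of open_port matters a great deal to the system as a whole, so let's look at the factors that affect it.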

First, the open_port documentation:

{spawn, Command}

Starts an external program. Command is the name of the external program which will be run. Command runs outside the Erlang work space unless an Erlang driver with the name Command is found. If found, that driver will be started. A driver runs in the Erlang workspace, which means that it is linked with the Erlang runtime system.

When starting external programs on Solaris, the system call vfork is used in preference to fork for performance reasons, although it has a history of being less robust. If there are problems with using vfork, setting the environment variable ERL_NO_VFORK to any value will cause fork to be used instead.

For external programs, the PATH is searched (or an equivalent method is used to find programs, depending on operating system). This is done by invoking the shell on certain platforms. The first space separated token of the command will be considered as the name of the executable (or driver). This (among other things) makes this option unsuitable for running programs having spaces in file or directory names. Use {spawn_executable, Command} instead if spaces in executable file names is desired.

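When open_port starts an external program, the flow is roughly this: beam.smp first calls vfork, and the child execs the child_setup program, which does further cleanup. Only after that cleanup finishes does child_setup actually exec our external program.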
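As a rough, standalone illustration of this flow (not the ERTS code), the sketch below calls vfork, immediately execs a helper, and lets the helper run the real command. Here /bin/sh stands in for the child_setup stage, and the name spawn_via_helper is made up for this example. It assumes a Linux-like system where vfork is available.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of ERTS: run `cmd` through an intermediate
 * program, with /bin/sh standing in for child_setup.
 * Returns the command's exit status, or -1 on failure. */
int spawn_via_helper(const char *cmd)
{
    /* Everything the child needs is built before vfork(): the child shares
     * the parent's memory and should only call execve() or _exit(). */
    char *const helper_argv[] = { "sh", "-c", (char *) cmd, NULL };
    char *const helper_envp[] = { NULL };
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The child: hand over to the helper right away. */
        execve("/bin/sh", helper_argv, helper_envp);
        _exit(127); /* only reached if execve() failed */
    }
    /* The parent: resumes once the child has called execve() or _exit(). */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_via_helper("exec /bin/cat < /dev/null") would follow the same chain seen later in the trace: the helper is started first, and only the helper execs the target program.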

Now let's look at the code that implements open_port:

// sys.c:L1352
static ErlDrvData spawn_start(ErlDrvPort port_num, char* name, SysDriverOpts* opts)
{
...
#if !DISABLE_VFORK
    int no_vfork;
    size_t no_vfork_sz = sizeof(no_vfork);

    no_vfork = (erts_sys_getenv("ERL_NO_VFORK",
                                (char *) &no_vfork,
                                &no_vfork_sz) >= 0);
#endif
...
else { /* Use vfork() */
        char **cs_argv= erts_alloc(ERTS_ALC_T_TMP,(CS_ARGV_NO_OF_ARGS + 1)*
                                   sizeof(char *));
        char fd_close_range[44];                  /* 44 bytes are enough to  */
        char dup2_op[CS_ARGV_NO_OF_DUP2_OPS][44]; /* hold any "%d:%d" string */
                                                  /* on a 64-bit machine.    */

        /* Setup argv[] for the child setup program (implemented in
           erl_child_setup.c) */
        i = 0;
        if (opts->use_stdio) {
            if (opts->read_write & DO_READ){
                /* stdout for process */
                sprintf(&dup2_op[i++][0], "%d:%d", ifd[1], 1);
                if(opts->redir_stderr)
                    /* stderr for process */
                    sprintf(&dup2_op[i++][0], "%d:%d", ifd[1], 2);
            }
            if (opts->read_write & DO_WRITE)
                /* stdin for process */
                sprintf(&dup2_op[i++][0], "%d:%d", ofd[0], 0);
        } else {        /* XXX will fail if ofd[0] == 4 (unlikely..) */
            if (opts->read_write & DO_READ)
                sprintf(&dup2_op[i++][0], "%d:%d", ifd[1], 4);
            if (opts->read_write & DO_WRITE)
                sprintf(&dup2_op[i++][0], "%d:%d", ofd[0], 3);
        }
        for (; i < CS_ARGV_NO_OF_DUP2_OPS; i++)
            strcpy(&dup2_op[i][0], "-");
        sprintf(fd_close_range, "%d:%d", opts->use_stdio ? 3 : 5, max_files-1);

        cs_argv[CS_ARGV_PROGNAME_IX] = child_setup_prog;
        cs_argv[CS_ARGV_WD_IX] = opts->wd ? opts->wd : ".";
        cs_argv[CS_ARGV_UNBIND_IX] = erts_sched_bind_atvfork_child(unbind);
        cs_argv[CS_ARGV_FD_CR_IX] = fd_close_range;
        for (i = 0; i < CS_ARGV_NO_OF_DUP2_OPS; i++)
            cs_argv[CS_ARGV_DUP2_OP_IX(i)] = &dup2_op[i][0];
        if (opts->spawn_type == ERTS_SPAWN_EXECUTABLE) {
            int num = 0;
            int j = 0;
            if (opts->argv != NULL) {
                for(; opts->argv[num] != NULL; ++num)
                    ;
            }
            cs_argv = erts_realloc(ERTS_ALC_T_TMP,cs_argv, (CS_ARGV_NO_OF_ARGS + 1 + num + 1) * sizeof(char *));
            cs_argv[CS_ARGV_CMD_IX] = "-";
            cs_argv[CS_ARGV_NO_OF_ARGS] = cmd_line;
            if (opts->argv != NULL) {
                for (;opts->argv[j] != NULL; ++j) {
                    if (opts->argv[j] == erts_default_arg0) {
                        cs_argv[CS_ARGV_NO_OF_ARGS + 1 + j] = cmd_line;
                    } else {
                        cs_argv[CS_ARGV_NO_OF_ARGS + 1 + j] = opts->argv[j];
                    }
                }
            }
            cs_argv[CS_ARGV_NO_OF_ARGS + 1 + j] = NULL;
        } else {
            cs_argv[CS_ARGV_CMD_IX] = cmd_line; /* Command */
            cs_argv[CS_ARGV_NO_OF_ARGS] = NULL;  
        }
        DEBUGF(("Using vfork\n"));
        pid = vfork();

        if (pid == 0) {
            /* The child! */

            /* Observe!
             * OTP-4389: The child setup program (implemented in
             * erl_child_setup.c) will perform the necessary setup of the
             * child before it execs to the user program. This because
             * vfork() only allow an *immediate* execve() or _exit() in the
             * child.
             */
            execve(child_setup_prog, cs_argv, new_environ);
            _exit(1);
        }
        erts_free(ERTS_ALC_T_TMP,cs_argv);
...
}
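The redirections are passed to child_setup as plain "from:to" strings, with unused slots filled by "-". A minimal round-trip sketch of that encoding follows. The function names are invented for this example, and the slot count of 3 is an assumption read off the trace later in the post (for instance "8:1" "9:0" "-"), not taken from the ERTS headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NO_OF_DUP2_OPS 3 /* assumed slot count, inferred from the trace */
#define OP_LEN 44        /* same buffer size as in spawn_start */

/* Encode up to NO_OF_DUP2_OPS (from, to) pairs as "from:to" strings and
 * pad the remaining slots with "-". */
void encode_dup2_ops(char ops[][OP_LEN], int pairs[][2], int npairs)
{
    int i;

    for (i = 0; i < npairs && i < NO_OF_DUP2_OPS; i++)
        snprintf(ops[i], OP_LEN, "%d:%d", pairs[i][0], pairs[i][1]);
    for (; i < NO_OF_DUP2_OPS; i++)
        strcpy(ops[i], "-");
}

/* Decode the strings again, stopping at the first "-" slot.
 * Returns the number of pairs read, or -1 on a malformed entry. */
int decode_dup2_ops(char ops[][OP_LEN], int pairs[][2])
{
    int i;

    for (i = 0; i < NO_OF_DUP2_OPS; i++) {
        if (strcmp(ops[i], "-") == 0)
            break;
        if (sscanf(ops[i], "%d:%d", &pairs[i][0], &pairs[i][1]) != 2)
            return -1;
    }
    return i;
}
```

Passing the operations as text keeps the vfork child trivial: it only has to call execve, and all parsing happens in the freshly exec'ed helper.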

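On systems that support vfork, such as Linux, vfork is used by default (unless disabled) to run child_setup, which in turn invokes the external program.
Here is the vfork documentation: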

vfork() differs from fork() in that the parent is suspended until the child makes a call to execve(2) or _exit(2). The child shares all memory
with its parent, including the stack, until execve() is issued by the child. The child must not return from the current function or call
exit(), but may call _exit().

During the vfork the whole beam.smp process is blocked until the child calls execve, so this is an important point for performance.

Next, the code in erl_child_setup.c:

// erl_child_setup.c:111
// 1. Unbind from the CPU
    if (strcmp("false", argv[CS_ARGV_UNBIND_IX]) != 0)
        if (erts_unbind_from_cpu_str(argv[CS_ARGV_UNBIND_IX]) != 0)
            return 1;
// 2. Duplicate file descriptors
    for (i = 0; i < CS_ARGV_NO_OF_DUP2_OPS; i++) {
        if (argv[CS_ARGV_DUP2_OP_IX(i)][0] == '-'
            && argv[CS_ARGV_DUP2_OP_IX(i)][1] == '\0')
            break;
        if (sscanf(argv[CS_ARGV_DUP2_OP_IX(i)], "%d:%d", &from, &to) != 2)
            return 1;
        if (dup2(from, to) < 0)
            return 1;
    }
// 3. Close file descriptors
    if (sscanf(argv[CS_ARGV_FD_CR_IX], "%d:%d", &from, &to) != 2)
        return 1;
    for (i = from; i <= to; i++)
        (void) close(i);

// 4. Exec the external program
    if (erts_spawn_executable) {
        if (argv[CS_ARGV_NO_OF_ARGS + 1] == NULL) {
            execl(argv[CS_ARGV_NO_OF_ARGS],argv[CS_ARGV_NO_OF_ARGS],
                  (char *) NULL);
        } else {
            execv(argv[CS_ARGV_NO_OF_ARGS],&(argv[CS_ARGV_NO_OF_ARGS + 1]));
        }
    } else {
        execl("/bin/sh", "sh", "-c", argv[CS_ARGV_CMD_IX], (char *) NULL);
    }
...

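This is a process with a lot of steps, and steps 1, 2 and 3 are all time-consuming. Step 3 is the worst: a busy I/O server opens a huge number of file descriptors, possibly hundreds of thousands, and closing that many is a disaster. Note that the loop calls close() on every descriptor number in the range up to max_files-1, whether or not it is open; in the trace further down the range is 3:327679.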

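Let's walk through this flow in practice and get concrete numbers.
First we set up an open_port scenario: the server opens 768 socket descriptors and then runs the external program cat.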

$ cat demo.erl
-module(demo).
-compile(export_all).

start()->
    _ = [gen_udp:open(0) || _ <- lists:seq(1,768)],
    Port = open_port({spawn, "/bin/cat"}, [in, out, {line, 128}]),
    port_close(Port),
    ok.

Next we prepare a stap script to observe this behavior and collect the timings:

$ cat demo.stp
global t0, t1, t2

probe process("beam.smp").function("spawn_start") {
        printf("spawn %s\n", user_string($name))
        t0 = gettimeofday_us()
}

probe process("beam.smp").statement("*@sys.c:1607") {
        t1 = gettimeofday_ns()
}

probe process("beam.smp").statement("*@sys.c:1627") {
        printf("vfork take %d ns\n", gettimeofday_ns() - t1);
}

probe process("child_setup").function("main") {
        t2 = gettimeofday_us()
}

probe process("child_setup").statement("*@erl_child_setup.c:111") {
        t3 = gettimeofday_us()
        printf("spawn take %d us, child_setup take %d us\n", t3 - t0, t3 - t2) 
}

probe syscall.execve {
        printf("%s, arg %s\n", name, argstr)
}

probe syscall.fork {
        printf("%s, arg %s\n", name, argstr)
}

probe begin {
        println(")");
}

We run the stap script in one terminal and watch what happens:

$ erlc demo.erl
$ PATH=otp/bin/x86_64-unknown-linux-gnu/:$PATH sudo stap demo.stp
)
fork, arg 
execve, arg otp/bin/erl 
fork, arg 
fork, arg 
fork, arg 
execve, arg /bin/sed "s/.*\\///"
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/erlexec 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/beam.smp "--" "-root" "/home/chuba/otp" "-progname" "erl" "--" "-home" "/home/chuba" "--"
clone, arg .
..
clone, arg CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
spawn inet_gethost 4 
fork, arg 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/child_setup "FFFF" "." "exec inet_gethost 4 " "3:327679" "8:1" "9:0" "-"
vfork take 8487 ns
spawn take 173707 us, child_setup take 94535 us
execve, arg /bin/sh "-c" "exec inet_gethost 4 "
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/inet_gethost "4"
fork, arg 
clone, arg CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
clone, arg CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
spawn /bin/cat
fork, arg 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/child_setup "FFFF" "." "exec /bin/cat" "3:327679" "2312:1" "2313:0" "-"
vfork take 5298 ns
spawn take 180974 us, child_setup take 101646 us
execve, arg /bin/sh "-c" "exec /bin/cat"
execve, arg /bin/cat 
spawn /bin/cat
fork, arg 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/child_setup "FFFF" "." "exec /bin/cat" "3:327679" "3080:1" "3081:0" "-"
vfork take 8929 ns
spawn take 169569 us, child_setup take 90163 us
execve, arg /bin/sh "-c" "exec /bin/cat"
execve, arg /bin/cat 
...

In another terminal we run our test case:

$ otp/bin/erl
Erlang R14B04 (erts-5.8.5) [/source] [64-bit] [smp:16:16] [rq:16] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.5  (abort with ^G)
1> demo:start().
ok
2> demo:start().
ok
3> 

We can see that the two runs cost about the same:
vfork take 8929 ns
spawn take 169569 us, child_setup take 90163 us

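Looking at the numbers from the experiment:
vfork blocks beam.smp for about 8 us, while the whole spawn takes about 169 ms, of which child_setup (closing descriptors and so on) accounts for about 90 ms. The numbers tell us bluntly that these performance killers cannot be ignored.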

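Solutions:
1. Use fork instead to avoid blocking beam.smp: erl -env ERL_NO_VFORK 1
2. Reduce the number of file descriptors. If you really need a lot of open_port calls, let a separate, dedicated node do them.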
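Regarding solution 1, the documentation says that setting ERL_NO_VFORK "to any value" is enough, and spawn_start only checks whether the lookup succeeds. A minimal sketch of that presence check follows. It uses plain getenv as a stand-in for erts_sys_getenv, and the function name no_vfork_requested is made up.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 when fork() should be used instead of vfork(): the variable
 * only has to exist, its value (even an empty string) does not matter. */
int no_vfork_requested(void)
{
    return getenv("ERL_NO_VFORK") != NULL;
}
```

So `erl -env ERL_NO_VFORK 1` and `erl -env ERL_NO_VFORK 0` would, under this reading, both switch the emulator to fork.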

Have fun!