
Analyzing and Solving the enfile Problem When gen_tcp Accepts Connections

Recently, for security reasons, we deployed a proxy on our RDS servers that wraps ordinary MySQL TCP connections in SSL. During testing, Haoting noticed that after Tsung had opened a few thousand TCP connections, our Erlang-based SSL proxy kept reporting {error, enfile} from gen_tcp:accept. I investigated the problem as follows.

First, consult man accept to pin down what enfile means, since gen_tcp must ultimately invoke the accept system call:

    EMFILE The per-process limit of open file descriptors has been reached.
    ENFILE The system limit on the total number of open files has been reached.
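On the Erlang side these errno values surface as lowercase atoms in {error, Reason} tuples, and inet:format_error/1 maps them back to readable text. A quick shell check (the exact wording can vary between OTP releases):

1> inet:format_error(enfile).
"file table overflow"
2> inet:format_error(emfile).
"too many open files"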

According to the man page, ENFILE means the system-wide file handle table is exhausted, so let's check that first:

$ uname -r
2.6.18-164.el5
$ cat /proc/sys/fs/file-nr
2040    0       2417338
$ ulimit -n
65535

We had already tuned the system's file-handle limits (see the earlier post 老生常談: ulimit問題及其影響 for details), and these numbers look perfectly normal: /proc/sys/fs/file-nr shows 2040 handles allocated (0 free) against a system-wide maximum of 2417338, and the per-process limit is 65535, nowhere near exhaustion.
Let's read the kernel source in net/socket.c first:

static int sock_alloc_fd(struct file **filep)
{
        int fd;
 
        fd = get_unused_fd();
        if (likely(fd >= 0)) {
                struct file *file = get_empty_filp();
 
                *filep = file;
                if (unlikely(!file)) {
                        put_unused_fd(fd);
                        return -ENFILE;
                }
        } else
                *filep = NULL;
        return fd;
}
 
static int __sock_create(int family, int type, int protocol, struct socket **res, int kern)
{
...
/*
 *      Allocate the socket and allow the family to set things up. if
 *      the protocol is 0, the family is instructed to select an appropriate
 *      default.
 */
 
        if (!(sock = sock_alloc())) {
                if (net_ratelimit())
                        printk(KERN_WARNING "socket: no more sockets\n");
                err = -ENFILE;          /* Not exactly a match, but its the
                                           closest posix thing */
                goto out;
        }
...
}
 
asmlinkage long sys_accept(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen)
{
        struct socket *sock, *newsock;
        struct file *newfile;
        int err, len, newfd, fput_needed;
        char address[MAX_SOCK_ADDR];
 
        sock = sockfd_lookup_light(fd, &err, &fput_needed);
        if (!sock)
                goto out;
 
        err = -ENFILE;                
        if (!(newsock = sock_alloc()))
                goto out_put;
...
}

From this code, every path that returns ENFILE does so because a socket handle could not be allocated. Still skeptical, let's write a SystemTap script to double-check this at runtime:

$ cat enfile.stp
# fire if any slab allocation, or get_empty_filp() specifically, returns NULL
probe kernel.function("kmem_cache_alloc").return,
      kernel.function("get_empty_filp").return {
  if ($return == 0) { print_backtrace(); exit(); }
}
# fire if allocating a descriptor for a new socket fails
probe kernel.function("sock_alloc_fd").return {
  if ($return < 0) { print_backtrace(); exit(); }
}
# fire if accept(2) itself returns -ENFILE (errno 23)
probe syscall.accept.return {
  if ($return == -23) { print_backtrace(); exit(); }
}
# print a marker so we know the probes are armed
probe begin {
  println(":~");
}
$ sudo stap enfile.stp
:~

Even while gen_tcp:accept was reporting {error, enfile}, the stap script never fired, so we can essentially rule out the operating system. That brings us back to the gen_tcp implementation.
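As an aside, the failure is easy to reproduce outside the proxy by shrinking the port table and accepting more connections than it can hold. A minimal sketch, assuming a hypothetical module name and an external load generator such as Tsung driving connections at the listener (this is not the actual proxy code):

-module(enfile_repro).
-export([start/1]).

%% Start the node with a deliberately small port table, e.g.
%%   erl -env ERL_MAX_PORTS 1024
%% then point a load generator (Tsung, etc.) at ListenPort.
start(ListenPort) ->
    {ok, LSock} = gen_tcp:listen(ListenPort,
                                 [binary, {active, false},
                                  {reuseaddr, true}, {backlog, 1024}]),
    accept_loop(LSock, []).

accept_loop(LSock, Socks) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            %% hold on to the accepted socket so its port stays allocated
            accept_loop(LSock, [Sock | Socks]);
        {error, Reason} ->
            %% once the port table fills up, Reason shows up as enfile
            %% on the OTP releases discussed in this post
            io:format("accept failed after ~p sockets: ~p~n",
                      [length(Socks), Reason]),
            {error, Reason}
    end.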
gen_tcp is implemented as a port; the driver lives in erts/emulator/drivers/common/inet_drv.c. Let's look at the spots where it can produce ENFILE:

/* Copy a descriptor, by creating a new port with same settings
 * as the descriptor desc.
 * return NULL on error (ENFILE no ports avail)
 */
static tcp_descriptor* tcp_inet_copy(tcp_descriptor* desc,SOCKET s,
                                     ErlDrvTermData owner, int* err)
{
...
    /* The new port will be linked and connected to the original caller */
    port = driver_create_port(port, owner, "tcp_inet", (ErlDrvData) copy_desc);
    if ((long)port == -1) {
        *err = ENFILE;
        FREE(copy_desc);
        return NULL;
    }
...
}

When driver_create_port fails, gen_tcp returns ENFILE; it looks like we've found the right place this time. Let's follow driver_create_port into erts/emulator/beam/io.c:

/*
 * Driver function to create new instances of a driver
 * Historical reason: to be used with inet_drv for creating
 * accept sockets inorder to avoid a global table.
 */
ErlDrvPort
driver_create_port(ErlDrvPort creator_port_ix, /* Creating port */
                   ErlDrvTermData pid,    /* Owner/Caller */
                   char* name,            /* Driver name */
                   ErlDrvData drv_data)   /* Driver data */
{
...
    rp = erts_pid2proc(NULL, 0, pid, ERTS_PROC_LOCK_LINK);
    if (!rp) {
        erts_smp_mtx_unlock(&erts_driver_list_lock);
        return (ErlDrvTermData) -1;   /* pid does not exist */
    }
    if ((port_num = get_free_port()) < 0) {
        errno = ENFILE;
        erts_smp_proc_unlock(rp, ERTS_PROC_LOCK_LINK);
        erts_smp_mtx_unlock(&erts_driver_list_lock);
        return (ErlDrvTermData) -1;
    }
 
    port_id = make_internal_port(port_num);
    port = &erts_port[port_num & erts_port_tab_index_mask];
...
}

So whenever get_free_port() returns a negative value, the caller sees an ENFILE error.
Now let's see how the total number of ports is configured:

/* initialize the port array */
void init_io(void)
{
...
    if (erts_sys_getenv("ERL_MAX_PORTS", maxports, &maxportssize) == 0)
        erts_max_ports = atoi(maxports);
    else
        erts_max_ports = sys_max_files();
 
    if (erts_max_ports > ERTS_MAX_PORTS)
        erts_max_ports = ERTS_MAX_PORTS;
    if (erts_max_ports < 1024)
        erts_max_ports = 1024;
 
    if (erts_use_r9_pids_ports) {
        ports_bits = ERTS_R9_PORTS_BITS;
        if (erts_max_ports > ERTS_MAX_R9_PORTS)
            erts_max_ports = ERTS_MAX_R9_PORTS;
    }
 
    port_extra_shift = erts_fit_in_bits(erts_max_ports - 1);
    port_num_mask = (1 << ports_bits) - 1;
...
}

Step 1: if the ERL_MAX_PORTS environment variable is set, use its value; otherwise default to the same value as ulimit -n (via sys_max_files()).
Step 2: clamp the result so it can be no larger than ERTS_MAX_PORTS and no smaller than 1024.
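In Erlang terms, the computation boils down to the following sketch (max_ports/3 and clamp/2 are hypothetical helpers of mine, not OTP code):

%% EnvStr: value of ERL_MAX_PORTS, or undefined if unset.
%% UlimitN: what sys_max_files() returns (the ulimit -n value).
%% Cap: the compile-time ERTS_MAX_PORTS ceiling.
max_ports(undefined, UlimitN, Cap) -> clamp(UlimitN, Cap);
max_ports(EnvStr, _UlimitN, Cap)   -> clamp(list_to_integer(EnvStr), Cap).

clamp(N, Cap) -> max(1024, min(N, Cap)).

With ulimit -n at 65535 and no override, we would therefore expect 65535, which makes the value found below all the more telling.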

At this point the cause is basically clear: erts_max_ports is set too small.

Let's verify once more by attaching gdb to the running process:

(gdb) p erts_max_ports
$1 = 4096

So a too-small port limit caused everything we saw above; evidently the proxy had been started with a much smaller effective limit than the interactive shell we checked earlier. Roundabout as it looks, the behavior is deliberate: Erlang's designers treat port exhaustion (ports being the VM's counterpart of OS I/O resources) exactly like OS file-handle exhaustion, so hitting the system limit is reported as an ENFILE error!

The fix: start the emulator with erl -env ERL_MAX_PORTS NNNN and pick a sufficiently large value. (Note the variable is ERL_MAX_PORTS, the environment variable read by init_io() above; ERTS_MAX_PORTS is the compile-time cap.)
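To confirm the new limit took effect, you can query it from the Erlang shell. An illustrative transcript (erlang:system_info(port_limit) is available on newer OTP releases, which configure the same limit with the +Q flag; on older ones, printing erts_max_ports under gdb as above works just as well):

$ erl -env ERL_MAX_PORTS 65536
...
1> erlang:system_info(port_limit).
65536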

While we're at it, let me highlight a few key tuning parameters for Erlang servers, taken from http://www.ejabberd.im/tuning; they are very helpful when configuring a server.

    This page lists several tricks to tune your ejabberd and Erlang installation for maximum performance gains. Note that some of the described options are experimental.

    Erlang Ports Limit: ERL_MAX_PORTS
    Erlang consumes one port for every connection, either from a client or from another Jabber server. The option ERL_MAX_PORTS limits the number of concurrent connections and can be specified when starting ejabberd:

    erl -s ejabberd -env ERL_MAX_PORTS 5000 …

    Maximum Number of Erlang Processes: +P
    Erlang consumes a lot of lightweight processes. If there is a lot of activity on ejabberd and the maximum number of processes is reached, people will experience greater latency. As these processes are implemented in Erlang, and are therefore not related to operating system processes, you do not have to worry about allowing a huge number of them.

    erl -s ejabberd +P 250000 …

    ERL_FULLSWEEP_AFTER: Maximum number of collections before a forced fullsweep
    The ERL_FULLSWEEP_AFTER option shrinks the size of the Erlang process after RAM intensive events. Note that this option may downgrade performance. Hence this option is only interesting on machines that host other services (webserver, mail) on which ejabberd does not receive constant load.

    erl -s ejabberd -env ERL_FULLSWEEP_AFTER 0 …

    Kernel Polling: +K true

    The kernel polling option requires support in your kernel. By default, Erlang currently supports kernel polling under FreeBSD, Mac OS X, and Solaris. If you use Linux, check this newspost. Additionally, you need to enable this feature when compiling Erlang.

    From Erlang documentation -> Basic Applications -> erts -> erl -> System Flags:

    +K true|false

    Enables or disables the kernel poll functionality if the emulator has kernel poll support. By default the kernel poll functionality is disabled. If the emulator doesn't have kernel poll support and the +K flag is passed to the emulator, a warning is issued at startup.

    If you meet all requirements, you can enable it in this way:

    erl -s ejabberd +K true …

    Mnesia Tables to Disk
    By default, ejabberd uses Mnesia as its database. In Mnesia you can configure each table in the database to be stored on RAM, on RAM and on disk, or only on disk. You can configure this in the web interface: Nodes -> 'mynode' -> DB Management. Modification of this option will consume some memory and CPU time.
    Number of Concurrent ETS and Mnesia Tables: ERL_MAX_ETS_TABLES
    The number of concurrent ETS and Mnesia tables is limited. When the limit is reached, errors will appear in the logs:

    ** Too many db tables **

    You can safely increase this limit when starting ejabberd. It impacts memory consumption but the difference will be quite small.

    erl -s ejabberd -env ERL_MAX_ETS_TABLES 20000 …
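Putting several of these together, a start line for an ejabberd node might look like this (the numbers are illustrative, not recommendations):

$ erl -s ejabberd -env ERL_MAX_PORTS 65536 -env ERL_MAX_ETS_TABLES 20000 +P 250000 +K true ...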

To sum up: many problems take a winding path to the root cause; consider and verify them from multiple angles.

Have fun!