Systems Technology Non-Amateur Research » Process Hangs Caused by Insufficient Network Stack Memory

阿新 • Published: 2019-01-13

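A TCP socket has a send buffer and a receive buffer, and both can be changed with setsockopt through SO_SNDBUF and SO_RCVBUF. But how large should these values be, and how do they relate to the settings that control the protocol stack's overall memory?
Let's look at the relevant kernel settings (tcp_mem and udp_mem are counted in pages, while tcp_wmem and tcp_rmem list the min, default, and max buffer sizes in bytes):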

$ sysctl net|grep mem
net.core.wmem_max = 131071
net.core.rmem_max = 131071
net.core.wmem_default = 124928
net.core.rmem_default = 124928
net.core.optmem_max = 20480
net.ipv4.igmp_max_memberships = 20
net.ipv4.tcp_mem = 4631520 6175360 9263040
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.udp_mem = 4631520 6175360 9263040
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
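To make the units concrete, here is a minimal Python sketch that parses an excerpt of the output above. It assumes 4 KiB pages (check with getconf PAGESIZE) and models, as I understand Linux's behavior, how an explicit setsockopt(SO_SNDBUF) is capped at net.core.wmem_max and then doubled by the kernel. Setting the option explicitly also disables send-buffer autotuning for that socket, whose ceiling would otherwise be the third value of tcp_wmem. The helper names parse_sysctl, tcp_mem_bytes, and effective_sndbuf are hypothetical.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumption: 4 KiB pages; check with `getconf PAGESIZE`

# Excerpt of the `sysctl net | grep mem` output shown above.
SYSCTL_OUTPUT = """\
net.core.wmem_max = 131071
net.core.rmem_max = 131071
net.ipv4.tcp_mem = 4631520 6175360 9263040
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
"""


def parse_sysctl(text: str) -> dict[str, list[int]]:
    """Turn 'key = v1 v2 ...' lines into {key: [v1, v2, ...]}."""
    values: dict[str, list[int]] = {}
    for line in text.splitlines():
        key, sep, rhs = line.partition("=")
        if sep:
            values[key.strip()] = [int(tok) for tok in rhs.split()]
    return values


def tcp_mem_bytes(settings: dict[str, list[int]], page_size: int = PAGE_SIZE) -> list[int]:
    """tcp_mem holds (low, pressure, high) thresholds in pages; convert to bytes."""
    return [pages * page_size for pages in settings["net.ipv4.tcp_mem"]]


def effective_sndbuf(requested: int, wmem_max: int) -> int:
    """Model of setsockopt(SO_SNDBUF) on Linux: the request is capped at
    net.core.wmem_max, then doubled for bookkeeping overhead.
    The kernel's small minimum floor is ignored here."""
    return min(requested, wmem_max) * 2


settings = parse_sysctl(SYSCTL_OUTPUT)
low, pressure, high = tcp_mem_bytes(settings)
wmem_max = settings["net.core.wmem_max"][0]
print(f"tcp_mem pressure/high: {pressure / 2**30:.1f} / {high / 2**30:.1f} GiB")
print("SO_SNDBUF 64 KiB ->", effective_sndbuf(64 * 1024, wmem_max), "bytes")
print("SO_SNDBUF 1 MiB  ->", effective_sndbuf(1024 * 1024, wmem_max), "bytes")
```

With these values the tcp_mem hard limit works out to about 35.3 GiB, and a 1 MiB SO_SNDBUF request is silently cut down to 262142 bytes because of the small net.core.wmem_max.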

The figure below explains these relationships well:

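The key point to remember: memory used by the TCP stack is non-swappable physical memory, so every byte used is a byte gone.
That is also why the operating system ships with fairly conservative defaults for the settings above. For a server that is not network-intensive this is not a problem, but for a server handling C1M (one million concurrent) connections, it becomes one. In practice we see TCP services time out frequently, sometimes with delays above 100 ms. So how do we pin down the cause?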
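A rough worst-case estimate shows why a C1M server runs into these limits. The sketch below assumes every connection fills its default receive and send buffer (the middle values of tcp_rmem and tcp_wmem above), ignores sk_buff overhead, and assumes 4 KiB pages. Real connections are usually far less full, so treat the result as an upper bound, not a measurement.

```python
# Back-of-the-envelope: worst-case buffer memory for C1M vs. tcp_mem's hard limit.
# Assumptions: every connection fills its *default* send and receive buffer,
# sk_buff overhead is ignored, and pages are 4 KiB.

PAGE_SIZE = 4096
CONNECTIONS = 1_000_000
TCP_RMEM_DEFAULT = 87380      # bytes, middle value of net.ipv4.tcp_rmem above
TCP_WMEM_DEFAULT = 16384      # bytes, middle value of net.ipv4.tcp_wmem above
TCP_MEM_HIGH_PAGES = 9263040  # third value of net.ipv4.tcp_mem above (pages)

per_conn = TCP_RMEM_DEFAULT + TCP_WMEM_DEFAULT
worst_case = per_conn * CONNECTIONS
limit = TCP_MEM_HIGH_PAGES * PAGE_SIZE

print(f"per connection : {per_conn} bytes")
print(f"1M connections : {worst_case / 2**30:.1f} GiB")
print(f"tcp_mem high   : {limit / 2**30:.1f} GiB")
print(f"ratio          : {worst_case / limit:.2f}x")
```

Under these assumptions a million connections could ask for roughly 96.6 GiB against a tcp_mem hard limit of about 35.3 GiB, around 2.7 times more than the stack is allowed to use. As the total approaches the tcp_mem thresholds, the kernel first moderates buffer growth and then refuses further allocations, which is where senders start waiting.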

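When the stack cannot get the memory a send needs, the kernel calls sk_stream_wait_memory and sleeps until memory is freed, for example when the peer acknowledges data already queued in the socket's send buffer, or when other sockets release stack memory. The time spent in this function is therefore exactly the time our process is blocked.
Let's verify this: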
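The condition that sends a writer into this wait can be reproduced from user space with a loopback TCP connection whose peer never reads. This minimal sketch uses Python's socket module, an arbitrary 4 KiB SO_SNDBUF request, and a non-blocking socket, so the "would block" point surfaces as BlockingIOError instead of a hang. fill_send_buffer is a hypothetical helper name, and the exact byte count it returns depends on the host's buffer settings.

```python
import socket


def fill_send_buffer(sndbuf: int = 4096, chunk: int = 1024) -> int:
    """Queue data on a loopback TCP connection whose peer never reads, until
    send() would block. Returns the number of bytes the kernel accepted."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        peer, _ = listener.accept()  # accepted but never read from
        with client, peer:
            client.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
            client.setblocking(False)  # report "would block" instead of sleeping
            payload = b"x" * chunk
            queued = 0
            while True:
                try:
                    queued += client.send(payload)
                except BlockingIOError:
                    # Both the peer's receive buffer and our send buffer are full.
                    # A blocking socket would sleep in the kernel at this point
                    # until the peer drains data.
                    return queued


if __name__ == "__main__":
    print("bytes queued before send() would block:", fill_send_buffer())
```

Switching the client back to blocking mode and calling sendall() would stall at the same point, which is the kind of stall the probe below reports.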

$ cat /usr/share/doc/systemtap-1.6/examples/network/sk_stream_wait_memory.stp
#!/usr/bin/stap
# Simple probe to detect when a process is waiting for more socket send
# buffer memory. Usually means the process is doing writes larger than the
# socket send buffer size or there is a slow receiver at the other side.
# Increasing the socket's send buffer size might help decrease application
# latencies, but it might also make it worse, so buyer beware.
#
# Typical output: timestamp in microseconds: procname(pid) event
#
# 1218230114875167: python(17631) blocked on full send buffer
# 1218230114876196: python(17631) recovered from full send buffer
# 1218230114876271: python(17631) blocked on full send buffer
# 1218230114876479: python(17631) recovered from full send buffer

probe kernel.function("sk_stream_wait_memory")
{
        printf("%u: %s(%d) blocked on full send buffer\n",
                gettimeofday_us(), execname(), pid())
}

probe kernel.function("sk_stream_wait_memory").return
{
        printf("%u: %s(%d) recovered from full send buffer\n",
                gettimeofday_us(), execname(), pid())
}
$ sudo stap sk_stream_wait_memory.stp  
1218230114875167: python(17631) blocked on full send buffer
1218230114876196: python(17631) recovered from full send buffer
1218230114876271: python(17631) blocked on full send buffer
1218230114876479: python(17631) recovered from full send buffer
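The raw trace is easier to act on once each blocked/recovered pair is turned into a duration. The following Python sketch parses output in the format above. Pairing by pid is an assumption that only holds when one thread per process blocks at a time (a probe that also prints tid() would be safer for multithreaded servers), and the 100 ms threshold simply mirrors the timeouts mentioned earlier; blocked_intervals is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Matches lines like "1218230114875167: python(17631) blocked on full send buffer"
LINE = re.compile(r"^(\d+): (.+)\((\d+)\) (blocked|recovered)\b")

SAMPLE = """\
1218230114875167: python(17631) blocked on full send buffer
1218230114876196: python(17631) recovered from full send buffer
1218230114876271: python(17631) blocked on full send buffer
1218230114876479: python(17631) recovered from full send buffer
"""


def blocked_intervals(log: str) -> list[tuple[str, int, int]]:
    """Pair each 'blocked' line with the next 'recovered' line of the same pid
    and return (process name, pid, microseconds spent blocked)."""
    started: dict[int, int] = {}
    intervals: list[tuple[str, int, int]] = []
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        ts, name, pid, event = int(m[1]), m[2], int(m[3]), m[4]
        if event == "blocked":
            started[pid] = ts
        elif pid in started:
            intervals.append((name, pid, ts - started.pop(pid)))
    return intervals


SLOW_US = 100_000  # 100 ms, the timeout scale mentioned in the post

for name, pid, us in blocked_intervals(SAMPLE):
    flag = "  <-- slow" if us >= SLOW_US else ""
    print(f"{name}({pid}) blocked {us} us{flag}")
```

For the sample trace the two stalls last 1029 and 208 microseconds, far below the 100 ms threshold; on an affected C1M server the same script would make the long stalls stand out.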

If we observe processes blocking for lack of memory, it is time to tune the protocol stack's memory limits.

Summary: networking is complex and calls for quantitative analysis!

Have fun!
