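TC: A Handy Tool for Emulating Network Conditions
This article introduces TC, a tool that can emulate a variety of complex Internet transmission conditions, and walks through concrete emulation examples.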
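In day-to-day production work, figuring out whether the network is behaving normally is exhausting. All too often someone less than friendly pins a problem on "the network", and we end up spending our time proving our innocence. Today I'll introduce a handy tool for reproducing network conditions: TC.
Any discussion of TC has to start with Netem (Network Emulator), a network emulation module provided by Linux kernel 2.6 and later. On a LAN that otherwise performs well, it can emulate complex Internet transmission characteristics such as low bandwidth, transmission delay, and packet loss.
TC is a user-space tool on Linux. Its full name is Traffic Control, and it is used to control how the Netem module operates. In other words, using Netem has two prerequisites: the Netem module must be enabled in the kernel, and the matching user-space tool, TC, must be installed. Their relationship is similar to that between the netfilter framework and iptables.
Let's look at what TC can do. TC has many features, but today we only cover emulating network conditions. Before starting the experiments, here is what the following subcommands mean: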
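add: adds a Netem configuration to the specified network interface.
change: changes an existing Netem configuration to new values.
replace: replaces the values of an existing Netem configuration, and creates the configuration if it does not exist yet.
del: deletes the Netem configuration from the network interface.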
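To tie the four subcommands together, here is a minimal sketch of the full lifecycle on one interface. `IFACE`, `DRY_RUN`, `run`, and `netem_lifecycle` are hypothetical names for this example, not tc options. With `DRY_RUN=1` (the default here) the commands are only printed, so the sketch is safe to try on any machine.

```shell
#!/usr/bin/env bash
# Sketch of the add / change / replace / del lifecycle on one interface.
# IFACE and DRY_RUN are assumptions of this example, not tc options.
# With DRY_RUN=1 (the default here) commands are printed instead of run.
IFACE=${IFACE:-eth0}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ $DRY_RUN == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

netem_lifecycle() {
  # attach netem as the root qdisc; tc refuses if netem is already attached
  run tc qdisc add dev "$IFACE" root netem delay 50ms
  # modify the existing netem qdisc; fails if none is attached
  run tc qdisc change dev "$IFACE" root netem delay 100ms
  # create the qdisc if it is missing, otherwise update it
  run tc qdisc replace dev "$IFACE" root netem delay 100ms 10ms
  # inspect the current configuration
  run tc qdisc show dev "$IFACE"
  # remove netem; the interface falls back to its default qdisc
  run tc qdisc del dev "$IFACE" root
}
```

To apply it for real, source the file as root and call `DRY_RUN=0 netem_lifecycle`. The final `del` is the habit worth keeping, because a forgotten netem qdisc keeps degrading all traffic leaving that interface.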
1
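Emulating transmission delay
To emulate long-distance latency inside a LAN, use this method. For example, if real users see 51 ms of latency to your site but your test environment needs only 1 ms, add 50 ms of extra delay.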
```
[root@tj1-vm-search020 ~]# tc qdisc add dev eth0 root netem delay 50ms

[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=50.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=50.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=50.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=50.0 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 50.037/50.044/50.063/0.223 ms
```
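The arithmetic here is simple, but it hides a detail worth spelling out: netem delays packets leaving the interface it is attached to. In the session above the qdisc sits on search020, so only the echo replies are held back and the round trip grows by the configured 50ms. The following is a minimal sketch under that assumption (delay on one side only, whole milliseconds). `netem_extra_delay_ms` and `netem_delay_cmd` are hypothetical helper names, and the second one only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extra netem delay (ms) needed so that a measured LAN
# round trip (baseline_ms) looks like the target round trip (target_ms).
# Assumes netem sits on one host only, so the delay is paid once per round trip.
netem_extra_delay_ms() {
  local target_ms=$1 baseline_ms=$2
  if (( target_ms < baseline_ms )); then
    echo "target ${target_ms}ms is below the baseline ${baseline_ms}ms" >&2
    return 1
  fi
  echo $(( target_ms - baseline_ms ))
}

# Print (not run) the matching tc command; the interface name is an assumption.
netem_delay_cmd() {
  local iface=$1 extra
  extra=$(netem_extra_delay_ms "$2" "$3") || return 1
  echo "tc qdisc add dev $iface root netem delay ${extra}ms"
}
```

For the example in the text, `netem_delay_cmd eth0 51 1` prints `tc qdisc add dev eth0 root netem delay 50ms`.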
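If you see a perfectly stable latency on a network, a timer has probably been added somewhere. Network paths are complex, so transmission times always vary, and real-world latency always fluctuates. Netem accounts for this and provides extra parameters to control how the delay is distributed. The full syntax is:
```
DELAY := delay TIME [ JITTER [ CORRELATION ]] [ distribution { uniform | normal | pareto | paretonormal } ]
```
Besides the delay TIME, there are three optional parameters: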
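- JITTER: adds a random amount of time so that the delay falls within a range.
- CORRELATION: how strongly the delay of the next packet is correlated with that of the previous packet.
- distribution: the distribution model of the delay. Possible values are uniform, normal, pareto, and paretonormal.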
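When scripting experiments, it helps to assemble the DELAY part from its pieces and reject combinations that do not fit the grammar above. The following is a minimal sketch. `netem_delay_spec` is a hypothetical helper, and empty arguments mean "omitted". Refusing a distribution without JITTER is a choice made in this sketch (a distribution with no spread has nothing to shape), not a documented tc rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper that assembles the DELAY part of a netem command.
# Arguments: TIME [JITTER [CORRELATION [DISTRIBUTION]]]; "" means omitted.
netem_delay_spec() {
  local time=$1 jitter=${2:-} corr=${3:-} dist=${4:-}
  local spec="delay $time"

  # CORRELATION is only meaningful after JITTER in the grammar
  if [[ -n $corr && -z $jitter ]]; then
    echo "CORRELATION requires JITTER" >&2
    return 1
  fi
  if [[ -n $jitter ]]; then
    spec+=" $jitter"
  fi
  if [[ -n $corr ]]; then
    spec+=" $corr"
  fi

  if [[ -n $dist ]]; then
    case $dist in
      uniform | normal | pareto | paretonormal) ;;
      *)
        echo "unknown distribution: $dist" >&2
        return 1
        ;;
    esac
    # a distribution shapes the jitter, so this sketch insists on having one
    if [[ -z $jitter ]]; then
      echo "distribution requires JITTER" >&2
      return 1
    fi
    spec+=" distribution $dist"
  fi

  echo "$spec"
}
```

A typical use relies on word splitting of the unquoted result: `tc qdisc replace dev eth0 root netem $(netem_delay_spec 50ms 20ms '' normal)`.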
Start with JITTER. If it is set to 20ms, each packet's delay falls within 50ms ± 20ms, with the exact value chosen at random:
```
[root@tj1-vm-search020 ~]# tc qdisc replace dev eth0 root netem delay 50ms 20ms

[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=69.4 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=51.9 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=66.3 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=57.4 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=46.0 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=33.8 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5007ms
rtt min/avg/max/mdev = 33.877/54.178/69.446/12.063 ms
```
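To check that observed delays really stay inside 50ms ± 20ms, you can summarise the time= fields straight from ping's output. The following is a small sketch. `ping_rtt_stats` is a hypothetical helper that reads ping output on stdin and prints the minimum, average, and maximum of the reply times.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print "min avg max"
# of the time=... fields, to sanity-check a delay/jitter setting.
ping_rtt_stats() {
  awk '
    match($0, /time=[0-9.]+/) {
      t = substr($0, RSTART + 5, RLENGTH - 5) + 0
      if (n == 0 || t < min) min = t
      if (n == 0 || t > max) max = t
      sum += t
      n++
    }
    END {
      if (n == 0) exit 1   # no replies found
      printf "%.1f %.1f %.1f\n", min, sum / n, max
    }'
}
```

Fed the six replies above, it prints `33.8 54.1 69.4`, which is inside the expected 30 to 70ms band.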
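CORRELATION sets the correlation. Network conditions change smoothly, so over a short period the delays of adjacent packets should be similar rather than completely random. The value is a percentage: at 100% it degenerates into a fixed delay, and at 0% into a purely random delay.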
```
[root@tj1-vm-search020 ~]# tc qdisc replace dev eth0 root netem delay 50ms 20ms 30%

[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=47.6 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=58.3 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=47.4 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=33.8 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=61.0 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 33.898/49.668/61.050/9.610 ms
```
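A small simulation makes the two extremes of CORRELATION concrete. The following is a minimal sketch that assumes a simple exponential-smoothing model: each new uniform random value is blended with the previous one, weighted by the correlation, and the result is mapped onto 50ms ± 20ms. It is an illustration of the 0% and 100% cases described above, not a copy of netem's kernel code. `simulate_correlated_delay` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of correlated jitter (not netem's kernel code):
#   v = (1 - rho) * u + rho * last,  u uniform in [0,1)
# and v is mapped onto [mu - sigma, mu + sigma).
# Arguments: mu_ms sigma_ms correlation_percent count [seed]
simulate_correlated_delay() {
  local mu=$1 sigma=$2 corr_pct=$3 count=$4 seed=${5:-1}
  awk -v mu="$mu" -v sigma="$sigma" -v rho="$corr_pct" -v n="$count" -v seed="$seed" '
    BEGIN {
      srand(seed)
      rho /= 100
      last = rand()                      # initial state of the generator
      for (i = 0; i < n; i++) {
        v = (1 - rho) * rand() + rho * last
        last = v
        printf "%.1f\n", mu - sigma + 2 * sigma * v
      }
    }'
}
```

With `simulate_correlated_delay 50 20 100 5`, every line is identical (a fixed delay). With a correlation of 0, each value is independent. Values in between give sequences that drift rather than jump.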
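Like many real-world phenomena, packet delays follow statistical patterns, the most common being the normal distribution. To get closer to reality, use the distribution parameter to choose the delay distribution model. For example, to make packet delays follow a normal distribution: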
```
[root@tj1-vm-search020 ~]# tc qdisc replace dev eth0 root netem delay 50ms 20ms distribution normal

[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=41.7 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=44.3 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=50.7 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=57.2 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=37.6 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 37.675/46.350/57.249/6.912 ms
```
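With this setting, most delays fall within a certain range around the mean, and delays close to the maximum or minimum are rare.
Other distributions include uniform, pareto, and paretonormal; interested readers can explore them on their own. In most cases, a random delay within some time range is all you need.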
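To see why a normal distribution keeps most delays near the mean, the following sketch compares how many simulated delays land within 50ms ± 20ms under a uniform spread and under a normal one (generated with the Box-Muller method). This is only a model. It assumes JITTER acts as the half-width of the uniform spread and as one standard deviation of the normal one; check the tc-netem documentation for how your iproute2 version interprets it. `within_sigma_pct` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical comparison: percentage of simulated delays that land within
# mu +/- sigma for a uniform spread versus a normal one (Box-Muller).
# Arguments: uniform|normal mu_ms sigma_ms count [seed]
within_sigma_pct() {
  local dist=$1 mu=$2 sigma=$3 count=$4 seed=${5:-1}
  awk -v d="$dist" -v mu="$mu" -v s="$sigma" -v n="$count" -v seed="$seed" '
    BEGIN {
      srand(seed)
      pi = atan2(0, -1)
      hit = 0
      for (i = 0; i < n; i++) {
        if (d == "normal") {
          # 1 - rand() is in (0, 1], so log() never sees 0
          z = sqrt(-2 * log(1 - rand())) * cos(2 * pi * rand())
          t = mu + s * z
        } else {
          t = mu - s + 2 * s * rand()
        }
        if (t >= mu - s && t <= mu + s) hit++
      }
      printf "%d\n", 100 * hit / n
    }'
}
```

In this model, the uniform case keeps 100% of samples within ±20ms, while the normal case keeps roughly two thirds (about 68%). The rest fall outside ±20ms, thinning out quickly with distance from the mean.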
2
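Emulating packet loss
Another common network anomaly is packet loss. Loss triggers retransmission, which in turn increases traffic and latency on the link. Netem's loss parameter emulates a loss rate, for example a 50% loss rate on sent packets. I picked a large number so that the effect is easy to see with ping; real loss rates are usually much smaller, for example 0.5%: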
```
[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem loss 50%

[root@tj1-vm-search019 ~]# ping -c 10 tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=7 ttl=64 time=0.036 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.037 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=9 ttl=64 time=0.035 ms

--- tj1-vm-search020.kscn ping statistics ---
10 packets transmitted, 5 received, 50% packet loss, time 9000ms
rtt min/avg/max/mdev = 0.035/0.039/0.049/0.005 ms
```
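The icmp_seq numbers show that about half of the packets were dropped. As with delay, the loss rate can take a correlation (for example `loss 0.3% 25%`), which expresses how a packet's loss probability depends on whether the previous packet was lost.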
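The following is a minimal sketch of what a loss correlation does. It assumes a simple model in which each packet's random value is blended with the previous packet's value (weighted by the correlation), and the packet is dropped when that value falls below the loss rate. `simulate_loss` is hypothetical and does not reproduce netem's kernel code. It prints the observed loss percentage and the mean length of consecutive-loss bursts.

```shell
#!/usr/bin/env bash
# Hypothetical loss model (not netem's kernel code): a packet is dropped when
# a smoothed uniform value falls below the loss probability.
# Arguments: loss_percent correlation_percent count [seed]
# Output:    "<observed loss %> <mean burst length>"
simulate_loss() {
  local loss_pct=$1 corr_pct=$2 count=$3 seed=${4:-1}
  awk -v p="$loss_pct" -v rho="$corr_pct" -v n="$count" -v seed="$seed" '
    BEGIN {
      srand(seed)
      p /= 100; rho /= 100
      last = rand(); lost = 0; bursts = 0; run = 0
      for (i = 0; i < n; i++) {
        v = (1 - rho) * rand() + rho * last
        last = v
        if (v < p) {
          lost++
          if (run == 0) bursts++   # a new run of consecutive losses starts
          run = 1
        } else {
          run = 0
        }
      }
      printf "%.1f %.2f\n", 100 * lost / n, (bursts ? lost / bursts : 0)
    }'
}
```

With 50% loss and no correlation, the mean burst is about 2 packets. At 90% correlation, losses cluster into much longer bursts while the overall rate stays near 50%. In this model, pairing a small loss rate with a high correlation also pulls the observed rate below the configured one, because smoothing draws values toward 0.5.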
3
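Emulating packet duplication
Packet duplication takes parameters similar to those for loss: a duplication rate and a correlation. For example, to randomly duplicate 50% of packets: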
```
[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem duplicate 50%

[root@tj1-vm-search019 ~]# ping -c 10 tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.044 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.045 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.050 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=0.037 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=0.038 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=0.037 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=0.036 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=0.039 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=7 ttl=64 time=0.029 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.030 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.034 ms (DUP!)
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=9 ttl=64 time=0.037 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=10 ttl=64 time=0.036 ms

--- tj1-vm-search020.kscn ping statistics ---
10 packets transmitted, 10 received, +6 duplicates, 0% packet loss, time 9001ms
rtt min/avg/max/mdev = 0.029/0.037/0.050/0.007 ms
```
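ping's closing statistics line already summarises the effect of loss and duplication, so scripted runs can check it directly. The following is a small sketch. `ping_summary` is a hypothetical helper that prints transmitted, received, duplicates, and loss percentage from that line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull "transmitted received duplicates loss%" out of the
# statistics line that ping prints at the end of a run (read from stdin).
ping_summary() {
  awk '/packets transmitted/ {
    tx = $1; rx = $4; dup = 0; loss = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[+][0-9]+$/) dup = substr($i, 2) + 0      # e.g. "+6"
      if ($i == "packet" && $(i + 1) ~ /^loss/) loss = $(i - 1)
    }
    sub(/%$/, "", loss)
    print tx, rx, dup, loss
  }'
}
```

On the two runs above it prints `10 5 0 50` for the loss test and `10 10 6 0` for the duplication test.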
4
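Emulating packet corruption
Packet corruption also takes parameters similar to those for duplication. For example, to randomly corrupt 2% of packets (each corruption flips a single bit at a random position in the packet):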
```
[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem corrupt 2%

[root@tj1-vm-search019 ~]# ping tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=0.040 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=0.034 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=5 ttl=64 time=0.033 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=0.043 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=8 ttl=64 time=0.039 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=10 ttl=64 time=0.056 ms
wrong data byte #39 should be 0x27 but was 0xa7
#16	10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 a7 28 29 2a 2b 2c 2d 2e 2f
#48	30 31 32 33 34 35 36 37
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=11 ttl=64 time=0.046 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=12 ttl=64 time=0.036 ms
Warning: time of day goes back (-4773815605012725725us), taking countermeasures.
Warning: time of day goes back (-4773815605012725708us), taking countermeasures.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=13 ttl=64 time=0.000 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=15 ttl=64 time=0.045 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=16 ttl=64 time=0.043 ms
^C
--- tj1-vm-search020.kscn ping statistics ---
16 packets transmitted, 13 received, 18% packet loss, time 15001ms
rtt min/avg/max/mdev = 0.000/0.037/0.056/0.014 ms
```
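ping's "wrong data byte" report makes it easy to confirm the single-bit nature of the corruption: XOR the expected and actual bytes and count the bits that differ. The following is a small sketch. `flipped_bits` is a hypothetical helper that accepts the hex values exactly as ping prints them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the "should be" and "but was" bytes reported by
# ping, print the XOR mask and the number of bits that differ.
flipped_bits() {
  local expected=$(( $1 )) actual=$(( $2 ))
  local diff=$(( expected ^ actual )) count=0 x
  x=$diff
  # count set bits in the XOR mask
  while (( x > 0 )); do
    count=$(( count + (x & 1) ))
    x=$(( x >> 1 ))
  done
  printf '0x%02x %d\n' "$diff" "$count"
}
```

`flipped_bits 0x27 0xa7` prints `0x80 1`, so the corrupted byte above differs in exactly one bit. The same check applies to the "wrong data byte" lines in the reordering runs that follow.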
5
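Emulating packet reordering
Networks do not guarantee ordering. TCP at the transport layer reassembles packets into the correct order, so reordering affects applications less than the problems above.
Reordering works differently from the previous parameters. The problems above affect each packet independently, so netem only has to act on individual packets, whereas reordering involves the relative order of several packets. Emulating reordering always requires a delay, because reordering is essentially holding some packets back. Netem offers two ways to do it.
The first method reorders at a fixed gap, once every given number of packets.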
```
# gap 3: the first two packets of each cycle are delayed by 50ms; from the 3rd packet on,
# each packet is sent immediately with 50% probability (which restarts the cycle)
# and is otherwise delayed as well.
[root@tj1-vm-search020 ~]# tc qdisc change dev eth0 root netem reorder 50% gap 3 delay 50ms

[root@tj1-vm-search019 ~]# ping -i 0.01 tj1-vm-search020.kscn | more
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=10.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=50.0 ms
wrong data byte #21 should be 0x15 but was 0x5
#16	10 11 12 13 14 5 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
#48	30 31 32 33 34 35 36 37
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=2 ttl=64 time=50.0 ms
```
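To see ping replies arrive out of order, the send interval must be shorter than the 50ms delay; here -i 0.01 sets the interval to 10ms.
The second method is more random: the packets to reorder are chosen by probability.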
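When the output scrolls quickly, it helps to list only the replies that were overtaken. The following is a small sketch. `ping_out_of_order` is a hypothetical helper that reads ping output on stdin and prints each icmp_seq that arrives after a higher sequence number has already been seen. Duplicates of the highest sequence seen so far are not reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report icmp_seq values that arrive after a higher
# sequence number has already been seen, i.e. packets that were overtaken.
ping_out_of_order() {
  awk '
    match($0, /icmp_seq=[0-9]+/) {
      seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
      if (seen && seq < max) out = out (out == "" ? "" : " ") seq
      if (!seen || seq > max) max = seq
      seen = 1
    }
    END { print out }'
}
```

On the gap-based run above it prints `1 2`: sequence 3 overtook packets 1 and 2.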
```
$ tc qdisc change dev enp0s5 root netem reorder 50% 15% delay 300ms

[root@tj1-vm-search019 ~]# ping -i 0.01 tj1-vm-search020.kscn
PING tj1-vm-search020.kscn (10.38.167.17) 56(84) bytes of data.
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=6 ttl=64 time=11.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=4 ttl=64 time=51.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=3 ttl=64 time=71.5 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=1 ttl=64 time=111 ms
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=13 ttl=64 time=85.0 ms
wrong data byte #51 should be 0x33 but was 0x23
#16	10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
#48	30 31 32 23 34 35 36 37
64 bytes from tj1-vm-search020.kscn (10.38.167.17): icmp_seq=12 ttl=64 time=105 ms
```
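Here 50% of packets (with a 15% correlation) are sent immediately, and the remaining packets (1 - 50%) are delayed by 300ms. The delay is deliberately large so that the reordering is obvious.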
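The probability-based method can be modelled in a few lines, which also shows why the send interval matters. The following is a minimal sketch with these assumptions: packet k is sent at k × interval, forwarded immediately with probability p, and otherwise held for the delay; correlation is ignored. `simulate_reorder` is hypothetical and prints sequence numbers in arrival order.

```shell
#!/usr/bin/env bash
# Hypothetical model of probability-based reordering (correlation ignored):
# packet k leaves at k * interval; with probability p it is forwarded at once,
# otherwise it is held for delay ms. Prints sequence numbers in arrival order.
# Arguments: p_percent delay_ms interval_ms count [seed]
simulate_reorder() {
  local p_pct=$1 delay_ms=$2 interval_ms=$3 count=$4 seed=${5:-1}
  awk -v p="$p_pct" -v d="$delay_ms" -v iv="$interval_ms" -v n="$count" -v seed="$seed" '
    BEGIN {
      srand(seed)
      for (k = 1; k <= n; k++) {
        t[k] = k * iv + (rand() * 100 < p ? 0 : d)   # arrival time
        s[k] = k                                     # sequence number
      }
      # stable insertion sort by arrival time; ties keep send order
      for (i = 2; i <= n; i++) {
        at = t[i]; as = s[i]; j = i - 1
        while (j >= 1 && t[j] > at) { t[j + 1] = t[j]; s[j + 1] = s[j]; j-- }
        t[j + 1] = at; s[j + 1] = as
      }
      out = ""
      for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") s[i]
      print out
    }'
}
```

With a 100ms interval and a 50ms delay, `simulate_reorder 50 50 100 5` always prints `1 2 3 4 5`, because no held packet can be overtaken. Shrinking the interval below the delay lets immediate packets pass held ones, which matches the requirement for a short ping interval described earlier.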
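Conclusion
This article covered several ways to use TC to emulate network conditions. As the advanced traffic-control tool provided by Linux, TC offers much more, including SHAPING, SCHEDULING, POLICING, DROPPING, QDISC (queueing discipline), CLASS, and FILTER. This article cannot cover them all; I hope it gives you a basic understanding of TC and encourages you to explore it further.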