Understanding CPU Utilization

CPU · Published 2018-11-05 08:21:29

Starting with the top command

Run top in a Linux shell and you will see a line of CPU utilization data like this:

%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

The Linux man page for top describes these fields as follows:

us, user: time running un-niced user processes

sy, system: time running kernel processes

ni, nice: time running niced user processes

id, idle: time spent in the kernel idle handler

wa, IO-wait: time waiting for I/O completion

hi: time spent servicing hardware interrupts

si: time spent servicing software interrupts

st: time stolen from this vm by the hypervisor

/proc/stat

Here is a brief look at how Linux computes CPU utilization.

/proc/stat holds a number of system statistics. At one moment on my machine it looked like this:

[linjinhe@localhost ~]$ cat /proc/stat 
cpu  117450 5606 72399 476481991 1832 0 2681 0 0 0
cpu0 31054 90 19055 119142729 427 0 1706 0 0 0
cpu1 22476 3859 18548 119155098 382 0 272 0 0 0
cpu2 29208 1397 19750 119100548 462 0 328 0 0 0
cpu3 34711 258 15045 119083615 560 0 374 0 0 0
intr 41826673 113 102 0 0 0 0 0 0 1 134 0 0 186 0 0 0 81 0 0 256375 0 0 0 29 0 602 143 16 442 94859 271462 25609 4618 8846 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 58634924
btime 1540055659
processes 5180062
procs_running 1
procs_blocked 0
softirq 49572367 5 22376247 238452 3163482 257166 0 4492 19385190 0 414733

We only care about the first line, cpu (the second line below is an annotation I added).

cpu  117450 5606 72399 476481991 1832 0    2681 0    0       0
     (us)   (ni) (sy)  (id)      (wa) (hi) (si) (st) (guest) (guest_nice)

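In the previous section, the man page describes every field in terms of time (time running, time spent, time stolen). The numbers here are exactly that: the cumulative time, since boot, that the CPUs have spent in each category (us, sy, ni, id, wa, hi, si, st). The unit is the jiffy (more precisely, a USER_HZ clock tick). sysconf(_SC_CLK_TCK) returns how many of these ticks make up one second; the value is usually 100, so one tick is 0.01 s. (st, guest and guest_nice relate to virtualization. If they are too high, something is wrong with the virtualization layer or the host. They are not the focus of this article.)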
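As a small illustration of the unit conversion, the sketch below turns a /proc/stat counter into seconds. The helper names TicksPerSecond and TicksToSeconds are made up for this example, and the fallback to 100 is an assumption based on the usual value mentioned above.

```cpp
#include <cstdint>
#include <unistd.h>

// Number of clock ticks per second that /proc/stat counters are expressed in.
// Usually 100 on Linux; falls back to 100 (an assumption) if sysconf fails.
long TicksPerSecond()
{
    const long hz = sysconf(_SC_CLK_TCK);
    return hz > 0 ? hz : 100;
}

// Converts a /proc/stat counter to seconds for a given tick rate.
double TicksToSeconds(uint64_t ticks, long ticks_per_second)
{
    return static_cast<double>(ticks) / static_cast<double>(ticks_per_second);
}
```

With a tick rate of 100, the us counter of 117450 from the sample above corresponds to 1174.5 seconds of user time summed over all cores.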

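CPU utilization is computed by sampling /proc/stat and taking differences. The simplest approach is to sample /proc/stat once per second:

At second N, sample cpu_total1 = us1 + ni1 + sy1 + id1 + wa1 + hi1 + si1 + st1 + guest1 + guest_nice1
At second N+1, sample cpu_total2 = us2 + ni2 + sy2 + id2 + wa2 + hi2 + si2 + st2 + guest2 + guest_nice2
The share of us is then (us2 - us1) / (cpu_total2 - cpu_total1). (Strictly speaking, Linux already counts guest and guest_nice inside us and ni, so a more careful total leaves those two out.)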
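The sampling scheme can be sketched in a few lines of C++. This is only a minimal illustration, not how top itself is implemented: CpuSample, ParseCpuLine and Percent are names invented here, and the total deliberately uses only the first eight counters, on the assumption that guest and guest_nice are already included in user and nice.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Counters from the aggregate "cpu" line of /proc/stat, in file order:
// user, nice, system, idle, iowait, irq, softirq, steal.
// guest and guest_nice are skipped on purpose (assumption: Linux already
// includes them in user and nice, so adding them would double count).
struct CpuSample
{
    std::array<uint64_t, 8> fields{};

    uint64_t Total() const
    {
        uint64_t sum = 0;
        for (uint64_t v : fields) sum += v;
        return sum;
    }
};

// Parses a line such as "cpu  117450 5606 72399 ...". Returns false if the
// label is not exactly "cpu" or fewer than 8 counters can be read.
bool ParseCpuLine(const std::string& line, CpuSample* out)
{
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu") return false;
    for (uint64_t& v : out->fields)
    {
        if (!(in >> v)) return false;
    }
    return true;
}

// Share (0 to 100) of one field between two samples.
// Index 0 is us, 1 is ni, 2 is sy, 3 is id, and so on.
double Percent(const CpuSample& before, const CpuSample& after, size_t index)
{
    const uint64_t total = after.Total() - before.Total();
    if (total == 0) return 0.0;
    const uint64_t delta = after.fields[index] - before.fields[index];
    return 100.0 * static_cast<double>(delta) / static_cast<double>(total);
}
```

To use it, read the first line of /proc/stat (for example with std::ifstream and std::getline), wait one second, read it again, and pass the two parsed samples to Percent.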

nice

nice - run a program with modified scheduling priority

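nice is a command that changes a process's scheduling priority; see its man page for details. On Linux every process has a nice value that represents its scheduling priority.

The nicer a process is (the larger its nice value), the lower its scheduling priority. How should we read that? Scheduling is essentially processes competing for a limited resource, the CPU. A nicer process is more willing to yield, so it gets fewer chances to run.

In the CPU utilization line above, the CPU time used by user-space processes is split into niced and un-niced parts, but there is no fundamental difference between the two. In practice you rarely need the nice command (I personally never have).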
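For readers who want to inspect or change the nice value from inside a program rather than with the nice command, here is a minimal sketch built on the POSIX getpriority and setpriority calls. The wrapper names CurrentNice and SetNice are hypothetical and exist only for this example.

```cpp
#include <cerrno>
#include <sys/resource.h>

// Returns the nice value of the calling process (-20 to 19 on Linux).
// getpriority can legitimately return -1, so errno is used to detect errors.
int CurrentNice(bool* ok)
{
    errno = 0;
    const int value = getpriority(PRIO_PROCESS, 0);
    *ok = !(value == -1 && errno != 0);
    return value;
}

// Sets the nice value of the calling process. Unprivileged processes can
// normally only raise the value, that is, lower their own priority.
bool SetNice(int value)
{
    return setpriority(PRIO_PROCESS, 0, value) == 0;
}
```

A process started normally from a shell typically reports 0, and one started through nice with default options reports 10.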

Understanding us

Now that we know what us means, let's write a small program to control it.

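#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string>
#include <vector>

// Spins forever in user space; one such thread keeps one core busy.
void* CpuUsWorker(void* arg)
{
    // volatile keeps the compiler from optimizing the empty loop away.
    volatile uint64_t i = 0;
    while (true)
    {
        i = i + 1;
    }
    return nullptr;
}

// Starts n spinning threads and waits for them (they never finish).
void CpuUs(int n)
{
    std::vector<pthread_t> pthreads(n);
    for (int i = 0; i < n; i++)
    {
        // The call stays outside assert() so it still runs with -DNDEBUG.
        int rc = pthread_create(&pthreads[i], nullptr, CpuUsWorker, nullptr);
        assert(rc == 0);
        (void)rc;
    }

    for (const auto& tid : pthreads)
    {
        int rc = pthread_join(tid, nullptr);
        assert(rc == 0);
        (void)rc;
    }
}

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s threads\n", argv[0]);
        return -1;
    }
    CpuUs(std::stoi(argv[1]));
    return 0;
}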

The test machine has 4 cores. The code is simple: each thread can saturate one core. Here are my results:

./cpu_us 1
%Cpu(s): 25.0 us,  0.0 sy,  0.0 ni, 75.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

./cpu_us 2
%Cpu(s): 50.0 us,  0.1 sy,  0.0 ni, 49.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

./cpu_us 3
%Cpu(s): 75.1 us,  0.0 sy,  0.0 ni, 24.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

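Understanding ni

ni is the CPU time used by user processes whose priority was changed with nice.

Here is my test result: ni becomes 25%, as expected.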

nice ./cpu_us 1
%Cpu(s):  0.1 us,  0.0 sy, 25.0 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Below is the process information that top shows for nice ./cpu_us 1. The NI column is the nice value, which is 10 here.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6905 linjinhe  30  10   23024    844    700 S 100.0  0.0   0:03.06 cpu_us

Below is the process information for a plain ./cpu_us 1, where the nice value is 0.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6901 linjinhe  20   0   23024    844    700 S 100.0  0.0   0:12.36 cpu_us

Understanding sy

In general, a high sy means the program is spending a lot of time in Linux system calls. This is also easy to verify with a small program.

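#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string>
#include <vector>

// A thread body that exits immediately.
void* NoopWorker(void* arg)
{
    return nullptr;
}

// Creates and detaches short-lived threads in a tight loop, so almost all
// of the work happens inside the kernel.
void* CpuSyWorker(void* arg)
{
    while (true)
    {
        pthread_t tid;
        // The calls stay outside assert() so they still run with -DNDEBUG.
        int rc = pthread_create(&tid, nullptr, NoopWorker, nullptr);
        assert(rc == 0);
        rc = pthread_detach(tid);
        assert(rc == 0);
        (void)rc;
    }
    return nullptr;
}

// Starts n creator threads and waits for them (they never finish).
void CpuSy(int n)
{
    std::vector<pthread_t> pthreads(n);
    for (int i = 0; i < n; i++)
    {
        int rc = pthread_create(&pthreads[i], nullptr, CpuSyWorker, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    for (const auto& tid : pthreads)
    {
        int rc = pthread_join(tid, nullptr);
        assert(rc == 0);
        (void)rc;
    }
}

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s threads\n", argv[0]);
        return -1;
    }

    CpuSy(std::stoi(argv[1]));
    return 0;
}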

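Test result:

./cpu_sy 1
%Cpu(s):  8.8 us, 59.3 sy,  0.0 ni, 31.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st

A large number of system calls makes sy soar. Different system calls have different costs, and pthread_create, which is built on system calls such as clone, is a relatively expensive operation.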
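A process can also check its own split between user time and system time, which is handy when trying to see how much a particular call pattern costs. The following is a minimal sketch using the POSIX getrusage call; the SelfCpuTime struct and the ReadSelfCpuTime helper are names invented for this example.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// CPU time consumed so far by the calling process, in seconds.
struct SelfCpuTime
{
    double user_seconds;    // time spent in user mode (shows up as us or ni)
    double system_seconds;  // time spent in kernel mode (shows up as sy)
};

// Fills *out from getrusage(RUSAGE_SELF). Returns false if the call fails.
bool ReadSelfCpuTime(SelfCpuTime* out)
{
    struct rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return false;
    out->user_seconds = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6;
    out->system_seconds = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6;
    return true;
}
```

Calling ReadSelfCpuTime before and after a block of work and subtracting the two readings gives a rough idea of whether that work is dominated by user code or by kernel code.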

Understanding wa

Even the Linux man pages say that wa is not very reliable, so do not conclude that the system has an I/O problem just because wa is high.

The CPU will not wait for I/O to complete; iowait is the time that a task is waiting for I/O to complete. When a CPU goes into idle state for outstanding task I/O, another task will be scheduled on this CPU.
On a multi-core CPU, the task waiting for I/O to complete is not running on any CPU, so the iowait of each CPU is difficult to calculate.
The value in this field may decrease in certain conditions.

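My understanding is as follows:

1. Take a single-core system. The CPU never really busy-waits for I/O. While the I/O is in flight the CPU is actually idle. If another process is runnable, the CPU runs it, and that time is not counted as iowait. If nothing else is runnable, the CPU has to "wait" for the I/O to finish before it can continue, and that waiting time is counted as iowait.
2. On a multi-core system, which CPU should the iowait be charged to? That is an open question.
3. A high wa does not mean the system has an I/O problem. If the only thing running is a simple task doing I/O nonstop, wa can be very high while disk I/O is nowhere near its limit.
4. A low wa does not mean the system's I/O is fine. Suppose I/O-heavy tasks saturate the disk bandwidth while compute tasks also keep the CPUs fully busy. Then wa is low, yet the I/O pressure is very high.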
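The accounting rule in points 1 and 4 can be written down as a toy model. This is a deliberately simplified sketch of the idea for a single CPU, not the kernel's actual code; TickKind and ClassifyTick are hypothetical names.

```cpp
// How one clock tick on one CPU is classified in this toy model.
enum class TickKind { kBusy, kIoWait, kIdle };

// If anything is runnable, the tick is busy time (us, sy, ...), no matter how
// many tasks are blocked on I/O. Only a tick with nothing to run and some
// I/O still outstanding counts as iowait; otherwise it is plain idle.
TickKind ClassifyTick(int runnable_tasks, int tasks_blocked_on_io)
{
    if (runnable_tasks > 0) return TickKind::kBusy;
    if (tasks_blocked_on_io > 0) return TickKind::kIoWait;
    return TickKind::kIdle;
}
```

In this model ClassifyTick(1, 10) is busy time even though ten tasks are stuck on I/O, which is exactly why adding CPU-bound work can make wa fall without reducing the I/O load.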

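#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <string>
#include <vector>

// Writes a tiny record to its own file and calls fsync after every write,
// so the thread spends most of its time blocked on small I/Os.
void* CpuWaWorker(void* arg)
{
    std::string filename = "test_" + std::to_string(pthread_self());
    // O_CREAT requires a mode argument; 0644 is used here.
    int fd = open(filename.c_str(), O_CREAT | O_WRONLY, 0644);
    assert(fd >= 0);
    while (true)
    {
        // The calls stay outside assert() so they still run with -DNDEBUG.
        ssize_t n = write(fd, filename.c_str(), filename.size());
        assert(n > 0);
        n = write(fd, "\n", 1);
        assert(n > 0);
        int rc = fsync(fd);
        assert(rc == 0);
        (void)n;
        (void)rc;
    }
    return nullptr;
}

// Starts n writer threads and waits for them (they never finish).
void CpuWa(int n)
{
    std::vector<pthread_t> pthreads(n);
    for (int i = 0; i < n; i++)
    {
        int rc = pthread_create(&pthreads[i], nullptr, CpuWaWorker, nullptr);
        assert(rc == 0);
        (void)rc;
    }

    for (const auto& tid : pthreads)
    {
        // pthread_join returns 0 on success.
        int rc = pthread_join(tid, nullptr);
        assert(rc == 0);
        (void)rc;
    }
}

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s threads\n", argv[0]);
        return -1;
    }
    CpuWa(std::stoi(argv[1]));
    return 0;
}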

./cpu_wa 10
%Cpu(s):  0.3 us,  6.3 sy,  0.0 ni, 50.0 id, 41.1 wa,  0.0 hi,  2.3 si,  0.0 st

In this example, many threads doing small I/Os nonstop push wa up, yet they use very little I/O bandwidth. My test machine has an SSD, and the actual I/O pressure is low.

Here is another example:

./cpu_wa 10
./cpu_us 3
%Cpu(s): 75.3 us,  3.5 sy,  0.0 ni,  8.2 id, 10.3 wa,  0.0 hi,  2.7 si,  0.0 st

Even though the same ./cpu_wa 10 is running, wa drops simply because ./cpu_us 3 is running at the same time! See point 4 above.

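Understanding si and hi

System calls can end up triggering softirqs, so si also moves a little when some of the examples above are running. For instance:

./cpu_wa 10
%Cpu(s):  0.3 us,  6.3 sy,  0.0 ni, 50.0 id, 41.1 wa,  0.0 hi,  2.3 si,  0.0 st

When the NIC receives packets, its driver hands most of the packet processing to softirqs. Here is an experiment with the iperf network performance tool.

$ iperf -s -i 1   # server

$ iperf -c 192.168.1.4 -i 1 -t 60   # client; open several terminals and run several clients so that the change in si is more visible

%Cpu(s):  1.7 us, 74.1 sy,  0.0 ni,  8.0 id,  0.0 wa,  0.0 hi, 16.2 si,  0.0 st

As for hardware interrupts, I have not found a good way to test them so far, and they should rarely be an issue in practice.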

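Understanding st

st is related to virtualization. Here is my understanding.

With virtualization, a physical machine with 32 CPU cores can host dozens or even hundreds of single-core virtual machines. In public clouds this is commonly called overselling.

Most of the time a large part of the physical server's resources sits idle, so overselling has no noticeable effect.

When the CPU load of many virtual machines rises, the physical machine clearly runs short of resources, and the virtual machines start competing with and waiting for one another.

st measures the CPU time that the hypervisor "stole" and gave to other virtual machines. The higher the value, the fiercer the resource contention on that physical server.

(Could a cloud vendor patch their kernel to report st as 0 so that you never notice?)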

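Understanding id

The CPU is idle. From the application's point of view there is nothing hard to understand here.

I recommend the article What's a CPU to do when it has nothing to do? for anyone who is interested.

(End)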