系統技術非業餘研究 » Linux TASK_IO_ACCOUNTING: What It Is and How to Use It

阿新 • Published: 2019-01-13

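In the past we mostly learned about system IO through iostat, whose granularity stops at the individual device. Usually we also want to know how much IO each process or thread issues. Before Linux 2.6.20 there was no way to find out short of a tool such as systemtap, because the kernel did not expose these statistics. For per-device, per-application read/write statistics, see disktop, which I covered in an earlier post.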

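Browsing the source on lxr confirms that the TASK_IO_ACCOUNTING feature was introduced in Linux 2.6.20. It exports the IO activity of every thread and process through /proc/pid/io, which makes life much easier for users. Note that RHEL 5U4 is based on the 2.6.18 kernel, but Red Hat backported this feature. It in turn gave rise to tools for inspecting per-process IO activity, such as pidstat and iotop. Screenshots of the two tools at work: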

pidstat shows per-thread IO activity as a hierarchy.

iotop shows per-thread IO activity as a flat list.

strace shows that both tools take their IO activity data from /proc/pid/io, so let's look at this file:

# cat /proc/self/io
rchar: 1956
wchar: 0
syscr: 7
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
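
As a minimal sketch of how a tool might consume this file, the Python snippet below parses the `name: value` lines into a dictionary. The function names `parse_proc_io` and `read_proc_io` are hypothetical, chosen for illustration; only the file format shown above is taken from the article.

```python
def parse_proc_io(text):
    """Parse the text of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip anything that is not a "name: <digits>" line.
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Read and parse /proc/<pid>/io (Linux only; other pids may need privileges)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


# The sample output shown above, parsed without touching /proc.
sample = """rchar: 1956
wchar: 0
syscr: 7
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
"""
counters = parse_proc_io(sample)
print(counters["rchar"], counters["syscr"])
```

On a kernel with the feature enabled, `read_proc_io()` would return the same structure for the calling process.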

The last three fields in this file are the ones newly added by the IO accounting feature. Here is what they mean, quoted from man pidstat:

kB_rd/s
Number of kilobytes the task has caused to be read from disk per second.

kB_wr/s
Number of kilobytes the task has caused, or shall cause to be written to disk per second.

kB_ccwr/s
Number of kilobytes whose writing to disk has been cancelled by the task. This may occur when the task truncates some dirty pagecache.
In this case, some IO which another task has been accounted for will not be happening.
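
The per-second columns can be derived from two snapshots of the cumulative byte counters. The sketch below assumes that kB_rd/s, kB_wr/s and kB_ccwr/s correspond to read_bytes, write_bytes and cancelled_write_bytes respectively, and that a kilobyte is 1024 bytes; the helper name `io_rates_kb` is hypothetical and this is not pidstat's actual code.

```python
# Assumed mapping from pidstat column to /proc/<pid>/io field.
COLUMN_TO_FIELD = {
    "kB_rd/s": "read_bytes",
    "kB_wr/s": "write_bytes",
    "kB_ccwr/s": "cancelled_write_bytes",
}


def io_rates_kb(before, after, interval_s):
    """Turn two counter snapshots taken interval_s seconds apart into kB/s rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        column: (after[field] - before[field]) / 1024.0 / interval_s
        for column, field in COLUMN_TO_FIELD.items()
    }


# Hypothetical snapshots taken 2 seconds apart.
before = {"read_bytes": 0, "write_bytes": 4096, "cancelled_write_bytes": 0}
after = {"read_bytes": 10240, "write_bytes": 8192, "cancelled_write_bytes": 0}
rates = io_rates_kb(before, after, 2.0)
print(rates)
```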

Next, let's see how the kernel accounts these three values. A simple grep in the RHEL 5U4 source tree:

[linux-2.6.18.x86_64]$ grep -rin task_io_account_ .
./block/ll_rw_blk.c:3286:               task_io_account_read(bio->bi_size);
./include/linux/task_io_accounting_ops.h:8:static inline void task_io_account_read(size_t bytes)
./include/linux/task_io_accounting_ops.h:13:static inline void task_io_account_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:18:static inline void task_io_account_cancelled_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:30:static inline void task_io_account_read(size_t bytes)
./include/linux/task_io_accounting_ops.h:34:static inline void task_io_account_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:38:static inline void task_io_account_cancelled_write(size_t bytes)
./fs/direct-io.c:671:           task_io_account_write(len);
./fs/cifs/file.c:2221:                  task_io_account_read(bytes_read);
./fs/buffer.c:965:                              task_io_account_write(PAGE_CACHE_SIZE);
./fs/buffer.c:3400:                     task_io_account_cancelled_write(PAGE_CACHE_SIZE);
./mm/truncate.c:47:             task_io_account_cancelled_write(PAGE_CACHE_SIZE);
./mm/page-writeback.c:649:                                      task_io_account_write(PAGE_CACHE_SIZE);
./mm/readahead.c:180:           task_io_account_read(PAGE_CACHE_SIZE);

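As you can see, the accounting granularity is fairly coarse: most call sites charge a whole page (PAGE_CACHE_SIZE) at a time.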
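
To make the page-sized accounting concrete, here is a toy model in Python. It is only an illustration under simplifying assumptions: a 4 KiB page size, a single task, a charge to write_bytes when a clean page first becomes dirty, and a charge to cancelled_write_bytes when a dirty page is truncated. The class `ToyIoAccounting` is hypothetical and does not mirror the kernel code paths in detail.

```python
PAGE_CACHE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86_64


class ToyIoAccounting:
    """Single-task toy model of page-granular write accounting."""

    def __init__(self):
        self.write_bytes = 0
        self.cancelled_write_bytes = 0
        self._dirty_pages = set()

    def write(self, offset, length):
        """Dirty every page touched by [offset, offset + length)."""
        if length <= 0:
            return
        first = offset // PAGE_CACHE_SIZE
        last = (offset + length - 1) // PAGE_CACHE_SIZE
        for page in range(first, last + 1):
            if page not in self._dirty_pages:
                self._dirty_pages.add(page)
                # A whole page is charged, however few bytes changed.
                self.write_bytes += PAGE_CACHE_SIZE

    def truncate_page(self, page):
        """Drop a page; if it was still dirty, its write is cancelled."""
        if page in self._dirty_pages:
            self._dirty_pages.discard(page)
            self.cancelled_write_bytes += PAGE_CACHE_SIZE


acct = ToyIoAccounting()
acct.write(0, 1)        # one byte still charges a full page
acct.write(4095, 2)     # spans pages 0 and 1; only page 1 is newly dirty
acct.truncate_page(1)   # the dirty page is dropped before writeback
print(acct.write_bytes, acct.cancelled_write_bytes)
```

In this model the data that eventually reaches disk is roughly write_bytes minus cancelled_write_bytes, which is why the two counters are best read together.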

The /proc export for IO accounting lives in fs/proc/base.c:

#ifdef CONFIG_TASK_IO_ACCOUNTING
static int do_io_accounting(struct task_struct *task, char *buffer, int whole)
{
 ...  
        return sprintf(buffer,
                        "rchar: %llu\n"
                        "wchar: %llu\n"
                        "syscr: %llu\n"
                        "syscw: %llu\n"
                        "read_bytes: %llu\n"
                        "write_bytes: %llu\n"
                        "cancelled_write_bytes: %llu\n",
                        rchar, wchar, syscr, syscw,
                        ioac.read_bytes, ioac.write_bytes,
                        ioac.cancelled_write_bytes);
}
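
The body of the function is elided above, so the role of the `whole` argument is not visible here. In mainline kernels it distinguishes process-wide totals (/proc/pid/io) from a single thread's counters (/proc/pid/task/tid/io); whether the RHEL backport behaves identically is an assumption. The Python sketch below adds up per-thread counters from user space. The helper names are hypothetical, and the sum over live threads may be smaller than the process total if the kernel also retains counters of threads that have already exited.

```python
import os

FIELDS = ("rchar", "wchar", "syscr", "syscw",
          "read_bytes", "write_bytes", "cancelled_write_bytes")


def parse_io(text):
    """Parse 'name: value' lines as printed by do_io_accounting()."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def sum_counters(per_thread):
    """Add up a sequence of per-thread counter dicts field by field."""
    total = dict.fromkeys(FIELDS, 0)
    for counters in per_thread:
        for name in FIELDS:
            total[name] += counters.get(name, 0)
    return total


def live_thread_counters(pid="self"):
    """Yield one counter dict per live thread of pid (Linux only)."""
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/io") as f:
            yield parse_io(f.read())


# Hypothetical counters for two threads of one process.
threads = [
    {"rchar": 100, "read_bytes": 4096, "write_bytes": 0},
    {"rchar": 50, "read_bytes": 0, "write_bytes": 8192},
]
total = sum_counters(threads)
print(total["rchar"], total["read_bytes"], total["write_bytes"])
```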

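That was a brief look at how TASK_IO_ACCOUNTING works; it is very helpful for understanding the IO activity of each process. To repeat, the feature is usable on RHEL 5U4. The task accounting options in the shipped kernel configs: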

./configs/kernel-2.6.18-x86_64-xen.config:43:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64.config:45:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64-debug.config:45:CONFIG_TASK_DELAY_ACCT=y

The feature is enabled by default.
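
The listing above shows CONFIG_TASK_DELAY_ACCT, while the code in fs/proc/base.c is guarded by CONFIG_TASK_IO_ACCOUNTING, so it can be worth checking both options. The sketch below scans kernel config text for a set of options. The function name `enabled_options` and the sample text are hypothetical; the sample is not copied from a real RHEL config file.

```python
def enabled_options(config_text, wanted):
    """Return {option: True/False} for options set to 'y' in kernel config text."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are comments and are skipped.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name in wanted:
            found[name] = value
    return {name: found.get(name) == "y" for name in wanted}


# Hypothetical config fragment for illustration only.
sample_config = """CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
CONFIG_TASK_IO_ACCOUNTING=y
"""
status = enabled_options(
    sample_config,
    ["CONFIG_TASK_DELAY_ACCT", "CONFIG_TASK_IO_ACCOUNTING", "CONFIG_TASK_XACCT"],
)
print(status)
```

On a running system, the existence of /proc/self/io is a simpler runtime check for the same thing.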

Have fun!

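Postscript: taskstats.c also supports exporting a task's statistics over netlink, selected by pid or tgid, as well as registering and deregistering a cpumask. iotop uses this feature.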

sendto(3, "\34\0\0\0\26\0\1\0\216\1\0\0\30\357\377\377\1\0\0\0\10\0\1\0\324\5\0\0", 28, 0, NULL, 0) = 28
recvfrom(3, "l\1\0\0\26\0\0\0\216\1\0\0\30\357\377\377\2\1\0\0X\1\4\0\10\0\1\0\324\5\0\0"..., 16384, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 364
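
The 28 bytes passed to sendto can be decoded by hand. The sketch below assumes little-endian host byte order (as on x86_64) and the usual layout of a 16-byte netlink header, a 4-byte generic netlink header and one 8-byte attribute. Reading command 1 as TASKSTATS_CMD_GET and attribute type 1 as TASKSTATS_CMD_ATTR_PID follows the taskstats header definitions; the message type 22 is presumably the generic netlink family id assigned to taskstats at runtime on that machine. The function name is hypothetical.

```python
import struct

# Payload of the sendto() call, copied from the strace output (octal escapes).
request = (b"\34\0\0\0\26\0\1\0\216\1\0\0\30\357\377\377"
           b"\1\0\0\0\10\0\1\0\324\5\0\0")


def decode_taskstats_request(msg):
    """Decode netlink header, generic netlink header and the first attribute."""
    nl_len, nl_type, nl_flags, nl_seq, nl_pid = struct.unpack_from("<IHHII", msg, 0)
    cmd, version, _reserved = struct.unpack_from("<BBH", msg, 16)
    attr_len, attr_type = struct.unpack_from("<HH", msg, 20)
    (attr_value,) = struct.unpack_from("<I", msg, 24)
    return {
        "nl_len": nl_len, "nl_type": nl_type, "nl_flags": nl_flags,
        "nl_seq": nl_seq, "nl_pid": nl_pid,
        "cmd": cmd, "version": version,
        "attr_len": attr_len, "attr_type": attr_type, "attr_value": attr_value,
    }


decoded = decode_taskstats_request(request)
for key, value in decoded.items():
    print(f"{key}: {value}")
```

Under these assumptions the request asks for the statistics of the task whose id is carried in `attr_value`.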

Thanks to kinwin for pointing this out!
