SystemTap, a Powerful Kernel Debugging Tool: More Features and Internals (Part 3)

A Linux trace/probe tool.

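User space

To probe user-space programs, SystemTap needs utrace support in the kernel. Kernels 3.5 and later provide the required support by default (mainline 3.5 introduced uprobes, which SystemTap uses for user-space probing).

For kernels older than 3.5, you need to apply the relevant patches yourself.

Requirements: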

debugging information for the named program

utrace support in the kernel

(1) Begin/end

Probe points:

when a process/thread is created

when a process/thread exits

process.begin

process("PATH").begin

process(PID).begin

process.thread.begin

process("PATH").thread.begin

process(PID).thread.begin

process.end

process("PATH").end

process(PID).end

process.thread.end

process("PATH").thread.end

process(PID).thread.end

(2) Syscall

Probe points:

system call entry

system call return

process.syscall

process("PATH").syscall

process(PID).syscall

process.syscall.return

process("PATH").syscall.return

process(PID).syscall.return

Available process context variables:

$syscall // system call number

$argN ($arg1~$arg6) // system call arguments

$return // system call return value (only in .return probes)
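
Because $syscall is only a number, handler output usually has to be mapped back to a name before it is readable. The C sketch below shows one way to do that with the SYS_* constants from <sys/syscall.h>. The helper name syscall_name_of() and the handful of system calls it covers are illustrative assumptions, not part of SystemTap.

```c
#include <sys/syscall.h>   /* SYS_* system call numbers for this architecture */

/* Hypothetical helper: map a system call number, such as the value a
 * process.syscall handler sees in $syscall, to a readable name.
 * Only a few calls are listed; extend the switch as needed. */
const char *syscall_name_of(long nr)
{
    switch (nr) {
    case SYS_read:   return "read";
    case SYS_write:  return "write";
    case SYS_close:  return "close";
    case SYS_openat: return "openat";
    case SYS_execve: return "execve";
    default:         return "unknown";
    }
}
```

The numbers differ between architectures, which is why the sketch relies on the header's constants rather than hard-coded values.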

(3) Function/statement

Probe points:

function entry

function return

a specific line in a file

a label inside a function

process("PATH").function("NAME")

process("PATH").statement("*@FILE.c:123")

process("PATH").function("*").return

process("PATH").function("myfunc").label("foo")

(4) Absolute variant

Probe points:

a virtual address in the process

process(PID).statement(ADDRESS).absolute

A non-symbolic probe point uses raw, unverified virtual addresses and provides no $variables.

The target PID parameter must identify a running process and ADDRESS must identify a valid instruction address.

This is a guru mode probe.

(5) Target process

Probe points:

functions in dynamically linked libraries (for example glibc)

Target process mode (invoked with stap -c CMD or -x PID) implicitly restricts all process.* probes to the given child process.

If PATH names a shared library, all processes that map that shared library can be probed.

If dwarf debugging information is installed, try using a command with this syntax:

probe process("/lib64/libc-2.8.so").function("...") { ... }

(6) Instruction probes

Probe points:

a single instruction

an instruction block

process("PATH").insn

process(PID).insn

process("PATH").insn.block

process(PID).insn.block

The .insn probe is called for every single-stepped instruction of the process described by PID or PATH.

The .insn.block probe is called for every block-stepped instruction of the process described by PID or PATH.

Using this feature will significantly slow process execution.

Count how many instructions a process executes:

stap -e 'global steps; probe process("/bin/ls").insn {steps++}; probe end {printf("Total instructions: %d\n", steps)}' \
    -c /bin/ls

(7) Usage

gcc -g3 -o test test.c

stap -L 'process("./test").function("*")'   # list the functions and variables in the program
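
The original does not show test.c. A minimal, hypothetical stand-in could contain a function like the sketch below. The names myfunc and foo are assumptions chosen only to match the function(...).label(...) probe-point form listed earlier, and the local variable sum gives a probe something to inspect.

```c
/* Hypothetical contents for test.c; the function and label names are
 * illustrative only. */
int myfunc(int n)
{
    int sum = 0;   /* local variable a function/statement probe could read */
    int i;

    for (i = 0; i < n; i++) {
        if (i > 1000)
            goto foo;      /* early exit, so the label is actually used */
        sum += i;
    }
foo:
    return sum;
}
```

A main() that calls myfunc() is still needed before the file builds into an executable with the gcc command above.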

Debug levels (gcc -gLEVEL):

Request debugging information and also use level to specify how much information. The default level is 2.

Level 0 produces no debug information at all. Thus, -g0 negates -g.

Level 1 produces minimal information, enough for making backtraces in parts of the program that you don't plan to debug. This includes descriptions of functions and external variables, but no information about local variables and no line numbers.

Level 3 includes extra information, such as all the macro definitions present in the program.

Advanced features

(1) Building your own script library (tapset)

A tapset is just a script that is designed for reuse by installation into a special directory.

SystemTap attempts to resolve references to global symbols (probes, functions, variables) that are not defined within the script by a systematic search through the tapset library for scripts that define those symbols.

A user may give additional directories with the -I DIR option.

Building your own library:

1. Create a library directory mylib and add two library files

time-default.stp

function __time_value() {
	return gettimeofday_us()
}

time-common.stp

global __time_vars

function timer_begin(name) {
	__time_vars[name] = __time_value()
}

function timer_end(name) {
	return __time_value() - __time_vars[name]
}

2. Write the script that uses the library

tapset-time-user.stp

probe begin {
	timer_begin("bench")
	for(i=0; i<1000; i++) ;
	printf("%d us\n", timer_end("bench"))
	exit()
}

3. Run

stap -I mylib/ tapset-time-user.stp

(2) Probe point aliases

Mainly used to provide a layer of abstraction on top of existing probe points.

Probe point aliases allow creation of new probe points from existing ones.

This is useful if the new probe points are named to provide a higher level of abstraction.

Syntax:

probe new_name = existing_name1, existing_name2[, ..., existing_nameN]

{

    prepending behavior

}

Example:

probe syscallgroup.io = syscall.open, syscall.close,
	  	     syscall.read, syscall.write
{
	groupname = "io"
}

probe syscallgroup.process = syscall.fork, syscall.execve
{
	groupname = "process"
}

probe syscallgroup.*
{
	groups[execname() . "/" . groupname]++
}

global groups

probe end
{
	foreach (eg in groups+)
		printf("%s: %d\n", eg, groups[eg])
}


(3) Embedded C code

SystemTap provides an "escape hatch" to go beyond what the language can safely offer.

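Embedded C code is enclosed between %{ and %}, and the script must be run with the -g (guru mode) option.

A THIS macro is provided for reading function arguments and storing the function's return value. (SystemTap 1.8 and later use STAP_ARG_name and STAP_RETVALUE for the same purpose.)

Example: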

%{
#include <linux/sched.h>
#include <linux/list.h>
%}

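function process_list()
%{
	struct task_struct *p;
	struct list_head *_p, *_n;

	/* Header line; output goes to the kernel log, not to stap's stdout. */
	printk("%-20s%-10s\n", "program", "pid");

	/* Walk the circular task list starting from the current task.
	 * current->tasks acts as the list head, so current itself is skipped. */
	list_for_each_safe(_p, _n, &current->tasks) {
		p = list_entry(_p, struct task_struct, tasks);
		printk("%-20s%-10d\n", p->comm, p->pid);
	}
%}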

probe begin {
	process_list()
	exit()
}

stap -g embeded-c.stp

The printed list of processes can then be seen with dmesg.

C code enclosed in %{ ... %} can be a standalone block, the body of a function, or just an expression.

(4) Existing script library (tapsets)

SystemTap ships with a very powerful script library by default. The main categories are:

Context Functions

Timestamp Functions

Time utility functions

Shell command functions

Memory Tapset

Task Time Tapset

Scheduler Tapset

IO Scheduler and block IO Tapset

SCSI Tapset

TTY Tapset

Interrupt Request (IRQ) Tapset

Networking Tapset

Socket Tapset

SNMP Information Tapset

Kernel Process Tapset

Signal Tapset

Errno Tapset

Device Tapset

Directory-entry (dentry) Tapset

Logging Tapset

Queue Statistics Tapset

Random functions Tapset

String and data retrieving functions Tapset

String and data writing functions Tapset

Guru tapsets

A collection of standard string functions

Utility functions for using ansi control chars in logs

SystemTap Translator Tapset

Network File Storage Tapsets

Speculation

How it works

(1) Execution flow of a SystemTap script

pass1

During the parsing of the code, it is represented internally in a parse tree.

Preprocessing is performed during this step, and the code is checked for semantic and syntax errors.

pass2

During the elaboration step, the symbols and references in the SystemTap script are resolved.

Also, any tapsets that are referenced in the SystemTap script are imported.

Debug data that is read from the DWARF (a widely used, standardized debugging data format) information, which is produced during kernel compilation, is used to find the addresses for functions and variables referenced in the script, and allows probes to be placed inside functions.

pass3

Takes the output from the elaboration phase and converts it into C source code.

Variables used by multiple probes are protected by locks. Safety checks, and any necessary locking, are handled during the translation. The code is also converted to use the Kprobes API for inserting probe points into the kernel.

pass4

Once the SystemTap script has been translated into a C source file, the code is compiled into a module that can be dynamically loaded and executed in the kernel.

pass5

Once the module is built, SystemTap loads the module into the kernel.

When the module loads, an init routine in the module starts running and begins inserting probes into their proper locations. Hitting a probe causes execution to stop while the handler for that probe is called.

When the handler exits, normal execution continues. The module continues waiting for probes and executing handler code until the script exits, or until the user presses Ctrl-c, at which time SystemTap removes the probes, unloads the module, and exits.

Output from SystemTap is transferred from the kernel through a mechanism called relayfs, and sent to STDOUT.

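(2) SystemTap script execution as seen from user space and kernel space

(3) kprobes

Breakpoint instruction: __asm INT 3, machine code 0xCC.

The breakpoint interrupt (INT 3) is a software interrupt. When the CPU executes an INT 3 instruction, it pushes the flags register and the current program pointer (CS and EIP) onto the stack, and then calls the interrupt routine registered for INT 3 through the interrupt vector table.

INT is the software interrupt instruction. The interrupt vector table maps interrupt numbers to the addresses of their handler functions.

INT 3 triggers software interrupt 3. In x86 real mode, where each table entry is 4 bytes, the entry holding the handler's address is located at: interrupt vector table address + 4 * 3. Protected-mode and 64-bit descriptor tables use larger entries.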
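
The table lookup can be written out as a small C sketch. It assumes the x86 real-mode layout, where every entry is a 2-byte offset followed by a 2-byte segment. The function name ivt_entry_address() is hypothetical and exists only for illustration.

```c
#include <stdint.h>

/* Real-mode x86: each interrupt vector table entry is 4 bytes
 * (16-bit offset followed by 16-bit segment). */
#define IVT_ENTRY_SIZE 4u

/* Hypothetical helper: address of the table entry for a given vector.
 * The entry holds the handler's address; it is not the handler itself. */
uint32_t ivt_entry_address(uint32_t ivt_base, uint8_t vector)
{
    return ivt_base + IVT_ENTRY_SIZE * vector;
}
```

With the table at address 0, vector 3 resolves to entry address 12, which matches the "table address + 4 * 3" rule.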

A Kprobe is a general purpose hook that can be inserted almost anywhere in the kernel code.

To allow it to probe an instruction, the first byte of the instruction is replaced with the breakpoint instruction for the architecture being used. When this breakpoint is hit, Kprobe takes over execution, executes its handler code for the probe, single-steps a saved copy of the original instruction, and then continues execution at the next instruction.
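
The byte-patching idea can be modelled in plain user-space C. The sketch below is not kernel code and does not use the real kprobes API: struct sim_probe, sim_probe_arm(), sim_probe_disarm() and sim_run() are hypothetical names, and the "program" is just a buffer of one-byte instructions.

```c
#include <stddef.h>
#include <stdint.h>

#define BREAKPOINT_OPCODE 0xCC   /* x86 INT 3 */

/* Hypothetical user-space model of one probe; this is not the kernel's
 * struct kprobe, just enough state to show the byte patching. */
struct sim_probe {
    uint8_t *addr;          /* first byte of the probed instruction */
    uint8_t  saved_opcode;  /* original byte, kept for later restore */
    int      hits;          /* how many times the handler has run */
};

/* Arm: remember the original first byte, then overwrite it with INT 3. */
void sim_probe_arm(struct sim_probe *p, uint8_t *addr)
{
    p->addr = addr;
    p->saved_opcode = *addr;
    p->hits = 0;
    *addr = BREAKPOINT_OPCODE;
}

/* Disarm: put the original byte back. */
void sim_probe_disarm(struct sim_probe *p)
{
    *p->addr = p->saved_opcode;
}

/* "Execute" a buffer of one-byte instructions. Hitting the breakpoint
 * runs the handler (here: count the hit), then the saved original
 * instruction is used in its place and execution moves on.
 * Returns the number of original instructions executed. */
size_t sim_run(uint8_t *code, size_t len, struct sim_probe *p)
{
    size_t executed = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        uint8_t opcode = code[i];
        if (opcode == BREAKPOINT_OPCODE && &code[i] == p->addr) {
            p->hits++;                  /* probe handler */
            opcode = p->saved_opcode;   /* original instruction */
        }
        (void)opcode;                   /* a real CPU would execute it here */
        executed++;
    }
    return executed;
}
```

Arming a probe on one byte of a NOP-filled buffer and calling sim_run() leaves the instruction count unchanged while the hit counter goes up by one, which is the behaviour the paragraph above describes: the probed code still runs in full, with the handler inserted at the breakpoint.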

(4) Kernel features it depends on

kprobes/jprobes

return probes

reentrancy

colocated (multiple)

relayfs

scalability (unlocked handlers)

user-space probes