1. 程式人生 > >使用systemtap除錯Linux核心

使用systemtap除錯Linux核心

最近在公司看Linux核心的nmi死鎖檢測功能的實現機制,當然,是因為它變了,所以我才看的,簡單來說就是在紅帽的某牛提交了一個核心patch:new nmi_watchdog using perf events,這個patch已經被合入到核心主線2.6.38版本,所以使用自該版本開始後核心的Linux系統,其/proc/interrupts顯示的中斷數不再按每秒1000次的頻率增長。關於new nmi_watchdog問題,本文不再多說,後續寫專篇文章,下面看使用systemtap除錯Linux核心的幾個案例,因為我最近就是通過這個手段來理解new nmi_watchdog的實現機制,相比利用printk或kgdb而言,使用systemtap更為簡單方便,效率也大大提高。

系統環境:

[[email protected] ~]# cat /etc/issue
CentOS release 6.2 (Final)
Kernel \r on an \m

[[email protected] ~]# uname -a
Linux localhost.lenkydomain 3.6.11 #1 SMP Wed Feb 20 21:26:16 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
[[email protected] ~]# stap -V
Systemtap translator/driver (version 2.1/0.152, non-git sources)
Copyright © 2005-2012 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
enabled features: LIBRPM LIBSQLITE3 NSS BOOST_SHARED_PTR TR1_UNORDERED_MAP NLS

案例一,判斷函式的真實執行路徑,比如這個函式:

static inline void x86_assign_hw_event(struct perf_event *event,
				struct cpu_hw_events *cpuc, int i)
{
	struct hw_perf_event *hwc = &event->hw;
hwc->idx = cpuc->assign[i];
hwc->last_cpu = smp_processor_id();
hwc->last_tag = ++cpuc->tags[i];

if (hwc->idx == INTEL_PMC_IDX_FIXED_BTS) {

a-> hwc->config_base = 0;
hwc->event_base = 0;
} else if (hwc->idx >= INTEL_PMC_IDX_FIXED) {
b-> hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - INTEL_PMC_IDX_FIXED);
hwc->event_base_rdpmc = (hwc->idx - INTEL_PMC_IDX_FIXED) | 1<<30;
} else {
c-> hwc->config_base = x86_pmu_config_addr(hwc->idx);
hwc->event_base = x86_pmu_event_addr(hwc->idx);
hwc->event_base_rdpmc = hwc->idx;
}
}

我想知道nmi_watchdog的perf event走的是路徑a?路徑b?還是路徑c?
以前利用printk的矬做法是修改這個函式,在a、b、c分別插入printk(“aaa\n”);、printk(“bbb\n”);、printk(“ccc\n”);,然後需要重新編譯核心,重啟機器跑一次邏輯,再根據列印結果做判斷。
利用systemtap的做法是在上面三個路徑上分別下探測點,並設定執行語句為列印相應的字串即可。
比如在上面的路徑b處下探測點:

[[email protected] ~]# stap -ve 'probe kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:824"){printf("bbb\n")}'

在另一個終端觸發nmi_watchdog設定邏輯:

[[email protected] ~]# echo 0 > /proc/sys/kernel/nmi_watchdog ; echo 1 > /proc/sys/kernel/nmi_watchdog ; 

可以看到探測點被執行到,並且列印了相關資訊,根據列印的資訊來看,探測點被多次執行到:

[[email protected] ~]# stap -ve 'probe kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:824"){printf("bbb\n")}'
Pass 1: parsed user script and 90 library script(s) using 190140virt/24872res/2788shr/22676data kb, in 170usr/10sys/178real ms.
Pass 2: analyzed script: 2 probe(s), 0 function(s), 0 embed(s), 0 global(s) using 358304virt/79680res/18160shr/63552data kb, in 280usr/20sys/310real ms.
Pass 3: translated to C into "/tmp/stapGGJa26/stap_b1c9e6f6ba4bf7f4d0a8eb727add532b_1128_src.c" using 358304virt/79840res/18312shr/63552data kb, in 10usr/0sys/5real ms.
Pass 4: compiled C into "stap_b1c9e6f6ba4bf7f4d0a8eb727add532b_1128.ko" in 1210usr/290sys/1525real ms.
Pass 5: starting run.
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb

三處都下探測點的完整執行情況:

[[email protected] perf_study]# cat x86_assign_hw_event.stp 
probe kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:821"){printf("aaa\n")}
probe kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:824"){printf("bbb\n")}
probe kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:828"){printf("ccc\n")}
[[email protected] perf_study]# stap x86_assign_hw_event.stp 
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb
bbb

另外一個常見的情況是要判斷某個函式在某個邏輯中是否被跑到,以前的做法也是在函式的入口處加上列印語句,然後跑一遍功能邏輯,看是否有資訊打印出來,現在利用systemtap可以這樣:

[[email protected] perf_study]# stap -e 'probe kernel.function("x86_assign_hw_event"){printf("test\n")}'
test
test
test
test
test
test
test
test

案例二,獲取函式呼叫堆疊,示例:

[[email protected] perf_study]# cat bt.stp 
#stap -v bt.stp schedule

probe kernel.function(@1){
print("----------------START-------------------------\n")
printf("In process [%s]\n", execname())
print_regs()
print_backtrace()
print("----------------END-------------------------\n")
exit()
}

[[email protected] perf_study]# stap -v bt.stp x86_assign_hw_event
Pass 1: parsed user script and 90 library script(s) using 190148virt/24884res/2784shr/22684data kb, in 170usr/10sys/179real ms.
Pass 2: analyzed script: 1 probe(s), 4 function(s), 2 embed(s), 0 global(s) using 344464virt/50624res/3872shr/47224data kb, in 500usr/290sys/788real ms.
Pass 3: using cached /root/.systemtap/cache/13/stap_13ea16365226db9619f3f14ab2a27efc_2536.c
Pass 4: using cached /root/.systemtap/cache/13/stap_13ea16365226db9619f3f14ab2a27efc_2536.ko
Pass 5: starting run.
----------------START-------------------------
In process [swapper/0]
RIP: ffffffff81024f40
RSP: ffff88022fc03d18 EFLAGS: 00000086
RAX: 0000000000000021 RBX: ffff88022fc0c6e0 RCX: 0000000000000000
RDX: ffff88022fc169f8 RSI: ffff88022fc03d7c RDI: ffff8802215efc00
RBP: ffff88022fc03d58 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000003 R12: 0000000000000000
R13: ffff88022fc0c6e0 R14: ffff88022fc0c6e0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000419248 CR3: 0000000222468000 CR4: 00000000000007f0
0xffffffff81024f40 : x86_pmu_enable+0x140/0x280 [kernel]
0xffffffff8110c7eb : perf_pmu_enable+0x2b/0x40 [kernel]
0xffffffff810247a9 : x86_pmu_commit_txn+0xa9/0xb0 [kernel]
0xffffffff8110ee8a : group_sched_in+0x13a/0x170 [kernel]
0xffffffff8110fb3d : __perf_event_enable+0x29d/0x2e0 [kernel]
0xffffffff8110d038 : remote_function+0x48/0x60 [kernel]
0xffffffff810b2371 : generic_smp_call_function_single_interrupt+0xa1/0x100 [kernel]
0xffffffff810383f7 : smp_call_function_single_interrupt+0x27/0x40 [kernel]
0xffffffff815276ca : call_function_single_interrupt+0x6a/0x70 [kernel]
----------------END-------------------------
Pass 5: run completed in 10usr/30sys/4564real ms.

案例三,列印某程式碼路徑上的變數值:

[[email protected] perf_study]# stap -e 'probe kernel.statement("[email protected]/x86/kernel/cpu/perf_event.c:824"){printf("%d\n", $hwc->idx)}'
33
33
33
33
33
33
33
33

如果有這樣的提示錯誤:

[[email protected] perf_study]# stap -e 'probe kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:824"){printf("%d\n", $hwc->idx)}'
semantic error: not accessible at this address (0xffffffff81024f8e, dieoffset: 0x343357): identifier '$hwc' at <input>:1:81
        source: probe kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:824"){printf("%d\n", $hwc->idx)}
                                                                                                ^

Pass 2: analysis failed. Try again with another ‘–vp 01’ option.

那麼可以用-L選項看一下到底有哪些可用變數:

[[email protected] perf_study]# stap -L 'kernel.statement("*@arch/x86/kernel/cpu/perf_event.c:824")'
kernel.statement("[email protected]/x86/kernel/cpu/perf_event.c:824") $hwc:struct hw_perf_event* $i:int $cpuc:struct cpu_hw_events* $event:struct perf_event*
kernel.statement("[email protected]/x86/kernel/cpu/perf_event.c:824") $n_running:int $cpuc:struct cpu_hw_events* $added:int

可以看到,是因為*匹配到兩處(應該是巨集的開啟與否導致程式碼行號有移動,但個人暫不確定具體原因是否如此),因此可以明確使用“[email protected]/x86/kernel/cpu/perf_event.c:824”。

案例四,獲知函式指標的具體指向,比如獲取下面event_init函式指標的具體指向:

struct pmu *perf_init_event(struct perf_event *event)
{
	struct pmu *pmu = NULL;
	int idx;
	int ret;
idx = srcu_read_lock(&amp;pmus_srcu);

rcu_read_lock();
pmu = idr_find(&amp;pmu_idr, event-&gt;attr.type);
rcu_read_unlock();
if (pmu) {
	event-&gt;pmu = pmu;
	ret = pmu-&gt;event_init(event);
	if (ret)
		pmu = ERR_PTR(ret);
	goto unlock;
}

list_for_each_entry_rcu(pmu, &amp;pmus, entry) {
	event-&gt;pmu = pmu;
	ret = pmu-&gt;event_init(event);
	if (!ret)
		goto unlock;

	if (ret != -ENOENT) {
		pmu = ERR_PTR(ret);
		goto unlock;
	}
}
pmu = ERR_PTR(-ENOENT);

unlock:
srcu_read_unlock(&pmus_srcu, idx);

return pmu;

}

[[email protected] perf_study]# cat perf_init_event.stp 
probe kernel.statement("[email protected]/events/core.c:5892"){
    addr = sprintf("%p", $pmu->event_init);
    print("5892-Function name:\n")
    print_stack(addr)
}

probe kernel.statement("[email protected]/events/core.c:5900"){
    addr = sprintf("%p", $pmu->event_init);
    print("5900-Function name:\n")
    print_stack(addr)
}
[[email protected] perf_study]# stap -v perf_init_event.stp 
Pass 1: parsed user script and 90 library script(s) using 190092virt/24736res/2784shr/22628data kb, in 160usr/20sys/178real ms.
Pass 2: analyzed script: 2 probe(s), 7 function(s), 3 embed(s), 0 global(s) using 357076virt/62224res/3896shr/59836data kb, in 450usr/290sys/743real ms.
Pass 3: translated to C into "/tmp/stapsgSbwn/stap_8b6901af9c04b3e4907826ce793aca76_4839_src.c" using 354588virt/65020res/6820shr/59836data kb, in 170usr/10sys/175real ms.
Pass 4: compiled C into "stap_8b6901af9c04b3e4907826ce793aca76_4839.ko" in 2610usr/440sys/2892real ms.
Pass 5: starting run.
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]
5900-Function name:
 0xffffffff81025510 : x86_pmu_event_init+0x0/0x220 [kernel]

即指向的是函式x86_pmu_event_init()。

轉載請保留地址:http://www.lenky.info/archives/2013/02/2209http://lenky.info/?p=2209