Notes on a gdb debugging session
Contents
- Problem description
- Enabling core files
- Inspecting the core file with gdb
- Summary
Problem description
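While working on some code today, the program crashed at run time with a segmentation fault. The crash was reported only after main() had returned.
The only information was: Segmentation fault (core dumped)
I had no idea where to start.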
Enabling core files
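By default, core files are usually not written because the core file size limit is 0. Running ulimit -c unlimited removes that limit. The setting applies to the current shell and to the programs started from it.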
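Sometimes changing the shell limit is inconvenient, for example when another process starts the program. In that case the same limit can be raised from inside the program with the POSIX getrlimit()/setrlimit() calls on RLIMIT_CORE. The following is a minimal sketch. The helper name enable_core_dumps() is hypothetical. It only raises the soft limit as far as the hard limit allows, which is all an unprivileged process may do:

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: an in-process counterpart of `ulimit -c unlimited`.
// Raises the soft core-file size limit (RLIMIT_CORE) to the hard limit.
// An unprivileged process cannot go above the hard limit, so the result is
// "unlimited" only if the hard limit is already RLIM_INFINITY.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;
    }
    lim.rlim_cur = lim.rlim_max;  // soft limit := hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Call it once near the start of main(), before any code that might crash.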
Inspecting the core file with gdb
gdb ./example/sudoku_batch_test core
gdb prints:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI___libc_free (mem=0x313030303030300a) at malloc.c:2951
2951 malloc.c: No such file or directory.
(gdb)
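So the crash happened inside glibc's free(), which is implemented in malloc.c (line 2951). However, gdb cannot find the source of malloc.c.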
First, install the glibc debug symbols:
sudo apt-get install libc6-dbg
Next, download the glibc source (apt-get source requires deb-src entries in sources.list):
sudo apt-get source libc6-dev
When it finishes, the current directory contains a new glibc-2.23 folder with the glibc source code.
With the source in place, return to the gdb session above and enter at the (gdb) prompt:
directory glibc-2.23/malloc/
This adds glibc-2.23/malloc/ to gdb's source search path. The session now looks like this:
warning: core file may not match specified executable file.
[New LWP 24491]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./example/sudoku_batch_test ../example/test1000 127.0.0.1 1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI___libc_free (mem=0x313030303030300a) at malloc.c:2951
2951 malloc.c: No such file or directory.
(gdb) directory glibc-2.23/malloc/
Source directories searched: /root/work/melon/build/glibc-2.23/malloc:$cdir:$cwd
(gdb)
Now gdb can display the source around the crash site. Run list (l for short):
(gdb) l
warning: Source file is more recent than executable.
2946 if (mem == 0) /* free(0) has no effect */
2947 return;
2948
2949 p = mem2chunk (mem);
2950
2951 if (chunk_is_mmapped (p)) /* release mmapped memory. */
2952 {
2953 /* see if the dynamic brk/mmap threshold needs adjusting */
2954 if (!mp_.no_dyn_threshold
2955 && p->size > mp_.mmap_threshold
(gdb)
We now know the crash is at line 2951, but that alone does not tell us much. I thought the call stack might be more informative.
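Line 2951 actually says a little more than it seems to. In glibc's malloc, mem2chunk() converts the pointer passed to free() into the chunk header address by stepping back two size_t words (2 * SIZE_SZ). chunk_is_mmapped(p) then reads the size field of that header. With mem=0x313030303030300a, the header address is not even a canonical x86-64 address, so that read most likely faults immediately.

The sketch below reproduces only the address arithmetic. It assumes a 64-bit build (SIZE_SZ == 8) and 48-bit x86-64 virtual addresses. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-bit glibc, where SIZE_SZ (sizeof(size_t)) is 8 bytes.
constexpr std::uint64_t kSizeSz = 8;

// Mirrors the arithmetic of glibc's mem2chunk(): the chunk header lies
// 2 * SIZE_SZ bytes before the pointer that malloc() handed to the caller.
std::uint64_t mem2chunk_addr(std::uint64_t mem) {
    return mem - 2 * kSizeSz;
}

// With 48-bit virtual addresses, x86-64 requires bits 63..47 to be all 0
// or all 1. Any other ("non-canonical") address cannot be dereferenced.
bool is_canonical_x86_64(std::uint64_t addr) {
    const std::uint64_t top = addr >> 47;  // the 17 highest bits
    return top == 0 || top == 0x1FFFF;
}
```

By contrast, an ordinary heap address passes the check. One example is 0x1fc9120, the Coroutine object's this pointer in the backtrace.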
So I ran backtrace (or bt):
(gdb) bt
#0 __GI___libc_free (mem=0x313030303030300a) at malloc.c:2951
#1 0x000000000048bc9d in melon::Coroutine::~Coroutine (this=0x1fc9120, __in_chrg=<optimized out>)
at /root/work/melon/src/Coroutine.cpp:56
#2 0x000000000048d099 in std::_Sp_counted_ptr<melon::Coroutine*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
this=0x1fc8190) at /usr/include/c++/5/bits/shared_ptr_base.h:374
#3 0x00000000004630f1 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x1fc8190)
at /usr/include/c++/5/bits/shared_ptr_base.h:150
#4 0x0000000000461f32 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f07f4ff1770,
__in_chrg=<optimized out>) at /usr/include/c++/5/bits/shared_ptr_base.h:659
#5 0x00000000004749ed in std::__shared_ptr<melon::Coroutine, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
this=0x7f07f4ff1768, __in_chrg=<optimized out>) at /usr/include/c++/5/bits/shared_ptr_base.h:925
#6 0x0000000000474a39 in std::shared_ptr<melon::Coroutine>::~shared_ptr (this=0x7f07f4ff1768,
__in_chrg=<optimized out>) at /usr/include/c++/5/bits/shared_ptr.h:93
#7 0x00007f07f40915ff in __GI___call_tls_dtors () at cxa_thread_atexit_impl.c:155
#8 0x00007f07f4090f27 in __run_exit_handlers (status=0, listp=0x7f07f441b5f8 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true) at exit.c:40
#9 0x00007f07f4091045 in __GI_exit (status=<optimized out>) at exit.c:104
#10 0x00007f07f4077837 in __libc_start_main (main=0x45f1c4 <main(int, char**)>, argc=4, argv=0x7ffcfb2ab218,
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffcfb2ab208)
at ../csu/libc-start.c:325
#11 0x000000000045ec89 in _start ()
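That located the problem. Frame #7 is __GI___call_tls_dtors, which runs when a thread exits or the program terminates and destroys thread-local (thread_local) objects. Here it was called from exit() after main() returned (frames #8 to #10). I had indeed declared a Coroutine::Ptr variable as thread_local.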
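The mechanism can be reproduced in isolation. The sketch below uses a hypothetical Tracked type in place of melon::Coroutine, together with a thread_local std::shared_ptr like Coroutine::Ptr. Nothing resets the pointer explicitly, yet its object is destroyed when the thread exits.

On glibc that destruction goes through __call_tls_dtors, the function in frame #7. For the main thread it happens inside exit() after main() returns, which is why the original crash only appeared after main() had finished:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Counts how many Tracked objects have been destroyed.
std::atomic<int> g_destroyed{0};

// Hypothetical stand-in for melon::Coroutine: the destructor records that it ran.
struct Tracked {
    ~Tracked() { g_destroyed.fetch_add(1); }
};

// Like the thread_local Coroutine::Ptr: each thread has its own shared_ptr.
thread_local std::shared_ptr<Tracked> t_current;

void run_worker() {
    std::thread worker([] {
        t_current = std::make_shared<Tracked>();
        // No explicit reset(): the shared_ptr, and with it the Tracked object,
        // is destroyed as part of thread exit (on glibc via __call_tls_dtors).
    });
    worker.join();  // thread completion, including TLS destruction, happens before join() returns
}
```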
Frame #1, 0x000000000048bc9d in melon::Coroutine::~Coroutine, shows that the crash came from the free() call in the melon::Coroutine destructor (Coroutine.cpp:56).
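The mem value in frame #0 also suggests that free() received garbage rather than a pointer returned by malloc(). x86-64 is little-endian, so the eight bytes of 0x313030303030300a are stored lowest byte first. Read in that order, they form the ASCII text "\n0000001".

This is only a hint, because nothing in gdb's output says where those bytes came from. Still, a "pointer" made of printable digits is a typical sign of an uninitialized or overwritten variable. Here is a small sketch that decodes such a value. bytes_le() is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the 8 bytes of v in little-endian memory order
// (lowest byte first), which is how x86-64 stores a pointer in RAM.
std::string bytes_le(std::uint64_t v) {
    std::string out;
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
    return out;
}
```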
At this point the cause was almost clear. The Coroutine destructor frees the stack_ pointer:
53 Coroutine::~Coroutine() {
54 LOG_DEBUG << "destroy coroutine:" << name_;
55 if (stack_) {
56 free(stack_);
57 }
58 }
Coroutine has two constructors. One of them looks like this:
Coroutine::Coroutine()
    : c_id_(++t_coroutine_id),
      name_("Main-" + std::to_string(c_id_)),
      cb_(nullptr),
      state_(CoroutineState::INIT) {
    // BUG: stack_ is missing from the initializer list, so it holds an indeterminate value
    if (getcontext(&context_)) {
        LOG_ERROR << "getcontext: errno=" << errno
                  << " error string:" << strerror(errno);
    }
}
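Through carelessness I made a very basic mistake: this constructor never initializes the stack_ pointer. It allocates no stack, so stack_ was left holding whatever garbage happened to be in that memory. The destructor's if (stack_) check passed, and free() was called on that garbage address. After I initialized stack_ to nullptr in this constructor, the problem went away.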
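Initializing stack_ in the constructor's initializer list fixes this one constructor. A default member initializer (C++11) protects every constructor at once, including ones added later.

The sketch below is a simplified, hypothetical stand-in for melon::Coroutine that models only the ownership of stack_. It also drops the if (stack_) check, because free(nullptr) does nothing, as the `if (mem == 0)` line in the malloc.c listing above also shows:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified stand-in for melon::Coroutine (stack_ handling only).
class StackOwner {
public:
    StackOwner() = default;  // like the "Main" constructor: allocates no stack
    explicit StackOwner(std::size_t stack_size)
        : stack_(static_cast<char*>(std::malloc(stack_size))) {}

    ~StackOwner() {
        std::free(stack_);  // free(nullptr) is a no-op, so no check is needed
    }

    // Copies would free the same stack twice, so forbid them.
    StackOwner(const StackOwner&) = delete;
    StackOwner& operator=(const StackOwner&) = delete;

    const char* stack() const { return stack_; }

private:
    char* stack_ = nullptr;  // default member initializer: applies to every constructor
};
```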
Summary
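For problems like this, opening the core file in gdb usually pinpoints where the crash happened. If the crash site is not itself the cause, inspect the call stack. It will usually lead you to the root cause.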