
Who logged the oversized messages in the MySQL error log?

[Problem]

While reviewing MySQL error log files recently, I found that many servers had large numbers of entries like the one below. Each entry is long (around 200 KB), and nothing in its content points to an obvious problem.

A test server showed the same behavior. Why are these messages logged, and who is logging them? The analysis took a few detours.

Status information:

Current dir:
Running threads: 2452  Stack size: 262144
Current locks:
lock: 0x7f783f5233f0:

Key caches:
default
Buffer_size:       8388608
Block_size:           1024
Division_limit:        100
Age_limit:             300
blocks used:            10
not flushed:             0
w_requests:           6619
writes:                  1
r_requests:         275574
reads:                1235

handler status:
read_key:   32241480828
read_next:  451035381896
read_rnd     149361175
read_first:    1090473
write:      4838429521
delete        12155820
update:     3331297842

[Analysis]

1. First, the official documentation says that when the mysqld process receives a SIGHUP signal, it writes exactly this kind of status report:

On Unix, signals can be sent to processes. mysqld responds to signals sent to it as follows:

SIGHUP causes the server to reload the grant tables and to flush tables, logs, the thread cache, and the host cache. These actions are like various forms of the FLUSH statement. The server also writes a status report to the error log that has this format:

https://dev.mysql.com/doc/refman/5.6/en/server-signal-response.html

 

2. Was some other program sending signals to the mysqld process? I monitored the kill syscall with a SystemTap script:

# Record the target PID and signal at kill() entry,
# print sender, signal, and target when the syscall returns.
global target, signal

probe nd_syscall.kill
{
        target[tid()] = uint_arg(1);
        signal[tid()] = uint_arg(2);
}

probe nd_syscall.kill.return
{
        if (target[tid()] != 0) {
                printf("%-6d %-12s %-5d %-6d %6d\n", pid(), execname(),
                    signal[tid()], target[tid()], int_arg(1));
                delete target[tid()];
                delete signal[tid()];
        }
}
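To run it, save the probes above as a file and launch stap as root (a minimal sketch; the file name kill.stp and the header line are my own additions, chosen to match the script's printf columns):

printf "%-6s %-12s %-5s %-6s %6s\n" FROM COMMAND SIG TO RESULT
sudo stap kill.stp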

 

Testing with the command below confirmed that it does write a status report to the error log:

kill -SIGHUP 12455

 

The SystemTap output shows that PID 12455, the mysqld process, was sent signal 1, which corresponds to SIGHUP.

However, when the problem recurred later in the test environment, no SIGHUP was captured at all.

 

FROM   COMMAND      SIG   TO     RESULT
17010  who          0     12153  1340429600
36681  bash         1     12455     642

 

3. So kill was not the cause. Next I attached gdb to the mysqld process and set breakpoints on the three error-log entry functions: sql_print_error, sql_print_warning, and sql_print_information.

But when the problem recurred, execution never stopped at any of the breakpoints.
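The attach step looked roughly like this (a sketch; 12455 is the mysqld PID from the earlier test):

# break on all three error-log entry points, then resume the server
gdb -p 12455 \
    -ex 'break sql_print_error' \
    -ex 'break sql_print_warning' \
    -ex 'break sql_print_information' \
    -ex 'continue'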

 

4. Is there another code path that writes to the error log? The source code gave the answer: the report is produced by the mysql_print_status function, which writes straight to stdout with printf and fflush. Since mysqld's stdout and stderr are redirected to the error log, the output lands there without ever passing through the sql_print_* functions, which is why the breakpoints never fired.

 

void mysql_print_status()
{
  char current_dir[FN_REFLEN];
  STATUS_VAR current_global_status_var;

  printf("\nStatus information:\n\n");
  (void) my_getwd(current_dir, sizeof(current_dir),MYF(0));
  printf("Current dir: %s\n", current_dir);
  printf("Running threads: %u  Stack size: %ld\n",
         Global_THD_manager::get_instance()->get_thd_count(),
         (long) my_thread_stack_size);
  …
  puts("");
  fflush(stdout);
}
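The stdout redirect is easy to confirm (a sketch, assuming a single mysqld instance on the host):

# fd 1 and fd 2 of mysqld normally point at the error log file
ls -l /proc/$(pidof mysqld)/fd/1 /proc/$(pidof mysqld)/fd/2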

 

5. I attached gdb to mysqld again, this time with a breakpoint on mysql_print_status. When the problem recurred, a thread stopped at the breakpoint. Comparing several rounds of ps output showed that the pt-stalk tool was running at exactly those moments and was what triggered mysql_print_status.
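Roughly, reusing the attach approach from step 3 (a sketch):

gdb -p 12455 -ex 'break mysql_print_status' -ex 'continue'
# once stopped, the gdb command `bt` prints the calling stack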

 

6. The backtrace showed that dispatch_command had called mysql_print_status. The relevant logic is below: whenever command == COM_DEBUG, mysql_print_status is executed.

 

case COM_DEBUG:
    thd->status_var.com_other++;
    if (check_global_access(thd, SUPER_ACL))
      break;                /* purecov: inspected */
    mysql_print_status();
    query_logger.general_log_print(thd, command, NullS);
    my_eof(thd);
    break;
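Note that the branch also calls query_logger.general_log_print, so if the general query log is enabled, every COM_DEBUG leaves a Debug entry there as well, which is another way to identify the caller (a sketch; the log path is a placeholder):

mysql -e 'SET GLOBAL general_log = ON'
grep -w Debug /path/to/general.log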

 

7. A look at the pt-stalk code:

 

if [ "$mysql_error_log" -a ! "$OPT_MYSQL_ONLY" ]; then

      log "The MySQL error log seems to be $mysql_error_log"

      tail -f "$mysql_error_log" >"$d/$p-log_error" &

      tail_error_log_pid=$!

 

      $CMD_MYSQLADMIN $EXT_ARGV debug

   else

      log "Could not find the MySQL error log"

 

The mysqladmin call uses the debug command:

debug         Instruct server to write debug information to log
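This is easy to verify by hand (needs a user with the SUPER privilege; the connection options and error-log path are placeholders):

# send COM_DEBUG, then look for a fresh report in the error log
mysqladmin -uroot -p debug
tail -n 50 /path/to/error.log | grep 'Status information'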

 

8. A search on Percona's site turned up the matching bug report. It has not been fixed yet and is scheduled for the next release, 3.0.13:

https://jira.percona.com/browse/PT-1340

 

[Solution]

Once the cause was identified, the fix was simple: remove the debug argument from the $CMD_MYSQLADMIN $EXT_ARGV debug line in the pt-stalk script. Testing confirmed that this stops the status dumps.
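The one-line edit, sketched:

# before
$CMD_MYSQLADMIN $EXT_ARGV debug
# after: the trailing debug command removed
$CMD_MYSQLADMIN $EXT_ARGV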

 

 

Summary: a status report like this can end up in the error log via

(1) the mysql_print_status function writing directly to the error log (e.g. on SIGHUP);
(2) someone running mysqladmin debug;
(3) resource pressure, killed sessions, and the like.

For reference, here is a complete example of the status report:

Status information:

Current dir: /data/mysql/mysql3306/data/
Running threads: 7 Stack size: 262144
Current locks:
lock: 0x7fdcb0a44780:
lock: 0x7fdcaf0ea980:
lock: 0x1edb5a0:
..........


Key caches:
default
Buffer_size: 8388608
Block_size: 1024
Division_limit: 100
Age_limit: 300
blocks used: 9
not flushed: 0
w_requests: 0
writes: 0
r_requests: 82
reads: 13


handler status:
read_key: 16981474
read_next: 33963080
read_rnd 6
read_first: 192
write: 21270
delete 0
update: 16981221

Table status:
Opened tables: 956
Open tables: 206
Open files: 13
Open streams: 0

Memory status:
<malloc version="1">
<heap nr="0">
<sizes>
<unsorted from="140586808432240" to="140585778669336" total="0" count="140585778669312"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<system type="current" size="0"/>
<system type="max" size="0"/>
<aspace type="total" size="0"/>
<aspace type="mprotect" size="0"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="0"/>
<system type="max" size="0"/>
<aspace type="total" size="0"/>
<aspace type="mprotect" size="0"/>
</malloc>

 

Events status:
LLA = Last Locked At LUA = Last Unlocked At
WOC = Waiting On Condition DL = Data Locked

Event scheduler status:
State : INITIALIZED
Thread id : 0
LLA : n/a:0
LUA : n/a:0
WOC : NO
Workers : 0
Executed : 0
Data locked: NO

Event queue status:
Element count : 0
Data locked : NO
Attempting lock : NO
LLA : init_queue:96
LUA : init_queue:104
WOC : NO
Next activation : never