應用與系統穩定性第四篇---單執行緒導致的空指標問題分析
一、概述
1.1 引言
一般的空指標問題, 往往是在多執行緒併發的情況下, 某個或多個臨界資源多執行緒併發讀寫導致異常的發生,但是下面的問題是發生在單執行緒之中,引起了system重啟。
1.2 錯誤Log
Process: system_server Build: xiaomi/vince/vince:7.1.2/N2G47H/7.10.17:user/release-keys 10-17 17:03:23.80015211669 E AndroidRuntime: *** FATAL EXCEPTION IN SYSTEM PROCESS: xx.fg 10-17 17:03:23.80015211669 E AndroidRuntime: java.lang.NullPointerException: Attempt to invoke virtual method 'boolean xx.app.AlertDialog.isChecked()' on a null object reference 10-17 17:03:23.80015211669 E AndroidRuntime:at com.xx.server.XxCompatModePackages$3.onClick(XxCompatModePackages.java:474) 10-17 17:03:23.80015211669 E AndroidRuntime:at com.xx.internal.app.AlertControllerImpl$ButtonHandler.handleMessage(SourceFile:178) 10-17 17:03:23.80015211669 E AndroidRuntime:at android.os.Handler.dispatchMessage(Handler.java:102) 10-17 17:03:23.80015211669 E AndroidRuntime:at android.os.Looper.loop(Looper.java:163) 10-17 17:03:23.80015211669 E AndroidRuntime:at android.os.HandlerThread.run(HandlerThread.java:61) 10-17 17:03:23.80015211669 E AndroidRuntime:at com.android.server.ServiceThread.run(ServiceThread.java:46)
1.3 初步分析
XxCompatModePackages.java中發生了NPE,立刻去看一下XxCompatModePackages.java中的程式碼。
465private void createDialog() { 466mAlertDialog = new AlertDialog.Builder(mContext) 467.setTitle(R.string.xx_screen_ratio_hint) 468.setMessage(R.string.xx_screen_ratio_hint_message) 469.setCheckBox(true, mContext.getResources() 470.getString(R.string.xx_screen_ratio_hint_dont_show_again)) 471.setPositiveButton(R.string.xx_screen_ratio_hint_ok, new DialogInterface.OnClickListener() { 472@Override 473public void onClick(DialogInterface dialog, int which) { 474Message.obtain(mHandler, MSG_DONT_SHOW_AGAIN, mAlertDialog.isChecked()).sendToTarget(); 475} 476}) 477.setNeutralButton(R.string.xx_screen_ratio_hint_go_to_settings, new DialogInterface.OnClickListener() { 478@Override 479public void onClick(DialogInterface dialog, int which) { 480Message.obtain(mHandler, MSG_DONT_SHOW_AGAIN, mAlertDialog.isChecked()).sendToTarget(); 481gotoMaxAspectSettings(); 482} 483}) 484.create(); 485 486mAlertDialog.setCanceledOnTouchOutside(false); 487mAlertDialog.getWindow().getAttributes().type = WindowManager.LayoutParams.TYPE_SYSTEM_DIALOG; 488 489mAlertDialog.setOnDismissListener(new DialogInterface.OnDismissListener() { 490@Override 491public void onDismiss(DialogInterface dialog) { 492mAlertDialog = null; 493} 494}); 495}
474行發生了NPE,那只有mAlertDialog為NULL了,搜尋mAlertDialog可以賦值為NULL的情況,mAlertDialog唯一被置空的機會是在mAlertDialog.setOnDismissListener的時候,且mAlertDialog用private修飾,也不存在外部修改的可能。
489mAlertDialog.setOnDismissListener(new DialogInterface.OnDismissListener() { 490@Override 491public void onDismiss(DialogInterface dialog) { 492mAlertDialog = null; 493} 494});
createDialog方法在handleOnAppLaunch中呼叫的
450private void handleOnAppLaunch(String packageName) { 451if (!isRestrictAspect(packageName) && getDefaultAspectType(packageName) == CustomizeUtil.TYPE_SUGGEST) { 452try { 453Slog.i(TAG, "launching suggest app: " + packageName); //輸出launching suggest app日誌 454if (mAlertDialog == null) { 455createDialog(); 456} 457 458mAlertDialog.show(); 459} catch (Exception e) { 460Slog.e(TAG, "error showing suggest dialog", e); 461} 462} 463}
mAlertDialog為null的時候就建立這個AlertDialog,不為null,直接show出來。注意每次在show的時候,都列印了“launching suggest app”這樣一行log。這個時候只有在去結合日誌分析FC的上下文了。
搜尋日誌中的launching suggest app
10-17 17:03:20.738 1521 1669 I XxCompatModePackages: launching suggest app: com.baidu.BaiduMap 10-17 17:03:20.811 1521 1738 D BaseXxPhoneWindowManager: keyCode:117 down:true eventTime:761166 downTime:761166 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:20.812 1521 1738 D BaseXxPhoneWindowManager: keyCode:117 down:false eventTime:761168 downTime:761168 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.018 1521 1738 D BaseXxPhoneWindowManager: keyCode:19 down:true eventTime:761373 downTime:761373 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.020 1521 1738 D BaseXxPhoneWindowManager: keyCode:19 down:false eventTime:761375 downTime:761375 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.121 1521 1738 D BaseXxPhoneWindowManager: keyCode:95 down:true eventTime:761476 downTime:761476 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.122 1521 1738 D BaseXxPhoneWindowManager: keyCode:95 down:false eventTime:761477 downTime:761477 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.226 1521 1738 I SplashScreenServiceDelegate: Empty check list, check all 10-17 17:03:21.228 1521 1738 I SplashScreenServiceDelegate: requestSplashScreen 2ms, com.xx.xxbbs 10-17 17:03:21.229 1521 1738 I ActivityManager: START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.xx.xxbbs/.activity.WelcomeActivity} from uid 0 on display 0 10-17 17:03:21.238 2541 2776 I WtProcessController: mCurTask:13 10-17 17:03:21.275 1521 1737 I ActivityManager: Start proc 7289:com.xx.xxbbs/u0a120 for activity com.xx.xxbbs/.activity.WelcomeActivity caller=null 10-17 17:03:21.363 1521 1737 D BaseXxPhoneWindowManager: keyCode:22 down:true eventTime:761718 downTime:761718 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.365 1521 1737 D BaseXxPhoneWindowManager: keyCode:22 down:false eventTime:761720 downTime:761720 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.383 2541 2776 I WtProcessController: MOVE TO FOREGROUND: com.xx.xxbbs 10120 10-17 17:03:21.384 2541 2776 I StatusController: Last foreground:com.baidu.BaiduMap uid:10136 Current foreground:com.xx.xxbbs uid:10120 10-17 17:03:21.384 2541 2776 I WtProcessController: FOREGROUND INFO: name=com.xx.xxbbs uid=10120 pid=7289 TaskId:13 10-17 17:03:21.413 1521 1538 V UidProcStateHelper: process state changed:[7289,10120,2] 10-17 17:03:21.425 1521 1537 I ActiveServicesInjector: Low priority start of: ServiceRecord{1bce22 u0 com.xiaomi.market/.service.AppActiveStatService} 10-17 17:03:21.574 1521 1738 I SplashScreenServiceDelegate: Empty check list, check all 10-17 17:03:21.576 1521 1738 I SplashScreenServiceDelegate: requestSplashScreen 2ms, com.xiaomi.scanner 10-17 17:03:21.576 1521 1738 I ActivityManager: START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.xiaomi.scanner/.app.ScanActivity} from uid 0 on display 0 10-17 17:03:21.583 2541 2776 I WtProcessController: mCurTask:14 10-17 17:03:21.695 1521 1738 D BaseXxPhoneWindowManager: keyCode:136 down:true eventTime:762050 downTime:762050 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.697 1521 1738 D BaseXxPhoneWindowManager: keyCode:136 down:false eventTime:762052 downTime:762052 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.798 1521 1738 D BaseXxPhoneWindowManager: keyCode:20 down:true eventTime:762153 downTime:762153 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.799 1521 1738 D BaseXxPhoneWindowManager: keyCode:20 down:false eventTime:762154 downTime:762154 policyFlags:2b000000 deviceId:-1 isScreenOn:true keyguardActive:false repeatCount:0 10-17 17:03:21.928 1521 1669 I XxCompatModePackages: launching suggest app: com.xx.xxbbs
發現了不少"launching suggest app",有兩次時間相差約1秒(第一條日誌和最後一條日誌),也就是說連續兩次呼叫了handleOnAppLaunch方法?初步猜測是第一次建立了mAlertDialog之後,monkey點選了取消,觸發了setOnDismissListener,但是onDismiss還沒有走完(通過Handler傳送Msg),緊接著第二次handleOnAppLaunch呼叫,這時候mAlertDialog不為null,不用建立,直接show出來,使用者再次點選對話方塊的button觸發setPositiveButton方法,此時第一次onDismiss剛好走完,發生了NP。但是我又想一想,雖然是前後彈了兩次Dialog,但是Dialog的建立和dismiss是在同一個執行緒中,主線的Message是一個接著一個處理的,理論上不會發生上面的問題。
二、進一步分析
嘗試寫復現DEMO,很簡單,我在onCreate中建立一個Dialog,然後在取消按鈕中把mAlertDialog賦值為null。
protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main); createDialog(); } private void createDialog() { mAlertDialog = new AlertDialog.Builder(MainActivity.this) .setTitle("title") .setMessage("message") .setPositiveButton("確定", new DialogInterface.OnClickListener(){ @Override public void onClick(DialogInterface dialog, int which) { boolean showing = mAlertDialog.isShowing(); Log.d("MainActivity","showing"+showing); } }).setNegativeButton("取消", new DialogInterface.OnClickListener() { @Override public void onClick(DialogInterface dialog, int which) { mAlertDialog = null; } }).create(); mAlertDialog.show(); }
如果是上面的程式碼,即使在同一個執行緒中也可能出現空指標的情況,現在先按住取消按鈕不放,在按住確定按鈕,然後兩個同時鬆開,就可能出現空指標,基本上是一個必現的問題。繼續按照XxCompatModePackages.java裡面的邏輯寫一個。
void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main); createDialog(); } private void createDialog() { mAlertDialog = new AlertDialog.Builder(MainActivity.this) .setTitle("title") .setMessage("message") .setPositiveButton("確定", new DialogInterface.OnClickListener(){ @Override public void onClick(DialogInterface dialog, int which) { boolean showing = mAlertDialog.isShowing(); Log.d("MainActivity","showing"+showing); } }).setNegativeButton("取消", new DialogInterface.OnClickListener() { @Override public void onClick(DialogInterface dialog, int which) { Log.d("MainActivity","取消"); boolean showing = mAlertDialog.isShowing(); } }).setOnDismissListener(new DialogInterface.OnDismissListener() { @Override public void onDismiss(DialogInterface dialog) { Log.d("MainActivity","onDismiss"); mAlertDialog = null; } }).create(); mAlertDialog.show(); }
這次怎麼也復現不了了,還是先看下程式碼吧。
三、深入分析
現在梳理一下Dialog的setPositiveButton和setNegativeButton流程,以setPositiveButton為例。
609public Builder setPositiveButton(@StringRes int textId, final OnClickListener listener) { 610P.mPositiveButtonText = P.mContext.getText(textId); 611P.mPositiveButtonListener = listener; 612return this; 613}
P是AlertParams物件,將所設定的文字和listener儲存在物件P的mPositiveButtonText和mPositiveButtonListener中,什麼時候會用呢,Dialog在onCreate中會呼叫apply,apply會來取這個引數。
996public void apply(AlertController dialog) { 997..... 1016if (mPositiveButtonText != null) { 1017dialog.setButton(DialogInterface.BUTTON_POSITIVE, mPositiveButtonText, 1018mPositiveButtonListener, null); 1019} 1020if (mNegativeButtonText != null) { 1021dialog.setButton(DialogInterface.BUTTON_NEGATIVE, mNegativeButtonText, 1022mNegativeButtonListener, null); 1023} 1024if (mNeutralButtonText != null) { 1025dialog.setButton(DialogInterface.BUTTON_NEUTRAL, mNeutralButtonText, 1026mNeutralButtonListener, null); 1027} ..... 1054} 1055
繼續看setButton的實現
339public void setButton(int whichButton, CharSequence text, 340DialogInterface.OnClickListener listener, Message msg) { 341 342if (msg == null && listener != null) { 343msg = mHandler.obtainMessage(whichButton, listener); 344} 345 346switch (whichButton) { 347 348case DialogInterface.BUTTON_POSITIVE: 349mButtonPositiveText = text; 350mButtonPositiveMessage = msg; 351break; 352 353case DialogInterface.BUTTON_NEGATIVE: 354mButtonNegativeText = text; 355mButtonNegativeMessage = msg; 356break; 357 358case DialogInterface.BUTTON_NEUTRAL: 359mButtonNeutralText = text; 360mButtonNeutralMessage = msg; 361break; 362 363default: 364throw new IllegalArgumentException("Button does not exist"); 365} 366}
將文字text和msg儲存在mButtonNeutralText和mButtonNeutralMessage中,最終用他們構造了一個監聽器mButtonHandler。
125private final View.OnClickListener mButtonHandler = new View.OnClickListener() { 126@Override 127public void onClick(View v) { 128final Message m; 129if (v == mButtonPositive && mButtonPositiveMessage != null) { 130m = Message.obtain(mButtonPositiveMessage); 131} else if (v == mButtonNegative && mButtonNegativeMessage != null) { 132m = Message.obtain(mButtonNegativeMessage); 133} else if (v == mButtonNeutral && mButtonNeutralMessage != null) { 134m = Message.obtain(mButtonNeutralMessage); 135} else { 136m = null; 137} 138 139if (m != null) { 140m.sendToTarget(); 141} 142 143// Post a message so we dismiss after the above handlers are executed 144mHandler.obtainMessage(ButtonHandler.MSG_DISMISS_DIALOG, mDialogInterface) 145.sendToTarget(); 146} 147};
無論是點選確定按鈕還是取消按鈕都需要設定這個監聽器mButtonHandler,因為Dialog在show出來的時候,Dialog解析xml佈局之後,裡面的確定按鈕和取消按鈕的監聽器就會被設定mButtonHandler。並且最後還發送一個型別為MSG_DISMISS_DIALOG的msg,為了在點選確定、取消、中立按鈕時候能夠預設關閉對話方塊。
梳理到這裡就明白了上面DEMO1發生FC的原因,在我同時鬆開取消按鈕與確定按鈕的時候,事實上這個時候執行緒的MessageQueue裡面有兩個msg沒有被處理,分別是取消按鈕對應的Message和確定按鈕對應的Message。當Looper來取這個訊息的時候,先取的是取消按鈕對應的Message,處理完後在取確定按鈕對應的Message,但是我們取消按鈕對應的監聽器回撥中把mAlertDialog賦值為null了,所以出現了NPE。那麼對於DEMO2,如果存在訊息佇列進入了三個Message,包括Dialog確定按鈕 onClick與取消按鈕onClick對應的Message,以及onDismiss 對應的Message,並且onDismiss對應的Message在兩個onClick 對應的Message之間,那麼也可能產生NullPointerException。基於手動難以復現的情況下,只有跑monkey了,結果每次都很快出現了NullPointerException。
// CRASH: onekeymonkey.wangjing.com.onekeymonkey (pid 11918) // Short Msg: java.lang.NullPointerException // Long Msg: java.lang.NullPointerException: Attempt to invoke virtual method 'boolean android.app.Dialog.isShowing()' on a null object reference // Build Label: Xiaomi/jason/jason:7.1.1/NMF26X/7.11.24:user/release-keys // Build Changelist: 7.11.24 // Build Time: 1511465058000 // java.lang.NullPointerException: Attempt to invoke virtual method 'boolean android.app.Dialog.isShowing()' on a null object reference // at onekeymonkey.wangjing.com.onekeymonkey.MainActivity$4.onClick(MainActivity.java:82) // at com.android.internal.app.AlertController$ButtonHandler.handleMessage(AlertController.java:166) // at android.os.Handler.dispatchMessage(Handler.java:102) // at android.os.Looper.loop(Looper.java:163) // at android.app.ActivityThread.main(ActivityThread.java:6384) // at java.lang.reflect.Method.invoke(Native Method) // at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:901) // at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:791) // ** Monkey aborted due to error.
四、結論與修復
以後出現了空指標的情況,在一個執行緒的情況下,需要確定一下是否使用了Handler訊息機制,對於我們這個問題,知道了rootcase,那麼這裡直接判斷空就可以了。