1. 程式人生 > >ANR原因分析及解決方法

ANR原因分析及解決方法

內容目錄



1       ANR是如何產生的

1.1       ANR產生條件

ANR的產生需要同時滿足三個條件:

  • 主執行緒:只有應用程式程序的主執行緒響應超時才會產生ANR;
  • 超時時間:產生ANR的上下文不同,超時時間也不同,但只要超過這個時間上限沒有響應就會產生ANR;
  • 輸入事件/特定操作:輸入事件是指按鍵、觸屏等裝置輸入事件,特定操作是指BroadcastReceiver和Service的生命週期中的各個函式呼叫。

產生ANR的上下文不同,導致ANR原因也不同,主要有以下三種情況:

  • 應用程序的主執行緒對輸入事件在5s內沒有處理完畢;
  • 應用程序的主執行緒在執行BroadcastRecevier的onReceive函式時10s內沒有處理完畢;
  • 應用程序的主執行緒在執行Service的各個生命週期函式時20s內沒有處理完畢;
1.2       引起ANR的根本原因

引起ANR的根本原因,總的來說可以歸納為兩類:

  • 應用程序自身引起的,比如:主執行緒阻塞、掛起、死迴圈,執行耗時操作等;
  • 其他程序引起的,比如:其他程序CPU佔用率過高,導致當前應用程序無法搶佔到CPU時間片。常見的問題如檔案讀寫頻繁,io程序CPU佔用率過高,導致當前應用出現ANR;

2       ANR原因分析方法

2.1       通過logcat日誌分析原因

舉一個開發過程中的ANR例項,《從雲盤上下載視訊到本地,進入下載中心刪除任務和檔案,進入檔案管理提示‘檔案管理無響應’》,在logcat日誌中輸出了以下ANR資訊:

01-21 23:58:40.265   975   991 E ActivityManager: ANR in com.letv.filemanager

01-21 23:58:40.265   975   991 E ActivityManager: Reason: Executing service com.letv.filemanager/.netstorage.service.DlnaService

01-21 23:58:40.265   975   991 E ActivityManager: Load: 5.62 / 4.3 / 3.55

01-21 23:58:40.265   975   991 E ActivityManager: CPU usage from 30331ms to 0ms ago:

01-21 23:58:40.265   975   991 E ActivityManager:   26% 1760/com.letv.airplay: 24% user + 1.6% kernel / faults: 1094 minor

01-21 23:58:40.265   975   991 E ActivityManager:   15% 787/cdnclient: 0% user + 15% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   13% 298/kswapd0: 0% user + 13% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   9.5% 1437/com.letv.dmr: 9% user + 0.5% kernel / faults: 572 minor

01-21 23:58:40.265   975   991 E ActivityManager:   9.1% 953/flush-179:0: 0% user + 9.1% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   5.8% 788/collector: 0.8% user + 4.9% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   5% 419/mmcqd: 0% user + 5% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   2.9% 814/ld-linux.so.3: 1.1% user + 1.8% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   2.5% 822/dd: 0.2% user + 2.2% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   1.4% 2533/com.letv.threeScreen: 0.7% user + 0.7% kernel / faults: 134 minor

01-21 23:58:40.265   975   991 E ActivityManager:   0.5% 817/logcat: 0% user + 0.4% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0.4% 7/events/0: 0% user + 0.4% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0.4% 1254/com.letv.t2.launcher: 0.4% user + 0% kernel / faults: 20 minor

01-21 23:58:40.265   975   991 E ActivityManager:   0.2% 975/system_server: 0% user + 0.1% kernel / faults: 34 minor 1 major

01-21 23:58:40.265   975   991 E ActivityManager:   0.2% 1452/com.letv.downloads: 0.1% user + 0.1% kernel / faults: 40 minor

01-21 23:58:40.265   975   991 E ActivityManager:   0.1% 808/virtualkeypad: 0% user + 0.1% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0.1% 1292/RTW_CMD_THREAD: 0% user + 0.1% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0% 753/jbd2/mmcblk9-8: 0% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0% 8/events/1: 0% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0% 177/bdi-default: 0% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0% 786/letvmanager: 0% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0% 1472/com.letv.filemanager: 0% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0% 1737/com.letv.medialog: 0% user + 0% kernel / faults: 48 minor

01-21 23:58:40.265   975   991 E ActivityManager: 62% TOTAL: 7.2% user + 10% kernel + 43% iowait + 1% softirq

01-21 23:58:40.265   975   991 E ActivityManager: CPU usage from 3195ms to 3733ms later:

01-21 23:58:40.265   975   991 E ActivityManager:   15% 975/system_server: 3.7% user + 11% kernel / faults: 1 minor

01-21 23:58:40.265   975   991 E ActivityManager:     15% 991/ActivityManager: 3.7% user + 11% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   7.5% 1760/com.letv.airplay: 7.5% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:     5.6% 2615/SocketListener(: 5.6% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   5.5% 419/mmcqd: 0% user + 5.5% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   3.7% 298/kswapd0: 0% user + 3.7% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   3.7% 788/collector: 0% user + 3.7% kernel

01-21 23:58:40.265   975   991 E ActivityManager:     3.7% 866/collector: 0% user + 3.7% kernel

01-21 23:58:40.265   975   991 E ActivityManager:     1.8% 865/collector: 0% user + 1.8% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   3.7% 814/ld-linux.so.3: 1.8% user + 1.8% kernel

01-21 23:58:40.265   975   991 E ActivityManager:     1.8% 1009/814Linux hotplu: 1.8% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager:     1.8% 1013/IRQThread_157: 0% user + 1.8% kernel

01-21 23:58:40.265   975   991 E ActivityManager:   0.3% 1254/com.letv.t2.launcher: 0.3% user + 0% kernel / faults: 1 minor

01-21 23:58:40.265   975   991 E ActivityManager:     0.3% 1254/etv.t2.launcher: 0.3% user + 0% kernel

01-21 23:58:40.265   975   991 E ActivityManager: 54% TOTAL: 2.9% user + 5.9% kernel + 45% iowait

我們可以通過ActivityManagerService的appNotResponding方法看一下ANR資訊是如何記錄到logcat日誌中的:

.\framework\base\services\Java\com\Android\server\am\ActivityManagerService.java

    final void appNotResponding(ProcessRecord app, ActivityRecord activity,

            ActivityRecord parent, boolean aboveSystem, final String annotation) {

        ......

        synchronized (this) {

            // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.

            if (mShuttingDown) {

                //重啟時忽略ANR

                Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);

                return;

            } else if (app.notResponding) {

                //之前已處理過則不再處理

                Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);

                return;

            } else if (app.crashing) {

                //已經Crash的程序則不再處理ANR

                Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);

                return;

            }

            //設定ANR標誌,用於防止重複處理

            // In case we come through here for the same app before completing

            // this one, mark as anring now so we will bail out.

            app.notResponding = true;

            // Log the ANR to the event log.

            EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,

                    app.processName, app.info.flags, annotation);

            //將發生ANR的應用pid放入firstPids集合中

            // Dump thread traces as quickly as we can, starting with "interesting" processes.

            firstPids.add(app.pid);

            //將父PID也放入firstPids集合

            int parentPid = app.pid;

            if (parent != null && parent.app != null && parent.app.pid > 0) parentPid = parent.app.pid;

            if (parentPid != app.pid) firstPids.add(parentPid);

            //將主PID放入firstPids集合

            if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);

            //將persistent程序放入firstPids集合

            for (int i = mLruProcesses.size() - 1; i >= 0; i--) {

                ProcessRecord r = mLruProcesses.get(i);

                if (r != null && r.thread != null) {

                    int pid = r.pid;

                    if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {

                        if (r.persistent) {

                            firstPids.add(pid);

                        } else {

                            lastPids.put(pid, Boolean.TRUE);

                        }

                    }

                }

            }

        }

        // 將ANR資訊寫入log

        // Log the ANR to the main log.

        StringBuilder info = new StringBuilder();

        info.setLength(0);

        info.append("ANR in ").append(app.processName);

        if (activity != null && activity.shortComponentName != null) {

            info.append(" (").append(activity.shortComponentName).append(")");

        }

        info.append("\n");

        info.append("PID: ").append(app.pid).append("\n");

        if (annotation != null) {

            info.append("Reason: ").append(annotation).append("\n");

        }

        if (parent != null && parent != activity) {

            info.append("Parent: ").append(parent.shortComponentName).append("\n");

        }

        final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

        File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,

                NATIVE_STACKS_OF_INTEREST);

        String cpuInfo = null;

        if (MONITOR_CPU_USAGE) {

            updateCpuStatsNow();

            synchronized (mProcessCpuThread) {

                cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);

            }

            info.append(processCpuTracker.printCurrentLoad());

            info.append(cpuInfo);

        }

        info.append(processCpuTracker.printCurrentState(anrTime));

        Slog.e(TAG, info.toString());

        ......

    }

AMS在發生ANR時,會按以下格式將ANR資訊記錄到logcat日誌中:

ActivityManager: ANR in [Process Name]([short component name])

ActivityManager: PID: [Application Pid]

ActivityManager: Reason: [Annotation]

ActivityManager: Parent: [short component name of parent]

[Process CPU state]

其中的Reason資訊可以給出ANR產生原因的一些詳細資訊,

·        輸入事件處理引起的ANR,提示資訊格式為:“Inputdispatching timed out [anr reason]

·        Service處理引起的ANR,提示資訊格式為:“Executingservice [Package name]/[Short class name]

·        Broadcast處理引起的ANR,提示資訊格式為:“Broadcast of [Intent focused byBroadcast receiver]

其中的Process CPU state 資訊格式如下:

ActivityManager: Load: [Load1] / [Load5] / [Load15]

ActivityManager: CPU usage from [上次取樣與現在的時間差] ms to [當前取樣與現在的時間差] ms ago/later:

ActivityManager: [總的CPU時間佔用率][PID]/[Process Name][使用者CPU時間佔用率]% user + [系統CPU時間佔用率]% kernel + [IO等待CPU時間佔用率]% iowait + [硬中斷CPU時間佔用率]% irq + [軟中斷CPU時間佔用率]% softirq / faults: [次要頁錯誤/主要頁錯誤]minor/major

ActivityManager: .....

ActivityManager: [CPU時間佔用率合計統計]

其中Load1, Load5, Load15分別為CPU 1分鐘平均任務負載數,5分鐘平均任務負載數,15分鐘平均任務負載數,平均任務負載數和CPU佔用率並沒有必然聯絡,可以作為參考資訊。可以通過分析各程序的CPU時間佔用率,來判斷是否為某些程序長期佔用CPU導致該程序無法獲取到足夠的CPU處理時間,而導致發生ANR。

這裡需要重點關注的是Load1,各程序總的CPU時間佔用率,使用者CPU時間佔用率,系統CPU時間佔用率,以及iowait CPU時間佔用率。

2.2       通過trace檔案分析原因

ActivityManagerService的appNotResponding方法在寫入logcat日誌的同時,還會將ANR時的stack trace資訊寫入到trace檔案,具體程式碼流程如下:

 final void appNotResponding(ProcessRecord app, ActivityRecord activity,

            ActivityRecord parent, boolean aboveSystem, final String annotation) {

 ......

          File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,

                NATIVE_STACKS_OF_INTEREST);

          ......

}

public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids,

            ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {

        //判斷系統屬性,確認trace檔案路徑

        String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);

        if (tracesPath == null || tracesPath.length() == 0) {

            return null;

        }

        //若trace檔案不存在則建立,並設定可讀寫屬性

        File tracesFile = new File(tracesPath);

        try {

            File tracesDir = tracesFile.getParentFile();

            if (!tracesDir.exists()) {

                tracesFile.mkdirs();

                if (!SELinux.restorecon(tracesDir)) {

                    return null;

                }

            }

            FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1);  // drwxrwxr-x

            if (clearTraces && tracesFile.exists()) tracesFile.delete();

            tracesFile.createNewFile();

            FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw-

        } catch (IOException e) {

            Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e);

            return null;

        }

        //執行dump操作

        dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);

        return tracesFile;

    }

    private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids,

            ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {

        //設定file observer

        // Use a FileObserver to detect when traces finish writing.

        // The order of traces is considered important to maintain for legibility.

        FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) {

            @Override

            public synchronized void onEvent(int event, String path) { notify(); }

        };

        try {

            observer.startWatching();

          //向firstPids程序集合傳送關閉訊號,虛擬機器接收到SIGNAL_QUIT訊號後會將程序中各個執行緒的函//數堆疊資訊輸出到traces.txt檔案中

            // First collect all of the stacks of the most important pids.

            if (firstPids != null) {

                try {

                    int num = firstPids.size();

                    for (int i = 0; i < num; i++) {

                        synchronized (observer) {

                            Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);

                            observer.wait(200);  // Wait for write-close, give up after 200msec

                        }

                    }

                } catch (InterruptedException e) {

                    Log.wtf(TAG, e);

                }

            }

            // Next collect the stacks of the native pids

            if (nativeProcs != null) {

                int[] pids = Process.getPidsForCommands(nativeProcs);

                if (pids != null) {

                    for (int pid : pids) {

                        Debug.dumpNativeBacktraceToFile(pid, tracesPath);

                    }

                }

            }

            //進行CPU佔用率取樣,並將佔用率排在前幾位的程序的stack trace資訊dump到檔案

            // Lastly, measure CPU usage.

            if (processCpuTracker != null) {

                processCpuTracker.init();

                System.gc();

                processCpuTracker.update();

                try {

                    synchronized (processCpuTracker) {

                        processCpuTracker.wait(500); // measure over 1/2 second.

                    }

                } catch (InterruptedException e) {

                }

                processCpuTracker.update();

                // We'll take the stack crawls of just the top apps using CPU.

                final int N = processCpuTracker.countWorkingStats();

                int numProcs = 0;

                for (int i=0; i<N && numProcs<5; i++) {

                    ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);

                    if (lastPids.indexOfKey(stats.pid) >= 0) {

                        numProcs++;

                        try {

                            synchronized (observer) {

                                Process.sendSignal(stats.pid, Process.SIGNAL_QUIT);

                                observer.wait(200);  // Wait for write-close, give up after 200msec

                            }

                        } catch (InterruptedException e) {

                            Log.wtf(TAG, e);

                        }

                    }

                }

            }

        } finally {

            observer.stopWatching();

        }

    }

其中firstPids的Pid內容可以參考appNotResponding方法中的實現邏輯。

3       ANR解決方法

因為ANR主要是因為主執行緒由於耗時操作被阻塞而產生的,所以常見的解決方法是不在主執行緒做耗時操作,具體實現時需要注意以下幾點:

·        主執行緒需要做耗時操作時,比如網路訪問、資料庫操作及點陣圖變換等,必須啟動一子執行緒處理,並利用handler來更新UI;

·        子執行緒儘量使用Android提供的API,比如HandlerThread,AsyncTask,AsyncQueryHandler等,這些API都提供了對於執行緒的系統級管理。如果應用直接使用Thread實現的話,則需要對這些子執行緒進行顯式管理,比如執行緒池及執行緒週期的控制,以防止系統資源和記憶體洩漏;

·        Broadcast Receiver中如果有耗時操作,可以放到service中來處理;

·        在後臺子執行緒處理耗時操作時,為了提高使用者體驗,可以在前臺介面顯示某些動畫或者progress bar;