應用與系統穩定性第三篇---FD洩露問題漫談
在正式解釋什麼是fd洩露的時候,先看看三份log,是否有眼熟而不知所措感覺?結合公司同事的深入研究,總結了多種實際案例,才有了這篇文章,以後FD洩露問題在也不慌了。
log 1: Could not read input channel file descriptors from parcel
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: FATAL EXCEPTION: main 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: Process: com.miui.weather2, PID: 20556 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: java.lang.RuntimeException: Could not read input channel file descriptors from parcel. 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.view.InputChannel.nativeReadFromParcel(Native Method) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.view.InputChannel.readFromParcel(InputChannel.java:148) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.view.InputChannel$1.createFromParcel(InputChannel.java:39) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.view.InputChannel$1.createFromParcel(InputChannel.java:37) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at com.android.internal.view.InputBindResult.<init>(InputBindResult.java:68) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:112) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:110) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at com.android.internal.view.IInputMethodManager$Stub$Proxy.startInputOrWindowGainedFocus(IInputMethodManager.java:723) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.view.inputmethod.InputMethodManager.startInputInner(InputMethodManager.java:1295) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.view.inputmethod.InputMethodManager.onPostWindowFocus(InputMethodManager.java:1543) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.view.ViewRootImpl$ViewRootHandler.handleMessage(ViewRootImpl.java:4069) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.os.Handler.dispatchMessage(Handler.java:106) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.os.Looper.loop(Looper.java:171) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at android.app.ActivityThread.main(ActivityThread.java:6642) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at java.lang.reflect.Method.invoke(Native Method) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:518) 06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:873)
log 2:Could not allocate JNI Env
06-22 11:59:30.335 2308 2308 E AndroidRuntime: FATAL EXCEPTION: main 06-22 11:59:30.335 2308 2308 E AndroidRuntime: Process: com.xiaomi.bluetooth, PID: 2308 06-22 11:59:30.335 2308 2308 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.nativeCreate(Native Method) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.start(Thread.java:730) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.c.dk(SynchronizedGattCallback.java:54) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.dk(GattPeripheral.java:97) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eN(GattPeripheral.java:227) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eq(GattPeripheral.java:221) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.z.run(PeripheralConnectionManager.java:462) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.handleCallback(Handler.java:754) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:95) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Looper.loop(Looper.java:160) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:6202) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:874) 06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:764)
log 3:unable to open database file (code 14)
android.database.sqlite.SQLiteCantOpenDatabaseException: unable to open database file (code 14) at android.database.sqlite.SQLiteConnection.nativeExecuteForChangedRowCount(Native Method) at android.database.sqlite.SQLiteConnection.executeForChangedRowCount(SQLiteConnection.java:735) at android.database.sqlite.SQLiteSession.executeForChangedRowCount(SQLiteSession.java:754) at android.database.sqlite.SQLiteStatement.executeUpdateDelete(SQLiteStatement.java:64) at android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:1653) at android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1599) at com.android.providers.telephony.TelephonyProvider.update(TelephonyProvider.java:2704) at android.content.ContentProvider$Transport.update(ContentProvider.java:357) at android.content.ContentResolver.update(ContentResolver.java:1688) at com.android.internal.telephony.SubscriptionController.setCarrierText(SubscriptionController.java:1202) at com.android.internal.telephony.SubscriptionControllerInjectorBase$DatabaseHandler.handleMessage(SubscriptionControllerInjectorBase.java:201) at android.os.Handler.dispatchMessage(Handler.java:106) at android.os.Looper.loop(Looper.java:164) at android.os.HandlerThread.run(HandlerThread.java:65)
相信大多人都覺得這類問題不好解決,更有人覺得這種問題直接加個try catch,結果下個版本接中上報。因為上面的Log基本上都不是案發現場,正式開始需要先補一波FD洩露的基礎知識。在 ofollow,noindex">Android程序系列第一篇---程序基礎 中的2.3小節也有簡單介紹。
一、FD的相關概念
概念:Fd的全稱是File descriptor,在linux OS裡,所有都可以抽象成檔案,比如普通的檔案、目錄、塊裝置、字元裝置、socket、管道等等。當通過一些系統呼叫(如open/socket等),會返回一個fd(就是一個數字)給你,然後根據這個fd對應的檔案進行操作,比如讀、寫。
1、FD從何而來?
我們說fd是一個數字,那麼這個數字是怎麼計算出來的?在核心程序結構體task_struct中為每個程序維護了一個數組,陣列下標就是fd,裡面儲存的是對這個檔案的描述了。裡面就有files指標,維護著所有開啟的檔案資訊:
struct task_struct { ... /* Open file information: */ struct files_struct*files; ... }
/* * Open file table structure */ struct files_struct { /* * read mostly part */ atomic_t count; bool resize_in_progress; wait_queue_head_t resize_wait; struct fdtable __rcu *fdt; struct fdtable fdtab; /* * written part on a separate cache line in SMP */ spinlock_t file_lock ____cacheline_aligned_in_smp; unsigned int next_fd; unsigned long close_on_exec_init[1]; unsigned long open_fds_init[1]; unsigned long full_fds_bits_init[1]; struct file __rcu * fd_array[NR_OPEN_DEFAULT]; };
files_struct中維護一個fdtable,fdtable裡的fd就是一個數組,file結構體就是為開啟檔案的資訊了。
struct fdtable { unsigned int max_fds; struct file __rcu **fd;/* current fd array */ unsigned long *close_on_exec; unsigned long *open_fds; unsigned long *full_fds_bits; struct rcu_head rcu; };
2、閾值
linux預設對每個程序最大能開啟的fd的個數是1024(軟限制是1024,硬限制是4096),你可以通過/proc/$pid/limits檢視Max open files:

fd閾值檢視
當一個程序開啟的檔案數超過這個軟限制值1024將無法再開啟檔案了。所以就報出各種問題,和OOM問題一樣,crash堆疊有可能只是壓死駱駝的最後一根稻草,並不是真實案發現場。所以用完fd後需要close關閉這個fd,那麼這個fd對應的數字就被系統回收了,下一次的open才會被重新利用。
軟限制和硬限制的區別
硬限制是可以在任何時候任何程序中設定 但硬限制只能由超級使用者提起
軟限制是核心實際執行的限制,任何程序都可以將軟限制設定為任意小於等於對程序限制的硬限制的操作fd
如果覺得fd不夠用了,也可以用下面方式調整.
getrlimit(RLIMIT_NOFILE, &rlim); setrlimit(RLIMIT_NOFILE, &rlim); ulimit -n 2048 android.system.Os.getrlimit(OsConstants.RLIMIT_NOFILE);
egg:
void modifyfdlimit() { rlimit fdLimit; fdLimit.rlim_cur = 30000; fdLimit.rlim_max = 30000; if (-1 == setrlimit(RLIMIT_NOFILE, & fdLimit)) { printf("Set max fd open count fai. /nl"); char cmdBuffer[ 64]; sprintf(cmdBuffer, "ulimit -n %d", 30000); if (-1 == system(cmdBuffer)) { printf("%s failed. /n", cmdBuffer); exit(0); } if (-1 == getrlimit(RLIMIT_NOFILE, & fdLimit)){ printf("Ulimit fd number failed."); exit(0); } } }
3、開啟的fd檢視
使用ls -la /proc/$pid/fd檢視
nitrogen:/ # pidofsystem_server 1956 nitrogen:/ # ls -la /proc/1956/f fd/fdinfo/ nitrogen:/ # ls -la /proc/1956/fd total 0 dr-x------ 2 system system0 2018-11-02 10:57 . dr-xr-xr-x 9 system system0 2018-11-02 10:57 .. lrwx------ 1 system system 64 2018-11-02 12:26 0 -> /dev/null lrwx------ 1 system system 64 2018-11-02 12:26 1 -> /dev/null lr-x------ 1 system system 64 2018-11-02 12:26 10 -> /system/framework/QPerformance.jar lrwx------ 1 system system 64 2018-11-02 12:26 100 -> anon_inode:[timerfd] lrwx------ 1 system system 64 2018-11-02 12:26 101 -> anon_inode:[timerfd] lrwx------ 1 system system 64 2018-11-02 12:26 102 -> anon_inode:[timerfd] lrwx------ 1 system system 64 2018-11-02 12:26 103 -> anon_inode:[timerfd] lrwx------ 1 system system 64 2018-11-02 12:26 104 -> anon_inode:[timerfd] lrwx------ 1 system system 64 2018-11-02 12:26 105 -> anon_inode:[eventpoll] lr-x------ 1 system system 64 2018-11-02 12:26 106 -> anon_inode:inotify lr-x------ 1 system system 64 2018-11-02 12:26 107 -> pipe:[36681] l-wx------ 1 system system 64 2018-11-02 12:26 108 -> pipe:[36681] lrwx------ 1 system system 64 2018-11-02 12:26 109 -> anon_inode:[eventfd] lr-x------ 1 system system 64 2018-11-02 12:26 11 -> /system/framework/core-oj.jar lrwx------ 1 system system 64 2018-11-02 12:26 110 -> anon_inode:[eventpoll] lrwx------ 1 system system 64 2018-11-02 12:26 111 -> socket:[35371] lrwx------ 1 system system 64 2018-11-02 12:26 112 -> socket:[35372] lr-x------ 1 system system 64 2018-11-02 12:26 113 -> /system/media/theme/defau
二、Fd Leak案例
現在看幾種FD洩露問題的案例,FD洩露問題的特點是:
- 同一個問題可能出現不同堆疊, 比較隱晦
- Fd洩漏時記憶體可能不會出現不足,就算觸發GC也不一定能夠回收已經建立的檔案控制代碼
日誌關鍵字:
ashmem_create_region failed for ‘indirect ref table’: Too many open files
"Too many open files"
"Could not allocate JNI Env"
"Could not allocate dup blob fd"
"Could not read input channel file descriptors from parcel"
"pthread_create"
"InputChannel is not initialized"
"Could not open input channel pair"
當你看到上面幾種crash的堆疊之後,就需要往fd洩露的方向上去思考了
1、Resource相關
使用輸入輸出流沒有關閉的可能會出問題,FileInputStream,FileOutputStream,FileReader,FileWriter 等,因為每開啟一個檔案需要fd。一些輸入流也提供了基於fd的構造方法
174public FileInputStream(FileDescriptor fdObj) { 175this(fdObj, false /* isFdOwner */); 176} 177
下面是一種洩露案例
frameworks/base/services/core/java/com/android/server/pm/ResmonWhitelistPackage.java 10final class ResmonWhitelistPackage { 11private final File mSystemDir; 12private final File mWhitelistFile; 13 14final ArrayList<String> mPackages = new ArrayList<String>(); 15 16ResmonWhitelistPackage() { 17mSystemDir = new File("/system/", "etc"); 18mWhitelistFile = new File(mSystemDir, "resmonwhitelist.txt"); 19} 20 21void readList() { .... 25try { 26/// M: Clear white list record before update it 27mPackages.clear(); 28BufferedReader br = new BufferedReader(new FileReader(mWhitelistFile)); 29String line = br.readLine(); 30while (line != null) { 31mPackages.add(line); 32line = br.readLine(); 33} 34br.close(); 35} catch (IOException e) { 36//Log.e(PackageManagerService.TAG, "IO Exception happened while reading resmon whitelist"); 37e.printStackTrace(); 38} 39} 40}
br.close並不是在finally語句中,可能會出現未關閉的可能。如果程式碼寫的風騷一點,也有辦法。從 Java 7 build 105 版本開始,Java 7 的編譯器和執行環境支援新的 try-with-resources 語句,稱為 ARM 塊(Automatic Resource Management) ,自動資源管理。
private static void customBufferStreamCopy(File source, File target) { try (InputStream fis = new FileInputStream(source); OutputStream fos = new FileOutputStream(target)){ byte[] buf = new byte[8192]; int i; while ((i = fis.read(buf)) != -1) { fos.write(buf, 0, i); } } catch (Exception e) { e.printStackTrace(); } }
程式碼清晰,且不會發生洩露。
2、HandlerThread相關
使用HandlerThread不小心也會發生fd洩露,看看這個案例
2.1、現象
systemui總是crash,發生問題系統版本Android O
2.2、初步分析
pid: 18465, tid: 32737, name: async_sensor >>> com.android.systemui <<< signal 5 (SIGTRAP), code -32763 (PTRACE_EVENT_STOP), fault addr 0x3e800007fe1 x0 fffffffffffffffc x1 000000735df1ec38 x2 0000000000000010 x3 00000000ffffffff x4 0000000000000000 x5 0000000000000008 x6 0000007428971000 x7 0000000000bb3876 x8 0000000000000016 x9 7fffffffffffffff x10 000000000000000c x11 0000000000000000 x12 000000735df1ed38 x13 000000005b20a831 x14 002cd0c4e58dc31b x15 0000fdf7aa690a91 x16 00000074245e7498 x17 000000742453bd00 x18 0000000000000004 x19 000000735df1f588 x20 000000738727b708 x21 00000000ffffffff x22 000000735df1f588 x23 000000738727b660 x24 0000000000000028 x25 000000000000000c x26 0000000014a000b0 x27 0000007385815300 x28 00000000710507c8 x29 000000735df1ebe0 x30 000000742453bd38 sp 000000735df1ebc0 pc 00000074245866dc pstate 0000000060000000 v0 00000000000000000000000000000000 v1 00000000000000000000000000000001 v2 00000000000000002065766974616e3c v3 00000000000000000000000000000000 v4 00000000000000008020080200000000 v5 00000000000000004000000000000000 v6 00000000000000000000000000000000 v7 00000000000000008020080280200802 v8 00000000000000000000000000000000 v9 00000000000000000000000000000000 v10 00000000000000000000000000000000 v11 00000000000000000000000000000000 v12 00000000000000000000000000000000 v13 00000000000000000000000000000000 v14 00000000000000000000000000000000 v15 00000000000000000000000000000000 v16 40100401401004014010040140100401 v17 a0080000a00a0000a800aa0040404000 v18 80200800000000008020080200000000 v19 000000000000000000000000ebad8083 v20 000000000000000000000000ebad8084 v21 000000000000000000000000ebad8085 v22 000000000000000000000000ebad8086 v23 000000000000000000000000ebad8087 v24 000000000000000000000000ebad8088 v25 000000000000000000000000ebad8089 v26 000000000000000000000000ebad808a v27 000000000000000000000000ebad808b v28 000000000000000000000000ebad808c v29 000000000000000000000000ebad808d v30 000000000000000000000000ebad808e v31 00000000000000000000000041e00000 fpsr 00000013 fpcr 00000000 backtrace: #00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8) #01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52) #02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144) #03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108) #04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44) #05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140) #06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192)
乍看是處理訊息的時候掛了?繼續檢視log發現
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files 06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22 06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed 06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files 06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22 06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed 06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files 06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22 06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed 06-15 22:00:33.922 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files 06-15 22:00:33.922 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22 06-15 22:00:33.922 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed 06-15 22:00:33.922 1000 2155 2335 E OpenGLRenderer: GL error: GL_INVALID_OPERATION 06-15 22:00:33.922 1000 2155 2335 F OpenGLRenderer: glCopyTexSubImage2D error! GL_INVALID_OPERATION (0x502
狀態列open fd超過1024, 看log有很多上面這種log,這個是真實案發現場嗎?
2.3、深入分析
O上發生NE時會將fd資訊列印到tombstone檔案中,看fd資訊確實已經滿了,多為anon_inode:[eventfd]和anon_inode:dmabuf,
backtrace: #00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8) #01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52) #02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144) #03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108) #04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44) #05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140) #06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192) .... fd 556: anon_inode:[eventpoll] fd 557: anon_inode:[eventpoll] fd 558: anon_inode:[eventfd] fd 559: anon_inode:[eventpoll] fd 560: anon_inode:[eventfd] fd 561: anon_inode:[eventpoll] fd 562: anon_inode:[eventfd] fd 563: anon_inode:[eventfd] fd 564: anon_inode:[eventpoll] fd 565: anon_inode:[eventfd] fd 566: anon_inode:[eventpoll] fd 567: /dev/ashmem ..... //省略千行 fd 1022: anon_inode:dmabuf fd 1023: socket:[3549620]
通過trace分析,還有一個關鍵的異常log,看到systemui程序有很多個async_sensor執行緒,為什麼這個執行緒這麼多呢?
pid: 11019, tid: 2301, name: async_sensor >>> com.android.systemui <<< pid: 11019, tid: 2431, name: async_sensor >>> com.android.systemui <<< pid: 11019, tid: 2522, name: async_sensor >>> com.android.systemui <<< pid: 11019, tid: 2542, name: async_sensor >>> com.android.systemui <<< pid: 11019, tid: 2600, name: async_sensor >>> com.android.systemui <<< .....//省略若干 pid: 11019, tid: 5693, name: async_sensor >>> com.android.systemui <<<
搜查程式碼async_sensor是什麼?發現async_sensor是個HanderThread

image.png
看在DozeFactory中,有new AsyncSensorManager 的地方:

image.png
繼續檢視assembleMachine方法在哪裡呼叫的,DozeService中有呼叫 assembleMachine的地方

image.png
回頭在結合log發現,DozeService被頻繁的啟動,看來一步步的接近真相了。這個問題看來是鎖屏同事造成的問題。
isTest=false, canDoze=true, userId=0 06-15 21:48:11.888 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:48:13.337 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:48:24.519 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:48:33.577 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:48:50.640 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:48:56.540 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:49:28.240 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:49:29.207 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:49:30.206 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:49:33.332 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:49:37.435 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:49:50.529 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:50:20.010 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 06-15 21:50:30.148 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0 ......
2.4、修復方案
最終轉給鎖屏同事,將此修復

image.png
所以本問題的RootCase就是頻繁的啟動了DozeService,建立了大量的HandlerThread導致fd洩露,那麼為什麼HandlerThread和fd洩露有關係呢?跟蹤原始碼發現HandlerThread建立會引起Looper的建立,每一個Looper在建立的時候會開啟兩個fd,一個是eventfd,另外一個是mEpolled,這個和tombstone檔案中列印的fd也對上了。

image.png
總結,這種洩露問題如果分析的時候,如果不知道HandlerThread會建立兩個fd的基本知識,那麼這個問題比較難以分析。
2、Thread.start相關
執行緒啟動的時候,可能也會有fd洩露的風險,不過這種錯誤不太容易犯下,如果你真是在一個迴圈中建立1024執行緒,那麼立刻見效,程式死掉。

image.png
trace1
java.lang.OutOfMemoryError: Could not allocate JNI Env at java.lang.Thread.nativeCreate(Native Method) at java.lang.Thread.start(Thread.java:729) at com.android.server.wifi.WifiNative.startHal(WifiNative.java:1639) at com.android.server.wifi.WifiStateMachine.setupDriverForSoftAp(WifiStateMachine.java:3970) at com.android.server.wifi.WifiStateMachine.-wrap9(WifiStateMachine.java) at com.android.server.wifi.WifiStateMachine$InitialState.processMessage(WifiStateMachine.java:4480) at com.android.internal.util.StateMachine$SmHandler.processMsg(StateMachine.java:980) at com.android.internal.util.StateMachine$SmHandler.handleMessage(StateMachine.java:799) at android.os.Handler.dispatchMessage(Handler.java:102) at android.os.Looper.loop(Looper.java:163) at android.os.HandlerThread.run(HandlerThread.java:61)
trace2
java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again at java.lang.Thread.nativeCreate(Native Method) at java.lang.Thread.start(Thread.java:733) at com.tencent.mm.sdk.f.b$a.start(SourceFile:61) at com.tencent.mm.am.a.bU(SourceFile:60) at com.tencent.mm.ui.MMAppMgr$8.tC(SourceFile:315) at com.tencent.mm.sdk.platformtools.am.handleMessage(SourceFile:69) at com.tencent.mm.sdk.platformtools.aj.handleMessage(SourceFile:173) at com.tencent.mm.sdk.platformtools.aj.dispatchMessage(SourceFile:128) at android.os.Looper.loop(Looper.java:176) at android.app.ActivityThread.main(ActivityThread.java:6701) at java.lang.reflect.Method.invoke(Native Method) at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)
3、Inputchannel相關
Inputchannel也會可能出現fd洩露問題,如下:

image.png
3.1在Activity中不斷彈Dialog
public class Main2Activity extends Activity { @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main2); } public void onClick(View view) { for (int i = 0; i < 1024; i++) { AlertDialog.Builder builder = new AlertDialog.Builder(this); builder.setTitle("fd").setIcon(R.drawable.ic_launcher_background).create(); builder.show(); } } }
不過一會,這個App就死了,報出了下面的問題
11-02 17:38:22.263 9351-9351/com.example.wangjing.rebootdemo E/AndroidRuntime: FATAL EXCEPTION: main Process: com.example.wangjing.rebootdemo, PID: 9351 java.lang.IllegalStateException: Could not execute method for android:onClick at android.view.View$DeclaredOnClickListener.onClick(View.java:5391) at android.view.View.performClick(View.java:6311) at android.view.View$PerformClick.run(View.java:24833) at android.os.Handler.handleCallback(Handler.java:794) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:173) at android.app.ActivityThread.main(ActivityThread.java:6653) at java.lang.reflect.Method.invoke(Native Method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821) Caused by: java.lang.reflect.InvocationTargetException at java.lang.reflect.Method.invoke(Native Method) at android.view.View$DeclaredOnClickListener.onClick(View.java:5386) at android.view.View.performClick(View.java:6311) at android.view.View$PerformClick.run(View.java:24833) at android.os.Handler.handleCallback(Handler.java:794) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:173) at android.app.ActivityThread.main(ActivityThread.java:6653) at java.lang.reflect.Method.invoke(Native Method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821) Caused by: java.lang.RuntimeException: Could not read input channel file descriptors from parcel. at android.view.InputChannel.nativeReadFromParcel(Native Method) at android.view.InputChannel.readFromParcel(InputChannel.java:148) at android.view.IWindowSession$Stub$Proxy.addToDisplay(IWindowSession.java:804) at android.view.ViewRootImpl.setView(ViewRootImpl.java:770) at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:356) at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:94) at android.app.Dialog.show(Dialog.java:330) at android.app.AlertDialog$Builder.show(AlertDialog.java:1114) at com.example.wangjing.rebootdemo.Main2Activity.onClick(Main2Activity.java:28) at java.lang.reflect.Method.invoke(Native Method) at android.view.View$DeclaredOnClickListener.onClick(View.java:5386) at android.view.View.performClick(View.java:6311) at android.view.View$PerformClick.run(View.java:24833) at android.os.Handler.handleCallback(Handler.java:794) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:173) at android.app.ActivityThread.main(ActivityThread.java:6653) at java.lang.reflect.Method.invoke(Native Method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
檢視一下這個程序的fd資訊,很多的socket,正常情況下,這些東西是沒有的
jason:/ # ps -ef |grep "wangjing" u0_a161131349462 3 17:43:12 ?00:00:04 com.example.wangjing.rebootdemo root13385 13380 3 17:45:26 pts/1 00:00:00 grep wangjing jason:/ # ls -laproc/13134/fd/ total 0 dr-x------ 2 u0_a161 u0_a1610 2018-11-02 17:43 . dr-xr-xr-x 9 u0_a161 u0_a1610 2018-11-02 17:43 .. lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 0 -> /dev/null lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 1 -> /dev/null lr-x------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 10 -> /system/framework/com.nxp.nfc.nq.jar lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 100 -> socket:[17760179] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 101 -> socket:[17753761] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 102 -> socket:[17773237] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 103 -> socket:[17760182] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 104 -> socket:[17760184] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 105 -> socket:[17773239] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 106 -> socket:[17776657] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 107 -> socket:[17774959] lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 108 -> socket:[17776659] ......
5、Bitmap相關
bitmap也是需要fd的,如下圖,沒有關閉,可能引發fd洩露的可能。

image.png
trace1
java.lang.RuntimeException: Could not allocate dup blob fd. at android.graphics.Bitmap.nativeCreateFromParcel(Native Method) at android.graphics.Bitmap.access$100(Bitmap.java:36) at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1528) at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1520) at android.widget.RemoteViews$BitmapCache.<init>(RemoteViews.java:954) at android.widget.RemoteViews.<init>(RemoteViews.java:1820) at android.widget.RemoteViews.<init>(RemoteViews.java:1812) at android.widget.RemoteViews.clone(RemoteViews.java:1905) at android.app.Notification.cloneInto(Notification.java:1534) at android.app.Notification.clone(Notification.java:1508) at android.service.notification.StatusBarNotification.clone(StatusBarNotification.java:161) at com.android.server.notification.NotificationManagerService$NotificationListeners.notifyPostedLocked(NotificationManagerService.java:3557) at com.android.server.notification.NotificationManagerService$8.run(NotificationManagerService.java:2337) at android.os.Handler.handleCallback(Handler.java:815) at android.os.Handler.dispatchMessage(Handler.java:104) at android.os.Looper.loop(Looper.java:207) at com.android.server.SystemServer.run(SystemServer.java:410) at com.android.server.SystemServer.main(SystemServer.java:255) at java.lang.reflect.Method.invoke(Native Method) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:933) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:782)
三、總結
通過上面的五個案例,總結常見的fd洩露的情景,一般出現下面的log,就需要懷疑是否有fd洩露的情況。
"Too many open files""Could not allocate JNI Env""Could not allocate dup blob fd""Could not read input channel file descriptors from parcel""pthread_create""InputChannel is not initialized""Could not open input channel pair"
3.1、容易復現
1.檢視fd資訊adb shell ls -a -l /proc/<pid>/fd ,lsof
2.檢視程序執行緒資訊:ps -t <pid>,或者抓程序trace, kill -3 <pid>
3.抓取hprof定位資源使用情況
3.2、難復現
1.對於應用自身fd洩漏發生JE時可以在複寫UncatchHandlerException在應用crash的時候通過readlink的方式讀取/proc/self/fd的資訊,在後面發生的時候可以以獲取fd資訊
2.O之後NE的Tombstone檔案中有open files,可以檢視開啟的fd資訊
3.抓取程序的ps資訊或者trace資訊
4.如果是inputchannel型別的,有可能是視窗型別的,因此可以檢視window情況,dumpsys window