TensorFlow out of memory: Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

阿新 • Published: 2018-11-25

tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 536870912 bytes on host: CUDA_ER

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

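After a lot of searching, I learned that this error is caused by excessive memory use. I first set batch_size to 5, and the error above appeared after 137 iterations. Following advice found online, I changed batch_size to 2, and training stopped after a little over 300 iterations. Finally I simply set it to 1, and only then did the error stop appearing. This is odd: a batch_size of 1 and a batch_size of 2 should differ very little, so batch_size is probably not the root cause, and the problem was most likely solved by accident.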
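For reference, exit code 137 follows the common shell convention of 128 plus the signal number, so 137 means the process was killed by signal 9 (SIGKILL); on Linux this is what the kernel's out-of-memory killer sends. A minimal sketch of that decoding, assuming a POSIX system (the helper name describe_exit_code is made up for illustration):

```python
import signal


def describe_exit_code(code):
    """Return a short description of a POSIX-style process exit code."""
    if code > 128:
        # By shell convention, 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"


print(describe_exit_code(137))
```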

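Searching further, I found the cause: the loss, or the network's output, keeps being accumulated, so the computation graph keeps growing. The solution: when the loss is needed inside the training loop, use loss.data[0]. (Note that loss.data[0] is PyTorch syntax, written loss.item() since PyTorch 0.4. The same idea applies in TensorFlow: keep only the fetched numeric value, and do not create new graph operations inside the loop.)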
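One way to tell a steady accumulation apart from a single oversized allocation is to log the process's peak resident memory every few iterations: if the number keeps climbing, something is being retained across steps. A minimal sketch, assuming Linux (where ru_maxrss is reported in KiB); the loop body is only a placeholder, not the original training code, and the name peak_rss_mib is made up for illustration:

```python
import resource


def peak_rss_mib():
    """Peak resident set size of this process in MiB (assumes Linux: ru_maxrss is in KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


history = []  # stand-in for per-step values a training loop might keep
for step in range(1, 301):
    # Store a plain Python float, never a tensor or graph node,
    # so that nothing from the computation graph stays referenced.
    history.append(float(step))
    if step % 100 == 0:
        print(f"step {step}: peak RSS {peak_rss_mib():.1f} MiB")
```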

I had just finished setting up the server today when my program refused to run, which gave me quite a scare.
Error type: CUDA_ERROR_OUT_OF_MEMORY

E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 17179869184
Killed

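The rough meaning is easy to understand: the server's GPU has M of memory,
but TensorFlow can only obtain N (N < M).
In other words, TensorFlow reports that it cannot get all of the resources it asked for, and then gives up.
Note that the log lines above specifically report a failed allocation of 17179869184 bytes (16 GiB) of pinned host memory.
Solution:
Find where the Session is created in the code
and add the following before it:
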
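import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto(allow_soft_placement=True)
# Use at most 70% of each visible GPU's memory
config.gpu_options.per_process_gpu_memory_fraction = 0.7
# Do not hand TensorFlow all GPU memory up front; grow the allocation on demand
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
With that, the problem went away.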
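The snippet above uses the TensorFlow 1.x Session API. In TensorFlow 2.x, a roughly equivalent on-demand setting can be sketched as follows, assuming TensorFlow 2.1 or later is installed. The helper name enable_gpu_memory_growth is made up for illustration, and the import guard is only there so that the sketch also runs on machines without TensorFlow:

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow 2.x to grow GPU memory on demand.

    Returns the number of GPUs configured (0 if TensorFlow is not installed).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is first used,
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


print(enable_gpu_memory_growth())
```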

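TensorFlow is, in fact, a rather greedy tool.
Even if you specify a GPU with device_id, it still occupies memory on the other GPUs. Before running the program, you have to
run export CUDA_VISIBLE_DEVICES=n (where n is the index of the GPU to make visible)
and only then run python your_script.py, so that it does not occupy the other GPUs.
I only recently started using TensorFlow; before that I always used Caffe.
This week I was reported three days in a row by people in the lab for hogging server resources, which was exhausting. The method above is all it takes:
run export CUDA_VISIBLE_DEVICES=n before running the code,
so that only one GPU (or a chosen few) is visible and the rest are hidden.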
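The same restriction can also be applied from inside the Python script, as long as the variable is set before TensorFlow (or any other CUDA library) is imported. A minimal sketch; the helper name restrict_visible_gpus is made up for illustration, and GPU index 0 is just an example value:

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Set CUDA_VISIBLE_DEVICES so that only the given GPU indices are visible."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before TensorFlow is imported, because the CUDA runtime
# reads this variable when it initializes.
restrict_visible_gpus([0])
# import tensorflow as tf  # import only after the variable is set
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Setting the variable in the shell, as described above, has the same effect and does not depend on import order.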
---------------------
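Author: 無奈的小心酸
Source: CSDN
Original: https://blog.csdn.net/wangkun1340378/article/details/72782593
Copyright notice: This is the blogger's original article; please include a link to the original post when reposting.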
