python – Keras uses too much GPU memory when calling train_on_batch, fit, etc.
I have been playing around with Keras and like it so far. There is one big issue I have run into when working with fairly deep networks: when calling model.train_on_batch, or model.fit etc., Keras allocates significantly more GPU memory than the model itself should need. This is not caused by trying to train on some really big images; it is the network model itself that seems to require a lot of GPU memory. I have created this toy example to show what I mean. Here is essentially what is going on:
I first create a fairly deep network, and use model.summary() to get the total number of parameters needed for the network (in this case 206,538,153, corresponding to about 826 MB). I then use nvidia-smi to see how much GPU memory Keras has allocated, and I can see that it makes perfect sense (849 MB).
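As a sanity check on those numbers (my own arithmetic, assuming 4-byte float32 parameters):

```python
# Back-of-envelope check of the figures above: 206,538,153 parameters
# stored as float32 at 4 bytes each.
n_params = 206538153
bytes_total = n_params * 4
print("%.1f MB" % (bytes_total / 1e6))         # ~826.2 MB, the figure quoted above
print("%.1f MiB" % (bytes_total / 1024.0**2))  # ~787.9 MiB, the unit nvidia-smi reports in
```

The ~61 MiB gap between that and the 849 MiB nvidia-smi reports would plausibly be the CUDA context plus allocator overhead, though that part is my guess.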
I then compile the network, and can confirm that this does not increase the GPU memory usage. As we can see in this case, I have almost 1 GB of VRAM available at this point.
Then I try to feed the network a simple 16×16 image with a 1×1 ground truth, and then everything blows up, because Keras starts allocating lots of memory again, for no reason that is obvious to me. Something about training the network seems to require a lot more memory than just having the model, which does not make sense to me. I have trained vastly deeper networks on this GPU in other frameworks, so that makes me think I am using Keras wrong (or there is something wrong with my setup, or with Keras itself, but of course that is hard to know for sure).
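A rough lower-bound estimate I tried for the training-time footprint (my own assumption: float32 weights plus one same-sized gradient buffer per weight for backprop; activations, optimizer state, and cuDNN workspace would come on top of this):

```python
# Lower-bound estimate of training memory: weights plus one gradient
# buffer per weight. Everything else (activations, workspace) is extra.
n_params = 206538153
weights_mib = n_params * 4 / 1024.0**2  # ~788 MiB of float32 weights
gradients_mib = weights_mib             # one gradient per weight for backprop
print("%.0f MiB" % (weights_mib + gradients_mib))  # ~1576 MiB
```

With 2044 MiB on the card and ~220 MiB already taken by X and compiz, even this lower bound almost fills the GPU, which would be consistent with the crash, if this accounting is right.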
Here is the code:
from scipy import misc
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Convolution2D, MaxPooling2D, Reshape, Flatten, ZeroPadding2D, Dropout
import os

model = Sequential()
model.add(Convolution2D(256, 3, 3, border_mode='same', input_shape=(16,16,1)))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Convolution2D(512, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
# 22 convolution layers with 1024 filters each
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Convolution2D(256, 3, 3, border_mode='same'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(4))
model.add(Dense(1))

model.summary()
os.system("nvidia-smi")
raw_input("Press Enter to continue...")

model.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
os.system("nvidia-smi")
raw_input("Compiled model. Press Enter to continue...")

n_batches = 1
batch_size = 1
for ibatch in range(n_batches):
    x = np.random.rand(batch_size, 16,16,1)
    y = np.random.rand(batch_size, 1)
    os.system("nvidia-smi")
    raw_input("About to train one iteration. Press Enter to continue...")
    model.train_on_batch(x, y)
    print("Trained one iteration")
Which gives me the following output:
Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN 5103)
/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 16, 16, 256)   2560        convolution2d_input_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 8, 8, 256)     0           convolution2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 8, 8, 512)     1180160     maxpooling2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 4, 4, 512)     0           convolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 4, 4, 1024)    4719616     maxpooling2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 4, 4, 1024)    9438208     convolution2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D)  (None, 4, 4, 1024)    9438208     convolution2d_4[0][0]
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D)  (None, 4, 4, 1024)    9438208     convolution2d_5[0][0]
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D)  (None, 4, 4, 1024)    9438208     convolution2d_6[0][0]
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D)  (None, 4, 4, 1024)    9438208     convolution2d_7[0][0]
____________________________________________________________________________________________________
convolution2d_9 (Convolution2D)  (None, 4, 4, 1024)    9438208     convolution2d_8[0][0]
____________________________________________________________________________________________________
convolution2d_10 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_9[0][0]
____________________________________________________________________________________________________
convolution2d_11 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_10[0][0]
____________________________________________________________________________________________________
convolution2d_12 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_11[0][0]
____________________________________________________________________________________________________
convolution2d_13 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_12[0][0]
____________________________________________________________________________________________________
convolution2d_14 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_13[0][0]
____________________________________________________________________________________________________
convolution2d_15 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_14[0][0]
____________________________________________________________________________________________________
convolution2d_16 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_15[0][0]
____________________________________________________________________________________________________
convolution2d_17 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_16[0][0]
____________________________________________________________________________________________________
convolution2d_18 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_17[0][0]
____________________________________________________________________________________________________
convolution2d_19 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_18[0][0]
____________________________________________________________________________________________________
convolution2d_20 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_19[0][0]
____________________________________________________________________________________________________
convolution2d_21 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_20[0][0]
____________________________________________________________________________________________________
convolution2d_22 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_21[0][0]
____________________________________________________________________________________________________
convolution2d_23 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_22[0][0]
____________________________________________________________________________________________________
convolution2d_24 (Convolution2D) (None, 4, 4, 1024)    9438208     convolution2d_23[0][0]
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D)    (None, 2, 2, 1024)    0           convolution2d_24[0][0]
____________________________________________________________________________________________________
convolution2d_25 (Convolution2D) (None, 2, 2, 256)     2359552     maxpooling2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_26 (Convolution2D) (None, 2, 2, 32)      73760       convolution2d_25[0][0]
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D)    (None, 1, 1, 32)      0           convolution2d_26[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 32)             0           maxpooling2d_4[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 4)               132         flatten_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1)               5           dense_1[0][0]
====================================================================================================
Total params: 206538153
____________________________________________________________________________________________________
None

Thu Oct  6 09:05:42 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:01:00.0      On |                  N/A |
| 30%   37C    P2    28W / 120W |   1082MiB /  2044MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1796    G   /usr/bin/X                                     155MiB |
|    0      2597    G   compiz                                          65MiB |
|    0      5966    C   python                                         849MiB |
+-----------------------------------------------------------------------------+
Press Enter to continue...

Thu Oct  6 09:05:44 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:01:00.0      On |                  N/A |
| 30%   38C    P2    28W / 120W |   1082MiB /  2044MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1796    G   /usr/bin/X                                     155MiB |
|    0      2597    G   compiz                                          65MiB |
|    0      5966    C   python                                         849MiB |
+-----------------------------------------------------------------------------+
Compiled model. Press Enter to continue...

Thu Oct  6 09:05:44 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:01:00.0      On |                  N/A |
| 30%   38C    P2    28W / 120W |   1082MiB /  2044MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1796    G   /usr/bin/X                                     155MiB |
|    0      2597    G   compiz                                          65MiB |
|    0      5966    C   python                                         849MiB |
+-----------------------------------------------------------------------------+
About to train one iteration. Press Enter to continue...

Error allocating 37748736 bytes of device memory (out of memory). Driver report 34205696 bytes free and 2144010240 bytes total
Traceback (most recent call last):
  File "memtest.py", line 65, in <module>
    model.train_on_batch(x, y)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 712, in train_on_batch
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1221, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 717, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: Error allocating 37748736 bytes of device memory (out of memory).
Apply node that caused the error: GpuContiguous(GpuDimShuffle{3,2,0,1}.0)
Toposort index: 338
Inputs types: [CudaNdarrayType(float32, 4D)]
Inputs shapes: [(1024, 1024, 3, 3)]
Inputs strides: [(1, 1024, 3145728, 1048576)]
Inputs values: ['not shown']
Outputs clients: [[GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='half', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), GpuDnnConvGradI{algo='none', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='half', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
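One small observation about the traceback (my arithmetic, not something the error message states): the 37748736 bytes that failed to allocate are exactly one 1024×1024×3×3 float32 tensor, matching the (1024, 1024, 3, 3) input shape shown above:

```python
# The failed allocation is exactly a contiguous copy of a single
# 1024-in/1024-out 3x3 conv kernel in float32.
print(1024 * 1024 * 3 * 3 * 4)  # 37748736
```

So the GpuContiguous node is apparently making a transposed copy of one conv layer's kernel, and the driver reports only 34205696 bytes free, just short of that. In other words the card was already essentially full before this final, relatively small allocation.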
A few things to note:
> I have tried both the Theano and TensorFlow backends. Both have the same problem, and run out of memory at the same line. With TensorFlow, Keras seems to preallocate a lot of memory up front (about 1.5 GB), so nvidia-smi doesn't help us track what is going on there, but I get the same out-of-memory exception. Again, this points towards an error in (my usage of) Keras (although it is hard to be certain about such things; it could also be something with my setup).
> I tried using CNMeM in Theano, which behaves like TensorFlow: it preallocates a lot of memory (about 1.5 GB), but it crashes in the same place.
> There are some warnings about the cuDNN version. I tried running the Theano backend with CUDA but without cuDNN and got the same errors, so that is not the source of the problem.
> If you want to test this on your own GPU, you may need to make the network deeper/shallower depending on how much GPU memory you have available.
> My configuration: Ubuntu 14.04, GeForce GTX 960, CUDA 7.5.18, cuDNN 5.1.3, Python 2.7, Keras 1.1.0 (installed via pip).
> I have tried compiling the model with different optimizers and losses, but that doesn't seem to change anything.
> I have tried replacing the train_on_batch call with fit, but it has the same problem.
> I saw one similar question on StackOverflow (Why does this Keras model require over 6GB of memory?), but as far as I can tell, I don't have those issues in my configuration. I have never had multiple versions of CUDA installed, and I have double-checked my PATH, LD_LIBRARY_PATH and CUDA_ROOT variables more times than I can count.
> Julius suggested that the activation parameters themselves take up GPU memory. If this is true, can somebody explain it more clearly? I have tried changing the activation functions of my convolution layers to functions that are clearly hard-coded, with no learnable parameters as far as I can tell, and that doesn't change anything. Also, it seems unlikely that these parameters would take up almost as much memory as the rest of the network itself.
> After thorough testing, the largest network I can train has about 453 MB of parameters, out of my ~2 GB of GPU RAM. Is this normal?
> After testing Keras on some smaller CNNs that do fit on my GPU, I can see that there are very sudden spikes in GPU RAM usage. If I run a network with about 100 MB of parameters, 99% of the time during training it uses less than 200 MB of GPU RAM. But every once in a while, memory usage spikes to about 1.3 GB. It seems safe to assume that these spikes are what is causing my problem. I have never seen such spikes in other frameworks, but they might be there for a good reason? If anybody knows what causes them, and whether there is a way to avoid them, please chime in!