SLI causes TensorFlow to occupy both GPUs simultaneously (on Windows)
I have recently been learning TensorFlow and been driven half-mad by problems that are not actually bugs, so I am writing up the fixes here. I use TensorFlow on Windows 10; Ubuntu users can skip this, as these problems do not occur there. (I have reorganized parts of this article and posted them on the 相約機器人 public account; see the link there.)
As is well known, when TensorFlow runs it grabs the memory of every GPU it detects. Opinions on that behavior differ; either way, how do you restrict it to specific cards? The only real method is to use CUDA itself to hide certain GPUs (short of physically unplugging the extra cards, which nobody would seriously do). The following approaches, found in some textbooks and online tutorials, treat the symptom but not the cause:
(1) Using a with tf.device() statement
For example:
with tf.device("/gpu:1"):
This only specifies which GPU the enclosed code runs on; the process itself still grabs memory on every GPU (believe it or not).
(2) Using allow_growth=True or per_process_gpu_memory_fraction
For example:
import tensorflow as tf

g = tf.placeholder(tf.int16)
h = tf.placeholder(tf.int16)
mul = tf.multiply(g,h)

gpu_options = tf.GPUOptions(allow_growth = True)
#gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = 0.7)
config = tf.ConfigProto(log_device_placement = True,allow_soft_placement = True,gpu_options = gpu_options)

with tf.Session(config=config) as sess:
    print("相乘:%d" % sess.run(mul, feed_dict = {g:3,h:4}))
The former lets the process grow its GPU memory usage gradually as needed, but it still occupies every GPU, as shown below:
The first screenshot was taken before the run, the second after. Both GPUs end up occupied, even though only GPU 0 actually executed the code:
C:\Users\B622>python
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>>
>>> g = tf.placeholder(tf.int16)
>>> h = tf.placeholder(tf.int16)
>>> mul = tf.multiply(g,h)
>>>
>>> gpu_options = tf.GPUOptions(allow_growth = True)
>>> #gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = 0.7)
... config = tf.ConfigProto(log_device_placement = True,allow_soft_placement = True,gpu_options = gpu_options)
>>> with tf.Session(config=config) as sess:
...     print("相乘:%d" % sess.run(mul, feed_dict = {g:3,h:4}))
...
2018-08-21 07:00:01.651592: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-08-21 07:00:01.927932: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.721
pciBusID: 0000:17:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-08-21 07:00:02.025456: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.721
pciBusID: 0000:65:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-08-21 07:00:02.030441: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2018-08-21 07:00:03.036953: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-21 07:00:03.040347: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0 1
2018-08-21 07:00:03.042564: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N N
2018-08-21 07:00:03.044994: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 1:   N N
2018-08-21 07:00:03.047419: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8806 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2018-08-21 07:00:03.054450: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8806 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1
2018-08-21 07:00:03.064623: I T:\src\github\tensorflow\tensorflow\core\common_runtime\direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1
Mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-21 07:00:03.074668: I T:\src\github\tensorflow\tensorflow\core\common_runtime\placer.cc:886] Mul: (Mul)/job:localhost/replica:0/task:0/device:GPU:0
Placeholder_1: (Placeholder): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-21 07:00:03.078028: I T:\src\github\tensorflow\tensorflow\core\common_runtime\placer.cc:886] Placeholder_1: (Placeholder)/job:localhost/replica:0/task:0/device:GPU:0
Placeholder: (Placeholder): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-21 07:00:03.081462: I T:\src\github\tensorflow\tensorflow\core\common_runtime\placer.cc:886] Placeholder: (Placeholder)/job:localhost/replica:0/task:0/device:GPU:0
相乘:12
The latter, which fixes per_process_gpu_memory_fraction to a set value, simply grabs that fraction of memory on each GPU uniformly. It still occupies all GPUs, as shown below:
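For a concrete sense of what a fixed fraction claims, here is a quick back-of-the-envelope calculation (a sketch only; `claimed_memory_gb` is a hypothetical helper, and the 11 GiB figure comes from the two 1080 Ti cards in the log above):

```python
def claimed_memory_gb(total_gb_per_gpu, fraction, num_visible_gpus):
    """Approximate GiB reserved per GPU and in total.

    The fraction applies to EACH visible GPU, not to the pool as a
    whole, so with two visible cards the total footprint doubles.
    """
    per_gpu = total_gb_per_gpu * fraction
    return per_gpu, per_gpu * num_visible_gpus

per_gpu, total = claimed_memory_gb(11.0, 0.7, 2)
print("per GPU: %.1f GiB, total across GPUs: %.1f GiB" % (per_gpu, total))
```

So a fraction of 0.7 on two 11 GiB cards pins roughly 7.7 GiB on each of them, even if only one card does any work.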
The correct approach is to use CUDA itself to hide certain GPUs. There are two ways:
(1) Directly in code, via Python:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
(2) Directly in the terminal.
On Windows (replace test.py with your own .py file):
set CUDA_VISIBLE_DEVICES=1
python test.py
On Linux:
CUDA_VISIBLE_DEVICES=1 python test.py
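The same per-process masking can also be done portably from Python by launching the script with a modified environment; this is a stdlib-only sketch (the child command here just echoes the variable back for demonstration, standing in for `python test.py`):

```python
import os
import subprocess
import sys

# Copy the current environment and mask all GPUs except physical GPU 1.
# Setting the variable in the child's environment, before the child
# imports TensorFlow, is equivalent to `set`/`export` in a shell.
env = dict(os.environ)
env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
env["CUDA_VISIBLE_DEVICES"] = "1"

# Stand-in for `python test.py`: the child just prints what it sees.
result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # → 1
```

The advantage over editing the script is that the same script can be pointed at different cards from different launch commands without any code change.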
However, if your code contains statements such as with tf.device(), a careless index can still trigger errors. Why? Recall how CUDA_VISIBLE_DEVICES works:
CUDA_VISIBLE_DEVICES=1 Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1 Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1" Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3 Devices 0, 2, 3 will be visible; device 1 is masked
CUDA_VISIBLE_DEVICES="" No GPU will be visible
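The rules in the table above can be captured in a few lines of plain Python; `visible_gpus` here is a hypothetical helper that mimics how the CUDA runtime interprets the variable (it does not talk to the driver):

```python
def visible_gpus(cuda_visible_devices):
    """Return the physical GPU indices the CUDA runtime would expose,
    in the order it would expose them. Quotation marks around the
    value are optional; an empty string hides every GPU."""
    value = cuda_visible_devices.strip().strip('"')
    if not value:
        return []  # no GPU will be visible
    return [int(tok) for tok in value.split(",")]

print(visible_gpus("1"))      # only physical GPU 1
print(visible_gpus("0,2,3"))  # physical GPU 1 is masked
print(visible_gpus('"0,1"'))  # quotes are optional
print(visible_gpus(""))       # no GPU visible
```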
For example, the following code raises an error when run:
import tensorflow as tf
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
with tf.device("/gpu:1"):
    g = tf.placeholder(tf.int16)
    h = tf.placeholder(tf.int16)
    mul = tf.multiply(g,h)
gpu_options = tf.GPUOptions(allow_growth = True)
config = tf.ConfigProto(log_device_placement = True,gpu_options = gpu_options)
#config = tf.ConfigProto(log_device_placement = True,allow_soft_placement = True)
with tf.Session(config=config) as sess:
    print("相乘:%d" % sess.run(mul, feed_dict = {g:3,h:4}))
The reason: once os.environ["CUDA_VISIBLE_DEVICES"] = "1" is set, using with tf.device("/gpu:1"): fails (with tf.device("/gpu:0"): is correct). The program reports that there is no GPU 1, only CPU 0 and GPU 0, as shown below. After CUDA_VISIBLE_DEVICES is applied, CUDA renumbers the visible GPUs from 0 in the order you listed them. Only one GPU was made visible here, so only index 0 exists, and any higher index is out of range. Physically, the card being used is still GPU 1 on the PCI bus, but to the program its index is 0, not 1:
InvalidArgumentError: Cannot assign a device for operation 'Mul': Operation was explicitly assigned to /device:GPU:1 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: Mul = Mul[T=DT_INT16, _device="/device:GPU:1"](Placeholder, Placeholder_1)]]
Caused by op 'Mul', defined at:
File "E:\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 269, in <module>
main()
File "E:\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 265, in main
kernel.start()
File "E:\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 486, in start
self.io_loop.start()
File "E:\Anaconda3\lib\site-packages\tornado\platform\asyncio.py", line 127, in start
self.asyncio_loop.run_forever()
File "E:\Anaconda3\lib\asyncio\base_events.py", line 422, in run_forever
self._run_once()
File "E:\Anaconda3\lib\asyncio\base_events.py", line 1432, in _run_once
handle._run()
File "E:\Anaconda3\lib\asyncio\events.py", line 145, in _run
self._callback(*self._args)
File "E:\Anaconda3\lib\site-packages\tornado\platform\asyncio.py", line 117, in _handle_events
handler_func(fileobj, events)
File "E:\Anaconda3\lib\site-packages\tornado\stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "E:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 450, in _handle_events
self._handle_recv()
File "E:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "E:\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 432, in _run_callback
callback(*args, **kwargs)
File "E:\Anaconda3\lib\site-packages\tornado\stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "E:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "E:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "E:\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "E:\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "E:\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "E:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2662, in run_cell
raw_cell, store_history, silent, shell_futures)
File "E:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2785, in _run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "E:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2909, in run_ast_nodes
if self.run_code(code, result):
File "E:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-f92d6fb2b710>", line 1, in <module>
runfile('C:/Users/B622/.spyder-py3/temp.py', wdir='C:/Users/B622/.spyder-py3')
File "E:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "E:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/B622/.spyder-py3/temp.py", line 22, in <module>
add2 = tf.multiply(g,h)
File "E:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 337, in multiply
return gen_math_ops.mul(x, y, name)
File "E:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5066, in mul
"Mul", x=x, y=y, name=name)
File "E:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "E:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
op_def=op_def)
File "E:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Mul': Operation was explicitly assigned to /device:GPU:1 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: Mul = Mul[T=DT_INT16, _device="/device:GPU:1"](Placeholder, Placeholder_1)]]
By the same logic, if you set os.environ["CUDA_VISIBLE_DEVICES"] = "3,0,1" (assuming you have four GPUs), then physical GPU 3 becomes GPU 0 to the program, physical GPU 0 becomes GPU 1, physical GPU 1 becomes GPU 2, and physical GPU 2 is invisible (hidden).
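The renumbering can be made explicit with a tiny helper (hypothetical, for illustration only): given the mask string, it returns the logical index the program will see for each still-visible physical GPU:

```python
def logical_index(cuda_visible_devices):
    """Map physical GPU id -> logical id after CUDA_VISIBLE_DEVICES
    is applied. Physical GPUs absent from the mask are hidden and
    therefore have no logical index at all."""
    order = [int(tok) for tok in cuda_visible_devices.split(",")]
    return {phys: logical for logical, phys in enumerate(order)}

mapping = logical_index("3,0,1")
print(mapping)  # {3: 0, 0: 1, 1: 2} -- physical GPU 2 is hidden
```

Checking this mapping before writing any tf.device("/gpu:N") line is a cheap way to avoid the out-of-range error shown above.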
To guard against careless indexing, you can of course set allow_soft_placement = True in tf.ConfigProto (meaning that when the specified device does not exist, TensorFlow is allowed to pick one automatically). But that contradicts the whole point of pinning certain code to a particular GPU, so think carefully about the current GPU index whenever you write tf.device.
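The effect of allow_soft_placement can be sketched as a simple fallback rule (a stdlib simulation for intuition, not TensorFlow's actual placement algorithm):

```python
def place(requested, available, allow_soft_placement):
    """Return the device an op lands on. With soft placement the op
    silently falls back to the first available device; without it,
    an unknown device is an error (TensorFlow raises
    InvalidArgumentError in that situation)."""
    if requested in available:
        return requested
    if allow_soft_placement:
        return available[0]
    raise ValueError("Cannot assign a device for operation: " + requested)

# What CUDA_VISIBLE_DEVICES="1" leaves behind, per the error above.
devices = ["/device:CPU:0", "/device:GPU:0"]
print(place("/device:GPU:0", devices, allow_soft_placement=False))
print(place("/device:GPU:1", devices, allow_soft_placement=True))
```

Note how the second call succeeds but lands somewhere you did not ask for, which is exactly why soft placement undermines deliberate GPU pinning.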
Beyond all this, there is one more nasty pitfall on Windows: if your two GPUs are linked in SLI (i.e., connected with an SLI bridge), then no matter how you set os.environ["CUDA_VISIBLE_DEVICES"] = "1" or issue the equivalent terminal command, TensorFlow still occupies the memory of every GPU, even though only the specified GPU is actually visible.
So the hidden GPU is unusable, yet its memory is still taken: the worst of both worlds. Absurd as it is, this problem cost me a whole night plus a morning of debugging, and no amount of searching turned up an answer. I tried removing the SLI bridge (pictured below):
But with the bridge removed, Windows no longer detected either GPU, as shown below (both cards showed a warning icon in Device Manager, and running nvidia-smi in the terminal reported an error, i.e., no GPU exists):
Reinstalling the bridge brought them back, which was thoroughly maddening. After a lot of fruitless fiddling, I first suspected a broken driver and reinstalled it countless times, but the warning icons remained; I could have cried. (Note: this problem does not occur under Ubuntu.)
In the end, the fix was to disable SLI, directly in the NVIDIA settings (NVIDIA Control Panel), as shown below:
When disabling it, you will be told that some programs must be closed; just end them in Task Manager.
Note: when you end the first process shown in the screenshot (WindowsInternal...), it restarts itself within a second or two, so you have to be quick; a few attempts will do it.
With SLI disabled, TensorFlow no longer occupies both GPUs, and you can truly pin it to exactly the card you specify.