nvidia-smi: Introduction, Common Commands, and Option Reference
[Date] 2018.10.10
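I. What is nvidia-smi
nvidia-smi is NVIDIA's system management interface (SMI stands for System Management Interface). It collects GPU information at several levels of detail and shows GPU memory usage. It can also enable and disable GPU configuration options, such as ECC memory.
II. Common nvidia-smi commands
(The images below mainly come from http://hui.sohu.com/infonews/article/6337322514200395777)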
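1. nvidia-smi
- [Function] Displays basic information about every GPU in the system.
The fields in the output mean the following:
- GPU: index of the GPU in this machine
- Name: GPU model
- Persistence-M: persistence mode state (On or Off). When it is on, the driver stays loaded even when no application is using the GPU, so new GPU jobs start faster.
- Fan: fan speed, as a percentage of the maximum
- Temp: temperature in degrees Celsius
- Perf: performance state, from P0 (maximum performance) to P12 (minimum performance)
- Pwr:Usage/Cap: current power draw and the power limit
- Bus-Id: PCI bus address of the GPU
- Disp.A: Display Active, whether a display is initialized on the GPU
- Memory-Usage: GPU memory in use and total GPU memory
- GPU-Util: GPU utilization. The word "Volatile" printed above it in the header belongs to the Uncorr. ECC column, not to this one.
- Volatile Uncorr. ECC: number of uncorrected ECC errors since the driver was last loaded
- Compute M.: compute mode
- Processes: the GPU memory used by each process on each GPU
(For a more detailed explanation, see https://blog.csdn.net/sallyxyl1993/article/details/62220424)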
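For scripts, the same fields can be read in machine-readable form instead of being scraped from the table. The sketch below is one possible approach and not part of the original article. It relies on the --query-gpu and --format options covered in Section III. The field names in FIELDS are the ones I believe --help-query-gpu reports, so check them against your driver version. The helper names and the sample line are made up for illustration.

```python
import csv
import io
import subprocess

# Assumed --query-gpu field names, roughly one per column of the default table.
FIELDS = [
    "index", "name", "persistence_mode", "fan.speed", "temperature.gpu",
    "pstate", "power.draw", "power.limit", "pci.bus_id", "display_active",
    "memory.used", "memory.total", "utilization.gpu", "compute_mode",
]


def parse_gpu_csv(text):
    """Turn headerless CSV text into a list of {field: value} dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (cell.strip() for cell in row)))
            for row in reader if row]


def query_gpus():
    """Run nvidia-smi (needs an NVIDIA driver) and parse its CSV output."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


# Illustrative line only, not captured from a real machine.
SAMPLE = ("0, Tesla K80, Enabled, [N/A], 35, P8, 26.50 W, 149.00 W, "
          "00000000:05:00.0, Disabled, 0 MiB, 11441 MiB, 0 %, Default\n")

for gpu in parse_gpu_csv(SAMPLE):
    print(gpu["index"], gpu["name"], gpu["pstate"],
          gpu["memory.used"], "/", gpu["memory.total"])
```

On a machine with a GPU, calling query_gpus() returns one dict per device. The parsing is kept separate from the subprocess call so it can be tried on saved output.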
2. nvidia-smi -L
- [Function] Lists all available NVIDIA devices.
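As an illustration, the output of nvidia-smi -L can be turned into structured records. This is a minimal sketch that assumes each line has the form "GPU <index>: <name> (UUID: <uuid>)". The function name and the UUIDs in the sample are placeholders, not real values.

```python
import re

# Assumed line layout: "GPU 0: Tesla K80 (UUID: GPU-....)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)\s*$")


def parse_gpu_list(text):
    """Return a list of {index, name, uuid} dicts, skipping unmatched lines."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append({
                "index": int(match.group(1)),
                "name": match.group(2),
                "uuid": match.group(3),
            })
    return gpus


# Placeholder output, not taken from a real machine.
SAMPLE = """GPU 0: Tesla K80 (UUID: GPU-00000000-0000-0000-0000-000000000000)
GPU 1: Tesla K80 (UUID: GPU-11111111-1111-1111-1111-111111111111)
"""

for gpu in parse_gpu_list(SAMPLE):
    print(gpu["index"], gpu["name"], gpu["uuid"])
```

The UUID is useful because, unlike the index, it does not change when devices are enumerated in a different order.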
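3. nvidia-smi topo --matrix
- [Function] Shows the system topology.
- [Note] To use the more advanced NVIDIA GPU features (such as GPUDirect) properly, the system topology usually has to be configured correctly. Topology here means how the PCI Express devices (GPUs, InfiniBand HCAs, storage controllers, and so on) are connected to each other and to the system's CPUs. With an unsuitable topology, some features may slow down or stop working altogether.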
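To make the matrix easier to act on, a script can pick out which GPU pairs are closely connected. The sketch below is only an illustration under assumptions: it expects data rows that start at column 0 with a GPU label followed by one link code per GPU, and it treats "PIX" as the code for two devices behind the same PCIe switch. The exact layout and the set of codes vary between driver versions, so compare with the legend your nvidia-smi prints. The sample matrix and function names are invented.

```python
import re


def parse_topo_matrix(text):
    """Map (gpu_a, gpu_b) -> link code, using rows that begin with 'GPUn'."""
    rows = [line.split() for line in text.splitlines()
            if re.match(r"GPU\d+\s", line)]
    names = [row[0] for row in rows]
    links = {}
    for row in rows:
        # The first len(names) cells after the label are the GPU columns.
        for other, link in zip(names, row[1:1 + len(names)]):
            links[(row[0], other)] = link
    return links


def pairs_with_link(links, code):
    """List each unordered GPU pair whose link code equals `code`."""
    return sorted((a, b) for (a, b), link in links.items()
                  if link == code and a < b)


# Invented example of the assumed layout (tab separated).
SAMPLE = (
    "\tGPU0\tGPU1\tGPU2\tGPU3\tCPU Affinity\n"
    "GPU0\t X \tPIX\tPHB\tPHB\t0-11\n"
    "GPU1\tPIX\t X \tPHB\tPHB\t0-11\n"
    "GPU2\tPHB\tPHB\t X \tPIX\t0-11\n"
    "GPU3\tPHB\tPHB\tPIX\t X \t0-11\n"
)

links = parse_topo_matrix(SAMPLE)
print(pairs_with_link(links, "PIX"))
```

A job scheduler could use such a list to place communicating processes on GPUs that share a switch.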
4. nvidia-smi -q -d CLOCK
- [Function] Shows the current, default, and maximum possible GPU clock speeds.
5. nvidia-smi -q -d SUPPORTED_CLOCKS
- [Function] Lists the clock speeds available on each GPU.
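The supported-clocks listing groups graphics clocks under each memory clock, which a short parser can capture. The following is a sketch under the assumption that the text contains "Memory : N MHz" lines, each followed by indented "Graphics : N MHz" lines. The clock values in the sample are illustrative, and the function name is hypothetical.

```python
import re

CLOCK_LINE = re.compile(r"^\s*(Memory|Graphics)\s*:\s*(\d+) MHz\s*$")


def parse_supported_clocks(text):
    """Return {memory_mhz: [graphics_mhz, ...]} for a single GPU's listing."""
    clocks = {}
    current = None
    for line in text.splitlines():
        match = CLOCK_LINE.match(line)
        if not match:
            continue
        kind, mhz = match.group(1), int(match.group(2))
        if kind == "Memory":
            # A memory clock starts a new group of graphics clocks.
            current = clocks.setdefault(mhz, [])
        elif current is not None:
            current.append(mhz)
    return clocks


# Illustrative values in the assumed layout.
SAMPLE = """
    Supported Clocks
        Memory                      : 2505 MHz
            Graphics                : 875 MHz
            Graphics                : 862 MHz
        Memory                      : 324 MHz
            Graphics                : 324 MHz
"""

print(parse_supported_clocks(SAMPLE))
```

With several GPUs the listings would be merged into one dict, so it is simplest to query one device at a time with -i.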
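6. nvidia-smi vgpu
- [Function] Shows the current vGPU status.
- [Note] A virtual graphics processing unit (vGPU) is the component that renders graphics on a virtual desktop. [Image: output of the command when this component is absent]
7. nvidia-smi vgpu -p
- [Function] Continuously displays how much GPU resource the applications in virtual desktops are using.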
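8. nvidia-smi -q
- [Function] Shows the information for all GPUs. A specific GPU can be selected with the -i option.
- nvidia-smi -q provides the following useful information:
- Basic information about the GPUs in the system
- The serial number (SN), VBIOS version, part number (PN), and similar details of each GPU
- GPU memory, BAR1, utilization of all resources, ECC mode, and similar information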
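When these values are needed in a program, the XML form of the query (the -x option in Section III) is easier to parse than the indented text. The sketch below is an assumption-laden illustration: the tag paths in WANTED are the ones I recall from nvidia-smi's XML output and may differ by driver version, so verify them against the output of nvidia-smi -q -x on your system. All values in the sample are placeholders.

```python
import subprocess
import xml.etree.ElementTree as ET

# Assumed tag paths inside each <gpu> element; verify on your driver.
WANTED = {
    "name": "product_name",
    "serial": "serial",
    "vbios": "vbios_version",
    "memory_total": "fb_memory_usage/total",
    "bar1_total": "bar1_memory_usage/total",
    "gpu_util": "utilization/gpu_util",
    "ecc_mode": "ecc_mode/current_ecc",
}


def summarize(xml_text):
    """Extract the WANTED values for every <gpu> element in the XML."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for gpu in root.findall("gpu"):
        info = {"bus_id": gpu.get("id")}
        for key, path in WANTED.items():
            info[key] = gpu.findtext(path, default="N/A")
        result.append(info)
    return result


def query_xml(gpu_id=None):
    """Run nvidia-smi -q -x, optionally for one GPU (needs the driver)."""
    cmd = ["nvidia-smi", "-q", "-x"]
    if gpu_id is not None:
        cmd += ["-i", str(gpu_id)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Placeholder document in the assumed shape, not real output.
SAMPLE = """
<nvidia_smi_log>
  <gpu id="00000000:05:00.0">
    <product_name>Tesla K80</product_name>
    <serial>0000000000000</serial>
    <vbios_version>00.00.00.00.00</vbios_version>
    <fb_memory_usage><total>11441 MiB</total></fb_memory_usage>
    <bar1_memory_usage><total>16384 MiB</total></bar1_memory_usage>
    <utilization><gpu_util>0 %</gpu_util></utilization>
  </gpu>
</nvidia_smi_log>
"""

for info in summarize(SAMPLE):
    print(info)
```

Missing tags fall back to "N/A" rather than raising, which matters because not every product reports every field.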
III. Summary of command options
(Reference: https://www.cnblogs.com/xuyuan77/p/7856487.html)
Run nvidia-smi -h.
It prints the following:
NVIDIA System Management Interface -- v352.79
NVSMI provides monitoring information for Tesla and select Quadro devices.
The data is presented in either a plain text or an XML format, via stdout or a file.
NVSMI also provides several management operations for changing the device state.
Note that the functionality of NVSMI is exposed through the NVML C-based
library. See the NVIDIA developer website for more information about NVML.
Python wrappers to NVML are also available. The output of NVSMI is
not guaranteed to be backwards compatible; NVML and the bindings are backwards
compatible.
http://developer.nvidia.com/nvidia-management-library-nvml/
http://pypi.python.org/pypi/nvidia-ml-py/
Supported products:
Full Support
All Tesla products, starting with the Fermi architecture
All Quadro products, starting with the Fermi architecture
All GRID products, starting with the Kepler architecture
GeForce Titan products, starting with the Kepler architecture
Limited Support
All Geforce products, starting with the Fermi architecture
Usage
nvidia-smi [OPTION1 [ARG1]] [OPTION2 [ARG2]] ...
Options
Option | Description |
-h, --help | Print usage information and exit. |
LIST OPTIONS:
Option | Description |
-L, --list-gpus | Display a list of GPUs connected to the system. |
SUMMARY OPTIONS:
Option | Description |
-i, --id= | Target a specific GPU. |
-f, --filename= | Log to a specified file, rather than to stdout. |
-l, --loop= | Probe until Ctrl+C at specified second interval. |
QUERY OPTIONS:
Option | Description |
-q, --query | Display GPU or Unit info. |
-u, --unit | Show unit, rather than GPU, attributes. |
-i, --id= | Target a specific GPU or Unit. |
-f, --filename= | Log to a specified file, rather than to stdout. |
-x, --xml-format | Produce XML output. |
--dtd | When showing xml output, embed DTD. |
-d, --display= | Display only selected information: MEMORY, ... (for example CLOCK or SUPPORTED_CLOCKS, as used in Section II) |
-l, --loop= | Probe until Ctrl+C at specified second interval. |
-lms, --loop-ms= | Probe until Ctrl+C at specified millisecond interval. |
SELECTIVE QUERY OPTIONS:
Option | Description | Note |
--query-gpu= | Information about GPU. | Call --help-query-gpu for more info. |
--query-supported-clocks= | List of supported clocks. | Call --help-query-supported-clocks for more info. |
--query-compute-apps= | List of currently active compute processes. | Call --help-query-compute-apps for more info. |
--query-accounted-apps= | List of accounted compute processes. | Call --help-query-accounted-apps for more info. |
--query-retired-pages= | List of device memory pages that have been retired. | Call --help-query-retired-pages for more info. |
[mandatory]
Option | Description |
--format= | Comma separated list of format options: csv (mandatory), noheader, nounits. |
[plus any of]
Option | Description |
-i, --id= | Target a specific GPU or Unit. |
-f, --filename= | Log to a specified file, rather than to stdout. |
-l, --loop= | Probe until Ctrl+C at specified second interval. |
-lms, --loop-ms= | Probe until Ctrl+C at specified millisecond interval. |
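As an example of a selective query, the sketch below totals GPU memory per process from --query-compute-apps output. It is an illustration under assumptions, not the article's method: it assumes the command nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits and that these three field names are accepted by your driver (check --help-query-compute-apps). The sample rows and the function name are invented.

```python
import csv
import io
from collections import defaultdict


def memory_by_process(text):
    """Sum used memory (MiB) per (pid, process name) across all GPUs."""
    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 3:
            continue
        pid, name, used = (cell.strip() for cell in row)
        # Skip rows where a value is reported as unavailable.
        if pid.isdigit() and used.isdigit():
            totals[(int(pid), name)] += int(used)
    return dict(totals)


# Invented rows: pid, process name, used memory in MiB (no units).
SAMPLE = (
    "1234, python, 1024\n"
    "1234, python, 512\n"
    "5678, ./train, 2048\n"
)

for (pid, name), mib in sorted(memory_by_process(SAMPLE).items()):
    print(pid, name, mib, "MiB")
```

A process that uses two GPUs appears once per GPU in the output, which is why the rows are summed by pid.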
DEVICE MODIFICATION OPTIONS:
Option | Description | Note |
-pm, --persistence-mode= | Set persistence mode: 0/DISABLED, 1/ENABLED | |
-e, --ecc-config= | Toggle ECC support: 0/DISABLED, 1/ENABLED | |
-p, --reset-ecc-errors= | Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE | |
-c, --compute-mode= | Set MODE for compute applications: | 0/DEFAULT, 1/EXCLUSIVE_THREAD (deprecated), 2/PROHIBITED, 3/EXCLUSIVE_PROCESS |
--gom= | Set GPU Operation Mode: | 0/ALL_ON, 1/COMPUTE, 2/LOW_DP |
-r, --gpu-reset | Trigger reset of the GPU. | |
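Because these options change device state, it can help to build the command line in code and validate the arguments before anything is run. The sketch below only constructs argument lists and does not execute them. The numeric codes come from the table above. The function names are hypothetical, and actually running such commands typically requires administrator privileges.

```python
# Compute mode codes as listed in the help output above.
COMPUTE_MODES = {
    "DEFAULT": 0,
    "EXCLUSIVE_THREAD": 1,  # marked deprecated in the help text
    "PROHIBITED": 2,
    "EXCLUSIVE_PROCESS": 3,
}


def persistence_cmd(gpu_id, enabled):
    """Build the argv that sets persistence mode on one GPU."""
    return ["nvidia-smi", "-i", str(gpu_id), "-pm", "1" if enabled else "0"]


def compute_mode_cmd(gpu_id, mode):
    """Build the argv that sets the compute mode on one GPU."""
    if mode not in COMPUTE_MODES:
        raise ValueError("unknown compute mode: %r" % (mode,))
    return ["nvidia-smi", "-i", str(gpu_id), "-c", str(COMPUTE_MODES[mode])]


print(" ".join(persistence_cmd(0, True)))
print(" ".join(compute_mode_cmd(0, "EXCLUSIVE_PROCESS")))
```

The resulting lists can be passed to subprocess.run once they have been reviewed.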
UNIT MODIFICATION OPTIONS:
Option | Description |
-t, --toggle-led= | Set Unit LED state: 0/GREEN, 1/AMBER |
-i, --id= | Target a specific Unit. |
SHOW DTD OPTIONS:
Option | Description |
--dtd | Print device DTD and exit. |
-f, --filename= | Log to a specified file, rather than to stdout. |
-u, --unit | Show unit, rather than device, DTD. |
--debug= | Log encrypted debug information to a specified file. |
Process Monitoring:
Option | Description | Note |
pmon | Displays process stats in scrolling format. | "nvidia-smi pmon -h" for more information. |
TOPOLOGY: (EXPERIMENTAL)
Option | Description | Note |
topo | Displays device/system topology. | "nvidia-smi topo -h" for more information. |
Please see the nvidia-smi(1) manual page for more detailed information.