Notes found online about installing the NVIDIA graphics driver on Ubuntu. Quite detailed, and possibly useful.

阿新 • Published: 2018-12-13

Installing Nvidia CUDA 8.0 on Ubuntu 16.04 for Linux GPU Computing (New Troubleshooting Guide)
Published: April 1, 2017
Victor Oliveira Antonino

If you want to train deep neural networks, you are probably familiar with packages like Caffe, Keras, TensorFlow, Theano, and Torch. These libraries can use the GPU to speed up training, which can take a very long time on a CPU. None of this is news, especially if you are an experienced machine learning engineer. However, installing CUDA on Ubuntu can be a very frustrating experience.

These are the most frequent causes of that frustration:

You were greeted by a black screen after installing Nvidia Driver
You got stuck in “login loop” after installing Nvidia Driver
When you tried to run the base installer (cuda_<version>_linux.run) you received this lovely message (especially on EC2 instances): "The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly. If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag."

Even though there are tons of tutorials on the web, I have spent days installing CUDA on Ubuntu on different computers, both laptops and desktops. You may already be familiar with most of the steps presented here, so feel free to skip ahead until you find something useful.

Switch to a text console by pressing CTRL+ALT+F1, log in using your credentials, and stop your current X server session:
sudo service lightdm stop
Why?
X is an application that manages one or more graphical displays. It makes sense to stop it before installing a graphics driver, since it is responsible for drawing and managing windows, decorative elements, title bars, and minimize and close buttons. [Ref]

1. Update your system

sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get dist-upgrade -y
Why?
Keeping your system up to date is essential, right? Ubuntu images are not updated constantly and you are probably using a snapshot from a point in time. [Ref]

2. Install build-essential package

sudo apt-get install build-essential
Why?
If some library needs a C/C++ compiler, you need to install build-essential. [Ref]

3. Blacklist the "nouveau" driver

echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
Reboot the computer and repeat step 1.

Why?
Nouveau is a free and open-source driver developed by reverse engineering Nvidia's proprietary Linux drivers. We can't use it for several reasons: it performs worse than Nvidia's proprietary graphics driver, it has no CUDA support, and the X server has to be configured accordingly to avoid black screen and login loop issues. In other words, let's disable the conflicting modules.
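
After the reboot, it can help to confirm that the blacklist actually took effect. The following is a minimal sketch, not part of the original guide: nouveau_loaded is a hypothetical helper name, and it reads lsmod-style output on stdin so that it can be tried without touching the running system.

```shell
#!/bin/sh
# Sketch: succeed if a module named "nouveau" appears in lsmod-style input.
# nouveau_loaded is a hypothetical helper, not part of any package.
nouveau_loaded() {
    # lsmod prints the module name in the first column; modules that merely
    # depend on nouveau list it in a later column and are ignored here.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

A possible way to use it: lsmod | nouveau_loaded && echo "nouveau is still loaded" || echo "nouveau is not loaded"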

4. Install linux kernel modules

When asked about GRUB changes, choose the package maintainer's version.

sudo apt-get install linux-image-extra-virtual
Why?
This is tricky, especially if you are using an EC2 instance. This link gives you a good explanation of why this is needed. However, I will quote the important piece:

"Nvidia's driver depends on the drm module, but that's not included in the default 'virtual' ubuntu that's on the cloud (as it usually has no graphics). It's available in the linux-image-extra-virtual package (and linux-image-generic supposedly), but just installing those directly will install the drm module for the NEWEST available kernel, not the one we're currently running. Hence, we need to specify the version manually. This command will probably need to be re-run every time you upgrade the kernel and reboot."

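The quoted explanation says the package version has to match the running kernel. A minimal sketch of one way to derive such a name follows. extra_modules_pkg is a hypothetical helper, and the linux-image-extra-<release> naming is an assumption based on Ubuntu 16.04 generic kernels; it may not exist for every kernel flavour.

```shell
#!/bin/sh
# Sketch: build the extra-modules package name for a given kernel release.
# extra_modules_pkg is a hypothetical helper; the package naming scheme is an
# assumption (Ubuntu 16.04 generic kernels) and should be checked with apt.
extra_modules_pkg() {
    # Default to the running kernel when no release string is passed.
    release=${1:-$(uname -r)}
    printf 'linux-image-extra-%s\n' "$release"
}
```

A possible way to use it: sudo apt-get install "$(extra_modules_pkg)"
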
5. Install linux source and headers

sudo apt-get install linux-source
apt-get source linux-image-$(uname -r)
sudo apt-get install linux-headers-$(uname -r)
Why?
This is also needed to avoid the "unable to locate the kernel source" message!

CUDA toolkit documentation may not be very appealing to some, but I will also quote another important piece that explicitly says:

"The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed."

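Before running the installer, a quick check that headers exist for the running kernel can save a failed attempt. This is only a sketch: kernel_headers_present is a hypothetical helper, and it assumes the usual layout in which /lib/modules/<release>/build points at the installed headers.

```shell
#!/bin/sh
# Sketch: succeed if a build directory exists for the given kernel release.
# kernel_headers_present is a hypothetical helper. The second argument lets
# the modules root be overridden, which is an assumption made for testing.
kernel_headers_present() {
    release=${1:-$(uname -r)}
    modules_root=${2:-/lib/modules}
    # -d follows symlinks, so a "build" symlink into /usr/src also counts.
    [ -d "$modules_root/$release/build" ]
}
```

A possible way to use it: kernel_headers_present || echo "headers for $(uname -r) are missing"
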
6. Install CUDA 8.0

Run the following commands:

wget -O cuda_8.0.61_375.26_linux.run https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
sudo sh cuda_8.0.61_375.26_linux.run --override --no-opengl-lib
Your log may be similar to this:

Do you accept the previously read EULA? (accept/decline/quit): accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26? ((y)es/(n)o/(q)uit): y
Install the CUDA 8.0 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
Install the CUDA 8.0 Samples? ((y)es/(n)o/(q)uit): y
Enter CUDA Samples Location [ default is /home/user ]: /usr/local/cuda-8.0
Why?
The "--override" is needed so you don't get the error, "Toolkit: Installation Failed. Using unsupported Compiler."

The "--no-opengl-lib" flag prevents the driver installation from installing NVIDIA's GL libraries. It is useful for systems where the display is driven by a non-NVIDIA GPU, because on such systems NVIDIA's GL libraries could prevent X from loading properly. This flag is very important to avoid getting stuck in a “login loop” or at a black screen!

Wait... something is still not quite right! I am still receiving a message saying 'the driver installation is unable to locate the kernel source', even though I am using the flag --kernel-source-path=<path>!

So.. let's check the following log file:

sudo vi /var/log/nvidia-installer.log
It says:

"ERROR: The kernel module failed to load, because it was not signed by a key that is trusted by the kernel. Please try installing the driver again, and sign the kernel module when prompted to do so.

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release."
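
The installer log can be long, so it may be easier to pull out only the error lines instead of paging through it in an editor. A minimal sketch, where installer_errors is a hypothetical helper and the default path is the log file named above:

```shell
#!/bin/sh
# Sketch: print only the lines of the installer log that contain "ERROR:".
# installer_errors is a hypothetical helper; pass another path to inspect a
# copy of the log instead of the default location.
installer_errors() {
    log=${1:-/var/log/nvidia-installer.log}
    grep 'ERROR:' "$log"
}
```

A possible way to use it: installer_errors | less (prefix with sudo-readable copies of the log if the file is not readable by your user).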

Usually the error "Unable to load the kernel module 'nvidia.ko'" is associated with dkms, and installing the Linux kernel modules in step 4 might be enough. [See here]

However, my experience installing CUDA on a desktop computer showed me something different. Especially because of what the first paragraph says!

And there you have it:

Many Linux distributions require modules to be cryptographically signed by a key trusted by the kernel when these modules are loaded into kernels running on UEFI systems with Secure Boot enabled. For those who did not get the last piece, the Unified Extensible Firmware Interface (UEFI) is a specification that defines a software interface between an operating system and platform firmware. UEFI replaces the Basic Input/Output System (BIOS) firmware interface originally present in all IBM PC-compatible personal computers.

Here, you can find details about how to generate signing keys in nvidia-installer.

Easy alternative? Disable UEFI Secure Boot (if possible), or use a kernel that doesn't require signed modules.

How to disable Secure Boot on Ubuntu, then!?!?

Since Ubuntu kernel build 4.4.0-21.37 this can be fixed by running:

sudo apt install mokutil
sudo mokutil --disable-validation
Since questions may arise, see third party kernel modules on UEFI with enabled Secure Boot and the consequences of disabling it.
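
To see whether Secure Boot is even in play on a given machine, mokutil can report the current state with its --sb-state option. The sketch below parses that kind of output. secure_boot_enabled is a hypothetical helper, and the exact wording it matches (a line starting with "SecureBoot enabled") is an assumption about mokutil's output that should be verified on your system.

```shell
#!/bin/sh
# Sketch: succeed if the text on stdin reports Secure Boot as enabled.
# secure_boot_enabled is a hypothetical helper; the matched wording is an
# assumption about what "mokutil --sb-state" prints.
secure_boot_enabled() {
    grep -qi '^SecureBoot enabled'
}
```

A possible way to use it: mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on, unsigned modules may be rejected"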

I hope that after this you are able to see the "beautiful" nvidia-smi output on your terminal.
