Switching between CUDA versions

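Different models sometimes require different CUDA versions, so being able to switch between toolkits matters. Versions 9.2, 9.0 and 8.0 are all in common use at the moment; you can install all three side by side and switch to whichever one a project needs. Under the hood, switching just means pointing a symbolic link at a different install directory.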

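Suppose one CUDA version has already been installed from a deb package. The steps below install a further version from a runfile (this is the approach I found described online; I have not tried it with anything other than a runfile).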

Once the runfile has been downloaded, run

sudo chmod +x cuda_9.0.176_384.81_linux.run
sudo ./cuda_9.0.176_384.81_linux.run

Then keep pressing Enter to page through the license text (the "More" prompt) until it ends.

The installer then asks the following

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: n

Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...



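Answer according to your own setup. I chose n for the driver because a newer driver was already installed, and I skipped the symbolic link for now because I wanted to create it by hand afterwards to see exactly what it involves.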

The output is as follows

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-9.0
Samples:  Not Selected

Please make sure that
 -   PATH includes /usr/local/cuda-9.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_1976.log


The summary contains a number of reminders. Most can be ignored, but it is best to set the environment variables it mentions:

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64


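These are the lines from my machine. They point at /usr/local/cuda rather than /usr/local/cuda-9.0 because I have already set up the symlink. The installer also asks whether to install the driver; if a driver is already installed, this is unnecessary. I think it is best to answer no and simply keep a single, reasonably recent driver.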
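The export lines above append the CUDA directories to the end of each variable, so any other nvcc found earlier in PATH still wins. A minimal sketch of an alternative that puts the CUDA directories first and avoids duplicate entries when the shell startup file is sourced repeatedly. prepend_path and CUDA_HOME are names made up for this illustration; they are not part of CUDA.

```shell
# prepend_path is a hypothetical helper, not something CUDA ships.
# $1 = directory to add, $2 = current value of the variable (may be empty).
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;             # already present: leave unchanged
        *)        printf '%s\n' "${1}${2:+:$2}" ;;  # otherwise put the new directory first
    esac
}

CUDA_HOME=/usr/local/cuda   # assumed: the symlink location used in this post
PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Placed in ~/.bashrc, this keeps pointing at whichever toolkit /usr/local/cuda currently links to, so the variables do not need to change when the link does.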

There are now two toolkits under /usr/local.

bin cuda-9.0 cuda-9.2 etc games include lib man sbin share src

Create the symlink

sudo ln -s /usr/local/cuda-9.0/ /usr/local/cuda

and a highlighted cuda entry appears in the listing.

To switch to another version, delete this link and create a new one.
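The delete-and-recreate step can be folded into a single ln call. Below is a rough sketch, assuming the toolkits live in directories named cuda-<version> under one root, as they do in this post. switch_cuda and the CUDA_ROOT variable are hypothetical names introduced for the example.

```shell
# switch_cuda is a hypothetical helper. Usage: switch_cuda 9.0
# CUDA_ROOT is an assumed override; it defaults to /usr/local as in this post.
switch_cuda() (
    version=$1
    root=${CUDA_ROOT:-/usr/local}
    target="$root/cuda-$version"
    link="$root/cuda"

    if [ ! -d "$target" ]; then
        echo "no toolkit found at $target" >&2
        exit 1
    fi
    # Refuse to touch a real directory; only an existing symlink gets replaced.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link exists and is not a symlink" >&2
        exit 1
    fi
    # -s: symbolic, -f: replace an existing link, -n: do not follow the old link
    ln -sfn "$target" "$link"
)
```

Because /usr/local is normally writable only by root, the function has to run from a root shell, or the ln line needs sudo in front of it.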

A bug I ran into earlier

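On my PC I once hit the following bug: CUDA 9.0 and CUDA 9.2 were both installed and the environment variables were set, yet nvcc --version kept reporting CUDA 7.5. My guess was that installing torch had pulled in the nvidia-cuda-toolkit package. I searched the whole filesystem for nvcc with sudo find / -name nvcc, which turned up /usr/bin/nvcc and /usr/lib/nvidia-cuda-toolkit/nvcc in addition to the two copies I had installed myself. After deleting the first two and creating a CUDA 9.0 symlink at /usr/local/cuda, nvcc reported version 9.0.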

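The probable cause is that nvidia-cuda-toolkit places an nvcc at /usr/bin/nvcc, and /usr/bin is searched before /usr/local/cuda/bin when the CUDA directory is appended to the end of PATH. Deleting /usr/bin/nvcc alone would have been enough.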
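Before deleting anything, it helps to see which nvcc copies the shell can actually reach and in what order. The sketch below walks a colon-separated path list and prints every matching executable; the first line of output is the one the shell would run. list_on_path is a hypothetical helper written for this post, and its optional second argument exists only so it can be tried on an arbitrary path list.

```shell
# list_on_path is a hypothetical helper.
# $1 = command name, $2 = optional path list (defaults to the current PATH).
list_on_path() (
    name=$1
    path_list=${2:-$PATH}
    set -f      # disable globbing while the list is split on ':'
    IFS=:
    for dir in $path_list; do
        [ -n "$dir" ] || dir=.      # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)
```

Running list_on_path nvcc on a machine in the state described above would be expected to show /usr/bin/nvcc ahead of /usr/local/cuda/bin/nvcc, which points at reordering PATH as an alternative to removing files.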

Another error

gzip: stdin: invalid compressed data--format violated
Extraction failed.
Ensure there is enough space in /home and that the installation package is not corrupt
Signal caught, cleaning up


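There are two possible causes. One is an incomplete download, in which case download the installer again. The other is insufficient disk space, which can be worked around like this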

sudo mkdir /home/tmp2
sudo chmod 1777 /home/tmp2
export TMPDIR=/home/tmp2
sudo sh cuda_9.0.176_384.81_linux.run --tmpdir=/home/tmp2
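To tell the two causes apart, it is worth checking how much room the extraction directory has before blaming the download. A small sketch using df; free_kb and has_space are hypothetical helper names, and the threshold passed to has_space is whatever you decide is enough, since I do not have an official figure for how much space the runfile needs.

```shell
# free_kb is a hypothetical helper: print the free space, in KiB,
# of the filesystem that holds directory $1.
free_kb() {
    # -P forces the POSIX one-line-per-filesystem layout; column 4 is "Available"
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space is a hypothetical helper: succeed when directory $1
# has at least $2 KiB available.
has_space() {
    avail=$(free_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, has_space /home/tmp2 4194304 || echo "less than 4 GiB free" checks the directory created above against an assumed 4 GiB threshold.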



Checking that the installation works

Testing CUDA

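Run the following three commands (they require the CUDA samples, which the installer offers as an optional component):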

cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery

make

sudo ./deviceQuery

The result looks like this


(py35) pengkun@ubuntu:~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ sudo ./deviceQuery 
[sudo] password for pengkun: 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 6GB"
  CUDA Driver Version / Runtime Version          9.2 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6076 MBytes (6371475456 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 5 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
(py35) pengkun@ubuntu:~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ 

Note that the CUDA driver version and the CUDA runtime version differ, probably because the driver on this machine came from my original 9.2 installation.

On another machine the result was

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          10.0 / 9.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 10988 MBytes (11521884160 bytes)
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1635 MHz (1.63 GHz)
  Memory Clock rate:                             7000 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 8 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          10.0 / 9.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 10989 MBytes (11523260416 bytes)
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1635 MHz (1.63 GHz)
  Memory Clock rate:                             7000 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 9 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce RTX 2080 Ti (GPU0) -> GeForce RTX 2080 Ti (GPU1) : Yes
> Peer access from GeForce RTX 2080 Ti (GPU1) -> GeForce RTX 2080 Ti (GPU0) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 9.0, NumDevs = 2
Result = PASS


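So my understanding is that the CUDA Driver version reflects the installed NVIDIA driver. The driver and runtime versions do not have to match; the driver just has to be at least as new as the runtime. As for how the two relate, the driver manages the GPU hardware, while the runtime is what applications use when they execute.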
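The "driver at least as new as the runtime" rule can be checked mechanically from the last summary line that deviceQuery prints. The sketch below assumes the line has exactly the format shown in the two outputs above; driver_covers_runtime is a hypothetical helper name, and it reads the deviceQuery output on standard input.

```shell
# driver_covers_runtime is a hypothetical helper. It succeeds (exit 0) when the
# "CUDA Driver Version" in deviceQuery's summary line is >= the "CUDA Runtime Version".
driver_covers_runtime() {
    awk -F', ' '
        /CUDA Driver Version = / {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^CUDA Driver Version = /)  { d = $i; sub(/.*= /, "", d) }
                if ($i ~ /^CUDA Runtime Version = /) { r = $i; sub(/.*= /, "", r) }
            }
            # compare major first, then minor, as numbers rather than strings
            split(d, dv, "."); split(r, rv, ".")
            ok = (dv[1] + 0 > rv[1] + 0) || (dv[1] + 0 == rv[1] + 0 && dv[2] + 0 >= rv[2] + 0)
            found = 1
        }
        END { if (found && ok) exit 0; exit 1 }
    '
}
```

Typical use would be ./deviceQuery | driver_covers_runtime && echo "driver is new enough". Both outputs above (9.2 / 9.0 and 10.0 / 9.0) pass this check.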
