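Setting Up a Deep Learning Environment on Ubuntu 16.04

It is probably a good idea to disable Secure Boot before installing Ubuntu.
In this article I only realized this partway through, so I disabled it at that point.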

Configuration

Hardware

CPU: i7-6850K
M/B: ASUS X99-E WS
GPU: ELSA GD1080-8GERXG *2

Software

Ubuntu 16.04 LTS
CUDA 8.0
cuDNN 5.1
TensorFlow 0.12.1
Caffe 1.0.0-rc4
Chainer 1.21.0 (probably)

Initial setup

Network

$ sudo apt-get -y install resolvconf

$ sudo nmcli con mod eno1 ipv4.addresses 192.168.1.23/24
$ sudo nmcli con mod eno1 ipv4.gateway 192.168.1.1
$ sudo nmcli con mod eno1 ipv4.dns 192.168.1.1
$ sudo nmcli con mod eno1 ipv4.method manual

$ sudo nmcli con down eno1 && sudo nmcli con up eno1
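Before bringing the connection down and up, the static address values can be sanity-checked. The sketch below is a minimal Python 3 check (the ipaddress module is in the standard library from 3.3). It is only an illustration, not something nmcli does itself, and the constants are simply the values used in the nmcli commands above:

```python
import ipaddress

# Values from the nmcli commands above; adjust to your own network.
ADDRESS = "192.168.1.23/24"
GATEWAY = "192.168.1.1"
DNS = "192.168.1.1"


def check_static_ipv4(address, gateway, dns):
    """Return a list of problems with a static IPv4 setting (empty if none found)."""
    problems = []
    iface = ipaddress.ip_interface(address)  # host address plus prefix length
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        problems.append("gateway %s is outside %s" % (gw, iface.network))
    if gw == iface.ip:
        problems.append("gateway equals the host address")
    try:
        ipaddress.ip_address(dns)
    except ValueError:
        problems.append("DNS server %r is not an IP address" % dns)
    return problems


if __name__ == "__main__":
    print(check_static_ipv4(ADDRESS, GATEWAY, DNS) or "settings look consistent")
```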

Update

$ sudo apt-get update
$ sudo apt-get -y upgrade
$ sudo apt-get -y dist-upgrade

Installing OpenSSH

$ sudo apt-get -y install openssh-server

$ sudo systemctl start ssh
$ sudo systemctl enable ssh

CUDA

Check

Confirm that the following commands print nothing, meaning no NVIDIA or CUDA packages are installed yet:

$ sudo dpkg -l | grep nvidia
$ sudo dpkg -l | grep cuda
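The same check can be scripted. The sketch below is a minimal Python 3 version that filters the package rows of `dpkg -l` by package name. Unlike the grep above, it ignores header lines and descriptions, so treat it as a slightly stricter variant of my own and not an exact equivalent:

```python
import subprocess


def matching_packages(dpkg_output, keywords=("nvidia", "cuda")):
    """Return package names from `dpkg -l` output that contain any keyword."""
    names = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Package rows start with a short status code such as "ii" or "rc";
        # the "Desired=...", "||/ Name ..." and "+++-===" header lines do not.
        if len(fields) < 2 or len(fields[0]) not in (2, 3) or not fields[0].isalpha():
            continue
        name = fields[1].split(":")[0]  # drop a multiarch suffix like ":amd64"
        if any(k in name.lower() for k in keywords):
            names.append(name)
    return names


if __name__ == "__main__":
    try:
        listing = subprocess.check_output(["dpkg", "-l"]).decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not run dpkg -l:", err)
    else:
        found = matching_packages(listing)
        print("\n".join(found) if found else "no nvidia/cuda packages found")
```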

Installing the NVIDIA driver

Add the repository (Proprietary GPU Drivers : “Graphics Drivers” team) and install the driver.
During installation I was asked whether to disable Secure Boot; I continued without disabling it.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update

$ sudo apt-get -y install nvidia-370

After rebooting, I checked whether the GPUs were recognized. The message below appeared, and they were not recognized.

$ sudo reboot

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

After I disabled Secure Boot in the BIOS (Boot -> Secure Boot, set OS Type to Other OS), the GPUs were recognized:

$ nvidia-smi
Mon Jan 16 17:24:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:05:00.0      On |                  N/A |
| 43%   37C    P8     9W / 220W |     52MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:06:00.0     Off |                  N/A |
| 43%   32C    P8     8W / 220W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1176    G   /usr/lib/xorg/Xorg                              50MiB |
+-----------------------------------------------------------------------------+
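nvidia-smi can also print selected fields as CSV, which is easier to check from a script than the table above. The sketch below is a minimal Python 3 version that assumes nvidia-smi's `--query-gpu=...` and `--format=csv,noheader` options. The field list is my own choice:

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in output order.
QUERY = "index,name,pci.bus_id,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=QUERY --format=csv,noheader` output into dicts."""
    fields = QUERY.split(",")
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if record:
            rows.append(dict(zip(fields, (value.strip() for value in record))))
    return rows


if __name__ == "__main__":
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader"]
        ).decode()
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not run nvidia-smi:", err)
    else:
        for gpu in parse_gpu_csv(out):
            print(gpu)
```

With two GTX 1080s installed, this should print one dict per card. Comparing `pci.bus_id` across tools is a quick way to tell which physical card is which.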

Installing the CUDA Toolkit

On CUDA 8.0 Downloads | NVIDIA Developer, choose Linux -> x86_64 -> Ubuntu -> 16.04 -> runfile (local), click Download, and install following the instructions.

Using the deb package produced the error below, so I use the runfile instead.

$ sudo apt-get update
(omitted)
W: Invalid 'Date' entry in Release file /var/lib/apt/lists/partial/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Release

$ chmod u+x cuda_8.0.44_linux-run
$ sudo ./cuda_8.0.44_linux-run

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is <home directory> ]:

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_1804.log

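Following the message, I installed the driver with the command below. (The nvidia-370 driver installed earlier already meets the 361.00 minimum, and nvidia-smi still reports driver 370.28 later on, so this step may not be necessary.)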

$ sudo ./cuda_8.0.44_linux-run -silent -driver
$ sudo reboot

Adding CUDA to the paths

$ echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

$ source ~/.bashrc
$ sudo ldconfig

$ which nvcc
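To confirm that the two exports took effect in the current shell, here is a small Python 3 sketch (`shutil.which` needs 3.3 or later). The directories it checks are the ones added to ~/.bashrc above:

```python
import os
import shutil

# The two directories exported in ~/.bashrc above.
CUDA_BIN = "/usr/local/cuda/bin"
CUDA_LIB = "/usr/local/cuda/lib64"


def cuda_env_problems(env):
    """Return a list of missing CUDA entries in an environment mapping."""
    problems = []
    if CUDA_BIN not in env.get("PATH", "").split(os.pathsep):
        problems.append("PATH lacks " + CUDA_BIN)
    if CUDA_LIB not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append("LD_LIBRARY_PATH lacks " + CUDA_LIB)
    return problems


if __name__ == "__main__":
    print(cuda_env_problems(os.environ) or "CUDA paths are set")
    # `which nvcc` should resolve to a path under /usr/local/cuda/bin
    print("nvcc:", shutil.which("nvcc"))
```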

Verifying the installation

Build and run the sample code installed in the home directory.

$ sudo apt-get install mesa-common-dev freeglut3-dev

$ cd NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
$ make

$ ./nbody -benchmark -numbodies=256000 -device=0
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "GeForce GTX 1080
> Compute 6.1 CUDA device: [GeForce GTX 1080]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 2365.614 ms
= 277.036 billion interactions per second
= 5540.718 single-precision GFLOP/s at 20 flops per interaction
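The last three lines of the benchmark output follow from simple arithmetic. Each of the 10 iterations evaluates N × N body-body interactions, and single precision is counted at 20 flops per interaction, as the output states. The small Python sketch below recomputes both figures from the printed body count, iteration count, and total time. With the numbers above it reproduces roughly 277.036 billion interactions/s and 5540.718 GFLOP/s:

```python
def nbody_rates(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for an nbody benchmark run.

    Every iteration evaluates num_bodies * num_bodies body-body interactions.
    """
    interactions = float(num_bodies) * num_bodies * iterations
    per_second = interactions / (total_ms / 1000.0)
    return per_second / 1e9, per_second * flops_per_interaction / 1e9


if __name__ == "__main__":
    billions, gflops = nbody_rates(256000, 10, 2365.614)
    print("%.3f billion interactions/s, %.3f GFLOP/s" % (billions, gflops))
```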

Python

Set up Python, which is used below to verify the installations.

$ python --version
Python 2.7.12

$ sudo apt-get -y install python-pip python-dev

cuDNN

From NVIDIA cuDNN | NVIDIA Developer, download “cuDNN v5.1 Library for Linux” under “cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0” (registration required).

$ tar xf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo mv cuda/include/cudnn.h /usr/local/cuda/include
$ sudo mv cuda/lib64/libcudnn* /usr/local/cuda/lib64/
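The installed cuDNN version can also be read from the header just copied. In cuDNN 5.x, cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL; later cuDNN releases moved these macros to cudnn_version.h. Here is a minimal Python 3 sketch:

```python
import os
import re

HEADER = "/usr/local/cuda/include/cudnn.h"


def cudnn_version_from_header(text):
    """Return (major, minor, patchlevel) from the #define lines of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % name, text, re.MULTILINE)
        if match is None:
            raise ValueError("%s not found" % name)
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    if os.path.exists(HEADER):
        with open(HEADER) as f:
            print("cuDNN %d.%d.%d" % cudnn_version_from_header(f.read()))
    else:
        print(HEADER, "not found")
```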

TensorFlow

Installing Bazel

Install Bazel following Installing Bazel - Bazel.

$ sudo apt-get -y install openjdk-8-jdk
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

$ sudo apt-get update
$ sudo apt-get -y install bazel

Installing TensorFlow

Check the cuDNN version first. Here it is 5.1.5.

$ ls /usr/local/cuda/lib64/ | grep libcudnn
libcudnn.so
libcudnn.so.5
libcudnn.so.5.1.5
libcudnn_static.a

Be careful here: entering something like 5.1 as the cuDNN version in ./configure produced the following error, so enter the full version (5.1.5) instead.
Invalid path to cuDNN toolkit. Neither of the following two files can be found: /usr/local/cuda-8.0/lib64/libcudnn.so.5.0 /usr/local/cuda-8.0/libcudnn.so.5.0 .5.0
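The value to enter is the full version suffix of the real library file (here libcudnn.so.5.1.5). The small Python 3 sketch below picks that suffix from a directory listing and builds the two paths named in the error message above. It only mirrors that message and is not TensorFlow's actual configure logic:

```python
import os
import re


def full_cudnn_version(filenames):
    """Return the longest version suffix among names like libcudnn.so.5.1.5."""
    best = None
    for name in filenames:
        match = re.match(r"libcudnn\.so\.(\d+(?:\.\d+)*)$", name)
        if match:
            version = match.group(1)
            if best is None or len(version.split(".")) > len(best.split(".")):
                best = version
    return best


def candidate_paths(cuda_root, version):
    """The two locations listed in the 'Invalid path to cuDNN toolkit' message."""
    return [
        os.path.join(cuda_root, "lib64", "libcudnn.so." + version),
        os.path.join(cuda_root, "libcudnn.so." + version),
    ]


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda/lib64"
    if os.path.isdir(lib_dir):
        version = full_cudnn_version(os.listdir(lib_dir))
        print("cuDNN library version:", version)
        if version:
            for path in candidate_paths("/usr/local/cuda", version):
                print(path, "exists" if os.path.exists(path) else "missing")
```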

$ sudo apt-get -y install python-numpy

$ git clone https://github.com/tensorflow/tensorflow
$ sudo mv tensorflow /usr/local/src/

$ cd /usr/local/src/tensorflow

Run ./configure:

$ ./configure
Please specify the location of python. [Default is /usr/bin/python]:
Please specify optimization flags to use during compilation [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow

Answer y for CUDA support:

Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Specify the CUDA version:

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Specify the cuDNN version:

Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.1.5
Please specify the location where cuDNN 5.1.5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.

Since the GPUs are GTX 1080s, specify compute capability 6.1:

[Default is: "3.5,5.2"]: 6.1
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.......
INFO: All external dependencies fetched successfully.
Configuration finished

Build

$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp27-cp27mu-linux_x86_64.whl

Running test code

The tutorial code (Deep MNIST for Experts | TensorFlow) is nicely summarized in “TensorFlowを遊び倒す！ 2-1. MNIST For Experts” (Platinum Data Blog by BrainPad), so I ran that.

Run as-is, it raised errors, so I made two changes:

Change line 4 from import input_data to from tensorflow.examples.tutorials.mnist import input_data.
Delete the full-width space that has slipped in just before the comment after strides=... (shown below).

    return tf.nn.max_pool(x,
                          ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1],　# the middle two are the vertical and horizontal strides
                          padding='SAME')
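Stray full-width spaces (U+3000, IDEOGRAPHIC SPACE) are easy to miss in an editor. The small Python sketch below lists their positions in the files given on the command line, for example the file you saved the tutorial code as, such as test_mnist.py. The helper `replace_fullwidth_spaces` swaps them for ASCII spaces everywhere, including inside strings and comments, so review the listed positions before using it:

```python
import io
import sys

IDEOGRAPHIC_SPACE = u"\u3000"  # the full-width space that breaks the script


def find_fullwidth_spaces(source):
    """Return 1-based (line, column) positions of U+3000 in the source text."""
    positions = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        col = line.find(IDEOGRAPHIC_SPACE)
        while col != -1:
            positions.append((lineno, col + 1))
            col = line.find(IDEOGRAPHIC_SPACE, col + 1)
    return positions


def replace_fullwidth_spaces(source):
    """Replace every U+3000 with an ASCII space (also inside strings/comments)."""
    return source.replace(IDEOGRAPHIC_SPACE, u" ")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with io.open(path, encoding="utf-8") as f:
            text = f.read()
        for lineno, col in find_fullwidth_spaces(text):
            print("%s:%d:%d: full-width space" % (path, lineno, col))
```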

Run the test code:

$ time python test_mnist.py
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:06:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-01-31 15:55:07: W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x3f8d150
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:06:00.0)
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)

(omitted)

step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
2017-01-31 16:07:51: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
test accuracy 0.9933

real    1m52.055s
user    2m30.684s
sys     0m34.948s

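It does seem to be running on the GPUs: the log shows TensorFlow devices /gpu:0 and /gpu:1 being created on the two GTX 1080s.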

Caffe

Install the dependencies as described in Caffe | Installation: Ubuntu.

$ sudo apt-get -y install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
$ sudo apt-get -y install --no-install-recommends libboost-all-dev

$ sudo apt-get -y install libatlas-base-dev libopenblas-dev

Install the other required packages:

$ sudo apt-get -y install cmake
$ sudo apt-get -y install liblmdb-dev libgflags-dev libgoogle-glog-dev doxygen
$ sudo apt-get -y install python-skimage

Build as described in Caffe | Installation:

$ git clone https://github.com/BVLC/caffe
$ sudo mv caffe /usr/local/src/

$ cd /usr/local/src/caffe
$ mkdir build
$ cd build
$ cmake ..
$ make all -j$(nproc)
$ make pycaffe -j$(nproc)
$ make install

Test

$ make runtest
$ make pytest

Running test code

Run it following Caffe | LeNet MNIST Tutorial:

$ cd /usr/local/src/caffe
$ ./data/mnist/get_mnist.sh
$ ./examples/mnist/create_mnist.sh

Run the training script:

$ ./examples/mnist/train_lenet.sh

Running nvidia-smi during training shows that a GPU (GPU 1 here) is in use:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:05:00.0     Off |                  N/A |
| 43%   42C    P2    36W / 220W |      2MiB /  8111MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:06:00.0     Off |                  N/A |
| 43%   45C    P2    99W / 220W |    245MiB /  8113MiB |     72%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      1320    C   ./build/tools/caffe                            243MiB |
+-----------------------------------------------------------------------------+

Installation

I wasn't sure of the proper way to install it, so for now I created a symbolic link and added it to the paths.

$ sudo ln -s /usr/local/src/caffe/build/install /usr/local/caffe

Set the paths in ~/.bashrc:

$ vim ~/.bashrc
export PATH="/usr/local/caffe/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/caffe/lib:$LD_LIBRARY_PATH"
export PYTHONPATH="/usr/local/caffe/python:$PYTHONPATH"
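Bash does not expand variables inside single quotes. A line like `export PATH='/usr/local/caffe/bin:$PATH'` therefore sets PATH to the literal text `/usr/local/caffe/bin:$PATH`, which drops /usr/bin and every other directory. Double quotes, as on the PYTHONPATH line, are needed. The small Python sketch below flags such lines in ~/.bashrc. It is a heuristic of my own, not a real shell parser:

```python
import os
import re

# `export NAME='...$...'`: single quotes keep bash from expanding $VAR.
SINGLE_QUOTED_EXPANSION = re.compile(r"^\s*export\s+(\w+)='[^']*\$[^']*'")


def single_quoted_exports(text):
    """Return (line number, variable name) for exports whose $ stays literal."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SINGLE_QUOTED_EXPANSION.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    bashrc = os.path.expanduser("~/.bashrc")
    if os.path.exists(bashrc):
        with open(bashrc) as f:
            for lineno, name in single_quoted_exports(f.read()):
                print("%s:%d: %s uses single quotes around $" % (bashrc, lineno, name))
```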

Chainer

Installing dependencies

$ sudo apt-get install libhdf5-dev python-h5py

Download

$ git clone https://github.com/pfnet/chainer.git
$ sudo mv chainer /usr/local/src/

Build

Following README.md, do the following.

Add the following to ~/.bashrc so that cuDNN can be found:

$ export CFLAGS=-I/usr/local/cuda/include
$ export LDFLAGS=-L/usr/local/cuda/lib64
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Install the dependencies:

$ sudo pip install cython pillow h5py

Install Chainer itself:

$ cd /usr/local/src/chainer
$ sudo CUDA_PATH=/usr/local/cuda pip install -e .

Check

$ python
>>> import chainer
>>>

Run the sample:

$ python /usr/local/src/chainer/examples/mnist/train_mnist.py --gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

Downloading from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz...
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.190653    0.0928411             0.9429         0.97                      12.4203
2           0.0732004   0.0702411             0.9773         0.9784                    14.8764
3           0.0492653   0.0659305             0.984616       0.9805                    17.2205
4           0.0347082   0.0761442             0.988432       0.98                      19.5639
5           0.0270766   0.0734653             0.991231       0.9775                    21.9413
6           0.0251563   0.072476              0.991882       0.9805                    24.2833
7           0.0199481   0.0731711             0.993382       0.9806                    26.6206
8           0.0196817   0.10124               0.993565       0.9754                    28.953
9           0.0164322   0.0876729             0.994548       0.9809                    31.2958
10          0.0154422   0.122923              0.995282       0.9742                    33.6844
11          0.0168969   0.110708              0.994832       0.9767                    36.0464
12          0.0107279   0.0856068             0.996665       0.9813                    38.3777
13          0.0117293   0.0990945             0.996215       0.9795                    40.7093
14          0.00978297  0.120509              0.996566       0.9777                    43.0557
15          0.0117472   0.107208              0.996866       0.9789                    45.4293
16          0.0113395   0.0992631             0.996682       0.981                     47.7726
17          0.00929236  0.0937098             0.997383       0.9828                    50.1157
18          0.0103718   0.104585              0.997149       0.9796                    52.4636
19          0.00638133  0.0990586             0.997999       0.981                     54.8122
20          0.00843088  0.105776              0.997666       0.9823                    57.1418

Checking with nvidia-smi while this is running:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:05:00.0     Off |                  N/A |
| 43%   42C    P0    45W / 220W |      2MiB /  8111MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:06:00.0     Off |                  N/A |
| 43%   36C    P2    41W / 220W |    117MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      3023    C   python                                         115MiB |
+-----------------------------------------------------------------------------+
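The table printed by train_mnist.py is whitespace-separated, and its elapsed_time column is cumulative. The small Python sketch below parses the table and reports the best validation accuracy and the seconds taken by each epoch. On the full 20-epoch run above it would report epoch 17 (0.9828) and roughly 2.3 to 2.5 s per epoch. The function names are hypothetical helpers, not part of Chainer:

```python
def parse_log_table(text):
    """Turn the epoch table printed by train_mnist.py into a list of dicts."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]
    return [dict(zip(header, [float(v) for v in row])) for row in rows[1:]]


def summarize(records):
    """Best validation accuracy and per-epoch durations (elapsed_time is cumulative)."""
    best = max(records, key=lambda r: r["validation/main/accuracy"])
    times = [r["elapsed_time"] for r in records]
    per_epoch = [later - earlier for earlier, later in zip(times, times[1:])]
    return int(best["epoch"]), best["validation/main/accuracy"], per_epoch


if __name__ == "__main__":
    # First three epochs copied from the log above.
    LOG = """\
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.190653    0.0928411             0.9429         0.97                      12.4203
2           0.0732004   0.0702411             0.9773         0.9784                    14.8764
3           0.0492653   0.0659305             0.984616       0.9805                    17.2205
"""
    print(summarize(parse_log_table(LOG)))
```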

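Final notes

For now, the installation succeeded and the samples ran.

Notes on programming, Linux, and more

Notes on pitfalls I hit in programming and Linux, and on setting up environments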
