






  • CPU: i7-6850K
  • M/B: ASUS X99-E WS
  • GPU: ELSA GD1080-8GERXG *2


  • Ubuntu 16.04 LTS
  • CUDA 8.0
  • cuDNN 5.1
  • TensorFlow 0.12.1
  • Caffe 1.0.0-rc4
  • Chainer 1.21.0 (probably)



$ sudo apt-get -y install resolvconf

$ sudo nmcli con mod eno1 ipv4.method manual
$ sudo nmcli con mod eno1 ipv4.address
$ sudo nmcli con mod eno1 ipv4.dns
$ sudo nmcli con mod eno1 ipv4.gateway

$ sudo nmcli con down eno1 && sudo nmcli con up eno1


$ sudo apt-get update
$ sudo apt-get -y upgrade
$ sudo apt-get -y dist-upgrade


$ sudo apt-get -y install openssh-server

$ sudo systemctl start sshd
$ sudo systemctl enable sshd




$ sudo dpkg -l | grep nvidia
$ sudo dpkg -l | grep cuda


Repository (Proprietary GPU Drivers : “Graphics Drivers” team)
Partway through, the installer asked whether to disable Secure Boot; I continued without disabling it.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update

$ sudo apt-get -y install nvidia-370


$ sudo reboot

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

After disabling Secure Boot in the BIOS (Boot -> Secure Boot, change OS Type to Other OS), the GPUs were recognized.

$ nvidia-smi
Mon Jan 16 17:24:34 2017
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 1080    Off  | 0000:05:00.0      On |                  N/A |
| 43%   37C    P8     9W / 220W |     52MiB /  8110MiB |      0%      Default |
|   1  GeForce GTX 1080    Off  | 0000:06:00.0     Off |                  N/A |
| 43%   32C    P8     8W / 220W |      1MiB /  8113MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0      1176    G   /usr/lib/xorg/Xorg                              50MiB |
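
To check from a script that both GPUs are visible, the same information can be requested in CSV form. This is a minimal sketch, assuming the installed nvidia-smi accepts --query-gpu and --format=csv,noheader,nounits; the helper names parse_gpu_csv and query_gpus are hypothetical and not part of any NVIDIA tool.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse 'index, name, memory.used, memory.total' CSV rows into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per detected GPU."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,name,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ])
    return parse_gpu_csv(out.decode("utf-8"))
```

On this machine, len(query_gpus()) would be expected to return 2 once the driver loads correctly.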


On CUDA 8.0 Downloads | NVIDIA Developer, select Linux -> x86_64 -> Ubuntu -> 16.04 -> runfile (local), download the file from the Download button, and install it as instructed.


$ sudo apt-get update
(omitted)
W: Invalid 'Date' entry in Release file /var/lib/apt/lists/partial/developer.download.nvidia.com_compute_cuda_repos_ubuntu1604_x86%5f64_Release

$ chmod u+x cuda_8.0.44_linux-run
$ sudo ./cuda_8.0.44_linux-run

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is "home directory" ]:

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_1804.log


$ sudo ./cuda_8.0.44_linux-run -silent -driver
$ sudo reboot


$ echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

$ source ~/.bashrc
$ sudo ldconfig

$ which nvcc



$ sudo apt-get install mesa-common-dev freeglut3-dev

$ cd NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
$ make

$ ./nbody -benchmark -numbodies=256000 -device=0
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "GeForce GTX 1080
> Compute 6.1 CUDA device: [GeForce GTX 1080]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 2365.614 ms
= 277.036 billion interactions per second
= 5540.718 single-precision GFLOP/s at 20 flops per interaction
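
The two throughput figures follow directly from the body count and the elapsed time. The sketch below redoes that arithmetic, assuming the benchmark counts every body pair (N x N interactions per iteration) and the 20 flops per interaction stated in the output; the function name nbody_throughput is hypothetical.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billions of interactions per second, GFLOP/s)."""
    # All-pairs assumption: each iteration evaluates num_bodies ** 2 interactions.
    interactions = float(num_bodies) * num_bodies * iterations
    seconds = total_ms / 1000.0
    giga_interactions = interactions / seconds / 1e9
    return giga_interactions, giga_interactions * flops_per_interaction


# Values taken from the benchmark output above.
ginter, gflops = nbody_throughput(256000, 10, 2365.614)
print("%.3f billion interactions per second" % ginter)
print("%.3f single-precision GFLOP/s" % gflops)
```

With the numbers from this run, the result agrees with the values the sample printed.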



$ python --version
Python 2.7.12

$ sudo apt-get -y install python-pip python-dev


From NVIDIA cuDNN | NVIDIA Developer, go to cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0 and download cuDNN v5.1 Library for Linux (registration required).

$ tar xf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo mv cuda/include/cudnn.h /usr/local/cuda/include
$ sudo mv cuda/lib64/libcudnn* /usr/local/cuda/lib64/



Install Bazel by following Installing Bazel - Bazel.

$ sudo apt-get -y install openjdk-8-jdk
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

$ sudo apt-get update
$ sudo apt-get -y install bazel



$ ls /usr/local/cuda/lib64/ | grep libcudnn

Be careful at the ./configure step: entering something like 5.1 as the cuDNN version produced the error below.
Invalid path to cuDNN toolkit. Neither of the following two files can be found: /usr/local/cuda-8.0/lib64/libcudnn.so.5.0 /usr/local/cuda-8.0/libcudnn.so.5.0 .5.0
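
The point of the ls check is to read the full version suffix off the installed library name before answering the configure prompt. A minimal sketch of that lookup follows; the file names in the example list are hypothetical stand-ins for a real directory listing, and cudnn_versions is an illustrative helper, not something TensorFlow provides.

```python
import re


def cudnn_versions(filenames):
    """Return the version suffixes found in libcudnn.so.* file names."""
    pattern = re.compile(r"^libcudnn\.so\.(\d+(?:\.\d+)*)$")
    versions = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            versions.append(match.group(1))
    # Put the suffix with the most components first: it is the fully qualified version.
    return sorted(versions, key=lambda v: len(v.split(".")), reverse=True)


# Hypothetical listing; on a real machine use os.listdir("/usr/local/cuda/lib64").
names = ["libcudnn.so", "libcudnn.so.5", "libcudnn.so.5.1.5", "libcudnn_static.a"]
print(cudnn_versions(names))
```

The first entry of the result is the string to try at the cuDNN version prompt.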

$ sudo apt-get -y install python-numpy

$ git clone https://github.com/tensorflow/tensorflow
$ sudo mv tensorflow /usr/local/src/

$ cd /usr/local/src/tensorflow


$ ./configure
Please specify the location of python. [Default is /usr/bin/python]:
Please specify optimization flags to use during compilation [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow


Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.1.5
Please specify the location where cuDNN 5.1.5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.

Specify 6.1 because a GTX 1080 is used

[Default is: "3.5,5.2"]: 6.1
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
INFO: All external dependencies fetched successfully.
Configuration finished


$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp27-cp27mu-linux_x86_64.whl


The code from the tutorial (Deep MNIST for Experts  |  TensorFlow) is nicely put together in TensorFlowを遊び倒す! 2-1. MNIST For Experts - Platinum Data Blog by BrainPad, so I ran that.


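  1. Change line 4 from import input_data to from tensorflow.examples.tutorials.mnist import input_data
  2. Remove the full-width space that has slipped in just before the comment after strides...
    return tf.nn.max_pool(x,
                          ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], # the middle two values are the vertical and horizontal strides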
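
A full-width space (U+3000) is hard to spot by eye, so it can help to search for it. The following is a small sketch of that check; find_fullwidth_spaces is a hypothetical helper, and the sample string is made up for illustration rather than copied from the blog's code.

```python
FULLWIDTH_SPACE = u"\u3000"


def find_fullwidth_spaces(source):
    """Return (line_number, column) pairs, both 1-based, for each U+3000."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char == FULLWIDTH_SPACE:
                hits.append((lineno, col))
    return hits


# Made-up sample with one full-width space just before the comment.
sample = u"strides=[1, 2, 2, 1],\u3000# comment\nx = 1\n"
print(find_fullwidth_spaces(sample))
```

To check the actual script, read test_mnist.py as UTF-8 text and pass its contents to the function.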


$ time python test_mnist.py
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
2017-01-31 15:54:53: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:05: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:06:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-01-31 15:55:07: W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x3f8d150
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:06:00.0)
2017-01-31 15:55:07: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)


step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
2017-01-31 16:07:51: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
test accuracy 0.9933

real    1m52.055s
user    2m30.684s
sys     0m34.948s



Install the dependencies as described in Caffe | Installation: Ubuntu.

$ sudo apt-get -y install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
$ sudo apt-get -y install --no-install-recommends libboost-all-dev

$ sudo apt-get -y install libatlas-base-dev libopenblas-dev


$ sudo apt-get -y install cmake
$ sudo apt-get -y install liblmdb-dev libgflags-dev libgoogle-glog-dev doxygen
$ sudo apt-get -y install python-skimage

Build as described in Caffe | Installation.

$ git clone https://github.com/BVLC/caffe
$ sudo mv caffe /usr/local/src/

$ cd /usr/local/src/caffe
$ mkdir build
$ cd build
$ cmake ..
$ make all -j$(nproc)
$ make pycaffe -j$(nproc)
$ make install


$ make runtest
$ make pytest


Run the example as described in Caffe | LeNet MNIST Tutorial.

$ cd /usr/local/src/caffe
$ ./data/mnist/get_mnist.sh
$ ./examples/mnist/create_mnist.sh


$ ./examples/mnist/train_lenet.sh


$ nvidia-smi
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 1080    Off  | 0000:05:00.0     Off |                  N/A |
| 43%   42C    P2    36W / 220W |      2MiB /  8111MiB |      0%      Default |
|   1  GeForce GTX 1080    Off  | 0000:06:00.0     Off |                  N/A |
| 43%   45C    P2    99W / 220W |    245MiB /  8113MiB |     72%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    1      1320    C   ./build/tools/caffe                            243MiB |



$ sudo ln -s /usr/local/src/caffe/build/install /usr/local/caffe


$ vim ~/.bashrc
export PATH="/usr/local/caffe/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/caffe/lib:$LD_LIBRARY_PATH"
export PYTHONPATH="/usr/local/caffe/python:$PYTHONPATH"



$ sudo apt-get install libhdf5-dev python-h5py


$ git clone https://github.com/pfnet/chainer.git
$ sudo mv chainer /usr/local/src/




$ export CFLAGS=-I/usr/local/cuda/include
$ export LDFLAGS=-L/usr/local/cuda/lib64
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH


$ sudo pip install cython pillow h5py


$ cd /usr/local/src/chainer
$ sudo CUDA_PATH=/usr/local/cuda pip install -e .


$ python
>>> import chainer


$ python /usr/local/src/chainer/examples/mnist/train_mnist.py --gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

Downloading from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz...
Downloading from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz...
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.190653    0.0928411             0.9429         0.97                      12.4203
2           0.0732004   0.0702411             0.9773         0.9784                    14.8764
3           0.0492653   0.0659305             0.984616       0.9805                    17.2205
4           0.0347082   0.0761442             0.988432       0.98                      19.5639
5           0.0270766   0.0734653             0.991231       0.9775                    21.9413
6           0.0251563   0.072476              0.991882       0.9805                    24.2833
7           0.0199481   0.0731711             0.993382       0.9806                    26.6206
8           0.0196817   0.10124               0.993565       0.9754                    28.953
9           0.0164322   0.0876729             0.994548       0.9809                    31.2958
10          0.0154422   0.122923              0.995282       0.9742                    33.6844
11          0.0168969   0.110708              0.994832       0.9767                    36.0464
12          0.0107279   0.0856068             0.996665       0.9813                    38.3777
13          0.0117293   0.0990945             0.996215       0.9795                    40.7093
14          0.00978297  0.120509              0.996566       0.9777                    43.0557
15          0.0117472   0.107208              0.996866       0.9789                    45.4293
16          0.0113395   0.0992631             0.996682       0.981                     47.7726
17          0.00929236  0.0937098             0.997383       0.9828                    50.1157
18          0.0103718   0.104585              0.997149       0.9796                    52.4636
19          0.00638133  0.0990586             0.997999       0.981                     54.8122
20          0.00843088  0.105776              0.997666       0.9823                    57.1418
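
The elapsed_time column is cumulative, so the steady-state cost of an epoch is the difference between two rows divided by the number of epochs between them. The sketch below does that arithmetic with the first and last rows of the table; seconds_per_epoch is a hypothetical helper name.

```python
def seconds_per_epoch(first_elapsed, last_elapsed, first_epoch, last_epoch):
    """Average seconds per epoch between two rows of the cumulative log."""
    return (last_elapsed - first_elapsed) / float(last_epoch - first_epoch)


# elapsed_time values for epoch 1 and epoch 20, taken from the table above.
steady = seconds_per_epoch(12.4203, 57.1418, 1, 20)
print("%.2f s per epoch after the first" % steady)
```

The first epoch alone took about 12.4 s, which likely includes one-time setup cost, so it is left out of the average.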


$ nvidia-smi
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 1080    Off  | 0000:05:00.0     Off |                  N/A |
| 43%   42C    P0    45W / 220W |      2MiB /  8111MiB |      0%      Default |
|   1  GeForce GTX 1080    Off  | 0000:06:00.0     Off |                  N/A |
| 43%   36C    P2    41W / 220W |    117MiB /  8113MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    1      3023    C   python                                         115MiB |


