Fix incremental compilation issue

jetsonhacks · Jan 16, 2017 · ba7113e
1 parent b10c0e8

Showing 289 changed files with 36 additions and 1,137 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,7 @@
 # installTensorFlowTX1
 December 28, 2016 
-Last modified Jan 12, 2017
+
+Last modified Jan 15, 2017
 
 Install TensorFlow r0.11 on NVIDIA Jetson TX1 Development Kit
 
@@ -54,5 +55,17 @@ $ time python tensorflow/models/image/mnist/convolutional.py
 
 #### Build Issues
 
-For various reasons, the build may fail. The 'debug' folder contains instructions on how to resume an incremental build. 
+The build may fail for various reasons. The 'debug' folder contains a more verbose version of the buildTensorFlow.sh script, which reports both what it is doing and the errors it encounters. See the debug directory for more details.
+
+#### Notes
+As of this writing (Jan 15, 2017) the TensorFlow repository has an issue that prevents incremental compilation from working correctly. The cause is in the file:
+
+tensorflow/third_party/gpus/cuda_configure.bzl
+
+Where the rule:
+
+cuda_configure = repository_rule( implementation = _cuda_autoconf_impl, local = True, )
+
+forces Bazel to always rebuild the CUDA configuration, which in turn breaks the incremental build process. The cloneTensorFlow.sh script patches the file to remove the local = True statement. Additionally, buildTensorFlow.sh sets the TensorFlow environment variables to reflect the CUDA structure of the Jetson TX1.
 
+Since v0.11 was published, the location of the zlib library being used has moved. This is also taken into account by the cloneTensorFlow.sh script, which patches the library location.
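
Review note: the contents of patches/cudaConfigureBazel.patch are not shown in this diff. To illustrate the effect described in the Notes above, here is a minimal sketch that removes the `local = True` attribute from a scratch copy of the rule. It assumes the rule is laid out one attribute per line, as in the previous debug/README.md. The helper name `strip_local_attr` is hypothetical, and this is not the author's patch.

```shell
#!/bin/bash
# Sketch only: mimic the effect of removing `local = True` from the
# cuda_configure rule. Works on a scratch file, not on the real
# tensorflow/third_party/gpus/cuda_configure.bzl.

# strip_local_attr FILE (hypothetical helper): delete any line that
# consists solely of `local = True,` with optional surrounding whitespace.
strip_local_attr() {
    sed -E '/^[[:space:]]*local[[:space:]]*=[[:space:]]*True,?[[:space:]]*$/d' "$1" > "$1.tmp" \
        && mv "$1.tmp" "$1"
}

# Scratch copy of the rule as quoted in the README
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
cuda_configure = repository_rule(
    implementation = _cuda_autoconf_impl,
    local = True,
)
EOF

strip_local_attr "$scratch"
cat "$scratch"
```

In the repository itself the change is applied with `patch -p1` from cloneTensorFlow.sh; the sed form here is only for illustration.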
diff --git a/buildTensorFlow.sh b/buildTensorFlow.sh
@@ -1,6 +1,15 @@
 #!/bin/bash
 # NVIDIA Jetson TX1
 # TensorFlow Installation
+# Export TensorFlow GPU environment variables
+# WARNING This needs to match setTensorFlowEV.sh settings
+export TF_NEED_CUDA=1
+export TF_CUDA_VERSION=8.0
+export CUDA_TOOLKIT_PATH=/usr/local/cuda
+export TF_CUDNN_VERSION=5.1.5
+export CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu/
+export TF_CUDA_COMPUTE_CAPABILITIES=5.3
+
 # Build Tensorflow
 cd $HOME/tensorflow
 bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
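
Review note: since the exports above are required to match setTensorFlowEV.sh (whose contents are not part of this diff), a guard that fails fast when a variable is missing might be useful before invoking bazel. The following is a rough sketch under that assumption. `check_tf_env` is a hypothetical helper and is not part of this commit.

```shell
#!/bin/bash
# Sketch only: verify that every variable exported by buildTensorFlow.sh
# is present and non-empty before starting a long build.

# check_tf_env (hypothetical helper): reports each missing variable on
# stderr and returns 1 if any are unset or empty, 0 otherwise.
check_tf_env() {
    local missing=0 name
    for name in TF_NEED_CUDA TF_CUDA_VERSION CUDA_TOOLKIT_PATH \
                TF_CUDNN_VERSION CUDNN_INSTALL_PATH TF_CUDA_COMPUTE_CAPABILITIES; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        if [ -z "${!name:-}" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Values as exported in this commit
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=8.0
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=5.1.5
export CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu/
export TF_CUDA_COMPUTE_CAPABILITIES=5.3

check_tf_env && echo "TensorFlow CUDA environment looks complete"
```

This only checks that the variables are set. It does not confirm that the values agree with setTensorFlowEV.sh or with the libraries actually installed on the device.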

diff --git a/cloneTensorFlow.sh b/cloneTensorFlow.sh
@@ -11,6 +11,7 @@ cd tensorflow
 git checkout r0.11
 patch -p1 < $INSTALL_DIR/patches/tensorflow.patch
 patch -p1 < $INSTALL_DIR/patches/bazelzlib.patch
+patch -p1 < $INSTALL_DIR/patches/cudaConfigureBazel.patch
 
 
 

diff --git a/debug/README.md b/debug/README.md
@@ -1,43 +1,12 @@
 If you're looking at this document, something has probably gone wrong with the TensorFlow build. This assumes that the script 'buildTensorFlow.sh' failed to finish properly.
 
-Here's a couple of tips:
-
-STEP 1
-
-First, as of this writing (Jan 12, 2017) running buildTensorFlow.sh a second time causes an error stating that the GPU processing is not enabled, though the CUDA flag is true in the bazel build line. This is due to an issue in the file:
-
-tensorflow/third_party/gpus/cuda_configure.bzl
-
-Where the rule:
-
-cuda_configure = repository_rule(
-    implementation = _cuda_autoconf_impl,
-    local = True,
-)
-
-forces Bazel to always rebuild the CUDA configuration, which in turn foobars the incremental build. In order to fix that, run the patch:
-
-$ ./patchCUDAConfig.sh
-
-If you attempted to run buildTensorFlow.sh before patching, you will need to replace the folder local_config_cuda in the folder:
-
-/home/ubuntu/.cache/bazel/_bazel_ubuntu/<id>/external
-
-with the one in this directory. Ideally, you would save the folder after it is generated, and before it gets overwritten. Note: This may work, or it may not. There is a 'bin' directory which has symbolic pointers which may not match your system. If that is the case, you will have to run 'setTensorFlowEV.sh' and 'buildTensorFlow.sh' again and rebuild everything.
-
-STEP 2
-
-$ source exportEV.sh
-
-Which exports the needed TF CUDA Environment variables
-
-STEP 3
+This directory contains a more verbose version of buildTensorFlow.sh:
 
 $ ./buildTensorFlow.sh
 
 This is a more verbose version of the original buildTensorFlow.sh in the parent directory. A file named 'explain.txt' is generated in the $HOME/tensorflow directory.
 
-This should allow for incremental compilation at least, and show you where the build is failing.
+This should show you where the build is failing.
 
 
 

diff --git a/debug/buildTensorFlow.sh b/debug/buildTensorFlow.sh
@@ -2,6 +2,15 @@
 # NVIDIA Jetson TX1
 # TensorFlow Installation
 # Build Tensorflow
+# Export TensorFlow GPU environment variables
+# WARNING This needs to match setTensorFlowEV.sh settings
+export TF_NEED_CUDA=1
+export TF_CUDA_VERSION=8.0
+export CUDA_TOOLKIT_PATH=/usr/local/cuda
+export TF_CUDNN_VERSION=5.1.5
+export CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu/
+export TF_CUDA_COMPUTE_CAPABILITIES=5.3
+
 cd $HOME/tensorflow
 bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --explain explain.txt --verbose_explanations --config=cuda //tensorflow/tools/pip_package:build_pip_package
 
diff --git a/debug/exportEV.sh b/debug/exportEV.sh
diff --git a/debug/local_config_cuda/WORKSPACE b/debug/local_config_cuda/WORKSPACE
diff --git a/debug/local_config_cuda/crosstool/BUILD b/debug/local_config_cuda/crosstool/BUILD