
Commit

Fix incremental compilation issue
jetsonhacks committed Jan 16, 2017
1 parent b10c0e8 commit ba7113e
Showing 289 changed files with 36 additions and 1,137 deletions.
17 changes: 15 additions & 2 deletions README.md
@@ -1,6 +1,7 @@
# installTensorFlowTX1
December 28, 2016
Last modified Jan 12, 2017

Last modified Jan 15, 2017

Install TensorFlow r0.11 on NVIDIA Jetson TX1 Development Kit

@@ -54,5 +55,17 @@ $ time python tensorflow/models/image/mnist/convolutional.py

#### Build Issues

For various reasons, the build may fail. The 'debug' folder contains instructions on how to resume an incremental build.
The build may fail for various reasons. The 'debug' folder contains a more verbose version of the buildTensorFlow.sh script, which reports both what it is doing and any errors it encounters. See the debug directory for more details.

#### Notes
As of this writing (Jan 15, 2017), the TensorFlow repository has an issue that prevents incremental compilation from working correctly. The cause is in the file:

tensorflow/third_party/gpus/cuda_configure.bzl

Where the rule:

cuda_configure = repository_rule( implementation = _cuda_autoconf_impl, local = True, )

forces Bazel to rebuild the CUDA configuration on every run, which in turn breaks the incremental build process. The cloneTensorFlow.sh script patches the file to remove the local = True argument. Additionally, buildTensorFlow.sh exports TensorFlow environment variables that describe the CUDA setup of the Jetson TX1.

Since v0.11 was published, the zlib library it uses has moved to a new location. The cloneTensorFlow.sh script accounts for this as well by patching the library location.
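The contents of patches/cudaConfigureBazel.patch are not shown in this commit. As a rough sketch of the same idea, assuming the rule is written one argument per line (as in the debug notes below), GNU sed could delete the local = True argument from just that rule. The function name unlocal_cuda_configure is hypothetical; the scripts themselves apply the patch file instead.

```shell
#!/bin/bash
# Hypothetical helper (not part of this repository): remove "local = True,"
# from the cuda_configure repository_rule so Bazel stops re-running the
# CUDA autoconfiguration on every build.
# Assumes one argument per line and a closing ")" at the start of a line.
unlocal_cuda_configure() {
  local bzl="$1"
  # Restrict the deletion to the lines between the rule's opening line
  # and its closing ")", so other rules in the file are left alone.
  sed -i '/^cuda_configure = repository_rule(/,/^)/{/^[[:space:]]*local = True,[[:space:]]*$/d;}' "$bzl"
}

# Example (run from $HOME/tensorflow):
# unlocal_cuda_configure third_party/gpus/cuda_configure.bzl
```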
9 changes: 9 additions & 0 deletions buildTensorFlow.sh
@@ -1,6 +1,15 @@
#!/bin/bash
# NVIDIA Jetson TX1
# TensorFlow Installation
# Export TensorFlow GPU environment variables
# WARNING This needs to match setTensorFlowEV.sh settings
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=8.0
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=5.1.5
export CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu/
export TF_CUDA_COMPUTE_CAPABILITIES=5.3

# Build Tensorflow
cd $HOME/tensorflow
bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
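Because these exports must match setTensorFlowEV.sh, a quick pre-flight check can catch a missing variable before a long build starts. This is a minimal sketch: require_tf_env is a hypothetical helper, and it only checks that each variable is non-empty, not that its value matches setTensorFlowEV.sh.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of this repository): confirm that
# the variables exported by buildTensorFlow.sh are set before calling bazel.
require_tf_env() {
  local missing=0
  local var
  for var in TF_NEED_CUDA TF_CUDA_VERSION CUDA_TOOLKIT_PATH \
             TF_CUDNN_VERSION CUDNN_INSTALL_PATH TF_CUDA_COMPUTE_CAPABILITIES; do
    # ${!var:-} is bash indirect expansion: the value of the variable whose
    # name is stored in $var, or empty if that variable is unset.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, just before the bazel build line:
# require_tf_env || exit 1
```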
1 change: 1 addition & 0 deletions cloneTensorFlow.sh
@@ -11,6 +11,7 @@ cd tensorflow
git checkout r0.11
patch -p1 < $INSTALL_DIR/patches/tensorflow.patch
patch -p1 < $INSTALL_DIR/patches/bazelzlib.patch
patch -p1 < $INSTALL_DIR/patches/cudaConfigureBazel.patch
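cloneTensorFlow.sh applies each patch with a plain patch -p1, so re-running it on an already-patched tree typically makes patch report a reversed or previously applied patch. A sketch of an idempotent wrapper that tests with GNU patch's --dry-run first; apply_patch_once is a hypothetical name, not part of the repository.

```shell
#!/bin/bash
# Hypothetical helper (not part of this repository): apply a -p1 patch only
# if it has not been applied yet. Run from the root of the source tree.
apply_patch_once() {
  local patchfile="$1"
  # If the reverse patch applies cleanly, the patch is already in place.
  # -f suppresses interactive questions during the dry run.
  if patch -p1 -R --dry-run -s -f < "$patchfile" >/dev/null 2>&1; then
    echo "already applied: $patchfile"
    return 0
  fi
  # Otherwise make sure the forward patch applies cleanly before writing.
  if patch -p1 --dry-run -s -f < "$patchfile" >/dev/null 2>&1; then
    patch -p1 -s < "$patchfile"
  else
    echo "does not apply cleanly: $patchfile" >&2
    return 1
  fi
}

# Usage, in place of the plain patch lines:
# apply_patch_once "$INSTALL_DIR/patches/cudaConfigureBazel.patch"
```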



35 changes: 2 additions & 33 deletions debug/README.md
@@ -1,43 +1,12 @@
If you're looking at this document, something has probably gone wrong with the TensorFlow build. This assumes that the script 'buildTensorFlow.sh' failed to finish properly.

Here are a couple of tips:

STEP 1

First, as of this writing (Jan 12, 2017), running buildTensorFlow.sh a second time causes an error stating that GPU processing is not enabled, even though the CUDA flag is set in the bazel build line. This is due to an issue in the file:

tensorflow/third_party/gpus/cuda_configure.bzl

Where the rule:

cuda_configure = repository_rule(
implementation = _cuda_autoconf_impl,
local = True,
)

forces Bazel to rebuild the CUDA configuration on every run, which in turn breaks the incremental build. To fix that, run the patch script:

$ ./patchCUDAConfig.sh

If you ran buildTensorFlow.sh before patching, you will need to replace the local_config_cuda folder in:

/home/ubuntu/.cache/bazel/_bazel_ubuntu/<id>/external

with the one in this directory. Ideally, you would save the folder after it is generated and before it gets overwritten. Note: this may or may not work. The folder contains a 'bin' directory with symbolic links that may not match your system. If that is the case, you will have to run 'setTensorFlowEV.sh' and 'buildTensorFlow.sh' again and rebuild everything.
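The id component of the cache path varies per machine; running bazel info output_base inside the workspace prints that directory. A minimal sketch of saving the generated local_config_cuda folder right after it is created, as suggested above; backup_local_config_cuda and the backup location are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of this repository): copy Bazel's generated
# local_config_cuda repository out of the cache so it can be restored later.
backup_local_config_cuda() {
  local output_base="$1" dest="$2"
  local src="$output_base/external/local_config_cuda"
  if [ ! -d "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  # cp -a copies recursively and keeps symbolic links (such as those in the
  # generated 'bin' directory) as links instead of following them.
  cp -a "$src" "$dest/"
}

# Usage, from $HOME/tensorflow after the CUDA configuration is generated:
# backup_local_config_cuda "$(bazel info output_base)" "$HOME/cuda_config_backup"
```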

STEP 2

$ source exportEV.sh

This exports the required TensorFlow CUDA environment variables.
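STEP 2 uses source rather than running the script directly because exported variables only reach the current shell when the script is sourced; a script run as ./exportEV.sh executes in a child process whose environment disappears when it exits. A self-contained illustration, using a temporary file as a stand-in for exportEV.sh:

```shell
#!/bin/bash
# Illustration: exports from a script persist only when it is sourced.
demo=$(mktemp)                          # stand-in for exportEV.sh
echo 'export TF_NEED_CUDA=1' > "$demo"

unset TF_NEED_CUDA
bash "$demo"                            # child shell; its export is lost
echo "after running:  '${TF_NEED_CUDA:-}'"   # prints: after running:  ''
source "$demo"                          # current shell; export persists
echo "after sourcing: '${TF_NEED_CUDA:-}'"   # prints: after sourcing: '1'
rm -f "$demo"
```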

STEP 3

This directory contains a more verbose version of buildTensorFlow.sh.

$ ./buildTensorFlow.sh

This is a more verbose version of the original buildTensorFlow.sh in the parent directory. A file named 'explain.txt' is generated in the $HOME/tensorflow directory.

This should allow for incremental compilation at least, and show you where the build is failing.
This should show you where the build is failing.



9 changes: 9 additions & 0 deletions debug/buildTensorFlow.sh
@@ -2,6 +2,15 @@
# NVIDIA Jetson TX1
# TensorFlow Installation
# Build Tensorflow
# Export TensorFlow GPU environment variables
# WARNING This needs to match setTensorFlowEV.sh settings
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=8.0
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=5.1.5
export CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu/
export TF_CUDA_COMPUTE_CAPABILITIES=5.3

cd $HOME/tensorflow
bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --explain explain.txt --verbose_explanations --config=cuda //tensorflow/tools/pip_package:build_pip_package

11 changes: 0 additions & 11 deletions debug/exportEV.sh

This file was deleted.

2 changes: 0 additions & 2 deletions debug/local_config_cuda/WORKSPACE

This file was deleted.

44 changes: 0 additions & 44 deletions debug/local_config_cuda/crosstool/BUILD

This file was deleted.
