Fix Linear layer bias gradient computation; add size checks to CUDA functions #170

hweom · 2022-07-28T00:59:53Z

What does this PR accomplish?

🩹 Bug Fix

Closes #169

Changes proposed by this PR:

Add an assert to CUDA copy() function to check that source and destination have the same size.
Add an assert to CUDA gemm() function to check that computed matrix multiplication dimensions match the passed tensor sizes.
Fix the Linear layer bias gradient computation by summing all gradients in the batch instead of copying the output gradient to the bias gradient (which is incorrect since the output gradient contains N items, not 1).

After this change, cuda-memcheck no longer finds errors.

Notes to reviewer:

All unit tests continue to pass except ui which is also broken at HEAD.

📜 Checklist

The juice-examples run just fine

…ear logic

drahnr

Looks good, the asserts are a good addition, we might want to relax them just a little bit to allow for more niche use cases.

coaster-blas/src/frameworks/cuda/helper.rs

juice/src/layers/common/linear.rs

coaster-blas/src/frameworks/cuda/helper.rs

drahnr

The only thing that fails is the ui tests. Other than that, LGTM!

hweom · 2022-08-11T02:41:11Z

Test fixes are in #172.

Co-authored-by: Bernhard Schuster <[email protected]>

drahnr · 2022-08-11T07:27:34Z

Could you rebase this PR onto master? I'll merge right away then

hweom · 2022-08-14T01:44:55Z

Done.

* Implement Sequential layer * Fix coaster UI tests (rustc error messages changed in 1.62 (#172) Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com> * Fix Linear layer bias gradient computation; add size checks to CUDA functions (#170) * Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic * Check output matrix dims in GEMM; fix corresponding Linear layer logic * Update coaster-blas/src/frameworks/cuda/helper.rs Co-authored-by: Bernhard Schuster <[email protected]> Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com> Co-authored-by: Bernhard Schuster <[email protected]> * More ergonomic net creation and fallible Sequential constructor * Fix merge mistake in commit 6952a49 Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com> Co-authored-by: Bernhard Schuster <[email protected]>

* Implement Sequential layer * Fix coaster UI tests (rustc error messages changed in 1.62 (fff-rs#172) Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com> * Fix Linear layer bias gradient computation; add size checks to CUDA functions (fff-rs#170) * Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic * Check output matrix dims in GEMM; fix corresponding Linear layer logic * Update coaster-blas/src/frameworks/cuda/helper.rs Co-authored-by: Bernhard Schuster <[email protected]> Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com> Co-authored-by: Bernhard Schuster <[email protected]> * More ergonomic net creation and fallible Sequential constructor * Fix merge mistake in commit 6952a49 Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com> Co-authored-by: Bernhard Schuster <[email protected]>

* Fix coaster UI tests (rustc error messages changed in 1.62 (#172) * Fix Linear layer bias gradient computation; add size checks to CUDA functions (#170) * Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic * Check output matrix dims in GEMM; fix corresponding Linear layer logic * Update coaster-blas/src/frameworks/cuda/helper.rs * Fix merge mistake in commit 6952a49 (#173) * doc: clarify remote test (#175) * bump rust-bindgen to 0.60.1, bump cargo lock file (#174) * build(deps): bump capnp from 0.14.9 to 0.14.11 (#179) Bumps [capnp](https://github.com/capnproto/capnproto-rust) from 0.14.9 to 0.14.11. - [Release notes](https://github.com/capnproto/capnproto-rust/releases) - [Commits](capnproto/capnproto-rust@capnp-v0.14.9...capnp-v0.14.11) --- updated-dependencies: - dependency-name: capnp dependency-type: direct:production ... * build(deps): bump tokio from 1.21.0 to 1.23.1 (#183) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.0 to 1.23.1. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](tokio-rs/tokio@tokio-1.21.0...tokio-1.23.1) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production ... * build(deps): bump bumpalo from 3.11.0 to 3.12.0 (#187) Bumps [bumpalo](https://github.com/fitzgen/bumpalo) from 3.11.0 to 3.12.0. - [Release notes](https://github.com/fitzgen/bumpalo/releases) - [Changelog](https://github.com/fitzgen/bumpalo/blob/main/CHANGELOG.md) - [Commits](fitzgen/bumpalo@3.11.0...3.12.0) --- updated-dependencies: - dependency-name: bumpalo dependency-type: indirect ... * build(deps): bump tokio from 1.23.1 to 1.24.2 (#191) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.23.1 to 1.24.2. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/commits) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production ... * Now also saves bias layers (#193) * build(deps): bump openssl from 0.10.41 to 0.10.48 Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.41 to 0.10.48. - [Release notes](https://github.com/sfackler/rust-openssl/releases) - [Commits](sfackler/rust-openssl@openssl-v0.10.41...openssl-v0.10.48) updated-dependencies: - dependency-name: openssl dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> * Do not pass batch_size to cudnnGetRNNParamsSize(). * Add a feature for deterministic (pseudo)randomizing. * New network architecture pieces: Layer, Descriptor, Context, Network (#165) * New network architecture pieces: Layer, Descriptor, Context, Network * Update juice/src/net/descriptor.rs * Implement Sequential layer for the new architecture (#168) * Implement Sequential layer * Fix coaster UI tests (rustc error messages changed in 1.62 (#172) * Fix Linear layer bias gradient computation; add size checks to CUDA functions (#170) * Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic * Check output matrix dims in GEMM; fix corresponding Linear layer logic * Update coaster-blas/src/frameworks/cuda/helper.rs * More ergonomic net creation and fallible Sequential constructor * Fix merge mistake in commit 6952a49 * Add a few more layers to the new architecture (#176) * Add trainer subsystem with SGD and Adam optimizers (#177) * Coaster convolution API cleanup (#178) * Move Convolution workspace into context * Implement Convolution, Dropout and Pooling layers (#180) * Move Convolution workspace into context * Formatting fixes * Fixed unit tests * Partial implementation of the Convolution layer * Implement the remaining parts for Convolution layer * Implement dropout and pooling layers * Fix CUDA tensor descriptor size error and adjust layer testing infra * Extended debug output for layers with custom Debug impl * Add softmax layers and convert MNIST example (#184) * Move Convolution workspace into context * Formatting fixes * Fixed unit tests * Partial implementation of the Convolution layer * Implement the remaining parts for Convolution layer * Implement dropout and pooling layers * Fix CUDA tensor descriptor size error and adjust layer testing infra * Extended debug output for layers with custom Debug impl * Changed mnist example to the new architecture * Plumbed the momentum arg in the mnist example * Implemented softmax and logsoftmax layers * Remove unnecessary NLL parameter and fix mnist example * Fix native backend softmax and logsoftmax grad computation * Changed slicing syntax in native backend softmax functions * Convert juice benchtests to Criterion (#192) * Convert Juice benchmarks to Criterion * Add newline at the end of Cargo.toml * Made Layer operations return a Result (#186) * Made Layer operations return a Result * Change LayerError to contain Boxes * Update benchmarks for new layer API * Simplify new_rnn_config() Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com> Co-authored-by: Bernhard Schuster <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: opfromthestart <[email protected]>

Assert the correct tensor sizes in copy() and gemm(); fix related Lin…

a36daeb

…ear logic

drahnr approved these changes Jul 28, 2022

View reviewed changes

coaster-blas/src/frameworks/cuda/helper.rs Show resolved Hide resolved

juice/src/layers/common/linear.rs Outdated Show resolved Hide resolved

Check output matrix dims in GEMM; fix corresponding Linear layer logic

e1dafaf

drahnr reviewed Aug 4, 2022

View reviewed changes

coaster-blas/src/frameworks/cuda/helper.rs Outdated Show resolved Hide resolved

drahnr approved these changes Aug 4, 2022

View reviewed changes

Update coaster-blas/src/frameworks/cuda/helper.rs

e33d732

Co-authored-by: Bernhard Schuster <[email protected]>

Merge branch 'master' into copy-fix

1f872c4

drahnr merged commit 6952a49 into fff-rs:master Aug 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Linear layer bias gradient computation; add size checks to CUDA functions #170

Fix Linear layer bias gradient computation; add size checks to CUDA functions #170

hweom commented Jul 28, 2022

drahnr left a comment

drahnr left a comment

hweom commented Aug 11, 2022

drahnr commented Aug 11, 2022

hweom commented Aug 14, 2022

Fix Linear layer bias gradient computation; add size checks to CUDA functions #170

Fix Linear layer bias gradient computation; add size checks to CUDA functions #170

Conversation

hweom commented Jul 28, 2022

What does this PR accomplish?

Changes proposed by this PR:

Notes to reviewer:

📜 Checklist

drahnr left a comment

Choose a reason for hiding this comment

drahnr left a comment

Choose a reason for hiding this comment

hweom commented Aug 11, 2022

drahnr commented Aug 11, 2022

hweom commented Aug 14, 2022