Merge pull request #716 from LuxDL/ap/fix_amdtests_takeii
Try fixing AMDGPU test stalling
avik-pal authored Jun 19, 2024
2 parents 237831d + ad7acc0 commit f1b8c12
Showing 9 changed files with 38 additions and 40 deletions.
5 changes: 2 additions & 3 deletions .buildkite/testing.yml
@@ -64,7 +64,6 @@ steps:
- src
- ext
env:
-RETESTITEMS_NWORKERS: 0
JULIA_AMDGPU_CORE_MUST_LOAD: "1"
JULIA_AMDGPU_HIP_MUST_LOAD: "1"
JULIA_AMDGPU_DISABLE_ARTIFACTS: "1"
@@ -109,10 +108,10 @@ steps:


env:
JULIA_PKG_PRECOMPILE_AUTO: 0
JULIA_AMDGPU_LOGGING_ENABLED: true
RETESTITEMS_NWORKERS: 8
RETESTITEMS_NWORKER_THREADS: 2
-RETESTITEMS_TESTITEM_TIMEOUT: 10000
+RETESTITEMS_TESTITEM_TIMEOUT: 3600
JULIA_PKG_SERVER: ""
JULIA_NUM_THREADS: 4
SECRET_CODECOV_TOKEN: "jQ0BMTQgyZx7QGyU0Q2Ec7qB9mtE2q/tDu0FsfxvEG7/zOAGvXkyXrzIFFOQxvDoFcP+K2+hYZKMxicYdNqzr5wcxu505aNGN2GM3wyegAr+hO6q12bCFYx6qXzU9FLCCdeqINqn9gUSSOlGtWNFrbAlrTyz/D4Yo66TqBDzvaLL63FMnhCLaXW/zJt3hNuEAJaPY2O6Ze1rX2WZ3Y+i+s3uQ8aLImtoCJhPe8CRx+OhuYiTzGhynFfGntZ0738/1RN4gNM0S/hTC4gLE7XMVBanJpGh32rFaiDwW4zAyXKBrDkL3QA3MS1RvLTJxGJ085S16hCk0C4ddAhZCvIM9Q==;U2FsdGVkX1+bXdFeKMs5G79catOCyby2n07A2fg0FjVAvrjQLZ0yfvDS4paJiFikLkodho0khz2YALKb2Y0K6w=="
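
For reference, `RETESTITEMS_TESTITEM_TIMEOUT` is measured in seconds, so this change tightens the per-testitem limit from roughly 2.8 hours to 1 hour. These `RETESTITEMS_*` variables correspond to keyword arguments of `ReTestItems.runtests`; a rough Julia equivalent of the configuration above (a sketch, not code from this repository) would be:

```julia
using ReTestItems

# 8 worker processes with 2 threads each, and a 3600 s (1 hour)
# limit per @testitem before it is aborted.
ReTestItems.runtests("test/";
    nworkers = 8,
    nworker_threads = 2,
    testitem_timeout = 3600)
```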
2 changes: 2 additions & 0 deletions .github/workflows/CI.yml
@@ -149,6 +149,8 @@ jobs:
with:
version: ${{ matrix.version }}
- uses: julia-actions/julia-downgrade-compat@v1
+with:
+  skip: 'AMDGPU'
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
env:
2 changes: 1 addition & 1 deletion Project.toml
@@ -65,7 +65,7 @@ LuxZygoteExt = "Zygote"

[compat]
ADTypes = "0.2, 1"
AMDGPU = "0.8.4, 0.9"
AMDGPU = "0.8.4 - 0.9.4"
Adapt = "4"
Aqua = "0.8.4"
ArgCheck = "2.1"
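
Note the compat semantics here: `"0.8.4, 0.9"` admits every `0.9.x` release, while the hyphen range `"0.8.4 - 0.9.4"` caps AMDGPU at `0.9.4`. One way to sanity-check this is with Pkg's compat parser (`semver_spec` is internal, not public API, so treat this purely as an illustration):

```julia
using Pkg

spec = Pkg.Types.semver_spec("0.8.4 - 0.9.4")
@show v"0.9.4" in spec  # true: still allowed
@show v"0.9.5" in spec  # false: excluded by the new upper bound
```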
27 changes: 13 additions & 14 deletions README.md
@@ -25,16 +25,19 @@

</div>

-The 🔥 Deep Learning Framework
+<div align="center">
+<h2>Elegant & Performant Scientific Machine Learning in Julia</h2>
+<h3>A Pure Julia Deep Learning Framework designed for Scientific Machine Learning</h3>
+</div>

-## Installation
+## 💻 Installation

```julia
import Pkg
Pkg.add("Lux")
```

-## Getting Started
+## 🤸 Quickstart

```julia
using Lux, Random, Optimisers, Zygote
@@ -61,22 +64,18 @@ x = rand(rng, Float32, 128, 2) |> device
y, st = Lux.apply(model, x, ps, st)

# Gradients
-gs = gradient(p -> sum(Lux.apply(model, x, p, st)[1]), ps)[1]
+gs = only(gradient(p -> sum(first(Lux.apply(model, x, p, st))), ps))

# Optimization
st_opt = Optimisers.setup(Optimisers.Adam(0.0001), ps)
st_opt, ps = Optimisers.update(st_opt, ps, gs)
```
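
A note on the gradient line: `only(...)` replaces the earlier `[1]` indexing and unwraps the one-element tuple that `Zygote.gradient` returns (one entry per differentiated argument). To turn the single step above into a loop, reusing `model`, `ps`, `st`, and `st_opt` from the snippet (a sketch, not part of the README):

```julia
for epoch in 1:100
    # Recompute the gradient at the current parameters...
    gs = only(gradient(p -> sum(first(Lux.apply(model, x, p, st))), ps))
    # ...then take one Adam step; updated states are returned, never mutated.
    st_opt, ps = Optimisers.update(st_opt, ps, gs)
end
```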

-## Examples
+## 📚 Examples

Look in the [examples](/examples/) directory for self-contained usage examples. The [documentation](https://lux.csail.mit.edu) has examples sorted into proper categories.

-## Ecosystem
-
-Checkout our [Ecosystem](http://lux.csail.mit.edu/dev/ecosystem) page for more details.

-## Testing
+## 🧪 Testing

The full test suite of `Lux.jl` takes a long time to run; here's how to test a portion of the code.

@@ -90,7 +89,7 @@ For example, let's consider the tests for `SkipConnection`:

```julia
@testitem "SkipConnection" setup=[SharedTestSetup] tags=[:core_layers] begin
-.....
+...
end
```
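
Since the test groups key off these tags, you can also filter on a tag directly: `ReTestItems.runtests` accepts a `tags` keyword (a sketch, assuming you are in the root of the Lux.jl checkout):

```julia
using ReTestItems

# Run only the test items tagged :core_layers, e.g. the
# "SkipConnection" item above.
ReTestItems.runtests("tests/"; tags = [:core_layers])
```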

@@ -118,7 +117,7 @@ use [TestEnv.jl](https://github.com/JuliaTesting/TestEnv.jl) as follows. Start w
using TestEnv; TestEnv.activate(); using ReTestItems;

# Assuming you are in the main directory of Lux
ReTestItems.runtests("tests/"; name = <NAME OF THE TEST>)
ReTestItems.runtests("tests/"; name = "NAME OF THE TEST")
```

For the `SkipConnection` tests that would be:
@@ -127,11 +126,11 @@ For the `SkipConnection` tests that would be:
ReTestItems.runtests("tests/"; name = SkipConnection)
```

-## Getting Help
+## 🆘 Getting Help

For usage-related questions, please use [GitHub Discussions](https://github.com/LuxDL/Lux.jl/discussions) or the [JuliaLang Discourse (machine learning domain)](https://discourse.julialang.org/c/domain/ml/), which allow questions and answers to be indexed. To report bugs, use [GitHub issues](https://github.com/LuxDL/Lux.jl/issues) or, even better, send in a [pull request](https://github.com/LuxDL/Lux.jl/pulls).

-## Citation
+## 🧑‍🔬 Citation

If you found this library to be useful in academic work, then please cite:

2 changes: 1 addition & 1 deletion docs/src/api/Lux/autodiff.md
@@ -1,4 +1,4 @@
-# Automatic Differentiation
+# [Automatic Differentiation](@id autodiff-lux)

Lux is not an AD package, but it composes well with most of the AD packages available in the
Julia ecosystem. This document lists the current level of support for various AD packages in
11 changes: 3 additions & 8 deletions docs/src/introduction/index.md
@@ -4,16 +4,11 @@

Install [Julia v1.10 or above](https://julialang.org/downloads/). Lux.jl is available
through the Julia package manager. You can enter it by pressing `]` in the REPL and then
-typing
+typing `add Lux`. Alternatively, you can also do

-```julia
-pkg> add Lux
-```

-Alternatively, you can also do

```julia
-import Pkg; Pkg.add("Lux")
+import Pkg
+Pkg.add("Lux")
```

## Quickstart
21 changes: 13 additions & 8 deletions docs/src/introduction/overview.md
@@ -28,20 +28,24 @@ it both compiler and autodiff friendly.
[edge cases and limitations](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.destructure). Lux
forces users to make an explicit distinction between state variables and parameter
variables to avoid these issues. Also, it comes batteries-included for distributed training.

* **Sensible display of Custom Layers** -- Ever wanted to see PyTorch-like network printouts
or wondered how to extend the pretty printing of Flux's layers? Lux handles all of that
by default.

* **Truly immutable models** - No *unexpected internal mutations* since all layers are
-implemented as pure functions. All layers are also *deterministic* given the parameters and
-state: if a layer is supposed to be stochastic (say `Dropout`), the state must contain a
-seed which is then updated after the function call.
+implemented as pure functions. All layers are also *deterministic* given the parameters
+and state: if a layer is supposed to be stochastic (say [`Dropout`](@ref)), the state
+must contain a seed which is then updated after the function call.

* **Easy Parameter Manipulation** -- By separating parameter data and layer structures,
-Lux makes implementing `WeightNorm`, `SpectralNorm`, etc. downright trivial.
-Without this separation, it is much harder to pass such parameters
-around without mutations which AD systems don't like.
+Lux makes implementing [`WeightNorm`](@ref), `SpectralNorm`, etc. downright trivial.
+Without this separation, it is much harder to pass such parameters around without
+mutations which AD systems don't like.

+* **Wider AD Support** -- Lux has extensive support for most
+[AD systems in julia](@ref autodiff-lux), while Flux is mostly tied to Zygote (with some
+initial support for Enzyme).

* **Small Neural Networks on CPU** -- Lux is developed for training large neural networks.
For smaller architectures, we recommend using
@@ -58,3 +62,4 @@ it both compiler and autodiff friendly.
For these, Python frameworks like PyTorch and JAX are better suited.

* **XLA Support** -- Lux doesn't compile to XLA which means no TPU support unfortunately.
+We are actively working on XLA support via [Reactant.jl](https://github.com/EnzymeAD/Reactant.jl).
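
To make the "truly immutable models" bullet concrete, here is a hypothetical stateless layer in the spirit of (but much simpler than) Lux's `Dropout`: the RNG lives in the state, and an updated state is returned instead of anything being mutated.

```julia
using Random

struct ToyDropout  # hypothetical layer, for illustration only
    p::Float32
end

function apply_toydropout(d::ToyDropout, x, st)
    rng = copy(st.rng)  # leave the incoming state untouched
    mask = rand(rng, Float32, size(x)...) .> d.p
    y = @. x * mask / (1 - d.p)
    return y, (; rng)   # caller threads the new state into the next call
end

st = (; rng = Xoshiro(0))
y, st = apply_toydropout(ToyDropout(0.5f0), randn(Float32, 4, 4), st)
```

Given the same inputs and state, the call is fully deterministic; stochasticity enters only through the seed carried in the state.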
4 changes: 2 additions & 2 deletions test/runtests.jl
@@ -16,7 +16,7 @@ end
const EXTRA_PKGS = String[]

if ("all" in LUX_TEST_GROUP || "distributed" in LUX_TEST_GROUP)
-push!(EXTRA_PKGS, "MPI")
+BACKEND_GROUP != "amdgpu" && push!(EXTRA_PKGS, "MPI")
(BACKEND_GROUP == "all" || BACKEND_GROUP == "cuda") && push!(EXTRA_PKGS, "NCCL")
end
("all" in LUX_TEST_GROUP || "others" in LUX_TEST_GROUP) && push!(EXTRA_PKGS, "Flux")
@@ -41,7 +41,7 @@ for tag in LUX_TEST_GROUP
end

# Distributed Tests
if "all" in LUX_TEST_GROUP || "distributed" in LUX_TEST_GROUP
if ("all" in LUX_TEST_GROUP || "distributed" in LUX_TEST_GROUP) && BACKEND_GROUP != "amdgpu"
using MPI

nprocs_str = get(ENV, "JULIA_MPI_TEST_NPROCS", "")
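
The `EXTRA_PKGS` vector collects optional test-only dependencies per test group; the new guard keeps MPI off the AMDGPU job (and the second hunk skips the distributed tests there entirely), which is the stalling path this PR targets. Elsewhere in `runtests.jl` the list is presumably consumed roughly like this (a sketch, not part of this diff):

```julia
using Pkg

# Install whatever optional test dependencies were collected above.
if !isempty(EXTRA_PKGS)
    @info "Installing extra packages" EXTRA_PKGS
    Pkg.add(EXTRA_PKGS)  # Pkg.add accepts a Vector{String}
end
```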
4 changes: 1 addition & 3 deletions test/setup_modes.jl
@@ -1,6 +1,4 @@
-using Lux, LuxDeviceUtils, GPUArraysCore, Pkg
-
-GPUArraysCore.allowscalar(false)
+using Lux, LuxDeviceUtils

if !@isdefined(BACKEND_GROUP)
const BACKEND_GROUP = lowercase(get(ENV, "BACKEND_GROUP", "all"))
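
Because the backend is chosen purely from the environment, a local CPU-only run can be forced without touching any code; something like the following (a sketch using only standard `Pkg`/`Base` calls) works from the package root:

```julia
using Pkg

# "cpu" makes setup_modes.jl skip the CUDA/AMDGPU-only test modes.
withenv("BACKEND_GROUP" => "cpu") do
    Pkg.test("Lux")
end
```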

3 comments on commit f1b8c12

@avik-pal (Member, Author)

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/109318

Tip: Release Notes

Did you know you can add release notes too? Just add markdown-formatted text underneath the comment after the text "Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the release that TagBot creates, e.g.:

    @JuliaRegistrator register

    Release notes:

    ## Breaking changes

    - blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

    git tag -a v0.5.55 -m "<description of version>" f1b8c12942295e106ab28de6b13f5d3bc6de0ba6
    git push origin v0.5.55

@github-actions (Contributor)

Benchmark Results

| Benchmark suite | Current: f1b8c12 | Previous: 237831d | Ratio |
|---|---|---|---|
| Dense(2 => 2)/cpu/reverse/ReverseDiff (compiled)/(2, 128) | 3683.125 ns | 3694.375 ns | 1.00 |
| Dense(2 => 2)/cpu/reverse/Zygote/(2, 128) | 7288.666666666667 ns | 7175.4 ns | 1.02 |
| Dense(2 => 2)/cpu/reverse/Tracker/(2, 128) | 20909 ns | 21109 ns | 0.99 |
| Dense(2 => 2)/cpu/reverse/ReverseDiff/(2, 128) | 9847.3 ns | 9923.5 ns | 0.99 |
| Dense(2 => 2)/cpu/reverse/Flux/(2, 128) | 9238.375 ns | 8936.8 ns | 1.03 |
| Dense(2 => 2)/cpu/reverse/SimpleChains/(2, 128) | 4527.125 ns | 4492.25 ns | 1.01 |
| Dense(2 => 2)/cpu/reverse/Enzyme/(2, 128) | 1168.5407407407408 ns | 1164.4202898550725 ns | 1.00 |
| Dense(2 => 2)/cpu/forward/NamedTuple/(2, 128) | 1176.1526717557251 ns | 1112.5704225352113 ns | 1.06 |
| Dense(2 => 2)/cpu/forward/ComponentArray/(2, 128) | 1186.4857142857143 ns | 1178 ns | 1.01 |
| Dense(2 => 2)/cpu/forward/Flux/(2, 128) | 1782.859375 ns | 1797.4705882352941 ns | 0.99 |
| Dense(2 => 2)/cpu/forward/SimpleChains/(2, 128) | 179.37413073713492 ns | 180.1279554937413 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/ReverseDiff (compiled)/(20, 128) | 17342 ns | 17353 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Zygote/(20, 128) | 17022 ns | 17052 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Tracker/(20, 128) | 37380 ns | 37640 ns | 0.99 |
| Dense(20 => 20)/cpu/reverse/ReverseDiff/(20, 128) | 29484.5 ns | 29785 ns | 0.99 |
| Dense(20 => 20)/cpu/reverse/Flux/(20, 128) | 21770 ns | 21450 ns | 1.01 |
| Dense(20 => 20)/cpu/reverse/SimpleChains/(20, 128) | 17477.5 ns | 17402 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Enzyme/(20, 128) | 4316.571428571428 ns | 4325.142857142857 ns | 1.00 |
| Dense(20 => 20)/cpu/forward/NamedTuple/(20, 128) | 3864.625 ns | 3876 ns | 1.00 |
| Dense(20 => 20)/cpu/forward/ComponentArray/(20, 128) | 3923.5 ns | 3953.625 ns | 0.99 |
| Dense(20 => 20)/cpu/forward/Flux/(20, 128) | 4809 ns | 4953.428571428572 ns | 0.97 |
| Dense(20 => 20)/cpu/forward/SimpleChains/(20, 128) | 1660.1 ns | 1652.1 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 3, 128) | 39311146 ns | 47320777 ns | 0.83 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Zygote/(64, 64, 3, 128) | 57818439 ns | 58305356 ns | 0.99 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Tracker/(64, 64, 3, 128) | 70725143 ns | 102789420 ns | 0.69 |
| Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff/(64, 64, 3, 128) | 89020101 ns | 95601238 ns | 0.93 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Flux/(64, 64, 3, 128) | 72846612 ns | 78618619 ns | 0.93 |
| Conv((3, 3), 3 => 3)/cpu/reverse/SimpleChains/(64, 64, 3, 128) | 12056878.5 ns | 11718436 ns | 1.03 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Enzyme/(64, 64, 3, 128) | 17802524.5 ns | 17850771.5 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/NamedTuple/(64, 64, 3, 128) | 7028063 ns | 7036938 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/ComponentArray/(64, 64, 3, 128) | 7000092.5 ns | 7001847 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/Flux/(64, 64, 3, 128) | 9924699 ns | 11538399 ns | 0.86 |
| Conv((3, 3), 3 => 3)/cpu/forward/SimpleChains/(64, 64, 3, 128) | 6389608 ns | 6393151.5 ns | 1.00 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 16) | 737562829 ns | 751448443 ns | 0.98 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 64) | 2545549640 ns | 2573472812 ns | 0.99 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 2) | 146821325 ns | 144720323 ns | 1.01 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 16) | 868615027 ns | 968831713.5 ns | 0.90 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 64) | 3064060217 ns | 3278943882 ns | 0.93 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 2) | 219512795 ns | 234124583 ns | 0.94 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 16) | 685678726 ns | 746070446 ns | 0.92 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 64) | 2574375943 ns | 3009137266 ns | 0.86 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 2) | 127147427 ns | 132644713.5 ns | 0.96 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 16) | 171884482 ns | 174325259 ns | 0.99 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 64) | 650293250.5 ns | 647357046.5 ns | 1.00 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 2) | 34511836 ns | 34732801 ns | 0.99 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 16) | 164391167.5 ns | 164171075.5 ns | 1.00 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 64) | 634653416 ns | 641446171 ns | 0.99 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 2) | 29977086.5 ns | 30107004 ns | 1.00 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 16) | 185946798 ns | 189802799.5 ns | 0.98 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 64) | 765662897.5 ns | 799823428 ns | 0.96 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 2) | 35241726.5 ns | 38276609 ns | 0.92 |
| Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 64, 128) | 1245538918.5 ns | 1306917435 ns | 0.95 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Zygote/(64, 64, 64, 128) | 1864879281 ns | 1880303414 ns | 0.99 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Tracker/(64, 64, 64, 128) | 2293551179 ns | 2465824739 ns | 0.93 |
| Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff/(64, 64, 64, 128) | 2516850614 ns | 2587857217 ns | 0.97 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Flux/(64, 64, 64, 128) | 1882887952.5 ns | 1920389453.5 ns | 0.98 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Enzyme/(64, 64, 64, 128) | 561045265 ns | 561226426 ns | 1.00 |
| Conv((3, 3), 64 => 64)/cpu/forward/NamedTuple/(64, 64, 64, 128) | 326179109 ns | 325726548 ns | 1.00 |
| Conv((3, 3), 64 => 64)/cpu/forward/ComponentArray/(64, 64, 64, 128) | 323271956 ns | 323189696 ns | 1.00 |
| Conv((3, 3), 64 => 64)/cpu/forward/Flux/(64, 64, 64, 128) | 349888101 ns | 472300185.5 ns | 0.74 |
| Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 1, 128) | 11973548 ns | 11879578 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Zygote/(64, 64, 1, 128) | 17858872 ns | 18066903 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Tracker/(64, 64, 1, 128) | 19168560 ns | 19358439.5 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff/(64, 64, 1, 128) | 23865197 ns | 24037285 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Flux/(64, 64, 1, 128) | 17866720 ns | 18030067 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/reverse/SimpleChains/(64, 64, 1, 128) | 1158234 ns | 1161439 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Enzyme/(64, 64, 1, 128) | 5814007 ns | 5877613 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/forward/NamedTuple/(64, 64, 1, 128) | 2054540.5 ns | 2061078 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/forward/ComponentArray/(64, 64, 1, 128) | 2037248 ns | 2052642 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/forward/Flux/(64, 64, 1, 128) | 2078324 ns | 2085073 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/forward/SimpleChains/(64, 64, 1, 128) | 202510.5 ns | 207838 ns | 0.97 |
| Dense(200 => 200)/cpu/reverse/ReverseDiff (compiled)/(200, 128) | 293437.5 ns | 297415 ns | 0.99 |
| Dense(200 => 200)/cpu/reverse/Zygote/(200, 128) | 266057.5 ns | 267444.5 ns | 0.99 |
| Dense(200 => 200)/cpu/reverse/Tracker/(200, 128) | 365572 ns | 369540 ns | 0.99 |
| Dense(200 => 200)/cpu/reverse/ReverseDiff/(200, 128) | 407804 ns | 411308 ns | 0.99 |
| Dense(200 => 200)/cpu/reverse/Flux/(200, 128) | 275034 ns | 277337.5 ns | 0.99 |
| Dense(200 => 200)/cpu/reverse/SimpleChains/(200, 128) | 411080 ns | 409664.5 ns | 1.00 |
| Dense(200 => 200)/cpu/reverse/Enzyme/(200, 128) | 83504 ns | 83486 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/NamedTuple/(200, 128) | 81180.5 ns | 81302 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/ComponentArray/(200, 128) | 81631 ns | 85018 ns | 0.96 |
| Dense(200 => 200)/cpu/forward/Flux/(200, 128) | 86775.5 ns | 87734 ns | 0.99 |
| Dense(200 => 200)/cpu/forward/SimpleChains/(200, 128) | 104563 ns | 104626 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 16, 128) | 203633792 ns | 208418135 ns | 0.98 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Zygote/(64, 64, 16, 128) | 328082047.5 ns | 329863332.5 ns | 0.99 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Tracker/(64, 64, 16, 128) | 399733123 ns | 437868758 ns | 0.91 |
| Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff/(64, 64, 16, 128) | 429567326 ns | 473245652.5 ns | 0.91 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Flux/(64, 64, 16, 128) | 375921768 ns | 409579507.5 ns | 0.92 |
| Conv((3, 3), 16 => 16)/cpu/reverse/SimpleChains/(64, 64, 16, 128) | 328704380 ns | 338434555 ns | 0.97 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Enzyme/(64, 64, 16, 128) | 101203246 ns | 101758684 ns | 0.99 |
| Conv((3, 3), 16 => 16)/cpu/forward/NamedTuple/(64, 64, 16, 128) | 43990642 ns | 43942909 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/forward/ComponentArray/(64, 64, 16, 128) | 43821294.5 ns | 43793713 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/forward/Flux/(64, 64, 16, 128) | 53275150 ns | 57038485 ns | 0.93 |
| Conv((3, 3), 16 => 16)/cpu/forward/SimpleChains/(64, 64, 16, 128) | 28607335 ns | 28142581.5 ns | 1.02 |
| Dense(2000 => 2000)/cpu/reverse/ReverseDiff (compiled)/(2000, 128) | 19166105 ns | 19007086 ns | 1.01 |
| Dense(2000 => 2000)/cpu/reverse/Zygote/(2000, 128) | 19549447.5 ns | 19599865 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Tracker/(2000, 128) | 23387251 ns | 23608296 ns | 0.99 |
| Dense(2000 => 2000)/cpu/reverse/ReverseDiff/(2000, 128) | 24155491 ns | 24199216.5 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Flux/(2000, 128) | 19735654 ns | 19621295 ns | 1.01 |
| Dense(2000 => 2000)/cpu/reverse/Enzyme/(2000, 128) | 6562123 ns | 6523963 ns | 1.01 |
| Dense(2000 => 2000)/cpu/forward/NamedTuple/(2000, 128) | 6547446.5 ns | 6565571 ns | 1.00 |
| Dense(2000 => 2000)/cpu/forward/ComponentArray/(2000, 128) | 6511687 ns | 6584434 ns | 0.99 |
| Dense(2000 => 2000)/cpu/forward/Flux/(2000, 128) | 6536680 ns | 6525087.5 ns | 1.00 |

This comment was automatically generated by a workflow using github-action-benchmark.
