Demonstrated is an example which calculates the sum of two slices of float32 using SIMD instructions.
result[i] = a[i] + b[i]
The example is implemented in three different ways:
- code generated Assembler by avo (only x86 possible)
- compile and translate to Go Assembler by gocc (any architecture supported by clang)
- call SIMD instructions directly in Go code with go-simd (only ARM possible)
To install see gocc README
See kelindar/bitmap for an example using auto-vectorization in combination with gocc.
goos: linux
goarch: amd64
pkg: github.com/Crash129/go-simd-example
cpu: AMD Ryzen 7 5700X 8-Core Processor
BenchmarkAvo
BenchmarkAvo-16 3100104 383.8 ns/op 0 B/op 0 allocs/op
BenchmarkGocc
BenchmarkGocc-16 3770598 318.4 ns/op 0 B/op 0 allocs/op
BenchmarkPlainGo
BenchmarkPlainGo-16 666050 1797 ns/op 0 B/op 0 allocs/op
BenchmarkGoSIMD
main_test.go:49: no ARM SIMD support
--- SKIP: BenchmarkGoSIMD
PASS
Note that the assembler code of the avo
implementation has no optimizations.
avo
is able to have the same performance as gocc
but the generated assembler code of gocc
is already optimized by clang.
goos: darwin
goarch: arm64
pkg: github.com/Crash129/go-simd-example
BenchmarkAvo
main_test.go:28: no AVX2 support
--- SKIP: BenchmarkAvo
BenchmarkGocc
BenchmarkGocc-12 15680464 374.5 ns/op
BenchmarkPlainGo
BenchmarkPlainGo-12 3339758 1798 ns/op
BenchmarkGoSIMD
BenchmarkGoSIMD-12 4593692 1309 ns/op
PASS