
v1: initial release

github-actions released this on 19 Jan 11:27 (1 commit to master since this release)

benchmark

each cell: throughput in fps / device memory usage in MiB

format      knlm          nlm_cuda (1 stream)   nlm_cuda (2 streams)
GRAY16      65.55 / 322   77.40 / 294           98.70 / 362
YUV444P16   34.57 / 406   39.05 / 342           57.85 / 458
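the relative gains can be read off the table with a short python script. it only reuses the numbers above; comparing everything against knlm and rounding speedups to two decimals are choices made for this illustration.

```python
# Benchmark figures copied from the table: (fps, device memory in MiB).
RESULTS = {
    "GRAY16": {
        "knlm": (65.55, 322),
        "nlm_cuda (1 stream)": (77.40, 294),
        "nlm_cuda (2 streams)": (98.70, 362),
    },
    "YUV444P16": {
        "knlm": (34.57, 406),
        "nlm_cuda (1 stream)": (39.05, 342),
        "nlm_cuda (2 streams)": (57.85, 458),
    },
}


def relative_to_knlm(results):
    """Return {format: {impl: (fps ratio vs knlm, memory delta vs knlm in MiB)}}."""
    out = {}
    for fmt, impls in results.items():
        base_fps, base_mem = impls["knlm"]
        out[fmt] = {
            name: (round(fps / base_fps, 2), mem - base_mem)
            for name, (fps, mem) in impls.items()
        }
    return out


for fmt, rows in relative_to_knlm(RESULTS).items():
    for name, (speedup, dmem) in rows.items():
        print(f"{fmt:10} {name:22} {speedup:.2f}x  {dmem:+d} MiB")
```

on these numbers, a single stream gives about 1.18x (GRAY16) and 1.13x (YUV444P16) the throughput of knlm while using less device memory; a second stream raises this to about 1.51x and 1.67x at the cost of 40 and 52 MiB more than knlm.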

settings: 1920x1080 input, d=1, a=2, s=4; compared against KNLMeansCL 1.1.1
system: NVIDIA A10 (ECC disabled), vapoursynth-classic R57.A7, Windows Server 2022
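to put these settings in perspective, the per-pixel workload of a naive NLM filter can be counted. this assumes d, a and s follow the KNLMeansCL convention (d: temporal radius, so d=1 uses 3 frames; a: search-window radius, so a=2 uses 25 pixels; s: similarity-window radius, so s=4 uses 81 pixels). the helper names are hypothetical, and the count does not describe how either plugin actually schedules its work.

```python
def window(radius: int) -> int:
    """Samples along one axis of a window with the given radius."""
    return 2 * radius + 1


def nlm_work(width: int, height: int, d: int, a: int, s: int) -> dict:
    """Naive NLM cost, assuming KNLMeansCL-style d/a/s radii."""
    frames = window(d)                 # d past + current + d future frames
    offsets = frames * window(a) ** 2  # candidate positions per pixel (incl. centre)
    patch = window(s) ** 2             # pixels compared per candidate patch
    pixels = width * height
    return {
        "offsets_per_pixel": offsets,
        "patch_pixels": patch,
        # naive: every candidate patch compared pixel by pixel
        "naive_diffs_per_frame": pixels * offsets * patch,
    }


work = nlm_work(1920, 1080, d=1, a=2, s=4)
print(work)
```

the naive count (75 candidate positions times 81 compared pixels, about 1.26e10 squared differences per frame) is why NLM implementations commonly evaluate each offset with separable box sums (a horizontal and a vertical pass) instead, which makes the cost per offset independent of s.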

  • performance contributors:

    • kernel fusion (the distance computation is fused with the horizontal pass)
    • faster PCIe transfers using pinned (page-locked) host memory
    • reduced synchronization overhead using a CUDA-only warp-level synchronization primitive
    • ahead-of-time kernel compilation
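  • num_streams cannot be emulated efficiently for knlm, because knlm uses pageable host allocations, and transfers from pageable memory cannot overlap with kernel execution the way pinned-memory transfers can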
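a minimal pure-python sketch of what fusing the distance computation with the horizontal pass means, assuming the common NLM structure (per-offset squared differences followed by separable box sums) and clamp-to-edge borders. the function names are hypothetical and this is not the plugin's CUDA code; it only shows that the fused version gives the same result without materializing the intermediate distance buffer.

```python
def clamp(i, lo, hi):
    return max(lo, min(i, hi))


def distance_pass(src, dx, dy):
    """Unfused step 1: squared difference to the pixel shifted by (dx, dy), stored in a buffer."""
    h, w = len(src), len(src[0])
    return [
        [(src[y][x] - src[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]) ** 2
         for x in range(w)]
        for y in range(h)
    ]


def horizontal_pass(dist, s):
    """Unfused step 2: horizontal box sum of radius s over the distance buffer."""
    h, w = len(dist), len(dist[0])
    return [
        [sum(dist[y][clamp(x + k, 0, w - 1)] for k in range(-s, s + 1)) for x in range(w)]
        for y in range(h)
    ]


def fused_distance_horizontal(src, dx, dy, s):
    """Fused: squared differences are computed on the fly inside the horizontal sum,
    so no intermediate distance buffer is written."""
    h, w = len(src), len(src[0])

    def d2(y, x):
        return (src[y][x] - src[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]) ** 2

    return [
        [sum(d2(y, clamp(x + k, 0, w - 1)) for k in range(-s, s + 1)) for x in range(w)]
        for y in range(h)
    ]


src = [[(3 * x + 5 * y) % 11 for x in range(7)] for y in range(5)]
print(fused_distance_horizontal(src, 1, -2, 2) == horizontal_pass(distance_pass(src, 1, -2), 2))
```

on a GPU the benefit of this kind of fusion comes from skipping the global-memory write and re-read of the intermediate buffer (and one kernel launch per offset); the python sketch only demonstrates that both orderings compute the same values.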