
Reduce overhead of using libraries #13

Open
hobofan opened this issue Dec 3, 2015 · 3 comments
hobofan commented Dec 3, 2015

We currently still have a significant overhead compared to directly calling a library implementation. As far as I can tell from profiling, most of that overhead is due to dynamic dispatch, which in some cases can only be removed with a bigger restructuring of the library.

Any input on where/how performance can be improved is highly appreciated! :)
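As an editorial sketch of the dispatch difference described above (the `Backend` trait and `Native` type here are invented for illustration, not the library's actual API): a call through a Rust trait object is resolved via a vtable at runtime, while a generic function is monomorphized per concrete type and can be inlined.

```rust
// Hypothetical backend trait, loosely modeled on the discussion above.
trait Backend {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32;
}

struct Native;

impl Backend for Native {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

// Dynamic dispatch: the method is looked up through a vtable at runtime.
fn dot_dyn(backend: &dyn Backend, a: &[f32], b: &[f32]) -> f32 {
    backend.dot(a, b)
}

// Static dispatch: monomorphized per concrete type; the call can be inlined.
fn dot_static<B: Backend>(backend: &B, a: &[f32], b: &[f32]) -> f32 {
    backend.dot(a, b)
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    // Both compute 1*4 + 2*5 + 3*6 = 32.
    assert_eq!(dot_dyn(&Native, &a, &b), 32.0);
    assert_eq!(dot_static(&Native, &a, &b), 32.0);
}
```

The results are identical; the difference is purely in how the call is resolved, which is what the profiling overhead discussion is about.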

@hobofan hobofan changed the title Improve overhead of using libraries Reduce overhead of using libraries Dec 3, 2015
hobofan added a commit that referenced this issue Dec 3, 2015
@bklooste

I don't know how Rust does it, but a virtual (dynamic) dispatch into a shared/dynamic lib in C++ does a lookup on each call. Good JIT runtimes use polymorphic inline caches to avoid this at runtime, but obviously that's not an option here. Your only options are to ensure the calls happen rarely (e.g. chunky calls / send multiple commands per call), change the library architecture, or possibly statically link the lib.
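The "chunky calls" idea above can be sketched in Rust (the `Device` trait and method names are invented for this sketch, not the library's actual API): instead of one dynamically dispatched call per element, a single dispatched call moves the loop inside the implementation, amortizing the vtable lookup.

```rust
// Hypothetical device trait used to illustrate chatty vs chunky calls.
trait Device {
    fn scale_one(&self, alpha: f32, x: f32) -> f32;
    fn scale_all(&self, alpha: f32, xs: &mut [f32]);
}

struct Cpu;

impl Device for Cpu {
    fn scale_one(&self, alpha: f32, x: f32) -> f32 {
        alpha * x
    }
    fn scale_all(&self, alpha: f32, xs: &mut [f32]) {
        for x in xs.iter_mut() {
            *x *= alpha;
        }
    }
}

// Chatty: one dynamically dispatched call per element.
fn scale_chatty(dev: &dyn Device, alpha: f32, xs: &mut [f32]) {
    for x in xs.iter_mut() {
        *x = dev.scale_one(alpha, *x);
    }
}

// Chunky: a single dispatched call; the inner loop runs inside the impl.
fn scale_chunky(dev: &dyn Device, alpha: f32, xs: &mut [f32]) {
    dev.scale_all(alpha, xs);
}

fn main() {
    let mut a = vec![1.0, 2.0, 3.0];
    let mut b = a.clone();
    scale_chatty(&Cpu, 2.0, &mut a);
    scale_chunky(&Cpu, 2.0, &mut b);
    assert_eq!(a, b); // both [2.0, 4.0, 6.0]
}
```

Both produce the same result, but the chunky variant pays the dispatch cost once per slice instead of once per element.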


hobofan commented Jan 26, 2016

@bklooste : I am also not too sure how Rust handles that, but I think LTO (link-time optimization) may already take care of it; at least I haven't seen any significant overhead from that. As for statically linking the lib, that should generally be possible in the relevant plugins (cudnn in -nn and cublas in -blas should both support static linking).

What I originally meant by dynamic dispatch was the Rust kind, as explained in this part of the Rust book.
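The "bigger restructuring" this kind of dispatch implies can be sketched as follows (the `Backend`, `Native`, and solver names are invented for illustration): moving from a trait-object field to a generic type parameter trades runtime flexibility for monomorphized, inlinable calls, at the cost of the backend type spreading through every signature that touches it.

```rust
trait Backend {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32;
}

struct Native;
impl Backend for Native {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

// Trait-object style: the backend is chosen at runtime, and every call
// goes through a vtable.
struct DynSolver {
    backend: Box<dyn Backend>,
}

impl DynSolver {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32 {
        self.backend.dot(a, b)
    }
}

// Generic style: the backend is fixed at compile time, calls are
// monomorphized and can be inlined, but `B` appears in every signature.
struct GenericSolver<B: Backend> {
    backend: B,
}

impl<B: Backend> GenericSolver<B> {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32 {
        self.backend.dot(a, b)
    }
}

fn main() {
    let d = DynSolver { backend: Box::new(Native) };
    let g = GenericSolver { backend: Native };
    let (a, b) = ([1.0f32, 2.0], [3.0f32, 4.0]);
    assert_eq!(d.dot(&a, &b), g.dot(&a, &b)); // both 1*3 + 2*4 = 11
}
```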

@bklooste

Looking at the benchmark: 1000x dot product of two vectors of size 100 | 48,870 ns (+/- 499) | 15,226 ns (+/- 244)

The cost of dynamic dispatch is not typically huge; it is just a static indexed lookup for the right method. However, it can suffer when:

1. An extremely tight method, like a micro-bench, makes many small calls. You can certainly make 100M virtual calls in a second. Here is the typical cost; as you can see it is not high (2 moves and an indexed call):

   this.v1();
   00000012 8B CE     mov ecx,esi
   00000014 8B 01     mov eax,dword ptr [ecx]   ; fetch method table address
   00000016 FF 50 38  call dword ptr [eax+38h]  ; fetch/call method address

2. Neither inlining nor whole-program optimization is possible.
3. You cannot use link-time optimization for virtual calls; in fact, shared libraries need an extra lookup (sometimes a hash), not just in Rust but in C++ too: http://eli.thegreenplace.net/2013/12/05/the-cost-of-dynamic-virtual-calls-vs-static-crtp-dispatch-in-c. This can be 20% of program execution. However, I was wrong about Rust here, as Rust libs tend to be compiled whole-program, i.e. they are not shared libs.

Regarding those points:

1. Is IMHO not the case, but it seems the only candidate; maybe it is something Rust-specific. I will have a look at the assembly for the build.
2. Should not be a factor, since the loops should be in the called library.
3. Static linking should improve things, but both tests should have the same overhead.
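A micro-benchmark along the lines discussed above could be sketched like this (the trait and type names are invented; `std::hint::black_box` is used to keep the optimizer from eliding the calls, and the timings are only indicative, not a substitute for a proper benchmark harness):

```rust
use std::hint::black_box;
use std::time::Instant;

trait Backend {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32;
}

struct Native;
impl Backend for Native {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

fn main() {
    // Mirrors the benchmark shape above: 1000x dot of two size-100 vectors.
    let a = vec![1.0f32; 100];
    let b = vec![2.0f32; 100];
    let dyn_backend: &dyn Backend = &Native;

    // Dynamic dispatch: each call goes through the trait-object vtable.
    let t = Instant::now();
    let mut acc = 0.0;
    for _ in 0..1000 {
        acc += black_box(dyn_backend).dot(black_box(&a), black_box(&b));
    }
    let dyn_time = t.elapsed();

    // Static dispatch: the concrete type is known, so the call can inline.
    let t = Instant::now();
    let mut acc2 = 0.0;
    for _ in 0..1000 {
        acc2 += black_box(&Native).dot(black_box(&a), black_box(&b));
    }
    let static_time = t.elapsed();

    // Each dot is 100 * (1.0 * 2.0) = 200.0, so both sums are 200_000.0.
    assert_eq!(acc, 200_000.0);
    assert_eq!(acc, acc2);
    println!("dyn: {:?}, static: {:?}", dyn_time, static_time);
}
```

Comparing the generated assembly for the two loops (as proposed above) would show whether the dispatched call is the dominant cost or whether something else accounts for the gap in the benchmark numbers.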
