Add API to fill tensors #58
It would be nice to have an API to fill `SharedTensor` with a constant value. Currently the closest thing is `leaf::weight::FillerType::Constant { value: 0.0 }.fill(&mut tensor)`. There are two problems: usability and performance.

On the usability side, this interface is available only from the `leaf` crate, at first glance it looks like it has something to do with weights, and it is quite verbose.

On the performance side, it's implemented by adding a native device, filling CPU memory and synchronizing with the original device. If the original device belongs to the `Cuda` framework, I think this operation can be done without allocating host memory, filling it using the CPU and doing a PCI transfer. At least for `SharedTensor<f32>` there is `cuMemsetD32()`.

I don't completely understand the whole architecture, but it seems that because the operation depends on the backend, it should be implemented as a collenchyma plugin. It looks like it'd be too much to create a separate repo for this, so maybe it should be done inside collenchyma somewhere in `src/plugins/`?

Well, that said, it's not clear if it's worth doing right now... In my opinion this mostly depends on how it affects performance. And I haven't seen any perf issues yet except one probably fixed in autumnai/leaf#90.
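For the CUDA path, the host round-trip could indeed be skipped. Here is a minimal sketch of what calling `cuMemsetD32()` from Rust might look like; the FFI binding and the `fill_f32_on_device` helper are illustrative assumptions, not existing collenchyma code:

```rust
// Hypothetical FFI sketch: filling a CUDA f32 buffer in place with
// cuMemsetD32(), avoiding the host allocation and the PCI transfer.
#[allow(non_camel_case_types)]
pub type CUdeviceptr = u64;

extern "C" {
    // CUDA driver API: set `n` 32-bit words starting at `dst` to `value`.
    // (`_v2` is the symbol the driver exports for the modern signature.)
    fn cuMemsetD32_v2(dst: CUdeviceptr, value: u32, n: usize) -> i32;
}

/// Fill `n` f32 elements on the device with `value`, never touching host memory.
pub unsafe fn fill_f32_on_device(dst: CUdeviceptr, n: usize, value: f32) -> Result<(), i32> {
    // cuMemsetD32 replicates a raw 32-bit pattern, so pass the f32's bit pattern.
    match cuMemsetD32_v2(dst, value.to_bits(), n) {
        0 => Ok(()), // CUDA_SUCCESS
        code => Err(code),
    }
}
```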
Comments

Yeah, generally this should be a collenchyma plugin. However, in the special case of filling it with a constant value it might be a good idea to have it centrally in collenchyma, since that is something that a lot of people need. Additionally, this should also help to reduce the problem that CUDA memory is not 0-initialized. Related to #32.
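If a constant fill lived centrally in collenchyma rather than in a plugin crate, it might be a small trait implemented once per backend. A minimal sketch with simplified stand-in types; none of these names are real collenchyma API:

```rust
// `SharedTensor`, `Error`, `FillConstant` and `Native` are stand-ins
// invented for illustration, not the real collenchyma types.
pub struct SharedTensor<T> {
    data: Vec<T>, // stand-in for device-managed memory
}

#[derive(Debug)]
pub struct Error;

pub trait FillConstant<T> {
    /// Overwrite every element of `tensor` with `value` on this backend.
    fn fill_constant(&self, tensor: &mut SharedTensor<T>, value: T) -> Result<(), Error>;
}

/// Native (CPU) backend stand-in: the fallback path every framework can use.
pub struct Native;

impl<T: Copy> FillConstant<T> for Native {
    fn fill_constant(&self, tensor: &mut SharedTensor<T>, value: T) -> Result<(), Error> {
        for slot in tensor.data.iter_mut() {
            *slot = value;
        }
        Ok(())
    }
}
```

A `Cuda` implementation of the same trait could then dispatch to the `cuMemsetD32()` path sketched above, and a zero-value fill at allocation time would address the 0-initialization concern.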
Great! I'd like to solve this issue, but the end of my vacation is coming closer and I'd better train a network or two while there is enough mindspace ) Maybe I'll be able to return to it later...

Regarding initialization, it'd be good to refactor the API to make it impossible to use uninitialized tensors. But it's not that straightforward: a tensor may be initialized on creation, or it can be filled the first time it is used as the output of some computation. In each case it must be guaranteed that all elements of the tensor are initialized.

Another practical approach that doesn't guarantee anything but may still be useful is to support [...]. Another partial solution could be splitting [...].
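One plausible reading of the "splitting" idea (the follow-up mentions #37, which intends something similar) is separating read access from write-only access, so that uninitialized memory can never be observed. A self-contained sketch under that assumption, with all names invented for illustration:

```rust
// Sketch of "split access": reads fail until the tensor has been written
// at least once. Not existing collenchyma API.
pub struct SharedTensor<T> {
    data: Vec<T>,
    initialized: bool,
}

#[derive(Debug)]
pub enum Error {
    Uninitialized,
}

impl<T: Default + Clone> SharedTensor<T> {
    pub fn new(len: usize) -> Self {
        SharedTensor { data: vec![T::default(); len], initialized: false }
    }

    /// Read access: refused until something has been written.
    pub fn read(&self) -> Result<&[T], Error> {
        if self.initialized { Ok(&self.data) } else { Err(Error::Uninitialized) }
    }

    /// Write-only access: does not require prior contents, and marks the
    /// tensor initialized, assuming the caller fills it completely.
    pub fn write_only(&mut self) -> &mut [T] {
        self.initialized = true;
        &mut self.data
    }
}
```

This only guarantees that a read was preceded by *some* write, not that every element was written, which matches the caveat above that none of the partial approaches give a full guarantee.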
Yeah, I think that's a good idea. #37 intends something similar, though not from the perspective of uninitialized memory.