
Add API to fill tensors #58

Open · alexandermorozov opened this issue Mar 23, 2016 · 3 comments

@alexandermorozov (Contributor)

It would be nice to have an API to fill a SharedTensor with a constant value. Currently the closest thing is `leaf::weight::FillerType::Constant { value: 0.0 }.fill(&mut tensor)`. There are two problems with it: usability and performance.

On the usability side, this interface is only available from the leaf crate, at first glance looks like it has something to do with weights, and is quite verbose.

On the performance side, it is implemented by adding a native device, filling CPU memory and synchronizing with the original device. If the original device belongs to the CUDA framework, I think this operation can be done without allocating host memory, filling it on the CPU and doing a PCIe transfer. At least for `SharedTensor<f32>` there is `cuMemsetD32()`.

I don't completely understand the whole architecture, but since the operation depends on the backend, it seems it should be implemented as a collenchyma plugin. Creating a separate repo for this looks like too much, so maybe it should be done inside collenchyma, somewhere in src/plugins/? A rough sketch of the idea is below.
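
For illustration only, here is a minimal, self-contained toy model of a backend-dispatched fill. The `Device`, `SharedTensor` and `Fill` names below are stand-ins invented for this sketch, not collenchyma's real API, and the CUDA arm only marks where a `cuMemsetD32` call would go in a real plugin:

```rust
// Toy model of a backend-dispatched constant fill. All types here are
// illustrative stand-ins, not collenchyma's real API.

#[derive(Clone, Copy)]
enum Device {
    Native,
    Cuda,
}

struct SharedTensor {
    device: Device,
    buf: Vec<f32>, // stand-in for whichever memory the tensor currently owns
}

trait Fill {
    /// Set every element to `value` directly on the tensor's current device,
    /// without allocating host memory and doing a PCIe transfer.
    fn fill(&mut self, value: f32);
}

impl Fill for SharedTensor {
    fn fill(&mut self, value: f32) {
        match self.device {
            // Native backend: a plain CPU loop over the host buffer.
            Device::Native => {
                for x in self.buf.iter_mut() {
                    *x = value;
                }
            }
            // CUDA backend: a real plugin would call cuMemsetD32 with the
            // bit pattern of `value` on the device pointer; modeled on a
            // host buffer in this sketch.
            Device::Cuda => {
                for x in self.buf.iter_mut() {
                    *x = value;
                }
            }
        }
    }
}

fn main() {
    let mut t = SharedTensor { device: Device::Cuda, buf: vec![0.0; 4] };
    t.fill(1.5);
    assert!(t.buf.iter().all(|&x| x == 1.5));
}
```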

Well, that said, it's not clear whether it's worth doing now... In my opinion this mostly depends on how much it affects performance, and I haven't seen any perf issues yet except one that is probably fixed in autumnai/leaf#90.

@hobofan (Member) commented Mar 23, 2016

Yeah, generally this should be a collenchyma plugin, and apart from the Constant filler it might even be spread functionally among multiple collenchyma plugins. Most of the fillers in Leaf will be different random distributions, so they might fit well in a collenchyma-rand crate.

However, in the special case of filling with a constant value it might be a good idea to have it centrally in collenchyma, since that is something a lot of people need. Additionally, this should also help reduce the problem that CUDA memory is not zero-initialized.

Related to #32.

@alexandermorozov (Contributor, Author)

Great! I'd like to solve this issue, but the end of my vacation is coming closer and I'd better train a network or two while there is enough mindspace :) Maybe I'll be able to return to it later...

Regarding initialization, it would be good to refactor the API so that it is impossible to use uninitialized tensors. But it's not that straightforward: a tensor may be initialized on creation, or it may be filled the first time it is used as the output of some computation. In either case it must be guaranteed that all elements of the tensor are initialized. ndarray solves the first part by accepting constant values, slices, Vecs and iterators as initializers. But it doesn't allow allocating an uninitialized chunk of memory to be overwritten later; it looks like there is no way to statically check that such usage is correct, so the checks have to be done at runtime, and that carries a small runtime cost. A sketch of what such a runtime check might look like is below.
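
As a hedged illustration of that runtime check, here is a hypothetical wrapper (not part of ndarray or collenchyma) that remembers whether the buffer has ever been written and turns a premature read into an error:

```rust
// Hypothetical runtime check: reading a tensor that has never been written
// returns an error instead of garbage. Cost: one flag check per access.

struct CheckedTensor {
    buf: Vec<f32>,
    initialized: bool,
}

impl CheckedTensor {
    /// Allocate without initializing (modeled with zeros in this sketch).
    fn uninitialized(len: usize) -> CheckedTensor {
        CheckedTensor { buf: vec![0.0; len], initialized: false }
    }

    /// Hand out the buffer for writing; by convention the caller fills
    /// every element, so the tensor is marked initialized.
    fn write(&mut self) -> &mut [f32] {
        self.initialized = true;
        self.buf.as_mut_slice()
    }

    /// Reading before any write is reported as an error at runtime.
    fn read(&self) -> Result<&[f32], &'static str> {
        if self.initialized {
            Ok(self.buf.as_slice())
        } else {
            Err("read of uninitialized tensor")
        }
    }
}

fn main() {
    let mut t = CheckedTensor::uninitialized(3);
    assert!(t.read().is_err());
    for x in t.write() {
        *x = 1.0;
    }
    assert!(t.read().is_ok());
}
```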

Another practical approach, which guarantees nothing but may still be useful, is to support the cuda-memcheck tool, which has a special mode for detecting uninitialized memory. Currently collenchyma will gladly sync memory from a CUDA device to the host even if the CUDA memory wasn't initialized, so the cuda-memcheck output is completely spammed. There is also valgrind for checking initialization of host memory. With both of them it would be possible to trace unexpected problems more easily.

Another partial solution could be splitting sync() into sync_for_write_only() and sync_for_read_write() or something similar, where the former allocates memory on the backend if it isn't allocated yet, and the latter does the same but only if there is already a valid chunk of memory on some backend to transfer data from, returning an error otherwise. The caller of sync_for_write_only() then has to ensure, by convention, that the requested memory is completely initialized. A toy model of these semantics is sketched below.
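
To make the proposed semantics concrete, here is a self-contained toy model; the method names follow the suggestion above, and nothing here is existing collenchyma code (HashMap-backed buffers stand in for per-device memory, and the clone stands in for a transfer):

```rust
// Toy model of the proposed sync_for_write_only / sync_for_read_write split.

use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash, Clone, Copy)]
enum Device {
    Native,
    Cuda,
}

struct SharedTensor {
    copies: HashMap<Device, Vec<f32>>, // one buffer per device
    valid: Option<Device>,             // which copy currently holds real data
    len: usize,
}

impl SharedTensor {
    /// Allocate on `device` if needed, but transfer nothing. The caller
    /// promises (by convention) to overwrite every element before reading.
    fn sync_for_write_only(&mut self, device: Device) -> &mut [f32] {
        self.valid = Some(device);
        let len = self.len;
        self.copies
            .entry(device)
            .or_insert_with(|| vec![0.0; len])
            .as_mut_slice()
    }

    /// Allocate on `device` if needed and copy data over from the device
    /// that currently holds a valid buffer; error if no such device exists.
    fn sync_for_read_write(&mut self, device: Device) -> Result<&mut [f32], &'static str> {
        let src = self.valid.ok_or("no initialized copy to transfer from")?;
        if src != device {
            let data = self.copies[&src].clone(); // stands in for a PCIe transfer
            self.copies.insert(device, data);
            self.valid = Some(device);
        }
        Ok(self.copies.get_mut(&device).unwrap().as_mut_slice())
    }
}

fn main() {
    let mut t = SharedTensor { copies: HashMap::new(), valid: None, len: 4 };
    assert!(t.sync_for_read_write(Device::Cuda).is_err()); // nothing valid yet
    for x in t.sync_for_write_only(Device::Native) {
        *x = 2.0;
    }
    let cuda_buf = t.sync_for_read_write(Device::Cuda).unwrap();
    assert!(cuda_buf.iter().all(|&x| x == 2.0));
}
```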

@hobofan (Member) commented Mar 24, 2016

> Another partial solution could be splitting sync() into sync_for_write_only() and sync_for_read_write() or something similar, where the former allocates memory on the backend if it isn't allocated yet, and the latter does the same but only if there is already a valid chunk of memory on some backend to transfer data from, returning an error otherwise. The caller of sync_for_write_only() then has to ensure, by convention, that the requested memory is completely initialized.

Yeah, I think that's a good idea. #37 intends something similar, though not from the perspective of uninitialized memory.
