Add API to fill tensors #58
It would be nice to have an API to fill `SharedTensor` with a constant value. Currently the closest thing is `leaf::weight::FillerType::Constant { value: 0.0 }.fill(&mut tensor)`. There are two problems: usability and performance.

On the usability side, this interface is available only from the `leaf` crate, at first glance it looks like it has something to do with weights, and it is quite verbose.

On the performance side, it's implemented by adding a native device, filling CPU memory and synchronizing with the original device. If the original device belongs to the `Cuda` framework, I think this operation can be done without allocating host memory, filling it using the CPU and doing a PCI transfer. At least for `SharedTensor<f32>` there is `cuMemsetD32()`.

I don't completely understand the whole architecture, but it seems that because the operation depends on the backend, it should be implemented as a collenchyma plugin. It looks like it'd be too much to create a separate repo for this, so maybe it should be done inside collenchyma somewhere in `src/plugins/`?

Well, that said, it's not clear if it's worth doing right now... In my opinion this mostly depends on how it affects performance. And I haven't seen any perf issues yet except one probably fixed in autumnai/leaf#90.
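For the CUDA path, the host round-trip could indeed be skipped. Here is a minimal sketch of what calling `cuMemsetD32()` from Rust might look like; the FFI binding and the `fill_f32_on_device` helper are illustrative assumptions, not existing collenchyma code:

```rust
// Hypothetical FFI sketch: filling a CUDA f32 buffer in place with
// cuMemsetD32(), avoiding the host allocation and the PCI transfer.
#[allow(non_camel_case_types)]
pub type CUdeviceptr = u64;

extern "C" {
    // CUDA driver API: set `n` 32-bit words starting at `dst` to `value`.
    // (`_v2` is the symbol the driver exports for the modern signature.)
    fn cuMemsetD32_v2(dst: CUdeviceptr, value: u32, n: usize) -> i32;
}

/// Fill `n` f32 elements on the device with `value`, never touching host memory.
pub unsafe fn fill_f32_on_device(dst: CUdeviceptr, n: usize, value: f32) -> Result<(), i32> {
    // cuMemsetD32 replicates a raw 32-bit pattern, so pass the f32's bit pattern.
    match cuMemsetD32_v2(dst, value.to_bits(), n) {
        0 => Ok(()), // CUDA_SUCCESS
        code => Err(code),
    }
}
```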
Comments

Yeah, generally this should be a collenchyma plugin. However, in the special case of filling it with a constant value it might be a good idea to have it centrally in collenchyma, since that is something that a lot of people need. Additionally, this should also help to reduce the problem that CUDA memory is not 0-initialized. Related to #32.
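If a constant fill lived centrally in collenchyma rather than in a plugin crate, it might be a small trait implemented once per backend. A minimal sketch with simplified stand-in types; none of these names are real collenchyma API:

```rust
// `SharedTensor`, `Error`, `FillConstant` and `Native` are stand-ins
// invented for illustration, not the real collenchyma types.
pub struct SharedTensor<T> {
    data: Vec<T>, // stand-in for device-managed memory
}

#[derive(Debug)]
pub struct Error;

pub trait FillConstant<T> {
    /// Overwrite every element of `tensor` with `value` on this backend.
    fn fill_constant(&self, tensor: &mut SharedTensor<T>, value: T) -> Result<(), Error>;
}

/// Native (CPU) backend stand-in: the fallback path every framework can use.
pub struct Native;

impl<T: Copy> FillConstant<T> for Native {
    fn fill_constant(&self, tensor: &mut SharedTensor<T>, value: T) -> Result<(), Error> {
        for slot in tensor.data.iter_mut() {
            *slot = value;
        }
        Ok(())
    }
}
```

A `Cuda` implementation of the same trait could then dispatch to the `cuMemsetD32()` path sketched above, and a zero-value fill at allocation time would address the 0-initialization concern.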
Great! I'd like to solve this issue, but the end of my vacation is coming closer and I'd better train a network or two while there is enough mindspace ) Maybe I'll be able to return to it later...

Regarding initialization, it'd be good to refactor the API to make it impossible to use uninitialized tensors. But it's not that straightforward: a tensor may be initialized on creation, or it can be filled the first time it is used as the output of some computation. In each case it must be guaranteed that all elements of the tensor are initialized.

Another practical approach that doesn't guarantee anything but may still be useful is to support [...]. Another partial solution could be splitting [...].
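One plausible reading of the "splitting" idea (the follow-up mentions #37, which intends something similar) is separating read access from write-only access, so that uninitialized memory can never be observed. A self-contained sketch under that assumption, with all names invented for illustration:

```rust
// Sketch of "split access": reads fail until the tensor has been written
// at least once. Not existing collenchyma API.
pub struct SharedTensor<T> {
    data: Vec<T>,
    initialized: bool,
}

#[derive(Debug)]
pub enum Error {
    Uninitialized,
}

impl<T: Default + Clone> SharedTensor<T> {
    pub fn new(len: usize) -> Self {
        SharedTensor { data: vec![T::default(); len], initialized: false }
    }

    /// Read access: refused until something has been written.
    pub fn read(&self) -> Result<&[T], Error> {
        if self.initialized { Ok(&self.data) } else { Err(Error::Uninitialized) }
    }

    /// Write-only access: does not require prior contents, and marks the
    /// tensor initialized, assuming the caller fills it completely.
    pub fn write_only(&mut self) -> &mut [T] {
        self.initialized = true;
        &mut self.data
    }
}
```

This only guarantees that a read was preceded by *some* write, not that every element was written, which matches the caveat above that none of the partial approaches give a full guarantee.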
Yeah, I think that's a good idea. #37 intends something similar, though not from the perspective of uninitialized memory.