Parallel processing
Add the option to run the forward and backward propagation on both cores of the ESP32, hopefully reaching a 2x speedup.
In practice I did not see any speedup at all. Maybe I did it wrong, or the test cases are not suited for it (too small, so the cost of starting and synchronizing the second core may outweigh the work being split)?
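
For reference, here is a minimal sketch of what "splitting the forward propagation in two" could look like for one dense layer. It is not the code used in this project: the names `forward_rows` and `forward_two_workers` are made up for illustration, the weights are assumed to be stored row-major (one row per output neuron), and it uses portable `std::thread` so it can be tried on a PC. On the ESP32 the second worker would more likely be a FreeRTOS task pinned to the other core with `xTaskCreatePinnedToCore`.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Computes out[j] = sum_i w[j * n_in + i] * in[i] for output rows j in [begin, end).
// Each worker writes to a disjoint range of `out`, so no locking is needed.
void forward_rows(const std::vector<float>& w, const std::vector<float>& in,
                  std::vector<float>& out, std::size_t begin, std::size_t end) {
    const std::size_t n_in = in.size();
    for (std::size_t j = begin; j < end; ++j) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < n_in; ++i) {
            acc += w[j * n_in + i] * in[i];
        }
        out[j] = acc;
    }
}

// Hypothetical helper: forward pass of one dense layer (no bias, no activation),
// with the output neurons split in half between the calling thread and one worker.
std::vector<float> forward_two_workers(const std::vector<float>& w,
                                       const std::vector<float>& in,
                                       std::size_t n_out) {
    assert(w.size() == n_out * in.size());  // row-major weights, n_out x n_in
    std::vector<float> out(n_out, 0.0f);
    const std::size_t half = n_out / 2;

    // Second half of the rows runs on the worker thread...
    std::thread worker([&] { forward_rows(w, in, out, half, n_out); });
    // ...while the calling thread handles the first half.
    forward_rows(w, in, out, 0, half);

    worker.join();  // wait until both halves are done before using `out`
    return out;
}
```

Calling `forward_two_workers(w, in, n_out)` returns the same values as a single-threaded loop over all rows. Note that this sketch starts a new thread on every call, and that cost is paid once per layer. For layers with only a handful of neurons, that overhead can easily be larger than the multiply-accumulate work itself, which would be consistent with seeing no speedup on small test cases.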