Question - About Prediction time over CPU and GPU #13
Comments
A couple of notes on this:
I could speed this up quite a lot, probably bringing it down to only 2-3 seconds per song on GPU, but I would risk introducing new errors in the process. So the main question is: how important is prediction speed for people who use this repository, and is the current speed sufficient? So far I have not had any complaints about speed, but if you have a common use case that requires more speed to be feasible, please present it, and if others also indicate that they would like this, I can consider putting in some speed-ups. Multi-GPU training and prediction, for example, is not super straightforward to code, so I decided to avoid it in favour of keeping correct, readable code that people can easily adapt to their own needs. |
The part where prediction for an input song is made is actually here: https://github.com/f90/Wave-U-Net/blob/master/Evaluate.py#L109 What could be changed without a lot of effort is the batch size, from 1 to the default (16); however, that also means prediction requires more RAM/GPU memory. We would have to make sure that prediction still works exactly the same way as before. I am also not sure how much it speeds up prediction, especially on CPU. A multi-GPU implementation is also possible, but requires a bit more effort to get right. Keep in mind this is all supposed to work right out of the gate, without people having to configure the GPU setup. In case someone wants to provide such fast implementations, I am all ears. |
@f90 thank you, currently I'm using the latest model
So have a
|
This is expected. Out of interest, what's your CPU usage while predicting? Is it only using a single core, or multiple ones? If it is already using all available cores at 100%, then CPU prediction cannot be sped up further by changing the code. If not, then maybe a larger batch size can improve things, but only if TensorFlow is implemented such that it parallelises automatically across multiple CPU cores when processing a whole batch of samples, and I am not sure of that. |
There would also be the issue when implementing support for any |
@f90 ok, I just realized that this is hardcoded here:
# Batch size of 1
sep_input_shape[0] = 1
sep_output_shape[0] = 1
mix_context, sources = Input.get_multitrack_placeholders(sep_output_shape, model_config["num_sources"], sep_input_shape, "input")
so to change |
Be aware that changing the code there means that the internal code further down has to be adapted as well, since it assumes that we insert one audio segment and get predictions for one back, not multiple. If I have some time for this, and there is sufficient need indicated from all of you (leave a like/comment here to show that), then I will get around to implementing it. I would keep the standard setting at a batch size of 1, though, to make sure prediction still works even on small systems. |
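To illustrate the adaptation described above, here is a minimal sketch of feeding segments through a model in batches instead of one at a time. It is not the repository's code: `predict_in_batches`, `predict_batch` and `halve` are hypothetical names, plain Python lists stand in for audio arrays, and padding the last batch by repeating its final segment is just one assumed way of keeping a fixed batch shape.

```python
from typing import Callable, List, Sequence

Segment = List[float]


def predict_in_batches(
    segments: Sequence[Segment],
    predict_batch: Callable[[List[Segment]], List[Segment]],
    batch_size: int = 1,
) -> List[Segment]:
    """Run predict_batch over the segments in chunks of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    outputs: List[Segment] = []
    for start in range(0, len(segments), batch_size):
        batch = list(segments[start:start + batch_size])
        n_real = len(batch)
        # Pad the last batch by repeating its final segment so that the
        # model always receives exactly batch_size segments.
        while len(batch) < batch_size:
            batch.append(batch[-1])
        predictions = predict_batch(batch)
        # Discard predictions that belong to the padding.
        outputs.extend(predictions[:n_real])
    return outputs


def halve(batch: List[Segment]) -> List[Segment]:
    # Hypothetical stand-in for the network: scales every sample by 0.5.
    return [[0.5 * x for x in seg] for seg in batch]


segments = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
single = predict_in_batches(segments, halve, batch_size=1)
batched = predict_in_batches(segments, halve, batch_size=2)
```

Comparing `single` and `batched` is the kind of check that would be needed to confirm that a batched variant predicts exactly the same output as the batch-size-1 path.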
@f90 ok thank you very much it makes sense. |
I'm also interested in a speed-up, but I'm not sure it's possible since my CPU is already using all cores |
OK, so I looked into this a bit more: I implemented a batched variant of prediction and compared running times for a 3-minute input piece. Results: GPU (1x GTX1080)
CPU
These numbers give the time spent within the The batched version also gave memory warnings for CPU, and was using all my CPU cores at once, so it is not surprising a speedup cannot be achieved this way. So, to summarise:
If prediction time is an issue for you, it can be reduced by
Going to close this soon unless there are some good ideas how to improve this otherwise. |
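For anyone who wants to repeat this kind of comparison on their own machine, a small timing helper might look like the sketch below. `time_call` is a hypothetical helper, not part of the repository, and the workload in the usage lines is a dummy placeholder for the actual prediction call.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Dummy workload standing in for a prediction call.
elapsed = time_call(sum, range(100000))
print("best of 3 runs: %.6f s" % elapsed)
```

Taking the best of several runs reduces the influence of one-off effects such as model loading or disk caching on the measured time.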
Hi @f90 , |
I am curious why you expect any improvements with the batched version. But if you want to experiment with it, replace the Also you have to comment out
found in
|
Going to close this issue soon if I don't get any reports on the above code snippet bringing much benefit in terms of prediction speed... |
@f90 thanks a lot, we are going to try this asap! |
You were right, this does not bring improvements in terms of speed. |
I'm interested in any kind of multiple GPU support or tricks that would speed up the process! |
Multi-GPU is definitely an interesting option. I would like to establish this repository as a "go-to" resource for people learning about deep learning for source separation, so I would like to keep the source code simple, and I am not sure whether a multi-GPU implementation is straightforward enough for that? While training could be elegant to implement especially in newer TF versions, with the specific way we need to predict song outputs I am not sure it would turn out that elegant. I'm open to feedback on this though! As for MP3 export, see my post here: (#2 (comment)) |
I am trying this repo on Google Colab and get the following error while running the following command:
!python Predict.py with cfg.full_44KHz input_path="audio_examples/Cristina\ Vane\ -\ So\ Easy/mix.mp3" output_path="Myoutput"
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
|
Hey, this looks like a typical error when the CUDA libraries are not included properly in your environment. Please refer to the CUDA installation manual for how to set up CUDA properly in your particular environment. I think a simple test.py file that just does "import tensorflow" will give you the same error, so I don't think it's related to my code in particular. |
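The import test suggested above could be sketched as follows. `try_import` is a hypothetical helper name; it only reports whether the import succeeds and, if not, returns the traceback so it can be compared with the one from Predict.py.

```python
import importlib
import traceback


def try_import(module_name):
    """Try to import a module; return (ok, traceback_text)."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Missing shared libraries (for example CUDA ones) usually surface
        # here as an ImportError raised while the package is loading.
        return False, traceback.format_exc()
    return True, ""


if __name__ == "__main__":
    ok, details = try_import("tensorflow")
    print("import succeeded" if ok else details)
```

If this small script fails with the same traceback, the problem lies in the environment rather than in the repository code.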
I'm running some tests of CPU and GPU usage during prediction (Predict.py).
I'm using an audio file
Audio: mp3, 44100 Hz, stereo, fltp, 192 kb/s
of duration 00:03:15.29
On an Intel i7 12-core CPU the prediction time log says Completed after 0:03:19
while on an Intel Xeon 12-core plus 2x Nvidia GeForce GTX 1080 it says Completed after 0:00:16
I'm not sure from the logging whether TensorFlow is using both GPU devices or gpu 0 only. If I'm not wrong, most of the work is done in the Models.py here https://github.com/f90/Wave-U-Net/blob/master/Models/UnetSpectrogramSeparator.py#L39 when the computation graph is calculated. I assume that these operations go on gpu:0 in this configuration, so gpu:1 will not be used, but I'm not sure of it.
Thank you very much!
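One way to check this empirically, sketched here under assumptions, is to restrict which GPUs the process can see through the `CUDA_VISIBLE_DEVICES` environment variable and compare prediction times with one and with two visible devices. `restrict_visible_gpus` is a hypothetical helper, and the variable has to be set before TensorFlow initialises CUDA, so in practice before TensorFlow is imported.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA in this process."""
    value = ",".join(str(i) for i in gpu_ids)
    # Must be set before TensorFlow (or any other CUDA library) starts up.
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the first GPU, then import TensorFlow and run prediction.
visible = restrict_visible_gpus([0])
print("CUDA_VISIBLE_DEVICES =", visible)
```

If the prediction time with only GPU 0 visible matches the time with both GPUs visible, the second device is most likely not contributing.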