Make a video #1840
I will donate $10.
Which test? Which errors?
I encountered so many errors that I don't know where to start. Today I got the message below - up to this point the instructions had worked. The log:

CM error: automation script not found!
I ran this script:

cm run --tags=run-mlperf,inference,_find-performance,_full,_r4.0 --model=3d-unet-99 --implementation=intel --framework=pytorch --category=edge --scenario=Offline --execution_mode=test --device=cpu --quiet --test_query_count=50

There is no libffi7 package on my Ubuntu 23.04, so this step fails:

sudo DEBIAN_FRONTEND=noninteractive apt-get install -y libffi7
CM error: Portable CM script failed (name = get-generic-sys-util, return code = 256)
https://github.com/mlcommons/cm4mlops/issues
The CM concept is to collaboratively fix such issues inside portable CM scripts
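For anyone hitting the same libffi7 gap: Ubuntu dropped libffi7 from its repositories after 22.04, so on 23.04 a common workaround - my suggestion, not something the CM script does - is to install the focal-era package manually:

```bash
# Assumption: libffi7 3.3-4 from the Ubuntu 20.04 (focal) archive still satisfies
# the dependency; check the archive directory for the current filename first.
wget http://archive.ubuntu.com/ubuntu/pool/main/libf/libffi/libffi7_3.3-4_amd64.deb
sudo dpkg -i libffi7_3.3-4_amd64.deb
```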
Hi @Agalakdak, the docs page uses `cm run script`, whereas your command above has only `cm run` - the missing `script` keyword is why CM reports "automation script not found".
Hi @arjunsuresh! Sorry for the late reply; I tried different ways to solve it. I used the command

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.0 --model=3d-unet-99 --implementation=intel --framework=pytorch --category=edge --scenario=Offline --execution_mode=test --device=cpu --docker --quiet --test_query_count=50

from https://docs.mlcommons.org/inference/benchmarks/medical_imaging/3d-unet/, and at the last step I got an error (more logs are in the attached file). I don't understand what to do with this. What information can I provide to help you solve the problem?
@arjunsuresh, I tried to run another benchmark, but there was an error there too. Please help me figure it out. The command and error log are attached; the Docker build also reported:

1 warning found (use docker --debug to expand):
Full error log
Hi @Agalakdak, we do have a problem with the Intel implementation, as reported here. We'll work with Intel to fix it. But even then, the Intel implementation is expected to work only on the latest Intel server/workstation CPUs - we'll update the documentation to say this.
Hi @arjunsuresh, thanks for the prompt reply. I'll try that code on a dual Intel Xeon Gold 6346 system a little later.
@arjunsuresh Can I clarify something for the future? Are there any known problems with the Quadro RTX 5000 and Nvidia A40 GPUs?
@Agalakdak Nvidia doesn't officially support them for MLPerf inference, but we have typically had good success running the Nvidia code on such GPUs without much difficulty. Do you have a plan for what you are trying to benchmark?
@arjunsuresh Yes, sure. First, I'd like to run one of the inference benchmarks and compare the results with the reference numbers. If that launch is successful, I'll try to run a training benchmark across several GPUs using a Docker container, and then use the results to find bottlenecks in the system (if there are any). Today I'll try to run as many benchmarks as possible and then write up the results. If you need any additional information about the system, let me know.
@Agalakdak If you want to run as many benchmarks as possible, the best option to start with is the Nvidia implementation. Even if an issue comes up, it is usually quickly resolvable. If you just want to try getting a result, the reference implementation is good for smaller models like resnet50 and bert-99, as it runs on almost any CPU (see the sketch below). And if you are referring to MLPerf training benchmarks - that's very different from inference, even though many of the inference models come from MLPerf training. Currently there is no automated way to run the training benchmarks; the only option is to follow the submitter READMEs in the results repository.
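As a concrete starting point, a reference-implementation run of resnet50 on CPU would look roughly like this (a sketch following the command pattern earlier in this thread; the exact tags for your CM version may differ):

```bash
# Sketch: resnet50 via the reference implementation on CPU, in test mode.
# The release tag (_r4.0) and framework are assumptions carried over from
# the commands above - substitute whatever the docs page currently lists.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.0 \
    --model=resnet50 \
    --implementation=reference \
    --framework=onnxruntime \
    --category=edge \
    --scenario=Offline \
    --execution_mode=test \
    --device=cpu \
    --quiet \
    --test_query_count=50
```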
@arjunsuresh Oh, thanks a lot for the help, but I'm afraid I have another question. I successfully ran "Text to Image using Stable Diffusion".
@Agalakdak that's only the first step. You need to run the follow-up command from the documentation page inside that docker container.
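The inside-container command follows the same pattern as the outer one; roughly (a sketch only - the model name, framework, and device flags here are assumptions, so check the Stable Diffusion docs page for the exact invocation):

```bash
# Sketch: the follow-up run launched inside the docker container for sdxl.
# Flags are assumed from the docs layout, not copied from the docs page itself.
cm run script --tags=run-mlperf,inference,_r4.1-dev \
    --model=sdxl \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=edge \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --quiet
```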
Hello @arjunsuresh, unfortunately a new day and new problems. I ran:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev

What can I do with this error?

/usr/include/x86_64-linux-gnu/bits/mathcalls.h(110): error: identifier "_Float32" is undefined
Error limit reached.
CM error: Portable CM script failed (name = get-cuda-devices, return code = 256)

Full log with error
@Agalakdak the problem is due to CUDA compilation not working on the host machine. Host-side compilation is not actually a necessity, though we have never seen such an issue before. Let me share the option to skip this with you. @anandhu-eng are you able to share this option?
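As a quick host-side diagnostic (my suggestion, outside the CM flow): the `_Float32` error typically means nvcc is paired with a host gcc/glibc newer than the installed CUDA toolkit supports, which a trivial compile will confirm:

```bash
# Sketch: check whether nvcc can compile anything at all on the host.
# Failing here with the same _Float32 error points to a CUDA-toolkit /
# host-compiler version mismatch (an assumption based on the error text).
nvcc --version
gcc --version
cat > /tmp/hello.cu <<'EOF'
#include <cstdio>
int main() { printf("nvcc host compile OK\n"); return 0; }
EOF
nvcc /tmp/hello.cu -o /tmp/hello && /tmp/hello
```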
@arjunsuresh Hi, I encountered a similar problem. Earlier, when I wanted to run ResNet50, this step was OK and worked fine - I got inside the container. In the container I ran the command and got (I assume) a similar error:

INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/run.sh from tmp-run.sh
Checking compiler version ...
nvcc: NVIDIA (R) Cuda compiler driver
Compiling program ...
Running program ...
/home/cmuser
INFO:root:
CM error: Portable CM script failed (name = get-cuda-devices, return code = 256)
https://github.com/mlcommons/cm4mlops/issues
The CM concept is to collaboratively fix such issues inside portable CM scripts

Log with error
Hi @Agalakdak, we also sometimes face the below error while using Nvidia GPUs inside a container:
A quick fix for this is to exit the container and start it again. We have also removed the requirement to have NVCC on the host system - please pull the latest CM scripts and try again.
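For reference, refreshing the CM script repositories is normally done with `cm pull repo`; a minimal sketch, assuming the default cm4mlops setup:

```bash
# Sketch: pull the latest cm4mlops scripts so the relaxed NVCC requirement
# takes effect, then rerun the original benchmark command.
cm pull repo mlcommons@cm4mlops
```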
Hello. I have been trying to run at least one test for a long time and I constantly get errors. Please record a video or give me a link to one, so that I can see what a normal, painless launch should look like.