Generating a Cosmoflow dataset is very slow #66

Linzsd · 2024-05-05T17:40:17Z

When I generate the Cosmoflow dataset, generation is slow: a log line is printed only about once a minute. When I switch the data format to npz, generation is very fast. Why is that?

daniarherikurniawan · 2024-07-30T19:19:40Z

Try increasing `-n` (`--num-accelerators`) to a larger value (e.g., 20 or 50) during data generation.
I also modified line 262 of benchmark.sh (https://github.com/mlcommons/storage/blob/main/benchmark.sh#L262C2-L262C37), changing `mpirun -hosts $hosts -np $parallel ...` to `mpirun -np $parallel ...`.
I haven't tried the npz format.
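Here is a minimal sketch of why more parallel processes can help. It assumes that raising the accelerator count launches more MPI ranks, and that the generator splits the files round-robin across those ranks. Both are assumptions about the benchmark, not confirmed behavior. `files_for_rank` and `max_files_per_rank` are hypothetical helpers, not part of the benchmark. If total generation time is dominated by the busiest rank, adding ranks shrinks that rank's share:

```python
def files_for_rank(num_files: int, rank: int, num_ranks: int) -> list[int]:
    """Hypothetical round-robin split: rank r writes files r, r+P, r+2P, ..."""
    if num_ranks <= 0 or not 0 <= rank < num_ranks:
        raise ValueError("need num_ranks > 0 and 0 <= rank < num_ranks")
    return list(range(rank, num_files, num_ranks))


def max_files_per_rank(num_files: int, num_ranks: int) -> int:
    """Files handled by the busiest rank (rank 0 always gets the most)."""
    return len(files_for_rank(num_files, 0, num_ranks))


if __name__ == "__main__":
    # Illustrative file count only; not taken from the Cosmoflow config.
    for ranks in (8, 20, 50):
        print(ranks, "ranks ->", max_files_per_rank(1024, ranks), "files on the busiest rank")
```

This only models how the work is divided. Actual speedup also depends on CPU cores and storage bandwidth on the generating host.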
