Parse GAF files in parallel #200

sampsyo · 2024-12-29T03:28:06Z

This is truly of dubious utility, but it was sorta fun to experiment with. Presumably, a similar parallel treatment could be applied to the GFA parser someday?

Parsing the GAF in parallel scales pretty well in my experiments. On my M1 Max laptop (10 cores total: 8 performance, 2 efficiency), parsing time drops from 5.131 seconds to 608.0 ms with the parallel version, a speedup of 8.4×. In terms of lines per second, that's from about 9.6 million to 80.6 million. Nice!
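
As a quick sanity check on those numbers, here is the arithmetic spelled out in Python. It uses only the figures quoted above; the variable names are illustrative, and the implied line count is derived from the rounded rates, so treat it as approximate.

```python
# Timings and throughputs as quoted in the description above.
before_s = 5.131          # seconds, before parallelizing
after_s = 0.608           # seconds (608.0 ms), parallel version
before_rate = 9.6e6       # lines per second, before
after_rate = 80.6e6       # lines per second, parallel version

# Speedup is the ratio of the two wall-clock times.
speedup = before_s / after_s

# Both rate/time pairs should imply roughly the same number of lines,
# since the same file was parsed in both runs.
lines_from_before = before_rate * before_s
lines_from_after = after_rate * after_s

print(f"speedup: {speedup:.1f}x")
print(f"implied lines (before): {lines_from_before / 1e6:.1f} million")
print(f"implied lines (after):  {lines_from_after / 1e6:.1f} million")
```

Both estimates land at roughly 49 million lines, so the time and throughput figures agree with each other.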

I can't pretend this is actually all that insightful or surprising, but it might be useful.
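
For readers who want the general shape of the idea, here is a minimal sketch of chunked line parsing, written in Python purely for illustration. It is not the code in this PR: the function names (`split_at_newlines`, `parse_chunk`, `parse_gaf_parallel`) are hypothetical, and it only pulls out the read name (column 1) and path (column 6) of each tab-separated GAF line. The point is the split step: cut the buffer at newline boundaries so each chunk holds whole lines and can be parsed independently. Note that CPython threads share the GIL, so this sketch shows the structure but will not reproduce the speedup reported above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_newlines(buf: bytes, n_chunks: int) -> list[bytes]:
    """Cut buf into at most n_chunks pieces, each ending on a line boundary."""
    chunks = []
    # Ceiling division, so we never produce more than n_chunks pieces.
    target = max(1, -(-len(buf) // n_chunks))
    start = 0
    while start < len(buf):
        end = min(start + target, len(buf))
        # Extend the cut to the next newline so no line straddles two chunks.
        nl = buf.find(b"\n", end - 1)
        end = len(buf) if nl == -1 else nl + 1
        chunks.append(buf[start:end])
        start = end
    return chunks


def parse_chunk(chunk: bytes) -> list[tuple[str, str]]:
    """Parse whole GAF lines into (read name, path) pairs."""
    records = []
    for line in chunk.splitlines():
        if not line:
            continue
        fields = line.split(b"\t")
        # Column 1 is the read name and column 6 is the path.
        records.append((fields[0].decode(), fields[5].decode()))
    return records


def parse_gaf_parallel(buf: bytes, n_workers: int = 4) -> list[tuple[str, str]]:
    """Parse each chunk on its own worker and concatenate results in order."""
    chunks = split_at_newlines(buf, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in chunk order, so line order is preserved.
        per_chunk = pool.map(parse_chunk, chunks)
        return [rec for recs in per_chunk for rec in recs]


# Tiny made-up input: two alignment lines with the 12 mandatory columns.
example = (
    b"read1\t100\t0\t100\t+\t>s1>s2\t200\t0\t100\t100\t100\t60\n"
    b"read2\t80\t0\t80\t+\t<s3>s4\t150\t10\t90\t80\t80\t60\n"
)
print(parse_gaf_parallel(example, n_workers=2))
```

Because every chunk ends on a newline, the per-chunk parser is the same one a single pass over the whole buffer would use.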

sampsyo added 6 commits December 28, 2024 21:13

Refactor GAF stuff to op module

61f7787

Try two different GAF parsers?

fc2a735

Eliminate explicit position from GAF line parser

e452032

Remove the "sequential" GAF parser

38014ac

The parallelizable way seems to be just as fast as the line-by-line parser. So no need for the sequential one, I guess?

First attempt at parallel GAF processing

2c065d9

Clean up parallel case in command

6241f37

sampsyo merged commit 49594a6 into main Dec 29, 2024
11 checks passed

sampsyo deleted the gaf-parallel branch December 29, 2024 03:31
