Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calyx_depth.py: use the eDSL, stop using odgi's Python bindings #105

Open
2 tasks
anshumanmohan opened this issue Jun 19, 2023 · 4 comments
Open
2 tasks
Labels
feature New feature, or a meaningful extension to an existing feature

Comments

@anshumanmohan
Copy link
Contributor

anshumanmohan commented Jun 19, 2023

Currently, calyx_depth.py generates Calyx code

  1. Without the use of the Calyx eDSL (aka the builder library).
  2. With the use of the odgi Python bindings (it's not obvious to see, but it uses parse_data which in turn uses the Python bindings).

I propose to change both of these.

  1. Using the eDSL will lighten the code significantly.
  2. The odgi Python bindings have proved hard to use. Besides, the current code uses the bindings only to infer the dimensions of the graph, and we have in-house ways of doing that now.

These relatively minor gains aside, the true win is that, after this, we will truly be using odgi only as a source of inspiration and oracle-testing, not as part of our workflow.

We currently do something like:

  • graph.gfa --odgi build--> graph.og --parse_data.py--> graph.data
  • calyx_depth.py (with one call to parse_data.py to learn the graph's dimensions) --> calyx_depth.futil
  • calyx_depth.futil tangoes with graph.data in the usual Calyx way.

My proposal is something more like:

  • graph.gfa --pollen_data_gen--> graph.data
  • calyx_depth.py (with one call to mygfa to learn the graph's dimensions, and with calls to the Calyx eDSL for lightening) --> calyx_depth.futil
  • calyx_depth.futil tangoes with graph.data in the usual Calyx way.

The first piece of this is already there: I already generate the same graph.data using pollen_data_gen as one would have using the usual route. The third piece should work if the first two work.

The second piece decomposes into two unrelated parts:

@anshumanmohan anshumanmohan added the triage required Ideas that need further discussion, whether asynchronous or synchronous, before we can take action label Jun 20, 2023
@sampsyo
Copy link
Collaborator

sampsyo commented Jun 24, 2023

Wonderful; thank you for summarizing this so clearly! Indeed, both of these would be great advances that would "modernize" our hardware generation.

One small operational note: the eDSL adoption can happen incrementally, if it's a lot of work. That is, it's not necessary to flip everything over all at once; it suffices to convert individual pieces to use the builder library one at a time since it interoperates with the "plain" AST library pretty smoothly.

@anshumanmohan
Copy link
Contributor Author

Thanks Adrian! Dropping the "triage required" tag. This issue remains up for grabs.

@anshumanmohan anshumanmohan added feature New feature, or a meaningful extension to an existing feature and removed triage required Ideas that need further discussion, whether asynchronous or synchronous, before we can take action labels Jun 29, 2023
@susan-garry
Copy link
Contributor

This looks good to me! I agree that switching over the builder library isn't a top priority, but it would certainly be nice to at some point.

I don't necessarily think it is a bad thing to use odgi in our hardware generation if it turns out that it's faster to transform a .gfa graph to an .og graph and then use odgi to get its dimensions, rather than just obtaining the dimensions from the .gfa file directly, but since .gfa files seem to be sort of a "standard representation" for pangenomic graphs, it would be good to at least transition exine to taking in .gfa files as input rather than .og files, which it currently expects. I also think it would be a win just to avoid making pollen users install odgi.

@sampsyo
Copy link
Collaborator

sampsyo commented Jul 13, 2023

Indeed. Sticking with .gfa files is the right thing to do here because (a) they are the "source of truth," (b) odgi is a difficult-to-install dependency (to put it lightly), and (c) performance at hardware-generation time is not very important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature, or a meaningful extension to an existing feature
Projects
None yet
Development

No branches or pull requests

3 participants