Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

polish alignment kernel options #113

Merged
merged 12 commits into from
Jan 23, 2025
Merged

Conversation

mmolari
Copy link
Collaborator

@mmolari mmolari commented Jan 21, 2025

This PR should contain the following changes:

  • remove minimap2-cli option for the alignment kernel. The reason is that this should be redundant, given that we already ported the minimap library
  • improve the external call to mmseqs2 and parsing of the output
  • add an initial check for the availability of the mmseqs command in path
  • fix error with mmseqs intervals

@mmolari
Copy link
Collaborator Author

mmolari commented Jan 21, 2025

running target/release/pangraph build -k mmseqs data/mpox.fa with mmseqs version 17.b804f currently results into this error:

Error: 
   0: When traversing guide tree
   1: When merging graphs
   2: During self-merge iteration 0
   3: During reweave
   4: When splitting block 1
   5: When extracting intervals for block 1
   6: When performing interval sanity checks
   7: Last interval does not end at the block length. Interval: [190088, 197210), block length: 197209. This is an internal error. Please report it to developers.

Location:
   packages/pangraph/src/pangraph/pangraph_interval.rs:60

I need to investigate further, probably just a matter of 1-based vs 0-based indexing.

Edit: indeed, according to the documentation while mmseqs standard output see uses 0-based indexing, the use of --format-output results in 1-based indexing. This was not a problem in julia (1-based indexing). It was easily fixed by correct indexing conversion.

@mmolari
Copy link
Collaborator Author

mmolari commented Jan 22, 2025

Now error messages are more informative:

If mmseqs is not available in path:

Error: 
   0: When performing preliminary checks before building the pangraph.
   1: Executing `mmseqs --help` results in the following error:
      No such file or directory (os error 2)
      Please make sure that `mmseqs` is installed and available in PATH.

Location:
   packages/pangraph/src/commands/build/build_run.rs:23

If mmseqs command returns an error:

Error: 
   0: When traversing guide tree
   1: When merging graphs
   2: During self-merge iteration 0
   3: mmseqs command exit status: 1
      failed with stderr:
      Unrecognized parameter "--threds". Did you mean "--threads" (Threads)?

Location:
   packages/pangraph/src/align/mmseqs/align_with_mmseqs.rs:57

@mmolari mmolari linked an issue Jan 22, 2025 that may be closed by this pull request
@mmolari mmolari marked this pull request as ready for review January 22, 2025 10:14
This follows `color_eyre` conventions and uses different sections to make error message prettier.

It's important to note that any error can manifest with this exact message (e.g. if mmseqs requires libraries and they are missing, or if mmseqs binary is not executable, if it segfaults etc.), so I made the suggestion message a little more generic (not assuming that the error is "file not found").

![Image](https://github.com/user-attachments/assets/346bf3f9-db27-4c7f-a2d1-cd642c0d008f)
This follows `color_eyre` conventions and uses different sections to make error message prettier.

I think that chaining `.arg()` and then adding `-k` with `.args()` does not necessarily actually adds `-k`, but I don't know how to check it. So I opted for manually constructing all-in-one array of args, which I pass using a single `.args()` call. This also allow us to print the full command to the user and inspect it during debugging.

Also refactored and simplified things a little.

![Image](https://github.com/user-attachments/assets/726a580f-c7f2-4e59-8f45-50089ab8125e)
This removes dev jargon from the help test. Non-technical users might not be familiar with it.

Also added a link to mmseqs while we are on it.
@ivan-aksamentov ivan-aksamentov merged commit b2d5ff6 into rust Jan 23, 2025
@ivan-aksamentov ivan-aksamentov deleted the alignment-kernel-polish branch January 23, 2025 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better handling of mmseqs errors
2 participants