Our approach to labelling residues and linkages doesn't work for ano-ano linkages #165

Open
gitoliver opened this issue Nov 29, 2023 · 4 comments

@gitoliver
Contributor

For non-ano-ano linkages, our current labeling system works fine:
IndexOrdered:
DNeu5Aca2-6DGalpb1-4DGlcpNAc[3S]b1-2DManpa1-3[DGlcpNAcb1-4][DManp[2S,3Me]a1-6DManpa1-6]DManpb1-4DGlcpNAc[6Me]b1-4DGlcpNAcb1-OH
IndexOrderedLabeled:
DNeu5Ac&Label=residue-14;a2-6&Label=link-12;DGalp&Label=residue-13;b1-4&Label=link-11;DGlcpNAc&Label=residue-12;[3&Label=link-13;S&Label=residue-15;]b1-2&Label=link-10;DManp&Label=residue-11;a1-3&Label=link-9;[DGlcpNAc&Label=residue-10;b1-4&Label=link-8;][DManp&Label=residue-7;[2&Label=link-7;S&Label=residue-9;,3&Label=link-6;Me&Label=residue-8;]a1-6&Label=link-5;DManp&Label=residue-6;a1-6&Label=link-4;]DManp&Label=residue-5;b1-4&Label=link-3;DGlcpNAc&Label=residue-3;[6&Label=link-2;Me&Label=residue-4;]b1-4&Label=link-1;DGlcpNAc&Label=residue-2;b1-&Label=link-0;OH&Label=residue-1;

However, for ano-ano linkages this means the anomeric "a" of the rightmost residue gets lost:
IndexOrdered:
DGlcpa1-1DGlcpa
IndexOrderedLabeled:
DGlcp&Label=residue-2;a1-1&Label=link-0;DGlcp&Label=residue-1;
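
The loss can be reproduced with a small sketch. This is not gmml's sequenceParser: the regex, the function name, and the numbering scheme are assumptions that only cover simple unbranched sequences like the one above. It attaches each anomer to the outgoing linkage, so the last residue, which has no outgoing linkage, has nowhere to put its "a".

```python
import re

# Hypothetical token pattern (not gmml's grammar): isomer (D/L), a three-letter
# sugar name, ring form (p/f), then the anomer, then an optional "x-y" linkage.
TOKEN = re.compile(r"(?P<res>[DL][A-Z][a-z]{2}[pf])(?P<ano>[ab])(?P<link>\d-\d)?")


def label_anomer_in_linkage(sequence: str) -> str:
    """Label residues and linkages, folding each anomer into its outgoing linkage."""
    tokens = list(TOKEN.finditer(sequence))
    n = len(tokens)
    out = []
    for i, tok in enumerate(tokens):
        # Residues are numbered from the right, starting at 1 (assumed scheme).
        out.append(f"{tok['res']}&Label=residue-{n - i};")
        if tok["link"]:
            # Linkages are numbered from the right, starting at 0 (assumed scheme).
            out.append(f"{tok['ano']}{tok['link']}&Label=link-{n - 2 - i};")
        # No outgoing linkage: the anomer of this residue is silently dropped.
    return "".join(out)


print(label_anomer_in_linkage("DGlcpa1-1DGlcpa"))
```

The printed string has the same shape as the IndexOrderedLabeled output above: the "a" of the left residue survives inside link-0, while the "a" of the right residue does not appear anywhere.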

Essentially, this is because we put the alpha of DGlcpa into the linkage rather than into the sugar residue. This follows the convention scientists use, i.e. we think like this:
image
But what's cleaner to program is for the linkage to be e.g. "1-2" and the sugar to be "DGlcpa".
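
A minimal sketch of that alternative, with the anomer stored on the residue and the linkage reduced to the two positions. The class and field names here are hypothetical and are not taken from gmml.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Residue:
    name: str              # e.g. "DGlcp": isomer + sugar + ring form
    anomer: Optional[str]  # "a", "b", or None when not specified

    def text(self) -> str:
        return self.name + (self.anomer or "")


@dataclass(frozen=True)
class Linkage:
    child_position: int   # position on the left-hand residue
    parent_position: int  # position on the right-hand residue

    def text(self) -> str:
        return f"{self.child_position}-{self.parent_position}"


def render(residues: List[Residue], linkages: List[Linkage]) -> str:
    """Write an unbranched chain back out, left to right."""
    parts = []
    for i, residue in enumerate(residues):
        parts.append(residue.text())
        if i < len(linkages):
            parts.append(linkages[i].text())
    return "".join(parts)


ano_ano = [Residue("DGlcp", "a"), Residue("DGlcp", "a")]
print(render(ano_ano, [Linkage(1, 1)]))  # DGlcpa1-1DGlcpa
```

Because each residue owns its anomer, the rightmost "a" survives a round trip even though no linkage follows it.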

With the current approach I'm losing the "a" info for Glc:
image
I'm not even sure where we would want the "a" to be in this representation.

@Lachele
Member

Lachele commented Nov 29, 2023

There is a legitimate scientific reason for putting the anomeric configuration into the linkage. The monosaccharides generally interconvert between alpha, beta and linear when in solution. It is only at the stage of making a linkage that they are fixed into one of them. So, the creation of the linkage is important. But, I also see your point, and it is possible to have, say, a crystal of DGlcpa1-OH.

I have no issues with splitting out the anomeric config from the linkage data. It might be better to separate it entirely rather than to put it back into the monosaccharide annotation, though I cringe at greater verbosity.

Let's brainstorm a bit about all the reps.

Re the images, my first thought is to put the a & b nearer the glyphs and have the 2-1 float more in the center.

gitoliver added a commit that referenced this issue Dec 4, 2023
More detail on the underlying issue is in #165. For now I've added this code to add the anomericity of the rightmost ano-ano sugar back into the "orderedSequence" generated by gmml. Gems uses the orderedSequence and not the user-supplied one, which is why it was not able to build ano-ano, while gmml could.
@gitoliver
Contributor Author

I can't do this at the sequenceParser level, as it doesn't know which connection is the ano-ano one for cases like DGlcpa1-2[LFucpa1-1]DFrufb, i.e. should it be a1-1a or a1-2a? There aren't atoms yet, so I can't guess which one is the anomeric atom. Either I bring in metadata telling me it's 2 for Fru, or I do it at a later point when I know what the sugars are.
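
The metadata option could look roughly like this. The table and function names are hypothetical, and the table holds only the three sugars from the example: Glc and Fuc are aldoses with the anomeric carbon at position 1, while Fru is a ketose with it at position 2.

```python
# Assumed lookup table (not part of gmml): anomeric carbon position per sugar.
ANOMERIC_POSITION = {"Glc": 1, "Fuc": 1, "Fru": 2}


def anomeric_position(sugar: str) -> int:
    """Return the anomeric position, failing loudly for sugars without metadata."""
    try:
        return ANOMERIC_POSITION[sugar]
    except KeyError:
        raise ValueError(f"no anomeric-position metadata for {sugar!r}") from None


def is_ano_ano(child: str, child_pos: int, parent: str, parent_pos: int) -> bool:
    """True when the linkage joins the anomeric positions of both residues."""
    return (anomeric_position(child) == child_pos
            and anomeric_position(parent) == parent_pos)


# The two linkages in DGlcpa1-2[LFucpa1-1]DFrufb:
print(is_ano_ano("Glc", 1, "Fru", 2))  # True: position 2 is anomeric on Fru
print(is_ano_ano("Fuc", 1, "Fru", 1))  # False: position 1 is not anomeric on Fru
```

With such a table, the 1-2 linkage is the one that needs the trailing anomer, not the 1-1 branch. The cost is that the parser would then reject any sugar missing from the table.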

@Lachele
Member

Lachele commented Jan 9, 2024 via email

@gitoliver
Contributor Author

I wrote the comment because I'm not going to do it immediately, so I'm just writing out both options.
If I do it in sequenceParser, I will have to create metadata specifically for this step telling me the anomeric atom for each residue we handle. Note that at that level it's just parsing the string into a graph structure using our rules, so you can give it DCowpa1-2LMoob1-OH and it won't throw an error until later, when it tries to find a Glycam prep entry for "LMoob". sequenceParser was separated from the other logic on purpose; it's a standalone thing that drawGlycan uses. You can draw a 2D SNFG graph of DCowpa1-2LMoob1-OH and other things we don't support in Glycam.
If I go the other way and fix the label once I have the atomic structures, it's easy, but the label won't be correct in drawGlycan. No one (including us) is using drawGlycan, and we might end up using 3rd party software anyway, so I'm planning to leave this open for a while to see where we end up. For now it's not impacting anything that's in use, so it's OK to leave it.
