Our approach to labelling residues and linkages doesn't work for ano-ano linkages #165

gitoliver · 2023-11-29T11:48:27Z

For non ano-ano linkages our current labeling system works fine:
IndexOrdered:
DNeu5Aca2-6DGalpb1-4DGlcpNAc[3S]b1-2DManpa1-3[DGlcpNAcb1-4][DManp[2S,3Me]a1-6DManpa1-6]DManpb1-4DGlcpNAc[6Me]b1-4DGlcpNAcb1-OH
IndexOrderedLabeled:
DNeu5Ac&Label=residue-14;a2-6&Label=link-12;DGalp&Label=residue-13;b1-4&Label=link-11;DGlcpNAc&Label=residue-12;[3&Label=link-13;S&Label=residue-15;]b1-2&Label=link-10;DManp&Label=residue-11;a1-3&Label=link-9;[DGlcpNAc&Label=residue-10;b1-4&Label=link-8;][DManp&Label=residue-7;[2&Label=link-7;S&Label=residue-9;,3&Label=link-6;Me&Label=residue-8;]a1-6&Label=link-5;DManp&Label=residue-6;a1-6&Label=link-4;]DManp&Label=residue-5;b1-4&Label=link-3;DGlcpNAc&Label=residue-3;[6&Label=link-2;Me&Label=residue-4;]b1-4&Label=link-1;DGlcpNAc&Label=residue-2;b1-&Label=link-0;OH&Label=residue-1;

However, for ano-ano linkages this means the trailing "a" (the anomeric configuration of the rightmost residue) gets lost.
IndexOrdered:
DGlcpa1-1DGlcpa
IndexOrderedLabeled:
DGlcp&Label=residue-2;a1-1&Label=link-0;DGlcp&Label=residue-1;

Essentially it's because we put the alpha of DGlcpa into the linkage, and not the sugar residue. This follows the convention of what scientists do, i.e. we think like this:

But what would be cleaner to program is for the linkage to be e.g. "1-2" and the sugar to be "DGlcpa".

With the current approach I'm losing the "a" info for the rightmost Glc.

I'm not even sure where we would want the a to be in this representation?
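To make the "cleaner to program" alternative concrete, here is a minimal sketch, not gmml code. It handles unbranched sequences only, and the names `LINK` and `split_linear` are made up for illustration. It moves the anomer letter onto the residue on its left and leaves the linkage as positions only, so the trailing "a" of an ano-ano sequence survives as part of the last residue.

```python
import re

# Assumption: a linkage token looks like "<a|b><digit>-<digit>", with the
# second digit missing before a terminal aglycone such as "OH".
LINK = re.compile(r"([ab])(\d)-(\d)?")

def split_linear(sequence):
    """Split an unbranched sequence into residues and position-only links.

    The anomer letter that normally sits in the linkage is attached to the
    residue on its left instead.
    """
    residues, links = [], []
    pos = 0
    for m in LINK.finditer(sequence):
        residues.append(sequence[pos:m.start()] + m.group(1))
        links.append(f"{m.group(2)}-{m.group(3) or ''}")
        pos = m.end()
    residues.append(sequence[pos:])  # whatever follows the last linkage
    return residues, links

print(split_linear("DGlcpa1-1DGlcpa"))
```

For `DGlcpa1-1DGlcpa` this gives the residues `DGlcpa` and `DGlcpa` and the single link `1-1`, so nothing is dropped. Branches and derivatives such as `[3S]` would need the real parser.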

Lachele · 2023-11-29T13:01:37Z

There is a legitimate scientific reason for putting the anomeric configuration into the linkage. The monosaccharides generally interconvert between alpha, beta and linear when in solution. It is only at the stage of making a linkage that they are fixed into one of them. So, the creation of the linkage is important. But, I also see your point, and it is possible to have, say, a crystal of DGlcpa1-OH.

I have no issues with splitting out the anomeric config from the linkage data. It might be better to separate it entirely rather than to put it back into the monosaccharide annotation, though I cringe at greater verbosity.

Let's brainstorm a bit about all the reps.

Re the images, my first thought is to put the a & b nearer the glyphs and have the 2-1 float more in the center.

More detail on the underlying issue is in #165. For now I've added this code to put the anomericity of the rightmost ano-ano sugar back into the "orderedSequence" generated by gmml. Gems uses the orderedSequence rather than the user-supplied sequence, which is why it was not able to build ano-ano linkages while gmml could.

gitoliver · 2024-01-05T12:39:37Z

I can't do this at the sequenceParser level as it doesn't know which connection is the ano-ano for cases like DGlcpa1-2[LFucpa1-1]DFrufb. i.e. should it be a1-1a or a1-2a? There aren't atoms yet so I can't guess which one is the anomeric atom. Either I bring in metadata telling me it's 2 for Fru, or I do it at a later point when I know what the sugars are.
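A rough sketch of the metadata option, assuming a small lookup table of anomeric carbon positions. The table `ANOMERIC_CARBON` and the function `is_ano_ano` are hypothetical and not part of gmml; the positions themselves are standard chemistry (C1 for the aldoses Glc and Fuc, C2 for the ketose Fru).

```python
# Hypothetical metadata: anomeric carbon position per residue name.
ANOMERIC_CARBON = {"Glc": 1, "Fuc": 1, "Fru": 2}

def is_ano_ano(child, child_pos, parent, parent_pos):
    """True when the link joins the anomeric carbons of both residues."""
    return (ANOMERIC_CARBON[child] == child_pos
            and ANOMERIC_CARBON[parent] == parent_pos)

# DGlcpa1-2[LFucpa1-1]DFrufb
print(is_ano_ano("Glc", 1, "Fru", 2))  # Glc 1-2 Fru
print(is_ano_ano("Fuc", 1, "Fru", 1))  # Fuc 1-1 Fru
```

With this table the Glc 1-2 Fru connection is the ano-ano one and the Fuc 1-1 Fru connection is not, which is the information the parser lacks before atoms exist.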

Lachele · 2024-01-09T03:24:58Z

Is there a reason not to bring in the metadata? I think this is one big reason for the metadata to exist.

gitoliver · 2024-01-09T09:08:49Z

I wrote the comment because I'm not going to do it immediately, so I was just writing out both options.
If I do it in sequenceParser, I will have to create metadata specifically for this step that tells me the anomeric atom for each residue we handle. Note that at that level it is just parsing the string into a graph structure using our rules, so you can write DCowpa1-2LMoob1-OH and it won't throw an error until later, when it tries to find a Glycam prep entry for "LMoob". sequenceParser was separated from the other logic on purpose; it's a standalone component that drawGlycan uses. You can draw a 2D SNFG graph of DCowpa1-2LMoob1-OH and other things we don't support in Glycam.
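One way the sequenceParser option could stay tolerant of unsupported residues is to leave the label unresolved rather than raise when a residue is missing from the metadata. This is only a sketch under that assumption; `ANOMERIC_CARBON` and `parent_anomer_suffix` are invented names, not existing gmml API.

```python
from typing import Optional

# Hypothetical metadata; residues missing from it are "unknown" to the parser.
ANOMERIC_CARBON = {"Glc": 1, "Fuc": 1, "Fru": 2}

def parent_anomer_suffix(parent_name: str, parent_pos: int,
                         parent_anomer: str) -> Optional[str]:
    """Anomer letter to append to a link label, or None if undecidable.

    Returns the parent's anomer when the link lands on the parent's anomeric
    carbon (ano-ano), "" for an ordinary link, and None when the parent is
    not in the metadata, so parsing can continue without an error.
    """
    carbon = ANOMERIC_CARBON.get(parent_name)
    if carbon is None:
        return None
    return parent_anomer if carbon == parent_pos else ""

print(parent_anomer_suffix("Fru", 2, "b"))
print(parent_anomer_suffix("Moo", 1, "b"))
```

A made-up residue such as "Moo" yields `None` here, so something like DCowpa1-2LMoob1-OH could still be parsed and drawn, and the label could be filled in at a later stage if the residue turns out to be supported.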
If I go the other way and fix the label once I have the atomic structures, it's easy, but the label won't be correct in drawGlycan. No one (including us) is using drawGlycan, and we might end up using third-party software anyway, so I'm planning to leave this open for a while to see where we end up. For now it isn't affecting anything that's in use, so it's OK to leave it.

gitoliver added the question label Nov 29, 2023

gitoliver assigned Lachele and gitoliver Nov 29, 2023
