Best way to represent clonal lineages in AIRR format? #816

Open
Thopic opened this issue Feb 11, 2025 · 6 comments

@Thopic

Thopic commented Feb 11, 2025

Hi AIRR community,

I'm trying to determine the best way to represent a clonal lineage of B-cells in the AIRR format—that is, a set of cells originating from the same VDJ recombination event but potentially diverging due to affinity maturation.

From what I understand, the current standard suggests using clone_id, but that field is also used to denote actual clones, i.e. cells with identical genomes. Additionally, if I understand correctly, each clone_id is associated with a unique junction value, while different clones in the same clonal lineage can have (and often do have) different junctions.

For context, we're developing a program that identifies clonal lineages from an AIRR-formatted file. The input files often already have a clone_id column marking all reads from the same exact sequence (though potentially from different cells). I want to retain this information but am unsure what to rename this column to—recombination_id seems deprecated.

Semantically, I also find it a bit off to use clone for grouping cells with different genomes. Something like clonal_lineage_id would be clearer (but it's a minor point).

Sorry if something similar was already asked, and thanks for all the great work.

Best,
Thomas

@scharch
Contributor

scharch commented Feb 11, 2025

Unfortunately, clone is an ambiguous term in B cell biology: many people use it to denote common descent (i.e., lineage), while others restrict it, as you do, to "monoclonal" (identical) sequences.
In the AIRR schema, we use it in the former sense, hence the reference to "inferred" clones here (and more detail on the Clone and Lineage Tree schema).
In bulk data, identical sequences should be collapsed using some combination of duplicate_count and/or umi_count. For single-cell data, cells whose receptors have identical sequences obviously meet both definitions of "clone", but clone_id is not meant to be restricted only to that case.
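As a rough illustration of the collapsing step for bulk data, here is a minimal sketch that merges rows with an identical `sequence` and sums their `duplicate_count`. The helper name `collapse_identical`, the choice to keep the first row's other fields, and the treatment of a missing count as 1 are assumptions made for this example, not part of the AIRR standard or its tooling.

```python
def collapse_identical(rows):
    """Merge rows whose `sequence` is identical, summing duplicate_count.

    Hypothetical helper: the first row seen for a sequence supplies the
    remaining fields (sequence_id, clone_id, ...).
    """
    collapsed = {}
    for row in rows:
        key = row["sequence"]
        # Assumption: a missing or empty duplicate_count means one observation.
        count = int(row.get("duplicate_count") or 1)
        if key in collapsed:
            collapsed[key]["duplicate_count"] += count
        else:
            merged = dict(row)
            merged["duplicate_count"] = count
            collapsed[key] = merged
    return list(collapsed.values())


# Toy rearrangement rows (made-up sequences, all in one inferred clone).
rows = [
    {"sequence_id": "s1", "sequence": "ACGT", "clone_id": "c1", "duplicate_count": "2"},
    {"sequence_id": "s2", "sequence": "ACGT", "clone_id": "c1", "duplicate_count": "3"},
    {"sequence_id": "s3", "sequence": "ACGA", "clone_id": "c1", "duplicate_count": "1"},
]
collapsed = collapse_identical(rows)
```

After collapsing, the two distinct sequences still share the same clone_id, which is consistent with clone_id denoting the lineage rather than the exact sequence.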

@Thopic
Author

Thopic commented Feb 11, 2025

Thanks for the quick answer, this makes sense!

What about the junction field? What should I do if more than one junction belongs to the same clone_id? Define a consensus junction?

@bcorrie
Contributor

bcorrie commented Feb 11, 2025

Yes, that is the intent. The Clone object in the schema has a sequences attribute, an array of sequence_id values, which lets you track which sequences your Clone is derived from if you so choose.

@scharch
Contributor

scharch commented Feb 11, 2025

The junction is per rearrangement, so it does not have to (and should not) be the same across all sequences with the same clone_id.

@Thopic
Author

Thopic commented Feb 11, 2025

Maybe I misunderstand the Clone and Lineage Tree schema, but I thought clone_id having the identifier attribute meant that there could be only one line per clone?

@scharch
Contributor

scharch commented Feb 11, 2025

Sorry. In the Rearrangement schema, junction is per rearrangement. You are correct that in the Clone schema it is per clone. In that case it is supposed to represent the hypothetical germline/naive ancestor rather than the consensus, but you could use the consensus instead if that works better for you.
Please note that the Clone schema is likely to change for v2 of the standard, see #778
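To make the Rearrangement vs. Clone distinction concrete, here is a minimal sketch that groups rearrangement rows by clone_id and emits one Clone-like record per group, with `sequences` listing the member sequence_id values. It uses the consensus option for the per-clone junction; the schema's intended value is the inferred germline/naive ancestor, which this sketch does not attempt to reconstruct. The helper names, the restriction to equal-length junctions, and the reduced set of fields are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def consensus(junctions):
    """Per-position majority vote over equal-length junctions (hypothetical helper)."""
    if len({len(j) for j in junctions}) != 1:
        # Assumption: junctions within a clone share a length; otherwise align first.
        raise ValueError("junctions differ in length; align them first")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*junctions))


def build_clones(rearrangements):
    """Group rearrangement rows by clone_id into one record per clone."""
    groups = defaultdict(list)
    for row in rearrangements:
        groups[row["clone_id"]].append(row)
    return [
        {
            "clone_id": clone_id,
            # Track which rearrangements the clone was derived from.
            "sequences": [m["sequence_id"] for m in members],
            # One junction per clone (here a consensus, not the germline ancestor).
            "junction": consensus([m["junction"] for m in members]),
        }
        for clone_id, members in groups.items()
    ]


# Toy data: three rearrangements in one lineage, each row with its own junction.
rearrangements = [
    {"sequence_id": "s1", "clone_id": "c1", "junction": "TGTGCAAGA"},
    {"sequence_id": "s2", "clone_id": "c1", "junction": "TGTGCGAGA"},
    {"sequence_id": "s3", "clone_id": "c1", "junction": "TGTGCAAGA"},
]
clones = build_clones(rearrangements)
```

The rearrangement rows keep their individual junctions, while the single clone record carries one summary junction and points back to its members through `sequences`.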
