-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugs in airr-schema.yaml / airr-schema-openapi3.yaml #813
Comments
Yes, this is a good point that we haven't addressed in #773 yet, and is a criteria we applied to the germline gene set schema. Adding a new class is the solution, though I'm not sure the best name. Maybe |
Hi @LonnekeScheffer , we need to differentiate these issues between v1.5 and v2.0. If v2.0 ends up being broken for awhile, that's ok, all the data is currently in v1.5 so that's the priority. Maybe some of these issues are only present in v2.0 so we can ignore them for now. |
Hi @schristley, I have now updated the list to separate the airr v1.5 issues (which also occur in v2.0) from the v2.0-only issues |
For Likewise for |
@bussec @bcorrie This seems to be incorrect usage by AIRR standards. Gene names are not ontology-based exactly, i.e. each individual gene is not an ontology term. This is more like a controlled vocabulary from an external source. I don't think we have a similar situation elsewhere in the AIRR schema, so I think we need to define this differently. Likewise for |
@LonnekeScheffer this is actually correct, it is just uncommon. What this means is that instead of the keys/properties of the dictionary being pre-defined, they are dynamic. The For LinkML, this gets translated into a |
@bussec any thoughts on this. These are CURIEs with a label, not Ontologies. Should we create a
The format is identical to the Ontology object which is not ideal. The only difference between an Ontology and Labeled Curie is that in the AIRR attributes the So we could get rid of |
I don't think this is a problem in the AIRR Standard. I think this is a problem/limitation with LinkML and the way it represents the slots. In my opinion at least, it is incredibly limiting to have to have a slot named |
These are both Once we resolve how to handle LabeledCURIE and Ontology this should be resolved. |
We thought about this, but... The two different definitions for the same field in different objects is the same problem as here: #813 (comment) and I think needs to be resolved at the AKC/LinkML end. The AIRR Standard doesn't have a problem with having two field names the same with different definitions in different objects. |
I could implement a workaround such as a list with 'illegal' AKC slot names (which would for now only contain 'unit' and 'value'), where if the new slot has such an illegal name, the class name is concatenated to it ('time_point_unit'). My only concern with this is that in any current or future conversion to/from AKC and AIRR, this needs to be explicitly taken into account and some kind of similar or reverse workaround needs to be implemented. The difficulty is not the implementation itself, but to remember (/be aware that it's necessary) to do it.
The same thing goes here. I can implement this, but it is a hack, and the more such explicit workarounds I implement for specific cases, the more likely that it will cause conversion problems in the future. |
I don't consider it a problem either with the AIRR Standard, just a discussion about conventions. I feel like we've always had this informal guideline to make field names unique, thus why we have explicitly named id fields instead of just calling them all Also, these are new fields, so we aren't breaking anything by changing the names. If we want them unique then now is the best time to change them.
I would argue that we still do this in AIRR Standards, regardless of AKC and linkml. We call the field I get your point about AKC and linkml, but setting that aside. Is there any reason why we cannot do this for AIRR Standards? What will break or why is bad to change the names? |
I have always thought this was a problem with LinkML - not sure if it is just the way we are using it or a fundamental issue. Having a global list of slots that are used throughout all classes is like have a global list of variables in a large complex piece of software. It just seems wrong. The fact that within the AKC Schema we can't have two classes A and B with slots named the same unless they are of identical type seems ridiculous to me. If this problem went away then there would be no problems with the schema transformation. |
I don't think that is a good example - repertoire_id is always talking about the same thing. It is an ID for a repertoire, and it appears in many objects, so the name of field identifies what type of ID it is. The only place where there is an extra qualifier on the _id is in the Repertoire object where we could have I would suggest that we gave up on unique field names in the AIRR Standard as soon as we started having complex object structures beyond just having Repertoire and Rearrangement. We now have complex time, quantity, person objects all of which represent similar but different things. As such they have similar fields that are sometime subtly different. |
I don't think it is practical or logical to have unique field names across all objects. Objects encapsulate concepts, and those concepts have attributes. The attributes should be named such that they make sense in the context of the Object, not globally. It makes no sense to me that if we want to add a |
??? but we do have them, there's |
This is an abstract and theoretical concern. In practicality, we haven't run into this situation. Partially, I would argue, because we've been using unique names for semantically distinct concepts, and not using polymorphism such that the same name is used across objects to refer to completely different concepts. All of the other objects avoid this ambiguity, except for these new objects where |
I will let others comment... 8-) |
Some issues encountered when auto-generating LinkML from airr-schema.yaml / airr-schema-openapi3.yaml
These are all relatively easy/small fixes.
Issues for both AIRR version 1.5 and 2.0
release_version
has inconsistent type. InAlleleDescription
, type=integer; inGermlineSet
, type=number. Should be integer in both cases.species
is missing source node in some places. This ontology is correctly defined inSubject
, but inAlleleDescription
andGermlineSet
, the expected fieldx-airr/ontology/top_node/id
is missing.locus
(contains 'null' inRearrangement
, but not inAlleleDescription
,GermlineSet
,Genotype
);mhc_class
(contains 'null' inReactivity
, not inMHCGenotype
)x-airr/ontology/top_node/id
is missing.antigen
in class. 'ReceptorReactivity' (v1.5) / 'Reactivity' (v2.0) the expected fieldx-airr/ontology/top_node/id
is missing.nodes
of classTree
should have$ref: '#/Node'
directly undernodes
, now it is placed under a field calledadditionalProperties
which is not used anywhere elseRepertoireGroup
contains propertyrepertoires
. This is not simply an array ofRepertoire
objects, but rather, an array of items which are unnamed (!) objects, where each object contains arepertoire_id
,repertoire_description
,time_point
. There are no other comparable occurrences in the schema files where new objects are defined in such a way. Since this object is listed under 'items', I cannot infer a name for this object when auto-generating LinkML. Therefore, I would suggest creating a new named class, e.g.,RepertoireReference
, which contains these three fields, and is used as a 'type' for this array.Issues for AIRR version 2.0 only
unit
occurs as an ontology across different classes. This ontology does not have the same meaning. E.g.:TimePoint
(unit=UO:0000003),TimeInterval
(unit=UO:0000033),PhysicalQuantity
(unit=UO:0000024),TimeQuantity
(unit=UO:0000033). The field 'unit' should be renamed to something unique (time_unit, time_interval_unit, physical_quantity_unit)orcid_id
in classContributor
the expected fieldx-airr/ontology/top_node/id
is null.affiliation
in classContributor
the expected fieldx-airr/ontology/top_node/id
is null.reactivity_ref
is defined with two different types. Inside classReactivity
, the type is array. Inside classRearrangement
, the type is string, but the description explicitly states that this string should be a comma separated list. I would propose changing the type to array. (and besides,Rearrangement
also containsreactivity_id
which is a comma separated list as string, the type for that should then probably change as well.)The text was updated successfully, but these errors were encountered: