This document contains notes and minutes from conference meetings of the AIRR Common Repository Working Group.
- Bookkeeping
- US Government call for comments on repository requirements
- ADC API issues discussion
- AIRR Data Commons (ADC) API Paper
- On the horizon...
- Bookkeeping
- ADC API issues discussion
- AIRR Data Commons (ADC) API Paper
- On the horizon...
- Bookkeeping
- Gene naming and gene fields
- API discussion
- AIRR Data Commons (ADC) API Paper
- On the horizon...
-
Bookkeeping
-
API discussion
- Updates
- Testing
- Issues
-
AIRR Data Commons (ADC) API Paper
-
On the horizon...
-
Bookkeeping
-
API discussion
- Updates
- Testing
- GitHub repository for testing: https://github.com/airr-community/adc-api-tests
- Issues
-
AIRR Data Commons (ADC) API Paper
- Paper has been ratified by AIRR Community
- Next steps?
-
On the horizon...
-
Bookkeeping
-
AIRR Exec meeting
- Presented status at Exec meeting
-
API discussion
- Updates
- Testing
- GitHub repository for testing: https://github.com/airr-community/adc-api-tests
- Issues
-
AIRR Data Commons (ADC) API Paper
- Paper has been ratified by AIRR Community
- Next steps?
-
Bookkeeping
- None
-
API Discussion
-
AIRR Data Commons (ADC) API Paper
- Updates/Discussion
-
Ongoing topics (not on the agenda for discussion):
- API Documentation
- We have a docs branch that has API documentation - feedback welcome
- https://github.com/airr-community/airr-standards/tree/metadata-docs
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
- AIRR Data Commons Registry
- Discussion?
- Issues
- Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Related issues:
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- Ontologies
-
Bookkeeping
- None
-
API Discussion
- Updates
-
AIRR Data Commons (ADC) API Paper
- Authorship discussion
- Take vote on approval of paper, record vote in minutes
- Take vote on approval to send paper to AIRR Exec and then AIRR-C for endorsement, record vote in minutes
- Discussion points
- Publishing time line
- Maturity of the implementation in the paper
- Limited rearrangement query fields
- See discussion around 2018.04 in this document
- Consensus on approach
- Do we have the right fields (Table in paper)
-
Ongoing topics (not on the agenda for discussion):
- API Documentation
- We have a docs branch that has API documentation - feedback welcome
- https://github.com/airr-community/airr-standards/tree/metadata-docs
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
- AIRR Data Commons Registry
- Discussion?
- Issues
- Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Related issues:
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- Ontologies
-
Bookkeeping
- None
-
API discussion
- Updates
- Test suite for queries
- Issues
-
AIRR Data Commons (ADC) API Paper
- Authorship discussion
- Next steps for writing?
-
Ongoing topics (not on the agenda for discussion):
- API Documentation
- We have a docs branch that has API documentation - feedback welcome
- https://github.com/airr-community/airr-standards/tree/metadata-docs
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
- AIRR Data Commons Registry
- Discussion?
- Issues
- Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Related issues:
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- Ontologies
-
Book keeping
- None
-
AIRR Recommendations
- Merge of revisions branch to master done, v0.6.0
- Take this off of the agenda?
-
API discussion
- Updates
- Documentation feedback
- We have a docs branch that has API documentation - feedback welcome
- https://github.com/airr-community/airr-standards/tree/metadata-docs
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
- Test suite for queries
- Issues
- Document describing queries
- Summary stats in API responses
-
AIRR Data Commons (ADC) API Paper
- Authorship discussion
- Writing approach
- Should the paper take into account and discuss data privacy and data sharing (in particular GDPR-related issues)?
-
AIRR Data Commons Registry
- Discussion?
- Issues
-
Ongoing topics (not on the agenda for discussion):
- Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Related issues:
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- Ontologies
-
Book keeping
- None
-
AIRR Recommendations
- Recommendations were ratified
- Need to merge in revisions on branch to master
-
API discussion
- Updates
- Documentation feedback
- We have a docs branch that has API documentation - feedback welcome
- https://github.com/airr-community/airr-standards/tree/metadata-docs
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
- Test suite for queries
- Issues
- Document describing queries
- Summary stats in API responses
-
AIRR CRWG API Paper
-
AIRR Data Commons Registry
- Issues
-
Ongoing topics (not on the agenda for discussion):
- Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Related issues:
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- Ontologies
-
Book keeping
- New member introductions
-
AIRR Community Meeting
- Meeting updates from attendees
- Panel update from attendees
-
AIRR Recommendations
- Recommendations were ratified
- Need to merge in revisions on branch to master
- Review and discuss changes for May AIRR Community meeting
- Created a branch from master for edits
-
API discussion
- Updates
- iReceptor Plus API Hackathon - Genoa Meeting
- iReceptor implementation update
- Documentation feedback
- We have a docs branch that has API documentation - feedback welcome
- https://github.com/airr-community/airr-standards/tree/metadata-docs
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
- Issues
- Document describing queries
- Summary stats in API responses
-
Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Related issues:
-
New Working Group on meta-analysis
- Report on discussions at AIRR Meeting Genoa.
-
Ongoing topics (not on the agenda for discussion):
- Next Steps
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- AIRR CRWG API Paper
- AIRR Data Commons Registry
- Issues
- Ontologies
-
Book keeping
- None
-
AIRR Community Meeting
- Plans for the Sunday panel
- Plans for our working session Saturday morning
- Plans for the presentation Sunday
-
AIRR Recommendations
- Review and discuss changes for May AIRR Community meeting
- Created a branch from master for edits
-
API discussion
- Issues
- Document describing queries
- Summary stats in API responses
-
Documentation
- Scott has started documenting the API
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
-
Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Review and discuss
- Old issue: airr-community/airr-standards#144
- New issue: airr-community/airr-standards#181
- New issue: airr-community/airr-standards#188
- Ramifications of the "not 1-to-n" relation between sample and repertoire on MiAIRR and NCBI compatibility.
- Link between an entity in the rearrangements API response (repertoire_id, rearrangement_set_id/software_processing_id) and the repertoire API response.
-
New Working Group on meta-analysis
- Proposal from MiniStd (see EMail)
- Any other discussion?
-
Ongoing topics (not on the agenda for discussion):
- Next Steps
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- AIRR CRWG API Paper
- AIRR Data Commons Registry
- Issues
- Ontologies
-
Book keeping
- None
-
AIRR Community Meeting
- Plans for the Sunday panel
- Plans for our working session Saturday morning
- Plans for the presentation Sunday
-
AIRR Recommendations
- Review and discuss changes for May AIRR Community meeting
- Created a branch from master for edits
-
Documentation
- Scott has started documenting the API
- http://docs.airr-community.org/en/metadata-docs/api/overview.html
-
Metadata structure discussion
- Proposed that this JSON structure is what the API returns
- See Scott's documentation above
- Review and discuss
- Old issue: airr-community/airr-standards#144
- New issue: airr-community/airr-standards#181
- New issue: airr-community/airr-standards#188
- Ramifications of the "not 1-to-n" relation between sample and repertoire on MiAIRR and NCBI compatibility.
- Link between an entity in the rearrangements API response (repertoire_id, rearrangement_set_id/software_processing_id) and the repertoire API response.
-
New Working Group on meta-analysis
- Proposal from MiniStd (see EMail)
- Any other discussion?
-
Ongoing topics (not on the agenda for discussion):
- API discussion
- Issues
- Document describing queries
- Summary stats in API responses
- Next Steps
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- AIRR CRWG API Paper
- AIRR Data Commons Registry
- Issues
- Ontologies
-
Book keeping
- None
-
AIRR Community Meeting
- Panel invitations update
- Four confirmed, but only one person has registered for the meeting
- CRWG page on web site
- Request to make change made...
-
AIRR Recommendations
- Review and discuss changes for May AIRR Community meeting
- Created a branch from master for edits
-
Metadata structure discussion
- Review and discuss
- Old issue: airr-community/airr-standards#144
- New issue: airr-community/airr-standards#181
- Ramifications of the "not 1-to-n" relation between sample and repertoire on MiAIRR and NCBI compatibility.
- Issue around not having a 1-1 relationship between a rearrangement_set_id and a sample through the repertoire object.
-
New Working Group on meta-analysis
- Proposal from MiniStd (see EMail)
-
Ongoing topics:
- API discussion
- Issues
- Document describing queries
- Summary stats in API responses
- Next Steps
- Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
- AIRR CRWG API Paper
- AIRR Data Commons Registry
- Issues
- Ontologies
-
Book keeping
- None
-
AIRR Community Meeting
- Panel invitations update
- Comms committee has asked us to look at web site (in prep for May Meeting)
- Our goals are listed for 2018 (as of the 2017 December meeting)
- Should we simply change this to current goals?
- Do we need any other changes?
-
AIRR Recommendations
- Review and discuss
- Created a branch from master for edits
-
Metadata structure discussion
- Review and discuss
- Ramifications of the "not 1-to-n" relation between sample and repertoire on MiAIRR and NCBI compatibility.
-
API discussion
- Updates
- Issues
- Document describing queries
- Summary stats in API responses
- Next Steps
-
Data Provenance (Changelogs for repositories)
- #26
- Provenance tracking on iReceptor repositories started...
-
Ongoing topics:
-
Book keeping
- None
-
AIRR Community Meeting
- Panel invitations update
- Comms committee has asked us to look at web site (in prep for May Meeting)
- Our goals are listed for 2018 (as of the 2017 December meeting)
- Should we simply change this to current goals?
- Do we need any other changes?
-
AIRR Recommendations
- Review and discuss
- Created a branch from master for edits
-
API discussion
- Updates
- Issues
- Document describing queries
- Summary stats in API responses
- Next Steps
-
Ontologies
-
Data Provenance (Changelogs for repositories)
-
AIRR Data Commons Registry
-
AIRR CRWG API Paper
-
Book keeping
- None
-
AIRR Community Meeting
- Panel invitations update
- PIRD and OAS have accepted
- Other candidates?
- Organizing committee has asked us to nominate a chair for the panel discussion
-
Ontologies
-
API discussion
- Updates
- iReceptor has started an implementation (very early days)
- Issues
- Document describing queries
- Summary stats in API responses
- Next Steps
-
Data Provenance (Changelogs for repositories)
-
AIRR Data Commons Registry
-
AIRR CRWG API Paper
-
Priorities for next 4 months (May meeting)
- AIRR API
- AIRR Ontologies
- Future of CRWG after May meeting
- Article/paper submission
-
Book keeping
- None
-
AIRR Community Meeting
- Panel invitations update
- PIRD and OAS have accepted
-
Ontologies
-
API discussion
- Issues
- Document describing queries
- Summary stats in API responses
- Next Steps
-
Data Provenance (Changelogs for repositories)
-
AIRR Data Commons Registry
-
Book keeping
- None
-
AIRR Community Meeting
- Panel invitations update
- PIRD and OAS have accepted
-
Ontologies
-
API discussion
- Issues
- Document describing queries
- Summary stats in API responses
- Next Steps
- Implementations for ontologies (see above)
- VDJServer implementation update
- iReceptor implementation plans
-
Data Provenance (Changelogs for repositories)
-
AIRR Data Commons Registry
-
Book keeping
- None
-
AIRR Community Meeting
- Panel invitations update
-
Ontologies
-
API discussion
- Issues
- Document describing queries
- Summary stats in API responses
-
Data Provenance (Changelogs for repositories)
-
AIRR Data Commons Registry
-
Book keeping
- None
-
AIRR Community Meeting
- Currently two reporting and panel sessions
- https://www.antibodysociety.org/airrc/meetings/communityiv/
- Organizing committee is expecting two panels - one per session
- One on Repositories/Standards
- One on Germline
- Discussion - see notes
- What do we want
- Focus of the panel?
- How many people?
- Who?
- Repository updates
- OAS: may be attending, have asked if interested in panel
- PIRD: Responded to Felix around visit, will be attending, probably willing to discuss on panel.
-
Ontologies
- AIRR Vocabulary/Ontology sub-working group
- Meetings have occurred on a regular basis, progress being made
- Will fall on us (and Data Rep) to implement
-
API discussion
- New document created to describe queries
- Summary stats in API responses
-
Book keeping
- None
-
AIRR Community Meeting
- Currently two reporting and panel sessions
- https://www.antibodysociety.org/airrc/meetings/communityiv/
- Organizing committee is expecting two panels - one per session
- One on Repositories/Standards
- One on Germline
- Other working groups are not doing panels
- Request around moving panels to one session probably not required
- Repository updates
- OAS: may be attending, have asked if interested in panel
- PIRD: Responded to Felix around visit, will be attending, probably willing to discuss on panel.
-
Ontologies
- See: Github issues:
- AIRR Vocabulary/Ontology sub-working group
- Formed, first meeting later today
- Defer discussion here until provided with recommendations from Vocab group?
- MiniStd discussion document here:
-
API discussion
- New document created to describe queries
- Asynchronous API back on the agenda
- Is this urgent, or should we focus on the basic API implementation?
- Part of the push for the paper, or leave it until later?
- Summary stats in API responses
-
CRWG publication planning
- Where, when, what?
-
Outstanding issues:
-
Repository lists
- See: Github issue #20
- Brian created a list here:
- https://b-t.cr/t/publicly-available-airr-seq-data-repositories/610
- Added 4 repositories
- Next steps?
-
Book keeping
- None
-
AIRR Community Meeting
- Currently two reporting and panel sessions
- CRWG, MiniStd, Biological Reagents scheduled together - report and panel
- Should we suggest CRWG, MiniStd, DataRep together as more logically linked
- Check with Encarnita and Uri/Scott if agreeable?
- Repository updates
- OAS: may be attending, have asked if interested in panel
- PIRD: Responded to Felix around visit, will be attending, probably willing to discuss on panel.
-
Ontologies
- See: Github issues:
- AIRR Vocabulary/Ontology sub-working group
- MiniStd interested in starting a focussed sub-working group
- Focus on short term results, getting things crossed off the list
- Two focussed sessions to get a lot done - November and January
- Report to CRWG and MiniStd, work independently
- Interested parties from all groups welcome...
- MiniStd discussion document here:
-
Repository lists
- See: Github issue #20
- Brian created a list here:
- https://b-t.cr/t/publicly-available-airr-seq-data-repositories/610
- Added 4 repositories
- Next steps?
-
API discussion
- New document created to describe queries
- Discussion around Scott's demonstration
- Asynchronous API back on the agenda
- Next steps
- Summary stats in API responses
-
CRWG publication planning
- Where, when, what?
-
Outstanding issues:
-
Book keeping
- Change of Co-Chair
-
Use of the term "AIRR Data Commons"
- See: GitHub issue #19
- Issue has been closed
-
AIRR Community Meeting - Updates Corey?
- Asked to prepare a panel for discussion around repositories and standards
- Possible repositories:
- AIRR internal: iReceptor, VDJServer, ImmuneDB, SciReptor, VDJdb, ...
- AIRR external: OAS, PIRD, ...
- Actions for interacting with external repositories
- PIRD is using its own standards, how do we reach out to help them see the light 8-)
- Need to coordinate contact with them (other WG have this on their agenda)
-
Repository lists
- See: Github issue #20
- Brian created a list here:
- https://b-t.cr/t/publicly-available-airr-seq-data-repositories/610
- Added 4 repositories
- Discussion
-
API demonstration/discussion
- Scott to demonstrate his recent work on the API.
-
Ontologies
- See: Github issues:
- General discussion
- Update on API and MiAIRR Work
- Brian meeting with Christian Friday to discuss
- How to represent them in the API (see above issues)
- In terms of API queries
- In terms of API responses
- More complex examples that are useful to help us with representation
- Cell subset
- Strain
- Ethnicity
- Tissue type
- Disease state/donor status
-
Outstanding issues:
-
Chair person discussion
- Corey stepping down
- Brian has volunteered
- Discussion
-
Use of the term "AIRR Data Commons" - Brian
- See: GitHub issue #19
- Exec has been informed, no objections at this time
- Expected to approve this and all other changes at the next AIRR Meeting
-
Book keeping
- Decisions document created
-
Use of the term "AIRR Data Commons" - Brian
- See: GitHub issue #19
- See: New recommendation document - https://github.com/airr-community/common-repo-wg/blob/issue-19/recommendations.md
- Sent to Exec to add to agenda as an information item - Meeting Sep 4th
-
AIRR Community Meeting
- Asked to prepare a panel for discussion around repositories and standards
- Volunteers to coordinate this?
- Possible repositories:
- AIRR internal: iReceptor, VDJServer, ImmuneDB, SciReptor, VDJdb, ...
- AIRR external: OAS, PIRD, ...
- Actions for interacting with external repositories
- PIRD is using its own standards, how do we reach out to help them see the light 8-)
- Need to coordinate contact with them (other WG have this on their agenda)
-
Ontologies
- See: Github issues:
- General discussion
- How to represent them in the API (see above issues)
- In terms of API queries
- In terms of API responses
- Low hanging fruit
- Species: further discussion?
- More complex examples that are useful to help us with representation
- Cell subset
- Strain
- Ethnicity
- Tissue type
- Disease state/donor status
-
Outstanding issues:
- Repertoire and Rearrangement definitions
- See: Github issue #17
- Relate to Minimal Standards and Data Representation working group outcomes?
- Repository lists
- See: Github issue #20
- Create a list of repositories on TCR site
- Eventually have a section for AIRR compliant repositories
- AIRR Repository registry
- See: Github issue #18
-
Book keeping
- Brian: Would like to move the decision part of minutes into a separate decision document. Any objections?
- The minutes document is really an agenda document; should we rename it?
-
Use of the term "AIRR Data Commons" - Brian
- See: GitHub issue #19
- See: New recommendation document - https://github.com/airr-community/common-repo-wg/blob/issue-19/recommendations.md
- Discussion - approval sought after Lindsay's edits
- Send to executive for approval to make it formal (assume no vote required)
-
Ontologies - Lindsay
- See: Github issue #21
- Minimal standards update - Lindsay/Brian/Christian
- Low hanging fruit
- Species: further discussion?
- More complex examples that are useful to help us with representation
- Cell subset
- Strain
- General discussion
- How to represent them in the API
- In terms of API queries
- In terms of API responses
-
API Discussion - Brian/Scott
- Repertoire and Rearrangement definitions
- See: Github issue #17
- Relate to Minimal Standards and Data Representation working group outcomes?
- Repository lists
- See: Github issue #20
- Create a list of repositories on TCR site
- Eventually have a section for AIRR compliant repositories
- AIRR Repository registry
- See: Github issue #18
-
Use of the term "AIRR Data Commons" - Brian
- See: GitHub issue #19
- See: New recommendation document - https://github.com/airr-community/common-repo-wg/blob/issue-19/recommendations.md
- Discussion - approval sought
- Send to executive for approval to make it formal (assume no vote required)
-
API Discussion - Brian/Scott
- Transition of iReceptor compliant repositories to AIRR compliant repositories
- Repertoire and Rearrangement definitions
- See: Github issue #17
- Relate to Minimal Standards and Data Representation working group outcomes?
- Repository lists
- Create a list of repositories on TCR site (or on docs site)
- Comprehensive list of AIRR related repositories
- Eventually have a section for AIRR compliant repositories
- Repository performance
- Can we suggest performance levels?
- Where performance is required to accomplish use cases
- Can we report?
- Should we add a performance metric (optional?) to the /info API entry point?
- Or do we just let the repositories do what they do?
-
Ontologies
- How to define them?
- How does the API use them?
- Are ontology terms required for API requests?
- Are ontology terms required for API responses?
- Is an ontology term acceptable (e.g. Homo sapiens), or is a taxonomy ID required (e.g. 9606)?
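One way to keep both options open is for the repository to normalize whatever the client sends (a label, a bare taxonomy ID, or a prefixed ID) into a single canonical identifier before running the query. The sketch below is hypothetical: the function name `normalize_organism`, the two-entry lookup table, and the `NCBITaxon:` prefix are illustrative assumptions, not something the group has agreed on; a real repository would resolve labels against the full NCBI Taxonomy.

```python
# Hypothetical label -> NCBI Taxonomy ID table (illustration only).
# A real repository would resolve terms against the full NCBI Taxonomy.
TAXON_IDS = {
    "homo sapiens": 9606,
    "mus musculus": 10090,
}


def normalize_organism(value):
    """Return a canonical identifier such as 'NCBITaxon:9606'.

    Accepts a label ('Homo sapiens'), a bare ID (9606 or '9606'),
    or an already-prefixed ID ('NCBITaxon:9606').
    Raises ValueError if the value cannot be resolved.
    """
    if isinstance(value, int):
        return f"NCBITaxon:{value}"
    text = str(value).strip()
    # Strip an assumed 'NCBITaxon:' prefix, case-insensitively.
    if text.lower().startswith("ncbitaxon:"):
        text = text.split(":", 1)[1]
    if text.isdigit():
        return f"NCBITaxon:{int(text)}"
    taxon_id = TAXON_IDS.get(text.lower())
    if taxon_id is None:
        raise ValueError(f"unknown organism: {value!r}")
    return f"NCBITaxon:{taxon_id}"
```

With this approach, `normalize_organism("Homo sapiens")` and `normalize_organism(9606)` both yield `"NCBITaxon:9606"`, so the question of which form is "required" would only matter for what the repository returns, not for what it accepts.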
-
Use of the term "AIRR Data Commons" - Brian
- See: GitHub issue #19
- Definition discussion - critical points
- AIRR Data Commons is a distributed data commons.
-
API Discussion - Brian/Scott
- Transition of iReceptor compliant repositories to AIRR compliant repositories
- Repertoire and Rearrangement definitions
- See: Github issue #17
- Relate to Minimal Standards and Data Representation working group outcomes?
- Asynchronous API
- Registry
-
Tool integration with AIRR Repositories
- how to avoid the intermediate download step for large query results
-
Ontologies
- How do we want to tackle this?
-
2019 AIRR Community Agenda
- Discussed at length - feedback provided to AIRR Meeting organizing committee (see email to mailing list).
-
Use of the term "AIRR Data Commons" See: GitHub issue #19
- Discussed briefly, agreement in general that this was a good idea
- Need to ensure that the term is well defined, in particular that the AIRR Data Commons is a distributed data commons.
- Tabled for further discussion at the next meeting.
In the previous call, we discussed and agreed on the following:
-
We discussed an asynchronous query mode interface, in particular whether it should be a single entry point that handles both synchronous and asynchronous queries, or separate entry points. The consensus was that separate entry points were a simpler design. We still need to work out the details of the interface.
-
We discussed the difference between query fields and returned fields. A discovery interface could inform the client what fields may be queried upon. We can define a core set of fields that all repositories must support query upon, but a specific repository might allow additional.
-
Do we need to define a discovery interface? Does providing the OpenAPI spec have all the information that is needed?
-
Asynchronous query mode interface. We need to hash out details of the interface.
-
Define DataMed submission standard. What will be the query to return the list of AIRR repositories?
-
Continue developing API specification.
- Define an asynchronous query mode interface.
- Define a “discovery” interface
- Continue work on ontology specifications.
- Review existing work such as IEDB and NIAID GSCID/BRC metadata.
- Based on review of the above, recommend ontologies for key elements listed above.
- How will these ontologies be integrated in an implementation (client interface, web service API)?
- Specify fields for alternative analytical pipeline data.
- Specify fields for INSDC accession numbers.
- Integrate CRWG recommendations, CRWG design documentation, and API documentation into the airr-standards documentation structure, and thus make available on https://docs.airr-community.org
In the previous call, we discussed and agreed on the following:
- Should queries be digital objects stored by the repository and given their own identifiers? This might solve some issues and enable some functionality.
- We will not require that repositories store the query and give it a unique identifier.
- However, we do see the desire for some repositories to support an asynchronous query operation versus a synchronous one. We agreed that defining an asynchronous query mode would be useful, but it places a larger burden (computational and infrastructure overhead) on the repository than a simple synchronous query mode. Therefore, we will not require repositories to support it.
- There is a difference between query fields and fields that are returned from a query. CRWG can define two sets, a large set of fields for return and a smaller set of fields for query. Repositories may support additional fields.
- We need to define a “discovery” interface for retrieving the set of fields that can be queried upon.
- Asynchronous query mode interface. Should it be a separate entrypoint(s) or a parameter to the same entrypoint? I think the interaction flow between client/server is different enough between synchronous and asynchronous that it’s better to have separate entrypoint(s).
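To make the discovery idea above concrete: a repository could advertise the core query fields plus any extras it supports, and a client could check a query against that list before sending it. This is a minimal sketch under stated assumptions; the payload shape `{"query_fields": [...]}`, the helper names, and the three-field core set are all hypothetical and chosen only for illustration.

```python
# Hypothetical core set of fields every repository must allow queries on.
# (Illustrative subset only; not the agreed CRWG list.)
CORE_QUERY_FIELDS = frozenset({"repertoire_id", "subject.organism", "sample.tissue"})


def discovery_response(extra_fields=()):
    """Build a hypothetical discovery payload: core fields plus repository extras."""
    return {"query_fields": sorted(CORE_QUERY_FIELDS | set(extra_fields))}


def unsupported_fields(query, discovery):
    """Return the field names in `query` that the repository does not advertise."""
    allowed = set(discovery["query_fields"])
    return sorted(field for field in query if field not in allowed)
```

A repository that also supports `sample.cell_subset` would advertise it via `discovery_response(extra_fields=["sample.cell_subset"])`, and a client querying on `subject.age` against that payload would learn up front that the field is not queryable there.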
-
Define DataMed submission standard. What will be the query to return the list of AIRR repositories?
-
Continue developing API specification.
- Define an asynchronous query mode interface.
- Define a “discovery” interface
- Continue work on ontology specifications.
- Review existing work such as IEDB and NIAID GSCID/BRC metadata.
- Based on review of the above, recommend ontologies for key elements listed above.
- How will these ontologies be integrated in an implementation (client interface, web service API)?
- Specify fields for alternative analytical pipeline data.
- Specify fields for INSDC accession numbers.
- Integrate CRWG recommendations, CRWG design documentation, and API documentation into the airr-standards documentation structure, and thus make available on https://docs.airr-community.org
In the previous call, we discussed and agreed on the following:
-
Discussion about repertoire_id versus rearrangement_set_id. We agreed that we need the capabilities of 1) how the study design defines the repertoire and 2) allow user queries to define those repertoires differently (e.g. CD4 subset). The actual technical solution is still being considered.
-
In a related issue, should queries be digital objects stored by the repository and given their own identifiers? This might solve some issues and enable some functionality:
- By “storing query”, we mean storing the parameters and their values that make up the query request, not the resultant data returned from the query.
- Expensive queries can be run asynchronously. Doing a query returns a query_id instead of the actual query results, and the user polls the service with the query_id to see when the data is available.
- Queries with DOIs would allow those queries to be referenced in journal publications.
- A challenge with query DOIs is whether it should always return the same data, or all data including new data that’s been added to the repository. Returning the same data implies some level of versioning on the data so that new data can be excluded.
- We decided that requiring a query to return the same data was too heavy a requirement on the repository.
- Show initial prototype of API
- The service is up and running but doesn’t return data.
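The asynchronous flow described above, in which a query returns a query_id instead of results and the client polls with that id, can be sketched in-process as follows. Everything here is a toy assumption: the class and method names, the `"PENDING"`/`"FINISHED"` status strings, and the use of a thread pool to stand in for a repository's back end are illustrative, not a proposed interface. Unlike the "storing query" idea in the notes, this toy also keeps results in memory.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncQueryService:
    """Toy stand-in for a repository that runs queries asynchronously.

    submit() returns a query_id immediately; the client polls status()
    with that id and calls results() once the query has finished.
    All names and status values are hypothetical.
    """

    def __init__(self, records):
        self._records = records
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}  # query_id -> Future holding the query result

    def submit(self, filters):
        query_id = str(uuid.uuid4())
        self._jobs[query_id] = self._executor.submit(self._run, filters)
        return query_id

    def status(self, query_id):
        return "FINISHED" if self._jobs[query_id].done() else "PENDING"

    def results(self, query_id):
        return self._jobs[query_id].result()

    def close(self):
        self._executor.shutdown()

    def _run(self, filters):
        time.sleep(0.05)  # simulate an expensive query
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]


def run_query(service, filters, poll_interval=0.01):
    """Client side: submit, poll until finished, then fetch the results."""
    query_id = service.submit(filters)
    while service.status(query_id) != "FINISHED":
        time.sleep(poll_interval)
    return service.results(query_id)
```

Keeping submit, status, and results as separate operations mirrors the group's preference for separate entry points: the synchronous path never needs to know about query_ids at all.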
-
There is a difference between query fields and fields that are returned from a query. CRWG can define two sets, a large set of fields for return and a smaller set of fields for query. Repositories may support additional fields.
-
An initial set of query fields for /repertoire entrypoint. We should walk through MiAIRR fields to see if we want to add any more.
- repertoire_id - as defined by study designer
- rearrangement_set_id - to allow users to query for subsets of the repertoire; should this persist?
- study.title
- study.insdc_id
- study.pub_ids
- subject.organism
- subject.sex
- subject.age
- subject.race
- subject.strain
- subject.study_group_description
- subject.disease_diagnosis
- subject.disease_stage
- subject.immunogen
- sample.insdc_id
- sample.tissue
- sample.anatomic_site
- sample.disease_state_sample
- sample.cell_subset
- sample.cell_phenotype
- sample.template_class
- software.software_version
- software.alternative_analysis
- software.germline_database
- An initial set of query fields for /rearrangement entrypoint.
- rearrangement_id
- rearrangement_set_id
- productive
- locus
- v_call
- d_call
- j_call
- c_call
- junction
- junction_aa
- duplicate_count
- consensus_count
- Defining a larger set of fields for return but which cannot be queried might be useful for rearrangements.
- Sequence
- Alignment information
- Region coordinates
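The /repertoire query fields above are written as dotted paths (study.title, subject.organism, sample.tissue), which suggests they address nested objects in the repertoire JSON. A minimal sketch of resolving such paths and filtering on them, assuming each repertoire is a nested dict with study/subject/sample objects and that matching is exact equality; the helper names and sample records are hypothetical, and the real MiAIRR structure may differ (for example, sample could be a list).

```python
def get_field(record, path):
    """Follow a dotted path such as 'subject.organism' through nested dicts.

    Returns None if any step along the path is missing.
    """
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def filter_repertoires(repertoires, filters):
    """Keep repertoires whose fields exactly equal every value in `filters`."""
    return [
        rep for rep in repertoires
        if all(get_field(rep, path) == expected for path, expected in filters.items())
    ]


# Hypothetical example records, simplified for illustration.
repertoires = [
    {"repertoire_id": "r1",
     "subject": {"organism": "Homo sapiens", "sex": "F"},
     "sample": {"tissue": "PBMC"}},
    {"repertoire_id": "r2",
     "subject": {"organism": "Mus musculus", "sex": "M"},
     "sample": {"tissue": "spleen"}},
]
```

Filtering with `{"subject.organism": "Homo sapiens"}` keeps only `r1`. A path that is absent from a record (such as study.title here) resolves to None rather than raising, which is one possible policy for repositories that do not populate every field.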
-
Define DataMed submission standard. What will be the query to return the list of AIRR repositories?
-
Continue developing API specification.
- Continue work on ontology specifications.
- Review existing work such as IEDB and NIAID GSCID/BRC metadata.
- Based on review of the above, recommend ontologies for key elements listed above.
- How will these ontologies be integrated in an implementation (client interface, web service API)?
- Specify fields for alternative analytical pipeline data.
- Specify fields for INSDC accession numbers.
- Integrate CRWG recommendations, CRWG design documentation, and API documentation into the airr-standards documentation structure, and thus make available on https://docs.airr-community.org
In the previous call, we discussed and agreed on the following:
- Should RDF be a return format?
- RDF is very expressive
- Let each repo decide, and if one implements it and it proves to be useful, then we can move it into the standard
-
An initial set of query fields for /repertoire and /rearrangement entrypoints.
-
Discussion about repertoire_id versus rearrangement_set_id. We want to provide both capabilities of 1) how the study design defines the repertoire and 2) allow user queries to define those repertoires differently (e.g. CD4 subset), and at the same time avoid confusion about whether a specific data set is the whole repertoire or just a subset.
- Say a user queries two subsets, productive and unproductive, from the same repertoire. If we only have repertoire_id (which is identical for the two subsets), there is no identifier that keeps the two subsets “separate”. With rearrangement_set_id, each subset would get a different rearrangement_set_id and thus could be distinguished from the other.
- repertoire_id versus rearrangement_set_id
- After some more thought, the key point is to have an identifier that distinguishes between different queried subsets of a repertoire. Maybe rearrangement_set_id is the wrong name and query_id is better? A query_id indicates that a set of data are all part of the same query.
- Should we require repositories to store their queries? Should they have DOIs?
- There is precedent for this for provenance/reproducibility. Should we require an exact query to be retrieved given its DOI?
- How should we handle a query spanning the /repertoire and /rearrangement entrypoints? Should the same query_id be used? Do you pass the query_id returned by /repertoire into /rearrangement? They are fundamentally different queries, though, so one query_id representing two queries would be confusing.
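A minimal sketch of the productive/unproductive example above: both subsets share one repertoire_id, but each is stamped with its own query_id. Generating the query_id as a random UUID per query is an assumption; a persistent identifier would additionally require the repository to store its queries, which is the open question above.

```python
import uuid


def select_subset(rearrangements, repertoire_id, predicate):
    """Select part of one repertoire and stamp every row with a fresh query_id.

    The repertoire_id alone cannot tell two subsets of the same repertoire
    apart; the per-query identifier can.
    """
    query_id = str(uuid.uuid4())  # hypothetical: one new identifier per query
    subset = [
        dict(r, query_id=query_id)
        for r in rearrangements
        if r["repertoire_id"] == repertoire_id and predicate(r)
    ]
    return query_id, subset
```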
-
There is a difference between query fields and fields that are returned from a query. CRWG can define two sets: a large set of fields for return and a smaller set of fields for query. Repositories may support additional fields.
-
An initial set of query fields for /repertoire entrypoint. We should walk through MiAIRR fields to see if we want to add any more.
- repertoire_id - as defined by study designer
- rearrangement_set_id - to allow users to query for subsets of the repertoire; should this persist?
- study.title
- study.insdc_id
- study.pub_ids
- subject.organism
- subject.sex
- subject.age
- subject.race
- subject.strain
- subject.study_group_description
- subject.disease_diagnosis
- subject.disease_stage
- subject.immunogen
- sample.insdc_id
- sample.tissue
- sample.anatomic_site
- sample.disease_state_sample
- sample.cell_subset
- sample.cell_phenotype
- sample.template_class
- software.software_version
- software.alternative_analysis
- software.germline_database
- An initial set of query fields for /rearrangement entrypoint.
- rearrangement_id
- rearrangement_set_id
- productive
- locus
- v_call
- d_call
- j_call
- c_call
- junction
- junction_aa
- duplicate_count
- consensus_count
- Defining a larger set of fields that can be returned but not queried might be useful for rearrangements.
- Sequence
- Alignment information
- Region coordinates
-
Define a DataMed submission standard. What query will return the list of AIRR repositories?
-
Continue developing API specification.
- Continue work on ontology specifications.
- Review existing work such as IEDB and NIAID GSCID/BRC metadata.
- Based on review of the above, recommend ontologies for key elements listed above.
- How will these ontologies be integrated in an implementation (client interface, web service API)?
- Specify fields for alternative analytical pipeline data.
- Specify fields for INSDC accession numbers.
- Integrate CRWG recommendations, CRWG design documentation, and API documentation into the airr-standards documentation structure, and thus make available on https://docs.airr-community.org
In the previous call, we discussed and agreed on the following:
- We will not define a Registry API. Instead, we will utilize the DataMed (BioCaddie) discovery index. However, CRWG will define a submission standard for DataMed so that AIRR repositories can be easily retrieved.
https://datamed.org/index.php
- The same raw data may be processed through multiple analytical pipelines (e.g., for comparative purposes), with results from all pipelines in the repository. How do we indicate this and coordinate what gets returned?
- Conclusion: each repository will return only one as the default, but each repository will determine its own default. CRWG API will specify the fields that get returned to alert the user that there are alternatives.
- Will we coordinate repertoire and rearrangement IDs across repositories? If yes, how?
- Conclusion: it’s challenging to enforce a globally unique ID, so CRWG API will require that INSDC accession numbers (e.g., Bioproject, Biosample, SRA) are provided so users can check for duplicates. If those accession numbers aren’t available for a specific study, the user will have to rely upon manually reviewing titles, publications, abstract, etc.
- Should RDF be a return format?
- RDF is very expressive
- Perhaps we let each repo decide, and if one implements it and it proves to be useful, then we can move it into the standard
- An initial set of fields for /repertoire entrypoint. CRWG will define a minimal set that all repositories must support for query/return. Alternatively, CRWG can define two sets: a large set of fields for return and a smaller set of fields for query. Repositories may support additional fields. We should walk through MiAIRR fields to see if we want to add any more.
- repertoire_id - as defined by study designer
- rearrangement_set_id - to allow users to query for subsets of the repertoire; should this persist?
- study.title
- study.insdc_id
- study.pub_ids
- subject.organism
- subject.sex
- subject.age
- subject.race
- subject.strain
- subject.study_group_description
- subject.disease_diagnosis
- subject.disease_stage
- subject.immunogen
- sample.insdc_id
- sample.tissue
- sample.anatomic_site
- sample.disease_state_sample
- sample.cell_subset
- sample.cell_phenotype
- sample.template_class
- software.software_version
- software.alternative_analysis
- software.germline_database
- An initial set of fields for /rearrangement entrypoint. CRWG will define a minimal set that all repositories must support for query/return. Alternatively, CRWG can define two sets: a large set of fields for return and a smaller set of fields for query. Repositories may support additional fields.
- rearrangement_id
- rearrangement_set_id
- productive
- locus
- v_call
- d_call
- j_call
- c_call
- junction
- junction_aa
- duplicate_count
- consensus_count
- Defining a larger set of fields that can be returned but not queried might be useful for rearrangements.
- Sequence
- Alignment information
- Region coordinates
-
Define a DataMed submission standard. What query will return the list of AIRR repositories?
-
Continue developing API specification.
- Decide what “entities” we have and which ones are “linked” through each endpoint.
- Continue work on ontology specifications.
- Review existing work such as IEDB and NIAID GSCID/BRC metadata.
- Based on review of the above, recommend ontologies for key elements listed above.
- How will these ontologies be integrated in an implementation (client interface, web service API)?
- Specify fields for alternative analytical pipeline data.
- Specify fields for INSDC accession numbers.
- Integrate CRWG recommendations, CRWG design documentation, and API documentation into the airr-standards documentation structure, and thus make available on https://docs.airr-community.org
In previous calls, we agreed on the following:
- RESTful API, JSON, using the OpenAPI specification. Initial implementation in GitHub:
https://github.com/airr-community/airr-standards/blob/CRWG-API/specs/common_repository_api.yaml
- Two primary endpoints (“gettable” objects with unique ids)
/repertoire
A “sample repertoire” with associated study metadata.
/rearrangement
A “rearrangement object” with associated annotations. A rearrangement object is associated with a sample repertoire by the repertoire’s unique id.
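A minimal sketch of how the two endpoints relate: select repertoires by their study metadata, then keep the rearrangements that point at them through repertoire_id. Plain in-memory lists stand in for the two endpoints; this is an illustration of the link, not a proposed implementation.

```python
def rearrangements_for(repertoires, rearrangements, predicate):
    """Select repertoires by their metadata, then return the rearrangements
    that reference them through repertoire_id."""
    selected = {rep["repertoire_id"] for rep in repertoires if predicate(rep)}
    return [r for r in rearrangements if r["repertoire_id"] in selected]
```

For instance, `predicate=lambda rep: rep["subject"]["organism"] == "Homo sapiens"` returns only rearrangements from human repertoires.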
- We will build on the GDC API
https://docs.gdc.cancer.gov/API/Users_Guide/Search_and_Retrieval/
-
It has a well-defined and expressive query language
-
Clients can limit/request which data fields are returned (versus all data being returned)
-
It has a “loose” data model, which just indicates which entities are “linked” through an endpoint and can thus be queried upon, without specifying how that link is implemented. For example, the “case” endpoint is linked to the “diagnosis” entity, so a query can be performed on diagnosis data by having “diagnosis.” as a prefix to the field name, such as “diagnosis.state”.
-
It has a “facets” parameter which requests a limited form of aggregation capability, specifically for driving summary graphics (pie charts, bar plots, etc.) on a web interface.
-
JSON and TSV formats for return data.
-
It has a “_mapping” endpoint, a discovery mechanism that provides metadata about the API itself. We probably need to expand upon it.
-
We will specify relationships between entities in the API data model, but each repository can have a different backend data model, which they will need to map to the API data model.
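A minimal sketch of the “loose” linked-entity query model, modeled on the GDC filter structure (nested `{"op": ..., "content": ...}` objects with dotted field names such as “subject.sex”). Only four operators are handled here, and evaluating filters in memory against nested dicts is an assumption for illustration; the GDC documentation lists the full operator set.

```python
def get_field(record, dotted):
    """Follow a dotted name such as "subject.sex" through nested dicts."""
    value = record
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(record, flt):
    """Evaluate a small subset of GDC-style filter operators against one record."""
    op, content = flt["op"], flt["content"]
    if op == "and":
        return all(matches(record, sub) for sub in content)
    if op == "or":
        return any(matches(record, sub) for sub in content)
    value = get_field(record, content["field"])
    if op == "=":
        return value == content["value"]
    if op == "in":
        return value in content["value"]
    raise ValueError(f"unsupported operator: {op}")
```

Because the linked entity is addressed only by its prefix, the same filter works whether a repository stores subject data in the same table as the repertoire or in a separate one, as long as it maps its backend onto these names.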
- Key repeating elements to prioritize for computationally precise standardization
- Donor species (e.g., Homo sapiens) (subject)
- Donor health status (e.g., diabetes) (subject, primary sample)
- Tissue type (e.g. PBMC) (primary sample)
- Cell subset (e.g. T-cell)
- Sequence type (e.g., TRB) (also from primer selection - experimental protocol)
- Gene usage (e.g., IGHV1-69)
- CDR3 sequence (e.g., “CASSYIKLN”)
- Receptor specificity (e.g., HIV virus)
- The same raw data may be processed through multiple analytical pipelines (e.g., for comparative purposes), with results from all pipelines in the repository. How do we indicate this and coordinate what gets returned?
- Ignore this and return everything as if independent? No
- Return only a single one, and indicate that alternatives are available? If we return only one, which one? The most recent, or the “official” repository one?
- Give the user the choice up front to query all output or only a single output for each data set.
- Conclusion: each repository will return only one as the default, but each repository will determine its own default. CRWG API will specify the fields that get returned to alert the user that there are alternatives.
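A minimal sketch of this conclusion: keep one analysis per repertoire, chosen by the repository's own default, and list the other pipelines so the user knows alternatives exist. The `alternative_pipelines` field name, the pipeline labels, and the fall-back-to-first rule when the default is missing are all hypothetical.

```python
from collections import defaultdict


def default_results(results, default_pipeline):
    """Keep one analysis per repertoire and flag the alternatives.

    results: dicts with "repertoire_id" and "pipeline" keys.
    default_pipeline: the repository's own choice of default.
    """
    by_repertoire = defaultdict(list)
    for r in results:
        by_repertoire[r["repertoire_id"]].append(r)

    returned = []
    for group in by_repertoire.values():
        # Prefer the repository default; fall back to the first result otherwise.
        chosen = next((r for r in group if r["pipeline"] == default_pipeline), group[0])
        alternatives = sorted(r["pipeline"] for r in group if r is not chosen)
        returned.append({**chosen, "alternative_pipelines": alternatives})
    return returned
```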
- Will we coordinate repertoire and rearrangement IDs across repositories? If yes, how?
- We won’t force repositories to interact to learn about duplicates ahead of time, but will ensure that the relevant fields are well-defined and populated.
- cBioPortal has a mechanism for doing this that may be worth investigating.
- Use INSDC accession numbers (e.g., Bioproject, Biosample, SRA)
- What about when we don’t have those?
- PubMed identifier
- After that, we probably have to manually review titles, abstracts, etc.
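A minimal sketch of the client-side duplicate check discussed above: group studies from different repositories by INSDC accession, fall back to a PubMed identifier, and set aside studies with neither for manual review. The `pmid` key, the repository labels, and the accession value are hypothetical; note that a study carrying an accession in one repository but only a PubMed ID in another would not be matched by this simple key.

```python
from collections import defaultdict


def duplicate_key(study):
    """Prefer the INSDC accession (study.insdc_id); fall back to a PubMed ID."""
    if study.get("insdc_id"):
        return ("insdc", study["insdc_id"])
    if study.get("pmid"):  # hypothetical field name
        return ("pmid", study["pmid"])
    return None


def candidate_duplicates(studies):
    """Return (groups of likely duplicates, studies needing manual review)."""
    groups = defaultdict(list)
    manual_review = []
    for s in studies:
        key = duplicate_key(s)
        label = (s["repository"], s["study_id"])
        if key is None:
            manual_review.append(label)
        else:
            groups[key].append(label)
    duplicates = {k: v for k, v in groups.items() if len(v) > 1}
    return duplicates, manual_review
```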
https://datamed.org/search.php?query=b+cell&searchtype=data
- Should the registry be part of the repository API, or a separate API?
- Same: the simpler case; a repository can easily have a default, returning itself both as a registry and as a repository.
- Separate: requires a separate web service to handle the registry API, which may be overkill for the simple functionality
- Use DataMed (BioCaddie) to register
- DATS for tagging
- Should RDF be a return format?
- Continue developing API specification.
- Decide what “entities” we have and which ones are “linked” through each endpoint.
- Continue work on ontology specifications.
- Review existing work such as IEDB and NIAID GSCID/BRC metadata.
- Based on review of the above, recommend ontologies for key elements listed above.
- How will these ontologies be integrated in an implementation (client interface, web service API)?
- Integrate CRWG recommendations, CRWG design documentation, and API documentation into the airr-standards documentation structure, and thus make available on https://docs.airr-community.org
We condensed all of the discussions and comments from the brainstorming document.
- RESTful API, JSON, using the OpenAPI specification. Initial implementation in GitHub:
https://github.com/airr-community/airr-standards/blob/CRWG-API/specs/common_repository_api.yaml
- Two primary endpoints (“gettable” objects with unique ids)
/repertoire
A “sample repertoire” with associated study metadata.
/rearrangement
A “rearrangement object” with associated annotations. A rearrangement object is associated with a sample repertoire by the repertoire’s unique id.
- The GDC API looks to be a good specification to build upon
https://docs.gdc.cancer.gov/API/Users_Guide/Search_and_Retrieval/
-
It has a well-defined and expressive query language
-
Clients can limit/request which data fields are returned (versus all data being returned)
-
It has a “loose” data model, which just indicates which entities are “linked” through an endpoint and can thus be queried upon, without specifying how that link is implemented. For example, the “case” endpoint is linked to the “diagnosis” entity, so a query can be performed on diagnosis data by having “diagnosis.” as a prefix to the field name, such as “diagnosis.state”.
-
It has a “facets” parameter which requests a limited form of aggregation capability, specifically for driving summary graphics (pie charts, bar plots, etc.) on a web interface.
-
JSON and TSV formats for return data.
-
It has a “_mapping” endpoint, a discovery mechanism that provides metadata about the API itself. We probably need to expand upon it.
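A minimal sketch of the kind of aggregation a “facets” request provides: per-field value counts that a web interface could turn into a pie chart of loci or a bar plot of V gene usage. The flat field -> value -> count result shape is an assumption for illustration, not the GDC response format.

```python
from collections import Counter


def facets(records, fields):
    """Count how often each value of each requested field occurs."""
    return {field: dict(Counter(r.get(field) for r in records)) for field in fields}
```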
- Key repeating elements to prioritize for computationally precise standardization
- Donor species (e.g., Homo sapiens) (subject) ontology: NCBI taxonomy
- Donor health status (e.g., diabetes) (subject, primary sample) ontology: Disease Ontology, others?
- Tissue type (e.g. PBMC) (primary sample) ontology: Uberon, others?
- Cell subset (e.g. T-cell) ontology: Cell Ontology, others?
- Sequence type (e.g., TRB) (also from primer selection - experimental protocol) ontology: IMGT, OBI
- Gene usage (e.g., IGHV1-69) ontology: IMGT, Sequence Ontology
- CDR3 sequence (e.g., “CASSYIKLN”) ontology: IMGT
- Receptor specificity (e.g., HIV virus) ontology: IEDB
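A minimal sketch of how a repository or client might check that a value uses the tentatively recommended ontology, by testing the prefix of a compact identifier (CURIE) such as “NCBITaxon:9606” (Homo sapiens). The mapping from the key elements above to repertoire query fields is an assumption, and the recommendations themselves are still open (“others?”).

```python
# Assumed mapping from the key elements above to repertoire query fields,
# with the CURIE prefix of the ontology tentatively recommended for each.
RECOMMENDED_PREFIXES = {
    "subject.organism": {"NCBITaxon"},       # NCBI taxonomy
    "subject.disease_diagnosis": {"DOID"},   # Disease Ontology
    "sample.tissue": {"UBERON"},             # Uberon
    "sample.cell_subset": {"CL"},            # Cell Ontology
}


def check_term(field, curie):
    """Check that a value is a CURIE and, where a recommendation exists,
    that its prefix names the recommended ontology.

    Fields without a recommendation accept any prefix.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        return False  # not a CURIE at all, e.g. free text "PBMC"
    allowed = RECOMMENDED_PREFIXES.get(field)
    return allowed is None or prefix in allowed
```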
The API cannot be completely data model agnostic, but we don’t want to force a specific data model on the repository. The GDC API has a specification for how to do that. We still need to decide what “entities” we have and which ones are “linked” through each endpoint. We also want to allow repositories to provide their own specialized entities, and the discovery mechanism should let the client learn about them. Do we need to place any requirements on this?
- Analysis output (clones, lineage trees)
- Raw data/sequences
- Computation/Analysis workflow
The same raw data may be processed through multiple analytical pipelines (e.g., for comparative purposes), with results from all pipelines in the repository. Do we need to do anything special to indicate this situation?
GDC allows a simple wildcard (*), but do we want more expressive regular expressions for CDR3 searches? Should that capability be mandatory or optional? Should regular expression capability on other fields, such as free-text fields, be mandatory or optional?
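A minimal sketch contrasting the two options for CDR3 searches. The wildcard form here treats only `*` as special and everything else literally (an assumption about how a simple wildcard would behave); the regular-expression form uses Python's `re` syntax. The regex can express constraints such as “exactly two residues here”, which a `*` wildcard cannot.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a simple '*' wildcard into a regex; everything else is literal."""
    return ".*".join(re.escape(part) for part in pattern.split("*"))


def wildcard_match(pattern, cdr3):
    return re.fullmatch(wildcard_to_regex(pattern), cdr3) is not None


def regex_match(pattern, cdr3):
    return re.fullmatch(pattern, cdr3) is not None
```

For example, `CASS*LN` matches “CASSYIKLN”, and so does the regex `CASS[YF].{2}LN`, while `CASS[YF].{3}LN` does not. One consideration for making regular expressions optional: user-supplied patterns can be expensive to evaluate on some engines, so a repository may want to limit them.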
Registry implementation. Should each repository be a registry as well? Should we define a separate API for a registry?
-
Continue developing API specification.
-
Integrate CRWG recommendations, CRWG design documentation, and API documentation into the airr-standards documentation structure, and thus make available on https://docs.airr-community.org
-
Continue work on ontology specifications.
- Review existing work such as NIAID GSCID/BRC metadata.
- Review what IEDB is using.
- Based on review of the above, recommend ontologies for key elements listed above.
- How will these ontologies be integrated in an implementation (client interface, web service API)?
- Who is going to implement the initial API spec and provide feedback to CRWG? How much of a “turnkey” system do/can we provide?