HELP-290 HELP-334 GlyTouCan IDs masterlist for June submission #2
Comments
[email protected]
Based on the June 15 email (see below), I have checked the filtered-out proteins in GlyGen and prepared the reasons why they were filtered out (you have also provided example reasons). The main reason for this issue is asynchronicity between the UniProt releases used by GlyGen and CFDE. The issue arises when accessions are obsolete or have been merged with other entries. We will filter out such accessions in the future. However, no reason was found for three of the accessions. They are unreviewed mouse proteins and exist in UniProt. If you know why they were filtered out, please let us know.
D3YTX5

[email protected]
Yes, thanks for the explanation for the excluded protein entries. As I mentioned, it is because of the different versions. For example, Q6ZW33 is a protein in GlyGen but has recently been replaced by O94851. Once the submission is done, I will look into all the excluded entries. As for G06850XD: yes, it is not in the mapping file. I am looking at which file it is in and why it is not part of the masterlist and mapping file. No action is needed from your end for these entries. We are omitting them for now. Again, thanks for your help and explanation.

Arthur Brady
I addressed these below; please reread the comment here for an explanation of why those IDs are failing. As I said below, you will need to remove IDs that have been deleted from UniProt (e.g. Q6ZRZ4); remove IDs that came from the wrong database entirely (e.g. Q6ZW33 is a PRO ID, not a UniProtKB accession); and ensure that you use only primary accessions and not secondary accessions (e.g. P62861 is the primary accession for P35544). As for the glycan G06850XD, there is no mention of that ID in the GlyTouCan ID mapping file you sent me.
Best,
Arthur

[email protected]
Thanks Arthur,
While running the prep_script we came across 1 glycan and 25 proteins that were flagged. I am looking into them to find the reason for their exclusion. We have for now submitted 18 files, and there are a couple of other issues that need to be resolved. These may be arising because of the different UniProt versions we are each using.
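The three cleanup rules Arthur lists above (drop deleted IDs, drop IDs that are not UniProtKB accessions, replace secondary accessions with their primary) could be sketched roughly as follows. This is only an illustration: `clean_accessions` and its three lookup tables are hypothetical names, and the tables are assumed to have been built beforehand from whichever UniProt release both sides agree on. The example IDs are the ones quoted in this thread.

```python
def clean_accessions(accessions, deleted, secondary_to_primary, known_primary):
    """Apply the three cleanup rules and return (kept, dropped).

    All three lookup tables are hypothetical inputs, assumed to be derived
    from a single agreed UniProt release.
    """
    kept = []
    dropped = {}
    for acc in accessions:
        if acc in deleted:
            # Rule 1: the entry has been deleted from UniProt.
            dropped[acc] = "deleted from UniProt"
        elif acc in secondary_to_primary:
            # Rule 3: swap a secondary accession for its primary accession.
            primary = secondary_to_primary[acc]
            if primary not in kept:
                kept.append(primary)
        elif acc in known_primary:
            if acc not in kept:
                kept.append(acc)
        else:
            # Rule 2: the ID is not a UniProtKB accession known to this release.
            dropped[acc] = "not a known UniProtKB accession"
    return kept, dropped


# Lookup tables filled only with the examples quoted in this thread.
deleted = {"Q6ZRZ4"}
secondary_to_primary = {"P35544": "P62861"}
known_primary = {"P62861", "O94851"}

kept, dropped = clean_accessions(
    ["Q6ZRZ4", "Q6ZW33", "P35544", "O94851"],
    deleted,
    secondary_to_primary,
    known_primary,
)
print(kept)
print(dropped)
```

Recording a reason for every dropped ID keeps the exclusion list reviewable, which is what the flagged-entry discussion above was missing.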
This is related to glygener/glygen.cfde.generator#18.
Closing this case: as per my recent comment in issue #4, I've confirmed that Arthur incorporated the attached mapping file (gtc_pubchem_xref_status.csv, MD5 b6e820ac60c0ba0b2633cbb1a58938a8) back in mid June of 2022. See issue #4 for progress/updates on incorporating the latest version from the GlyGen-provided URL. With respect to the 3 mouse UniProt accessions mentioned in the August 17 e-mail above (D3YTX5, D3Z7A4, E9Q7U8), I don't know if this was resolved, but to me it looks like--strictly speaking--those are prefixes of UniProt names, rather than UniProt accessions per se. If I search the protein.tsv.gz file for those 3 ids, I can find them, but not in the 'id' column, only in the 'name' column, with "_MOUSE" as a suffix:
I'd suggest using the actual accession numbers (E9Q8U4, E9Q2E3, E9QA13), as this looks like it may be another instance of the issue that Arthur had already flagged in his June 23rd e-mail, namely ensuring that only primary accessions are used to reference UniProt proteins.
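The check described above (does an ID appear in the 'id' column, or only as the prefix of a 'name' value such as `..._MOUSE`?) can be sketched as below. The helper name `find_id_columns` and the sample rows are hypothetical placeholders, not real UniProt data; the only things taken from this thread are the 'id' and 'name' column names. The file is assumed to be tab-separated with a header row.

```python
import csv
import io


def find_id_columns(handle, query_ids):
    """For each query ID, list where it was found.

    Each hit is a (column, accession) pair: column is 'id' when the query
    equals the accession itself, or 'name' when it only matches the part of
    the name before the first underscore.
    """
    queries = set(query_ids)
    found = {q: [] for q in queries}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["id"] in queries:
            found[row["id"]].append(("id", row["id"]))
        prefix = row["name"].partition("_")[0]
        if prefix in queries:
            found[prefix].append(("name", row["id"]))
    return found


# Hypothetical placeholder rows, for illustration only.
SAMPLE = "id\tname\nACC0001\tNAME01_MOUSE\nACC0002\tACC0002_MOUSE\n"

report = find_id_columns(io.StringIO(SAMPLE), ["NAME01", "ACC0002"])
for query, hits in sorted(report.items()):
    print(query, hits)
```

For the real file, the same function should work on a handle opened with `gzip.open("protein.tsv.gz", "rt", newline="")`. A query that only ever produces `("name", ...)` hits is a name prefix rather than an accession, and the second element of the hit is the accession to use instead.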
Attached is the csv dataset that contains all GlyGen GlyTouCans with their status and xrefs with regards to PubChem mapping. Here are a few rows. Let me know if this works.

| glytoucan_ac | status | xref_id | xref_key |
| --- | --- | --- | --- |
| G00023MO | PubChem crossref exists | 91846235:252277270 | glycan_xref_pubchem_compound:glycan_xref_pubchem_substance |
| G00024MO | PubChem crossref exists | 11375554:252288623 | glycan_xref_pubchem_compound:glycan_xref_pubchem_substance |
| G00025AJ | PubChem crossref exists | 91857678:252290930 | glycan_xref_pubchem_compound:glycan_xref_pubchem_substance |
| G00025MO | PubChem crossref exists | 5288428:252293186 | glycan_xref_pubchem_compound:glycan_xref_pubchem_substance |
| G00025YC | No PubChem crossref exists | | |
| G00026MO | No PubChem crossref exists | | |
| G00027JG | No PubChem crossref exists | | |
| G00027MO | PubChem crossref exists | 91859643:252293273 | glycan_xref_pubchem_compound:glycan_xref_pubchem_substance |
Best,
Jeet Vora
Senior Research Associate
Scientific Coordinator for GlyGen.org
Project Manager for Glycosciences-NIH CFDE
The George Washington University
Ross Hall, Room 559
2300 Eye Street N.W.
Washington, DC 20052
[email protected]
Pronouns - He/him/his
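For readers who want to consume the mapping file described in the message above, here is a minimal parsing sketch. It assumes the file is comma-separated with the four column names shown in the sample rows, and that the colon-separated values in `xref_id` and `xref_key` pair up positionally; neither assumption is confirmed in this thread. The function name `parse_xrefs` is hypothetical.

```python
import csv
import io

# Two of the sample rows quoted in the message above, written as CSV
# (the comma-separated layout is an assumption).
SAMPLE_CSV = """glytoucan_ac,status,xref_id,xref_key
G00023MO,PubChem crossref exists,91846235:252277270,glycan_xref_pubchem_compound:glycan_xref_pubchem_substance
G00025YC,No PubChem crossref exists,,
"""


def parse_xrefs(handle):
    """Map each GlyTouCan accession to a {xref_key: xref_id} dict.

    Accessions without a PubChem cross-reference map to an empty dict, so
    they remain distinguishable from accessions missing from the file.
    """
    mapping = {}
    for row in csv.DictReader(handle):
        ids = row["xref_id"].split(":") if row["xref_id"] else []
        keys = row["xref_key"].split(":") if row["xref_key"] else []
        # Pair keys and IDs by position (assumed convention).
        mapping[row["glytoucan_ac"]] = dict(zip(keys, ids))
    return mapping


mapping = parse_xrefs(io.StringIO(SAMPLE_CSV))
for accession, xrefs in mapping.items():
    print(accession, xrefs)
```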
On Fri, Jun 3, 2022 at 2:37 PM Jeet Vora <[email protected]> wrote:
Hi Arthur,
I can provide you with the dataset as requested. For this release I will share it via email or from an online folder, but for the next release it will have a stable URL on data.glygen.org.
Will share the dataset once compiled.
Best,
Jeet Vora
On Thu, Jun 2, 2022 at 2:32 PM Rene Ranzinger <[email protected]> wrote:
From: Arthur Brady <[email protected]>
Sent: Thursday, June 2, 2022 1:28 PM
To: Rene Ranzinger
Subject: HELP-290: GlyTouCan IDs masterlist for June submission
IN PROGRESS: GlyTouCan IDs masterlist for June submission
Arthur Brady commented:
Summary: we need a way to access an up-to-date map from GlyTouCan IDs to equivalent PubChem IDs. You can provide it to us however you would like.* I would request that whatever format you choose (API or file) be able to express whether or not a given GlyTouCan ID exists at all: i.e. it should recognize GlyTouCan terms with no associated PubChem ID and return a “no PubChem crossref exists” response which is distinct from the “requested GlyTouCan ID doesn’t exist” response.
*as long as your API can handle either (1) lots of little queries, fast, or (2) a bulk query for the whole dataset, because I’ll need to essentially grab the whole thing so we can properly process any incoming IDs.
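The distinction requested above, between "no PubChem crossref exists" and "requested GlyTouCan ID doesn't exist", amounts to a three-state lookup. The sketch below shows one way to express it over an in-memory copy of the whole mapping (the bulk-download option). `XrefStatus`, `lookup`, and the placeholder query `NOT_AN_ID` are hypothetical; the two real accessions are taken from the sample rows elsewhere in this thread.

```python
from enum import Enum


class XrefStatus(Enum):
    HAS_PUBCHEM = "PubChem crossref exists"
    NO_PUBCHEM = "no PubChem crossref exists"
    UNKNOWN_ID = "requested GlyTouCan ID doesn't exist"


def lookup(mapping, glytoucan_ac):
    """Return (status, xrefs) for one GlyTouCan accession.

    `mapping` is assumed to hold every known GlyTouCan accession, with an
    empty dict for those that have no PubChem cross-reference.
    """
    if glytoucan_ac not in mapping:
        return XrefStatus.UNKNOWN_ID, {}
    xrefs = mapping[glytoucan_ac]
    if not xrefs:
        return XrefStatus.NO_PUBCHEM, {}
    return XrefStatus.HAS_PUBCHEM, xrefs


# Small in-memory mapping for illustration.
mapping = {
    "G00023MO": {
        "glycan_xref_pubchem_compound": "91846235",
        "glycan_xref_pubchem_substance": "252277270",
    },
    "G00025YC": {},
}

for query in ("G00023MO", "G00025YC", "NOT_AN_ID"):
    status, xrefs = lookup(mapping, query)
    print(query, status.value, xrefs)
```

Keeping the known-but-unmapped accessions in the mapping with an empty value is what makes the third state possible; a file that listed only accessions with cross-references could not tell the two failure cases apart.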
Sent on June 2, 2022 5:28:54 PM GMT