PMID:27856494 transcriptome data (Kuang,Boeke,Canzar) #2296

Open
ValWood opened this issue Dec 9, 2024 · 9 comments

Comments

@ValWood
Member

ValWood commented Dec 9, 2024

| Column | Contents (Example) | |
|---|---|---|
| 1 (A) | Data type (Mandatory: Transcripts, Chromatin binding, Nucleosome positioning, Poly(A) sites, Replication origins) | Transcripts |
| 2 (B) | Track label (Mandatory: See below for format and examples***) | Transcripts, meiosis (add track specific differentia) strand-Kuang (2016) |
| 3 (C) | Assayed gene product | n/a |
| 4 (D) | Strain background (Mandatory: h- cdc25-22, leu1-1; Multiple entries allowed) | ? |
| 5 (E) | WT or mutant (Mandatory: WT; strains with only background mutations are considered WT) | |
| 6 (F) | Mutant alleles (clr4delta, dfp1-3A; Multiple entries allowed) | n/a? |
| 7 (G) | Conditions (YES, high temperature; glucose MM, standard temperature + HU) | |
| 8 (H) | Comment (Free-text field for additional information; Multiple entries allowed) | |
| 9 (I) | Growth phase or response (Mandatory: Vegetative growth, meiosis, quiescence, glucose starvation, oxidative stress, heat shock; Multiple entries allowed if the track combines data) | meiosis |
| 10 (J) | Strand (Forward, reverse) | |
| 11 (K) | Assay type (Mandatory: Tiling microarray, RNA-seq, HT sequencing) | RNA-seq |
| 12 (L) | First author (surname) (Mandatory: Soriano) | Kuang |
| 13 (M) | Publication year (Mandatory: 2020) | 2016 |
| 14 (N) | PubMed ID (Mandatory: 31077324) | PMID: 27856494 |
| 15 (O) | Database (GEO, ArrayExpress) | GEO |
| 16 (P) | Study ID (GSE110976, PRJEB7403) | GSE79802 |
| 17 (Q) | Sample ID (GSM3019628, ERS555567; Multiple entries allowed) | |
| 18 (R) | Data file type (Mandatory: bigwig, bed) | |
| 19 (S) | File name (Mandatory: Name given to submitted data file relevant to the track) | |

https://pubmed.ncbi.nlm.nih.gov/27856494/

  • Format for track label “Assayed gene product” “Data type” “in mutant” “during Growth phase or response” “additional experimental detail of importance (Conditions, Strain background)” “; repeat” “(strand)” “- First author (Publication year)”

  • URL for dataset

  • Other info
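
The track label format above can be sketched as a small helper. This is only an illustration of how the quoted pieces might be joined: the function name, its parameters, and the choice of a single space as separator are assumptions, not part of the PomBase template, and the "; repeat" element for combined tracks is left out.

```python
def build_track_label(data_type, first_author, year, gene_product=None,
                      mutant=None, growth_phase=None, detail=None, strand=None):
    """Join the template's label pieces in order (hypothetical helper).

    Optional pieces are skipped when not supplied. The "; repeat" part of
    the format (for tracks that combine data) is not handled here.
    """
    parts = []
    if gene_product:
        parts.append(gene_product)
    parts.append(data_type)
    if mutant:
        parts.append(f"in {mutant}")
    if growth_phase:
        parts.append(f"during {growth_phase}")
    if detail:
        parts.append(detail)
    label = " ".join(parts)
    if strand:
        label += f" ({strand})"
    return f"{label} - {first_author} ({year})"


# Example using the values filled in for this dataset
label = build_track_label("Transcripts", "Kuang", 2016,
                          growth_phase="meiosis", strand="forward")
```

With the values filled in so far, this gives a label of the form "Transcripts during meiosis (forward) - Kuang (2016)".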

transferred from #1441 (comment)

Li-Lin's lab has mapped the reads of this dataset to the reference genome. We will be happy to provide the mapping results to PomBase for loading into the genome browser.

Some novel genes are reported (but I think we likely already have them all from Danny's data, IIRC).

@ValWood ValWood changed the title PMID: 31077324 transcrotome data (Boake) PMID: 31077324 transcriptome data (Boake) Dec 9, 2024
@ValWood ValWood changed the title PMID: 31077324 transcriptome data (Boake) PMID: 31077324 transcriptome data (Kuang,Boeke,Canzar) Dec 9, 2024
@kimrutherford
Member

Is that the wrong PMID? 31077324 is Grech et al.

https://pubmed.ncbi.nlm.nih.gov/27856494/
https://genome.cshlp.org/content/27/1/145.short

@ValWood ValWood changed the title PMID: 31077324 transcriptome data (Kuang,Boeke,Canzar) PMID:27856494 transcriptome data (Kuang,Boeke,Canzar) Dec 10, 2024
@pombase pombase deleted a comment from kimrutherford Dec 10, 2024
@ValWood
Member Author

ValWood commented Dec 10, 2024

fixed PMID in header. I had used the example PMID. The template probably doesn't need examples of PMIDs!

@PCarme
Contributor

PCarme commented Dec 18, 2024

@ValWood Not entirely sure what feedback you expect here, but I made a version of the template filled in with the information I could find.
dataset_template_PMID_27856494.xlsx
I can use it as a reference to make a general one we could provide to users via the documentation pages

@ValWood
Member Author

ValWood commented Dec 24, 2024

Sorry, I didn't explain this well. We can discuss it on the first call in the New Year.

@kimrutherford
Member

Li-Lin Du has sent us the mapped BAM files for this paper. (At least I think that's what he sent to us)

I've processed the BAM files a bit so that the chromosome IDs in the files match what JBrowse needs. I changed "MTR" to "mating_type_region" and "MT" to "mitochondrial".
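
One way to do this kind of renaming is to rewrite only the `@SQ` lines of the SAM header, since BAM records refer to reference sequences by index. The sketch below is an assumption about how it could be done, not necessarily how the files were actually processed; the function name and the `RENAMES` dictionary are hypothetical, and only the two renames mentioned above are included.

```python
# Hypothetical mapping, taken from the two renames described in the comment
RENAMES = {"MTR": "mating_type_region", "MT": "mitochondrial"}


def rename_sq_lines(header_text, mapping):
    """Return SAM header text with the SN: names in @SQ lines renamed.

    Only the SN: field of @SQ lines is touched; every other header line
    (@HD, @PG, @RG, ...) is passed through unchanged.
    """
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            fields = [
                "SN:" + mapping.get(f[3:], f[3:]) if f.startswith("SN:") else f
                for f in fields
            ]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

The header can be extracted with `samtools view -H in.bam`, passed through this function, and applied with `samtools reheader new_header.sam in.bam > out.bam`.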

@kimrutherford
Member

The BAM files from Li-Lin Du's group are raw mapped PacBio reads from transcripts. I don't think we have a dataset like that in JBrowse. The screenshot below is from a short read dataset. The PacBio reads will be much more joined-up, often spanning the whole gene. It will be interesting to see how that looks in JBrowse.

The paper is partly about their new tool ("SpliceHunter"). It sounds like it clusters the reads into possible isoforms and annotates each isoform with its exon and intron changes, such as exon skipping and intron retention. I'm getting that from this diagram:
https://genome.cshlp.org/content/27/1/145/F1.large.jpg

The dataset in GEO has this supplementary file with details of the possible isoforms: GSE79802_isoforms.txt.gz
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE79802

I haven't worked out what all the columns mean yet. :-)

We might want to consider turning that isoforms file into a GFF3 file (or several) which would summarise the reads and be faster to load.
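
As a rough illustration of what such a conversion could produce, the sketch below writes GFF3 lines for one isoform. It assumes each isoform has already been parsed into a sequence ID, a strand, an identifier, and a list of exon coordinates; the column layout of GSE79802_isoforms.txt.gz is not worked out here, so the parsing step is deliberately omitted, and the function name and the "SpliceHunter" source label are placeholders.

```python
def isoform_to_gff3(seqid, strand, isoform_id, exons, source="SpliceHunter"):
    """Return GFF3 lines (one mRNA plus its exons) for a single isoform.

    exons: list of (start, end) tuples, 1-based and inclusive as in GFF3.
    The inputs are assumed to be pre-parsed; this is a hypothetical helper.
    """
    exons = sorted(exons)
    start = exons[0][0]
    end = max(e for _, e in exons)
    # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
    lines = ["\t".join([seqid, source, "mRNA", str(start), str(end),
                        ".", strand, ".", f"ID={isoform_id}"])]
    for i, (s, e) in enumerate(exons, 1):
        attrs = f"ID={isoform_id}.exon{i};Parent={isoform_id}"
        lines.append("\t".join([seqid, source, "exon", str(s), str(e),
                                ".", strand, ".", attrs]))
    return lines
```

A complete file would also need the `##gff-version 3` line at the top.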

image

@kimrutherford
Member

> We might want to consider turning that isoforms file into a GFF3 file (or several) which would summarise the reads and be faster to load.

Maybe not. I've just re-read the email from Li-Lin (subject: "The dynamic landscape of fission yeast meiosis alternative-splice isoforms"). He thinks the summary might lose information:

> I think assembling the reads into transcripts perhaps would result in the loss of the transcript abundance information.

In the email he also has a screenshot of the reads loaded into JBrowse so we can get some idea about how it would look.

The screenshot also has a coverage track, which would be good for showing an overview of read depth. Note to self, there is a guide for that at the end of the docs page: https://github.com/pombase/website/wiki/Formatting-data-files-for-JBrowse#generating-bigwig-coverage-graphs-for-use-at-lower-zoom-levels
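
To illustrate what a coverage track contains, here is a minimal sketch that turns aligned intervals into bedGraph lines of read depth. It is not the method from the linked guide; the function name and the input format (0-based, half-open intervals on one chromosome) are assumptions made for the example.

```python
from collections import defaultdict


def coverage_bedgraph(chrom, intervals):
    """Return bedGraph lines giving read depth over one chromosome.

    intervals: iterable of (start, end) aligned blocks, 0-based half-open,
    which is also the coordinate convention bedGraph uses.
    Regions with zero depth are omitted. Adjacent runs are not merged,
    so two neighbouring lines can occasionally carry the same depth.
    """
    delta = defaultdict(int)
    for start, end in intervals:
        delta[start] += 1   # depth goes up where a block starts
        delta[end] -= 1     # and down just past where it ends
    lines = []
    depth = 0
    prev = None
    for pos in sorted(delta):
        if prev is not None and depth > 0 and pos > prev:
            lines.append(f"{chrom}\t{prev}\t{pos}\t{depth}")
        depth += delta[pos]
        prev = pos
    return lines
```

A bedGraph file like this can be converted to bigWig with the UCSC `bedGraphToBigWig` tool, given a chromosome sizes file.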

I was looking in his email to understand why the GEO record has 18 samples (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE79802), but there are only 6 BAM files. Perhaps there are replicates that have been combined. I should read the paper. :-)

@PCarme
Contributor

PCarme commented Jan 7, 2025

> We might want to consider turning that isoforms file into a GFF3 file (or several) which would summarise the reads and be faster to load.

I think it would make for a clearer track, although as Li-Lin mentioned, we would lose the abundance measure. I guess there wouldn't really be a way to combine both on the same track?
Otherwise, we could just host both the isoforms GFF file and the BAM file separately.

kimrutherford added a commit to pombase/pombase-config that referenced this issue Jan 8, 2025
kimrutherford added a commit to pombase/pombase-config that referenced this issue Jan 8, 2025
@kimrutherford
Member

I've made a start on showing the reads in JBrowse.

Pascal, could you have a look at the metadata file when you have time and see if you can improve it?
https://github.com/pombase/pombase-config/blob/master/website/jbrowse_track_metadata.csv

The black lines are cases where a read has been mapped to the sequence with a large gap. I think we might want to filter those.
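
A possible filter, sketched under the assumption that these gaps are recorded as `N` (skipped region) operations in the CIGAR string: drop any read whose largest `N` exceeds a threshold. The function names and the 5000 bp default are hypothetical; a sensible cut-off would need to be chosen by looking at the data, and if the gaps turn out to be stored as `D` operations the check would need adjusting.

```python
import re

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_skip(cigar):
    """Return the length of the longest N (skipped region) in a CIGAR string."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N"),
               default=0)


def keep_read(sam_line, max_gap=5000):
    """Return True for header lines and for reads with no N longer than max_gap.

    max_gap is a placeholder threshold, not a value from the issue.
    """
    if sam_line.startswith("@"):
        return True
    cigar = sam_line.split("\t")[5]  # CIGAR is the sixth SAM column
    return max_skip(cigar) <= max_gap
```

This could be applied to the text output of `samtools view -h`, with the kept lines converted back to BAM using `samtools view -b`.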

https://www.pombase.org/jbrowse/?loc=I%3A1181495..1197381&tracks=Forward%20strand%20features%2CReverse%20strand%20features%2CTranscripts%20during%20meiosis%20(4h)%20-%20Kuang%20et%20al.%20(2017)&tracklist=1&nav=1&overview=1&highlight=

image

PCarme added a commit to pombase/pombase-config that referenced this issue Jan 20, 2025