ConvertToRefFlat fails while running Drop-seq_tools-2.4.0 #113
Comments
ls, |
Thanks for reporting. Indeed, dropseq-tools was apparently developed with human/mouse in mind, which does not seem to be what you use (Zea mays?). In particular, some of the annotations out there seem to have missing I'll have to go through your error log in detail, but in the meantime, maybe this is of help: For Arabidopsis, some names have a |
thanks for the response, |
Indeed, non-unique gene names are a big problem and should probably be brought to the attention of the annotation databases/consortia. So, good spot. Did you write code to make those gene names unique? If yes, could you add it here or in issue #109? Maybe this can be integrated into the pipeline as an option. |
I wrote some Python code (so not Java), and it is not very nice code (true Python programmers will have a lot of comments!). It is not really suitable for a production environment. But it works for me, though only when the first attribute in the 9th column is the gene_id (fortunately, this is often the case). For now I just overwrite ALL gene_names; preferably you want to keep the "nice" gene names, since many people use those to get biological info. So I think this should be handled. |
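The renaming idea discussed above can be sketched as follows. This is not the commenter's script (which was not posted); it is a minimal illustration that keeps the "nice" gene_name when it is unique and appends the gene_id only when several gene_ids share the same name. The function name `make_gene_names_unique` and the `name_id` suffix convention are assumptions made for this example, and the attribute parsing assumes the usual `key "value";` layout in column 9.

```python
import re
from collections import defaultdict

# Assumed attribute layout in column 9: gene_id "X"; gene_name "Y"; ...
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]*)"')


def make_gene_names_unique(gtf_lines):
    """Return GTF lines in which gene_names shared by several gene_ids
    are rewritten as <gene_name>_<gene_id> (hypothetical convention)."""
    lines = [line.rstrip("\n") for line in gtf_lines]

    # First pass: record which gene_ids use each gene_name.
    ids_per_name = defaultdict(set)
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        gid = GENE_ID_RE.search(cols[8])
        gname = GENE_NAME_RE.search(cols[8])
        if gid and gname:
            ids_per_name[gname.group(1)].add(gid.group(1))

    # Second pass: rewrite only the ambiguous names, keep everything else.
    out = []
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) < 9:
            out.append(line)
            continue
        gid = GENE_ID_RE.search(cols[8])
        gname = GENE_NAME_RE.search(cols[8])
        if gid and gname and len(ids_per_name[gname.group(1)]) > 1:
            new_name = f"{gname.group(1)}_{gid.group(1)}"
            cols[8] = GENE_NAME_RE.sub(
                lambda m: f'gene_name "{new_name}"', cols[8], count=1
            )
        out.append("\t".join(cols))
    return out
```

Because the gene_id is located with a regular expression rather than by position, this sketch does not depend on gene_id being the first attribute in column 9.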
I have problems creating the refFlat from the GTF file. This has been reported before; I followed all the suggestions, and as far as I can see the GTFs look OK. I have one where it works:
with the following error:
Both GTFs have the same fields in column 9.
I see: "GTFParser Seen many non-increasing record positions"
What exactly does this mean?
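One way to investigate this message, assuming it refers to records whose start position is lower than that of the preceding record on the same sequence, is to count such records in the GTF. The sketch below is only a diagnostic under that assumption; it is not taken from Drop-seq tools, and `count_non_increasing` is a name made up for this example.

```python
def count_non_increasing(gtf_lines):
    """Count records whose start (column 4) is lower than the start of the
    previous record on the same sequence (column 1)."""
    last_seq = None
    last_start = None
    count = 0
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            continue
        seq, start = cols[0], int(cols[3])
        if seq == last_seq and start < last_start:
            count += 1
        last_seq, last_start = seq, start
    return count
```

A count well above zero would suggest the file is not ordered by position within each sequence.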
So, what is wrong here? Is the order of gene_id, transcript_id, gene_name, and transcript_name important?
I assume the gene lines do not need a transcript_id (by the way, adding this information to the gene lines did not solve my problems).
For transcript_name I simply used the transcript_id.
Also, the one entry with a ";" in the gene name was removed.
And finally, there was a mismatch between the FASTA and the GTF with respect to the scaffold names (for Mt and Pt); this was also corrected.
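The scaffold-name check mentioned above can be sketched as a comparison between the FASTA header names and the first column of the GTF. This is a minimal illustration, not the script actually used; the function names are made up for this example, and it assumes the sequence name is the text after ">" up to the first whitespace.

```python
def fasta_names(fasta_lines):
    """Sequence names from FASTA headers (text after '>' up to first whitespace)."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_names(gtf_lines):
    """Sequence names from column 1 of the GTF, ignoring comments and blanks."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def names_missing_from_fasta(fasta_lines, gtf_lines):
    """Sorted GTF sequence names that have no matching FASTA header."""
    return sorted(gtf_names(gtf_lines) - fasta_names(fasta_lines))
```

With real files, both arguments can simply be open file handles, since the functions only iterate over lines.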
Any suggestion is greatly appreciated,
Raymond