Empty CDS file from gff2seq #36
Hi, thanks for giving AnchorWave a try. Do you have the GFF file for another genome, please?
@baoxingsong Hi Baoxing, some of us in the Buckler Lab recently ran into this, and it caused quite a bit of confusion because AnchorWave gives no error or warning explaining why the CDS file is empty. Would it be possible to add a message (e.g. "Please check your GFF for genes with a unique ID") when
I've received correspondence about this open issue and want to post further details on how I solved my problem, in case it helps others. After comparing it with a working GFF, I discovered that the non-working GFF lacked a For others, inspect your non-working GFF and compare how it associates
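The exact attribute the non-working GFF was missing is not preserved above. As a hedged starting point for that comparison, the sketch below assumes the common GFF3 hierarchy gene -> mRNA -> CDS, linked through ID and Parent attributes (an assumption; AnchorWave's exact requirements are not stated in this thread), and counts CDS links that break that chain. parse_attributes and check_linkage are hypothetical helpers, not part of AnchorWave.

```python
from collections import Counter


def parse_attributes(col9: str) -> dict:
    """Parse a GFF3 column-9 string such as 'ID=rna1;Parent=gene1' into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_linkage(lines) -> Counter:
    """Count CDS parent links that do not lead CDS -> mRNA -> gene (assumed hierarchy)."""
    feature_type = {}  # ID -> feature type (column 3)
    parent_of = {}     # ID -> value of its Parent attribute
    cds_parents = []   # Parent value(s) of every CDS record
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a 9-column feature line
        attrs = parse_attributes(cols[8])
        if cols[2] == "CDS":
            # A CDS may list several parents separated by commas.
            cds_parents.extend(attrs.get("Parent", "").split(","))
        elif "ID" in attrs:
            feature_type[attrs["ID"]] = cols[2]
            if "Parent" in attrs:
                parent_of[attrs["ID"]] = attrs["Parent"]
    problems = Counter()
    for parent in cds_parents:
        if feature_type.get(parent) != "mRNA":
            problems["CDS parent is not an mRNA"] += 1
        elif feature_type.get(parent_of.get(parent, "")) != "gene":
            problems["mRNA parent is not a gene"] += 1
    return problems
```

Running it on both the working and the non-working GFF (for example check_linkage(open("genomic.gff")), with a hypothetical file name) and comparing the counts may show where the association differs. Note that annotations using a transcript type other than "mRNA" will be flagged by this sketch even if a given tool accepts them.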
Props to @agostof for help with this.
Hi, I've encountered the above issue with several different assemblies from independent groups, all downloaded directly from NCBI (for example, one here). There is nothing obviously wrong with the GFF files, and in this case it doesn't lack a Here's the command I'm using
There is no error message, which gives the impression that the command worked, but the resulting file is empty. It's unclear to me what AnchorWave doesn't like about the GFF format of this file. Any suggestions?
@jgroh My guess is that AnchorWave's regex is stumbling on the pipes.
The And this is the result I get with AnchorWave v1.2.3:
You may have to edit your GFF manually unless @baoxingsong updates the regular expression for you.
I took a second look at this, and it's actually not just the pipes but the name itself. For example, none of the following
Only
OK, yeah, I do think it's a regex issue with the pipes. My hunch is that the
Edit: To clarify, the previous three values from this comment do actually work. You just have to update the
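To see which records are affected before editing anything, a quick scan can list the ID and Parent values of mRNA and CDS records that contain characters outside a conservative set. This is a minimal sketch: SAFE_ID is an assumed character set chosen for illustration, not the regular expression AnchorWave actually uses, and find_suspicious_ids is a hypothetical helper.

```python
import re

# Assumed "safe" characters for ID/Parent values. This is an illustrative
# guess, NOT the regular expression AnchorWave uses internally.
SAFE_ID = re.compile(r"^[A-Za-z0-9_.:\-]+$")


def find_suspicious_ids(lines, feature_types=("mRNA", "CDS")):
    """Return (line_number, feature_type, value) for ID/Parent values outside SAFE_ID."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue  # only inspect 9-column mRNA/CDS feature lines
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key.strip() in ("ID", "Parent"):
                # Parent may hold several comma-separated IDs.
                for v in value.split(","):
                    if not SAFE_ID.match(v.strip()):
                        hits.append((lineno, cols[2], v.strip()))
    return hits
```

An ID containing a pipe is reported on both the mRNA line that defines it and every CDS line that names it as Parent, which makes it easy to confirm that the pipes are confined to those records.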
Hello, I have a question for you. When I run the code
Thanks @matthewwiese for digging into this; I'm finally getting back around to it. For the specific GFF I referenced above, the issue was fixed by editing the GFF to replace the pipe character:
@baoxingsong I agree it would be helpful to include an error message for cases like this, along the lines of "mRNA and CDS records in GFF contain unrecognized characters", or one that links to the expected regex.
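The replacement value used in the fix above is not preserved in this thread. A minimal sketch of the same kind of edit, assuming an underscore as the replacement, rewrites column 9 of every feature line and leaves comment lines and the other eight columns untouched. replace_pipes and rewrite_gff are hypothetical helper names.

```python
def replace_pipes(line: str, replacement: str = "_") -> str:
    """Replace '|' in column 9 of a GFF feature line; other lines pass through unchanged."""
    if line.startswith("#"):
        return line  # leave comments and pragmas alone
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a 9-column feature line
    cols[8] = cols[8].replace("|", replacement)
    # Restore the trailing newline only if the input line had one.
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")


def rewrite_gff(src_path: str, dst_path: str, replacement: str = "_") -> None:
    """Write a copy of src_path with pipes in column 9 replaced."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(replace_pipes(line, replacement))
```

Because ID and Parent values are rewritten the same way, the links between mRNA and CDS records are preserved. Two caveats: every column-9 attribute is changed, not only ID and Parent, and two IDs that differ only by "|" versus "_" would collide after the edit. Usage would be, for example, rewrite_gff("genomic.gff", "genomic.nopipes.gff") followed by passing the new file to anchorwave gff2seq -i (file names hypothetical).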
Hello,
I am getting an empty CDS file from running gff2seq with no error message.
My command is:
anchorwave gff2seq -i GCF_003119195.2_ASM311919v2_genomic.gff -r GCF_003119195.2_ASM311919v2_genomic.fna -o cds.fa
The genomic data were downloaded directly from NCBI: https://www.ncbi.nlm.nih.gov/genome/95891?genome_assembly_id=1473191
I am running it on Ubuntu 20.04.4 in a conda environment in which only anchorwave is installed.
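Until gff2seq reports this itself, a post-run check can turn the silent empty output into an explicit warning. The sketch below only counts FASTA header lines in the output file; the warning text is an illustrative assumption based on the causes discussed in this thread, not an AnchorWave message, and count_fasta_records and check_cds_output are hypothetical helpers.

```python
import sys


def count_fasta_records(lines) -> int:
    """Count FASTA header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def check_cds_output(path: str) -> int:
    """Return the number of sequences in path, warning on stderr if there are none."""
    with open(path) as handle:
        n_records = count_fasta_records(handle)
    if n_records == 0:
        print(
            f"warning: {path} contains no sequences; check that the mRNA/CDS "
            "records in the GFF are linked by ID/Parent and that their IDs "
            "contain no unusual characters such as '|'",
            file=sys.stderr,
        )
    return n_records
```

For example, calling check_cds_output("cds.fa") after the gff2seq command above returns 0 and prints the warning to stderr when the file is empty.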