GFF3 simplifier refactoring and tests #332

Merged Apr 15, 2024 (76 commits)

Commits
54048c2
Reorder functions
MatBarba Mar 12, 2024
a8b44da
Load exception
MatBarba Mar 12, 2024
c912b4a
Fail type as a set
MatBarba Mar 12, 2024
61fa945
Move gff loading in record, add test
MatBarba Mar 13, 2024
9dc0eb7
test loading invalid gff3
MatBarba Mar 13, 2024
4ce2c1c
Fixes
MatBarba Mar 13, 2024
1183164
First test for simplifier
MatBarba Mar 13, 2024
a05e757
Add tests for renaming checks
MatBarba Mar 13, 2024
d168504
Add test for skip_unrecognized option
MatBarba Mar 13, 2024
b20f3eb
Die if non-gene non supported
MatBarba Mar 13, 2024
4c6f09f
Fix: mobile element clean to get source
MatBarba Mar 13, 2024
d446e61
Add gff3_feature tests
MatBarba Mar 13, 2024
c20e69a
Ignore notimplemented failsafe
MatBarba Mar 13, 2024
94a29ac
format
MatBarba Mar 13, 2024
f9364e7
simple rename test gff
MatBarba Mar 13, 2024
a562146
Add lone tests
MatBarba Mar 13, 2024
7c65ae6
Multiple fixes for simplifier
MatBarba Mar 13, 2024
3e2cda9
One method to check 1 feature
MatBarba Mar 13, 2024
ea71007
Use same check method
MatBarba Mar 13, 2024
ae53317
Fixes
MatBarba Mar 13, 2024
4b00565
Test lone CDS, fix make basic IDs
MatBarba Mar 13, 2024
2322d2a
Remove lone* from main gff3
MatBarba Mar 13, 2024
f3b22eb
support CDS pseudo
MatBarba Mar 14, 2024
bd3a4f1
Rearrange mobile element code
MatBarba Mar 14, 2024
761d222
Add mobile tests + fixes
MatBarba Mar 14, 2024
2eda364
Test product if any
MatBarba Mar 14, 2024
5b9b6b0
Fixes
MatBarba Mar 14, 2024
e12d328
Move test files in subfolders
MatBarba Mar 15, 2024
4f6d7ee
Show file name in diff
MatBarba Mar 15, 2024
75d4429
Add clean_gene tests
MatBarba Mar 15, 2024
6edc29e
remove checks: would fail anyway
MatBarba Mar 15, 2024
d4d23d5
remove phase check: invalid gff3 anyway
MatBarba Mar 15, 2024
e11633e
Test gene segments
MatBarba Mar 15, 2024
a113f7c
Segments tests
MatBarba Mar 15, 2024
f06d18e
Move remove_cds to restructure + tests
MatBarba Mar 15, 2024
98a9efc
Remove unneeded check
MatBarba Mar 15, 2024
ce28207
Change default skip, update skip tests
MatBarba Mar 21, 2024
883f064
Merge branch 'hackathon/feb24' into mbarba/hack/gff_simplifier
MatBarba Mar 22, 2024
3fef844
Apply suggestions from code review
MatBarba Mar 22, 2024
699a179
Apply suggestions from code review
MatBarba Mar 22, 2024
d55f451
Remove duplicated code, moved to restructure
MatBarba Mar 22, 2024
ce132bc
Fix indentation
MatBarba Mar 22, 2024
193539a
format
MatBarba Mar 22, 2024
72413e2
Remove unnecessary expectations
MatBarba Mar 22, 2024
5113884
Exclude non tested things from the expectation
MatBarba Mar 22, 2024
4fe6adf
Merge branch 'hackathon/feb24' into mbarba/hack/gff_simplifier
MatBarba Mar 25, 2024
540301d
remove 1 test, not very useful
MatBarba Mar 25, 2024
197ab66
Test not implemented
MatBarba Mar 25, 2024
65bc58e
Apply suggestions from code review
MatBarba Mar 25, 2024
3dd4bab
Update how miRNA are processed + test
MatBarba Mar 26, 2024
c604371
More cases, tweak IDs
MatBarba Mar 26, 2024
e99cdf7
More checks, improve code
MatBarba Mar 26, 2024
617f77e
Remove obsolete methods
MatBarba Mar 26, 2024
42f1edd
format
MatBarba Mar 26, 2024
70fcae0
Merge branch 'mbarba/hack/gff_simplifier' into mbarba/hack/gff_mirna
MatBarba Mar 26, 2024
245d91f
More tests
MatBarba Mar 26, 2024
cb5221b
Test miRNA split support
MatBarba Mar 26, 2024
36022b2
Add pseudogene check
MatBarba Mar 27, 2024
412f59b
Add pseudogene miRNA
MatBarba Mar 27, 2024
230de4e
Reorganize miRNA test files
MatBarba Mar 27, 2024
9803dd1
Fix: no change, don't return modified gene
MatBarba Mar 27, 2024
acf8e61
Merge branch 'hackathon/feb24' into mbarba/hack/gff_simplifier
MatBarba Mar 27, 2024
4a5f251
Bugfix: properly store annotation for translations
MatBarba Mar 27, 2024
61ffb3f
Add test for fix
MatBarba Mar 27, 2024
40ea4e6
simplify code to avoid using flags
JAlvarezJarreta Apr 10, 2024
c926678
Restrict what is in the with section
MatBarba Apr 12, 2024
5469ba7
unify test_from_gff
MatBarba Apr 12, 2024
a6033ef
Restrict with
MatBarba Apr 12, 2024
58cf4c3
simp creation outside with
MatBarba Apr 12, 2024
aae6246
tmp_dir -> tmp_path
MatBarba Apr 12, 2024
3f89211
Apply suggestions from code review
MatBarba Apr 12, 2024
d410b98
Typo
MatBarba Apr 12, 2024
a948a90
Update doc
MatBarba Apr 12, 2024
a05f1de
Exception instead of None for simpler_gff3_feature
MatBarba Apr 12, 2024
2195607
check_one_feature no return
MatBarba Apr 12, 2024
efa874b
Update src/python/ensembl/io/genomio/gff3/simplifier.py
MatBarba Apr 15, 2024
src/python/ensembl/io/genomio/gff3/exceptions.py (14 additions, 0 deletions)
@@ -16,8 +16,22 @@

 __all__ = [
     "GFFParserError",
+    "IgnoredFeatureError",
+    "UnsupportedFeatureError",
 ]


 class GFFParserError(Exception):
     """Error when parsing a GFF3 file."""
+
+    def __init__(self, message):
+        super().__init__(message)
+        self.message = message
+
+
+class IgnoredFeatureError(GFFParserError):
+    """GFF3 feature can be ignored."""
+
+
+class UnsupportedFeatureError(GFFParserError):
+    """GFF3 feature is not supported."""
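The two subclasses let a caller tell "skip this feature quietly" apart from "this feature is a real problem" while still catching both as `GFFParserError`. The following is a minimal sketch of that idea, not code from this PR: `classify_feature`, `filter_types` and the two type sets are hypothetical names invented for illustration.

```python
from typing import List, Tuple


class GFFParserError(Exception):
    """Error when parsing a GFF3 file."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class IgnoredFeatureError(GFFParserError):
    """GFF3 feature can be ignored."""


class UnsupportedFeatureError(GFFParserError):
    """GFF3 feature is not supported."""


# Hypothetical type sets, for illustration only (not taken from the PR).
IGNORED_TYPES = {"region", "chromosome"}
SUPPORTED_TYPES = {"gene", "pseudogene"}


def classify_feature(feat_type: str) -> str:
    """Hypothetical check: return the type if supported, raise otherwise."""
    if feat_type in IGNORED_TYPES:
        raise IgnoredFeatureError(f"Ignored type {feat_type}")
    if feat_type not in SUPPORTED_TYPES:
        raise UnsupportedFeatureError(f"Unsupported type {feat_type}")
    return feat_type


def filter_types(feat_types: List[str]) -> Tuple[List[str], List[str]]:
    """Keep supported types, skip ignored ones, collect messages for unsupported ones."""
    kept, failed = [], []
    for feat_type in feat_types:
        try:
            kept.append(classify_feature(feat_type))
        except IgnoredFeatureError:
            # Ignorable features are dropped without being reported
            continue
        except UnsupportedFeatureError as err:
            failed.append(err.message)
    return kept, failed
```

Because both subclasses derive from `GFFParserError`, existing `except GFFParserError` handlers would still catch them.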
src/python/ensembl/io/genomio/gff3/extract_annotation.py (3 additions, 6 deletions)
@@ -298,13 +298,10 @@ def store_gene(self, gene: SeqFeature) -> None:
         """Record the functional_annotations of a gene and its children features."""
         self.add_feature(gene, "gene")

-        cds_found = False
         for transcript in gene.sub_features:
             self.add_feature(transcript, "transcript", gene.id)
             for feat in transcript.sub_features:
-                if feat.type != "CDS":
-                    continue
-                # Store CDS functional annotation only once
-                if not cds_found:
-                    cds_found = True
+                if feat.type == "CDS":
                     self.add_feature(feat, "translation", transcript.id)
+                    # Store CDS functional annotation only once
+                    break
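As read from the diff, the `break` only leaves the inner loop, so the refactored code records one translation per transcript, where the old `cds_found` flag recorded one per gene. A minimal sketch of that behaviour follows. It assumes a hypothetical `Feature` dataclass standing in for `SeqFeature`, and a hypothetical `collect_annotations` function that returns pairs instead of calling `add_feature`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature that carries sub_features."""

    id: str
    type: str
    sub_features: List["Feature"] = field(default_factory=list)


def collect_annotations(gene: Feature) -> List[Tuple[str, str]]:
    """Return (kind, id) pairs in the order the refactored loop would record them."""
    stored = [("gene", gene.id)]
    for transcript in gene.sub_features:
        stored.append(("transcript", transcript.id))
        for feat in transcript.sub_features:
            if feat.type == "CDS":
                stored.append(("translation", feat.id))
                # Only the first CDS segment of this transcript is recorded
                break
    return stored


gene = Feature(
    "g1",
    "gene",
    [
        Feature("t1", "mRNA", [Feature("e1", "exon"), Feature("c1", "CDS"), Feature("c1", "CDS")]),
        Feature("t2", "mRNA", [Feature("c2", "CDS")]),
    ],
)
annotations = collect_annotations(gene)
```

Here `annotations` holds a translation entry for both `t1` and `t2`, and the second `c1` segment is skipped.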
src/python/ensembl/io/genomio/gff3/restructure.py (26 additions, 0 deletions)
@@ -22,6 +22,7 @@
     "move_only_exons_to_new_mrna",
     "move_cds_to_existing_mrna",
     "remove_extra_exons",
+    "remove_cds_from_pseudogene",
 ]

 from collections import Counter
@@ -264,3 +265,28 @@ def remove_extra_exons(gene: SeqFeature) -> None:
         gene.sub_features += others
     else:
         raise GFFParserError(f"Can't remove extra exons for {gene.id}, not all start with 'id-'")
+
+
+def remove_cds_from_pseudogene(gene: SeqFeature) -> None:
+    """Removes the CDSs from a pseudogene.
+
+    This assumes the CDSs are sub features of the transcript or the gene.
+
+    """
+    if gene.type != "pseudogene":
+        return
+
+    gene_subfeats = []
+    for transcript in gene.sub_features:
+        if transcript.type == "CDS":
+            logging.debug(f"Remove pseudo CDS {transcript.id}")
+        else:
+            new_subfeats = []
+            for feat in transcript.sub_features:
+                if feat.type == "CDS":
+                    logging.debug(f"Remove pseudo CDS {feat.id}")
+                else:
+                    new_subfeats.append(feat)
+            transcript.sub_features = new_subfeats
+            gene_subfeats.append(transcript)
+    gene.sub_features = gene_subfeats
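To show what `remove_cds_from_pseudogene` does to a feature tree, here is a small sketch. It uses a hypothetical `Feature` dataclass in place of `SeqFeature`, and a condensed version of the function's logic that logs only the gene-level CDS removal. The IDs are made up.

```python
import logging
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature that carries sub_features."""

    id: str
    type: str
    sub_features: List["Feature"] = field(default_factory=list)


def remove_cds_from_pseudogene(gene: Feature) -> None:
    """Condensed version of the logic in the diff, applied to the stand-in class."""
    if gene.type != "pseudogene":
        return

    gene_subfeats = []
    for transcript in gene.sub_features:
        if transcript.type == "CDS":
            # CDS attached directly to the gene: drop it
            logging.debug("Remove pseudo CDS %s", transcript.id)
            continue
        # Keep every non-CDS child of the transcript
        transcript.sub_features = [f for f in transcript.sub_features if f.type != "CDS"]
        gene_subfeats.append(transcript)
    gene.sub_features = gene_subfeats


pseudo = Feature(
    "g1",
    "pseudogene",
    [
        Feature("t1", "transcript", [Feature("e1", "exon"), Feature("c1", "CDS")]),
        Feature("c2", "CDS"),
    ],
)
remove_cds_from_pseudogene(pseudo)
remaining = [(t.id, [f.id for f in t.sub_features]) for t in pseudo.sub_features]
```

After the call, only transcript `t1` with its exon `e1` is left. A feature whose type is not `pseudogene` is returned untouched.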