Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update compare fasta files #461

Open
wants to merge 25 commits into
base: main
Choose a base branch
from
Open

Update compare fasta files #461

wants to merge 25 commits into from

Conversation

Dishalodha
Copy link
Contributor

Updated compare fasta from runnables into a module. This module is mainly focused on comparing 2 fasta files, mainly with INSDC

  1. Updated the logic for comparing Ns in fasta file to identify the low quality data
  2. Removed the comparison for seq_region.json file
  3. Created unit tests

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Copy link
Member

@ens-LCampbell ens-LCampbell Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add triple quotes description of module here, then use doc in main() description to point to this description.

Copy link
Member

@ens-LCampbell ens-LCampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a look disha, just a few things I think could be tweaked. I will leave it here for now and return. Do let me know if you disagree or find any issues with my suggested edits

Returns:
argparse.Namespace: Parsed arguments as an argparse Namespace object.
"""
parser = ArgumentParser(description="Compare sequences between two genomes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Compare sequences between two genomes, replace with __doc__

Additionally, Id recommend briefly touching on what is the basis of said comparison just to make the description of the module a little clearer without digging into each diff function.

src/python/ensembl/io/genomio/fasta/compare.py Outdated Show resolved Hide resolved
default=Path.cwd(),
help="Directory to store the comparison report. Defaults to the current working directory.",
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty line new line break

action="store_true",
help="Enable compare seq_region mode, i.e. use seq_region for seqence comparison",
)
return parser.parse_args(arg_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before return of parser, don't forgot to add parser.add_log_arguments()

src/python/ensembl/io/genomio/fasta/compare.py Outdated Show resolved Hide resolved
src/python/ensembl/io/genomio/fasta/compare.py Outdated Show resolved Hide resolved
src/python/ensembl/io/genomio/fasta/compare.py Outdated Show resolved Hide resolved
src/python/ensembl/io/genomio/fasta/compare.py Outdated Show resolved Hide resolved
"""
Compare two FASTA files for common, unique, and differing sequences.
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be a comment to add "Reading fasta" like in below cases ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants