updates to pass tests w newer program versions (pandas, black, etc) #101

Merged: 12 commits, May 23, 2024
5 changes: 0 additions & 5 deletions .flake8

This file was deleted.

45 changes: 45 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,45 @@
name: Run tests

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  test:
    name: Run tests
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - name: checkout
        uses: actions/checkout@v4

      - name: build conda environment
        uses: conda-incubator/setup-miniconda@v3
        with:
          activate-environment: alignparse
          environment-file: environment.yml
          auto-activate-base: false
          auto-update-conda: true
          channel-priority: strict

      - name: install package and dependencies
        # NOTE: must specify the shell so that conda init updates bashrc see:
        # https://github.com/conda-incubator/setup-miniconda#IMPORTANT
        shell: bash -el {0}
        run: pip install -e . && pip install -r test_requirements.txt

      - name: lint code with ruff
        shell: bash -el {0}
        run: ruff check .

      - name: check code format with black
        shell: bash -el {0}
        run: black --check .

      - name: test code with `pytest`
        shell: bash -el {0}
        run: pytest
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@ _temp*
!.travis.yml
!.flake8
!.nojekyll
!.github

*.pyc
docs/alignparse.*
30 changes: 0 additions & 30 deletions .travis.yml

This file was deleted.

19 changes: 19 additions & 0 deletions CHANGELOG.rst
@@ -6,6 +6,25 @@ All notable changes to this project will be documented in this file.

The format is based on `Keep a Changelog <https://keepachangelog.com>`_.

0.6.3
-----

Fixed
+++++
* Fix bug in handling ``minimap2`` errors (`see this issue <https://github.com/jbloomlab/alignparse/issues/99>`_).
* Pass formatting with new ``black`` version.
* Pass tests with new ``pandas`` version.
* Fixed ``simple_mutconsensus`` for newer versions of ``pandas`` when grouping by just one variable.

Changed
+++++++
* Change code linting to ``ruff`` rather than ``flake8``.
* Test with GitHub Actions rather than Travis CI.
* Remove ``mybinder`` examples.
* Test on Python 3.11 rather than 3.9.
* Don't allow ``pysam`` version 0.22.1 as it was causing an OpenSSL-related import error.
* Test with ``minimap2`` version 2.22.

0.6.2
-----

32 changes: 7 additions & 25 deletions CONTRIBUTING.rst
@@ -40,7 +40,7 @@ Formatting
++++++++++
The code is formatted using `Black <https://black.readthedocs.io/en/stable/index.html>`_, which you can install using `pip install "black[jupyter]"`.
You may also wish to install a Black extension in your editor to, for example, auto-format upon save.
In any case, please run Black using `black .` before submitting your PR, because the Travis tests will not pass unless the files have been formatted.
In any case, please run Black using `black .` before submitting your PR, because the tests will not pass unless the files have been formatted.
Note that this will change files/notebooks that you may be actively editing.

Versions and CHANGELOG
@@ -57,15 +57,6 @@ When you add code that uses a new package that is not in the standard python library,
`See here <https://packaging.python.org/discussions/install-requires-vs-requirements/>`_ for information on how to do this, and how to specify minimal required versions.
As described in the above link, you should **not** pin exact versions in `install_requires` in `setup.py <setup.py>`_ unless absolutely necessary.

Notebooks on mybinder
-----------------------
The `Jupyter notebooks`_ in notebooks_ can be run interactively on mybinder_ by going to the following link:
https://mybinder.org/v2/gh/jbloomlab/alignparse/master?filepath=notebooks

In order for this to work, you need to keep the `environment.yml <environment.yml>`_ configuration file up to date with the dependencies for running these notebooks as `described here <https://mybinder.readthedocs.io/en/latest/config_files.html>`_.
Note that unlike for the `install_requires` in `setup.py <setup.py>`_, you may want to pin exact versions here to get reproducible installations.
Look into the `pip freeze <https://pip.pypa.io/en/stable/reference/pip_freeze/>`_ and `conda env export <https://packaging.python.org/discussions/install-requires-vs-requirements>`_ commands on how to automatically create such a configuration file.

Testing
---------

@@ -87,29 +78,21 @@ If these are not installed, install them with::

pip install -r test_requirements.txt

Then use flake8_ to `lint the code <https://en.wikipedia.org/wiki/Lint_%28software%29>`_ by running::
Then use ruff_ to `lint the code <https://en.wikipedia.org/wiki/Lint_%28software%29>`_ by running::

flake8
ruff check .

If you need to change the flake8_ configuration, edit the `.flake8 <.flake8>`_ file.
If you need to change the ruff_ configuration, edit the `ruff.toml <ruff.toml>`_ file.

Then run the tests with pytest_ by running::

pytest

If you need to change the pytest_ configuration, edit the `pytest.ini <pytest.ini>`_ file.

Automated testing on Travis
+++++++++++++++++++++++++++
The aforementioned flake8_ and pytest_ tests will be run automatically by the Travis_ continuous integration system as specified in the `.travis.yml <.travis.yml>`_ file.
Note that running the Travis_ tests requires you to register the project with Travis_.

If the tests are passing, you will see this on the Travis_ badge on GitHub repo main page.

Slack notifications of test results
Automated testing with GitHub Actions
+++++++++++++++++++++++++++++++++++++
You can configure Travis_ to provide automatic Slack notifications of the test results.
To do that, follow the `instructions here <https://docs.travis-ci.com/user/notifications/#configuring-slack-notifications>`_.
The aforementioned ruff_ and pytest_ tests will be run automatically by GitHub Actions as specified in the `.github/workflows/test.yml <.github/workflows/test.yml>`_ workflow file.


Building documentation
@@ -133,8 +116,7 @@ Finally, upload to PyPI_ with twine_ as `described here <https://github.com/pypa
Note that this requires you to have registered the package on PyPI_ if this is the first version of the package there.

.. _pytest: https://docs.pytest.org
.. _flake8: http://flake8.pycqa.org
.. _Travis: https://docs.travis-ci.com
.. _ruff: https://github.com/astral-sh/ruff
.. _PyPI: https://pypi.org/
.. _pip: https://pip.pypa.io
.. _sphinx: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
13 changes: 8 additions & 5 deletions README.rst
@@ -5,18 +5,21 @@ alignparse
.. image:: https://img.shields.io/pypi/v/alignparse.svg
:target: https://pypi.python.org/pypi/alignparse

.. image:: https://app.travis-ci.com/jbloomlab/alignparse.svg?branch=master
:target: https://app.travis-ci.com/github/jbloomlab/alignparse

.. image:: https://mybinder.org/badge_logo.svg
:target: https://mybinder.org/v2/gh/jbloomlab/alignparse/master?filepath=notebooks
.. image:: https://github.com/jbloomlab/alignparse/actions/workflows/test.yml/badge.svg
:target: https://github.com/jbloomlab/alignparse/actions/workflows/test.yml

.. image:: https://zenodo.org/badge/194140958.svg
:target: https://zenodo.org/badge/latestdoi/194140958

.. image:: https://joss.theoj.org/papers/10.21105/joss.01915/status.svg
:target: https://doi.org/10.21105/joss.01915

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/psf/black

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json
:target: https://github.com/astral-sh/ruff

``alignparse`` is a Python package written by `the Bloom lab <https://research.fhcrc.org/bloom/en.html>`_.
It is designed to align long sequencing reads (such as those from PacBio circular consensus sequencing) to targets, filter these alignments based on user-provided specifications, and parse out user-defined sequence features.
For each read that passes the filters, information about the features (e.g. accuracy, sequence, mutations) is retained for further analyses.
2 changes: 1 addition & 1 deletion alignparse/__init__.py
@@ -7,5 +7,5 @@

__author__ = "`the Bloom lab <https://research.fhcrc.org/bloom/en.html>`_"
__email__ = "[email protected]"
__version__ = "0.6.2"
__version__ = "0.6.3"
__url__ = "https://github.com/jbloomlab/alignparse"
7 changes: 3 additions & 4 deletions alignparse/ccs.py
@@ -8,7 +8,6 @@

"""


import collections
import io
import itertools
@@ -75,7 +74,7 @@ def __init__(self, name, fastqfile, reportfile):
        self.name = name
        self.fastqfile = fastqfile
        if not os.path.isfile(fastqfile):
            raise IOError(f"cannot find `fastqfile` {fastqfile}")
            raise OSError(f"cannot find `fastqfile` {fastqfile}")

        ccs_stats = get_ccs_stats(self.fastqfile)
        self.passes = ccs_stats.passes
@@ -87,7 +86,7 @@ def __init__(self, name, fastqfile, reportfile):
        if reportfile:
            self.reportfile = reportfile
            if not os.path.isfile(reportfile):
                raise IOError(f"cannot find `reportfile` {reportfile}")
                raise OSError(f"cannot find `reportfile` {reportfile}")
            self.zmw_stats = report_to_stats(self.reportfile)
            zmw_stats_nccs = self.zmw_stats[
                self.zmw_stats["status"].str.match("^Success")
@@ -672,7 +671,7 @@ def report_to_stats(reportfile):
    if df is not None:
        return df

    raise IOError(f"Cannot match report in {reportfile}")
    raise OSError(f"Cannot match report in {reportfile}")


def _reportfile_version_check(reportfile, pattern):
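A note on the ``IOError`` to ``OSError`` renames in this file (and the matching changes in ``minimap2.py`` and ``targets.py`` below): since Python 3.3, ``IOError`` is only an alias of ``OSError``, so the rename is a ruff-driven style cleanup with no behavior change. A minimal sketch, not taken from the PR, using a hypothetical file name:

```python
# Minimal sketch (not from the PR): IOError has been an alias of OSError
# since Python 3.3, so renaming the raised exception changes nothing at runtime.
assert IOError is OSError

try:
    raise OSError("cannot find `fastqfile` some_file.fastq")  # hypothetical path
except IOError as err:  # still caught, because IOError *is* OSError
    print(f"caught {type(err).__name__}: {err}")
```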
13 changes: 7 additions & 6 deletions alignparse/consensus.py
@@ -8,7 +8,6 @@

"""


import collections
import io # noqa: F401
import itertools
@@ -461,9 +460,11 @@ def empirical_accuracy(
        .assign(
            **{
                mutation_col: (
                    lambda x: x[mutation_col].map(lambda s: " ".join(sorted(s.split())))
                    if sort_mutations
                    else x[mutation_col]
                    lambda x: (
                        x[mutation_col].map(lambda s: " ".join(sorted(s.split())))
                        if sort_mutations
                        else x[mutation_col]
                    )
                )
            }
        )
@@ -475,7 +476,7 @@
        .rename("_ngroups")
        .reset_index()
        # get error rate
        .groupby(upstream_group_cols)
        .groupby(upstream_group_cols)[["_n", "_u", "_ngroups"]]
        .apply(
            lambda x: 1
            - _LnL_error_rate(
@@ -696,7 +697,7 @@ def simple_mutconsensus(
    dropped = []
    consensus = []
    for g, g_df in df.groupby(group_cols, observed=True)[mutation_col]:
        if len(group_cols) == 1:
        if len(group_cols) == 1 and isinstance(g, str):
            g = [g]

        nseqs = len(g_df)
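The two ``pandas``-related edits above in ``consensus.py`` track behavior changes in newer ``pandas``: ``DataFrameGroupBy.apply`` now deprecates operating on the grouping columns, which is why the needed columns are selected explicitly before ``.apply``, and iterating over a groupby keyed on a length-one list yields one-element tuples rather than bare scalars, which is why ``simple_mutconsensus`` only wraps ``g`` in a list when it is still a string. A rough sketch with toy data (the ``barcode``, ``_n``, and ``_u`` names here are illustrative, not from the package):

```python
# Hedged sketch (toy data, illustrative column names) of the newer-pandas
# behaviors the consensus.py changes above accommodate.
import pandas as pd

df = pd.DataFrame({"barcode": ["A", "A", "B"], "_n": [3, 2, 4], "_u": [1, 1, 2]})

# 1. Group keys: with a length-1 list of groupers, newer pandas yields
#    1-tuples like ("A",) instead of bare scalars "A" when iterating.
for g, g_df in df.groupby(["barcode"], observed=True)["_n"]:
    key = [g] if isinstance(g, str) else list(g)  # normalize either form
    print(key, len(g_df))

# 2. apply on grouping columns: selecting the needed columns before .apply
#    keeps the group keys out of the applied frame and avoids the deprecation.
summary = (
    df.groupby("barcode")[["_n", "_u"]]
    .apply(lambda x: x["_n"].sum() / x["_u"].sum())
    .rename("ratio")
    .reset_index()
)
print(summary)
```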
1 change: 0 additions & 1 deletion alignparse/constants.py
@@ -7,7 +7,6 @@

"""


CBPALETTE = (
    "#999999",
    "#E69F00",
1 change: 0 additions & 1 deletion alignparse/cs_tag.py
@@ -11,7 +11,6 @@

"""


import functools

import numpy
6 changes: 3 additions & 3 deletions alignparse/minimap2.py
@@ -152,7 +152,7 @@ class Mapper:
m54228_181120_212724/4194376/ccs 0 refseq 1 1 63M * 0 0
ATGCAAAATGATGCATAGTATTAGCATAAATAGGATAGCCATAAGGTTACTGCATAAGAGTAT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NM:i:4 ms:i:111 AS:i:111 nn:i:3 tp:A:P cm:i:7 s1:i:41 s2:i:0 de:f:0.0167
NM:i:4 ms:i:111 AS:i:111 nn:i:3 tp:A:P cm:i:7 s1:i:41 s2:i:0 de:f:0.0635
cs:Z::6*na*na*nt:49*ga:4 rl:i:0
>>> print(tag_names)
['NM', 'ms', 'AS', 'nn', 'tp', 'cm', 's1', 's2', 'de', 'cs', 'rl']
@@ -168,7 +168,7 @@ class Mapper:
m54228_181120_212724/4194376/ccs 0 refseq 1 1 63M * 0 0
ATGCAAAATGATGCATAGTATTAGCATAAATAGGATAGCCATAAGGTTACTGCATAAGAGTAT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NM:i:4 ms:i:111 AS:i:111 nn:i:3 tp:A:P cm:i:7 s1:i:41 s2:i:0 de:f:0.0167
NM:i:4 ms:i:111 AS:i:111 nn:i:3 tp:A:P cm:i:7 s1:i:41 s2:i:0 de:f:0.0635
cs:Z::6*na*na*nt:49*ga:4 rl:i:0 np:i:127
>>> print(tag_names) # doctest: +NORMALIZE_WHITESPACE
['NM', 'ms', 'AS', 'nn', 'tp', 'cm', 's1', 's2', 'de', 'cs', 'rl', 'np']
@@ -227,7 +227,7 @@ def map_to_sam(self, targetfile, queryfile, samfile):
        """
        for fname, f in [("target", targetfile), ("query", queryfile)]:
            if not os.path.isfile(f):
                raise IOError(f"cannot find `{fname}file` {f}")
                raise OSError(f"cannot find `{fname}file` {f}")

        if os.path.splitext(samfile)[1] != ".sam":
            raise ValueError(f"`samfile` lacks extension '.sam': {samfile}")
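The updated doctest lines above only change the expected ``de:f:`` value (minimap2's gap-compressed per-base divergence), which differs under the ``minimap2`` 2.22 version now used for testing; the tag names themselves are unchanged. A small sketch, assuming a hypothetical SAM file, of inspecting those tags with ``pysam`` (already a dependency of the package):

```python
# Hedged sketch (hypothetical SAM path): reading the minimap2 tags shown in
# the doctest above with pysam's standard AlignedSegment methods.
import pysam

with pysam.AlignmentFile("alignments.sam") as sam:  # hypothetical file
    for read in sam:
        tag_names = [name for name, _ in read.get_tags()]
        print(read.query_name, tag_names)
        if read.has_tag("de"):
            # gap-compressed per-base divergence, e.g. 0.0635 in the doctest
            print("de =", read.get_tag("de"))
```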
9 changes: 4 additions & 5 deletions alignparse/targets.py
@@ -8,7 +8,6 @@

"""


import contextlib
import copy
import itertools
@@ -160,7 +159,7 @@ def __init__(
            )
            if not (allow_extra_features or (feature_name in allow_features)):
                raise ValueError(f"feature {feature_name} not allowed feature")
            if bio_feature.strand != 1:
            if bio_feature.location.strand != 1:
                raise ValueError(
                    f"feature {feature_name} of {self.name} is - "
                    "strand, but only + strand features handled"
@@ -902,7 +901,7 @@ def map_func(f, *args):
                if overwrite:
                    os.remove(tup.samfile)
                else:
                    raise IOError(f"file {tup.samfile} already exists")
                    raise OSError(f"file {tup.samfile} already exists")
        _ = map_func(
            self.align, df[queryfile_col], df["samfile"], itertools.repeat(mapper)
        )
@@ -953,7 +952,7 @@ def map_func(f, *args):
        for f in list(filtered.values()) + list(aligned.values()):
            if os.path.isfile(f):
                if not overwrite:
                    raise IOError(f"file {f} already exists.")
                    raise OSError(f"file {f} already exists.")
                else:
                    os.remove(f)

@@ -1131,7 +1130,7 @@ def parse_alignment(
            }
            filenames = list(filtered.values()) + list(aligned.values())
            if (not overwrite_csv) and any(map(os.path.isfile, filenames)):
                raise IOError(f"existing file with name in: {filenames}")
                raise OSError(f"existing file with name in: {filenames}")
        else:
            filtered = {t: [] for t in self.target_names}
            aligned = {t: [] for t in self.target_names}
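The switch from ``bio_feature.strand`` to ``bio_feature.location.strand`` follows Biopython, where the strand is stored on the feature's location and direct ``SeqFeature.strand`` access is deprecated in recent releases. A minimal sketch with a toy feature (names are illustrative only, not from the package's test data):

```python
# Hedged sketch (toy feature): in current Biopython the strand lives on the
# feature's location, which is why the check above reads
# `bio_feature.location.strand` rather than the deprecated `bio_feature.strand`.
from Bio.SeqFeature import FeatureLocation, SeqFeature

feature = SeqFeature(FeatureLocation(0, 63, strand=1), type="gene", id="gene2")
if feature.location.strand != 1:
    raise ValueError("only + strand features handled")
print(feature.id, feature.location.strand)  # gene2 1
```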
1 change: 0 additions & 1 deletion alignparse/utils.py
@@ -5,7 +5,6 @@

"""


import math
import numbers
import re