Merge pull request #13 from not-a-feature/dev
Update to v2.0.1: two-key structure
not-a-feature authored Mar 13, 2023
2 parents 993020b + c36e1f5 commit 05e055e
Showing 8 changed files with 4,117 additions and 3,165 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
os: [ubuntu-latest, windows-latest, macos-latest]
python-version: ['3.7', '3.10', '3.11']

steps:
8 changes: 4 additions & 4 deletions DEPENDENCIES.md
@@ -7,7 +7,7 @@ Any information relevant to third-party vendors listed below are collected using

## Dependencies

### [mypy (0.930)](http://www.mypy-lang.org/)
### [mypy](http://www.mypy-lang.org/)

#### Declared Licenses
MIT *OR* Python-2.0
@@ -17,7 +17,7 @@ MIT *OR* Python-2.0
**Package Homepage**: http://www.mypy-lang.org/
---

### [pytest (7.0.0rc1)](http://pytest.org)
### [pytest](http://pytest.org)

#### Declared Licenses

@@ -27,7 +27,7 @@ MIT *OR* Python-2.0

---

### [pytest-cov (3.0.0)](https://github.com/pytest-dev/pytest-cov)
### [pytest-cov](https://github.com/pytest-dev/pytest-cov)

#### Declared Licenses
MIT
@@ -38,7 +38,7 @@ MIT

---

### [tox (4.0.0a9)](https://tox.readthedocs.org/)
### [tox](https://tox.readthedocs.org/)

#### Declared Licenses
MIT
20 changes: 14 additions & 6 deletions README.md
@@ -14,7 +14,7 @@ The BLOcks SUbstitution Matrices (BLOSUM) are used to score alignments between p
Reading such matrices is not particularly difficult, yet most off-the-shelf packages are overloaded with strange dependencies.
And why implement the same reader again if there is a simple module for that?

`blosum` offers a robust and easy-to-expand implementation without relying on third-party libraries.
`blosum` offers a robust and easy-to-expand implementation without relying on third-party libraries.


## Installation
@@ -35,22 +35,24 @@ conda install blosum
```
## How to use

### Default BLOSUM
### Default BLOSUM
This package provides the most commonly used BLOSUM matrices.
You can choose from BLOSUM 45, 50, 62, 80 and 90.

To load a matrix:
```python
import blosum as bl
matrix = bl.BLOSUM(62)
```
val = matrix["A"]["Y"]
```

### Custom matrix
In addition, custom matrices can be loaded by passing the file path as the argument.

```python
import blosum as bl
matrix = bl.BLOSUM("path/to/blosum.file")
val = matrix["A"]["Y"]
```

The matrices are required to have the following format:
@@ -70,18 +72,24 @@ Once loaded the `matrix` behaves like a `defaultdict`.
To get a value use:

```python
val = matrix["AY"]
val = matrix["A"]["Y"]
```
To get a defaultdict of the row with a given key use:

```python
val_dict = matrix["A"]
```
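
The two-key access described above can be pictured as a `defaultdict` of `defaultdict`s. The following is only a minimal sketch using the standard library and a few toy scores, not the package's own code; `make_two_key_matrix` and `score_pairs` are hypothetical helper names introduced for illustration.

```python
from collections import defaultdict

DEFAULT = float("-inf")


def make_two_key_matrix(scores, default=DEFAULT):
    """Build a nested mapping so that matrix[row][column] yields a score."""
    matrix = defaultdict(lambda: defaultdict(lambda: default))
    for row, columns in scores.items():
        # Each row is itself a defaultdict, so unknown columns fall back to `default`.
        matrix[row] = defaultdict(lambda: default, columns)
    return matrix


def score_pairs(matrix, seq_a, seq_b):
    """Sum the scores of the residue pairs at matching positions (no gaps)."""
    return sum(matrix[a][b] for a, b in zip(seq_a, seq_b))


# Toy excerpt (assumption: only two residues, values chosen for illustration).
toy_scores = {
    "A": {"A": 4.0, "Y": -2.0},
    "Y": {"A": -2.0, "Y": 7.0},
}

matrix = make_two_key_matrix(toy_scores)

row = matrix["A"]              # whole row, keyed by the second residue
pair_score = matrix["A"]["Y"]  # single score, row first and column second
total = score_pairs(matrix, "AY", "AY")
```

Keeping the row as the first key means a row can be fetched once and reused for several column lookups, which the previous single-string key (`"AY"`) did not allow.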


If the key cannot be found, the default value is returned. It is `float("-inf")`.
If the key cannot be found, the default value `float("-inf")` is returned.
It is possible to set a custom default score:
```python
matrix = bl.BLOSUM(62, default=0)
```
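
How the default score behaves for unknown keys can be sketched with plain `defaultdict`s. This is an illustration under assumptions, not the package's implementation: `with_default` is a hypothetical helper and the scores are toy values. It also shows a general property of `defaultdict`: a failed lookup stores the missing key as a side effect.

```python
from collections import defaultdict


def with_default(scores, default):
    """Wrap plain nested dicts so that missing keys yield `default`."""
    matrix = defaultdict(lambda: defaultdict(lambda: default))
    for row, columns in scores.items():
        matrix[row] = defaultdict(lambda: default, columns)
    return matrix


# Toy data (assumption): a single known pair.
toy_scores = {"A": {"A": 4.0}}

strict = with_default(toy_scores, float("-inf"))
lenient = with_default(toy_scores, 0)

unknown_column = strict["A"]["?"]  # known row, missing column
unknown_row = lenient["?"]["A"]    # missing row altogether

# defaultdict stores a key the first time it is looked up with [],
# so the lookups above have grown both mappings.
row_was_added = "?" in lenient

# .get() is a read-only probe: it returns None here and stores nothing.
probe = lenient.get("!")
```

If a matrix is iterated after many lookups of unknown residues, the extra keys created this way will show up in the iteration, so `.get()` or an `in` check is the safer choice for read-only probing.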

## License
```
Copyright (C) 2022 by Jules Kreuer - @not_a_feature
Copyright (C) 2023 by Jules Kreuer - @not_a_feature
This piece of software is published under the GNU General Public License v3.0
TLDR:
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = blosum
version = 1.2.2
version = 2.0.1
description = A simple BLOSUM toolbox without dependencies.
long_description = file: README.md
long_description_content_type=text/markdown
51 changes: 33 additions & 18 deletions src/blosum/_blosum.py
@@ -14,33 +14,39 @@


class BLOSUM(defaultdict): # type: ignore
def __init__(self, n, default: float = float("-inf")):
def __init__(self, n: Union[int, str], default: float = float("-inf")):
"""
Object to easily access a blosum matrix.
This reader supports asymmetric data.
Input:
Either n ϵ {45,50,62,80,90} or path
Input
-----
Either n ϵ {45,50,62,80,90} or path
n: Int, which BLOSUM Matrix to use.
Choice between: 45,50,62,80 and 90
Data gathered from https://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/
n: int, which BLOSUM Matrix to use.
Choice between: 45,50,62,80 and 90
Data gathered from https://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/
path: String, path to a Blosum matrix.
File in a format like:
https://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/BLOSUM62
path: string, path to a Blosum matrix.
File in a format like:
https://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/BLOSUM62
default: float, default -inf
"""

self.n = n
self.default = default

# Using default matrix
if n in [45, 50, 62, 80, 90]:
super().__init__(lambda: default, default_blosum[n])
if isinstance(n, int) and n in [45, 50, 62, 80, 90]:
matrix = {}
for k, v in default_blosum[n].items():
matrix[k] = defaultdict(lambda: default, v)
super().__init__(lambda: defaultdict(lambda: default), matrix)

# load custom matrix
elif isinstance(n, str):
super().__init__(lambda: default, loadMatrix(n))
super().__init__(lambda: defaultdict(lambda: default), loadMatrix(n))
else:
raise (
BaseException(
@@ -73,23 +79,31 @@ def __repr__(self) -> str:
return f"BLOSUM({n}, default={d})"


def loadMatrix(path: str) -> Union[Dict[str, int], Dict[str, float]]:
def loadMatrix(
path: str,
default: float = float("-inf"),
) -> DefaultDict[str, DefaultDict[str, float]]:
"""
Reads a Blosum matrix from file.
File in a format like:
https://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/BLOSUM62
Input:
Input
-----
path: str, path to a file.
default: float, default value "-inf"
Returns:
Returns
-------
blosumDict: Dictionary, The blosum dict
"""

with open(path, "r") as f:
content = f.readlines()

blosumDict = {}
blosumDict: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
lambda: defaultdict(lambda: default)
)

header = True
for line in content:
@@ -117,9 +131,10 @@ def loadMatrix(path: str) -> Union[Dict[str, int], Dict[str, float]]:

# Add Line/Label combination to dict
for index, lab in enumerate(labelslist, start=1):
blosumDict[f"{linelist[0]}{lab}"] = float(linelist[index])
blosumDict[linelist[0]][lab] = float(linelist[index])

# Check quadratic
if not len(blosumDict) == len(labelslist) ** 2:
if not len(blosumDict) == len(labelslist):
raise EOFError("Blosum file is not quadratic.")

return blosumDict