Enables easy fetching and parsing of NCBI PubMed abstracts into tabular format, then tags them with gene symbols and accession IDs.
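The sketch below illustrates the general idea. It is not gpubs's actual implementation. It uses only the standard library to flatten PubMed-style XML into one row per article, then tags each row with whole-word matches from a supplied set of gene symbols. The inline sample XML, the function names `parse_abstracts` and `tag_genes`, and the regex-based matching are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Tiny made-up sample in the PubMed XML layout
# (PubmedArticleSet > PubmedArticle > MedlineCitation > Article).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <ArticleTitle>TP53 and BRCA1 in tumour suppression</ArticleTitle>
        <Abstract>
          <AbstractText>We examine TP53 mutations alongside BRCA1 loss.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_abstracts(xml_text):
    """Flatten each PubmedArticle into a row dict: pmid, title, abstract."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        pmid = citation.findtext("PMID", default="")
        title = citation.findtext("Article/ArticleTitle", default="")
        # An abstract may be split into several labelled AbstractText sections.
        parts = [
            "".join(node.itertext())
            for node in citation.iterfind("Article/Abstract/AbstractText")
        ]
        rows.append({"pmid": pmid, "title": title, "abstract": " ".join(parts)})
    return rows


def tag_genes(rows, gene_symbols):
    """Add a 'genes' column listing supplied symbols found as whole words."""
    alternatives = "|".join(map(re.escape, sorted(gene_symbols)))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    for row in rows:
        text = row["title"] + " " + row["abstract"]
        row["genes"] = sorted(set(pattern.findall(text)))
    return rows


rows = tag_genes(parse_abstracts(SAMPLE_XML), {"TP53", "BRCA1", "EGFR"})
print(rows[0]["genes"])  # ['BRCA1', 'TP53']
```

Plain whole-word matching will miss synonyms and produce false positives for short symbols. A real tagger would need a curated symbol and accession reference table, which is what gpubs's `ReferenceData` appears to manage.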
On Linux (Ubuntu), first install conda or Miniconda, install mamba for speed, then create and activate the gpubs environment:
mamba env create -f conda_env.yml
conda activate gpubs
Example usage in Python:
from gpubs.models import ReferenceData
from gpubs.api import pipeline
# set num_abstract_xml_files=-1 to fetch all abstract XML files (needs about 60 GB)
m = ReferenceData(num_abstract_xml_files=1, version="v1", verbose=2)
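The comment above implies that `num_abstract_xml_files` acts as a file count, with `-1` meaning "all files". The sketch below shows that convention using a hypothetical `select_files` helper and made-up file names. It does not come from gpubs and only illustrates how the parameter is described.

```python
def select_files(files, num_files):
    """Return the first num_files entries, or all of them when num_files == -1.

    Hypothetical helper illustrating the "-1 means all" convention;
    not taken from gpubs.
    """
    files = list(files)
    if num_files == -1:
        return files
    if num_files < 0:
        raise ValueError("num_files must be -1 (all) or a non-negative count")
    return files[:num_files]


# Made-up file names, loosely in the style of PubMed baseline files.
files = [f"pubmed_{i:04d}.xml.gz" for i in range(1, 6)]
print(select_files(files, 1))         # ['pubmed_0001.xml.gz']
print(len(select_files(files, -1)))   # 5
```

Starting with a small count, as in the example above, keeps a first run fast. The full download is only needed once the pipeline is known to work.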
The package can also be installed from PyPI with pip:
pip install gpubs
cd src
Bump the VERSION variable in release-info.json, then:
make clean
make
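Bumping the version by hand works fine, but the step can also be scripted. The following is a minimal sketch that assumes release-info.json is a JSON object whose `VERSION` key holds a `MAJOR.MINOR.PATCH` string. Both that file layout and the `bump_patch` helper are assumptions and are not part of gpubs.

```python
import json
import tempfile
from pathlib import Path


def bump_patch(path="release-info.json"):
    """Increment the patch part of the VERSION string and rewrite the file.

    Assumes a JSON object like {"VERSION": "0.1.3"}; adjust if the real
    release-info.json stores the version differently.
    """
    p = Path(path)
    info = json.loads(p.read_text())
    major, minor, patch = (int(x) for x in info["VERSION"].split("."))
    info["VERSION"] = f"{major}.{minor}.{patch + 1}"
    p.write_text(json.dumps(info, indent=2) + "\n")
    return info["VERSION"]


# Demo on a throwaway file so no real release-info.json is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "release-info.json"
    demo.write_text(json.dumps({"VERSION": "0.1.3"}))
    print(bump_patch(demo))  # 0.1.4
```

Inside src you would call `bump_patch()` with no argument before running `make clean` and `make`.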
Format the code with black (it rewrites files in place to fit the style guide):
black src
Lint the code with flake8:
flake8 src
Run tests
TBD
Build the documentation:
cd docs
make clean
make html