This repository contains some simple Python notebooks and CLI scripts to introduce Python to biologists, used for the MMBDTP Masterclass (May 2024).
Most of the scripts here should work in the following conda environment:
# Create an environment with the required libraries
conda create -n pystart -y "python>=3.6" biopython pyfastx pandas seaborn matplotlib ipykernel
conda activate pystart
Note that I will write code compatible with Python 3.6, but you should consider using a recent version. At the time of writing, 3.12 is the latest stable version.
In addition to Python, the environment will install:
- Biopython, a comprehensive set of bioinformatics functions and tools
- pyfastx, a fast FASTQ/FASTA parser (note: Biopython also provides parsers; we use a separate module to show how to deal with multiple dependencies)
- pandas, seaborn and matplotlib, which are used to show the use of Python as a data analysis framework (an alternative to R)
- ipykernel, which makes it possible to run the examples in a Python notebook
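To confirm that the environment was created correctly, a short check like the one below can help. It is only a sketch, not part of the workshop material: the helper name `is_installed` is made up for illustration, and the list assumes the usual import names (Biopython is imported as `Bio`).

```python
import importlib.util

# Import names for the packages installed above.
# Note that the biopython package is imported as "Bio".
REQUIRED_MODULES = ["Bio", "pyfastx", "pandas", "seaborn", "matplotlib", "ipykernel"]


def is_installed(module_name):
    """Return True if the module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in REQUIRED_MODULES:
        status = "ok" if is_installed(name) else "MISSING"
        print(f"{name:12s} {status}")
```

Run it after `conda activate pystart`; any line marked MISSING points to a package that was not installed.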
It's common to approach a new programming language by writing some code that prints the text "Hello, World!". In Python, this code looks like:
print("Hello, World!")
This command:
- can be executed in an interactive shell (just type python to get the shell; Ctrl-D to exit)
- saved as a file (script), for example hello.py, and executed as python hello.py
- run in a Python notebook
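When the command is saved as a script, it is common (though not required) to wrap it in a function and a "main guard". The sketch below shows one possible layout for hello.py; the function name `greeting` is an arbitrary choice for illustration.

```python
# hello.py: one possible layout for the script described above

def greeting():
    """Return the text to print, so it can be reused or tested."""
    return "Hello, World!"


if __name__ == "__main__":
    # This part runs only when the file is executed as a script
    # (python hello.py), not when it is imported from another file.
    print(greeting())
```

The one-line version works just as well; the guard becomes useful once a script grows and its functions are imported elsewhere.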
Python notebooks, also known as Jupyter notebooks, are interactive documents that allow users to write and execute Python code in a web browser.
They combine code, text, and visualizations, making it easy to create and share data analysis workflows.
Python notebooks are widely used in data science and research communities for exploratory data analysis, prototyping, and documentation. See: using Jupyter Notebooks from the CLIMB-BIG-DATA documentation.
Since we are all CLI gurus (from Week 1 of the MMBDTP training), we want to write a simple command line script to translate sequences.
It is a conversation starter, so to speak: it should be improved during the workshop, or at least tested to identify weaknesses and potential improvements.
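To give an idea of what such a script might look like, here is a minimal sketch that uses only the standard library. It is not the script from this repository: the file name translate_sketch.py, the function names, and the choices to take sequences as command line arguments, to mark unknown codons as X, and to ignore trailing bases are all assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Translate DNA sequences given on the command line (illustrative sketch)."""
import argparse
import itertools

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    # Step through complete codons only; a trailing 1-2 bases are ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


def main():
    parser = argparse.ArgumentParser(description="Translate DNA sequences")
    parser.add_argument("sequences", nargs="*", help="one or more DNA sequences")
    args = parser.parse_args()
    if not args.sequences:
        # Nothing to translate: show how the script is meant to be called.
        parser.print_usage()
        return
    for seq in args.sequences:
        print(translate(seq))


if __name__ == "__main__":
    main()
```

For example, `python translate_sketch.py ATGGCCTAA` prints `MA*`. Obvious weaknesses to discuss: it does not read FASTA files, it handles only one reading frame, and it does not stop at stop codons. Biopython, installed in the environment above, provides a more complete translation through its Seq objects.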
There are a great many training resources for Python, so I will list a few that cover different media and learning styles:
- YouTube video: Python in 30 minutes: video, covers the basics with clarity
- Think Python, 3rd edition: online book
Inevitably, you will need to check the Official Documentation, for example to see what isnumeric() does.
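Trying a method on a few inputs is a good complement to reading its documentation. The sketch below calls the string method `isnumeric()` on some example strings chosen for illustration.

```python
# str.isnumeric() is True only if the string is non-empty and
# every character in it is a numeric character.
examples = ["42", "4.2", "-3", "", "\u00bd"]  # "\u00bd" is the one-half fraction character
results = {text: text.isnumeric() for text in examples}

for text, answer in results.items():
    print(repr(text), answer)

# The built-in help shows the method's description without leaving Python:
# help(str.isnumeric)
```

Only "42" and the fraction character give True: the dot and the minus sign are not numeric characters, so `isnumeric()` is not a test for "can this be converted to a number".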