strings,readme
telatin committed May 29, 2024
1 parent b59e282 commit 3eccebc
Showing 2 changed files with 252 additions and 2 deletions.
19 changes: 17 additions & 2 deletions README.md
@@ -11,12 +11,23 @@ used for the *MMBDTP Masterclass* (May 2024).

## Environment

Most of the scripts here should work in this [conda environment](https://telatin.github.io/microbiome-bioinformatics/Install-Miniconda/):

```bash
# Create an environment with the required libraries
conda create -n pystart -y "python>=3.6" biopython pyfastx pandas seaborn matplotlib ipykernel
conda activate pystart
```

Note that I will write code compatible with Python 3.6, but you should consider using a recent version.
At the time of writing, 3.12 is the latest stable release.

In addition to Python, the environment will install:
* [Biopython](https://biopython.org/): a comprehensive set of bioinformatics functions and tools
* [pyfastx](https://pyfastx.readthedocs.io/en/latest/): a fast FASTQ/FASTA parser (note: Biopython also provides parsers, but we use a separate module to show how to deal with multiple dependencies)
* `pandas`, `seaborn` and `matplotlib`: used to show Python as a data analysis framework (an alternative to R)
* `ipykernel`: makes it possible to run the examples in a Python notebook
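
Once the environment is active, it can help to confirm that every package is visible to Python. The snippet below is a minimal sketch, not part of the course material: the `REQUIRED` mapping and the `check_environment()` helper are hypothetical names introduced here, and the only assumption is that Biopython is imported as `Bio` while the other packages are imported under their own names.

```python
import importlib.util

# Hypothetical mapping: conda package name -> name used in "import ...".
# Biopython is the only one whose import name differs from the package name.
REQUIRED = {
    "biopython": "Bio",
    "pyfastx": "pyfastx",
    "pandas": "pandas",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "ipykernel": "ipykernel",
}


def check_environment(modules=REQUIRED):
    """Return {package: True/False} without actually importing the packages."""
    return {
        package: importlib.util.find_spec(import_name) is not None
        for package, import_name in modules.items()
    }


for package, available in check_environment().items():
    status = "OK" if available else "MISSING"
    print(f"{package:12} {status}")
```

If a package is reported as missing, the most likely cause is that `conda activate pystart` was not run in the current shell.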

## Hello World!

It's common to approach a new programming language by writing some code that prints the text "Hello, World!".
@@ -56,4 +67,8 @@ It is a conversation starter, so to say, and it should be improved during the wo
There is an immense number of training resources for Python, so I will list a few that cover different media and learning styles:

* [YouTube video: Python in 30 minutes](https://youtu.be/kqtD5dpn9C8?si=JzurDYRFLrKs7x3Q): video, covers the basics with clarity
* [Think Python, 3rd edition](https://allendowney.github.io/ThinkPython/): online book

Inevitably, you will need to check the official documentation:
* [Official Documentation](https://docs.python.org/3/index.html)
  * For example, to see what [isnumeric()](https://docs.python.org/3.9/library/stdtypes.html?highlight=isnumeric#str.isnumeric) does
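
The documentation is worth reading here because `isnumeric()` has two close relatives, `isdecimal()` and `isdigit()`, and the three only disagree on less common characters. The sketch below illustrates the difference; the `classify()` helper and the sample strings are made up for this illustration and are not part of the course notebooks.

```python
def classify(text):
    """Return (isdecimal, isdigit, isnumeric) for a string."""
    return (text.isdecimal(), text.isdigit(), text.isnumeric())


# "\u00b2" is the superscript two, "\u00bd" is the vulgar fraction one half.
samples = ["122", "\u00b2", "\u00bd", "-1", "1.5"]

for text in samples:
    decimal, digit, numeric = classify(text)
    print(f"{text!r:8} decimal={decimal!s:6} digit={digit!s:6} numeric={numeric}")
```

Plain digits pass all three tests, the superscript passes `isdigit()` and `isnumeric()`, and the fraction passes only `isnumeric()`. Strings such as `"-1"` and `"1.5"` fail all three, because the sign and the decimal point are not numeric characters; to test whether a string can be read as a number, trying `int()` or `float()` inside a `try`/`except` is usually the more reliable route.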
235 changes: 235 additions & 0 deletions first-steps/04-strings.ipynb
@@ -0,0 +1,235 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Strings\n",
"\n",
"Some examples of string methods"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['gattaca', 'MMBDTP', 'charles darwin at home', '122']\n"
]
}
],
"source": [
"SomeStrings = [\"gattaca\", \"MMBDTP\", \"charles darwin at home\", \"122\"]\n",
"print(SomeStrings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use a **loop** to print each element of the list:"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"gattaca\n",
"MMBDTP\n",
"charles darwin at home\n",
"122\n"
]
}
],
"source": [
"for string in SomeStrings:\n",
"    print(string)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`len()` returns the length of the string:"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"gattaca 7\n",
"MMBDTP 6\n",
"charles darwin at home 22\n",
"122 3\n"
]
}
],
"source": [
"for string in SomeStrings:\n",
"    print(string, len(string))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can convert a string to **uppercase** or **lowercase**, or **capitalize** its first letter (this is useful, for example, when searching for motifs in a query sequence without falling into case-sensitivity traps):"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"gattaca -> GATTACA -> gattaca -> Gattaca\n",
"MMBDTP -> MMBDTP -> mmbdtp -> Mmbdtp\n",
"charles darwin at home -> CHARLES DARWIN AT HOME -> charles darwin at home -> Charles darwin at home\n",
"122 -> 122 -> 122 -> 122\n"
]
}
],
"source": [
"for string in SomeStrings:\n",
"    print(string, \" -> \", string.upper(),\" -> \", string.lower(),\" -> \", string.capitalize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
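"Is the string a number? `isdecimal()`, `isdigit()` and `isnumeric()` each check whether every character belongs to a progressively broader set of numeric characters:"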
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"gattaca \n",
"\t is decimal? False \n",
"\t is digit? False \n",
"\t is numeric? False\n",
"MMBDTP \n",
"\t is decimal? False \n",
"\t is digit? False \n",
"\t is numeric? False\n",
"charles darwin at home \n",
"\t is decimal? False \n",
"\t is digit? False \n",
"\t is numeric? False\n",
"122 \n",
"\t is decimal? True \n",
"\t is digit? True \n",
"\t is numeric? True\n"
]
}
],
"source": [
"for string in SomeStrings:\n",
"    print(string, \"\\n\\t is decimal?\", string.isdecimal(),\"\\n\\t is digit?\", string.isdigit(), \"\\n\\t is numeric?\", string.isnumeric())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Substrings?\n",
"\n",
"The `find()` method returns the (zero-based) position of the first occurrence of a substring, or -1 if the substring is not found:"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'gattaca': Found at position 1\n",
"'MMBDTP': 'at' not found\n",
"'charles darwin at home': Found at position 15\n",
"'122': 'at' not found\n"
]
}
],
"source": [
"for string in SomeStrings:\n",
"    # Search once and reuse the result; -1 means the substring is absent\n",
"    position = string.find(\"at\")\n",
"    if position == -1:\n",
"        print(f\"'{string}': 'at' not found\")\n",
"    else:\n",
"        print(f\"'{string}': Found at position {position}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also **replace** substrings with `replace()`, which returns a new string and leaves the original unchanged:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The first string is gattaca\n",
"Replace 'a' with 'X' in the first string: gXttXcX\n"
]
}
],
"source": [
"print(\"The first string is\", SomeStrings[0])\n",
"print(\"Replace 'a' with 'X' in the first string:\", SomeStrings[0].replace(\"a\", \"X\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pystart",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
