Proof sections 4-8
WillHannon-MCB committed Sep 20, 2024
1 parent 259258b commit 3dc3829
Showing 4 changed files with 61 additions and 27 deletions.
2 changes: 1 addition & 1 deletion sections/coding-best-practices/index.md
@@ -7,7 +7,7 @@ aside: true

# {{ $frontmatter.title }}

Hopefully, you're already aware of the importance of [properly organizing your projects](), [tracking your code](), and [organizing complex analyses into automated workflows](). In this final section, I'll highlight some important language-agnostic tips for improving your coding.
Hopefully, you're already aware of the importance of [properly organizing your projects](/sections/organizing-your-projects/), [tracking your code](/sections/tracking-your-code/), and [organizing complex analyses into automated workflows](/sections/creating-workflows-and-pipelines/). In this final section, I'll highlight some important language-agnostic tips for improving your coding.

## Following conventions

18 changes: 9 additions & 9 deletions sections/managing-software-environments/index.md
@@ -7,11 +7,11 @@ aside: true

# {{ $frontmatter.title }}

Biological programming typically involves a combination of writing your own code and using tools developed by others. This can lead to projects with lots of different dependencies, each of which has specific version requirements. It's crucial for reproducibility that these software _environments_ are tracked and managed. Otherwise, you'd be the only one who can run your analysis pipeline. Managing this by hand would be difficult. Thankfully, there are tools like conda for managing software environments.
Biological programming typically involves a combination of writing your own code and using tools developed by others. This can lead to projects with many dependencies, each of which has specific version requirements. It's crucial for reproducibility that these software _environments_ are tracked and managed. Otherwise, you'd be the only one who can run your analysis pipeline. Managing this by hand would be difficult. Thankfully, there are tools like Conda for managing software environments.

## What is Conda?

[Conda](https://docs.conda.io/en/latest/) is an open-source tool for managing both software packages and environments. It allows you to install software and their dependencies in isolated virtual environments. This means you can easily switch between projects, each with its own set of package versions, without worrying about conflicts between dependencies.
[Conda](https://docs.conda.io/en/latest/) is an open-source tool for managing both software packages and environments. It allows you to install software and their dependencies in isolated virtual environments. This means you can easily switch between projects, each with its own set of package versions, without worrying about conflicts between dependencies.

### With Conda, you can

@@ -23,9 +23,9 @@ We make extensive use of Conda in the Bloom Lab. However, there are alternative

### Conda vs. Mamba

[Mamba](https://mamba.readthedocs.io/en/latest/) is a re-implementation of Conda in C++ that offers the same functionality but is often faster, especially when resolving complex dependencies or handling large environments. Mamba resolves dependencies _much_ faster and often uses less memory. I tend to use Mamba for building environments and Conda for quick operations like activating and deactivating an environment. However, this is just a habit and Mamba's version of Conda's commands are basically identical.
[Mamba](https://mamba.readthedocs.io/en/latest/) is a re-implementation of Conda in C++ that offers the same functionality but is often faster, especially when resolving complex dependencies or handling large environments. I tend to use Mamba for building environments and Conda for quick operations like activating and deactivating an environment. However, this is just a habit, and Mamba's versions of Conda's commands are basically identical.

Which should you use? Conda works fine for small, simple environments. Mamba can save you time for larger or more complex environments. If you're setting up Conda from scratch [using the instructions below](), use Mamba. If you've already been using Conda with [miniforge] and you would need to install Mamba, stick with Conda.
Which should you use? Conda works fine for small, simple environments. Mamba can save you time for larger or more complex environments. If you're setting up Conda from scratch [using the instructions below](#installing-miniforge), use Mamba. If you've already been using Conda with an installation of [miniconda](https://docs.anaconda.com/miniconda/), and you would need to install Mamba, stick with Conda.

## Setting up Conda on `Rhino`

@@ -37,7 +37,7 @@ Conda isn't pre-installed on `Rhino`, so you'll have to set it up yourself. The
A more comprehensive version of the following instructions is available [here](https://github.com/conda-forge/miniforge?tab=readme-ov-file#install).
:::

We're going to install `Mabma` and `Conda` on `Rhino` by downloading `miniforge` into our home directory (the default directory when you log into `Rhino`, i.e. `/home/username`).
We're going to install Mamba and Conda on `Rhino` by downloading `miniforge` into our home directory (the default directory when you log into `Rhino`, i.e. `/home/username`).

Run the following command in home (`/home/username`) to fetch and run the latest `miniforge` build script from the web.

@@ -80,15 +80,15 @@ After creating the environment, you can activate it using:
conda activate myenv
```

This switches your shell into the isolated environment, allowing you to install packages specific to that project. You'll see the prompt change from something like `(base) username@rhino02:~$ ` to `(myenv) username@rhino02:~$ ` to reflect that `myenv` is active. To deactivate and return to your `base` environment:
This switches your terminal into the isolated environment, allowing you to install packages specific to that project. You'll see the prompt change from something like `(base) username@rhino02:~$ ` to `(myenv) username@rhino02:~$ ` to reflect that `myenv` is active. To deactivate and return to your `base` environment:

```bash
conda deactivate
```

### Making an environment from a file

If someone has already created an environment and shared it with you, or if you want to share yours with others, you can easily set up an environment from an environment file (usually `environment.yml`). This is very common practice in the Bloom lab since we're often running [analysis pipelines]() with similar dependencies.
If someone has already created an environment and shared it with you, or if you want to share yours with others, you can easily set up an environment from an environment file (usually `environment.yml`). This is very common practice in the Bloom lab since we're often running [analysis pipelines](/sections/creating-workflows-and-pipelines/) with similar dependencies.

To create an environment from a file, use:

@@ -118,7 +118,7 @@ It specifies the name of the environment, like places (channels) to search for e
## Common issues
While Conda is a helpful tool, you're bound to run into issues every once and awhile. Below are some common issues you might encounter and how to troubleshoot them.
While Conda is a helpful tool, you're bound to run into issues occasionally. Below are some common issues you might encounter and how to troubleshoot them.
### My environment is taking a **very** long time to solve
@@ -154,4 +154,4 @@ Channels are repositories that Conda searches for package versions. Conda search
conda config --set channel_priority strict
```

This means that Conda will always search through the channels you specified in the order they're specified. If that doesn't fix the issue, change the order of the channels until the environment can solve. If that doesn't work, check that the dependencies actually exist in the channels you're searching.
This means that Conda will always search through channels in the order they're specified. If that doesn't fix the issue, change the order of the channels until the environment can solve. If that doesn't work, check that the dependencies actually exist in the channels you're searching.
29 changes: 18 additions & 11 deletions sections/tracking-your-code/index.md
@@ -14,7 +14,7 @@ I use version control for nearly all of the programming I do. It's a crucial ski
By preserving a history of incremental changes over time, you can go back and see what you've done, find out when and why something stopped working, make multiple changes to your project in parallel, and more. In this section, I'll introduce Git, a common tool for version control developed by Linus Torvalds.

:::tip
This section is only a quick introduction to Git. [This free book](https://git-scm.com/book/en/v2) is the definitive resource for people learning Git. I highly recommend skimming chapters 1-3.
[This free book](https://git-scm.com/book/en/v2) is the definitive resource for learning Git. I highly recommend skimming chapters 1-3.
:::

## How does git work?
@@ -29,15 +29,18 @@ Here's a high-level overview of how `git` tracks your files:

3. **Committing Changes**: After staging, you create a commit with `git commit`. This action records a snapshot of the staged changes and adds it to your project's history. Each commit is uniquely identified, allowing you to track and revert to specific versions.

4. **Local and Remote Repositories**: Git stores your project's history locally, but, _optionally_, you can also link your repository to a [remote server like GitHub or Bitbucket]() which enables collaboration, backup, and access from different machines. I'll talk more about this in the next section.
4. **Local and Remote Repositories**: Git stores your project's history locally, but, _optionally_, you can also link your repository to a remote server like GitHub or Bitbucket, which enables collaboration, backup, and access from different machines. I'll talk more about this in the [next section](/sections/working-collaboratively/).

![Working tree](https://git-scm.com/book/en/v2/images/areas.png)
> *Working tree, staging area, and Git directory taken from [Pro Git](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F)*
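
The overview above can be sketched end to end in a throwaway directory. This is only an illustration, not part of the original guide: the temporary directory, the file name `counts.csv`, and the user name and email are all hypothetical placeholders.

```bash
set -e

# Work in a scratch directory so nothing real is touched (hypothetical location).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example User"        # placeholder identity for this repo only
git config user.email "user@example.com"

# Working tree: create a new, untracked file.
echo "sample,count" > counts.csv
git status --short                         # counts.csv is listed as untracked

# Staging area: choose what goes into the next snapshot.
git add counts.csv
git status --short                         # counts.csv is now listed as added

# Commit: record the staged snapshot in the project's history.
git commit -q -m "Add empty counts table"
git log --oneline                          # one commit in the history
```

After the commit, `git status` reports a clean working tree, because the working tree, the staging area, and the latest snapshot all agree.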
## Basic git

To get started with Git, you'll need to familiarize yourself with a couple key commands:

**Initialize a Repository**:

You can either create a new Git repository by making a directory and telling git to start tracking it.
You create a new Git repository by telling Git to start tracking the code in a directory.

```bash
mkdir my-project
@@ -53,12 +56,12 @@ cd my_project
```

:::tip
A quick note about cloning a remote repository. There are several methods for doing this, some of which require a little set up, and those will be covered in the [next section]().
A quick note about cloning a remote repository. There are several methods for doing this, some of which require a little setup, and those will be covered in the [next section](/sections/working-collaboratively/).
:::

**Check Repository Status**:

After you've made some changes to your directory, you can check to see what these are using the following command.
After you've made some changes in your directory, you can check what these are using the following command.

```bash
git status
@@ -80,7 +83,7 @@ Stages specific files for the next commit. Use `git add .` to stage all changes.
git commit -m "Your commit message"
```

Records a snapshot of the staged changes. The commit message should briefly describe what you've done. It's important to write a good commit message.
Records a snapshot of the staged changes. The commit message should briefly describe what you've done. [It's important to write a good commit message](https://cbea.ms/git-commit/).

**View Commit History**:

@@ -126,11 +129,11 @@ Branches are a powerful feature in Git that allow you to diverge from the main c

Branching your code with Git allows for parallel development of multiple features or fixes simultaneously without interference. While you're editing the code on a branch, the `main` branch remains stable, and new code is only merged after it's ready. Here are some things you might use branches for:

1. Feature Development: When adding new features, you can create a branch to isolate your work. This way, the main codebase remains unaffected until the feature is complete and tested.
1. **Feature Development**: When adding new features, you can create a branch to isolate your work. This way, the main codebase remains unaffected until the feature is complete and tested.

2. Bug Fixes: For fixing bugs, especially in a production environment, branches let you address issues without disrupting ongoing development work.
2. **Bug Fixes**: For fixing bugs, especially in a production environment, branches let you address issues without disrupting ongoing development work.

3. Experimentation: If you're trying out new ideas or approaches, branches provide a safe space to experiment without the risk of breaking existing code.
3. **Experimentation**: If you're trying out new ideas or approaches, branches provide a safe space to experiment without the risk of breaking existing code.

**See all active branches**:

@@ -150,9 +153,9 @@ If you want to switch (or checkout) a different branch, run:
git checkout [branch-name]
```

This witches your working directory to the specified branch.
This switches your working directory to the specified branch.

**Create and Switch to a New Branch**:
**Create _and_ Switch to a New Branch**:

To make a new branch, run:

@@ -177,3 +180,7 @@ git merge [source-branch]
```

This merges the changes in the [source-branch] into [target-branch].

:::tip Note
While this approach works well for merging branches locally, in collaborative projects, you'll typically merge branches using pull requests. I'll cover this in more detail in the [next section](/sections/working-collaboratively/).
:::
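
To see the branch commands working together, here is a minimal local sketch. The branch name `new-feature`, the file names, and the identity settings are hypothetical and chosen only for illustration.

```bash
set -e

# Scratch repository with one commit on main (all names are placeholders).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Create and switch to a new branch, then commit some work there.
git checkout -q -b new-feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# Switch back to the target branch and merge the source branch into it.
git checkout -q main
git merge -q new-feature

git branch            # lists both branches; the current one is marked with *
```

Because `main` had no new commits of its own in this sketch, Git can simply move `main` forward to the tip of `new-feature` (a fast-forward merge) rather than creating a separate merge commit.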
39 changes: 33 additions & 6 deletions sections/working-collaboratively/index.md
@@ -7,7 +7,7 @@ aside: true

# {{ $frontmatter.title }}

In [the last section](), I talked about tracking your code with Git. In principle, this can all happen locally in your repository, but in many cases, you want to make a _remote_ copy of a repository and its history. That's where services like [GitHub](https://github.com/) come in. GitHub is a website that acts as a remote repository for your code, allowing you to store it online, collaborate with others, and take advantage of various tools for version control and project management.
In [the last section](/sections/tracking-your-code/), I talked about tracking your code with Git. In principle, this can all happen locally in your repository, but in many cases, you want to make a _remote_ copy of a repository and its history. That's where services like [GitHub](https://github.com/) come in. GitHub is a website that acts as a remote repository for your code, allowing you to store it online, collaborate with others, and take advantage of various tools for version control and project management.

## Using GitHub as a 'Remote'

@@ -23,7 +23,7 @@ A _remote_ in Git is simply a version of your project that's hosted on the inter
- **Version Control**: Keeps track of every change made to the codebase, allowing you to revert to previous versions if needed.
- **Documentation Hosting**: With [GitHub Pages](https://pages.github.com/), you can host project documentation or even entire websites directly from your repository.

In our lab, we use GitHub to coordinate coding efforts, especially for projects that will be part of publications. By connecting your local git repository to GitHub, you make it easier to share your work and collaborate effectively.
In our lab, we use GitHub to coordinate coding efforts, especially for projects that will be part of publications. By connecting your local Git repository to GitHub, you make it easier to share your work and collaborate effectively.

### Bloom Lab GitHub Organizations

@@ -40,19 +40,46 @@ But how do you decide when to host your local project remotely? Generally, I hos

### Connecting a _local_ repository to a _remote_

As I mentioned in [the previous section](), Git is the version control system you use locally on your machine, while GitHub is the remote hosting service where you can store your repositories online. To connect a local repository to a GitHub remote you:
As I mentioned in [the previous section](/sections/tracking-your-code/), Git is the version control system you use locally on your machine, while GitHub is the remote hosting service where you can store your repositories online. To connect a local repository to a GitHub remote you:

1. **Initialize a Local Repository**: You can start from scratch by creating a local Git repository with `git init`. However, you can also connect an existing repository to a GitHub remote repository at any time.
1. **Initialize a Local Repository**: You can start from scratch by creating a local Git repository with `git init`. However, **you can also connect an existing repository to a GitHub remote repository at any time**.

2. **Create a Remote Repository**: On GitHub, you create a new repository to host your project in the appropriate organization or account.

3. **Link Local and Remote Repositories**: When you create a new GitHub repository, you'll get instructions for connecting your local repository to the GitHub repository.

4. **Push and Pull Changes**: Now, you can use `git push` to upload your local commits to GitHub and `git pull` to update your local repository with changes from GitHub.
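
The four steps above can be rehearsed without a GitHub account by letting a local bare repository stand in for the remote. This is a sketch under that assumption: on GitHub, step 2 happens on the website, and the URL passed to `git remote add` would be the one GitHub shows you. The paths, file name, and identity below are hypothetical.

```bash
set -e

work="$(mktemp -d)"

# Step 2 (stand-in): a bare repository plays the role of the GitHub remote.
git init -q --bare "$work/remote.git"

# Step 1: a local repository with one commit.
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "# my-project" > README.md
git add README.md
git commit -q -m "Initial commit"

# Step 3: link the local repository to the remote under the name "origin".
git remote add origin "$work/remote.git"

# Step 4: push local commits, setting "origin/main" as the upstream branch.
git push -q -u origin main
git pull -q           # nothing new to fetch yet, but the link works both ways
git remote -v         # shows the fetch and push URLs for "origin"
```

The `-u` flag is only needed on the first push. Once the upstream is set, plain `git push` and `git pull` know which remote branch to use.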

### Cloning a _remote_ repository _locally_

You can also clone an existing repository on GitHub onto your local machine. The local clone will be automatically connected to the remote repository it was cloned from, allowing you to pull updates or push changes.

To clone a repository, you can use the following command in your terminal, replacing [repository-url] with the URL of the repository you want to clone:

```bash
git clone [repository-url]
```

There are two main protocols for cloning repositories: HTTPS and SSH.

- **HTTPS**: No extra configuration, but you’ll need to enter your username and a personal access token when you push changes, unless you set up a credential helper to remember them.
- **SSH**: A more secure and convenient method, which allows you to authenticate using SSH keys (no passwords!).

:::tip
Only fools use HTTPS.
:::

#### Setting up SSH for GitHub

Using SSH is preferred because it’s more secure and doesn’t require you to input credentials each time you push or pull changes. Once you’ve set up your SSH keys, GitHub will trust your machine, allowing you to interact with your repositories seamlessly.

Setting up SSH for GitHub is similar to [setting up SSH for `Rhino`](/sections/using-remote-resources/). The main difference is that you'll add your public key to your account on GitHub. You should set this up for any computer you use GitHub with.

Here are detailed and up-to-date instructions for [connecting to GitHub with SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh).

## Collaborative Workflows

How do you work on code with others productively? It requires some structure to prevent conflicts and ensure code quality. Note, the following workflow mainly applies to projects where you're actively coding with other people. Small projects or projects in their early stages don't require this level of organization.
How do you work on code with others productively? It requires some structure to prevent conflicts and ensure code quality. Note that the following workflow mainly applies to projects where you're actively coding with other people. Small projects or projects in their early stages don't require this level of organization.

### Best Practices

@@ -85,7 +112,7 @@ When multiple collaborators edit the same parts of a file, Git may not be able t
3. **Edit the File**: Remove the conflict markers and make the necessary edits.
4. **Commit the Resolution**: After resolving the conflicts, commit the changes to complete the merge.
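
If you'd like to practice before hitting a real conflict, the sketch below deliberately creates one and then resolves it. Everything here is hypothetical: the file `config.txt`, the branch name `other-branch`, and the choice to keep the value from `main` are just for illustration.

```bash
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "threshold = 1" > config.txt
git add config.txt
git commit -q -m "Add config"

# Two branches change the same line in different ways.
git checkout -q -b other-branch
echo "threshold = 2" > config.txt
git commit -q -am "Set threshold to 2"
git checkout -q main
echo "threshold = 3" > config.txt
git commit -q -am "Set threshold to 3"

# The merge stops because Git can't choose between the two edits.
if ! git merge other-branch; then
    echo "Merge paused: conflicts need to be resolved by hand"
fi
git status --short    # config.txt is listed as unmerged
cat config.txt        # both versions appear between <<<<<<<, =======, and >>>>>>> markers

# Edit the file: here we overwrite it with the version we want to keep.
echo "threshold = 3" > config.txt

# Commit the resolution to complete the merge.
git add config.txt
git commit -q -m "Merge other-branch, keeping threshold = 3"
```

In practice you would open the file in an editor and decide line by line what to keep, rather than overwriting the whole file as this sketch does.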

Generally, you're unlikely to run into merge conflicts while coding in the Bloom Lab.
Generally, you're unlikely to run into many merge conflicts while coding in the Bloom Lab.

### Issues and Discussions
