title	author	date
Introduction to ChIP-Seq and directory setup	Mary Piper, Radhika Khetani, Meeta Mistry	June 28, 2017

Approximate time: 30 minutes

Learning Objectives

Understand the experimental setup and design for ChIP-Seq experiments

Introduction to ChIP-Seq

Chromatin immunoprecipitation (ChIP) experiments isolate the chromatin from a cell and immunoprecipitate (IP) DNA fragments bound to a protein of interest. In ChIP-Seq, the DNA fragments are sequenced, enriched regions of DNA or peaks are determined, and over-represented sequence motifs and functional annotations can be identified.

During this session we will be performing a complete workflow for ChIP-Seq analysis, starting with experimental design and generation of the raw sequencing reads and ending with functional enrichment analyses and motif discovery.

Experimental design and library preparation

Several steps are involved in the library preparation of protein-bound DNA fragments for sequencing:

After the chromatin is isolated from the cell, proteins are cross-linked to the DNA
The DNA is sheared into fragments (sonication)
A protein-specific antibody is used to immunoprecipitate the protein-bound DNA fragments
The cross-links are reversed and the DNA is purified
DNA fragments are size selected and amplified using PCR

Within the DNA fragments enriched for the regions binding to a protein of interest, only a fraction correspond to actual signal. The proportion of DNA fragments containing the actual binding site of the protein depends on the number of active binding sites, the number of starting genomes, and the efficiency of the IP.

In addition, when performing ChIP-Seq, some sequences may appear enriched due to the following:

Open chromatin regions are fragmented more easily than closed regions
Repetitive sequences might seem to be enriched (copy number inaccuracies in genome assembly)
Uneven distribution of sequence reads across the genome

Therefore, proper controls are essential. A ChIP-Seq peak should be compared with the same region of the genome in a matched control.

The same starting material should be divided to be used for both the protein-specific IP and the control. The control sample can be generated by one of the following recommended techniques:

No IP (input DNA)
No antibody ("mock IP")
Non-specific antibody (IgG "mock IP")

Introduction to example data

Our goal for this session is to compare the binding profiles of Nanog and Pou5f1 (Oct4). The ChIP was performed on cells from the H1 human embryonic stem cell line (h1-ESC), and the libraries were sequenced using Illumina. The datasets were obtained from the HAIB TFBS ENCODE collection. These two transcription factors are involved in stem cell pluripotency, and one of the goals is to understand their roles, individually and together, in transcriptional regulation.

Two replicates were collected and each was divided into 3 aliquots for the following:

Nanog IP
Pou5f1 IP
Control input DNA

For these 6 samples, we will be using reads from only a 32.8 Mb region of chromosome 12 (chr12:1,000,000-33,800,000), so that we can get through the workflow in class.
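
The 32.8 Mb figure follows directly from the coordinates. As a small worked check (the variable names here are illustrative only, not part of the course materials):

```shell
# Coordinates taken from the region quoted above (chr12:1,000,000-33,800,000)
start=1000000
end=33800000

# Interval length in bases, then converted to megabases with one decimal place
length_bp=$(( end - start ))
length_mb=$(awk -v bp="$length_bp" 'BEGIN { printf "%.1f", bp / 1e6 }')

echo "Region length: ${length_bp} bp (${length_mb} Mb)"
```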

Below is the workflow that we will be using today. As with RNA-Seq, each step in the workflow requires the data to be in a specific standardized format.

Set-up

Before we get started with the analysis, we need to set up our directory structure.

Log in to Orchestra and start an interactive session with two cores:

$ bsub -Is -n 2 -q interactive bash

Change directories to the ngs_course directory:

$ cd ~/ngs_course

Create a chipseq directory and change directories into it:

$ mkdir chipseq

$ cd chipseq

Now let's set up the directory structure. We are looking for the following structure within the chipseq directory:

chipseq/
├── logs/
├── meta/
├── raw_data/
├── reference_data/
├── results/
│   ├── bowtie2/
│   ├── trimmed/
│   ├── trimmed_fastqc/
│   └── untrimmed_fastqc/
└── scripts/

$ mkdir -p raw_data reference_data scripts logs meta

$ mkdir -p results/untrimmed_fastqc results/trimmed results/trimmed_fastqc results/bowtie2
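
As an optional sanity check on the layout above, the sketch below loops over the expected directory names and reports any that are missing. The function name check_chipseq_dirs is a hypothetical helper, not part of the course materials, and the final line assumes you are inside the chipseq directory.

```shell
# Hypothetical helper: list expected directories that are missing under a base directory.
# The body runs in a subshell (parentheses), so its variables do not leak into your session.
check_chipseq_dirs() (
    base="${1:-.}"
    missing=0
    for d in logs meta raw_data reference_data scripts \
             results/bowtie2 results/trimmed results/trimmed_fastqc results/untrimmed_fastqc; do
        if [ ! -d "$base/$d" ]; then
            echo "missing: $d"
            missing=$(( missing + 1 ))
        fi
    done
    echo "$missing expected directories missing"
    # Succeed only when nothing is missing
    [ "$missing" -eq 0 ]
)

# Run from inside the chipseq directory
check_chipseq_dirs . || echo "Re-run the mkdir -p commands above"
```

Because mkdir -p does not complain about directories that already exist, re-running the two mkdir commands is safe if anything is reported missing.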

Now that we have the directory structure created, let's copy over the data to perform our quality control and alignment, including our FASTQ files and reference data files:

$ cp /groups/hbctraining/ngs-data-analysis-longcourse/chipseq/raw_fastq/*fastq raw_data/

$ cp /groups/hbctraining/ngs-data-analysis-longcourse/chipseq/reference_data/chr12* reference_data/
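
To confirm the copy worked, you could count the FASTQ files that landed in raw_data. This is a minimal sketch: count_fastq is a hypothetical helper, and the expectation of six files assumes one FASTQ file per sample, which the sample description suggests but does not state outright.

```shell
# Hypothetical helper: print how many entries directly inside directory $1 end in "fastq"
count_fastq() {
    n=0
    for f in "$1"/*fastq; do
        # If nothing matches, the pattern stays literal, so test that the entry exists
        if [ -e "$f" ]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# Run from inside the chipseq directory
echo "FASTQ files in raw_data/: $(count_fastq raw_data)"
```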

You should have bcbio in your path, but please check that it is:

$ echo $PATH

If /opt/bcbio/centos/bin is not part of $PATH, add the following line to your ~/.bashrc file and then run source ~/.bashrc:

export PATH=/opt/bcbio/centos/bin:$PATH
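
Reading a long $PATH by eye is error-prone, so the check can also be scripted. The sketch below uses a hypothetical helper, path_contains, which is not part of bcbio or the course materials. It wraps the path string in colons so that only a whole entry matches, not a longer directory that merely starts with the same text.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in a PATH-style string.
# $2 is optional and defaults to the current $PATH.
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains /opt/bcbio/centos/bin; then
    echo "bcbio is on PATH"
else
    echo "bcbio is not on PATH; add the export line above to ~/.bashrc"
fi
```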

This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
