Assemble library candidates from results #6

Open
lzamparo opened this issue Dec 7, 2017 · 2 comments
Comments

@lzamparo
Owner

lzamparo commented Dec 7, 2017

We have 324561 exons split across 94838 transcripts.
We have 242027 unique exons to cover.
We have 1534276 guides that are specific enough for our purposes that cover those exons.
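
The gap between exon records (324561) and unique exons (242027) comes from exons shared between transcripts. The sketch below shows one way to collapse them. It assumes, hypothetically, that an exon is identified by its (chrom, start, end, strand) coordinates; `ExonRecord` and `unique_exons` are illustrative names, not part of the project.

```python
from collections.abc import Iterable
from typing import NamedTuple


class ExonRecord(NamedTuple):
    """One exon as listed under one transcript (hypothetical layout)."""
    transcript_id: str
    chrom: str
    start: int
    end: int
    strand: str


def unique_exons(records: Iterable[ExonRecord]) -> set[tuple[str, int, int, str]]:
    """Collapse exons shared between transcripts by keying on their coordinates."""
    return {(r.chrom, r.start, r.end, r.strand) for r in records}


if __name__ == "__main__":
    records = [
        ExonRecord("tx1", "chr1", 100, 200, "+"),
        ExonRecord("tx1", "chr1", 300, 400, "+"),
        ExonRecord("tx2", "chr1", 100, 200, "+"),  # same exon as tx1's first
    ]
    print(len(records), "exon records,", len(unique_exons(records)), "unique exons")
    # prints: 3 exon records, 2 unique exons
```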

Now we have to choose a guide per exon to build a library of feasible size. There are far more guides than we need, yet even if we choose only one guide per exon, there are still too many exons to cover.

I will discuss this with Turgut tomorrow, but for now I'll check how many exons we need to cover if we consider only the first two exons of each transcript (Tx).
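
Counting the exons that remain when only the first two exons of each transcript are kept can be sketched as follows. It assumes, hypothetically, that each entry is `(transcript_id, exon_number, exon_key)`, where `exon_number` follows transcript order and `exon_key` identifies an exon independently of the transcript, so exons shared between transcripts are counted once. The function name `first_n_exons` is illustrative.

```python
from collections import defaultdict
from collections.abc import Iterable

# Hypothetical record layout: (transcript_id, exon_number, exon_key).
# exon_number orders exons within a transcript; exon_key identifies an exon
# independently of the transcript (e.g. its genomic coordinates).
ExonEntry = tuple[str, int, str]


def first_n_exons(entries: Iterable[ExonEntry], n: int = 2) -> set[str]:
    """Unique exons that appear among the first n exons of any transcript."""
    by_tx: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for tx_id, exon_number, exon_key in entries:
        by_tx[tx_id].append((exon_number, exon_key))
    selected: set[str] = set()
    for exons in by_tx.values():
        exons.sort()  # ascending exon_number, i.e. transcript order
        selected.update(key for _, key in exons[:n])
    return selected


if __name__ == "__main__":
    entries = [
        ("tx1", 1, "A"), ("tx1", 2, "B"), ("tx1", 3, "C"),
        ("tx2", 1, "A"), ("tx2", 2, "D"),
    ]
    print(sorted(first_n_exons(entries)))  # ['A', 'B', 'D']
```

Sorting within each transcript, rather than filtering on `exon_number <= n`, keeps the sketch correct even if exon numbering does not start at 1.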

@lzamparo
Owner Author

lzamparo commented Dec 11, 2017

We have decided not to cover the whole transcriptome, since doing so does not seem feasible with one person (there are too many exons to cover). Instead, we are going to cover the Brie exome from the Doench et al. library, but with Guide-Scan-chosen gRNAs.

This means that of the 68318 exons, we can target 54641 using guides with no perfect alignments in either hg19 or mm10, allowing up to two mismatches. When allowing up to three mismatches, there are not enough guides to hit every exon, but we can instead choose, for each exon, the guide with the minimal sum of hits in mm10 and hg19 at up to three mismatches. Below is the distribution of the summed number of genomic hits (hg19 + mm10) when allowing up to three mismatches:
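[Figure: guide_specificity, distribution of the summed number of hg19 and mm10 genomic hits, allowing up to three mismatches]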
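
The per-exon choice described above (keep the guide with the smallest summed hit count in hg19 and mm10 at up to three mismatches) can be sketched as a single pass over the candidates. The `Guide` record and its field names are hypothetical stand-ins for whatever the alignment results actually contain; ties keep the first guide seen.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    """Hypothetical candidate gRNA with precomputed genomic hit counts."""
    guide_id: str
    exon_id: str
    hg19_hits_3mm: int  # genomic hits in hg19 allowing up to 3 mismatches
    mm10_hits_3mm: int  # genomic hits in mm10 allowing up to 3 mismatches

    @property
    def total_hits(self) -> int:
        return self.hg19_hits_3mm + self.mm10_hits_3mm


def best_guide_per_exon(guides: Iterable[Guide]) -> dict[str, Guide]:
    """For each exon, keep the guide with the smallest summed hg19 + mm10 hits."""
    best: dict[str, Guide] = {}
    for g in guides:
        current = best.get(g.exon_id)
        # Strict "<" means the first guide seen wins ties.
        if current is None or g.total_hits < current.total_hits:
            best[g.exon_id] = g
    return best


if __name__ == "__main__":
    guides = [
        Guide("GUIDE_A", "exon1", hg19_hits_3mm=2, mm10_hits_3mm=5),
        Guide("GUIDE_B", "exon1", hg19_hits_3mm=0, mm10_hits_3mm=3),
        Guide("GUIDE_C", "exon2", hg19_hits_3mm=1, mm10_hits_3mm=1),
    ]
    for exon, g in sorted(best_guide_per_exon(guides).items()):
        print(exon, g.guide_id, g.total_hits)
    # exon1 GUIDE_B 3
    # exon2 GUIDE_C 2
```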

So we need to decide whether to use up our whole budget for the number of guides or to choose only these guides, and we also need to decide on the number of controls.

@lzamparo
Owner Author

We would like to have every gene covered by four guides, by convention. To arrive at this total, we agreed to allow the following relaxations:

  1. Allow multiple high-specificity guides per exon
  2. Find guides for all exons (not just the first four) in genes not yet targeted by any guides
  3. Relax the constraints on specificity in human
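
One way to apply these relaxations in order is a tiered greedy fill that stops adding guides to a gene once it has four. This is a sketch under stated assumptions, not the project's procedure: the `Candidate` fields, the threshold values (`mouse_max`, `human_strict`, `human_relaxed`) and the rule that relaxation 3 reuses the exons made eligible by the earlier steps are all hypothetical.

```python
from collections import defaultdict
from collections.abc import Sequence
from dataclasses import dataclass

TARGET_PER_GENE = 4  # four guides per gene, by convention


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate guide with precomputed off-target hit counts."""
    guide_id: str
    gene: str
    exon_rank: int   # 1-based position of the targeted exon within its gene
    human_hits: int  # off-target hits in hg19
    mouse_hits: int  # off-target hits in mm10


def select_library(
    cands: Sequence[Candidate],
    mouse_max: int = 0,      # hypothetical mouse specificity threshold
    human_strict: int = 0,   # hypothetical strict human threshold
    human_relaxed: int = 2,  # hypothetical relaxed human threshold
) -> dict[str, list[Candidate]]:
    chosen: dict[str, list[Candidate]] = defaultdict(list)
    used: set[str] = set()

    def passes(c: Candidate, human_max: int) -> bool:
        return c.mouse_hits <= mouse_max and c.human_hits <= human_max

    def fill(pool: list[Candidate], one_per_exon: bool) -> None:
        # Visit the most specific guides first; a gene stops at four guides.
        for c in sorted(pool, key=lambda x: (x.human_hits + x.mouse_hits, x.guide_id)):
            picks = chosen[c.gene]
            if c.guide_id in used or len(picks) >= TARGET_PER_GENE:
                continue
            if one_per_exon and any(p.exon_rank == c.exon_rank for p in picks):
                continue
            picks.append(c)
            used.add(c.guide_id)

    first_four = [c for c in cands if c.exon_rank <= 4]
    strict_first_four = [c for c in first_four if passes(c, human_strict)]

    # Baseline: one high-specificity guide per exon, first four exons only.
    fill(strict_first_four, one_per_exon=True)
    # Relaxation 1: allow multiple high-specificity guides per exon.
    fill(strict_first_four, one_per_exon=False)
    # Relaxation 2: genes with no guides yet may use any of their exons.
    untargeted = {c.gene for c in cands} - {g for g, p in chosen.items() if p}
    fill([c for c in cands if c.gene in untargeted and passes(c, human_strict)],
         one_per_exon=False)
    # Relaxation 3: loosen the human threshold on the first four exons of every
    # gene, plus all exons of genes that were untargeted before relaxation 2.
    eligible = [c for c in cands if c.exon_rank <= 4 or c.gene in untargeted]
    fill([c for c in eligible if passes(c, human_relaxed)], one_per_exon=False)

    return {g: p for g, p in chosen.items() if p}
```

Because each pass only appends to genes that are still short of four guides, a later relaxation never displaces a guide chosen under a stricter rule.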
