update

brouwern · Sep 18, 2021 · a4313d0
1 parent 642c457

Showing 75 changed files with 6,539 additions and 9,137 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/...ew/00g-review-nucleic_acids-LibreText.Rmd b/...ew/00g-review-nucleic_acids-LibreText.Rmd
diff --git a/...biology_review/00h-proteins-LibreText.rmd b/...biology_review/00h-proteins-LibreText.rmd
diff --git a/003-NCBI/01-NCBI_overview.Rmd b/004-NCBI/01-NCBI_overview.Rmd
diff --git a/003-NCBI/02-NCBI_genebank_fasta.Rmd b/004-NCBI/02-NCBI_genebank_fasta.Rmd
@@ -101,9 +101,9 @@ To view the GenBank entry for the DEN-1 Dengue virus, follow these steps:
 The GenBank entry for an accession contains a LOT of information about the sequence, such as papers describing it, features in the sequence, etc. The **DEFINITION** field gives a short description for the sequence. The **ORGANISM** field in the NCBI entry identifies the species that the sequence came from. The **REFERENCE** field contains scientific publications describing the sequence. The **FEATURES** field contains information about the location of features of interest inside the sequence, such as regulatory sequences or genes that lie inside the sequence. The **ORIGIN** field gives the sequence itself.
 
 
-# ```{r, echo = F, eval = F}
-# knitr::include_graphics(here::here("images/NCBI_accesssion_NC_001477_genbank.png"))
-# ```
+<!-- # ```{r, echo = F, eval = F} -->
+<!-- # knitr::include_graphics(here::here("images/NCBI_accesssion_NC_001477_genbank.png")) -->
+<!-- # ``` -->
 
 
 

diff --git a/003-NCBI/03-NCBI_seqdata_by_GUI1.Rmd b/004-NCBI/03-NCBI_seqdata_by_GUI1.Rmd
diff --git a/003-NCBI/04-uniprot_by_GUI-AC07-01.Rmd b/004-NCBI/04-uniprot_by_GUI-AC07-01.Rmd
diff --git a/...ing_seq_data_as_FASTA/11a-FASTA_files.Rmd b/004-NCBI/05-FASTA_files.Rmd
@@ -1,36 +1,29 @@
----
-output: html_document
-editor_options: 
-  chunk_output_type: console
----
-# Introducing FASTA Files {#introducing-FASTA}
+# Introducing FASTA Files {#introducingFASTA}
 
 <!-- TODO: Add images / examples -->
 
 Adapted from [Wikipedia](https://en.wikipedia.org/wiki/FASTA_format): https://en.wikipedia.org/wiki/FASTA_format
 
 <!-- begin wikipedia -->
-"In bioinformatics, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format allows for sequence names and comments to precede the sequences. The format originates from the FASTA alignment software, but has now become a near universal standard in the field of bioinformatics.
+In bioinformatics, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format allows for sequence names and comments to precede the sequences. The format originates from the FASTA alignment software, but has now become a near universal standard in the field of bioinformatics.
 
-"The simplicity of FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages like the R programming language and Python.
+The simplicity of FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages like the R programming language and Python.
 
-"The first line in a FASTA file starts with a ">" (greater-than) symbol and holds summary information about the sequence, often starting with a unique accession number and followed by information like the name of the gene, the type of sequence, and the organism it is from.
+The first line in a FASTA file starts with a ">" (greater-than) symbol and holds summary information about the sequence, often starting with a unique accession number and followed by information like the name of the gene, the type of sequence, and the organism it is from.
 
-"On the  next is the  sequence itself in a standard one-letter character string. Anything other than a valid character is be ignored (including spaces, tabs, asterisks, etc...).
+On the  next is the  sequence itself in a standard one-letter character string. Anything other than a valid character is be ignored (including spaces, tabs, asterisks, etc...).
 
-"A multiple sequence FASTA format can be obtained by concatenating several single sequence FASTA files in a common file (also known as multi-FASTA format). 
+A multiple sequence FASTA format can be obtained by concatenating several single sequence FASTA files in a common file (also known as multi-FASTA format). 
 
-"Following the header line, the actual sequence is represented. Sequences may be protein sequences or nucleic acid sequences, and they can contain gaps or alignment characters. Sequences are expected to be represented in the standard amino acid and nucleic acid codes.  Lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap character; and in amino acid sequences, U and * are acceptable letters.
+Following the header line, the actual sequence is represented. Sequences may be protein sequences or nucleic acid sequences, and they can contain gaps or alignment characters. Sequences are expected to be represented in the standard amino acid and nucleic acid codes.  Lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap character; and in amino acid sequences, U and * are acceptable letters.
 
-"FASTQ format is a form of FASTA format extended to indicate information related to sequencing. It is created by the Sanger Centre in Cambridge.
+FASTQ format is a form of FASTA format extended to indicate information related to sequencing. It is created by the Sanger Centre in Cambridge.
 
-"Bioconductor.org's Biostrings package can be used to read and manipulate FASTA files in R
+Bioconductor.org's Biostrings package can be used to read and manipulate FASTA files in R
 
 <!-- end wikipedia -->
 
-from  https://zhanglab.dcmb.med.umich.edu/FASTA/
-
-"FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which base pairs or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length."
+>"FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which base pairs or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length." (https://zhanglab.dcmb.med.umich.edu/FASTA/)
 
 ## Example FASTA file
 
@@ -102,18 +95,18 @@ QA~~~~~~~~~~~~~~~~~~~")
 
 Adapted from [Wikipedia](https://en.wikipedia.org/wiki/FASTQ_format): https://en.wikipedia.org/wiki/FASTQ_format
 
-"FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity.
+FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity.
 
-"It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA formatted sequence and its quality data, but has recently become the de facto standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer.
+It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA formatted sequence and its quality data, but has recently become the de facto standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer.
 
-"A FASTQ file normally uses four lines per sequence.
+A FASTQ file normally uses four lines per sequence.
 
 * Line 1 begins with a `@` character and is followed by a sequence identifier and an optional description (like a FASTA title line).
 * Line 2 is the raw sequence letters.
 * Line 3 begins with a `+` character and is optionally followed by the same sequence identifier (and any description) again.
 * Line 4 encodes the **quality values** for the sequence in Line 2 of the file, and must contain the same number of symbols as letters in the sequence.
 
-"A FASTQ file containing a single sequence might look like this:"
+A FASTQ file containing a single sequence might look like this:"
 
 ```{r eval = T}
 cat("@SEQ_ID
@@ -123,7 +116,7 @@ GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
 ```
 
 
-"Here are the quality value characters in left-to-right increasing order of quality (ASCII):"
+Here are the quality value characters in left-to-right increasing order of quality (ASCII):"
 
 ```{r eval = F}
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

diff --git a/003-NCBI/_09-advanced_NCBI_with_seqinr.Rmd b/004-NCBI/_09-advanced_NCBI_with_seqinr.Rmd
diff --git a/006-downloading_seq_data_as_FASTA/.DS_Store b/006-downloading_seq_data_as_FASTA/.DS_Store
diff --git a/004-downloading_seq_data_as_FASTA/Icon b/006-downloading_seq_data_as_FASTA/Icon
diff --git a/...data_as_FASTA/11b-download_FASTA_in_R.Rmd b/...ata_as_FASTA/_11b-download_FASTA_in_R.Rmd
diff --git a/..._as_FASTA/12-clean_FASTA_in_R-AC03-01.Rmd b/...as_FASTA/_12-clean_FASTA_in_R-AC03-01.Rmd
diff --git a/...NA_sequence_descriptive_stats-AC03-02.Rmd b/...NA_sequence_descriptive_stats-AC03-02.Rmd
diff --git a/...s_FASTA/14-download_protein_sequences.Rmd b/..._FASTA/_14-download_protein_sequences.Rmd
diff --git a/..._data_as_FASTA/15-changes_to_database.Rmd b/...data_as_FASTA/_15-changes_to_database.Rmd
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-157-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-157-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-164-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-164-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-165-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-165-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-166-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-166-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-167-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-167-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-84-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-84-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-85-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-85-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-87-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-87-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-88-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-88-1.png
diff --git a/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-90-1.png b/_bookdown_files/lbrb_files/figure-html/unnamed-chunk-90-1.png
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-157-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-157-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-164-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-164-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-165-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-165-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-166-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-166-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-167-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-167-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-84-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-84-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-85-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-85-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-87-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-87-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-88-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-88-1.pdf
diff --git a/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-90-1.pdf b/_bookdown_files/lbrb_files/figure-latex/unnamed-chunk-90-1.pdf