Fix spelling
richelbilderbeek committed Jan 16, 2024
1 parent d9548e8 commit 8a27b4c
Showing 16 changed files with 310 additions and 22 deletions.
286 changes: 286 additions & 0 deletions .wordlist.txt
@@ -26,4 +26,290 @@ txt
varepsilon
wikipage
xy
config
json
mlc
wordlist
yml
Canidae
Canis
Cerdocyon
Corsac
Dusicyon
Lycaon
Otocyon
Vulpes
chama
corsac
familiaris
latrans
macrotis
megalotis
pictus
sp
velox
vulpes
BASHing
au
datafix
www
eV
gaussian
genbank
getline
Hm
Korona
linenums
Misspelled words:
NCBI
Pavol
Pement
phylogenetic
pre
printf
rOH
RSA
taxonID
unformatted
VCF
asc
binwidth
chr
comul
CONVFMT
Dask
de
elemnts
embl
gz
hg
ianother
itol
kb
lc
musculus
Newick
nh
nohead
nokey
noytics
nq
num
phyloP
PROCINFO
quantile
quantiles
quartile
rgb
scientificNames
sprintf
taxdump
Uncomment
aaa
abe
adn
Amrei
analyse
argv
Arsenophonus
athe
atsym
backreference
Backreferences
backreferences
BDGP
bedops
bigWigToWig
bigwigtowig
Binzer
Bioawk
bioawk
Bioinformaticians
boolean
Borreliella
bp
Buitrón
bulkm
burgdorferi
bzip
CDHit
CDHIT
cdhit
CDSs
CHGCAR
clstr
cmd
cn
Codename
CoDing
Conda
consts
coord
cov
criterium
csh
decrypt
developerWorks
dgrp
douglasgscofield
dows
Drosophila
dvr
dx
edu
EF
encodeproject
execut
Fasta
fasta
FASTA
FBtr
filedata
fmax
fmin
FNR
fontawesome
Frc
freqs
funtion
FWHM
gauss
GaussView
gcd
genomic
GFF
gff
Gnuawk
goldenPath
González
grymoire
gsub
GTF
gzip'ed
Hellström
Heng
hgdownload
hl
Homebrew
html
http
ide
INDEL
indel
INDELs
INDELS
indels
inet
infile
init
integerlist
Inten
ints
ir
isnt
Jmol
kemi
Kepp
Kernighan
Kernighan's
len
Loma
MacOS
Mahesh
Martín
Matti
maxx
md
melanogaster
Mitev
mitev
Multiline
MultiZ
Murnaghan
Myxococcales
nARGC
nasoniae
nclass
nd
neds
neighbours
nfreq
nok
np
Ntypes
numpy
OFS
os
outf
outfile
overrepresentation
pallidum
Panchal
parallelisation
Pavlin
pavlin
Pavlin's
pdf
perl
permutate
PHAST
phastCons
POSCAR
preprocess
ProLiant
ps
py
quartiles
Quilmes
readthedocs
resample
Rhizobiales
rnd
rosettacode
rtl
Scofield
se
sed
soe
sparkline
ss
stackexchange
stackoverflow
str
strfunc
subsp
sutprised
sys
tcsh
tdef
tinyutils
Transcriptome
transcriptome
Treponema
tself
UCSC
ucsc
unix
unparsed
UPPMAX
uppmax
usung
uu
valueable
VASP
vcf
ver
vibmatrix
wamt
Wannier
waterX
webarchived
wget
wikibooks
xyz
molden
htm
decyphered
pertenue


4 changes: 3 additions & 1 deletion README.md
@@ -1,7 +1,9 @@
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pmitev/to-awk-or-not/master)
![ci](https://github.com/pmitev/to-awk-or-not/workflows/ci/badge.svg)

# to-awk-or-not
-This repositiory serves as an auxiliary material to an gawk course/seminar web page

+This repository serves as an auxiliary material to an gawk course/seminar web page

[https://pmitev.github.io/to-awk-or-not/](https://pmitev.github.io/to-awk-or-not/)

2 changes: 1 addition & 1 deletion docs/1.Simple_example.md
@@ -58,7 +58,7 @@ ed only when the criteria is met** - i.e. awk will print the values of columns 3

??? "Discussion and exercises"
- Can you find all "silver" coins older than 1986? One can use grep to filter the silver coins and pipe the result to awk or do it all together in awk.
-- Unfortunatelly, awk does not have a way to print/address all fields after or before a selected one. How can one print all remaining fields?
+- Unfortunately, awk does not have a way to print/address all fields after or before a selected one. How can one print all remaining fields?
- A `TAB` separated version 'coins.tab' is more appropriate in such cases and rather common, for the same reason, in many bioinformatics file formats `gff|bed|sam|vcf`.
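
One possible answer to the "remaining fields" question in this hunk, as a minimal sketch: loop from the chosen field up to `NF` and join the fields by hand. The helper name `rest_fields` and the sample line are made up for illustration and are not part of the course material; the column layout (metal, weight, year, country, description) is taken from the header quoted elsewhere in this commit.

```bash
# Hypothetical helper: print fields n..NF of every input line, joined by single spaces.
rest_fields() {
    awk -v n="$1" '{
        out = ""; sep = ""
        for (i = n; i <= NF; i++) {   # walk from field n to the last field
            out = out sep $i
            sep = " "
        }
        print out
    }'
}

# Made-up sample line; prints everything from the country column onwards.
printf 'gold 1 1986 USA American Eagle\n' | rest_fields 4
```

Note that rebuilding the line this way collapses the original spacing between fields, which is one reason a `TAB` separated file is easier to handle.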

## What about some math? Can I manipulate or analyze the data?
2 changes: 1 addition & 1 deletion docs/2.Teasing_with_grep.md
@@ -62,7 +62,7 @@ At the `#!awk END` awk will run **{action_E}**. Perfect to print the collected d

??? "Exercises"
- Can you add a header `# metal | weight in ounces | date minted | country of origin | description` for the output of the coins older than 1986? Use this shorter `# header` in the beginning, until you get it working.
-- What wil happen if you do not provide file as input to the above exercise?
+- What will happen if you do not provide file as input to the above exercise?
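
A rough sketch of how the header exercise could be approached, assuming the year minted is in column 3 and that "older than 1986" means a year below 1986. The helper name `old_coins` and the two sample lines are hypothetical. The `BEGIN` action runs once before any input is read, which makes it a natural place for a header.

```bash
# Hypothetical helper: print a short header once, then the coins minted before 1986.
old_coins() {
    awk 'BEGIN { print "# header" }   # runs once, before the first input line
         $3 < 1986 { print }          # column 3 is assumed to hold the year minted
    ' "$@"
}

# Made-up sample data piped in on standard input.
printf 'gold 1 1986 USA American Eagle\nsilver 10 1981 USA ingot\n' | old_coins
```

When no file name is given, awk reads standard input, so running the same command in a terminal without a pipe prints the header and then waits for typed input.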


And here is the teaser ;-).
2 changes: 1 addition & 1 deletion docs/Bio/NCBI-taxonomy.md
@@ -171,7 +171,7 @@ $ ./01.tabulate-names.awk <(bzcat names.dmp.bz2) | sort -g -k 1 | bzip2 -c > na
function Cap (string) { return toupper(substr(string,0,1))substr(string,2) }
```

-Note that this script will keep the last values for any match of the same ID. It appers that the database have repeated lines that does not contain complete information and the tabulated data get destroyed. To prevent this, we need to take care that any subsequent match will be ignored.
+Note that this script will keep the last values for any match of the same ID. It appears that the database have repeated lines that does not contain complete information and the tabulated data get destroyed. To prevent this, we need to take care that any subsequent match will be ignored.
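
The "ignore any subsequent match" idea can be illustrated with a small sketch that is independent of the course's own script: remember each ID in an array and act only the first time it is seen. The helper name `first_per_id`, the `seen` array, and the sample records are assumptions made for this example.

```bash
# Hypothetical helper: keep only the first line seen for each ID in column 1.
first_per_id() {
    awk '!($1 in seen) { seen[$1] = 1; print }'   # later lines with a known ID are skipped
}

# Made-up records: the second line repeats an ID and should be dropped.
printf '9606 human\n9606 incomplete\n10090 mouse\n' | first_per_id
```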


``` bash
8 changes: 4 additions & 4 deletions docs/Case_studies/List.md
@@ -22,9 +22,9 @@ Here is a collection of mine and contributed awk scripts.
* **[Fasta file format tips](Fasta_tips.md)**
_worth to know if working often with files in multi-fasta format_
* **[Multiline fasta to single line fasta](Multi2single_fasta.md)**
-_single cryptic-looking line that will decriphered during the workshop_
+_single cryptic-looking line that will decyphered during the workshop_
* **[Sequence clustering with awk](Sequence_clustering.md)**
-_apllication of the multiple files approach - contribution by Martín González Buitrón_
+_application of the multiple files approach - contribution by Martín González Buitrón_
* **[Substitute scientific with common species names in a phylogenetic tree file](../Bio/NCBI-taxonomy.md)**
* **[Statistics on very large columns of values](../Bio/Stat-large-files.md)**
* **[Manipulating and getting statistics for .vcf and .gff files](manipulating_vcf.md)**
@@ -39,11 +39,11 @@ Here is a collection of mine and contributed awk scripts.

## Physics oriented
* **[Dipole moment example](Dipole_moment.md)**
-_simple calulations should not be difficult to code - here is an example_
+_simple calculations should not be difficult to code - here is an example_
* **[Multiple files - VASP CHGCAR difference](CHGCAR_diff.md)**
_an simplified example on how to read multiple files (bzip-ed) line-by-line simultaneously to save memory_
* **[POSCAR: reorder atom types](POSCAR_reorder.md)**
-_simple task creates programing nightmare_
+_simple task creates programming nightmare_

## Primarily used as reference
* **[Awk and Gnuplot](awk_gnuplot.md)**
2 changes: 1 addition & 1 deletion docs/Case_studies/awk-jmol.md
@@ -8,7 +8,7 @@ draw ID vector (atomno=1) {x,y,z}
```

For larger molecules this quickly becomes quite a tedious work to type all this commands... so let awk write it for us.
-The output is printed to the sceen and saved in file `vectors.spt` that will later run in Jmol.
+The output is printed to the screen and saved in file `vectors.spt` that will later run in Jmol.

``` awk hl_lines="1"
$ awk '{i++;printf ("draw v%i vector (atomno=%i) {%f,%f,%f}\n",i,i,$1,$2,$3)}' vectors.dat | tee vectors.spt
2 changes: 1 addition & 1 deletion docs/Case_studies/awk_gnuplot.md
@@ -4,7 +4,7 @@

I have written this script a long time ago, before Gnuplot had the options to print its own variables on the plot. Nowadays, it is possible to make the fit entirely from Gnuplot, although it will be still tricky to make some decisions if you want to align some labels.

-Perhaps the most valueable part is the demonstartion of simultaneous output/input to external program (Gnuplot in this case) `#!awk while ((gnu |& getline) > 0)` and for future reference.
+Perhaps the most valueable part is the demonstration of simultaneous output/input to external program (Gnuplot in this case) `#!awk while ((gnu |& getline) > 0)` and for future reference.

``` awk
#!/usr/bin/awk -f
4 changes: 2 additions & 2 deletions docs/Exercises/Advanced_data_analysis.md
@@ -1,4 +1,4 @@
-# Advanced data analisys ****
+# Advanced data analysis ****
You are given a file with numbers on each row - 5 in this case.

!!! note "data1"
@@ -11,7 +11,7 @@ You are given a file with numbers on each row - 5 in this case.

Then you are given 5 numbers (let's say "1, 3, 5, 6 and 7") and you want to find how many of these numbers are matching a number on each line - think like you are about to check your lottery tickets ;-)

-The solution bellow is using an "assicative arrays" trick to make it easier to loop over the reference numbers.
+The solution bellow is using an "associative arrays" trick to make it easier to loop over the reference numbers.
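
Separately from the course's own solution, the set-membership idea behind `if ($i in n)` can be sketched as follows. The helper name `count_hits`, the comma-separated way of passing the reference numbers, and the whitespace-separated sample rows are all assumptions made for this illustration.

```bash
# Hypothetical helper: for each line, count how many fields appear in the reference list.
count_hits() {
    awk -v ref="$1" '
        BEGIN {
            m = split(ref, tmp, ",")                 # "1,3,5,6,7" becomes tmp[1..m]
            for (k = 1; k <= m; k++) n[tmp[k]] = 1   # use the array n as a set of keys
        }
        {
            hits = 0
            for (i = 1; i <= NF; i++)
                if ($i in n) hits++                  # key lookup, no inner loop over the references
            print hits
        }'
}

# Made-up "lottery tickets": the first row has three matches, the second has two.
printf '1 2 3 4 5\n6 7 8 9 10\n' | count_hits 1,3,5,6,7
```

Array keys are compared as strings here, so a field written as `06` would not match the reference `6`.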

??? "Possible solution"
Not very elegant but illustrates nicely a convenient use of associated arrays as list - if ($i in n) :
2 changes: 1 addition & 1 deletion docs/Exercises/Difficult_data.md
@@ -16,7 +16,7 @@ O103.H461 O103.H462
![input](../images/pdata2.png)


-??? "Posible solutions:"
+??? "Possible solutions:"
``` awk
awk -F '[][,]' '{printf("O%03d.H%03d O%03d.H%03d\n",$2,$3,$2,$4)}' data
```
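
To see why the solution above reads `$2`, `$3` and `$4`, it may help to print how `-F '[][,]'` splits a line. The bracket expression matches `]`, `[` or `,`, so each of those characters acts as a field separator. The input layout `[103,461,462]` is only a guess (the real input is shown as an image in the page), and `show_fields` is a hypothetical helper name.

```bash
# Hypothetical helper: list every field produced by the separator set ']', '[' and ','.
show_fields() {
    awk -F '[][,]' '{ for (i = 1; i <= NF; i++) printf("$%d=\"%s\"\n", i, $i) }'
}

# Assumed input layout; the leading "[" leaves $1 empty and the trailing "]" leaves $5 empty.
printf '[103,461,462]\n' | show_fields
```

With this assumed input the three numbers land in `$2`, `$3` and `$4`, which is consistent with the `printf` call in the solution.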