Andres Legarra, INRA Toulouse
YARP is a series of scripts that I found useful for manipulating and recoding data. Most of them are written in awk. In most cases they are run like
./xxx.awk file1 [file2] > file_out
These two programs take file1, read its animals (e.g. those in data), and extract from the pedigree file2 all ancestors of the individuals in file1.
For some reason that I cannot remember, subped.awk is more efficient than extract_subped.awk (or, unlike it, contains no errors).
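The ancestor extraction described above can be sketched as follows. This is a Python illustration of the idea, not the awk code of either script; the function name, the in-memory dictionary, "0" as the unknown-parent code, and the choice to keep the starting animals themselves in the result are all assumptions.

```python
def extract_subpedigree(animals, pedigree):
    """Return the ids of `animals` plus all their ancestors.

    pedigree: dict mapping id -> (sire, dam); "0" means unknown (assumed coding).
    """
    keep = set()
    stack = list(animals)
    while stack:
        animal = stack.pop()
        if animal == "0" or animal in keep:
            continue
        keep.add(animal)
        # Animals without a pedigree record are treated as founders.
        sire, dam = pedigree.get(animal, ("0", "0"))
        stack.extend((sire, dam))
    return keep


pedigree = {
    "A": ("0", "0"), "B": ("0", "0"), "D": ("A", "B"),
    "E": ("A", "D"), "F": ("B", "E"), "Z": ("A", "B"),
}
sub = extract_subpedigree(["E"], pedigree)
print(sorted(sub))
```

The explicit stack avoids recursion, which matters for deep pedigrees.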
This program takes file1, creates a list of animals, and prints out the records in file2 whose animal in the first column is included in file1.
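A minimal sketch of this filtering step, in Python rather than awk. It assumes that the animal id is also the first whitespace-separated field of file1; the function name and the use of lists of lines instead of files are hypothetical.

```python
def filter_records(id_lines, record_lines):
    """Keep the records whose first field is an id listed in id_lines."""
    # Assumption: the id is the first field of each line of file1.
    wanted = {line.split()[0] for line in id_lines if line.strip()}
    return [rec for rec in record_lines
            if rec.strip() and rec.split()[0] in wanted]


ids = ["A", "D some other field"]
records = ["A 1.2", "B 3.4", "D 5.6"]
kept = filter_records(ids, records)
print(kept)
```

The set lookup keeps the cost per record constant, which is the same idea as loading file1 into an awk associative array.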
- It takes the pedigree file and adds ancestors as animals in the first column if they are absent (unless their id is <= 0),
- It also assigns a year of birth (needed to create upg) to these missing animals; here yob = yob(progeny) - 3.
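The two steps above can be sketched as below. This is an illustration under assumptions, not the script itself: records are taken to be (animal, sire, dam, yob) tuples with integer ids, an added ancestor gets unknown parents (0), and if it has several progeny the first one read sets its year of birth.

```python
def add_missing_ancestors(records, gap=3):
    """Append a record for every parent that has no record of its own.

    records: list of (animal, sire, dam, yob); ids <= 0 mean unknown.
    """
    present = {animal for animal, _, _, _ in records}
    out = list(records)
    for animal, sire, dam, yob in records:
        for parent in (sire, dam):
            if parent > 0 and parent not in present:
                # Assumed rule: unknown parents, yob = yob(progeny) - gap.
                out.append((parent, 0, 0, yob - gap))
                present.add(parent)
    return out


records = [(3, 1, 2, 2000), (4, 1, 0, 2002)]
completed = add_missing_ancestors(records)
print(completed)
```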
Removes duplicates and individuals with two identical parents.
Renumbers the pedigree in file1 so that parents precede offspring. Conceived for populations with no unknown parent groups, so missing ancestors are 0. On output, it generates fileout with 6 columns: the new and the old pedigree.
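One simple way to obtain such a numbering is to make repeated passes over the pedigree and number an animal as soon as both its parents are numbered. The sketch below does this in Python; it is not necessarily the algorithm used by the script. It assumes "0" for unknown parents, a record for every parent, no duplicated animals, and an output order of new animal, new sire, new dam, old animal, old sire, old dam.

```python
def renumber(pedigree):
    """pedigree: list of (animal, sire, dam) strings, "0" = unknown.

    Returns rows (new, new_sire, new_dam, old, old_sire, old_dam),
    sorted by new id, with parents always numbered before offspring.
    """
    new_id = {"0": 0}
    pending = list(pedigree)
    while pending:
        still_pending = []
        for animal, sire, dam in pending:
            if sire in new_id and dam in new_id:
                new_id[animal] = len(new_id)  # next free number, starting at 1
            else:
                still_pending.append((animal, sire, dam))
        if len(still_pending) == len(pending):
            raise ValueError("pedigree loop, or a parent without its own record")
        pending = still_pending
    rows = [(new_id[a], new_id[s], new_id[d], a, s, d) for a, s, d in pedigree]
    return sorted(rows)


pedigree = [("F", "B", "E"), ("E", "A", "D"), ("D", "A", "B"),
            ("A", "0", "0"), ("B", "0", "0"), ("Z", "A", "B")]
rows = renumber(pedigree)
for row in rows:
    print(*row)
```

The number of passes grows with the depth of the pedigree, so this is meant as a readable sketch rather than an efficient implementation.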
Same as above but considering pseudo-generation numbers, so that uncles get numbers before nieces; see Kempthorne's example, in which Z may or may not be numbered before F:
A 0 0
B 0 0
D A B
E A D
F B E
Z A B
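To make the example concrete, here is a small Python sketch that computes a generation number for each animal and orders animals by it. The definition used (founders are generation 0, any other animal is one more than its oldest-generation parent) is an assumption about what "pseudo-generation" means here, and ties are broken alphabetically just for the illustration.

```python
def pseudo_generations(pedigree):
    """pedigree: list of (animal, sire, dam) strings, "0" = unknown."""
    parents = {a: (s, d) for a, s, d in pedigree}
    gen = {"0": -1}  # so that founders end up with generation 0
    pending = set(parents)
    while pending:
        ready = {a for a in pending
                 if parents[a][0] in gen and parents[a][1] in gen}
        if not ready:
            raise ValueError("pedigree loop, or a parent without its own record")
        for a in ready:
            gen[a] = 1 + max(gen[parents[a][0]], gen[parents[a][1]])
        pending -= ready
    del gen["0"]
    return gen


pedigree = [("A", "0", "0"), ("B", "0", "0"), ("D", "A", "B"),
            ("E", "A", "D"), ("F", "B", "E"), ("Z", "A", "B")]
gen = pseudo_generations(pedigree)
order = sorted(gen, key=lambda a: (gen[a], a))
print(order)  # ['A', 'B', 'D', 'Z', 'E', 'F']
```

With this ordering Z (generation 1) is always numbered before E and F, whereas a plain "parents before offspring" rule would also accept A, B, D, E, F, Z.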
It assigns unknown parent groups based on year of birth and kind of missingness (both parents unknown or only sire unknown). In this case, for Manech Tete Rousse.
Same as renum_order above but creating files with upg's that are added after regular animals.
Same as renum_order above but creating files with metafounders that come before regular animals.
Same as renum_order_mf above but considering pseudo-generation numbers, creating files with metafounders that come before regular animals.
It converts a pedigree with metafounders first to another one with upg's coded as negative numbers (so that it fits into renumf90).
This program recodes given fields in a data set; see the example ex_renum_data.
The order of print is:
- first recoded columns,
- then "untouched" fields passed to output,
- then the original complete (possibly alphanumeric) line.
In this manner the output can be read as such by blupf90, with alphanumerics included.
It has not really been tested.
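The output layout described above (recoded columns, then passed fields, then the original line) can be illustrated with a short Python sketch. The coding rule used here, consecutive integers in order of first appearance within each field, is an assumption, as are the function name and its arguments; field positions are 1-based as in awk.

```python
def renum_data(lines, recode_fields, pass_fields):
    """Recode the fields in recode_fields and pass through those in pass_fields."""
    codes = {pos: {} for pos in recode_fields}  # one code table per recoded field
    out = []
    for line in lines:
        fields = line.split()
        recoded = []
        for pos in recode_fields:
            table = codes[pos]
            # Assumed rule: levels are numbered 1, 2, ... as they appear.
            recoded.append(str(table.setdefault(fields[pos - 1], len(table) + 1)))
        passed = [fields[pos - 1] for pos in pass_fields]
        out.append(" ".join(recoded + passed + [line.strip()]))
    return out, codes


data = ["herdA F 12.5", "herdB M 10.0", "herdA M 11.0"]
recoded, codes = renum_data(data, recode_fields=[1, 2], pass_fields=[3])
for line in recoded:
    print(line)
```

Keeping the original line at the end means the numeric columns come first, at fixed positions, while the alphanumeric information is still available for checking.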
This program takes the recoded pedigree with recoded id in $1 and old id in $4, then reads a genotype file and creates a new file with recoded id instead of old id.
This program takes the recoded pedigree with recoded id in $1 and old id in $4, then reads a phenotype file with the id in position posid and creates a new file with the recoded id in the last column.
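A minimal Python sketch of this lookup, assuming whitespace-separated fields and a 1-based posid as in awk. The function name is hypothetical, and what the script does with animals absent from the pedigree is not stated, so here they simply raise an error.

```python
def recode_phenotypes(renum_ped_lines, pheno_lines, posid):
    """Append the recoded id as the last column of each phenotype record."""
    old_to_new = {}
    for line in renum_ped_lines:
        fields = line.split()
        old_to_new[fields[3]] = fields[0]  # old id in $4, recoded id in $1
    out = []
    for line in pheno_lines:
        fields = line.split()
        # KeyError if the animal is not in the pedigree (assumed behaviour).
        new = old_to_new[fields[posid - 1]]
        out.append(" ".join(fields + [new]))
    return out


ped = ["1 0 0 A 0 0", "2 0 0 B 0 0", "3 1 2 D A B"]
pheno = ["2020 D 4.5", "2019 A 3.1"]
recoded_pheno = recode_phenotypes(ped, pheno, posid=2)
print(recoded_pheno)
```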
This program passes from true genotypes (nucleotides) to pregs format (0/1/2). The reference allele is the first one read in the file for each locus; a table of equivalences is generated on output. It does not check for triallelic SNPs.
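The recoding can be sketched as follows in Python. The input layout (an id followed by two alleles per locus) and the meaning of the code (number of copies of the reference allele) are assumptions made for the illustration; the script may use another layout or count the other allele. As in the description, nothing is checked for triallelic SNPs, and missing genotypes are not handled.

```python
def recode_genotypes(lines):
    """Turn nucleotide genotypes into 0/1/2 codes.

    Returns the recoded lines and the table of reference alleles per locus.
    """
    reference = {}  # locus index -> first allele read in the file
    out = []
    for line in lines:
        fields = line.split()
        animal, alleles = fields[0], fields[1:]
        codes = []
        for locus in range(len(alleles) // 2):
            a1, a2 = alleles[2 * locus], alleles[2 * locus + 1]
            ref = reference.setdefault(locus, a1)
            # Assumed coding: number of copies of the reference allele.
            codes.append(str((a1 == ref) + (a2 == ref)))
        out.append(animal + " " + "".join(codes))
    return out, reference


genotypes = ["id1 A G C C", "id2 G G C T", "id3 A A T T"]
coded, reference = recode_genotypes(genotypes)
print(coded)
print(reference)
```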