Skip to content

lukembrowne/readGenalex

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

readGenalex

An R package for reading data files in GenAlEx format, as exported from Excel as a delimited text file, into an annotated data.frame, and manipulate it in that form. Several functions are provided for accessing and printing this data. GenAlEx and its documentation are available at http://biology-assets.anu.edu.au/GenAlEx.

readGenalex is available on CRAN:

> install.packages("readGenalex")

The development version is hosted on Github and can always be installed via:

> install.packages("devtools")
> devtools::install_github("douglasgscofield/readGenalex")

To use:

> library(readGenalex)
> refgt <- readGenalex("reference_genotypes.txt")
> refgt
    id Site loc1 loc1.2 loc2 loc2.2 loc3 loc3.2 loc4 loc4.2 loc5 loc5.2
1 ref1    1    3      3    2      3    2      2    3      3    4      3
2 ref2    1    2      3    1      1    2      4    3      3    6      1
3 ref3    1    3      3    2      3    2      2    3      1    4      2
4 ref4    1    3      3    2      1    2      2    3      1    2      3
5 ref5    1    1      1    1      3    2      5    3      3    6      2
6 ref6    1    1      1    2      1    2      5    2      3    3      1
> attributes(refgt)
$names
 [1] "id"     "Site"   "loc1"   "loc1.2" "loc2"   "loc2.2" "loc3"   ...
$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

$n.loci
[1] 5

$ploidy
[1] 2

$n.samples
[1] 6

...

It only reads the number of samples specified by the GenAlEX header, and only treats as genotypes the number of genotype columns implied by the GenAlEx header in concert with the stated ploidy level.

It also tries to ignore extra TAB characters that tools such as Excel can insert when exporting TAB-delimited text, otherwise these could imply both additional columns and additional rows. Hopefully the latter is avoided by only reading the number of samples specified by the header.

If there are additional named columns to the right of the genotypes, these are read and stored in a dataframe attached to the attribute extra.columns. The first column of the extra.columns dataframe is the sample name (leftmost column from the genotypes, e.g., the id column from the above example). It attempts to ignore additional unnamed columns scattered amongst the named extra columns.

There are other functions supplied for manipulating population genetic data produced by readGenalex():

is.genalex() : Checks whether the genetic.data.format attribute is set to genalex.

reduceGenalexPloidy() : Reduce the ploidy to 1 by selecting the first allele of each locus.

dropGenalexLoci() : Drop named loci from the data.

printGenalexGenotype() : Print genotypes of specific rows.

reorderGenalexLoci() : Reorder loci into a given order.

computeGenalexColumns() : Return a vector of column numbers for specified loci.

putGenalexLocus() : Replace genotypes of specified locus.

getGenalexLocus() : Return genotypes of specified locus, optionally only for specific populations.

About

R package for reading GenAlEx-format genotype files

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • R 93.4%
  • Makefile 6.6%