readGenalex

An R package for reading data files in GenAlEx format, as exported from Excel as a delimited text file, into an annotated data.frame, and manipulate it in that form. Several functions are provided for accessing and printing this data. GenAlEx and its documentation are available at http://biology-assets.anu.edu.au/GenAlEx.

readGenalex is available on CRAN:

> install.packages("readGenalex")

The development version is hosted on Github and can always be installed via:

> install.packages("devtools")
> devtools::install_github("douglasgscofield/readGenalex")

To use:

> library(readGenalex)
> refgt <- readGenalex("reference_genotypes.txt")
> refgt
    id Site loc1 loc1.2 loc2 loc2.2 loc3 loc3.2 loc4 loc4.2 loc5 loc5.2
1 ref1    1    3      3    2      3    2      2    3      3    4      3
2 ref2    1    2      3    1      1    2      4    3      3    6      1
3 ref3    1    3      3    2      3    2      2    3      1    4      2
4 ref4    1    3      3    2      1    2      2    3      1    2      3
5 ref5    1    1      1    1      3    2      5    3      3    6      2
6 ref6    1    1      1    2      1    2      5    2      3    3      1
> attributes(refgt)
$names
 [1] "id"     "Site"   "loc1"   "loc1.2" "loc2"   "loc2.2" "loc3"   ...
$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

$n.loci
[1] 5

$ploidy
[1] 2

$n.samples
[1] 6

...

It only reads the number of samples specified by the GenAlEX header, and only treats as genotypes the number of genotype columns implied by the GenAlEx header in concert with the stated ploidy level.

It also tries to ignore extra TAB characters that tools such as Excel can insert when exporting TAB-delimited text, otherwise these could imply both additional columns and additional rows. Hopefully the latter is avoided by only reading the number of samples specified by the header.

If there are additional named columns to the right of the genotypes, these are read and stored in a dataframe attached to the attribute extra.columns. The first column of the extra.columns dataframe is the sample name (leftmost column from the genotypes, e.g., the id column from the above example). It attempts to ignore additional unnamed columns scattered amongst the named extra columns.

There are other functions supplied for manipulating population genetic data produced by readGenalex():

is.genalex() : Checks whether the genetic.data.format attribute is set to genalex.

reduceGenalexPloidy() : Reduce the ploidy to 1 by selecting the first allele of each locus.

dropGenalexLoci() : Drop named loci from the data.

printGenalexGenotype() : Print genotypes of specific rows.

reorderGenalexLoci() : Reorder loci into a given order.

computeGenalexColumns() : Return a vector of column numbers for specified loci.

putGenalexLocus() : Replace genotypes of specified locus.

getGenalexLocus() : Return genotypes of specified locus, optionally only for specific populations.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
R		R
data		data
inst		inst
man		man
tests/Examples		tests/Examples
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
Makefile		Makefile
NAMESPACE		NAMESPACE
NEWS		NEWS
NEWS.md		NEWS.md
README.md		README.md
TODO.md		TODO.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

readGenalex

About

Releases

Packages

Languages

License

lukembrowne/readGenalex

Folders and files

Latest commit

History

Repository files navigation

readGenalex

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages