load_problem

Josh Fogg edited this page Jul 31, 2024 · 2 revisions
load_problem(sigma_filename, mu_filename, omega_filename, sex_filename, nnz_sigma, nnz_omega, dimension, pedigree, issparse)

Loads the variables for a robust or non-robust optimal contribution selection problem into Python.

Parameters

  • sigma_filename : str
    Filename for a file which encodes $\Sigma$, whether that's stored in sparse coordinate format or pedigree format (see more). This must include the extension (not necessarily .ped) and full path from the current working directory.
  • mu_filename : str
    Filename for a file which encodes $\boldsymbol\mu$ if working with a non-robust problem and $\boldsymbol{\bar\mu}$ when working with a robust problem, with a single value per line. This must include the extension (not necessarily .ped) and full path from the current working directory.
  • omega_filename : str, optional
    Filename for a file which encodes $\Omega$, stored in sparse coordinate format. This is optional (for example, when working with a non-robust problem) and can be skipped by omitting it or by passing None.
  • sex_filename : str, optional
    Filename for a file which encodes the sex data and labels for candidates in the cohort, used to populate $\mathcal{S}$ and $\mathcal{D}$. This is optional (if, for example, you're working with a non-robust problem) and can be skipped by omitting it or by passing None.
  • nnz_sigma : int, optional
    Number of non-zeros in the $\Sigma$ being loaded, which is used to pre-allocate the SciPy sparse matrix it will be stored in if issparse=True. This is optional, but if it's not provided and issparse=True then its value will be computed using an internal function with additional computational overhead.
  • nnz_omega : int, optional
    Number of non-zeros in the $\Omega$ being loaded, which is used to pre-allocate the SciPy sparse matrix it will be stored in if issparse=True. This is optional, but if it's not provided and issparse=True then its value will be computed using an internal function with additional computational overhead.
  • dimension : int, optional
    The size of the problem, i.e. $n$, the number of candidates in the cohort. Specifying this aids in the pre-allocation of all other problem variables, though it is optional. If not provided it will be worked out explicitly from the $\boldsymbol\mu$ or $\boldsymbol{\bar\mu}$ loaded.
  • pedigree : bool, optional
    Signifies whether $\Sigma$ is stored as a pedigree structure (True) or in sparse coordinate format (False). Default value is False.
  • issparse : bool, optional
    Signifies whether $\Sigma$ and $\Omega$ should be loaded into dense NumPy arrays (False) or sparse SciPy matrices (True) in compressed sparse row format. Default value is False.
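
Since nnz_sigma, nnz_omega and dimension exist to avoid extra work at load time, it can be convenient to compute them once up front. The following is a minimal sketch using only the standard library, not the package's internal function. It assumes the coordinate file holds one `i j value` entry per non-blank line and that an off-diagonal entry stands for both $(i, j)$ and $(j, i)$ of a symmetric matrix; both are assumptions about the file layout, and the helper names and sample files are made up for illustration.

```python
import os
import tempfile


def count_lines(filename):
    """Count non-blank lines, e.g. n for a file with a single value per line."""
    with open(filename) as f:
        return sum(1 for line in f if line.strip())


def count_coo_nnz(filename, mirror_off_diagonal=True):
    """Count the non-zeros implied by a file of `i j value` lines."""
    nnz = 0
    with open(filename) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            i, j = int(parts[0]), int(parts[1])
            # Assumption: an off-diagonal entry is stored once but fills
            # both (i, j) and (j, i) of the symmetric matrix.
            nnz += 2 if (mirror_off_diagonal and i != j) else 1
    return nnz


# Tiny made-up files, purely for illustration.
tmpdir = tempfile.mkdtemp()
sigma_path = os.path.join(tmpdir, "sigma.txt")
mu_path = os.path.join(tmpdir, "mu.txt")
with open(sigma_path, "w") as f:
    f.write("1 1 1.0\n1 2 0.5\n2 2 1.0\n3 3 1.0\n")
with open(mu_path, "w") as f:
    f.write("1.0\n2.0\n0.5\n")

n = count_lines(mu_path)          # number of candidates
nnz = count_coo_nnz(sigma_path)   # three diagonal entries plus one mirrored pair
```

If the file layout matches these assumptions, the results could be passed as dimension=n and nnz_sigma=nnz. If the file already lists both triangles, mirror_off_diagonal=False gives a plain line count instead.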

Returns

  • ndarray or spmatrix
    Covariance matrix of candidates in the cohort ($\Sigma$).
  • ndarray
    Vector of expected breeding values of candidates in the cohort ($\boldsymbol\mu$), or of their expected values when working with a robust problem ($\boldsymbol{\bar\mu}$).
  • ndarray or spmatrix
    Covariance matrix of expected breeding values of candidates in the cohort ($\Omega$). If a filename was not provided via omega_filename, the returned value is None.
  • int
    Dimension of the problem ($n$).
  • ndarray
    Array of indices of sires in the cohort ($\mathcal{S}$). If a filename for sex data was not provided via sex_filename, the returned value is None.
  • ndarray
    Array of indices of dams in the cohort ($\mathcal{D}$). If a filename for sex data was not provided via sex_filename, the returned value is None.
  • ndarray
    Array of names given to the original candidates in the cohort. If a filename for sex data was not provided via sex_filename (which also includes name data), then the returned value is None.
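
Because several of the returned values are None when the matching file was not supplied, downstream code may want to check which parts are present before using them. A small sketch of such a check follows; `summarise_problem` is a hypothetical helper, not part of RobustOCS, and the tuple below is a hand-written stand-in rather than real load_problem output. Treating a missing $\Omega$ as "non-robust" is one reading of the parameter descriptions above.

```python
def summarise_problem(problem):
    """Report which optional parts of a load_problem-style tuple are present."""
    sigma, mu, omega, n, sires, dams, names = problem
    return {
        "dimension": n,
        # Assumption: no Omega means the problem is being treated as non-robust.
        "robust": omega is not None,
        "has_sex_data": sires is not None and dams is not None,
        "has_names": names is not None,
    }


# Stand-in values in the documented return order (sigma, mu, omega, n,
# sires, dams, names), with everything optional left as None.
placeholder = ([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], None, 2, None, None, None)
summary = summarise_problem(placeholder)
```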

Examples

In cases where all variables are needed, example usage may look like

>>> sigma, mubar, omega, n, sires, dams, names = robustocs.load_problem(
...     sigma_filename="examples/04/A04.txt",
...     mu_filename="examples/04/EBV04.txt",
...     omega_filename="examples/04/S04.txt",
...     sex_filename="examples/04/SEX04.txt",
...     issparse=True
... )
>>> sigma
<4x4 sparse matrix of type '<class 'numpy.float64'>'
	with 4 stored elements in Compressed Sparse Row format>

In cases where only some of the variables are needed (and the corresponding input parameters are omitted), the unneeded outputs can be ignored using Python's underscore convention, like so

>>> sigma, mubar, omega, n, _, _, _ = robustocs.load_problem(
...     sigma_filename="example.ped",
...     mu_filename="example.ebv",
...     omega_filename="example.omega",
...     pedigree=True
... )
>>> sigma
array([[1.   , 0.5  , 0.75 , 0.875],
       [0.5  , 1.   , 0.75 , 0.625],
       [0.75 , 0.75 , 1.25 , 1.   ],
       [0.875, 0.625, 1.   , 1.375]])

To continue computations with these matrices outside RobustOCS, NumPy (and SciPy if matrices were loaded as sparse) must be imported too.
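
As an illustration of such a follow-on computation, the sketch below takes the dense $\Sigma$ printed in the pedigree example above and evaluates the quadratic form $w^{T}\Sigma w$ with NumPy. The equal-contribution vector `w` is a hypothetical choice made only for the example; it is not something load_problem returns.

```python
import numpy as np

# Dense sigma copied from the pedigree example output above.
sigma = np.array([
    [1.0, 0.5, 0.75, 0.875],
    [0.5, 1.0, 0.75, 0.625],
    [0.75, 0.75, 1.25, 1.0],
    [0.875, 0.625, 1.0, 1.375],
])

# Hypothetical equal contributions for the four candidates.
w = np.full(4, 0.25)

# Quadratic form w^T Sigma w.
quad = w @ sigma @ w

# Sanity check that the loaded matrix is symmetric.
is_symmetric = np.allclose(sigma, sigma.T)
```

The `@` operator also works with SciPy sparse matrices, so the same expression should carry over when issparse=True, provided SciPy is imported.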