diff --git a/docs/source/portfolio_optimisation/risk_estimators.rst b/docs/source/portfolio_optimisation/risk_estimators.rst index 309cc3b2b..3e5ca78a5 100644 --- a/docs/source/portfolio_optimisation/risk_estimators.rst +++ b/docs/source/portfolio_optimisation/risk_estimators.rst @@ -35,7 +35,7 @@ Risk Estimators Risk is a very important part of finance and the performance of large number of investment strategies are dependent on the efficient estimation of underlying portfolio risk. There are different ways of representing risk but the most widely used is a covariance matrix. This means that an accurate calculation of the covariances is essential for an accurate representation of risk. -This class provides functions for calculating different types of covariance matrices, de-noising, and other helpful methods. +This class provides functions for calculating different types of covariance matrices, de-noising, de-toning and other helpful methods. .. tip:: |h4| Underlying Literature |h4_| @@ -48,13 +48,12 @@ This class provides functions for calculating different types of covariance matr - **Shrinkage Algorithms for MMSE Covariance Estimation** *by* Y. Chen, A. Wiesel, Y.C. Eldar and A.O. Hero `available here `__. *Introduces the Oracle Approximating shrinkage method.* - **Minimum Downside Volatility Indices** *by* Solactive AG - German Index Engineering `available here `__. *Describes examples of use of the Semi-Covariance matrix.* - **Financial applications of random matrix theory: Old laces and new pieces** *by* Potter M., J.P. Bouchaud, L. Laloux `available here `__. *Describes the process of de-noising of the covariance matrix.* - - **A Robust Estimator of the Efficient Frontier** *by* Marcos Lopez de Prado `available here `__. *Describes the De-noising Covariance/Correlation Matrix algorithm.* + - **A Robust Estimator of the Efficient Frontier** *by* Marcos Lopez de Prado `available here `__. 
*Describes the Constant Residual Eigenvalue Method for De-noising Covariance/Correlation Matrix.* + - **Machine Learning for Asset Managers** *by* Marcos Lopez de Prado `available here `__. *Describes the Targeted Shrinkage De-noising and the De-toning methods for Covariance/Correlation Matrices.* -Supported Estimators -#################### Minimum Covariance Determinant -****************************** +############################## Minimum Covariance Determinant (MCD) is a robust estimator of covariance that was introduced by P.J. Rousseeuw. @@ -69,17 +68,44 @@ which is then rescaled to compensate for the performed selection of observations Our method is a wrapper for the sklearn MinCovDet class. For more details about the function and its parameters, please visit `sklearn documentation `__. +Implementation +************** + +.. py:currentmodule:: mlfinlab.portfolio_optimization.risk_estimators + +.. autoclass:: RiskEstimators + :members: __init__, minimum_covariance_determinant + + +---- + Maximum Likelihood Covariance Estimator (Empirical Covariance) -************************************************************** +############################################################## Maximum Likelihood Estimator of a sample is an unbiased estimator of the corresponding population’s covariance matrix. +Description of the Empirical Covariance according to the **Scikit-learn User Guide on Covariance estimation**: + +"The covariance matrix of a data set is known to be well approximated by the classical maximum likelihood estimator, +provided the number of observations is large enough compared to the number of features (the variables describing the +observations). More precisely, the Maximum Likelihood Estimator of a sample is an unbiased estimator of the corresponding +population’s covariance matrix". + Our method is a wrapper for the sklearn EmpiricalCovariance class. For more details about the function and its parameters, please visit `sklearn documentation `__. 
+Implementation +************** + +.. autoclass:: RiskEstimators + :noindex: + :members: empirical_covariance + + +---- Covariance Estimator with Shrinkage -*********************************** +################################### Shrinkage allows one to avoid the inability to invert the covariance matrix due to numerical reasons. Shrinkage consists of reducing the ratio between the smallest and the largest eigenvalues of the empirical covariance matrix. @@ -122,9 +148,18 @@ For more details about the function and its parameters, please visit `sklearn do Shrinkage methods are described in greater detail in the works listed in the introduction. +Implementation +************** + +.. autoclass:: RiskEstimators + :noindex: + :members: shrinked_covariance + + +---- Semi-Covariance Matrix -********************** +###################### Semi-Covariance matrix is used to measure the downside volatility of a portfolio and can be used as a measure to minimize it. This metric also allows measuring the volatility of returns below a specific threshold. @@ -144,8 +179,18 @@ If the :math:`B` is set to zero, the volatility of negative returns is measured. .. tip:: An example of Semi-Covariance usage can be found `here `__. +Implementation +************** + +.. autoclass:: RiskEstimators + :noindex: + :members: semi_covariance + + +---- + Exponentially-Weighted Covariance Matrix -**************************************** +######################################## Each element in the Exponentially-weighted Covariance matrix is the last element from an exponentially weighted moving average series based on series of covariances between returns of the corresponding assets. It's used to give greater weight to most @@ -177,12 +222,30 @@ Where :math:`R_{i}^{t}` is the return of :math:`i^{th}` asset for :math:`t^{th}` and :math:`j^{th}` asset, :math:`EWMA(\sum)_{t}` is the :math:`t^{th}` observation of exponentially-weighted moving average of :math:`\sum`. 
+Implementation +************** + +.. autoclass:: RiskEstimators + :noindex: + :members: exponential_covariance -De-noising Covariance/Correlation Matrix -**************************************** -The main idea behind de-noising is to separate the noise-related eigenvalues from the signal-related ones. This is achieved -through fitting the Marcenko-Pastur distribution of the empirical distribution of eigenvalues using a Kernel Density Estimate (KDE). +---- + +De-noising and De-toning Covariance/Correlation Matrix +###################################################### + +Two methods for de-noising are implemented in this module: + +- Constant Residual Eigenvalue Method +- Targeted Shrinkage + +Constant Residual Eigenvalue De-noising Method +********************************************** + +The main idea behind the Constant Residual Eigenvalue de-noising method is to separate the noise-related eigenvalues from +the signal-related ones. This is achieved by fitting the Marcenko-Pastur distribution to the empirical distribution of +eigenvalues, which is estimated using a Kernel Density Estimate (KDE). The de-noising function works as follows: @@ -194,54 +257,166 @@ The de-noising function works as follows: - The Marcenko-Pastur pdf is fitted to the KDE estimate using the variance as the parameter for the optimization. -- From the obtained Marcenko-Pastur distribution, the maximum theoretical eigenvalue is calculated using the formula from the "Instability caused by noise" part. +- From the obtained Marcenko-Pastur distribution, the maximum theoretical eigenvalue is calculated using the formula + from the **Instability caused by noise** part of `A Robust Estimator of the Efficient Frontier paper `__. + +- The eigenvalues in the set that are below the theoretical value are all set to their average value. + For example, if we have a set of 5 eigenvalues sorted in the descending order ( :math:`\lambda_1` ... 
:math:`\lambda_5` ), + 3 of which are below the maximum theoretical value, then we set + +.. math:: + + \lambda_3^{NEW} = \lambda_4^{NEW} = \lambda_5^{NEW} = \frac{\lambda_3^{OLD} + \lambda_4^{OLD} + \lambda_5^{OLD}}{3} + +- Eigenvalues above the maximum theoretical value are left intact. + +.. math:: + + \lambda_1^{NEW} = \lambda_1^{OLD} -- The eigenvalues in the set that are above the theoretical value are all set to their average value. For example, we have a set of 5 sorted eigenvalues ( :math:`\lambda_1` ... :math:`\lambda_5` ), 2 of which are above the maximum theoretical value, then we set :math:`\lambda_4^{NEW} = \lambda_5^{NEW} = \frac{\lambda_4^{OLD} + \lambda_5^{OLD}}{2}` + \lambda_2^{NEW} = \lambda_2^{OLD} - The new set of eigenvalues with the set of eigenvectors is used to obtain the new de-noised correlation matrix. + :math:`\tilde{C}` is the de-noised correlation matrix, :math:`W` is the eigenvectors matrix, + and :math:`\Lambda` is the diagonal matrix with new eigenvalues. + +.. math:: + + \tilde{C} = W \Lambda W' + +- To rescale :math:`\tilde{C}` so that the main diagonal consists of 1s the following transformation is made. + This is how the final :math:`C_{denoised}` is obtained. + +.. math:: + + C_{denoised} = \tilde{C} [(diag[\tilde{C}])^\frac{1}{2}(diag[\tilde{C}])^{\frac{1}{2}'}]^{-1} - The new correlation matrix is then transformed back to the new de-noised covariance matrix. (If the correlation matrix is given as an input, the first and the last steps of the algorithm are omitted) .. tip:: + The Constant Residual Eigenvalue de-noising method is described in more detail in the work + **A Robust Estimator of the Efficient Frontier** *by* Marcos Lopez de Prado `available here `_. + + Lopez de Prado suggests that this de-noising algorithm is preferable as it removes the noise while preserving the signal. 
+ +Targeted Shrinkage De-noising +***************************** - The de-noising algorithm is described in more detail in the work **A Robust Estimator of the Efficient Frontier** *by* Marcos Lopez de Prado `available here `_. +The main idea behind the Targeted Shrinkage de-noising method is to shrink the eigenvectors/eigenvalues that are +noise-related. This is done by shrinking the correlation matrix calculated from noise-related eigenvectors/eigenvalues +and then adding the correlation matrix composed from signal-related eigenvectors/eigenvalues. + +The de-noising function works as follows: + +- The given covariance matrix is transformed to the correlation matrix. + +- The eigenvalues and eigenvectors of the correlation matrix are calculated and sorted in the descending order. + +- Using the Kernel Density Estimate algorithm a kernel of the eigenvalues is estimated. + +- The Marcenko-Pastur pdf is fitted to the KDE estimate using the variance as the parameter for the optimization. + +- From the obtained Marcenko-Pastur distribution, the maximum theoretical eigenvalue is calculated using the formula + from the **Instability caused by noise** part of `A Robust Estimator of the Efficient Frontier `__. + +- The correlation matrix composed from eigenvectors and eigenvalues related to noise (eigenvalues below the maximum + theoretical eigenvalue) is shrunk using the :math:`\alpha` variable. + +.. math:: + + C_n = \alpha W_n \Lambda_n W_n' + (1 - \alpha) diag[W_n \Lambda_n W_n'] + +- The shrunk noise correlation matrix is added to the information (signal) correlation matrix. + +.. math:: + + C_i = W_i \Lambda_i W_i' + + C_{denoised} = C_n + C_i + +- The new correlation matrix is then transformed back to the new de-noised covariance matrix. 
+ +(If the correlation matrix is given as an input, the first and the last steps of the algorithm are omitted) + +De-toning +********* + +De-noised correlation matrix from the previous methods can also be de-toned by excluding a number of first +eigenvectors representing the market component. + +According to Lopez de Prado: + +"Financial correlation matrices usually incorporate a market component. The market component is characterized by the +first eigenvector, with loadings :math:`W_{n,1} \approx N^{-\frac{1}{2}}, n = 1, ..., N.` +Accordingly, a market component affects every item of the covariance matrix. In the context of clustering +applications, it is useful to remove the market component, if it exists (a hypothesis that can be +tested statistically)." + +"By removing the market component, we allow a greater portion of the correlation to be explained +by components that affect specific subsets of the securities. It is similar to removing a loud tone +that prevents us from hearing other sounds" + +"The detoned correlation matrix is singular, as a result of eliminating (at least) one eigenvector. +This is not a problem for clustering applications, as most approaches do not require the invertibility +of the correlation matrix. Still, **a detoned correlation matrix** :math:`C_{detoned}` **cannot be used directly for** +**mean-variance portfolio optimization**." + +The de-toning function works as follows: + +- De-toning is applied on the de-noised correlation matrix. + +- The correlation matrix representing the market component is calculated from market component eigenvectors and eigenvalues + and then subtracted from the de-noised correlation matrix. This way the de-toned correlation matrix is obtained. + +.. math:: + + \hat{C} = C_{denoised} - W_m \Lambda_m W_m' + +- De-toned correlation matrix :math:`\hat{C}` is then rescaled so that the main diagonal consists of 1s + +.. 
math:: + + C_{detoned} = \hat{C} [(diag[\hat{C}])^\frac{1}{2}(diag[\hat{C}])^{\frac{1}{2}'}]^{-1} + +.. tip:: + For a more detailed description of de-noising and de-toning, please read Chapter 2 of the book + **Machine Learning for Asset Managers** *by* Marcos Lopez de Prado. .. tip:: This and above the methods are described in more detail in the Risk Estimators Notebook. -Implementation -############## -.. automodule:: mlfinlab.portfolio_optimization.risk_estimators +Implementation +************** - .. autoclass:: RiskEstimators - :members: +.. autoclass:: RiskEstimators + :noindex: + :members: denoise_covariance - .. automethod:: __init__ +---- Example Code ############ -Below is an example of using the functions from the Risk Estimators module. - .. code-block:: import pandas as pd import numpy as np - from mlfinlab.portfolio_optimization import RiskEstimators + from mlfinlab.portfolio_optimization import RiskEstimators, ReturnsEstimators # Import price data - stock_prices = pd.read_csv(DATA_PATH, index_col='Date', parse_dates=True) + stock_prices = pd.read_csv(DATA_PATH, index_col='Date', parse_dates=True) - # A class that has needed functions + # Classes that have the needed functions risk_estimators = RiskEstimators() + returns_estimators = ReturnsEstimators() # Finding the MCD estimator on price data min_cov_det = risk_estimators.minimum_covariance_determinant(stock_prices, price_data=True) - + # Finding the Empirical Covariance on price data empirical_cov = risk_estimators.empirical_covariance(stock_prices, price_data=True) @@ -262,9 +437,27 @@ Below is an example of using the functions from the Risk Estimators module. 
# The bandwidth of the KDE kernel kde_bwidth = 0.01 - # Finding the De-noised Сovariance matrix - cov_matrix_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, - kde_bwidth) + # Series of returns from series of prices + stock_returns = returns_estimators.calculate_returns(stock_prices) + + # Finding the simple covariance matrix from a series of returns + cov_matrix = stock_returns.cov() + + # Finding the Constant Residual Eigenvalue De-noised Covariance matrix + const_resid_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, + denoise_method='const_resid_eigen', + detone=False, kde_bwidth=kde_bwidth) + + # Finding the Targeted Shrinkage De-noised Covariance matrix + targ_shrink_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, + denoise_method='target_shrink', + detone=False, kde_bwidth=kde_bwidth) + + # Finding the Constant Residual Eigenvalue De-noised and De-toned Covariance matrix + const_resid_detoned = risk_estimators.denoise_covariance(cov_matrix, tn_relation, + denoise_method='const_resid_eigen', + detone=True, market_component=1, + kde_bwidth=kde_bwidth) Research Notebooks ################## diff --git a/mlfinlab/portfolio_optimization/risk_estimators.py b/mlfinlab/portfolio_optimization/risk_estimators.py index 6b1c12b9e..0472fdb04 100644 --- a/mlfinlab/portfolio_optimization/risk_estimators.py +++ b/mlfinlab/portfolio_optimization/risk_estimators.py @@ -10,9 +10,9 @@ class RiskEstimators: """ This class contains the implementations for different ways to calculate and adjust Covariance matrices. - The functions related to de-noising the Covariance matrix are reproduced with modification from the following paper: - `Marcos Lopez de Prado “A Robust Estimator of the Efficient Frontier”, (2019). - `_. 
+ The functions related to de-noising and de-toning the Covariance matrix are reproduced with modification + from Chapter 2 of the following book: + Marcos Lopez de Prado “Machine Learning for Asset Managers”, (2020). """ def __init__(self): @@ -22,250 +22,6 @@ def __init__(self): return - @staticmethod - def _fit_kde(observations, kde_bwidth=0.01, kde_kernel='gaussian', eval_points=None): - """ - Fits kernel to a series of observations (in out case eigenvalues), and derives the - probability density function of observations. - - :param observations: (np.array) Array of observations (eigenvalues) eigenvalues to fit kernel to - :param kde_bwidth: (float) The bandwidth of the kernel - :param kde_kernel: (str) Kernel to use [‘gaussian’|’tophat’|’epanechnikov’|’exponential’|’linear’|’cosine’] - :param eval_points: (np.array) Array of values on which the fit of the KDE will be evaluated. - If None, the unique values of observations are used - :return: (pd.Series) Series with estimated pdf values in the eval_points - """ - - # Reshaping array to a vertical one - observations = observations.reshape(-1, 1) - - # Estimating Kernel Density of the empirical distribution of eigenvalues - kde = KernelDensity(kernel=kde_kernel, bandwidth=kde_bwidth).fit(observations) - - # If no specific values provided, the fit KDE will be valued on unique eigenvalues. - if eval_points is None: - eval_points = np.unique(observations).reshape(-1, 1) - - # If the input vector is one-dimensional, reshaping to a vertical one - if len(eval_points.shape) == 1: - eval_points = eval_points.reshape(-1, 1) - - # Evaluating the log density model on the given values - log_prob = kde.score_samples(eval_points) - - # Preparing the output of pdf values - pdf = pd.Series(np.exp(log_prob), index=eval_points.flatten()) - - return pdf - - @staticmethod - def _mp_pdf(var, tn_relation, num_points): - """ - Derives the pdf of the Marcenko-Pastur distribution. 
- - Outputs the pdf for num_points between the minimum and maximum expected eigenvalues. - Requires the variance of the distribution (var) and the relation of T - the number - of observations of each X variable to N - the number of X variables (T/N). - - :param var: (float) Variance of the M-P distribution - :param tn_relation: (float) Relation of sample length T to the number of variables N (T/N) - :param num_points: (int) Number of points to estimate pdf - :return: (pd.Series) Series of M-P pdf values - """ - - # Changing the type as scipy.optimize.minimize outputs np.array with one element to this function - if not isinstance(var, float): - var = float(var) - - # Minimum and maximum expected eigenvalues - eigen_min = var * (1 - (1 / tn_relation) ** (1 / 2)) ** 2 - eigen_max = var * (1 + (1 / tn_relation) ** (1 / 2)) ** 2 - - # Space of eigenvalues - eigen_space = np.linspace(eigen_min, eigen_max, num_points) - - # Marcenko-Pastur probability density function for eigen_space - pdf = tn_relation * ((eigen_max - eigen_space) * (eigen_space - eigen_min)) ** (1 / 2) / \ - (2 * np.pi * var * eigen_space) - pdf = pd.Series(pdf, index=eigen_space) - - return pdf - - def _pdf_fit(self, var, eigen_observations, tn_relation, kde_bwidth, num_points=1000): - """ - Calculates the fit (Sum of Squared estimate of Errors) of the empirical pdf - (kernel density estimation) to the theoretical pdf (Marcenko-Pastur distribution). - - SSE is calculated for num_points, equally spread between minimum and maximum - expected theoretical eigenvalues. 
- - :param var: (float) Variance of the M-P distribution (for the theoretical pdf) - :param eigen_observations: (np.array) Observed empirical eigenvalues (for the empirical pdf) - :param tn_relation: (float) Relation of sample length T to the number of variables N (for the theoretical pdf) - :param kde_bwidth: (float) The bandwidth of the kernel (for the empirical pdf) - :param num_points: (int) Number of points to estimate pdf (for the empirical pdf) - :return: (float) SSE between empirical pdf and theoretical pdf - """ - - # Calculating theoretical and empirical pdf - theoretical_pdf = self._mp_pdf(var, tn_relation, num_points) - empirical_pdf = self._fit_kde(eigen_observations, kde_bwidth, eval_points=theoretical_pdf.index.values) - - # Fit calculation - sse = np.sum((empirical_pdf - theoretical_pdf) ** 2) - - return sse - - def _find_max_eval(self, eigen_observations, tn_relation, kde_bwidth): - """ - Searching for maximum random eigenvalue by fitting Marcenko-Pastur distribution - to the empirical one - obtained through kernel density estimation. 
- - :param eigen_observations: (np.array) Observed empirical eigenvalues (for the empirical pdf) - :param tn_relation: (float) Relation of sample length T to the number of variables N (for the theoretical pdf) - :param kde_bwidth: (float) The bandwidth of the kernel (for the empirical pdf) - :return: (float, float) Maximum random eigenvalue, optimal variation of the Marcenko-Pastur distribution - """ - - # Searching for the variation of Marcenko-Pastur distribution for the best fit with the empirical distribution - optimization = minimize(self._pdf_fit, x0=np.array(0.5), args=(eigen_observations, tn_relation, kde_bwidth), - bounds=((1e-5, 1 - 1e-5),)) - - # The optimal solution found - var = optimization['x'][0] - - # Eigenvalue calculated as the maximum expected eigenvalue based on the input - maximum_eigen = var * (1 + (1 / tn_relation) ** (1 / 2)) ** 2 - - return maximum_eigen, var - - @staticmethod - def corr_to_cov(corr, std): - """ - Recovers the covariance matrix from a correlation matrix. - - :param corr: (np.array) Correlation matrix - :param std: (np.array) Vector of standard deviations - :return: (np.array) Covariance matrix - """ - - cov = corr * np.outer(std, std) - return cov - - @staticmethod - def cov_to_corr(cov): - """ - Derives the correlation matrix from a covariance matrix. - - :param cov: (np.array) Covariance matrix - :return: (np.array) Covariance matrix - """ - - # Calculating standard deviations of the elements - std = np.sqrt(np.diag(cov)) - - # Transforming to correlation matrix - corr = cov / np.outer(std, std) - - # Making sure correlation coefficients are in (-1, 1) range - corr[corr < -1], corr[corr > 1] = -1, 1 - - return corr - - @staticmethod - def _get_pca(hermit_matrix): - """ - Calculates eigenvalues and eigenvectors from a Hermitian matrix. In our case, from the correlation matrix. - - Eigenvalues in the output are on the main diagonal of a matrix. 
- - :param hermit_matrix: (np.array) Hermitian matrix - :return: (np.array, np.array) Eigenvalues matrix, eigenvectors array - """ - - # Calculating eigenvalues and eigenvectors - eigenvalues, eigenvectors = np.linalg.eigh(hermit_matrix) - - # Index to sort eigenvalues in descending order - indices = eigenvalues.argsort()[::-1] - - # Sorting - eigenvalues = eigenvalues[indices] - eigenvectors = eigenvectors[:, indices] - - # Outputting eigenvalues on the main diagonal of a matrix - eigenvalues = np.diagflat(eigenvalues) - - return eigenvalues, eigenvectors - - def _denoised_corr(self, eigenvalues, eigenvectors, num_facts): - """ - Shrinks the eigenvalues associated with noise, and returns a de-noised correlation matrix. - - Noise is removed from the correlation matrix by fixing random eigenvalues. - - :param eigenvalues: (np.array) Matrix with eigenvalues on the main diagonal - :param eigenvectors: (float) Eigenvectors array - :param num_facts: (float) Threshold for eigenvalues to be fixed - :return: (np.array) De-noised correlation matrix - """ - - # Vector of eigenvalues from the main diagonal of a matrix - eigenval_vec = np.diag(eigenvalues).copy() - - # Replacing eigenvalues after num_facts to their average value - eigenval_vec[num_facts:] = eigenval_vec[num_facts:].sum() / float(eigenval_vec.shape[0] - num_facts) - - # Back to eigenvalues on main diagonal of a matrix - eigenvalues = np.diag(eigenval_vec) - - # De-noised covariance matrix - cov = np.dot(eigenvectors, eigenvalues).dot(eigenvectors.T) - - # Ne-noised correlation matrix - corr = self.cov_to_corr(cov) - - return corr - - def denoise_covariance(self, cov, tn_relation, kde_bwidth=0.01): - """ - Computes a de-noised covariance/correlation matrix from a given covariance/correlation matrix. - - As a threshold for the denoising the correlation matrix, the maximum eigenvalue - that fits the theoretical distribution is used. 
- - This algorithm is reproduced with minor modifications from the following paper: - `Marcos Lopez de Prado “A Robust Estimator of the Efficient Frontier”, (2019). - `_. - - :param cov: (np.array) Covariance/correlation matrix - :param tn_relation: (float) Relation of sample length T to the number of variables N used to calculate the - covariance/correlation matrix. - :param kde_bwidth: (float) The bandwidth of the kernel to fit - :return: (np.array) De-noised covariance/correlation matrix - """ - - # Correlation matrix computation (if correlation matrix given, nothing changes) - corr = self.cov_to_corr(cov) - - # Calculating eigenvalues and eigenvectors - eigenval, eigenvec = self._get_pca(corr) - - # Calculating the maximum eigenvalue to fit the theoretical distribution - maximum_eigen, _ = self._find_max_eval(np.diag(eigenval), tn_relation, kde_bwidth) - - # Calculating the threshold of eigenvalues that fit the theoretical distribution - # from our set of eigenvalues - num_facts = eigenval.shape[0] - np.diag(eigenval)[::-1].searchsorted(maximum_eigen) - - # Based on the threshold, de-noising the correlation matrix - corr = self._denoised_corr(eigenval, eigenvec, num_facts) - - # Calculating the covariance matrix from the de-noised correlation matrix - cov_denoised = self.corr_to_cov(corr, np.diag(cov) ** (1 / 2)) - - return cov_denoised - @staticmethod def minimum_covariance_determinant(returns, price_data=False, assume_centered=False, support_fraction=None, random_state=None): @@ -286,12 +42,13 @@ def minimum_covariance_determinant(returns, price_data=False, assume_centered=Fa the calculate_returns method from the ReturnsEstimators class. :param returns: (pd.DataFrame) Dataframe where each column is a series of returns or prices for an asset. - :param price_data: (bool) Flag if prices of assets are used and not returns. - :param assume_centered: (bool) Flag for data with mean significantly equal to zero - (Read the documentation for MinCovDet class). 
+ :param price_data: (bool) Flag if prices of assets are used and not returns. (False by default) + :param assume_centered: (bool) Flag for data with mean significantly equal to zero. + (Read the documentation for MinCovDet class, False by default) :param support_fraction: (float) Values between 0 and 1. The proportion of points to be included in the support - of the raw MCD estimate (Read the documentation for MinCovDet class). - :param random_state: (int) Seed used by the random number generator. + of the raw MCD estimate. (Read the documentation for MinCovDet class, + None by default) + :param random_state: (int) Seed used by the random number generator. (None by default) :return: (np.array) Estimated robust covariance matrix. """ @@ -329,9 +86,9 @@ def empirical_covariance(returns, price_data=False, assume_centered=False): the calculate_returns method from the ReturnsEstimators class. :param returns: (pd.DataFrame) Dataframe where each column is a series of returns or prices for an asset. - :param price_data: (bool) Flag if prices of assets are used and not returns. - :param assume_centered: (bool) Flag for data with mean almost, but not exactly zero - (Read documentation for EmpiricalCovariance class). + :param price_data: (bool) Flag if prices of assets are used and not returns. (False by default) + :param assume_centered: (bool) Flag for data with mean almost, but not exactly zero. + (Read documentation for EmpiricalCovariance class, False by default) :return: (np.array) Estimated covariance matrix. """ @@ -370,12 +127,13 @@ def shrinked_covariance(returns, price_data=False, shrinkage_type='basic', assum the calculate_returns method from the ReturnsEstimators class. :param returns: (pd.DataFrame) Dataframe where each column is a series of returns or prices for an asset. - :param price_data: (bool) Flag if prices of assets are used and not returns. - :param shrinkage_type: (str) Type of shrinkage to use ('basic','lw','oas','all'). 
- :param assume_centered: (bool) Flag for data with mean almost, but not exactly zero - (Read documentation for chosen shrinkage class). + :param price_data: (bool) Flag if prices of assets are used and not returns. (False by default) + :param shrinkage_type: (str) Type of shrinkage to use. (``basic`` by default, ``lw``, ``oas``, ``all``) + :param assume_centered: (bool) Flag for data with mean almost, but not exactly zero. + (Read documentation for chosen shrinkage class, False by default) :param basic_shrinkage: (float) Between 0 and 1. Coefficient in the convex combination for basic shrinkage. - :return: (np.array) Estimated covariance matrix. Tuple of covariance matrices if shrinkage_type = 'all'. + (0.1 by default) + :return: (np.array) Estimated covariance matrix. Tuple of covariance matrices if shrinkage_type = ``all``. """ # Calculating the series of returns from series of prices @@ -416,9 +174,9 @@ def semi_covariance(returns, price_data=False, threshold_return=0): the calculate_returns method from the ReturnsEstimators class. :param returns: (pd.DataFrame) Dataframe where each column is a series of returns or prices for an asset. - :param price_data: (bool) Flag if prices of assets are used and not returns. - :param threshold_return: (float) Required return for each period in the frequency of the input data - (If the input data is daily, it's a daily threshold return). + :param price_data: (bool) Flag if prices of assets are used and not returns. (False by default) + :param threshold_return: (float) Required return for each period in the frequency of the input data. + (If the input data is daily, it's a daily threshold return, 0 by default) :return: (np.array) Semi-Covariance matrix. """ @@ -469,8 +227,9 @@ def exponential_covariance(returns, price_data=False, window_span=60): the calculate_returns method from the ReturnsEstimators class. :param returns: (pd.DataFrame) Dataframe where each column is a series of returns or prices for an asset. 
- :param price_data: (bool) Flag if prices of assets are used and not returns. + :param price_data: (bool) Flag if prices of assets are used and not returns. (False by default) :param window_span: (int) Used to specify decay in terms of span for the exponentially-weighted series. + (60 by default) :return: (np.array) Exponentially-weighted Covariance matrix. """ @@ -502,3 +261,393 @@ def exponential_covariance(returns, price_data=False, window_span=60): cov_matrix.iloc[row_number, column_number] = ew_ma[-1] return cov_matrix + + def denoise_covariance(self, cov, tn_relation, denoise_method='const_resid_eigen', detone=False, + market_component=1, kde_bwidth=0.01, alpha=0): + """ + De-noises the covariance matrix or the correlation matrix. + + Two denoising methods are supported: + 1. Constant Residual Eigenvalue Method (``const_resid_eigen``) + 2. Targeted Shrinkage Method (``target_shrink``) + + The Constant Residual Eigenvalue Method works as follows: + + First, a correlation is calculated from the covariance matrix (if the input is the covariance matrix). + + Second, eigenvalues and eigenvectors of the correlation matrix are calculated using the linalg.eigh + function from numpy package. + + Third, a maximum theoretical eigenvalue is found by fitting Marcenko-Pastur (M-P) distribution + to the empirical distribution of the correlation matrix eigenvalues. The empirical distribution + is obtained through kernel density estimation using the KernelDensity class from sklearn. + The fit of the M-P distribution is done by minimizing the Sum of Squared estimate of Errors + between the theoretical pdf and the kernel. The minimization is done by adjusting the variation + of the M-P distribution. + + Fourth, the eigenvalues of the correlation matrix are sorted and the eigenvalues lower than + the maximum theoretical eigenvalue are set to their average value. This is how the eigenvalues + associated with noise are shrinked. 
The de-noised covariance matrix is then calculated back + from new eigenvalues and eigenvectors. + + The Targeted Shrinkage Method works as follows: + + First, a correlation matrix is calculated from the covariance matrix (if the input is the covariance matrix). + + Second, eigenvalues and eigenvectors of the correlation matrix are calculated using the linalg.eigh + function from numpy package. + + Third, the correlation matrix composed from eigenvectors and eigenvalues related to noise is + shrunk using the alpha variable. The shrinkage is done by summing the noise correlation matrix + multiplied by alpha to the diagonal of the noise correlation matrix multiplied by (1-alpha). + + Fourth, the shrunk noise correlation matrix is added to the information correlation matrix. + + Correlation matrix can also be detoned by excluding a number of first eigenvectors representing + the market component. + + These algorithms are reproduced with minor modifications from the following book: + Marcos Lopez de Prado “Machine Learning for Asset Managers”, (2020). + + :param cov: (np.array) Covariance matrix or correlation matrix. + :param tn_relation: (float) Relation of sample length T to the number of variables N used to calculate the + covariance matrix. + :param denoise_method: (str) Denoising method to use. (``const_resid_eigen`` by default, ``target_shrink``) + :param detone: (bool) Flag to detone the matrix. (False by default) + :param market_component: (int) Number of first eigenvectors related to a market component. (1 by default) + :param kde_bwidth: (float) The bandwidth of the kernel to fit KDE. (0.01 by default) + :param alpha: (float) In range (0 to 1) - shrinkage of the noise correlation matrix to use in the + Targeted Shrinkage Method. (0 by default) + :return: (np.array) De-noised covariance matrix or correlation matrix.
+ + """ + + # Correlation matrix computation (if correlation matrix given, nothing changes) + corr = self.cov_to_corr(cov) + + # Calculating eigenvalues and eigenvectors + eigenval, eigenvec = self._get_pca(corr) + + # Calculating the maximum eigenvalue to fit the theoretical distribution + maximum_eigen, _ = self._find_max_eval(np.diag(eigenval), tn_relation, kde_bwidth) + + # Calculating the threshold of eigenvalues that fit the theoretical distribution + # from our set of eigenvalues + num_facts = eigenval.shape[0] - np.diag(eigenval)[::-1].searchsorted(maximum_eigen) + + if denoise_method == 'target_shrink': + # Based on the threshold, de-noising the correlation matrix + corr = self._denoised_corr_targ_shrink(eigenval, eigenvec, num_facts, alpha) + else: # Default const_resid_eigen method + # Based on the threshold, de-noising the correlation matrix + corr = self._denoised_corr(eigenval, eigenvec, num_facts) + + # Detone the correlation matrix if needed + if detone: + corr = self._detoned_corr(corr, eigenval, eigenvec, num_facts, market_component) + + # Calculating the covariance matrix from the de-noised correlation matrix + cov_denoised = self.corr_to_cov(corr, np.diag(cov) ** (1 / 2)) + + return cov_denoised + + @staticmethod + def corr_to_cov(corr, std): + """ + Recovers the covariance matrix from a correlation matrix. + + Requires a vector of standard deviations of variables - square root + of elements on the main diagonal of the covariance matrix. + + Formula used: Cov = Corr * OuterProduct(std, std) + + :param corr: (np.array) Correlation matrix. + :param std: (np.array) Vector of standard deviations. + :return: (np.array) Covariance matrix. + """ + + cov = corr * np.outer(std, std) + return cov + + @staticmethod + def cov_to_corr(cov): + """ + Derives the correlation matrix from a covariance matrix. + + Formula used: Corr = Cov / OuterProduct(std, std) + + :param cov: (np.array) Covariance matrix. + :return: (np.array) Correlation matrix.
+ """ + + # Calculating standard deviations of the elements + std = np.sqrt(np.diag(cov)) + + # Transforming to correlation matrix + corr = cov / np.outer(std, std) + + # Making sure correlation coefficients are in [-1, 1] range + corr[corr < -1], corr[corr > 1] = -1, 1 + + return corr + + @staticmethod + def _fit_kde(observations, kde_bwidth=0.01, kde_kernel='gaussian', eval_points=None): + """ + Fits kernel to a series of observations (in our case eigenvalues), and derives the + probability density function of observations. + + The function used to fit kernel is KernelDensity from sklearn.neighbors. Fit of the KDE + can be evaluated on a given set of points, passed as eval_points variable. + + :param observations: (np.array) Array of observations (eigenvalues) to fit kernel to. + :param kde_bwidth: (float) The bandwidth of the kernel. (0.01 by default) + :param kde_kernel: (str) Kernel to use [``gaussian`` by default, ``tophat``, ``epanechnikov``, ``exponential``, + ``linear``, ``cosine``]. + :param eval_points: (np.array) Array of values on which the fit of the KDE will be evaluated. + If None, the unique values of observations are used. (None by default) + :return: (pd.Series) Series with estimated pdf values in the eval_points. + """ + + # Reshaping array to a vertical one + observations = observations.reshape(-1, 1) + + # Estimating Kernel Density of the empirical distribution of eigenvalues + kde = KernelDensity(kernel=kde_kernel, bandwidth=kde_bwidth).fit(observations) + + # If no specific values provided, the fitted KDE will be evaluated on unique eigenvalues.
+ if eval_points is None: + eval_points = np.unique(observations).reshape(-1, 1) + + # If the input vector is one-dimensional, reshaping to a vertical one + if len(eval_points.shape) == 1: + eval_points = eval_points.reshape(-1, 1) + + # Evaluating the log density model on the given values + log_prob = kde.score_samples(eval_points) + + # Preparing the output of pdf values + pdf = pd.Series(np.exp(log_prob), index=eval_points.flatten()) + + return pdf + + @staticmethod + def _mp_pdf(var, tn_relation, num_points): + """ + Derives the pdf of the Marcenko-Pastur distribution. + + Outputs the pdf for num_points between the minimum and maximum expected eigenvalues. + Requires the variance of the distribution (var) and the relation of T - the number + of observations of each X variable to N - the number of X variables (T/N). + + :param var: (float) Variance of the M-P distribution. + :param tn_relation: (float) Relation of sample length T to the number of variables N (T/N). + :param num_points: (int) Number of points to estimate pdf. + :return: (pd.Series) Series of M-P pdf values. 
+ """ + + # Changing the type as scipy.optimize.minimize outputs np.array with one element to this function + if not isinstance(var, float): + var = float(var) + + # Minimum and maximum expected eigenvalues + eigen_min = var * (1 - (1 / tn_relation) ** (1 / 2)) ** 2 + eigen_max = var * (1 + (1 / tn_relation) ** (1 / 2)) ** 2 + + # Space of eigenvalues + eigen_space = np.linspace(eigen_min, eigen_max, num_points) + + # Marcenko-Pastur probability density function for eigen_space + pdf = tn_relation * ((eigen_max - eigen_space) * (eigen_space - eigen_min)) ** (1 / 2) / \ + (2 * np.pi * var * eigen_space) + pdf = pd.Series(pdf, index=eigen_space) + + return pdf + + def _pdf_fit(self, var, eigen_observations, tn_relation, kde_bwidth, num_points=1000): + """ + Calculates the fit (Sum of Squared estimate of Errors) of the empirical pdf + (kernel density estimation) to the theoretical pdf (Marcenko-Pastur distribution). + + SSE is calculated for num_points, equally spread between minimum and maximum + expected theoretical eigenvalues. + + :param var: (float) Variance of the M-P distribution. (for the theoretical pdf) + :param eigen_observations: (np.array) Observed empirical eigenvalues. (for the empirical pdf) + :param tn_relation: (float) Relation of sample length T to the number of variables N. (for the theoretical pdf) + :param kde_bwidth: (float) The bandwidth of the kernel. (for the empirical pdf) + :param num_points: (int) Number of points to estimate pdf. (for the empirical pdf, 1000 by default) + :return: (float) SSE between empirical pdf and theoretical pdf. 
+ """ + + # Calculating theoretical and empirical pdf + theoretical_pdf = self._mp_pdf(var, tn_relation, num_points) + empirical_pdf = self._fit_kde(eigen_observations, kde_bwidth, eval_points=theoretical_pdf.index.values) + + # Fit calculation + sse = np.sum((empirical_pdf - theoretical_pdf) ** 2) + + return sse + + def _find_max_eval(self, eigen_observations, tn_relation, kde_bwidth): + """ + Searches for the maximum random eigenvalue by fitting Marcenko-Pastur distribution + to the empirical one - obtained through kernel density estimation. The fit is done by + minimizing the Sum of Squared estimate of Errors between the theoretical pdf and the + kernel fit. The minimization is done by adjusting the variance of the M-P distribution. + + :param eigen_observations: (np.array) Observed empirical eigenvalues. (for the empirical pdf) + :param tn_relation: (float) Relation of sample length T to the number of variables N. (for the theoretical pdf) + :param kde_bwidth: (float) The bandwidth of the kernel. (for the empirical pdf) + :return: (float, float) Maximum random eigenvalue, optimal variance of the Marcenko-Pastur distribution. + """ + + # Searching for the variance of Marcenko-Pastur distribution for the best fit with the empirical distribution + optimization = minimize(self._pdf_fit, x0=np.array(0.5), args=(eigen_observations, tn_relation, kde_bwidth), + bounds=((1e-5, 1 - 1e-5),)) + + # The optimal solution found + var = optimization['x'][0] + + # Eigenvalue calculated as the maximum expected eigenvalue based on the input + maximum_eigen = var * (1 + (1 / tn_relation) ** (1 / 2)) ** 2 + + return maximum_eigen, var + + @staticmethod + def _get_pca(hermit_matrix): + """ + Calculates eigenvalues and eigenvectors from a Hermitian matrix. In our case, from the correlation matrix. + + Function used to calculate the eigenvalues and eigenvectors is linalg.eigh from numpy package. + + Eigenvalues in the output are placed on the main diagonal of a matrix.
+ + :param hermit_matrix: (np.array) Hermitian matrix. + :return: (np.array, np.array) Eigenvalues matrix, eigenvectors array. + """ + + # Calculating eigenvalues and eigenvectors + eigenvalues, eigenvectors = np.linalg.eigh(hermit_matrix) + + # Index to sort eigenvalues in descending order + indices = eigenvalues.argsort()[::-1] + + # Sorting + eigenvalues = eigenvalues[indices] + eigenvectors = eigenvectors[:, indices] + + # Outputting eigenvalues on the main diagonal of a matrix + eigenvalues = np.diagflat(eigenvalues) + + return eigenvalues, eigenvectors + + def _denoised_corr(self, eigenvalues, eigenvectors, num_facts): + """ + De-noises the correlation matrix using the Constant Residual Eigenvalue method. + + The input is the eigenvalues and the eigenvectors of the correlation matrix and the number + of the first eigenvalue that is below the maximum theoretical eigenvalue. + + De-noising is done by shrinking the eigenvalues associated with noise (the eigenvalues lower than + the maximum theoretical eigenvalue are set to a constant eigenvalue, preserving the trace of the + correlation matrix). + + The result is the de-noised correlation matrix. + + :param eigenvalues: (np.array) Matrix with eigenvalues on the main diagonal. + :param eigenvectors: (np.array) Eigenvectors array. + :param num_facts: (int) Threshold for eigenvalues to be fixed. + :return: (np.array) De-noised correlation matrix.
+ """ + + # Vector of eigenvalues from the main diagonal of a matrix + eigenval_vec = np.diag(eigenvalues).copy() + + # Replacing eigenvalues after num_facts with their average value + eigenval_vec[num_facts:] = eigenval_vec[num_facts:].sum() / float(eigenval_vec.shape[0] - num_facts) + + # Back to eigenvalues on main diagonal of a matrix + eigenvalues = np.diag(eigenval_vec) + + # De-noised correlation matrix + corr = np.dot(eigenvectors, eigenvalues).dot(eigenvectors.T) + + # Rescaling the correlation matrix to have 1s on the main diagonal + corr = self.cov_to_corr(corr) + + return corr + + @staticmethod + def _denoised_corr_targ_shrink(eigenvalues, eigenvectors, num_facts, alpha=0): + """ + De-noises the correlation matrix using the Targeted Shrinkage method. + + The input is the eigenvalues and the eigenvectors of the correlation matrix, the number + of the first eigenvalue that is below the maximum theoretical eigenvalue, and the + shrinkage coefficient for the eigenvectors and eigenvalues associated with noise. + + Shrinks strictly the random eigenvalues - eigenvalues below the maximum theoretical eigenvalue. + + The result is the de-noised correlation matrix. + + :param eigenvalues: (np.array) Matrix with eigenvalues on the main diagonal. + :param eigenvectors: (np.array) Eigenvectors array. + :param num_facts: (int) Threshold for eigenvalues to be fixed. + :param alpha: (float) In range (0 to 1) - shrinkage among the eigenvectors + and eigenvalues associated with noise. (0 by default) + :return: (np.array) De-noised correlation matrix.
+ """ + + # Getting the eigenvalues and eigenvectors related to signal + eigenvalues_signal = eigenvalues[:num_facts, :num_facts] + eigenvectors_signal = eigenvectors[:, :num_facts] + + # Getting the eigenvalues and eigenvectors related to noise + eigenvalues_noise = eigenvalues[num_facts:, num_facts:] + eigenvectors_noise = eigenvectors[:, num_facts:] + + # Calculating the correlation matrix from eigenvalues associated with signal + corr_signal = np.dot(eigenvectors_signal, eigenvalues_signal).dot(eigenvectors_signal.T) + + # Calculating the correlation matrix from eigenvalues associated with noise + corr_noise = np.dot(eigenvectors_noise, eigenvalues_noise).dot(eigenvectors_noise.T) + + # Calculating the De-noised correlation matrix + corr = corr_signal + alpha * corr_noise + (1 - alpha) * np.diag(np.diag(corr_noise)) + + return corr + + def _detoned_corr(self, corr, eigenvalues, eigenvectors, num_facts, market_component=1): + """ + De-tones the correlation matrix by removing the market component. + + The input is the eigenvalues and the eigenvectors of the correlation matrix and the number + of the first eigenvalue that is above the maximum theoretical eigenvalue and the number of + eigenvectors related to a market component. + + :param corr: (np.array) Correlation matrix to detone. + :param eigenvalues: (np.array) Matrix with eigenvalues on the main diagonal. + :param eigenvectors: (np.array) Eigenvectors array. + :param num_facts: (int) Threshold for eigenvalues to be fixed. + :param market_component: (int) Number of first eigenvectors related to a market component. (1 by default) + :return: (np.array) De-toned correlation matrix.
+ """ + + # Getting the de-noised correlation matrix + corr = self._denoised_corr(eigenvalues, eigenvectors, num_facts) + + # Getting the eigenvalues and eigenvectors related to market component + eigenvalues_mark = eigenvalues[:market_component, :market_component] + eigenvectors_mark = eigenvectors[:, :market_component] + + # Calculating the market component correlation + corr_mark = np.dot(eigenvectors_mark, eigenvalues_mark).dot(eigenvectors_mark.T) + + # Removing the market component from the de-noised correlation matrix + corr = corr - corr_mark + + # Rescaling the correlation matrix to have 1s on the main diagonal + corr = self.cov_to_corr(corr) + + return corr diff --git a/mlfinlab/tests/test_risk_estimators.py b/mlfinlab/tests/test_risk_estimators.py index cf63b6735..04979b4a4 100644 --- a/mlfinlab/tests/test_risk_estimators.py +++ b/mlfinlab/tests/test_risk_estimators.py @@ -211,12 +211,71 @@ def test_denoised_corr(): [0.13353165, 1, -0.21921986], [-0.13353165, -0.21921986, 1]]) - # Finding the eigenvalues + # Finding the de-noised correlation matrix corr_matrix = risk_estimators._denoised_corr(eigenvalues, eigenvectors, 1) # Testing if the de-noised correlation matrix is right np.testing.assert_almost_equal(corr_matrix, expected_corr, decimal=4) + @staticmethod + def test_denoised_corr_targ_shrink(): + """ + Test the second method of shrinkage of the eigenvalues associated with noise. 
+ """ + + risk_estimators = RiskEstimators() + + # Eigenvalues and eigenvectors to use + eigenvalues = np.array([[1.3562, 0, 0], + [0, 0.9438, 0], + [0, 0, 0.7]]) + eigenvectors = np.array([[-3.69048184e-01, -9.29410263e-01, 1.10397126e-16], + [-6.57192300e-01, 2.60956474e-01, 7.07106781e-01], + [6.57192300e-01, -2.60956474e-01, 7.07106781e-01]]) + + # Expected correlation matrix + expected_corr = np.array([[1, 0.32892949, -0.32892949], + [0.32892949, 1, -0.58573558], + [-0.32892949, -0.58573558, 1]]) + + # Finding the de-noised correlation matrix + corr_matrix = risk_estimators._denoised_corr_targ_shrink(eigenvalues, eigenvectors, 1) + + # Testing if the de-noised correlation matrix is right + np.testing.assert_almost_equal(corr_matrix, expected_corr, decimal=4) + + @staticmethod + def test_detoned(): + """ + Test the de-toning of the correlation matrix. + """ + + risk_estimators = RiskEstimators() + + # Correlation matrix to use + corr = np.array([[1, 0.1, -0.1], + [0.1, 1, -0.3], + [-0.1, -0.3, 1]]) + + # Eigenvalues and eigenvectors to use + eigenvalues = np.array([[1.3562, 0, 0], + [0, 0.9438, 0], + [0, 0, 0.7]]) + eigenvectors = np.array([[-3.69048184e-01, -9.29410263e-01, 1.10397126e-16], + [-6.57192300e-01, 2.60956474e-01, 7.07106781e-01], + [6.57192300e-01, -2.60956474e-01, 7.07106781e-01]]) + + # Expected correlation matrix + expected_corr = np.array([[1, -0.33622026, 0.33622026], + [-0.33622026, 1, 0.88478197], + [0.33622026, 0.88478197, 1]]) + + # Finding the de-toned correlation matrix + corr_matrix = risk_estimators._detoned_corr(corr, eigenvalues, eigenvectors, 1) + + # Testing if the de-toned correlation matrix is right + np.testing.assert_almost_equal(corr_matrix, expected_corr, decimal=4) + @staticmethod def test_denoise_covariance(): """ @@ -231,18 +290,47 @@ def test_denoise_covariance(): [-0.001, -0.006, 0.01]]) tn_relation = 50 kde_bwidth = 0.25 + alpha = 0.2 + denoise_method = 'const_resid_eigen' + denoise_method_alt = 'target_shrink' + 
detone = False + detone_alt = True + market_component = 1 # Expected de-noised covariance matrix expected_cov = np.array([[0.01, 0.00267029, -0.00133514], [0.00267029, 0.04, -0.00438387], [-0.00133514, -0.00438387, 0.01]]) + expected_cov_alt = np.array([[0.01, 0.0057, -0.0028], + [0.0057, 0.04, -0.0106], + [-0.0028, -0.0106, 0.01]]) + + expected_cov_detoned = np.array([[0.01, -0.00672445, 0.00336222], + [-0.00672445, 0.04, 0.01769514], + [0.00336222, 0.01769514, 0.01]]) + # Finding the de-noised covariance matrix - cov_matrix_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, kde_bwidth) + cov_matrix_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, denoise_method, detone, + market_component, kde_bwidth) + + # Using the alternative de-noising method + cov_matrix_denoised_alt = risk_estimators.denoise_covariance(cov_matrix, tn_relation, denoise_method_alt, + detone, market_component, kde_bwidth, alpha) + + # Finding the de-toned covariance matrix + cov_matrix_detoned = risk_estimators.denoise_covariance(cov_matrix, tn_relation, denoise_method, detone_alt, + market_component, kde_bwidth) # Testing if the de-noised covariance matrix is right np.testing.assert_almost_equal(cov_matrix_denoised, expected_cov, decimal=4) + # Testing if the de-noised covariance matrix is right + np.testing.assert_almost_equal(cov_matrix_denoised_alt, expected_cov_alt, decimal=4) + + # Testing if the de-toned covariance matrix is right + np.testing.assert_almost_equal(cov_matrix_detoned, expected_cov_detoned, decimal=4) + def test_minimum_covariance_determinant(self): """ Test the calculation of the Minimum Covariance Determinant.