Output is different when n_samples < 100 in MATLAB and R #89

Open
scottgigante opened this issue Mar 28, 2020 · 0 comments

@scottgigante
Collaborator

HML 9:57 AM
Hello, I first want to say how impressed I am with the PHATE method! I was really excited when I came across it a few months ago and am currently using PHATE on a microbiome metagenomics dataset. Up until yesterday I was using PHATE in R, but I decided to move to Matlab since I will be working with a larger dataset that is too big for R. My initial dataset was 384 samples by 18534 open reading frames (ORFs), and I would subsample it to look at certain timepoints; that matrix would be 55 samples by 18534 ORFs. I was able to run this and get PHATE images out in R. However, when I tried to re-run the same analysis in Matlab yesterday, I kept getting the following error messages:
Error using randPCA (line 186)
Input 2 must be <= the smallest dimension of Input 1.
Error in svdpca (line 17)
[U,S,~] = randPCA(X', k);
Error in phate (line 182)
pc = svdpca(data, npca, 'random');
9:59
When I increased the number of samples in the matrix to 110 instead of 55 I was able to get a result out and avoid the error (however I would like to recapitulate the results I have in R with the 55 samples). I tried to go into the code and figure out what was different about the Matlab version vs the R version but was having some difficulty doing so. Is there a way I can get around this in Matlab by potentially changing some of the parameters? Can you also please explain the meaning of this error message? Thank you for your time!
10:01
*I wanted to verify that I get the same results from Matlab as I did in R to make sure I understood how Matlab PHATE was working before running it on my new dataset. The new dataset will be in the range of 384 samples by 110,000 ORFs (rather than 384 samples by 18,000 ORFs)

Scott Gigante 10:02 AM
Hi @HML, this is a bug in the MATLAB code -- we shouldn't be running PCA if n_pca >= n_samples. I've fixed this on dev; alternatively, you can just set npca=[].
10:02
Thanks for reporting!
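
For readers following along, here is a minimal sketch of the kind of guard described above: skip the PCA step when the requested number of components is not smaller than the smallest matrix dimension. It is written in Python for illustration only and is not the actual fix on dev. The function name `choose_n_pca` is hypothetical, and the default of 100 components is an assumption inferred from the 100-sample threshold mentioned in this thread.

```python
from typing import Optional

# Assumed default, inferred from the "fewer than 100 data points" remark in this thread.
DEFAULT_N_PCA = 100


def choose_n_pca(n_samples: int, n_features: int,
                 n_pca: Optional[int] = DEFAULT_N_PCA) -> Optional[int]:
    """Return the number of PCA components to use, or None to skip PCA.

    Hypothetical helper: a rank-k decomposition needs k to be no larger than
    the smallest dimension of the data matrix, and the thread states PCA
    should not run at all when n_pca >= n_samples.
    """
    if n_pca is None:
        # Caller explicitly disabled PCA (the analogue of npca=[]).
        return None
    if n_pca >= min(n_samples, n_features):
        # Too few samples (or features) for the requested rank: skip PCA.
        return None
    return n_pca


print(choose_n_pca(55, 18534))   # None: 100 components requested, only 55 samples
print(choose_n_pca(110, 18534))  # 100: enough samples, PCA runs as usual
```

With the shapes from this report, the 55-sample matrix falls into the skip branch, while the 110-sample matrix keeps all 100 components, which matches the behaviour described in the thread.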

HML 10:03 AM
Great, thank you! Just to confirm, would the code line be phate(input_matrix, npca=[])?

Scott Gigante 10:03 AM
I believe so, yes (though I'm not fluent in MATLAB so let me know if that gives you an error!)
10:04
Note that this would only be for the case when you have fewer than 100 data points.

HML 10:14 AM
Got it. OK, so it does run now. I created a new function (copied and pasted the original script for phate.m and manually edited it to set npca=[]), but the image is definitely somewhat different from the one I got in the R version.
10:16
When I ran the matrix with 110 samples in R and Matlab I got the exact same image out
10:18
I have highlighted the points that correspond to each other: Matlab on the left, R on the right
image

10:19
*The image is for the dataset with the 55 samples
