Different time series for each time course #8
Comments
Hey, I'm not sure how to "align" the time points using these approaches. However, I'd point out that MOHGP clusters output variables; in this case that would tell you which of your 10 metrics follow similar trends over time. Your problem is to find groups of patients whose trajectories across these 10 metrics are similar, which corresponds more to the OMGP model. It should work for 10D output, but might be slow. If you have "aligned" times for the patients, you could run OMGP, find each patient's assignment to an individual trend, and then check whether any particular kind of patient is enriched in one trend compared to another (see the sketch below).
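A minimal sketch of that workflow, assuming aligned times and the GPclust OMGP API; the constructor arguments, the `phi` responsibility matrix, and the `subtype` labels are assumptions for illustration, not taken from this thread:

```python
import numpy as np
from GPclust import OMGP

# Assumed layout: every row is one observation (time, value) pooled over all
# patients, with a parallel array recording which patient each row came from.
times = np.random.uniform(0, 120, size=200)[:, None]    # placeholder times
values = np.random.randn(200, 1)                         # placeholder metric values
patient = np.random.randint(0, 30, size=200)             # which patient each row is from

m = OMGP(times, values, K=3)   # assume 3 latent trends
m.optimize()

# phi is assumed to hold the (N, K) responsibilities over observations;
# assign each observation to its most probable trend, then vote per patient.
obs_trend = m.phi.argmax(axis=1)
patient_trend = np.array([np.bincount(obs_trend[patient == p], minlength=3).argmax()
                          for p in range(30)])

# hypothetical patient subtype labels, to test for enrichment per trend
subtype = np.random.randint(0, 2, size=30)
for k in range(3):
    print(k, subtype[patient_trend == k].mean())
```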
Thanks for the input @vals! I have also given some thought to what we should do for unaligned inputs. There seem to be two options.
Just need a bit of advice. I've just tried out using the Fixed kernel, and I think I've seen this issue before, but I've forgotten what's going on and am wondering if someone has an explanation: http://nbviewer.jupyter.org/gist/lionfish0/f243356da422c8f2b59b353f42e4d00d Why does predict return a list of variances (one for each value in the training data)? Also, plot no longer works with a Fixed kernel. Picking the top value of the variance list seems to produce a plausible plot (see the linked notebook), but I would obviously like to know what's going on! Thanks, Mike.
hi Mike, here's the correct way to plot with that nasty fixed kernel:

```python
import GPy

k0 = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=1.)
kfix = GPy.kern.Fixed(input_dim=1, covariance_matrix=CM)  # CM is the precomputed covariance matrix
newkernel = k0 + kfix

m = GPy.models.GPRegression(X, Y, newkernel)
m.optimize()

# plot using only the RBF part of the kernel for prediction
m.plot(predict_kw=dict(kern=k0))
```

The reason the returned variance is weird is that the Fixed kernel is not really a kernel at all: it just returns a 'fixed' matrix each time you call it. So at predict time it hands you a matrix of completely the wrong shape (since the kernel does not know whether you're training or predicting!). This is also why the plotting breaks.
Ah, I understand now! I didn't know the trick of selecting which kernel to use for prediction/plotting. For reference (for future users), it looks like I can do the same with the predict function by using the 'kern' parameter (a sketch is shown below).
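The original snippet was lost from this comment; a minimal sketch of what that call presumably looks like, assuming the model `m` and kernel `k0` from the previous comment (`Xnew` here is a hypothetical grid of test inputs):

```python
import numpy as np

# hypothetical test inputs spanning the training range
Xnew = np.linspace(X.min(), X.max(), 200)[:, None]

# GPy's GPRegression.predict accepts a 'kern' argument, so prediction can use
# only the RBF component and ignore the Fixed kernel's training-sized matrix
mean, var = m.predict(Xnew, kern=k0)
```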
Hey, sorry for going on about the OMGP :p, but do you want the must-link constraints on patients described here? http://www.ece.neu.edu/fac-ece/jdy/papers/ross-dy-ICML2013.pdf The MRF constraints are not implemented here, though.
Can you model your problem as count data? Then each person wouldn't need the same X. Just a thought. I have a similar problem: I have thousands of count data streams representing sales data, i.e. the dataset records how many units of product k are sold at time t. It is very sparse, in that on most days no product is sold. So I want to model it as count data and then cluster the GPs that model each count stream (see the sketch below).
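A minimal sketch of one way to fit a GP to a single sparse count stream with GPy, assuming a Poisson likelihood with the Laplace approximation; the data are placeholders, and clustering the resulting per-stream GPs is a separate step not shown:

```python
import numpy as np
import GPy

# one count stream: irregular observation days and sparse sales counts
days = np.sort(np.random.uniform(0, 365, 50))[:, None]
counts = np.random.poisson(0.3, size=(50, 1))   # mostly zeros

kernel = GPy.kern.RBF(input_dim=1)
likelihood = GPy.likelihoods.Poisson()
inference = GPy.inference.latent_function_inference.Laplace()

# GP with a non-Gaussian (Poisson) likelihood via the Laplace approximation
m = GPy.core.GP(X=days, Y=counts, kernel=kernel,
                likelihood=likelihood, inference_method=inference)
m.optimize()
```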
I've been looking at how to modify GPClust to handle the situation in which each time course has a different X. The application I'm working on is clustering patients with MND. They have various metrics recorded at irregular intervals (e.g. one person was sampled on days 3, 34, 64, 71, 99; another on days 12, 54, 102, 103, 120, etc.). I suspect that there are different 'types' of progression and would like to see if I can detect clusters.
I'll also need to look into whether I can add a per-time-course time offset as a parameter (as I don't know when each time course 'starts', i.e. when day zero was for each person). Finally, each person has ~10 different metrics (all recorded together at each visit), so I'll need to look into how to use a multiple-output GP in the clustering framework (a rough sketch of the data layout is below).
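None of this is settled in the thread; as a starting point, here is a minimal sketch of how one patient's irregularly sampled, multi-metric data could be fed to a coregionalized (ICM) GP in GPy, with a hypothetical `offset` variable standing in for the unknown day zero (learning that offset, and plugging per-patient models into GPclust, are the open questions here):

```python
import numpy as np
import GPy

# one patient: irregular sampling days and 10 metrics per visit
days = np.array([3., 34., 64., 71., 99.])[:, None]
metrics = np.random.randn(5, 10)   # placeholder measurements
offset = 0.0                        # hypothetical unknown 'day zero' shift
n_metrics = metrics.shape[1]

# "long" format for coregionalization: each row is (shifted time, metric index)
X = np.vstack([np.hstack([days - offset, np.full_like(days, j)])
               for j in range(n_metrics)])
Y = metrics.T.reshape(-1, 1)

# intrinsic coregionalization model over the 10 metrics
kernel = GPy.util.multioutput.ICM(input_dim=1, num_outputs=n_metrics,
                                  kernel=GPy.kern.RBF(1))
m = GPy.models.GPRegression(X, Y, kernel)
m.optimize()
```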
I noticed that in MOHGP.py you mention "#prediction as per my notes". I'm trying to go from your paper to the code, so if there's some intermediate reasoning written up somewhere, that would be super helpful!