Theory
This wiki page provides an overview of the theory behind BrainTrak. It is best read in conjunction with the theory paper (link); the equation numbers here refer to that paper. The content on this page is a brief summary of, and complementary explanation for, the content in the paper.
The basic idea behind curve fitting is to minimize the difference between the experimental data and the model fit. This difference is quantified by Eq. (27) as the sum of squared differences. The chisq statistic is zero if the fit is identical to the data, and it grows larger as the fit diverges from the data. We want to choose the parameters that minimize chisq, and thus minimize the difference between the data and the fit.
Traditional optimization methods work with chisq directly and seek to minimize its value. BrainTrak takes a somewhat different approach. Instead of using chisq directly, we consider the 'likelihood' of the parameters, exp(-chisq/2), as per Eq. (30). This measure is maximized when chisq is zero (a perfect fit) and approaches zero as the fit becomes worse. Critically, unlike chisq itself, the likelihood is bounded between 0 and 1, and it is defined for all possible model parameters.
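As a minimal sketch of these two quantities (written in Python for illustration rather than taken from BrainTrak itself; the paper's Eq. (27) may include weighting or normalization terms omitted here):

```python
import numpy as np

def chisq(data_spectrum, model_spectrum):
    # Sum of squared differences between data and fit (cf. Eq. (27)),
    # ignoring any weighting terms that may appear in the paper.
    return np.sum((data_spectrum - model_spectrum) ** 2)

def likelihood(data_spectrum, model_spectrum):
    # Unnormalized likelihood exp(-chisq/2) (cf. Eq. (30)); equals 1 for
    # a perfect fit and approaches 0 as the fit worsens.
    return np.exp(-chisq(data_spectrum, model_spectrum) / 2)
```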
We can then interpret the likelihood as a probability distribution. However, the normalization is unknown, because the integral of Eq. (30) over all parameters will almost certainly not be 1. In addition, we do not know the analytic form of L(x); we can only evaluate it via chisq for individual sets of parameters. Both of these problems can be solved with the Metropolis-Hastings algorithm, a form of Markov chain Monte Carlo (MCMC) sampling. For our purposes, the role of MCMC sampling is to draw samples from an unknown probability distribution, and that distribution does not need to be normalized. The MCMC algorithm generates more samples where the probability is high and fewer samples where it is low, so the distribution of the sampled points matches the unknown distribution - in our case, Eq. (30). The algorithm returns many thousands of parameter combinations: many parameter sets where the fit is good, and fewer where the fit is poor. The distribution in Eq. (30) can therefore be estimated from the distribution of the sampled parameters. Importantly, this distribution corresponds to the goodness of fit, and thus tells us about both the quality of the fit and the uncertainty in the fitted parameters.
The distribution in Eq. (30) is a joint distribution over all of the model parameters; that is, it reflects all of the correlations and interdependencies between the parameters. We can construct marginal distributions for individual parameters by integrating the joint distribution over all of the 'nuisance' parameters (that is, all of the parameters we are not interested in). In the figure below, the samples are the black dots, one contour of the joint distribution is shown in green, and the marginal distributions for the individual parameters are shown in red and blue. BrainTrak returns the samples themselves, which means that the joint distribution, or any of the possible marginal distributions, can be computed.
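Because the samples are returned directly, a marginal can be estimated by simply histogramming one column of the sample array; pooling over the other columns implicitly integrates out the nuisance parameters. A minimal sketch, assuming `chain` is a hypothetical (n_samples, n_params) NumPy array of MCMC samples:

```python
import numpy as np

def marginal(chain, param_index, bins=50):
    # Histogram one parameter's samples; density=True normalizes the
    # result to unit area, giving an estimate of the marginal distribution.
    counts, edges = np.histogram(chain[:, param_index], bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, counts
```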
The MCMC algorithm essentially simulates a random walk. A 'proposal distribution' is used to generate a proposed step for the walker. The step is accepted with a probability that depends on the ratio of the probabilities of the current and proposed positions: improvements are always accepted, while worse steps are accepted with a probability equal to the ratio of the probabilities of the two points. In other words (a minimal code sketch follows this list):
- Compute the probability for the current step using Eq. (29).
- Generate a proposed set of parameters randomly.
- Evaluate the probability of the new parameters using Eq. (29).
- If the new parameters have a higher probability, 'accept' the step by making the new parameters the current step.
- Otherwise, generate a random number from 0 to 1, and if the random number is smaller than the ratio of the probability of the new parameters to that of the old parameters, accept the step. For example, if the probability of the new parameters is 0.25 and that of the old parameters is 0.5, then the ratio is 0.5, and there is a 50% chance of accepting the new parameters.
- Record the current step and its probability.
- Repeat many times.
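The following is an illustrative random-walk Metropolis sampler in Python. It assumes a symmetric Gaussian proposal distribution and works in log space to avoid numerical underflow; it is a sketch of the steps above, not BrainTrak's actual implementation:

```python
import numpy as np

def metropolis_hastings(log_prob, x0, step_size, n_steps, seed=None):
    # log_prob(x) returns the log of the unnormalized probability,
    # e.g. -chisq(x)/2 for the distribution in Eq. (30).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_prob(x)
    chain, log_p = [x], [lp]
    for _ in range(n_steps):
        # Symmetric Gaussian proposal distribution
        x_new = x + step_size * rng.standard_normal(x.size)
        lp_new = log_prob(x_new)
        # Always accept improvements; accept worse steps with probability
        # equal to the probability ratio. log(u) < lp_new - lp is the same
        # test as u < p_new/p_old, but avoids underflow for large chisq.
        if np.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        chain.append(x)
        log_p.append(lp)
    return np.array(chain), np.array(log_p)
```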
The output from the algorithm is thus a sequence of parameter sets, referred to as the 'chain'. The chain contains more points with parameter values where the probability is highest - that is, it samples the parameter space more densely near where the fit is good. This makes it very efficient at sampling large parameter spaces, because few samples are wasted far away from regions of interest.
To fit a single spectrum, we perform the above procedure and generate a chain. To obtain the fitted parameters, we select the point that has the highest probability according to Eq. (29). To compute the distributions for the parameters, the points are binned, and then the bin counts are normalized to unit area.
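Continuing the Python sketch from above, the fitted parameters and a unit-area distribution for one parameter might be obtained as:

```python
import numpy as np

# chain and log_p as returned by metropolis_hastings() above
best_fit = chain[np.argmax(log_p)]   # point with the highest probability

# Bin the samples; density=True normalizes the bin counts to unit area
counts, edges = np.histogram(chain[:, 0], bins=50, density=True)
```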
If we measure the physiology of the brain at time t, and then again at time t+1, we would expect the physiology to be highly correlated. Over short time scales, the model parameters are not expected to change very much. We can quantify this by using Bayes's theorem, according to Eq. (31). The idea is that even in the absence of any measurements, parameter values close to the current parameters are more likely than parameter values that are significantly different.
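Schematically (in our own notation; the precise form is Eq. (31) in the paper), the posterior for the parameters θ at time t+1 combines the likelihood of the new data with the prior carried over from time t:

```latex
P(\theta \mid D_{t+1}) \;\propto\; L(D_{t+1} \mid \theta)\, P(\theta \mid D_t)
```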
To perform tracking, we start by computing a sequence of spectra, as described in Section 3.3.2 of the paper. We then fit the first spectrum using the method described above. For each subsequent spectrum, we use the marginal distributions for each parameter from the previous fit as the prior in Eq. (31), and repeat the fitting, using Eq. (31) rather than Eq. (29) to define the probability.
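A minimal sketch of this tracking loop, reusing the sampler above; the helpers chisq_fn and make_log_prior are hypothetical placeholders, not BrainTrak's API:

```python
import numpy as np

def track(spectra, x0, step_size, n_steps, chisq_fn, make_log_prior):
    # chisq_fn(spectrum, params) evaluates Eq. (27); make_log_prior(chain)
    # builds a log-prior from the previous chain's marginals (cf. Eq. (31)).
    log_prior = lambda x: 0.0  # flat prior for the first spectrum
    fits = []
    for spectrum in spectra:
        def log_prob(x, s=spectrum, p=log_prior):
            return -chisq_fn(s, x) / 2 + p(x)  # log posterior, schematically Eq. (31)
        chain, log_p = metropolis_hastings(log_prob, x0, step_size, n_steps)
        fits.append(chain[np.argmax(log_p)])
        x0 = fits[-1]                    # start the next walk at the best fit
        log_prior = make_log_prior(chain)
    return fits
```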
Now proceed to see how the theory is reflected in the software structure.