-
Notifications
You must be signed in to change notification settings - Fork 5
Fitting algorithm
This page contains information about the fit wrappers
fit spectrum
fit
fit track
fit cluster
The lowest level wrapper is fit_spectrum
. This function serves as a direct wrapper for chain
or chain_parallel
. Almost all fitting is performed via fit_spectrum
. The inputs to this function contain all of the information needed to use chain
, namely
- The model
- The target spectrum
- The priors
- The initial values
- The length of the chain
- Which parameters are being fitted
The variable target_state
is used to help label plots, and doesn't generally affect the fitting process. The outputs from fit_spectrum
are:
-
f_out
which is a feather object containingfit_data
andplot_data
. -
fit_data
which contains the fitted parameters, statistics about the chain and goodness of fit, and the posterior distributions. -
plot_data
which has a grid of XYZ values and counts for how many samples in the chain fell into each XYZ bin. This is used when plotting the 3D density histogram (i.e. the clouds) corresponding to the samples.
fit_spectrum
performs the following operations:
- Query the model for initial values and priors if none are provided
- Set the model's electrodes to
Cz
if they haven't been set already - if you want to use different electrodes, then you should callmodel.set_electrodes()
yourself before passing the model tofit_spectrum
- Call
model.prepare_for_fit
with the target spectrum as input - Call
chain
orchain_parallel
depending on whether a parallel pool is currently running - Use the output from
chain
to constructfit_data
andplot_data
- Create a
feather
object storing both of these variables
fit.m
is a minimal wrapper that has a simplified list of arguments, and automatically plots the output feather object. It is essentially equivalent to fit_spectrum
.
fit_track.m
is a function that fits a sequence of spectra. Therefore, it conceptually sits one level higher than fit_spectrum
because fit_track
calls fit_spectrum
repeatedly. For convenience, fit_track
takes in the name of a data set, instead of the name of a file. The data are then loaded using bt.core.load_subject
- see here for more details. This function returns a struct
that essentially stores a sequence of spectra, and their corresponding sleep stage.
fit_track
then iterates over the spectra. The spectra are stored in a matrix, and the user can select entries from this matrix by setting the input argument t_input
. The EEG spectra are typically computed in 1s increments, so t_increment
usually refers to time in seconds. For example
t_increment = 1:1:5
would mean to fit the first 5 spectra, and
t_increment = 10:2:20
would mean to fit 6 spectra, with 2s intervals between them, starting 10s into the recording. The loop over t_increment
then proceeds as follows:
- The target spectrum and sleep stage are retrieved from the EEG data struct
- If this is the first fit,
model.initialize_fit
is used to supply the priors and initial conditions. Otherwise, they are retrieved from the previous fit -
fit_spectrum
is called, returningfit_data
andplot_data
- These variable are appended to a
feather
object - The
feather
object is returned
braintrak/Published/Abeysuriya_2015/fit_cluster.m
is an example of a wrapper script that can be used for large scale parallel fitting. This script was used to fit the 28 full-night sleep recordings published in Abeysuriya et. al. (2015). fit_cluster
achieves several additional aims that fit_track
does not provide, some of which are specific to the particular methodology in the paper. These are:
- Skip fitting if too many artifacts are present in the recording
- Save
fit_data
andplot_data
separately to reduce RAM requirements when loading the fits - Save backups and resume from backups to allow fitting to be interrupted
Artifacts are detected when the data is converted into a format suitable for BrainTrak, but it is up to the wrapper script to decide how to use the artifact information. fit_cluster
checks how many clean 4s segments were used to calculate the 30s spectrum, and if too few were used, it skips the fit. To skip the fit, for all parameters fit_data.skip_fit(1:end)
is set to 3
, which is a special value indicating that the fit was skipped. The target spectrum in fit_data
is updated, but everything else in fit_data
and plot_data
are kept the same.
As noted in the documentation for the Feather class, plot_data
can be optionally stored in separate files. This is accomplished by capturing the fit_data
and plot_data
variables from fit_spectrum
, rather than using the returned feather
object:
[~,fit_data,plot_data] = bt.core.fit_spectrum(...)
The plot data is then saved to a .mat
file, and the corresponding filename is added to the feather
instead of the data itself.
plot_data_fname = sprintf('%s/plot_data/t_%d',output_dir,j);
save(plot_data_fname,'plot_data');
f.insert(fit_data,plot_data_fname,t_increment(j)); % Insert after saving as j will be recomputed
It's worth noting that on disk, the plot data takes up only around 200MB. However, .mat
files are compressed, and loading all of them into memory rapidly uses up in excess of 16GB. This is the motivation for storing the data separately on disk. In contrast, all of the fit data can be loaded into memory, and it is often important to look at the parameter values over time, so these are all stored in full within the feather
object.
Finally, every so often, the feather
object is written to disk. This is performed in two phases
-
cached_output.mat
is copied tocached_output_backup.mat
- The
feather
is saved tocached_output.mat
On the cluster with parallel simulations, there is a decent chance that an interruption to the jobs will kill Matlab while cached_output.mat
is being written, thus corrupting its contents. This is addressed by cached_output_backup.mat
which ensures that at least one of the files is uncorrupted regardless of when the process is killed. The final output is saved to fit_output.mat
.