Skip to content
Romesh Abeysuriya edited this page Nov 1, 2015 · 9 revisions

This page contains information about the fit wrappers

  • fit spectrum
  • fit
  • fit track
  • fit cluster

fit_spectrum

The lowest level wrapper is fit_spectrum. This function serves as a direct wrapper for chain or chain_parallel. Almost all fitting is performed via fit_spectrum. The inputs to this function contain all of the information needed to use chain, namely

  • The model
  • The target spectrum
  • The priors
  • The initial values
  • The length of the chain
  • Which parameters are being fitted

The variable target_state is used to help label plots, and doesn't generally affect the fitting process. The outputs from fit_spectrum are:

  • f_out which is a feather object containing fit_data and plot_data.
  • fit_data which contains the fitted parameters, statistics about the chain and goodness of fit, and the posterior distributions.
  • plot_data which has a grid of XYZ values and counts for how many samples in the chain fell into each XYZ bin. This is used when plotting the 3D density histogram (i.e. the clouds) corresponding to the samples.

fit_spectrum performs the following operations:

  1. Query the model for initial values and priors if none are provided
  2. Set the model's electrodes to Cz if they haven't been set already - if you want to use different electrodes, then you should call model.set_electrodes() yourself before passing the model to fit_spectrum
  3. Call model.prepare_for_fit with the target spectrum as input
  4. Call chain or chain_parallel depending on whether a parallel pool is currently running
  5. Use the output from chain to construct fit_data and plot_data
  6. Create a feather object storing both of these variables

fit

fit.m is a minimal wrapper that has a simplified list of arguments, and automatically plots the output feather object. It is essentially equivalent to fit_spectrum.

fit_track

fit_track.m is a function that fits a sequence of spectra. Therefore, it conceptually sits one level higher than fit_spectrum because fit_track calls fit_spectrum repeatedly. For convenience, fit_track takes in the name of a data set, instead of the name of a file. The data are then loaded using bt.core.load_subject - see here for more details. This function returns a struct that essentially stores a sequence of spectra, and their corresponding sleep stage.

fit_track then iterates over the spectra. The spectra are stored in a matrix, and the user can select entries from this matrix by setting the input argument t_input. The EEG spectra are typically computed in 1s increments, so t_increment usually refers to time in seconds. For example

t_increment = 1:1:5 

would mean to fit the first 5 spectra, and

t_increment = 10:2:20

would mean to fit 6 spectra, with 2s intervals between them, starting 10s into the recording. The loop over t_increment then proceeds as follows:

  1. The target spectrum and sleep stage are retrieved from the EEG data struct
  2. If this is the first fit, model.initialize_fit is used to supply the priors and initial conditions. Otherwise, they are retrieved from the previous fit
  3. fit_spectrum is called, returning fit_data and plot_data
  4. These variable are appended to a feather object
  5. The feather object is returned

fit_cluster

braintrak/Published/Abeysuriya_2015/fit_cluster.m is an example of a wrapper script that can be used for large scale parallel fitting. This script was used to fit the 28 full-night sleep recordings published in Abeysuriya et. al. (2015). fit_cluster achieves several additional aims that fit_track does not provide, some of which are specific to the particular methodology in the paper. These are:

  1. Skip fitting if too many artifacts are present in the recording
  2. Save fit_data and plot_data separately to reduce RAM requirements when loading the fits
  3. Save backups and resume from backups to allow fitting to be interrupted

Artifacts are detected when the data is converted into a format suitable for BrainTrak, but it is up to the wrapper script to decide how to use the artifact information. fit_cluster checks how many clean 4s segments were used to calculate the 30s spectrum, and if too few were used, it skips the fit. To skip the fit, for all parameters fit_data.skip_fit(1:end) is set to 3, which is a special value indicating that the fit was skipped. The target spectrum in fit_data is updated, but everything else in fit_data and plot_data are kept the same.

As noted in the documentation for the Feather class, plot_data can be optionally stored in separate files. This is accomplished by capturing the fit_data and plot_data variables from fit_spectrum, rather than using the returned feather object:

[~,fit_data,plot_data] = bt.core.fit_spectrum(...)

The plot data is then saved to a .mat file, and the corresponding filename is added to the feather instead of the data itself.

plot_data_fname = sprintf('%s/plot_data/t_%d',output_dir,j);
save(plot_data_fname,'plot_data');
f.insert(fit_data,plot_data_fname,t_increment(j)); % Insert after saving as j will be recomputed

It's worth noting that on disk, the plot data takes up only around 200MB. However, .mat files are compressed, and loading all of them into memory rapidly uses up in excess of 16GB. This is the motivation for storing the data separately on disk. In contrast, all of the fit data can be loaded into memory, and it is often important to look at the parameter values over time, so these are all stored in full within the feather object.

Finally, every so often, the feather object is written to disk. This is performed in two phases

  1. cached_output.mat is copied to cached_output_backup.mat
  2. The feather is saved to cached_output.mat

On the cluster with parallel simulations, there is a decent chance that an interruption to the jobs will kill Matlab while cached_output.mat is being written, thus corrupting its contents. This is addressed by cached_output_backup.mat which ensures that at least one of the files is uncorrupted regardless of when the process is killed. The final output is saved to fit_output.mat.

Clone this wiki locally