Romesh Abeysuriya edited this page Nov 6, 2015 · 5 revisions

The minimal data required to use the fitting routine is a vector of frequencies and a vector of powers for the power spectral density. This is shown in fit_single.m.

Data file format

For state tracking, the .mat data files are initially created using bt.data.import_raw_eeg. The variables generated by import_raw_eeg are:

  • t, a vector corresponding to seconds elapsed. There are as many t values as there are spectra
  • f, a vector of frequencies
  • colheaders, a cell array containing the electrode names as saved in the raw data (these may therefore differ between recordings; it is up to bt.data.electrode_positions to recognize them)
  • s, a matrix of power spectral density. Its size is f x t x colheaders - that is, it stores the power spectrum for all electrodes at all points in time. Slices can then be taken across whichever dimensions are desired
  • state_score, a vector the same size as t, which records which stage of sleep was predominant in each block. If an odd number of spectra is used (e.g. averaging 4s windows into 30s blocks uses 27 spectra) then there is never any ambiguity about which state is dominant. The state_score is a number corresponding to a row in bt_utils.state_cdata
  • state_str, a cell array the same size as t, which stores titles for the plots. By convention, this string also records the breakdown of which states were averaged in the 30s block - for example, 'W-1 (14/16)' means that there were 14 seconds of wake, and 16 seconds of S1 sleep in the block.
  • nspec, a vector the same size as t, which stores how many of the 4s spectra were averaged to yield the 30s block spectrum.
  • n_reject, a vector the same size as colheaders, which stores the number of 4s spectra rejected for that channel.
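As an illustration of the state_str convention, the following minimal Python sketch splits a title such as 'W-1 (14/16)' into per-state seconds. The function name parse_state_str is hypothetical and not part of BrainTrak. The sketch assumes that titles always consist of state labels joined by '-' followed by matching counts joined by '/' in parentheses; only the two-state example above is confirmed by this page.

```python
import re


def parse_state_str(state_str):
    """Split a title such as 'W-1 (14/16)' into {state label: seconds}.

    ASSUMPTION: labels are joined by '-' and counts by '/', in the same order.
    """
    match = re.fullmatch(r"\s*(\S+)\s*\(([\d/]+)\)\s*", state_str)
    if match is None:
        raise ValueError(f"unrecognised state_str: {state_str!r}")
    labels = match.group(1).split("-")
    seconds = [int(count) for count in match.group(2).split("/")]
    if len(labels) != len(seconds):
        raise ValueError(f"label/count mismatch in {state_str!r}")
    return dict(zip(labels, seconds))


# 14 seconds of wake and 16 seconds of S1 sleep in one 30s block
breakdown = parse_state_str("W-1 (14/16)")
```

Here breakdown maps each state label to the number of seconds it occupied, so the counts sum to the block length.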

Power spectra and artifact rejection

The power spectrum calculation and artifact rejection are implemented in bt.data.get_tfs. The code in bt.data.get_tfs operates on single electrodes, and this function is called by bt.data.import_raw_eeg, which iterates over the electrodes. Note that this means that artifact detection in the current framework only uses information from one electrode at a time.

The power spectrum is computed in the following sequence:

  1. First, utils.rfft (from the corticothalamic-model repository) is used to compute a sequence of spectra in short windows, by default 4s. For each window, the standard deviation of the voltage is computed.
  2. Next, for each spectrum, the delta power is computed. If the delta power lies outside a range of values determined by the distribution of delta powers across all 4s windows, then the spectrum is rejected. If the standard deviation of the voltage exceeds a threshold determined by the distribution of voltage standard deviations across all 4s windows, then the spectrum is rejected. If the voltage does not change for a time period greater than run_threshold (this can happen due to problems with the recording device), then the corresponding spectrum is rejected.
  3. Sets of the 4s spectra are averaged together to give 30s spectra. For a 30s window, all of the 4s spectra contained within it are selected. Those that are not rejected are averaged together to give the 30s spectrum. The number of spectra that were averaged is recorded in nspec. It is up to the fit wrapper to decide what to do with nspec.
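Steps 2 and 3 can be sketched roughly as follows. This is a Python illustration of the logic only, not the code in bt.data.get_tfs. All function names are hypothetical. The rejection thresholds are assumed to be the mean plus or minus n_sd standard deviations (this page only says they are determined by the distribution), and run_threshold is assumed here to be expressed in samples.

```python
import math
import statistics


def outlier_flags(delta_power, voltage_sd, n_sd=3.0):
    """Flag short windows whose delta power or voltage SD is extreme.

    ASSUMPTION: thresholds are mean +/- n_sd standard deviations.
    """
    d_mean = statistics.mean(delta_power)
    d_sd = statistics.pstdev(delta_power)
    v_mean = statistics.mean(voltage_sd)
    v_sd = statistics.pstdev(voltage_sd)
    flags = []
    for d, v in zip(delta_power, voltage_sd):
        delta_out_of_range = abs(d - d_mean) > n_sd * d_sd
        sd_too_large = v > v_mean + n_sd * v_sd
        flags.append(delta_out_of_range or sd_too_large)
    return flags


def has_flat_run(voltage, run_samples):
    """True if the voltage is constant for more than run_samples samples."""
    run = 1
    for prev, cur in zip(voltage, voltage[1:]):
        run = run + 1 if cur == prev else 1
        if run > run_samples:
            return True
    return False


def average_block(spectra, rejected):
    """Average the non-rejected spectra of one block; return (mean, nspec)."""
    kept = [s for s, bad in zip(spectra, rejected) if not bad]
    if not kept:
        # One possible choice when every window is rejected: a NaN spectrum
        return [math.nan] * len(spectra[0]), 0
    mean = [sum(column) / len(kept) for column in zip(*kept)]
    return mean, len(kept)


# 27 short windows in one block; windows 3 and 26 are contaminated
delta_power = [1.0] * 26 + [50.0]
voltage_sd = [2.0] * 27
voltage_sd[3] = 40.0
spectra = [[1.0, 2.0, 3.0] for _ in range(27)]
spectra[3] = [100.0, 100.0, 100.0]
spectra[26] = [100.0, 100.0, 100.0]

flags = outlier_flags(delta_power, voltage_sd)
mean_spectrum, nspec = average_block(spectra, flags)
flat = has_flat_run([0.1, 0.3, 0.3, 0.3, 0.3, 0.2], run_samples=3)
```

In this example two of the 27 windows are rejected, so nspec is 25 and the block spectrum is the mean of the remaining windows.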

Contaminated 4s spectra are automatically excluded by get_tfs and are therefore not seen by later stages of processing - in fact, the 4s spectra are created in get_tfs but are only used internally and never returned. However, the 30s spectra are always produced, unless there are more than 26 consecutive rejected 4s spectra. If there is ever a run of more than 30s of contaminated 4s spectra, a sensible strategy would probably be to return an array of NaNs as the spectrum. When the MCMC routine is then run, it will return NaN for chisq, the same as if the parameters were not allowed. As before, it is up to the fit wrapper (e.g. fit_cluster.m) to decide what to do if a NaN spectrum is encountered.
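One policy a fit wrapper could adopt for NaN spectra is simply to skip them. The Python sketch below illustrates that idea only; it is not taken from fit_cluster.m. The names is_nan_spectrum and fit_all are hypothetical, and fit_one stands in for whatever fitting routine is actually used.

```python
import math


def is_nan_spectrum(spectrum):
    """True if every value in the block spectrum is NaN."""
    return all(math.isnan(power) for power in spectrum)


def fit_all(blocks, fit_one):
    """Fit each usable block; record None for blocks with a NaN spectrum.

    ASSUMPTION: skipping is the chosen policy. A wrapper could equally
    reuse the previous fit or stop the run.
    """
    results = []
    for spectrum in blocks:
        if is_nan_spectrum(spectrum):
            results.append(None)
        else:
            results.append(fit_one(spectrum))
    return results


blocks = [[1.0, 0.5], [math.nan, math.nan], [0.8, 0.4]]
# max is only a placeholder for a real fitting function
results = fit_all(blocks, fit_one=max)
```

The second block is fully contaminated, so its entry in results is None while the other blocks are fitted as usual.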

Loading the data

The .mat files generated by import_raw_eeg are loaded for fitting using bt.core.load_subject_data. This function detects sleep onset and truncates the start of the recordings, and also supports selecting only a subset of the electrodes. The fit data is returned in a struct containing fields:

  • t
  • nspec
  • s
  • state_str
  • state_score
  • start_idx

which are directly analogous to those in the raw .mat file, except that they only contain data for the requested electrodes, and the t vector (and all others) start at start_idx from the raw data.
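The relationship between the raw file and the returned struct can be sketched as below. This is a Python illustration with nested lists and a dict standing in for MATLAB arrays and structs, and select_data is a hypothetical name. It takes start_idx as given, because this page does not describe how sleep onset is detected, and it uses Python's 0-based indexing where MATLAB would be 1-based.

```python
def select_data(raw, start_idx, electrodes):
    """Trim all time-indexed fields to start_idx and keep chosen electrodes.

    raw["s"] is indexed as s[frequency][time][electrode].
    """
    keep = [raw["colheaders"].index(name) for name in electrodes]
    tail = slice(start_idx, None)
    return {
        "t": raw["t"][tail],
        "nspec": raw["nspec"][tail],
        "state_str": raw["state_str"][tail],
        "state_score": raw["state_score"][tail],
        "s": [
            [[per_time[c] for c in keep] for per_time in per_freq[tail]]
            for per_freq in raw["s"]
        ],
        "start_idx": start_idx,
    }


# Toy recording: 2 frequencies, 4 time blocks, 2 electrodes
raw = {
    "t": [0, 30, 60, 90],
    "colheaders": ["Cz", "Pz"],
    "nspec": [27, 25, 27, 26],
    "state_str": ["block 0", "block 1", "block 2", "block 3"],
    "state_score": [1, 1, 2, 2],
    "s": [
        [[f * 100 + t * 10 + ch for ch in range(2)] for t in range(4)]
        for f in range(2)
    ],
}

data = select_data(raw, start_idx=2, electrodes=["Pz"])
```

After the call, every time-indexed field in data has two entries, and each spectrum holds values for the Pz electrode only.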

Any data structure matching this format can be used with BrainTrak.

Note that state_score primarily determines the colour of the plots, and can be set to an arbitrary value (as long as it is an integer that indexes a row in bt_utils.state_cdata) for data that does not have sleep stage information. Similarly, state_str is primarily used as the title for plots and in sleep-specific analysis routines, and can be set to arbitrary strings for data that does not have sleep stage information.
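For data without sleep staging, filling in placeholder state fields might look like the following. This is a Python sketch in which a dict stands in for the MATLAB struct; with_dummy_states and check_format are hypothetical helpers, and the default score of 1 is assumed to be a valid row of bt_utils.state_cdata.

```python
REQUIRED_FIELDS = ("t", "nspec", "s", "state_str", "state_score", "start_idx")


def with_dummy_states(t, nspec, s, score=1, start_idx=0):
    """Build a data record with placeholder state_str and state_score.

    ASSUMPTION: score indexes an existing row of bt_utils.state_cdata.
    """
    n_blocks = len(t)
    return {
        "t": list(t),
        "nspec": list(nspec),
        "s": s,
        "state_str": [f"Block {k + 1}" for k in range(n_blocks)],
        "state_score": [score] * n_blocks,
        "start_idx": start_idx,
    }


def check_format(data):
    """Raise if a field is missing or a per-block field has the wrong length."""
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    n_blocks = len(data["t"])
    for name in ("nspec", "state_str", "state_score"):
        if len(data[name]) != n_blocks:
            raise ValueError(f"{name} must have one entry per value of t")
    return True


# Three 30s blocks, two frequencies, one electrode: s[frequency][time][electrode]
s = [[[1.0], [1.1], [0.9]], [[0.5], [0.6], [0.4]]]
data = with_dummy_states(t=[0, 30, 60], nspec=[27, 27, 27], s=s)
ok = check_format(data)
```

The placeholder titles and scores only affect plot titles and colours, so any consistent values would serve equally well.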

Steps for a new data set

To work with a new data set, the general strategy is to get MATLAB arrays corresponding to the time series for each electrode. This step can be nontrivial, depending on the format of the raw data, and typically involves processing with software like EEGLAB.

After that point, bt.data.get_tfs can be used to perform the standard FFT and artifact rejection, and the state information can be converted from the data source or else dummy data can be used.

Finally, the data can be loaded through bt.core.load_subject_data if desired, or else an entirely different loading function can be used (such as data_examples/load_br_data.m) - the only requirement is that this function returns a struct matching the format detailed above.
