Below is a description of the main files and directories:
```
├── eeg_data/                              # Data files are stored here
│   ├── bdf/                               # Raw .bdf files
│   │   ├── subj01_session1.bdf
│   │   └── ...
│   ├── raw_csv/                           # The stimulus data retrieved from the .bdf files
│   │   ├── subj01_session1.csv
│   │   └── ...
│   ├── parsed_csv/                        # Parsed stimulus data with double triggers merged
│   │   ├── subj01_session1.csv
│   │   └── ...
│   ├── fif/                               # Parsed stimulus data converted to .fif format
│   │   ├── subj01_session1_eeg.fif
│   │   └── ...
│   ├── final_eeg/                         # Final preprocessed and epoched EEG data in .fif format
│   │   ├── 05_125/                        # One of our filtering frequency ranges
│   │   │   ├── subj01_session1_epo.fif
│   │   │   └── ...
│   │   └── ...
│   └── final_hdf5/                        # Preprocessed EEG data and stimulus info as pandas DataFrames, saved in HDF5 format
│       ├── 05_125/
│       │   ├── subj01_session1.h5
│       │   └── ...
│       └── ...
├── parse-bdf-event-codes-to-fif.ipynb     # Notebook to convert .bdf to .fif and merge double triggers
├── fif-eeg-preprocessing.py               # Script to clean our EEG data
├── final_dataset/                         # Working folder to create the Hugging Face dataset
│   ├── data/                              # Folder for the final CSV file
│   │   └── final_dataset_subj04_session2.csv
│   ├── main_dataset.py                    # Script to combine our preprocessed, cleaned EEG data and map it to COCO image ids
│   ├── nsd_coco_conversion.csv            # File that maps from NSD image id to COCO id
│   ├── nsd_expdesign.mat                  # File that describes which images were shown to which subject
│   ├── create_huggingface_dataset.py      # Script that uploads the CSV dataset to Hugging Face
│   ├── download_coco.py                   # Script that downloads the COCO images used in our dataset
│   ├── data/                              # Folder for downloaded COCO images
│   ├── behavioural_dataset.py             # Script to create a dataset of just our behavioural data
│   └── huggingface/                       # Hugging Face cache dir for our new dataset
└── README.md                              # The file you're reading now
```
- Set up your environment by creating a conda or mamba env from the environment.yml file.
- Get the raw .bdf files from the BioSemi device. The link to the datasets is here.
- You should download all the .bdf files for each subject and session
- Move them to /eeg_data/bdf
- Run the first part of parse-bdf-event-codes-to-fif.ipynb to retrieve the stimulus data (see the BDF-reading sketch after this list).
- Run the second part of parse-bdf-event-codes-to-fif.ipynb to merge the double-trigger data into a combined format. We use double triggers because BioSemi only supports 8-bit event codes (see the trigger-merging sketch after this list).
- The script may come across a phantom event, reported in a format like `Error in line 10: 101 254`. Navigate to that line to resolve the issues around it.
- Run the third part of parse-bdf-event-codes-to-fif.ipynb to convert the parsed CSV data to .fif format (see the fif-export sketch after this list).
- Upload the preprocessed .fif file to https://drive.google.com/drive/u/0/folders/1gI9csmnCwedRrlDoRy-jCqK4bclVN6mD.
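
BDF-reading sketch: a minimal outline of what the first part of the notebook does, assuming MNE-Python is used to read the BioSemi recording and its trigger channel. The file names, output path, and CSV column names here are illustrative, not necessarily the notebook's actual ones:

```python
import mne
import pandas as pd

# Read one raw BioSemi recording (illustrative file name).
raw = mne.io.read_raw_bdf("eeg_data/bdf/subj01_session1.bdf", preload=False)

# Extract trigger events from the BioSemi "Status" channel.
events = mne.find_events(raw, stim_channel="Status", shortest_event=1)

# Save sample index and 8-bit event code as the raw stimulus CSV.
pd.DataFrame(events[:, [0, 2]], columns=["sample", "code"]).to_csv(
    "eeg_data/raw_csv/subj01_session1.csv", index=False
)
```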
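
Trigger-merging sketch: the double-trigger merge can be pictured as combining two consecutive 8-bit codes into one larger stimulus code. The sketch below assumes the first trigger carries the high byte and the second the low byte; the actual byte order and pairing logic live in the notebook:

```python
def merge_double_triggers(events):
    """Merge consecutive (sample, code) pairs of 8-bit triggers into single
    16-bit codes. High-byte-first order is an assumption; check the notebook
    for the real convention."""
    merged = []
    i = 0
    while i + 1 < len(events):
        sample, code_hi = events[i]
        _, code_lo = events[i + 1]
        merged.append((sample, (code_hi << 8) | code_lo))  # combine the two bytes
        i += 2
    return merged

# Example: codes 101 and 254 combine into 101 * 256 + 254 = 26110.
print(merge_double_triggers([(1000, 101), (1004, 254)]))
```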
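
Fif-export sketch: the third part presumably attaches the merged codes back to the EEG and writes a .fif file. A rough outline under that assumption, using MNE annotations as one possible way to store the events (the parsed CSV column names `sample` and `code` are assumptions):

```python
import mne
import numpy as np
import pandas as pd

raw = mne.io.read_raw_bdf("eeg_data/bdf/subj01_session1.bdf", preload=True)
parsed = pd.read_csv("eeg_data/parsed_csv/subj01_session1.csv")  # assumed columns: sample, code

# Build an MNE events array (sample, 0, merged code) and store it as annotations.
events = np.column_stack([
    parsed["sample"].to_numpy(),
    np.zeros(len(parsed), dtype=int),
    parsed["code"].to_numpy(),
])
raw.set_annotations(mne.annotations_from_events(events, sfreq=raw.info["sfreq"]))

raw.save("eeg_data/fif/subj01_session1_eeg.fif", overwrite=True)
```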
We need to create a dataset for each of the frequency ranges: 0.5/125, 55/95, 14/70, and 5/95 Hz. You should have been assigned one of these four ranges to continue.
Current assignments:
- 0.5/125: Jonathan
- 55/95: Yash
- 14/70: Tazik
- 5/95: Yash
- Modify the frequency in fif-eeg-preprocessing.py, on this line:

  ```python
  raw.filter(l_freq=0.5, h_freq=125)
  ```

  Then modify the output path, replacing `low` and `high` with the actual values (for 0.5, use `05`):

  ```python
  preprocessed_file_path = os.path.join('eeg_data', 'final_eeg_low_high', f"{root_name}_epo.fif")
  ```

  e.g.

  ```python
  preprocessed_file_path = os.path.join('eeg_data', 'final_eeg_05_125', f"{root_name}_epo.fif")
  ```

- Run fif-eeg-preprocessing.py to preprocess the .fif data. This performs band-pass filtering, epoch detection, PCA, eye-blink removal, and baseline correction. The output is saved in /eeg_data/final_eeg (see the preprocessing sketch after this list).
- Change eeg_fif_folder to the correct folder in final_dataset/main_dataset.py, on this line:

  ```python
  eeg_fif_folder = '../eeg_data/final_eeg'
  ```

  Then change output_csv_path so that the file name also includes the frequency range:

  ```python
  output_csv_path = os.path.join(eeg_csv_folder, '../combined_dataset.csv')
  ```

  Finally, change `LO_HI` to the right range, e.g. `LO_HI = "05_125"`.
- Run final_dataset/main_dataset.py to create a CSV of all the data for that frequency range (see the dataset-combination sketch after this list).
- Download the COCO images by running final_dataset/download_coco.py. The download is 22 gigabytes in size (see the download sketch after this list).
- In final_dataset/create_huggingface_dataset.py, update csv_file_path to the CSV path you created above.
- Create a .env file and set HF_PUSH to your Hugging Face access token.
- In final_dataset/create_huggingface_dataset.py, set DSET_NAME and then run the script to create and upload the Hugging Face dataset (see the upload sketch after this list).
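
Preprocessing sketch: a rough MNE-Python outline of the kind of pipeline fif-eeg-preprocessing.py runs (band-pass filter, ICA/PCA-based eye-blink removal, epoching with baseline correction). The epoch window, component count, and EOG proxy channel are assumptions; the script itself is the source of truth:

```python
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_fif("eeg_data/fif/subj01_session1_eeg.fif", preload=True)

# Band-pass filter to the assigned frequency range (0.5-125 Hz shown here).
raw.filter(l_freq=0.5, h_freq=125)

# Eye-blink removal: fit ICA (which whitens via PCA internally) and drop
# components correlated with a frontal channel used as an EOG proxy.
ica = ICA(n_components=0.99, random_state=42)
ica.fit(raw)
eog_indices, _ = ica.find_bads_eog(raw, ch_name="Fp1")  # channel name is illustrative
ica.exclude = eog_indices
ica.apply(raw)

# Epoch around the stimulus events with baseline correction (window is an assumption).
events, event_id = mne.events_from_annotations(raw)
epochs = mne.Epochs(raw, events, event_id=event_id, tmin=-0.2, tmax=0.8,
                    baseline=(None, 0), preload=True)
epochs.save("eeg_data/final_eeg/05_125/subj01_session1_epo.fif", overwrite=True)
```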
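
Dataset-combination sketch: a hedged outline of what main_dataset.py roughly does, based on the layout above: read the epoched EEG, map each epoch's NSD stimulus id to its COCO id via nsd_coco_conversion.csv, and write one combined CSV. The column names (`nsd_id`, `coco_id`), the assumption that the event code is the NSD id, and the output name are all illustrative:

```python
import os
import mne
import pandas as pd

LO_HI = "05_125"
eeg_fif_folder = os.path.join("..", "eeg_data", "final_eeg", LO_HI)

# Mapping from NSD image id to COCO image id (column names are assumptions).
nsd_to_coco = pd.read_csv("nsd_coco_conversion.csv").set_index("nsd_id")["coco_id"]

rows = []
for fname in sorted(os.listdir(eeg_fif_folder)):
    if not fname.endswith("_epo.fif"):
        continue
    epochs = mne.read_epochs(os.path.join(eeg_fif_folder, fname), preload=False)
    for sample, _, code in epochs.events:
        rows.append({
            "file": fname,
            "sample": sample,
            "nsd_id": code,                    # assumes the event code is the NSD id
            "coco_id": nsd_to_coco.get(code),
        })

pd.DataFrame(rows).to_csv(f"combined_dataset_{LO_HI}.csv", index=False)
```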
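
Download sketch: download_coco.py is not reproduced here, but the general approach could look like the following, assuming images are fetched one by one from the public COCO image server and that the combined CSV has a `coco_id` column; the real script may use the official zip archives or a different split:

```python
import os
import pandas as pd
import requests

out_dir = os.path.join("final_dataset", "data")
os.makedirs(out_dir, exist_ok=True)

# COCO ids taken from the combined CSV created above (column name is an assumption).
coco_ids = pd.read_csv("final_dataset/data/final_dataset_subj04_session2.csv")["coco_id"].dropna().unique()

for coco_id in coco_ids:
    fname = f"{int(coco_id):012d}.jpg"
    url = f"http://images.cocodataset.org/train2017/{fname}"  # assumes the train2017 split
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    with open(os.path.join(out_dir, fname), "wb") as f:
        f.write(resp.content)
```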
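
Upload sketch: a minimal outline of the create-and-upload step, assuming the Hugging Face `datasets` library and a token read from the .env file; DSET_NAME and the CSV path are placeholders:

```python
import os
from datasets import load_dataset
from dotenv import load_dotenv

load_dotenv()                                  # reads HF_PUSH from the .env file
hf_token = os.environ["HF_PUSH"]

DSET_NAME = "your-username/eeg-coco-05-125"    # placeholder dataset name
csv_file_path = "final_dataset/data/final_dataset_subj04_session2.csv"

dataset = load_dataset("csv", data_files=csv_file_path)
dataset.push_to_hub(DSET_NAME, token=hf_token)
```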