Make a predicted spectral library

MarcIsak edited this page Nov 2, 2021 · 107 revisions

Brief background

In this wiki page, we use MSLibrarian to create a predicted spectral library. For demonstration purposes, the library is built to analyse DIA runs of yeast samples (Saccharomyces cerevisiae).

Create a Calibration Library

First, we create a Calibration Library, which will be used to extract optimal predicted fragment ion intensities and to calibrate iRT predictions.

Preparations

We start by collecting the paths to the DIA runs.

diaFolder = "Y:/imp_bioms/CK/mslibr_test/YeastDIA/" # The folder with DIA MS files in RAW format
diaFiles = list.files(diaFolder, pattern = "\\.raw$", full.names = TRUE) # Extracts the full paths to all raw files (pattern is a regular expression, so the dot is escaped)
diaFiles # prints the file paths.

"Y:/imp_bioms/CK/mslibr_test/YeastDIA/CK_P2005_306.raw"
"Y:/imp_bioms/CK/mslibr_test/YeastDIA/CK_P2005_307.raw"
"Y:/imp_bioms/CK/mslibr_test/YeastDIA/CK_P2005_308.raw"
"Y:/imp_bioms/CK/mslibr_test/YeastDIA/CK_P2006_079.raw"
"Y:/imp_bioms/CK/mslibr_test/YeastDIA/CK_P2006_080.raw"
"Y:/imp_bioms/CK/mslibr_test/YeastDIA/CK_P2006_085.raw"

In the second step, we define some parameters to use for the first function:

projectFolder = "D:/demo_mslibr_yeast" # Project folder to save all MSLibrarian outputs into
fasta = "D:/Databases/Uniprot_Swissport_Yeast_210710/Canonical_Isoforms/uniprot_swissprot_yeast.fasta" # Path to protein sequence FASTA
searchEngine = "msfragger" # Database search engine to use (default = "comet")
irt = "biognosys_irt" # Use Biognosys iRT values as RT scale. If the argument is not provided, retention times will be in seconds.

Run create.calibration.lib()

To create the Calibration Library, we run the following MSLibrarian function:

create.calibration.lib(projectFolder = projectFolder, fasta = fasta, diaFiles = diaFiles, searchEngine = searchEngine, irt = irt)

Once the function has finished, there should be a Calibration Library in OpenSwath (*.tsv) format in the project folder that you specified ("D:/demo_mslibr_yeast/library/calibration_lib.tsv").
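
A quick way to sanity-check such a library is to read the TSV into R and count its contents. The following is a minimal sketch, not part of MSLibrarian: it assumes one row per transition and uses column names that appear in the create.spectral.lib() console output further down this page (PrecursorMz, PrecursorCharge, NormalizedRetentionTime, PeptideSequence, UniprotId). The toy table and its values are made up so that the example runs without the demo data.

```r
# Write a tiny made-up library to a temporary TSV so the sketch is runnable
toyLib <- data.frame(
  PrecursorMz = c(500.25, 500.25, 612.80),
  PrecursorCharge = c(2, 2, 3),
  NormalizedRetentionTime = c(12.5, 12.5, 48.1),
  PeptideSequence = c("AAAK", "AAAK", "CCCR"),
  UniprotId = c("P00001", "P00001", "P00002")
)
toyPath <- tempfile(fileext = ".tsv")
write.table(toyLib, toyPath, sep = "\t", quote = FALSE, row.names = FALSE)

# In practice, point this at file.path(projectFolder, "library", "calibration_lib.tsv")
lib <- read.delim(toyPath, stringsAsFactors = FALSE)

# Assuming one row per transition, precursors are unique sequence/charge pairs
precursors <- unique(lib[, c("PeptideSequence", "PrecursorCharge")])
nTransitions <- nrow(lib)
nPrecursors <- nrow(precursors)
nProteins <- length(unique(lib$UniprotId))

cat(nTransitions, "transitions,", nPrecursors, "precursors,", nProteins, "proteins\n")
```

If the real file uses different column names, adjust the column selection accordingly.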

Process the Calibration Library

Once a Calibration Library has been created, it is possible to compare experimental spectra in the Calibration Library to spectra predicted by Prosit at different collision energies. In this way, the optimal Prosit collision energies can be determined for extracting predicted fragment ion intensities that are most similar to experimental intensities.
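
The idea can be illustrated for a single precursor. This is a minimal sketch under stated assumptions, not the MSLibrarian implementation: the similarity measure is assumed to be a normalized dot product, normDotProduct() is a hypothetical helper, and all intensities are made-up toy values.

```r
# Normalized dot product (cosine similarity) between two intensity vectors
normDotProduct <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Toy data (made up): experimental intensities for 5 matched fragment ions
experimental <- c(1.00, 0.45, 0.30, 0.10, 0.05)

# Toy predicted intensities for the same fragments at three collision energies
predicted <- list(
  "26" = c(1.00, 0.80, 0.10, 0.02, 0.00),
  "28" = c(1.00, 0.50, 0.28, 0.12, 0.04),
  "30" = c(0.60, 0.30, 1.00, 0.40, 0.20)
)

# Score every CE and keep the one with the highest similarity
scores <- vapply(predicted, normDotProduct, numeric(1), y = experimental)
bestCe <- as.integer(names(scores)[which.max(scores)])

print(round(scores, 3))
print(bestCe)
```

In the real workflow this comparison is repeated for every precursor in the Calibration Library and every collision energy available in the prediction database.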

Setting up parameters

Before comparing experimental spectra to predicted spectra, we need to specify some new parameters.

predictionDb = "D:/Data_PROSIT/Libraries/Yeast/210710/SQLITE/yeast_prosit_hcd_intensity_2020_irt_2019.sqlite" # Path to Prosit prediction SQLite DB
rt = "iRT" # The type of retention time scale to use for the subsequent library building

Run process.calibration.lib()

To run comparisons between experimental spectra and predicted spectra, we run the following function:

process.calibration.lib(projectFolder = projectFolder, predictionDb = predictionDb, rt = rt)

Console output:

"Processing library: D:/demo_mslibr_yeast/library/calibration_lib.tsv"
"Library has indexed retention times (iRT)"
"Extracting precursor data..."
"Adding precursor data to the Calibration Library object..."
"Filtering precursors..."
"Number of precursors after filtering: 11788 ( 98.73% )"
"Adding precursor and MS/MS information to Spectrum2 objects..."
"Adding precursor data in the prediction database to Spectrum2 objects..."
"Precursors match..."
"Comparing experimental spectra and predicted spectra having CE = 20"
44.89 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 21"
40.87 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 22"
44.05 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 23"
45.25 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 24"
45 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 25"
46.8 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 26"
46.78 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 27"
51.97 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 28"
43.29 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 29"
46.77 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 30"
50.45 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 31"
47.41 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 32"
46.22 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 33"
42.42 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 34"
41.42 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 35"
43.61 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 36"
44.17 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 37"
44.02 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 38"
43.73 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 39"
43.61 sec elapsed
"Comparing experimental spectra and predicted spectra having CE = 40"
43.16 sec elapsed
"Adding spectral matching result to Calibration Library object, slot: Comparisons"
"Bin precursors based on a cutoff: 250"
"Writing processed Calibration Library to: D:/demo_mslibr_yeast/library/calibration_lib.RData"
"Creating results plot after comparisons of experimental vs. predicted spectra"
"Saving plot to:D:/demo_mslibr_yeast/library/calibration_lib.pdf"

After executing the function, two more files can be found in the subfolder library of the project folder ("D:/demo_mslibr_yeast/library/")

  • calibration_lib.RData --> contains results for similarity comparisons between experimental and predicted spectra.
  • calibration_lib.pdf --> Plot showing the optimal collision energies, dot products and distributions for precursors of different lengths and charges
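
The names of the objects stored in calibration_lib.RData are not documented on this page. A generic way to find out, sketched here with a made-up stand-in file so the example runs anywhere, is to load the file into a separate environment and list what it contains:

```r
# Stand-in for calibration_lib.RData: save a made-up object to a temporary file
demoObject <- list(note = "placeholder for the processed Calibration Library")
rdataPath <- tempfile(fileext = ".RData")
save(demoObject, file = rdataPath)

# Load into a separate environment so nothing in the workspace is overwritten
libEnv <- new.env()
loadedNames <- load(rdataPath, envir = libEnv)

# Names of the objects stored in the file, and a shallow look at the first one
print(loadedNames)
str(get(loadedNames[1], envir = libEnv), max.level = 1)
```

For the real file, replace rdataPath with file.path(projectFolder, "library", "calibration_lib.RData").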

Create the predicted spectral library

Since the optimal collision energies have been determined for extracting predicted fragment ion intensities, we can build a predicted spectral library. The library in this case will have both fragment ion intensities and iRT values predicted by Prosit.


To create the library, we only need to specify one new parameter:

format = "openswath" # Spectral library output format

To create the spectral library we run the following MSLibrarian function:

create.spectral.lib(projectFolder = projectFolder, fasta = fasta, format = "openswath")

Console output:

[1] "Found the library file: D:/demo_mslibr_yeast/library/calibration_lib.RData in project folder: D:/demo_mslibr_yeast"
[1] "Found mzXML file(s) in D:/demo_mslibr_yeast/dia"
[1] "Using MS file: D:/demo_mslibr_yeast/dia/CK_P2005_306.mzXML"
[1] "No output library file specified! Generating default: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit"
[1] "Importing FASTA database: D:/Databases/Uniprot_Swissport_Yeast_210710/Canonical_Isoforms/uniprot_swissprot_yeast.fasta..."
[1] "Database contains: 6750 proteins..."
[1] "Adding protein data to slot: Proteins of the MSLibrarian object..."
[1] "Assuming all Cysteines are Carbamidomethylated"
[1] "In-silico digestion of proteins into peptides using trypsin"
[1] "Calculating masses..."
[1] "Done!"
[1] "Connecting to SQLite database with Prosit predictions..."
[1] "Mapping precursors from database..."
[1] "Found a DB match for 142810 queried precursors (out of 142810) with a charge = 2"
[1] "Found a DB match for 142810 queried precursors (out of 142810) with a charge = 3"
[1] "Finding duplicated peptides..."
[1] "Creating precursor data..."
[1] "Removing duplicated sequences..."
[1] "Calculating m/z values..."
[1] "Number of unique precursors: 267550"
[1] "Database matching enabled..."
[1] "Adding prediction indices..."
[1] "Number of unique predictable precursors: 267550 (100% )"
[1] "Number of unique predictable precursors within M/Z range (350.011047363281 - 1649.95153808594): 228491"
[1] "Connecting to Prosit database"
[1] "Importing precursor data from Prosit database..."
[1] "Adding matched precursors to slot PredLib@PrositLib..."
[1] "All precursors match!"
[1] "Adding Uniprot Ids..."
[1] "Peptide length and charge optimized CE selected..."
[1] "Importing MS/MS data from database..."
[1] "Get indices for precursors with 7, 16 AA and a charge of 2 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 28"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 8, 9, 11, 12 AA and a charge of 2 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 30"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 10 AA and a charge of 2 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 31"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 13, 14, 15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 AA and a charge of 2 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 29"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 18 AA and a charge of 2 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 27"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 8, 9, 10 AA and a charge of 3 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 32"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 11 AA and a charge of 3 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 34"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 12, 17, 18 AA and a charge of 3 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 33"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 13, 14, 15, 16 AA and a charge of 3 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 35"
[1] "Subprocess completed!"
[1] "Get indices for precursors with 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 AA and a charge of 3 in Prosit DB"
[1] "Selecting relative intensities predicted with a CE = 30"
[1] "Subprocess completed!"
[1] "Getting MS/MS indices for matched precursors..."
[1] "Adding MS/MS data for matched precursors to slot PredLib@PrositLib..."
[1] "Removing intensities equal to zero..."
[1] "Re-indexing of MS/MS indices..."
[1] "Processing completed!"
[1] "OpenSwath format selected..."
[1] "Setting batchSize to 228491"
[1] "Adding the PrecursorMz column to the openswath library..."
[1] "Adding the PrecursorCharge column to the openswath library..."
[1] "Adding the NormalizedRetentionTime column to the openswath library..."
[1] "Adding the PeptideSequence column to the openswath library..."
[1] "Adding the ModifiedPeptideSequence column to the openswath library..."
[1] "Adding the UniprotId column to the openswath library..."
[1] "Adding fragment ion data to the openswath library..."
[1] "Adding OpenSwath Grouping Columns to TSV Library..."
[1] "Writing batch 1/1 to D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"

After running the function, a predicted spectral library can be found in the subfolder library of the project folder ("D:/demo_mslibr_yeast/library/"). Since we did not define an output name (this can be set with the argument outputLib), one is generated automatically. The library name contains the date and time of creation, the strategy used for selecting fragment ion intensities (ceMode = length_charge by default) and the algorithm used for predicting iRT values (_irt_prosit).
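
The length- and charge-specific CE selection reported in the console output can be thought of as a lookup table. The sketch below transcribes the CE values from this demo run's log; lookupCe() and the example peptide sequences are hypothetical illustrations and not part of MSLibrarian.

```r
# CE per (charge, peptide length), transcribed from this demo's console output
ceTable <- rbind(
  data.frame(charge = 2, length = c(7, 16), ce = 28),
  data.frame(charge = 2, length = c(8, 9, 11, 12), ce = 30),
  data.frame(charge = 2, length = 10, ce = 31),
  data.frame(charge = 2, length = c(13:15, 17, 19:30), ce = 29),
  data.frame(charge = 2, length = 18, ce = 27),
  data.frame(charge = 3, length = 8:10, ce = 32),
  data.frame(charge = 3, length = 11, ce = 34),
  data.frame(charge = 3, length = c(12, 17, 18), ce = 33),
  data.frame(charge = 3, length = 13:16, ce = 35),
  data.frame(charge = 3, length = 19:30, ce = 30)
)

# Hypothetical helper: look up the CE for a peptide sequence and charge;
# returns NA when the length/charge combination is not in the table
lookupCe <- function(sequence, charge) {
  hit <- ceTable$ce[ceTable$charge == charge & ceTable$length == nchar(sequence)]
  if (length(hit) == 0) NA_real_ else hit
}

lookupCe("LVNELTEFAK", 2)  # 10 residues, charge 2
lookupCe("AGFAGDDAPR", 3)  # 10 residues, charge 3
```

The values are specific to this data set; a different instrument or acquisition method will generally yield a different table.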

Make modified libraries with mod.spectral.lib()

Once a spectral library has been created, several modified versions of that library can be created. In this section, we will go through how to create spectral libraries that have been subsetted on the protein group level, peptide level and transition level. Finally, we will see how retention times can be changed from Prosit iRT to DeepLC iRT.

Protein group subsetting

Protein group subsetting of a library can be performed in three different ways. The first option is to use the DIA software DIA-NN to perform a first-pass search with the unmodified library, followed by extraction of the protein groups that pass an FDR threshold of, for example, 5% (argument protFdr). These protein groups are then used to subset the unmodified library, and the subsetted library can be used for a second-pass search. The second option is to subset the unmodified library using the protein groups identified in the Calibration Library built with MSLibrarian. The third option is to supply a character vector of protein accession numbers (Uniprot) to filter the library directly. This approach could be beneficial to

To perform protein group subsetting with DIA-NN, we define the following parameters:

inputLib = "D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv" # input spectral library
mods = c("protein") # Modifies the library on the protein group level
protMod = "diann" # Type of protein modification.
protFdr = 0.01 # Protein group FDR to use to subset the input library
diannPath = "C:/Program Files/DIA-NN_1.8/DiaNN.exe" # Path to the DIA-NN executable

To subset the library on protein level, run MSLibrarian::mod.spectral.lib():

mod.spectral.lib(projectFolder = projectFolder, inputLib = inputLib, diaFiles = diaFiles, mods = mods, protMod = protMod, protFdr = protFdr, diannPath = diannPath)

Function output log:

[1] "Found the library file: D:/demo_mslibr_yeast/library/calibration_lib.RData in project folder: D:/demo_mslibr_yeast"
[1] "Input spectral library: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"
[1] "Applying protein ID subsetting..."
[1] "DIA-NN path: C:/Program Files/DIA-NN_1.8/DiaNN.exe"
[1] "Loading DIA-NN report..."
[1] 0.01
[1] "Extracting protein group ID at a FDR = 0.01"
[1] "Unique protein groups in DIA-NN report after filtering: 4876"
[1] "Importing OpenSwath library: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"
|=========================================================================================================================================| 100% 944 MB
[1] "Unique protein groups in input spectral library: 7269"
[1] "Number of matching protein groups in the input library: 5294 ( 72.83% )"
[1] "Subsetting library"
|=========================================================================================================================================| 100% 944 MB
[1] "Library written to: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit_protein_filter_diann_0.01.tsv"
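
Conceptually, the subsetting step keeps only the library rows whose protein accession is in a list of accepted accessions. A minimal sketch on a made-up toy table, assuming a UniprotId column with one accession per row (real libraries may hold several accessions per protein group, which this sketch does not handle):

```r
# Toy library (made up): one row per transition, with a UniprotId column
lib <- data.frame(
  PeptideSequence = c("AAAK", "AAAK", "CCCR", "DDDK", "EEER"),
  UniprotId = c("P00001", "P00001", "P00002", "P00003", "P00004"),
  stringsAsFactors = FALSE
)

# Accessions to keep, e.g. those passing an FDR threshold in a first-pass search
keepIds <- c("P00001", "P00003", "Q99999")

# Keep only the rows whose accession is in the list
subLib <- lib[lib$UniprotId %in% keepIds, ]

# Share of the library's proteins that were matched
libIds <- unique(lib$UniprotId)
matched <- sum(libIds %in% keepIds)
cat(sprintf("Matching proteins: %d of %d (%.2f%%)\n",
            matched, length(libIds), 100 * matched / length(libIds)))
```

The same filter applies to the third option, where the accession vector is supplied directly instead of being derived from a search.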

Show the folder with the new library

Transition subsetting

Predicted libraries may contain many transitions with relative intensities close to zero. There may also be targets/precursors in the library that have very few transitions, which makes MS2-based quantification difficult. With the function mod.spectral.lib() it is possible to apply different transition filters to subset a library.

In this example, we will set the following parameters to perform transition filtering:

mods = c("transition") # Modify spectral library on transition level
topTrans = 14 # Maximum 14 transitions per library target/precursor
cutoffTrans = 0.01 # Minimum relative intensity for a transition
minTrans = 6 # Minimum transitions that a library target/precursor must have

To create a transition-subsetted library we run:

mod.spectral.lib(projectFolder = projectFolder, inputLib = inputLib, mods = mods, topTrans = topTrans, cutoffTrans = cutoffTrans, minTrans = minTrans)

Function output log:

[1] "Found the library file: D:/demo_mslibr_yeast/library/calibration_lib.RData in project folder: D:/demo_mslibr_yeast"
[1] "Input spectral library: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"
[1] "Applying transition filtering..."
[1] "Importing OpenSwath library: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"
|=========================================================================================================================================| 100% 944 MB
[1] "Found 228491 precursors and 5799831 transitions in the input library..."
[1] "Selecting top 14 most intense transitions for each precursor..."
[1] "Performing transition filtering on: 180543 entries ( 79.02 % )"
[1] "Library now contains: 3089857 transitions ( 53.27% of initial library )"
[1] "Removes transitions with a relative intensity < 0.01"
[1] "Filtered library contains: 2968145 transitions ( 51.18% of initial library )"
[1] "Removing all transitions for precursors having less than 6 transitions."
[1] "Library now contains: 2968116 transitions ( 51.18% of initial library )"
|=========================================================================================================================================| 100% 944 MB
[1] "Writing library to: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit_topN_14_cutoff_0.01_trans_6.tsv"
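
The three filters are applied in the order shown in the log: top N first, then the intensity cutoff, then the minimum number of transitions. The sketch below reproduces that order on a made-up toy table in base R; the column names (precursor, intensity) and the smaller thresholds are illustrative assumptions, not the names or defaults used by MSLibrarian.

```r
# Toy library (made up): precursor id and relative fragment intensity per transition
lib <- data.frame(
  precursor = c(rep("AAAK_2", 5), rep("CCCR_2", 3), rep("DDDK_3", 2)),
  intensity = c(1.00, 0.60, 0.30, 0.05, 0.005, 1.00, 0.40, 0.02, 1.00, 0.008)
)

topTrans <- 3        # keep at most 3 transitions per precursor
cutoffTrans <- 0.01  # drop transitions below this relative intensity
minTrans <- 2        # drop precursors left with fewer than 2 transitions

# Step 1: rank transitions within each precursor (most intense first), keep the top N
rankInPrec <- ave(-lib$intensity, lib$precursor,
                  FUN = function(x) rank(x, ties.method = "first"))
step1 <- lib[rankInPrec <= topTrans, ]

# Step 2: remove transitions below the relative intensity cutoff
step2 <- step1[step1$intensity >= cutoffTrans, ]

# Step 3: remove precursors with too few remaining transitions
nPerPrec <- ave(step2$intensity, step2$precursor, FUN = length)
filtered <- step2[nPerPrec >= minTrans, ]

print(filtered)
```

Because the cutoff is applied after the top-N selection, a precursor can end up with fewer than topTrans transitions, which is why the minimum-transition filter comes last.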

Retention time replacement

MSLibrarian allows the retention times in a library to be replaced, from the default Prosit iRT to either DeepLC iRT or RT (minutes). A benefit of using DeepLC over Prosit is the ability to calibrate the retention time predictions with peptides of known retention times, which may increase the prediction accuracy. During retention time replacement, a small proportion of the peptides with known retention times is extracted from the Calibration Library created at the beginning. These peptides are then used to calibrate the DeepLC predictions of either iRT or RT values for each target in the library.

To replace retention times in a library, there is only one parameter that needs to be specified.

mods = c("rt")

Then we simply run MSLibrarian::mod.spectral.lib():

mod.spectral.lib(projectFolder = projectFolder, inputLib = inputLib, mods = mods)

Function output log:

[1] "Found the library file: D:/demo_mslibr_yeast/library/calibration_lib.RData in project folder: D:/demo_mslibr_yeast"
[1] "Input spectral library: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"
[1] "Replacement of retention times enabled..."
[1] "Importing calibration library: D:/demo_mslibr_yeast/library/calibration_lib.RData"
[1] "Output folder set to: D:/demo_mslibr_yeast/library"
[1] "Writing filtered output library to: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_rt_deeplc_0.25.tsv"
[1] "Creating results folder: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit_deeplc_0.25"
[1] "Importing OpenSwath library: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"
|=========================================================================================================================================| 100% 944 MB
[1] "Found: 228491 precursors in the input library..."
[1] "Importing the calibration library..."
[1] "Sampling 3886 peptide sequences ( 25 % ) with known retention times to use for calibration of DeepLC predictions..."
[1] "Writes csv file for prediction of retention times for library peptides"
[1] "Writes csv file for prediction of retention times for benchmark peptides"
[1] "Writes csv file for calibration of the retention time predictions"
[1] "Searching for DeepLC GUI installation...this may take a few seconds..."
[1] "Found DeepLC GUI installation..."
---DeepLC log---
[1] "Loading DeepLC results in folder: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit_deeplc_0.25"
[1] "Saving retention time benchmark plot to D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit_deeplc_0.25/bench.pdf"
Saving 7 x 7 in image
[1] "Importing OpenSwath library: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_irt_prosit.tsv"
|=========================================================================================================================================| 100% 944 MB
[1] "Replacing library retention times with DeepLC predicted retention times..."
[1] "Writing new library to: D:/demo_mslibr_yeast/library/Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_rt_deeplc_0.25.tsv"

A new library with replaced retention times is then written (Aug_11_10_20_32_2021_mslibrarian_ce_length_charge_rt_deeplc_0.25.tsv). In addition to the library, a figure is produced that shows the correlation between the experimental iRTs for all targets in the Calibration Library and the corresponding DeepLC-predicted iRT values.
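
The split into calibration and benchmark peptides can be illustrated with simulated data. This sketch is not DeepLC: a straight-line fit stands in for DeepLC's own calibration, the 25% sampling fraction is taken from the log above, and all retention times are simulated.

```r
set.seed(1)  # reproducible sampling

# Toy data (simulated): raw predictions on an arbitrary scale vs. observed iRT
n <- 200
predicted <- runif(n, 0, 100)
observed <- 1.5 * predicted - 20 + rnorm(n, sd = 3)

# Sample 25% of the peptides as the calibration set; the rest is the benchmark set
calIdx <- sample(n, size = round(0.25 * n))
cal <- data.frame(predicted = predicted[calIdx], observed = observed[calIdx])
bench <- data.frame(predicted = predicted[-calIdx], observed = observed[-calIdx])

# Stand-in calibration: a straight-line fit on the calibration peptides only
fit <- lm(observed ~ predicted, data = cal)
bench$calibrated <- predict(fit, newdata = bench)

# Benchmark on peptides that were not used for calibration
benchCor <- cor(bench$observed, bench$calibrated)
benchMae <- median(abs(bench$observed - bench$calibrated))
cat(sprintf("Pearson r = %.3f, median abs. error = %.2f\n", benchCor, benchMae))
```

Evaluating on held-out peptides gives an honest estimate of the prediction accuracy, because the calibration peptides would otherwise flatter the fit.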

Type ?mod.spectral.lib in the R console to read the documentation for mod.spectral.lib().