Commit

Merge pull request #47 from bigbio/dev
Improve parameters, logging, example files, and feature detection for mzML statistics.
ypriverol authored Mar 10, 2025
2 parents 56f0d67 + b4ef7ee commit 66de992
Showing 21 changed files with 806 additions and 591,648 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -166,3 +166,6 @@ cython_debug/
/tests/test_data/diann2mztab/RD139_Narrow_UPS1_0_1fmol_inj2.mzML
/tests/test_data/diann2mztab/RD139_Narrow_UPS1_0_25fmol_inj1.mzML
/tests/test_data/diann2mztab/RD139_Narrow_UPS1_0_25fmol_inj2.mzML

.qodo
/tests/test_data/RD139_Narrow_UPS1_0_1fmol_inj1.mzML
5 changes: 5 additions & 0 deletions .markdownlint.json
@@ -0,0 +1,5 @@
{
"MD033": {
"allowed_elements": ["details", "summary"]
}
}
49 changes: 49 additions & 0 deletions README.md
@@ -32,6 +32,55 @@ The following functionalities are available in the package:
- `psmconvert` - The convert_psm function converts peptide spectrum matches (PSMs) from an idXML file to a CSV file, optionally filtering out decoy matches. It extracts and processes data from both the idXML and an associated spectra file, handling multiple search engines and scoring systems.
- `mzmlstats` - The `mzmlstats` command processes mass spectrometry data files in either `.mzML` or Bruker `.d` format to extract and compile statistics about the spectra. It supports generating detailed or ID-only CSV files based on the spectra data.

#### mzML statistics

quantms-utils has multiple scripts that generate mzML statistics. The resulting files are used by multiple tools and packages within the quantms ecosystem for quality control, mzTab generation, etc. Below are details about the formats, the fields they contain, and how they are computed.

<details>
<summary>MS info and details</summary>

`mzmlstats` allows the user to produce a file containing all features for every signal in the MS/MS experiment. The output is a parquet file named after the original file plus a postfix: `{file_name}_ms_info.parquet`. The columns are defined and computed as follows:

- `scan`: The scan accession for each MS and MS/MS signal in the mzML. The format depends on the instrument manufacturer; for example, Thermo files use `controllerType=0 controllerNumber=1 scan=43920`. We tried to follow the definition in [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan).
- `ms_level`: The MS level of the signal, 1 for MS and 2 for MS/MS.
- `num_peaks`: The number of peaks in the spectrum, computed with pyopenms from `spectrum.get_peaks()`.
- `base_peak_intensity`: The max intensity in the spectrum (MS or MS/MS).
- `summed_peak_intensities`: The sum of all intensities in the spectrum (MS or MS/MS).
- `rt`: The retention time of the spectrum, captured with pyopenms via `spectrum.getRT()`.
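
The peak-derived columns above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: `summarize_peaks` is a hypothetical helper name, and it takes the two arrays that `spectrum.get_peaks()` returns as ordinary sequences so that it runs without pyopenms. Returning `0.0` as the base peak of an empty spectrum is also an assumption of this sketch.

```python
from typing import Sequence


def summarize_peaks(mz_array: Sequence[float], intensity_array: Sequence[float]) -> dict:
    """Hypothetical helper: derive the per-spectrum summary columns from peak arrays.

    The two arguments mirror the pair returned by pyopenms `spectrum.get_peaks()`.
    """
    if len(mz_array) != len(intensity_array):
        raise ValueError("m/z and intensity arrays must have the same length")
    num_peaks = len(mz_array)
    # Assumption: an empty spectrum reports a base peak intensity of 0.0.
    base_peak_intensity = float(max(intensity_array)) if num_peaks else 0.0
    summed_peak_intensities = float(sum(intensity_array))
    return {
        "num_peaks": num_peaks,
        "base_peak_intensity": base_peak_intensity,
        "summed_peak_intensities": summed_peak_intensities,
    }


# Three peaks with intensities 10, 50 and 5.
stats = summarize_peaks([100.1, 200.2, 300.3], [10.0, 50.0, 5.0])
print(stats)
```

With pyopenms available, the same helper could be called as `summarize_peaks(*spectrum.get_peaks())`.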

For MS/MS signals, we have the following additional columns:

- `precursor_charge`: The charge of the precursor ion, captured with pyopenms via `spectrum.getPrecursors()[0].getCharge()`.
- `precursor_mz`: The m/z of the precursor ion, captured with pyopenms via `spectrum.getPrecursors()[0].getMZ()`.
- `precursor_intensity`: The intensity of the precursor ion, captured with pyopenms via `spectrum.getPrecursors()[0].getIntensity()`. If the precursor intensity is not annotated, it is taken from the purity object; see the note below.
- `precursor_rt`: The retention time of the precursor ion. See the note below.
- `precursor_total_intensity`: The total intensity of the precursor ion. See the note below.

> [!NOTE]
>
> All precursor-related information is taken from the first precursor in the spectrum. For the columns `precursor_intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity`, we use the following pyopenms code:
> ```python
> precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index)
> precursor_rt = precursor_spectrum.getRT()
> purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True)
> precursor_intensity = purity.target_intensity
> total_intensity = purity.total_intensity
> ```
</details>
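
The snippet in the note does not show how `precursor_spectrum_index` is obtained or how the fallback for an unannotated intensity is applied. The following is a minimal sketch of one plausible approach, assuming the precursor spectrum is the nearest preceding MS1 scan and that an unannotated intensity is reported as `0.0`. Both assumptions, and the helper names `find_precursor_spectrum_index` and `resolve_precursor_intensity`, are hypothetical and not taken from the package.

```python
from typing import Optional, Sequence


def find_precursor_spectrum_index(ms_levels: Sequence[int], ms2_index: int) -> Optional[int]:
    """Return the index of the closest preceding MS1 spectrum, or None if there is none."""
    for index in range(ms2_index - 1, -1, -1):
        if ms_levels[index] == 1:
            return index
    return None


def resolve_precursor_intensity(annotated: float, purity_target: Optional[float]) -> Optional[float]:
    """Prefer the annotated intensity; otherwise fall back to the purity-derived value."""
    # Assumption: a missing annotation shows up as a non-positive intensity.
    if annotated > 0.0:
        return annotated
    return purity_target


# MS levels in acquisition order: MS1, two MS2 scans, MS1, one MS2 scan.
levels = [1, 2, 2, 1, 2]
print(find_precursor_spectrum_index(levels, 4))
print(resolve_precursor_intensity(0.0, 1234.5))
```

In this sketch `purity_target` stands in for `purity.target_intensity` from the note; passing it as a plain number keeps the example runnable without pyopenms.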
<details>
<summary>MS2 info and details</summary>

`mzmlstats` allows the user to produce a file containing all the MS2 spectra, including the intensities and m/z values of every peak. The output is a parquet file named after the original file plus a postfix: `{file_name}_ms2_info.parquet`. The columns are defined and computed as follows:

- `scan`: The scan accession for each MS/MS signal in the mzML. The format depends on the instrument manufacturer; for example, Thermo files use `controllerType=0 controllerNumber=1 scan=43920`. We tried to follow the definition in [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan).
- `ms_level`: The MS level of the signal; always 2 in this file.
- `mz_array`: The m/z array of the peaks in the MS/MS signal, captured with pyopenms via `mz_array, intensity_array = spectrum.get_peaks()`.
- `intensity_array`: The intensity array of the peaks in the MS/MS signal, captured with pyopenms via `mz_array, intensity_array = spectrum.get_peaks()`.

</details>
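
Both parquet files described in this section are named by appending a postfix to the input file name. A minimal sketch of that naming rule follows, assuming `{file_name}` means the input name without its final extension; `stats_output_names` is a hypothetical helper, not a function of the package.

```python
from pathlib import Path


def stats_output_names(input_path: str) -> dict:
    """Hypothetical helper: build the two parquet file names described above."""
    # Assumption: {file_name} is the input name without its final extension,
    # so both `sample.mzML` and a Bruker `sample.d` directory map to `sample`.
    file_name = Path(input_path).stem
    return {
        "ms_info": f"{file_name}_ms_info.parquet",
        "ms2_info": f"{file_name}_ms2_info.parquet",
    }


names = stats_output_names("/data/sample_01.mzML")
print(names["ms_info"])
print(names["ms2_info"])
```

The resulting files can then be loaded with any parquet reader, for example `pandas.read_parquet`.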
## Contributions and issues
Contributions and issues are welcome. Please open an issue in the [quantms GitHub repository](https://github.com/bigbio/quantms) or a PR in the [quantms-utils GitHub repository](https://github.com/bigbio/quantms-utils).
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -3,7 +3,7 @@ name = "quantms-utils"
description = "Python scripts and helpers for the quantMS workflow"
readme = "README.md"
license = "MIT"
version = "0.0.19"
version = "0.0.20"
authors = [
"Yasset Perez-Riverol <[email protected]>",
"Dai Chengxin <[email protected]>",
2 changes: 1 addition & 1 deletion quantmsutils/__init__.py
@@ -1 +1 @@
__version__ = "0.0.19"
__version__ = "0.0.20"