
Improve parameters, logging, example files and feature detection for mzML statistics. #47

Merged
merged 33 commits on Mar 10, 2025

Conversation

ypriverol
Member

@ypriverol ypriverol commented Mar 7, 2025

Summary by CodeRabbit

  • Documentation

    • Added a new section detailing mzML statistics processing, including an overview of output file structure and key spectral fields.
  • New Features

    • Enhanced mzML statistics now include additional output details for comprehensive spectral analysis.
    • Introduced logging functionality across various scripts for improved error handling and tracking.
  • Chores

    • Package version updated to 0.0.20.
    • Added .qodo to .gitignore to prevent tracking of specific files.
    • New configuration file .markdownlint.json added for Markdown linting rules.

Contributor

coderabbitai bot commented Mar 7, 2025

Walkthrough

This pull request introduces two new entries in the .gitignore file to ignore .qodo files and a specific mzML file. It adds a section in the README.md detailing mzML statistics functionality and updates the package version across several files from 0.0.19 to 0.0.20. The changes include formatting improvements for code readability and the introduction of logging configurations in multiple modules to enhance error reporting and debugging. Additionally, new constants are defined, and test paths are updated for better maintainability.

Changes

Files Change Summary
.gitignore Added new entries .qodo and /tests/test_data/RD139_Narrow_UPS1_0_1fmol_inj1.mzML to ignore specific files.
README.md Added new "mzml statistics" section detailing the mzML stats functionality and output file structure.
pyproject.toml, recipe/meta.yaml, quantmsutils/__init__.py Updated version number from 0.0.19 to 0.0.20.
quantmsutils/diann/diann2mztab.py Reformatted function/method signatures and internal code for enhanced readability.
quantmsutils/diann/dianncfg.py, quantmsutils/mzml/mzml_statistics.py, quantmsutils/psm/psm_conversion.py, quantmsutils/sdrf/check_samplesheet.py, quantmsutils/sdrf/extract_sample.py Introduced logging functionality with configured formats and levels; improved error messaging and schema updates in mzML processing.
quantmsutils/utils/constants.py Updated constants related to precursor ion characteristics and added new constants.
tests/test_commands.py Updated test file paths using constants and adjusted file loading order in tests.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant M as mzML Processor
    participant L as Logger
    participant O as Output Writer

    U->>M: Initiate mzML statistics generation
    M->>M: Read and process spectra (transform_mzml_spectrum)
    M->>L: Log detailed precursor, error, and progress info
    M->>O: Write batch data to Parquet file
    O-->>M: Confirm successful write
    M->>L: Log success message
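The control flow in the diagram can be sketched in Python. The function and parameter names below are illustrative stand-ins, not the PR's actual API; only the batching and logging pattern mirrors the diagram:

```python
import logging
from typing import Iterable

logger = logging.getLogger(__name__)


def generate_ms_info(spectra: Iterable[dict], write_batch, batch_size: int = 2) -> int:
    """Transform spectra into rows and hand them off to a writer in batches.

    `spectra` and `write_batch` stand in for the PR's mzML reader and
    Parquet writer; names here are hypothetical.
    """
    batch, written = [], 0
    for spectrum in spectra:
        logger.debug("Processing scan %s", spectrum.get("scan"))
        batch.append({"scan": spectrum.get("scan"), "ms_level": spectrum.get("ms_level")})
        if len(batch) >= batch_size:
            write_batch(batch)  # e.g. append a row group to a Parquet file
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        write_batch(batch)
        written += len(batch)
    logger.info("Wrote %d rows", written)
    return written
```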

Possibly related PRs

  • Support for version 1.9.2 #33: The changes in the main PR, which involve updating the version number in the quantmsutils/__init__.py file, are related to the changes in the retrieved PR, which also update the version number in the same file.

Suggested reviewers

  • jpfeuffer
  • daichengxin

Poem

I'm a rabbit in the codefield, hopping with delight,
Skipping through changes that shine so bright.
From logging trails to neatly aligned rows,
My carrot of code steadily grows.
With every version bump and new constant hue,
I cheerfully celebrate the changes with you!
🥕🐇


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
quantmsutils/sdrf/check_samplesheet.py (1)

184-184: 💡 Verification agent

🧩 Analysis chain

Consider addressing the TODO comment.

There's a TODO comment about validating the expdesign file, but the function check_expdesign already exists and is called below. It appears the TODO might be stale.


🏁 Script executed:

#!/bin/bash
# Check if there are issues or tasks related to the validation of expdesign files
grep -r "TODO.*expdesign" --include="*.py" .

Length of output: 122


Stale TODO Comment: Remove or Update Expdesign Validation Note
The TODO in quantmsutils/sdrf/check_samplesheet.py (line 184) appears to be redundant since the check_expdesign function is already implemented and used below. Please remove (or update) this comment to avoid confusion.

🧹 Nitpick comments (4)
quantmsutils/diann/dianncfg.py (1)

86-89: Consider replacing remaining print statements with logging.

For consistency, consider replacing the remaining print statements with logging calls, similar to what was done at lines 65-68.

-                print(
-                    "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
-                    + mod
-                )
+                logging.warning(
+                    "Currently only supported unimod modifications for DIA pipeline. Skipped: "
+                    + mod
+                )

And similarly for the print statement at lines 101-104.

Also applies to: 101-104

README.md (1)

35-48: Great addition of documentation for mzML statistics.

The detailed explanation of the mzML statistics functionality and output format is very helpful for users.

Small formatting suggestion: ensure consistent spacing in bullet points for better readability.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)

quantmsutils/mzml/mzml_statistics.py (2)

13-13: Remove unused import.

The import import pyopenms as oms appears to be unused, since the class actually needed (MzMLFile) is already imported from pyopenms directly.

-import pyopenms as oms
🧰 Tools
🪛 Ruff (0.8.2)

13-13: pyopenms imported but unused

Remove unused import: pyopenms

(F401)


109-133: Good handling of multiple precursors.

The code now properly handles spectra with multiple precursors, which is important for accurate data processing. However, the manual index incrementing could be simplified using enumerate.

-                index = 0
-                for precursor in spectrum.getPrecursors():
+                for index, precursor in enumerate(spectrum.getPrecursors()):
                     logging.info(
                         "Precursor charge: %s, precursor mz: %s",
                         precursor.getCharge(),
                         precursor.getMZ(),
                     )
                     charge_state = precursor.getCharge()
                     exp_mz = precursor.getMZ()
                     intensity = precursor.getIntensity()
                     precursors.append(
                         {
                             "charge": charge_state,
                             "mz": exp_mz,
                             "intensity": intensity,
                             "index": index,
                         }
                     )
-                    index += 1
🧰 Tools
🪛 Ruff (0.8.2)

132-132: Use enumerate() for index variable index in for loop

(SIM113)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56f0d67 and d6a4a6b.

📒 Files selected for processing (12)
  • .gitignore (1 hunks)
  • README.md (1 hunks)
  • pyproject.toml (1 hunks)
  • quantmsutils/diann/diann2mztab.py (23 hunks)
  • quantmsutils/diann/dianncfg.py (2 hunks)
  • quantmsutils/mzml/mzml_statistics.py (11 hunks)
  • quantmsutils/psm/psm_conversion.py (2 hunks)
  • quantmsutils/sdrf/check_samplesheet.py (2 hunks)
  • quantmsutils/sdrf/extract_sample.py (1 hunks)
  • quantmsutils/utils/constants.py (1 hunks)
  • recipe/meta.yaml (1 hunks)
  • tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 154-154: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 154-154: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

13-13: pyopenms imported but unused

Remove unused import: pyopenms

(F401)


132-132: Use enumerate() for index variable index in for loop

(SIM113)

🪛 LanguageTool
README.md

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
🔇 Additional comments (33)
.gitignore (1)

169-170: New Ignore Rule for .qodo Files
The addition of .qodo to the .gitignore file is clear and appropriate to prevent tracking of these files. Please verify that .qodo files are indeed generated by a build/tool process and that no other related file extensions need to be ignored.

pyproject.toml (1)

6-6: Version Bump Updated to 0.0.20
The version upgrade from "0.0.19" to "0.0.20" is correctly applied. This is consistent with the version bump in related files. Make sure that any CI/CD pipeline or documentation referencing the version is updated accordingly.

recipe/meta.yaml (1)

4-4: Meta File Version Update
The package version in recipe/meta.yaml has been updated to "0.0.20", which aligns perfectly with the update in pyproject.toml. Confirm that downstream processes (e.g., packaging and deployment jobs) are aware of this change.

quantmsutils/sdrf/extract_sample.py (1)

1-1: Good addition of logging facilities.

Adding logging configuration improves observability and will help with debugging.

Also applies to: 7-8

quantmsutils/utils/constants.py (1)

14-15: New constants added appropriately.

The new PRECURSORS and INTENSITY constants follow the consistent naming pattern of the existing constants.

quantmsutils/psm/psm_conversion.py (1)

1-1: Good addition of logging facilities.

Adding logging configuration is consistent with other files and enhances debugging capabilities.

Also applies to: 37-38

quantmsutils/sdrf/check_samplesheet.py (1)

5-5: Good addition of logging facilities.

The logging setup is consistent with the implementation in other files and will improve debugging capabilities.

Also applies to: 14-15

quantmsutils/diann/dianncfg.py (2)

6-6: Good addition of logging functionality.

Adding logging capabilities is a good practice for better error handling and traceability.

Also applies to: 13-14


65-68: Consistent use of logging for warnings.

Good job replacing the print statement with logging.info for warnings, which improves observability.

tests/test_commands.py (1)

1-2: Good use of Path for better file handling.

Using pathlib.Path for file paths is a good practice and more maintainable than string concatenation.

Also applies to: 7-7

quantmsutils/mzml/mzml_statistics.py (7)

1-1: Good addition of logging functionality.

Adding logging capabilities improves error handling and observability throughout the codebase.

Also applies to: 32-33


73-86: Improved docstring for better code documentation.

The enhanced docstring for consumeSpectrum method adds clarity about its functionality and parameters.


159-159: Added precursors field to data structure.

Good addition of the PRECURSORS field to both MS level 2 and level 1 spectra data structures.

Also applies to: 176-176


232-232: Improved error handling with better context.

Replacing print statements with logging.error and including the file path in error messages improves debugging capabilities.

Also applies to: 330-330


380-395: Well-structured schema for precursor data.

The new schema for PRECURSORS with a structured list is well-designed and provides good type safety.


420-421: Cleaner path construction.

Using Path.with_name() for creating output paths is more maintainable than string concatenation: it swaps only the final path component and leaves the parent directory untouched.
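A minimal illustration of the pattern (paths and the _ms_info suffix are examples matching the file names seen in the pipeline logs, not copied from the PR):

```python
from pathlib import PurePosixPath

mzml_path = PurePosixPath("/data/run01/BSA1_F1.mzML")

# Replace only the file name; the parent directory is preserved.
output_path = mzml_path.with_name(mzml_path.stem + "_ms_info.parquet")
```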


439-439: Added success logging message.

Good practice to log successful completion of processing, which helps with monitoring and troubleshooting.

quantmsutils/diann/diann2mztab.py (16)

51-59: Method Signature Update in diann2mztab:
The function signature now includes a trailing comma after the last parameter. This change helps keep diffs clean and maintains consistency in parameter formatting.


115-121: Lambda Function Formatting for Peptide Conversion:
The lambda used for converting peptide sequences in the MSstats conversion block has been reformatted into a multi‐line expression. The logic remains unchanged while readability has been significantly improved.


236-236: List Slicing Formatting in Experimental Design Parsing:
The slicing expression in the creation of s_table now uses an extra space for clarity (i.e. data[empty_row + 1 :][1:]). There is no functional change, only improved readability.


276-301: Conditional Logic Reformat in Mass Calculation:
The conditional block in compute_mass_modified_peptide (checking for allowed amino acids) has been reflowed over multiple lines. This reformat enhances clarity without affecting functionality.


381-387: Method Signature Update in convert_to_mztab:
The signature for convert_to_mztab now includes a trailing comma after the last parameter, ensuring consistency with other function declarations.


467-471: Update to Remain Columns List:
The list of columns (remain_cols) now explicitly includes "Global.PG.Q.Value" after "Precursor.Quantity". This addition is clearly formatted and ensures that the report is filtered using all expected columns.


496-499: Precursor m/z Calculation Formatting:
The calculation for "Calculate.Precursor.Mz" has been re-indented—wrapping the addition within parentheses improves readability while leaving the underlying logic untouched.
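For context, the relation being computed is presumably the standard precursor m/z formula, m/z = (M + z * m_proton) / z. A minimal sketch (the function name and the exact constant are illustrative; the diff does not show the expression itself):

```python
PROTON_MASS = 1.007276466  # Da; assumed constant, not taken from the PR


def precursor_mz(neutral_mass: float, charge: int) -> float:
    # m/z = (M + z * m_proton) / z
    return (neutral_mass + charge * PROTON_MASS) / charge
```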


599-609: Concatenation Formatting for mzTab Settings:
The string concatenation for software setting entries (fragment and precursor mass tolerances) in the MTD sub‐table is now split over multiple lines. This change increases clarity with no alteration in functionality.


787-794: Protein Abundance Aggregation Refinement:
Within the loop aggregating protein abundance data by study variable, the computation and subsequent drop of columns is now formatted in a more streamlined manner. This refactor improves readability and confirms that the mean is calculated correctly.
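The aggregation pattern described can be sketched with pandas (column names below are placeholders, not the actual mzTab columns):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "protein": ["P1", "P1", "P2"],
        "study_variable": [1, 1, 1],
        "abundance": [10.0, 20.0, 30.0],
    }
)

# Mean abundance per protein within each study variable; helper columns
# outside the groupby keys are dropped by selecting only "abundance".
agg = df.groupby(["protein", "study_variable"], as_index=False)["abundance"].mean()
```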


762-764: Improved Concatenation in PRH Construction:
The call to pd.concat to merge protein detail rows with the existing PRH dataframe has been re-indented and reformatted for better clarity and maintainability.


883-889: Method Signature Update in mztab_peh:
The updated definition for mztab_peh now shows each parameter on its own line with a trailing comma. This formatting change enhances consistency across the codebase.


1059-1067: Refactored Merge for Decoy Incorporation in PEH:
The merge block that adds the "Decoy" column into out_mztab_peh has been reformatted over multiple lines. The updated formatting enhances readability while preserving the intended functionality.


1137-1145: DataFrame Column Formatting in PSH Construction:
Within the PSH sub‐table, the conversion of the "opt_global_spectrum_reference" column—by prepending "scan=" and ensuring data is cast to string—is now clearly separated. Also, the conversion of the RT values (dividing by 60 and ensuring the group's "RT" is of type float64) is now explicit, promoting consistency across DIA‐NN versions.


1291-1295: Spectra Reference String Reformatting in PSH:
The lambda that constructs the spectra reference string (concatenating "ms_run[{}]:" with the original reference) has been broken into two lines. This reformat makes the string assembly clearer without modifying its behavior.


1592-1594: Method Signature Update in calculate_protein_coverages:
The function header now explicitly formats its parameters (with a trailing comma) across multiple lines. This small stylistic update improves consistency with other parts of the code.


1580-1582: Enhanced Readability in Coverage Calculation:
The merging of overlapping intervals during the protein coverage calculation (in calculate_coverage) has been reformatted for clarity, making the computation easier to follow.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
quantmsutils/mzml/mzml_statistics.py (3)

32-33: Consider adjusting logging level for production

Setting the logging level to DEBUG might generate excessive output in a production environment. Consider using INFO as the default level and allowing it to be configurable based on the environment.

-logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.DEBUG)
+logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.INFO)

110-132: Use enumerate() instead of manual index incrementing

When iterating through a collection and tracking an index, it's more Pythonic to use enumerate() rather than manually incrementing an index variable.

-            index = 0
-            for precursor in spectrum.getPrecursors():
+            for index, precursor in enumerate(spectrum.getPrecursors()):
                 logging.info(
                     "Precursor charge: %s, precursor mz: %s",
                     precursor.getCharge(),
                     precursor.getMZ(),
                 )
                 charge_state = precursor.getCharge() # TODO: Review by @timo and @julianus
                 exp_mz = precursor.getMZ() # TODO: Review by @timo and @julianus
                 intensity = precursor.getIntensity() # TODO: Review by @timo and @julianus
                 rt = spectrum.getRT() # TODO: Review by @timo and @julianus
                 precursors.append(
                     {
                         "charge": charge_state,
                         "mz": exp_mz,
                         "intensity": intensity,
                         "rt": rt,
                         "index": index,
                     }
                 )
-                index += 1
🧰 Tools
🪛 Ruff (0.8.2)

131-131: Use enumerate() for index variable index in for loop

(SIM113)


113-117: Use logger instance instead of logging module directly

You've created a logger instance but are using the logging module directly here. Use the logger instance for consistency.

-                logging.info(
+                logger.info(
                     "Precursor charge: %s, precursor mz: %s",
                     precursor.getCharge(),
                     precursor.getMZ(),
                 )
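The module-level logger pattern the comment refers to typically looks like the sketch below (the function is a hypothetical stand-in; the PR's actual logger name is not shown in the diff):

```python
import logging

# One logger per module, named after the module itself, so handlers and
# levels can be tuned per module instead of via the root logger.
logger = logging.getLogger(__name__)


def process_spectrum(mz: float, charge: int) -> None:
    # Route messages through the module logger, not the logging module.
    logger.info("Precursor charge: %s, precursor mz: %s", charge, mz)
```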
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6a4a6b and 21f43f0.

📒 Files selected for processing (8)
  • quantmsutils/__init__.py (1 hunks)
  • quantmsutils/diann/dianncfg.py (2 hunks)
  • quantmsutils/mzml/mzml_statistics.py (11 hunks)
  • quantmsutils/psm/psm_conversion.py (2 hunks)
  • quantmsutils/sdrf/check_samplesheet.py (2 hunks)
  • quantmsutils/sdrf/extract_sample.py (1 hunks)
  • quantmsutils/utils/constants.py (1 hunks)
  • tests/test_commands.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • quantmsutils/__init__.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • quantmsutils/sdrf/extract_sample.py
  • quantmsutils/psm/psm_conversion.py
  • quantmsutils/utils/constants.py
  • quantmsutils/diann/dianncfg.py
  • tests/test_commands.py
  • quantmsutils/sdrf/check_samplesheet.py
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

131-131: Use enumerate() for index variable index in for loop

(SIM113)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (10)
quantmsutils/mzml/mzml_statistics.py (10)

28-29: Added new constants for schema consistency

Adding PRECURSORS and INTENSITY constants is a good improvement for maintaining schema definition consistency throughout the codebase.


74-87: Improved documentation with clear docstring format

The enhanced docstring with parameters section provides much better context for the method's functionality.


157-157: Added new fields to row_data for improved MS2 spectrum information

The addition of intensity and precursors data to row_data provides more comprehensive information about MS2 spectra.

Also applies to: 159-159


176-176: Consistent handling for ms_level 1 spectra

Adding the PRECURSORS field with a None value for MS level 1 spectra ensures schema consistency across all spectra records.


232-232: Improved error handling with logging

Using the logger for error messages with file path context is a significant improvement for debugging and traceability.


292-308: Well-structured schema definition for precursors data

The nested schema structure for precursors is well-defined and properly includes all necessary fields with appropriate data types.


346-346: Consistent error handling with proper context

Using logging consistently for error handling with file path information improves the diagnostic capabilities of the code.


397-413: Schema consistency between mzML and Bruker .d file handling

Maintaining consistent schema structures between different file formats ensures uniform data representation across the system.


437-438: Improved path handling with Path.with_name()

Using Path.with_name() is cleaner and more robust than string manipulation for constructing output paths.


456-457: Added success message for better operational visibility

Adding a success log message after processing provides valuable operational feedback and makes monitoring easier.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
quantmsutils/mzml/mzml_statistics.py (1)

105-128: 🛠️ Refactor suggestion

Remove TODO comments and use enumerate for iteration.

The code contains several TODO comments that should be addressed, and the manual index management can be simplified.

-            precursors = []
-            # capture if more than one precursor
-            index = 0
-            for precursor in spectrum.getPrecursors():
+            precursors = []
+            # capture if more than one precursor
+            for index, precursor in enumerate(spectrum.getPrecursors()):
                 logging.info(
                     "Precursor charge: %s, precursor mz: %s",
                     precursor.getCharge(),
                     precursor.getMZ(),
                 )
-                charge_state = precursor.getCharge() # TODO: Review by @timo and @julianus
-                exp_mz = precursor.getMZ() # TODO: Review by @timo and @julianus
-                intensity = precursor.getIntensity() # TODO: Review by @timo and @julianus
-                rt = spectrum.getRT() # TODO: Review by @timo and @julianus
+                charge_state = precursor.getCharge()
+                exp_mz = precursor.getMZ()
+                intensity = precursor.getIntensity()
+                # rt is already calculated outside the loop and doesn't change per precursor
                 precursors.append(
                     {
                         "charge": charge_state,
                         "mz": exp_mz,
                         "intensity": intensity,
                         "rt": rt,
                         "index": index,
                     }
                 )
-                index += 1
🧰 Tools
🪛 Ruff (0.8.2)

127-127: Use enumerate() for index variable index in for loop

(SIM113)

🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)

175-175: Fix truncated log message.

The log message is incomplete and might cause confusion.

-            logger.info("Skipping spectrum with MS level %s, MS not supported, or precursors not ", ms_level)
+            logger.info("Skipping spectrum with MS level %s, MS not supported, or precursors not found", ms_level)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc7f86 and 647b9c3.

📒 Files selected for processing (1)
  • quantmsutils/mzml/mzml_statistics.py (11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

127-127: Use enumerate() for index variable index in for loop

(SIM113)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (11)
quantmsutils/mzml/mzml_statistics.py (11)

1-1: Good addition of logging configuration.

Adding structured logging with appropriate levels helps with debugging and monitoring application behavior.

Also applies to: 32-34


28-29: Good addition of new constants.

Adding PRECURSORS and INTENSITY constants maintains consistency with other field naming in the codebase.


73-87: Well-documented function with improved docstring.

The enhanced docstring clearly explains the function's purpose, behavior, and parameters.


89-96: Improved peak extraction and intensity calculations.

The refactored code now directly uses get_peaks() to extract mz and intensity arrays, improving readability and potentially performance.


153-155: Good addition of precursor and intensity information.

Adding precursor details and intensity values to the row data enriches the output and provides more comprehensive information for analysis.

Also applies to: 172-172


229-229: Improved error handling with context information.

Adding file path information to the error message provides better context for troubleshooting.


289-304: Well-structured schema for precursor data.

The schema definition correctly handles the complex nested structure of precursor data, properly defining field types and nullability.


343-343: Enhanced error logging with file path context.

Adding file path information to error logs significantly improves troubleshooting capabilities.


394-409: Consistent schema definition across functions.

The precursor schema definition is consistently applied in the batch_write_bruker_d function, maintaining a uniform data structure throughout the application.


434-435: Improved file path handling using Path methods.

Using Path.with_name() instead of string manipulation improves readability and reliability of path construction.


453-453: Added success logging.

Adding success logging helps track the application flow and confirms successful file processing.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_commands.py (1)

151-153: ⚠️ Potential issue

Missing test file causes pipeline failure.

Referencing BSA1_F1_test_ms_info.parquet leads to a FileNotFoundError. You must either include this file in test_data or remove/change this reference to a valid existing file.

-table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+table1 = pd.read_parquet(ms_info_path)
🧰 Tools
🪛 GitHub Actions: Python application

[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'

🪛 GitHub Actions: Python package

[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'

🧹 Nitpick comments (4)
tests/test_commands.py (1)

157-157: Consider verifying table contents rather than just length.

A simple length comparison can mask data discrepancies. For more robust testing, compare column-level data or use pd.testing.assert_frame_equal(table1, table2).
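A hedged sketch of the stricter comparison suggested above (the column names are placeholders):

```python
import pandas as pd

expected = pd.DataFrame({"scan": ["scan=1", "scan=2"], "ms_level": [1, 2]})
actual = pd.DataFrame({"scan": ["scan=1", "scan=2"], "ms_level": [1, 2]})

# Raises AssertionError with a cell-level diff on any mismatch in values,
# dtypes, or index -- unlike a bare length check, which would pass both.
pd.testing.assert_frame_equal(actual, expected)
```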

quantmsutils/mzml/mzml_statistics.py (3)

5-5: Remove unused import.

typing.Optional is imported but never used in this file.

-from typing import Optional, Set, Union
+from typing import Set, Union
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


207-211: Unify logging usage.

Lines in this method use both logging.info and logger.info. To maintain consistency and configurability, consider replacing any direct logging.info calls with logger.info.

-logging.info(
+logger.info(

239-239: Use 'enumerate()' to handle index.

Refactor the loop to leverage Pythonic iteration and avoid manual index increments.

-index = 0
-for precursor in spectrum.getPrecursors():
+for index, precursor in enumerate(spectrum.getPrecursors()):
🧰 Tools
🪛 Ruff (0.8.2)

239-239: Use enumerate() for index variable index in for loop

(SIM113)
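The suggestion above in runnable form (the list is a stand-in for `spectrum.getPrecursors()`):

```python
precursors = ["precursor_a", "precursor_b"]  # stand-in for spectrum.getPrecursors()

# enumerate() yields (index, item) pairs, so no manual counter
# needs to be declared and incremented inside the loop body.
indexed = [(index, precursor) for index, precursor in enumerate(precursors)]
```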

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 647b9c3 and 68ac536.

📒 Files selected for processing (2)
  • quantmsutils/mzml/mzml_statistics.py (10 hunks)
  • tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

5-5: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


239-239: Use enumerate() for index variable index in for loop

(SIM113)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
tests/test_commands.py (1)

7-7: Good practice using a project-level constant for tests directory.

Defining TESTS_DIR = Path(__file__).parent promotes clarity and maintainability when referencing relative testing paths.
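A sketch of the pattern (the asset name is hypothetical; the real files live under `tests/test_data`):

```python
from pathlib import Path

# Anchor paths to the test module's location (Path(__file__).parent in the
# actual test file) instead of the current working directory, so tests pass
# regardless of where pytest is invoked from.
TESTS_DIR = Path(__file__).parent if "__file__" in globals() else Path.cwd()
ms_info_path = TESTS_DIR / "test_data" / "example_ms_info.parquet"  # hypothetical asset name
```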

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_commands.py (1)

158-158: ⚠️ Potential issue

Missing file breaks the test execution.
The Parquet file BSA1_F1_test_ms_info.parquet does not appear to exist, causing a FileNotFoundError. To fix, either add the file to tests/test_data or compare with ms_info_path if the data sets are intended to be the same.

-    table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+    table1 = pd.read_parquet(ms_info_path)
🧰 Tools
🪛 GitHub Actions: Python application

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'

🪛 GitHub Actions: Python package

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'

🧹 Nitpick comments (7)
tests/test_commands.py (1)

149-153: Clarify the comment for file removal.
The comment, "check if the file exist, delete it," is a bit ambiguous since it's happening for ms_info_path and not mzml_path. Consider updating the comment to reflect which file is truly being checked and removed.

-    # check if the file exist, delete it
+    # check if the ms_info_path file exists and delete it
quantmsutils/mzml/mzml_statistics.py (6)

5-5: Remove unused imports Tuple and Union.
These imports are not used, as flagged by static analysis.

-from typing import Dict, List, Optional, Set, Tuple, Union
+from typing import Dict, List, Optional, Set
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.Tuple imported but unused

Remove unused import

(F401)


5-5: typing.Union imported but unused

Remove unused import

(F401)


13-13: Remove unused import Schema.
The Schema import from pyarrow is not directly referenced and can be removed.

-from pyarrow import Schema
🧰 Tools
🪛 Ruff (0.8.2)

13-13: pyarrow.Schema imported but unused

Remove unused import: pyarrow.Schema

(F401)


22-22: Remove unused import MONOISOTOPIC_MZ.
This constant from quantmsutils.utils.constants is not being referenced in the code.

-    MONOISOTOPIC_MZ,
🧰 Tools
🪛 Ruff (0.8.2)

22-22: quantmsutils.utils.constants.MONOISOTOPIC_MZ imported but unused

Remove unused import

(F401)


30-30: Remove unused import MAX_INTENSITY.
It is never used in this file.

-    MAX_INTENSITY,
🧰 Tools
🪛 Ruff (0.8.2)

30-30: quantmsutils.utils.constants.MAX_INTENSITY imported but unused

Remove unused import

(F401)


351-351: Remove or utilize the intensity variable.
Currently, intensity is assigned but never used, which is flagged by static analysis. Either remove it or incorporate it into the logic if needed.

-            intensity = precursor.getIntensity()
🧰 Tools
🪛 Ruff (0.8.2)

351-351: Local variable intensity is assigned to but never used

Remove assignment to unused variable intensity

(F841)


609-609: Preserve original exception context.
When re-raising exceptions, consider using raise click.Abort() from e to attach the original context.

-        raise click.Abort()
+        raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)

609-609: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
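A self-contained illustration of why `from e` matters; `AbortError` stands in for `click.Abort` so the sketch has no click dependency:

```python
class AbortError(Exception):
    """Stand-in for click.Abort in this sketch."""

def process(path):
    try:
        raise ValueError(f"cannot parse {path}")
    except Exception as e:
        # "from e" stores the original error as __cause__, preserving
        # its traceback instead of discarding the root cause.
        raise AbortError("processing failed") from e

try:
    process("sample.mzML")
except AbortError as err:
    cause = err.__cause__
```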

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68ac536 and 9dd2e1f.

📒 Files selected for processing (2)
  • quantmsutils/mzml/mzml_statistics.py (2 hunks)
  • tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

5-5: typing.Tuple imported but unused

Remove unused import

(F401)


5-5: typing.Union imported but unused

Remove unused import

(F401)


13-13: pyarrow.Schema imported but unused

Remove unused import: pyarrow.Schema

(F401)


22-22: quantmsutils.utils.constants.MONOISOTOPIC_MZ imported but unused

Remove unused import

(F401)


30-30: quantmsutils.utils.constants.MAX_INTENSITY imported but unused

Remove unused import

(F401)


351-351: Local variable intensity is assigned to but never used

Remove assignment to unused variable intensity

(F841)


609-609: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
tests/test_commands.py (3)

1-1: Good import usage.
Using pathlib.Path for building file paths is more reliable and portable than using raw strings.


7-7: Initialization of TESTS_DIR looks good.
Defining a base directory for tests promotes maintainability and consistency across all test assets.


162-162: Verify the correctness of the final assertion.
If you're comparing two different Parquet outputs, confirm that both files indeed exist and represent the same or comparable data. Otherwise, the test will fail due to the missing second file or mismatching data.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_commands.py (1)

158-159: ⚠️ Potential issue

Fix the missing test file causing pipeline failures.

The pipeline is failing because the file BSA1_F1_test_ms_info.parquet doesn't exist. You need to either create this file or update the test to use an existing file.

Apply this fix:

-    table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+    table1 = pd.read_parquet(ms_info_path)
🧰 Tools
🪛 GitHub Actions: Python application

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'

🪛 GitHub Actions: Python package

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'

🧹 Nitpick comments (3)
quantmsutils/mzml/mzml_statistics.py (3)

1-32: Improved logging configuration and imports.

Good addition of proper logging configuration. This will help with debugging and monitoring. However, there are some unused imports from the typing module that should be cleaned up.

-from typing import Dict, List, Optional, Set, Tuple, Union
+from typing import Dict, Optional, Set
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.List imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)


5-5: typing.Union imported but unused

Remove unused import

(F401)


27-27: quantmsutils.utils.constants.MAX_INTENSITY imported but unused

Remove unused import: quantmsutils.utils.constants.MAX_INTENSITY

(F401)


27-27: Remove unused constant import.

The MAX_INTENSITY constant is imported but never used in this file.

-    MAX_INTENSITY, PRECURSOR_RT, PRECURSOR_TOTAL_INTENSITY, PRECURSOR_INTENSITY_WINDOW,
+    PRECURSOR_RT, PRECURSOR_TOTAL_INTENSITY, PRECURSOR_INTENSITY_WINDOW,
🧰 Tools
🪛 Ruff (0.8.2)

27-27: quantmsutils.utils.constants.MAX_INTENSITY imported but unused

Remove unused import: quantmsutils.utils.constants.MAX_INTENSITY

(F401)


599-601: Improve exception handling with exception chaining.

When re-raising exceptions, use explicit exception chaining to preserve the original traceback.

-        logger.error(f"Error processing file {ms_path}: {e}")
-        raise click.Abort()
+        logger.error(f"Error processing file {ms_path}: {e}")
+        raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)

601-601: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9dd2e1f and de2130c.

📒 Files selected for processing (3)
  • quantmsutils/mzml/mzml_statistics.py (2 hunks)
  • quantmsutils/utils/constants.py (1 hunks)
  • tests/test_commands.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • quantmsutils/utils/constants.py
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

5-5: typing.List imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)


5-5: typing.Union imported but unused

Remove unused import

(F401)


27-27: quantmsutils.utils.constants.MAX_INTENSITY imported but unused

Remove unused import: quantmsutils.utils.constants.MAX_INTENSITY

(F401)


601-601: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (8)
tests/test_commands.py (3)

1-8: Improved test file organization with Path and constants.

Good refactoring to use pathlib.Path and the TESTS_DIR constant. This makes the test file paths more maintainable and platform-independent.


148-156: Clear test file path management using TESTS_DIR.

The updates to use TESTS_DIR for file paths improve readability and maintainability. The file existence check and deletion before test is a good practice to ensure clean test runs.


162-162:

❓ Verification inconclusive

Updated assertion to compare table lengths instead of using compare().

This change from checking table1.compare(table2).empty to comparing lengths is more lenient. Make sure this doesn't hide potential differences in the table contents that should be verified.


🏁 Script executed:

#!/bin/bash
# Check the previous version of the test to see what was being verified
git show HEAD^ -- tests/test_commands.py | grep -A 5 "assert.*table"

Length of output: 100


Action: Manual Verification Required on Table Assertion Change

The recent update in tests/test_commands.py (line 162) replaces the stricter content comparison (table1.compare(table2).empty) with a simpler length check (assert len(table2) == len(table1)). This update is more lenient and might miss discrepancies in the actual table data. Additionally, the earlier version of the test could not be retrieved due to the absence of a valid previous commit (error with HEAD^), so it's unclear how comprehensive the original check was.

  • Review Area: Confirm whether comparing only the table lengths is sufficient or if a more detailed comparison is needed to capture potential data differences.
  • Additional Action: Manually inspect the commit history for tests/test_commands.py (or use another valid reference) to understand the intended behavior of the previous assertion.
quantmsutils/mzml/mzml_statistics.py (5)

34-64: Well-structured schema definitions with clear documentation.

The separation of schemas into create_ms_schema() and create_id_schema() functions with proper documentation is a good practice. This makes the code more maintainable and easier to understand.


207-373: Comprehensive refactoring of spectrum transformation logic.

The transform_mzml_spectrum method is a significant improvement over the previous implementation. It:

  1. Correctly handles MS level filtering
  2. Properly extracts peak data using get_peaks()
  3. Includes detailed precursor data extraction
  4. Handles edge cases and logs errors appropriately

The code is well-structured with helpful comments explaining the processing logic.


374-420: Enhanced batch writing with improved error handling.

The _write_batch method improvements include better error logging and a more efficient approach to writing batches. The use of RecordBatch.from_pylist is more memory-efficient than converting through pandas first.


481-487: Include batch size in BatchWritingConsumer instantiation.

Good improvement to pass the batch_size parameter to the BatchWritingConsumer. This allows for more flexibility in controlling memory usage during processing.


489-496: Added acquisition datetime retrieval with fallback logic.

The improved acquisition datetime handling with fallback options is a good enhancement that ensures this metadata is captured when available.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_commands.py (1)

137-142: ⚠️ Potential issue

Fix the reference to the missing test file.

The code is still trying to read BSA1_F1_test_ms_info.parquet which was previously identified as missing in the repository. This will cause test failures.

-    table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
-    table2 = pd.read_parquet(ms_info_path)
-    table2 = table2.set_index("scan")
-    table1 = table1.set_index("scan")
-    assert len(table2) == len(table1)
+    table1 = pd.read_parquet(ms_info_path)
+    table1 = table1.set_index("scan")
+    # Additional assertions can be added here to validate the content of the file
+    assert len(table1) > 0

Or, if two different files are truly needed, ensure that BSA1_F1_test_ms_info.parquet exists in the test_data directory.

🧹 Nitpick comments (5)
README.md (1)

39-48: Well-structured documentation with clear examples.

The collapsible details section with information about the MS info output format is helpful. Consider trimming the trailing spaces at the ends of these lines for consistent formatting.

-... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used: 
+... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used:

-... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan). 
+... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan).

-... The MS level of the signal, 1 for MS and 2 for MS/MS.
+... The MS level of the signal, 1 for MS and 2 for MS/MS.

-... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`.
+... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)

quantmsutils/mzml/mzml_statistics.py (4)

1-1: Good addition of logging configuration.

Adding proper logging setup with appropriate formatting is a significant improvement for debugging and monitoring. However, consider removing the unused List import from typing.

-from typing import Dict, List, Optional, Set
+from typing import Dict, Optional, Set

Also applies to: 5-5, 32-34


387-424: Enhanced batch writing with better error handling.

The batch writing method now includes better error handling and logging of exceptions, which will make debugging issues easier.

When raising exceptions in an except block, consider using raise ... from e to preserve the original exception context:

-            logger.error(f"Error during batch writing: {e}, file path: {self.output_path}")
-            raise
+            logger.error(f"Error during batch writing: {e}, file path: {self.output_path}")
+            raise Exception("Batch writing failed") from e

511-513: Improve exception handling in streaming function.

This handler returns None rather than re-raising, so there is no exception to chain; use logger.exception instead of logger.error so the full traceback is recorded alongside the message.

-    except Exception as e:
-        logger.error(f"Error during mzML streaming: {e}, file path: {file_name}")
-        return None
+    except Exception:
+        logger.exception(f"Error during mzML streaming, file path: {file_name}")
+        return None

602-604: Improve CLI exception handling.

When raising in an except block, use raise ... from e to preserve the exception context:

-    except Exception as e:
-        logger.error(f"Error processing file {ms_path}: {e}")
-        raise click.Abort()
+    except Exception as e:
+        logger.error(f"Error processing file {ms_path}: {e}")
+        raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)

604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32914fe and 4c88bb8.

📒 Files selected for processing (3)
  • README.md (1 hunks)
  • quantmsutils/mzml/mzml_statistics.py (2 hunks)
  • tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

5-5: typing.List imported but unused

Remove unused import: typing.List

(F401)


604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 LanguageTool
README.md

[uncategorized] ~37-~37: You might be missing the article “the” here.
Context: ...d by multiple tools and packages within quantms ecosystem for quality control, mzTab ge...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (8)
tests/test_commands.py (2)

1-2: Good use of the pathlib module and constant definition.

Using Path from the pathlib module and defining a TESTS_DIR constant are good practices to make the test code more maintainable and portable.

Also applies to: 7-8


129-135: Check file path construction and existence.

The file path construction using TESTS_DIR is a good improvement, but there appears to be a check for deleting the main ms_info file before running the test. Make sure this is intentional and that the file is correctly regenerated during the test.

README.md (1)

35-38: Good documentation addition for mzML statistics.

The new section provides valuable context about the mzML statistics functionality. Consider adding "the" before "quantms ecosystem" for improved grammar.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~37-~37: You might be missing the article “the” here.
Context: ...d by multiple tools and packages within quantms ecosystem for quality control, mzTab ge...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

quantmsutils/mzml/mzml_statistics.py (5)

36-66: Well-structured schema definitions with clear type annotations.

Creating dedicated functions for schema definitions improves code organization and reusability. The schemas are well-defined with appropriate field types and nullability.


68-141: Enhanced Bruker file processing with better error handling.

The improvements to the Bruker file processing include proper error handling, acquisition datetime retrieval, and better logging. This is a significant enhancement to the code quality.


209-269: Improved spectrum transformation with proper peak extraction.

The spectrum transformation method has been significantly improved with better error handling and extraction of peak data. The TODOs have been properly addressed as suggested in previous reviews.


270-274: Good handling of missing precursor intensity.

Using the calculated precursor intensity when the regular intensity is missing or zero is a good fallback strategy.
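The fallback can be sketched as a small helper (names are illustrative, not the PR's code):

```python
def resolve_precursor_intensity(reported, calculated):
    """Prefer the instrument-reported precursor intensity; fall back to
    the window-calculated value when it is missing or zero (sketch of
    the fallback strategy described above)."""
    if reported is None or reported == 0:
        return calculated
    return reported
```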


330-386: Well-implemented precursor data extraction.

The precursor data extraction is now more robust with proper error handling and logging of missing precursors.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
quantmsutils/mzml/mzml_statistics.py (6)

68-141: Consider renaming column_exists for clarity
Because this function returns a set of columns instead of a boolean, a name like get_table_columns could be more intuitive. Otherwise, the chunk-based reading logic and Parquet writing are well implemented.


265-296: Ensure consistent usage of PRECURSOR_INTENSITY_WINDOW
It’s set to None in the MS1 branch. For consistency, consider handling it similarly in MS2 or removing it entirely if not needed.


298-316: MS1 branch appears clean
If there's overlap with the MS2 branch, consider a shared helper to reduce duplicated logic.


323-329: Regex-based scan ID extraction
If multiple matches occur, consider logging a warning or handling them specifically.


516-553: Path resolution logic
Consider refining behavior when multiple valid candidates are found. Currently, an error is raised, but you might provide a fallback or clearer warning message.


555-605: Preserve exception context
When re-raising via raise click.Abort(), consider using raise click.Abort() from e to maintain the original traceback context.

🧰 Tools
🪛 Ruff (0.8.2)

604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c88bb8 and 15ed5d7.

📒 Files selected for processing (1)
  • quantmsutils/mzml/mzml_statistics.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (13)
quantmsutils/mzml/mzml_statistics.py (13)

1-5: Imports look good
No issues found with these import additions.


12-12: No issues with this import
Importing pyopenms is appropriate here.


32-53: Nice introduction of logging and well-structured schema
Defining the logging configuration and creating a clear PyArrow schema is a solid foundation for handling spectral data.


56-65: Schema definitions are consistent and well-defined
The ID-only schema is clearly specified and matches expected data structures.


143-161: Same rename suggestion
The function name can be misleading since it returns a set of columns rather than a boolean.


165-196: Constructor docstring is comprehensive
All parameters are thoroughly explained, making the initialization straightforward to understand.


209-239: Implementation of spectrum transformation looks solid
Skipping unsupported MS levels and proceeding with MS1 or MS2 is clear and well-structured.


240-247: Correct intensity calculations
Base peak and total intensities are properly handled, including the case with zero peaks.


249-264: Conditional logic for MS2 detection is well-handled
Calling _extract_first_precursor_data for MS2 spectra aligns with the design.


330-386: Precursor purity logic
Catching and logging exceptions at debug level is a good choice for diagnosing purity calculation issues.


387-425: Incremental batch writing logic
Lazy initialization of parquet writers is an effective way to manage resources.
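A minimal sketch of the lazy-initialization pattern; a string stands in for the real `pyarrow.parquet.ParquetWriter` so the sketch stays dependency-free:

```python
class LazyBatchWriter:
    """Sketch of lazy writer initialization: the output file is opened
    only when the first batch arrives (illustrative, not the PR's class)."""

    def __init__(self):
        self.writer = None
        self.rows_written = 0

    def write_batch(self, rows):
        if self.writer is None:
            # Real code would open a pyarrow.parquet.ParquetWriter here,
            # so empty inputs never produce an empty output file.
            self.writer = "open"
        self.rows_written += len(rows)

w = LazyBatchWriter()
before = w.writer  # still None: nothing has been written yet
w.write_batch([{"scan": "scan=1"}])
```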


426-442: Finalize method
Properly flushes any remaining data and closes writers. No issues found.


444-514: Efficient streaming for mzML files
The chunk-based approach is likely performant for large files. Consider adding tests for varied file sizes and missing metadata scenarios.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)

555-604: CLI command with improved error handling, but missing exception chaining

The CLI command has been enhanced with improved documentation and error handling. However, there's a static analysis warning about exception chaining being missing.

At line 604, implement proper exception chaining to preserve the original traceback:

     except Exception as e:
         logger.error(f"Error processing file {ms_path}: {e}")
-        raise click.Abort()
+        raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)

604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
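The effect of `from e` is easy to demonstrate in isolation; the `process` function below is a hypothetical stand-in for the CLI's file-processing step:

```python
def process(path):
    try:
        int(path)  # stand-in for the real file-processing work
    except ValueError as e:
        # B904 fix: chaining preserves the original traceback in __cause__
        raise RuntimeError(f"Error processing file {path}") from e

try:
    process("not-a-number")
except RuntimeError as err:
    print(type(err.__cause__).__name__)  # ValueError
```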

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15ed5d7 and 68d9e89.

📒 Files selected for processing (2)
  • .markdownlint.json (1 hunks)
  • quantmsutils/mzml/mzml_statistics.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .markdownlint.json
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (9)
quantmsutils/mzml/mzml_statistics.py (9)

1-33: Logging configuration added - good improvement!

The addition of proper logging configuration with levels and formatting enhances the overall code quality. This is consistent with similar enhancements made in other files mentioned in the summary.


36-66: Well-structured schema definitions with proper type annotations

The new create_ms_schema() and create_id_schema() functions provide clear, reusable schema definitions with proper PyArrow types and nullability settings. This modular approach improves code organization.


68-141: Improved error handling and batch processing for Bruker files

The enhanced batch_write_bruker_d function now includes proper exception handling, logging, and batch processing. The code handles acquisition datetime retrieval and safely constructs SQL queries.


143-161: Well-documented helper function for column checking

The column_exists function provides a clean way to check for column existence in SQLite tables with proper typing and documentation.


323-329: Good extraction of scan ID from native ID

The _extract_scan_id helper method uses a regex pattern to extract scan IDs, improving code readability and maintainability.


330-386: Comprehensive precursor data extraction with error handling

The _extract_first_precursor_data method thoroughly extracts precursor information with proper error handling using try-except blocks and debug logging.


387-425: Improved batch writing with error handling

The _write_batch method now includes proper error logging with file path information, which will be valuable for troubleshooting.


444-514: Enhanced mzML streaming with proper error handling and logging

The batch_write_mzml_streaming function now includes comprehensive error handling, schema creation, and acquisition datetime extraction.


516-553: Improved path resolution with informative logging

The resolve_ms_path function now includes better logging for path resolution, which helps with debugging path-related issues.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
quantmsutils/mzml/mzml_statistics.py (4)

29-29: Remove unused import to clean up code.

The constant PRECURSOR_INTENSITY_WINDOW is imported here but never referenced in the file. Consider removing this import to maintain a clean codebase.

-    PRECURSOR_INTENSITY_WINDOW,
🧰 Tools
🪛 Ruff (0.8.2)

29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused

Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW

(F401)


68-137: Secure SQL column handling and robust batch writing.

  1. The dynamic column selection using column_exists mitigates SQL injection risk.
  2. Streaming the data in chunks helps with memory usage and is aligned with best practices.
  3. The bare `raise` already preserves the original traceback; if you want to add context, wrap and chain the exception rather than suppressing it:
  except Exception as e:
      logger.error(f"Error processing Bruker .d file: {e}")
-     raise
+     raise RuntimeError("Error processing Bruker .d file") from e

271-271: Clarify logical condition.

The condition if intensity is None or intensity == 0.0 and first_precursor_calculated: might be ambiguous. If you intended (intensity is None or intensity == 0.0) and first_precursor_calculated, consider adding parentheses to clarify the precedence.

- if intensity is None or intensity == 0.0 and first_precursor_calculated:
+ if (intensity is None or intensity == 0.0) and first_precursor_calculated:
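The ambiguity comes from Python's operator precedence, where `and` binds tighter than `or`; a quick check with the values from this code path:

```python
# A or B and C parses as A or (B and C), not (A or B) and C
intensity = None
first_precursor_calculated = False

implicit = intensity is None or intensity == 0.0 and first_precursor_calculated
explicit = (intensity is None or intensity == 0.0) and first_precursor_calculated

print(implicit)  # True  -- short-circuits on `intensity is None`
print(explicit)  # False -- the `and` now gates the whole disjunction
```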

603-603: Preserve original exception context.

To distinguish exceptions due to downstream errors from those in the exception handler, consider chaining the exception explicitly when re-raising or aborting.

logger.error(f"Error processing file {ms_path}: {e}")
- raise click.Abort()
+ raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)

603-603: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68d9e89 and c8c49f0.

📒 Files selected for processing (1)
  • quantmsutils/mzml/mzml_statistics.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused

Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW

(F401)


603-603: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
quantmsutils/mzml/mzml_statistics.py (3)

32-33: Good job adding a global logging configuration.

The chosen logging format provides valuable info (timestamp, function name, message). Make sure to adapt the verbosity level based on the environment (development vs. production).


36-54: Schema definition appears consistent.

The fields defined in create_ms_schema() properly align with how they are populated downstream. This ensures consistent data typing in the Parquet output.


56-66: ID-only schema is well-defined.

Each field (scan, MS level, mz & intensity arrays) has an appropriate type. This separates ID-only data from the main data effectively.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/test_commands.py (2)

16-16: ⚠️ Potential issue

Fix the reference to the missing test file.

The file BSA1_F1_test_ms_info.parquet doesn't exist in the test_data directory, causing the pipeline to fail. This is the same issue identified in the previous review.

Either create this file in the test_data directory or modify the test to use an existing file.


127-132: ⚠️ Potential issue

Create missing reference file or modify test verification approach.

The test fails because BSA1_F1_test_ms_info.parquet doesn't exist, yet the test tries to use it as a reference for comparison. This issue was previously identified but hasn't been resolved.

Options to fix this:

  1. Create the missing reference file by running the command manually and copying the output to the test location
  2. Modify the test to compare specific attributes instead of comparing with a reference file:
        # Compare with reference data
        output_table = pd.read_parquet(BSA_MS_INFO_FILE).set_index("scan")
-       reference_table = pd.read_parquet(BSA_TEST_MS_INFO_FILE).set_index("scan")
-       
-       assert len(output_table) == len(reference_table), "Number of scans doesn't match"
+       # Verify specific expectations about the output
+       assert len(output_table) > 0, "Output table is empty"
+       
+       expected_columns = ["scan", "rt", "ms_level", "precursor_charge"]
+       for col in expected_columns:
+           assert col in output_table.columns, f"Expected column {col} missing from output"
🧰 Tools
🪛 GitHub Actions: Python application

[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.

🪛 GitHub Actions: Python package

[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.

🧹 Nitpick comments (4)
tests/test_commands.py (4)

1-2: Remove unused import.

The os module is imported but not used anywhere in the file.

from pathlib import Path
-import os
import pandas as pd
import pytest
🧰 Tools
🪛 Ruff (0.8.2)

2-2: os imported but unused

Remove unused import: os

(F401)


66-68: Enhance test assertions for output validation.

The test checks if the command executed successfully but doesn't verify if the output is correct.

Consider adding assertions to verify the output files exist and contain the expected data. For example:

        result = run_cli_command("diann2mztab", args)
        assert result.exit_code == 0
-       # Additional assertions could check for expected output files
+       # Verify output files
+       output_file = DIANN_TEST_DIR / "combined.mztab"
+       assert output_file.exists(), "Output mzTab file was not created"
+       
+       # Verify file has expected content
+       with open(output_file, 'r') as f:
+           content = f.read()
+           assert "MTD" in content, "Output file doesn't contain MTD section"
+           assert "PRT" in content, "Output file doesn't contain PRT section"

100-102: Add assertions to verify output files for PSM conversion.

Similar to other tests, this test only checks the exit code but doesn't verify the output.

Add assertions to verify that the expected output files exist and contain the correct data:

        result = run_cli_command("psmconvert", args)
        assert result.exit_code == 0
-       # Could add assertions to check for expected output files
+       # Verify output files
+       expected_output = Path.cwd() / "psm.parquet"  # Adjust path as needed
+       assert expected_output.exists(), "Output file was not created"
+       
+       # Verify file content
+       output_df = pd.read_parquet(expected_output)
+       assert len(output_df) > 0, "Output file is empty"
+       assert "peptide_sequence" in output_df.columns, "Expected column missing"
🧰 Tools
🪛 GitHub Actions: Python application

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.

🪛 GitHub Actions: Python package

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.


142-144: Add more thorough verification for TMT statistics.

The test verifies the file exists and is not empty, but could do more to validate its content.

Add more specific validations for the TMT statistics output:

        output_table = pd.read_parquet(TMT_MS_INFO_FILE)
        assert len(output_table) > 0, "Output table is empty"
-
-       # Could add assertions for expected columns or data validation
+       
+       # Validate specific columns and data types
+       expected_columns = ["scan", "rt", "ms_level", "precursor_charge"]
+       for col in expected_columns:
+           assert col in output_table.columns, f"Expected column {col} missing from output"
+           
+       # Validate data content
+       assert output_table["ms_level"].isin([1, 2]).all(), "MS levels should be either 1 or 2"
+       assert (output_table["rt"] >= 0).all(), "Retention times should be non-negative"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e959970 and 2af3899.

⛔ Files ignored due to path filters (1)
  • tests/test_data/BSA1_F1_spectrum_df.parquet is excluded by !**/*.parquet
📒 Files selected for processing (1)
  • tests/test_commands.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_commands.py

2-2: os imported but unused

Remove unused import: os

(F401)

🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.


[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.


[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build
  • GitHub Check: build

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
tests/test_commands.py (3)

16-16: ⚠️ Potential issue

Fix the missing test file reference that's causing pipeline failures.

The file BSA1_F1_test_ms_info.parquet doesn't exist in the test_data directory according to the pipeline errors. This is causing the test at line 129 to fail.

You have two options:

  1. Create this reference file in the test_data directory
  2. Modify the test to use an existing file instead

94-102: ⚠️ Potential issue

Fix the failing psmconvert command

The CI pipeline shows this test is failing with exit code 2 instead of the expected 0. The previous reviewer identified this may be due to a missing dependency (numpy).

Ensure that:

  1. All dependencies are correctly installed
  2. The command arguments are correct and valid
  3. The implementation of the psmconvert command works with these inputs
🧰 Tools
🪛 GitHub Actions: Python application

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.

🪛 GitHub Actions: Python package

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.


128-130: ⚠️ Potential issue

Fix reference data handling

The test is trying to compare with a reference file (BSA1_F1_test_ms_info.parquet) that doesn't exist, causing pipeline failures.

Either:

  1. Add the missing reference file to the repository
  2. Generate a reference file as part of the test setup
  3. Use a different approach that doesn't require a preexisting reference file
🧰 Tools
🪛 GitHub Actions: Python application

[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.

🪛 GitHub Actions: Python package

[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.

🧹 Nitpick comments (1)
tests/test_commands.py (1)

67-68: Consider adding output validation

The test verifies the command executes successfully, but doesn't validate the content of the output files. Consider adding assertions to verify specific output characteristics.

        result = run_cli_command("diann2mztab", args)
        assert result.exit_code == 0
-        # Additional assertions could check for expected output files
+        # Verify output file exists and has expected content
+        output_file = DIANN_TEST_DIR / "output.mztab"
+        assert output_file.exists(), "Output file was not created"
+        # Add assertions to check file content
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2af3899 and 384c787.

📒 Files selected for processing (2)
  • .gitignore (1 hunks)
  • tests/test_commands.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitignore
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.


[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.


[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (6)
tests/test_commands.py (6)

8-18: Good use of constants for better maintainability!

Defining path constants at the top of the file is an excellent practice that improves readability and makes future path updates easier.


21-28: Well-designed helper function!

Creating a reusable run_cli_command function reduces duplication across the test suite and makes the code more maintainable.


31-50: Good use of parameterization!

Consolidating CLI help message tests using pytest's parametrize is a great improvement that follows testing best practices.

🧰 Tools
🪛 GitHub Actions: Python application

[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

🪛 GitHub Actions: Python package

[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


108-117: Good test fixture setup!

Using a pytest fixture for setup and teardown is an excellent practice. This ensures tests start with a clean state and don't interfere with each other.


146-149: Good validation of output structure!

Checking for the presence of expected columns is a good validation approach.


150-160: Well-structured skipped test

Marking tests that require large files as skipped by default is a good practice. This allows local testing without slowing down CI pipelines.

@ypriverol ypriverol changed the title improve logging and MS features. Improve parameters, logging, examples files and feature detection for mzmml statistics. Mar 10, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)

613-615: Consider preserving the original exception cause.
Raising click.Abort() directly loses the traceback context. Using raise click.Abort() from e provides better debugging information, though it’s optional.

🧰 Tools
🪛 Ruff (0.8.2)

615-615: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 384c787 and 5740d85.

⛔ Files ignored due to path filters (1)
  • tests/test_data/tmt_erwinia_1ulsike_top10hcd_isol2_45stepped_60min_01_sage_ms2rescore.idxml.feature_names.tsv is excluded by !**/*.tsv
📒 Files selected for processing (3)
  • quantmsutils/mzml/mzml_statistics.py (1 hunks)
  • quantmsutils/psm/psm_conversion.py (5 hunks)
  • tests/test_commands.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • quantmsutils/psm/psm_conversion.py
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 102-102: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.


[error] 135-135: AssertionError: Expected column scan missing from output in test_mzml_statistics_bsa.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 102-102: AssertionError: assert 2 == 0. The command 'psmconvert' returned an exit code of 2.


[error] 135-135: AssertionError: Expected column scan missing from output.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused

Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW

(F401)


615-615: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (6)
tests/test_commands.py (6)

9-17: Good use of constants for maintainability.
Defining all file and directory constants at the top promotes clarity and eases maintenance.


20-28: Helper function is well-structured.
Centralizing CLI invocation logic in run_cli_command helps avoid duplication and simplifies testing.


30-49: CLI help message tests look good.
Parameterizing the test over multiple commands is a neat, scalable approach.

🧰 Tools
🪛 GitHub Actions: Python application

[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

🪛 GitHub Actions: Python package

[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


51-68: DIA-NN to mzTab test is clear.
The test arguments and usage align well with the documented functionality.


70-88: Samplesheet validation tests are properly organized.
The separation of concerns and explicit checks make these tests easy to extend.


120-128: Solid baseline for mzML statistics testing.
The test covers file creation checks and ensures the broad functionality is correct.

Comment on lines +101 to +102
result = run_cli_command("psmconvert", args)
assert result.exit_code == 0

⚠️ Potential issue

Investigate failing 'psmconvert' command.
The pipeline indicates the test returns exit code 2 instead of 0. Confirm that the input files exist and all dependencies (e.g., numpy) are installed. Also ensure the command invocation is correct.

🧰 Tools
🪛 GitHub Actions: Python application

[error] 102-102: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.

🪛 GitHub Actions: Python package

[error] 102-102: AssertionError: assert 2 == 0. The command 'psmconvert' returned an exit code of 2.

SUMMED_PEAK_INTENSITY,
PRECURSOR_RT,
PRECURSOR_TOTAL_INTENSITY,
PRECURSOR_INTENSITY_WINDOW,

🛠️ Refactor suggestion

Remove unused import.
PRECURSOR_INTENSITY_WINDOW is never referenced, so it should be removed to avoid clutter.

-    PRECURSOR_INTENSITY_WINDOW,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
PRECURSOR_INTENSITY_WINDOW,
🧰 Tools
🪛 Ruff (0.8.2)

29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused

Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW

(F401)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_commands.py (2)

20-27: Helper function reduces code duplication

The run_cli_command function is a good abstraction that eliminates repetitive code. It handles both help message testing and command execution with arguments.

Consider adding a return type annotation to make the function's interface clearer:

-def run_cli_command(command, args=None):
+def run_cli_command(command, args=None) -> click.testing.Result:
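The annotated helper can be sketched end to end; the project's real helper presumably wraps its own `cli` group, so the toy group and `mzmlstats` command below are hypothetical:

```python
import click
from click.testing import CliRunner

@click.group()
def cli():
    """Toy command group standing in for the project's real `cli`."""

@cli.command()
def mzmlstats():
    click.echo("ok")

def run_cli_command(command, args=None) -> click.testing.Result:
    """Invoke a subcommand; with no args, request its --help text."""
    runner = CliRunner()
    full_args = [command] + (args if args else ["--help"])
    return runner.invoke(cli, full_args)

result = run_cli_command("mzmlstats")
print(result.exit_code)          # 0
print("Usage" in result.output)  # True -- help text was emitted
```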

138-147: Good use of skip marker for local-only tests

Using pytest.mark.skip for tests that should only run locally with larger files is a good practice. This prevents these tests from slowing down CI pipelines while still keeping them in the codebase for local verification.

Consider adding a more specific reason for skipping:

-@pytest.mark.skip("Test to be run locally, with big files")
+@pytest.mark.skip(reason="Uses large files that would slow down CI pipelines and consume excessive resources")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5740d85 and 85ebdff.

📒 Files selected for processing (1)
  • tests/test_commands.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py

[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package
tests/test_commands.py

[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (5)
tests/test_commands.py (5)

9-18: Well-organized constants improve maintainability

I like how you've defined all path and file constants at the top of the file. This improves maintainability and makes it easier to update paths if needed. The constants follow a clear naming convention, making their purpose obvious.

As a minor suggestion, consider adding a docstring at the top of the file to explain its purpose and the overall testing strategy.


30-49: Excellent use of parametrized testing

Converting individual test methods into a single parametrized test is a best practice that reduces code duplication and makes adding new command tests easier.

The docstrings are clear and the assertions are appropriate for verifying help messages.

🧰 Tools
🪛 GitHub Actions: Python application

[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

🪛 GitHub Actions: Python package

[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


106-119: Good test setup with fixture for cleanup

Using a pytest fixture with autouse=True for setup and teardown is a clean approach. This ensures each test runs in a clean environment by removing output files before the test and allowing for cleanup after.

I particularly like how you're explicitly checking for and removing existing files to prevent test interference.


1-147:

❓ Verification inconclusive

Address deprecation warnings in the pipeline

The pipeline shows deprecation warnings that should be addressed:

  1. FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
  2. DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

While these don't appear directly in this file, they're triggered during test execution and should be fixed in the relevant implementation files.


🏁 Script executed:

#!/bin/bash
# Find where the deprecated methods are used
echo "Searching for deprecated DataFrame.applymap usage..."
rg -l "\.applymap\(" --type py

echo "Searching for deprecated is_categorical_dtype usage..."
rg -l "is_categorical_dtype" --type py

Length of output: 305


Deprecation Warnings: Manual Verification Recommended

Our search for direct usages of DataFrame.applymap and is_categorical_dtype in the repository (including in key modules such as quantmsutils/quantmsutilsc.py) did not yield any matches. Given that these warnings are observed during test runs, it's possible they stem from indirect calls (e.g., through dependencies or deeper library internals).

  • Please manually verify if these deprecated calls are coming from our own code or from an external dependency.
  • If found within our implementations, update:
    • DataFrame.applymap to DataFrame.map
    • The usage of is_categorical_dtype to an appropriate type-check using isinstance(dtype, pd.CategoricalDtype)
  • Focus especially on the implementation files that transform or manipulate data, such as in quantmsutils/quantmsutilsc.py.
🧰 Tools
🪛 GitHub Actions: Python application

[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.

🪛 GitHub Actions: Python package

[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.


[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.


[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.


120-136:

❓ Verification inconclusive

Test has comprehensive assertions but had past failures

The test has solid assertions checking both the exit code and the existence of expected output files, followed by content validation.

Past review comments indicated an issue with the 'scan' column missing from output. The current code checks for this (line 135), which should prevent that issue from recurring.


🏁 Script executed:

#!/bin/bash
# Check if TMT files exist
echo "Checking test data files..."
ls -la ${TEST_DATA_DIR}/TMT* 2>/dev/null || echo "TMT files not found"

# Check whether the test can run without errors
echo "Running mzml statistics test..."
python -c "
import os
import sys
from pathlib import Path
import pandas as pd
from click.testing import CliRunner
from quantmsutils.quantmsutilsc import cli

TMT_MZML_FILE = '${TMT_MZML_FILE}'
TMT_MS_INFO_FILE = '${TMT_MS_INFO_FILE}'

# Remove output files if they exist
if os.path.exists(TMT_MS_INFO_FILE):
    os.unlink(TMT_MS_INFO_FILE)

runner = CliRunner()
result = runner.invoke(cli, ['mzmlstats', '--ms2_file', '--ms_path', TMT_MZML_FILE])

print(f'Exit code: {result.exit_code}')
if result.exception:
    print(f'Exception: {result.exception}')
    print(f'Traceback: {result.exc_info}')

if os.path.exists(TMT_MS_INFO_FILE):
    print('Output file created successfully')
    df = pd.read_parquet(TMT_MS_INFO_FILE)
    print(f'Columns: {df.columns.tolist()}')
    print(f'Row count: {len(df)}')
else:
    print('Output file not created')
"

Length of output: 1210


Action Required: Verify Test Environment Dependencies

The updated test implementation now robustly validates the output—including checking for the existence of the “scan” column—to address the past issue. However, manual verification is necessary because our test script execution revealed an environment misconfiguration: Pandas was missing from the testing environment (i.e. ModuleNotFoundError: No module named 'pandas'). Additionally, the test data files (TMT files) were not found during the check.

  • Please ensure that all required dependencies (e.g., Pandas) are installed in the testing environment.
  • Verify that the test data file paths (e.g., ${TEST_DATA_DIR}/TMT*) are correctly configured and accessible.

Once these issues are resolved and the environment is properly set up, rerun the tests to confirm that the assertions (including the check for the “scan” column) pass as expected.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85ebdff and 8b0054e.

⛔ Files ignored due to path filters (1)
  • tests/test_data/STATIC_TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01_ms2_info.parquet is excluded by !**/*.parquet
📒 Files selected for processing (1)
  • tests/test_commands.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (7)
tests/test_commands.py (7)

9-18: Good use of constants for improved maintainability.

Defining constants at the top of the file makes the code more maintainable and reduces the risk of path-related errors. Using the Path object from pathlib is also a good practice for cross-platform compatibility.


21-28: Well-designed helper function for CLI command execution.

The run_cli_command function nicely abstracts the common pattern of creating a CLI runner and invoking commands, making the test code more concise and following the DRY principle.
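A sketch of that pattern against a toy click CLI (the `greet` command is invented for illustration; the real tests invoke the quantmsutils `cli` group):

```python
import click
from click.testing import CliRunner

@click.group()
def cli():
    """Toy CLI standing in for the quantmsutils entry point."""

@cli.command()
@click.option("--name", default="world")
def greet(name):
    click.echo(f"hello {name}")

def run_cli_command(command, args=None):
    # One shared pattern for every test: build a runner, invoke, return result
    runner = CliRunner()
    return runner.invoke(cli, [command] + (args or []))

result = run_cli_command("greet", ["--name", "mzML"])
print(result.exit_code, result.output.strip())  # → 0 hello mzML
```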


31-50: Excellent use of pytest parameterization for testing help messages.

Converting individual test functions into a parameterized test reduces code duplication and makes it easier to add new commands to test in the future.
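The parameterized shape looks roughly like this (a toy CLI with two stand-in subcommands; the real list would cover every quantmsutils command):

```python
import click
import pytest
from click.testing import CliRunner

@click.group()
def cli():
    """Toy CLI for illustration."""

@cli.command()
def mzmlstats():
    """Compute mzML statistics."""

@cli.command()
def psmconvert():
    """Convert PSMs."""

@pytest.mark.parametrize("command", ["mzmlstats", "psmconvert"])
def test_command_help(command):
    # Each subcommand should print usage text and exit cleanly with --help
    result = CliRunner().invoke(cli, [command, "--help"])
    assert result.exit_code == 0
    assert "Usage" in result.output

# A parametrized test is still a plain function; call it directly to demo
test_command_help("mzmlstats")
```

Adding a new command to the suite then only means appending one string to the parameter list.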


107-120: Good use of pytest fixture for test setup and teardown.

Using a fixture with autouse=True ensures that test files are properly cleaned up before and after tests run, preventing interference between test runs.


138-148: Good practice using pytest.mark.skip for resource-intensive tests.

Marking tests that require large files or significant resources with pytest.mark.skip is a good practice. It allows other tests to run quickly while still keeping the test code available for local debugging.


1-148: Overall refactoring improves test organization and maintainability.

The refactoring of this test file follows best practices by:

  1. Organizing related tests into classes
  2. Using fixtures for setup/teardown
  3. Applying parameterization to reduce duplication
  4. Providing clear docstrings for tests
  5. Using consistent naming patterns

This structure makes the tests more maintainable and easier to extend in the future.


134-137:

❓ Verification inconclusive

Ensure 'scan' column is correctly generated in the output.

Previous pipeline failures indicated that the 'scan' column was missing from the output. This test now checks for its presence, but verify that the implementation correctly generates this column.


🏁 Script executed:

#!/bin/bash
# Run the mzmlstats command to generate the output file
python -m quantmsutils.quantmsutilsc mzmlstats --ms2_file --ms_path "${TMT_MZML_FILE}"

# Check if the output file exists and contains the expected columns
echo "Checking output file for 'scan' column:"
python -c "
import pandas as pd
file_path = '${TMT_MS_INFO_FILE}'
try:
    df = pd.read_parquet(file_path)
    if 'scan' in df.columns:
        print(f\"✓ 'scan' column is present in {file_path}\")
        print(f\"Column values (first 5 rows): {df['scan'].head().tolist()}\")
    else:
        print(f\"✗ 'scan' column is missing from {file_path}\")
        print(f\"Available columns: {df.columns.tolist()}\")
except Exception as e:
    print(f\"Error reading file: {e}\")
"

Length of output: 1197


ACTION REQUIRED: Ensure dependency installation and verify 'scan' column generation

The execution failed due to missing modules (numpy and pandas), so we couldn’t confirm whether the output contains the 'scan' column. Please make sure all required dependencies are installed in the test environment and re-run the pipeline. Once dependencies are in place, manually verify that the output file (referenced by ${TMT_MS_INFO_FILE}) correctly includes the 'scan' column as intended.

Comment on lines +91 to +105
class TestPSMConversion:
"""
Test class for PSM conversion commands, it takes an ms2 file in parquet and an idXML file
and converts it to a parquet file with PSM information.
"""
def test_convert_psm(self):
"""Test converting PSM data"""
args = [
"--idxml", str(TMT_IDXML_FILE),
"--ms2_file", str(TMT_STATIC_MS2_INFO_FILE),
]
result = run_cli_command("psmconvert", args)
assert result.exit_code == 0
# Could add assertions to check for expected output files


💡 Verification agent

🧩 Analysis chain

Verify that dependencies for psmconvert are correctly installed.

Previous pipeline failures indicated issues with missing dependencies (e.g., numpy) for the psmconvert command. While the test structure has been improved, ensure that all required dependencies are properly installed in the testing environment.


🏁 Script executed:

#!/bin/bash
# Check if numpy is installed and its version
echo "Checking for numpy installation:"
pip list | grep numpy || echo "numpy not found!"

# Check if the test files exist
echo "Checking if test files exist:"
ls -la ${TMT_IDXML_FILE} ${TMT_STATIC_MS2_INFO_FILE} 2>/dev/null || echo "Test files not found"

# Try importing numpy in a Python script
echo "Attempting to import numpy:"
python -c "import numpy; print(f'Numpy version: {numpy.__version__}')" || echo "Failed to import numpy"

Length of output: 1492


Missing Dependency: Ensure numpy is installed in the testing environment

The latest test run confirms that numpy is not installed (output shows “numpy not found!” and a failure when attempting to import numpy). Please update the dependency configuration (e.g., add numpy to requirements.txt or environment.yml) so that the psmconvert command has all necessary packages during testing.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
README.md (6)

35-36: Header Formatting: Standardize "mzML Statistics" Headline.

Consider capitalizing and formatting the header as "mzML Statistics" (instead of "mzml statistics") to match the conventional capitalization of the mzML format for clarity and consistency.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-37: Typographical Error in Section Description.

There is a typo in the phrase "gow they are computed." It should be corrected to "how they are computed."

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)


39-43: Details Section Clarity.

The details section is well-structured and clearly explains the purpose and output of the mzmlstats functionality. For further clarity, consider adding a brief example demonstrating the output filename format.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

42-42: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


42-42: Line length
Expected: 80; Actual: 315

(MD013, line-length)


44-49: Field Descriptions and Markdown Formatting.

The list describing each field is comprehensive and informative. However, please review the punctuation and spacing to address minor markdownlint warnings (e.g., trailing spaces and line-length issues) as mentioned in the static analysis hints. Paying attention to these details will further improve readability.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)

44-44: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


44-44: Line length
Expected: 80; Actual: 335

(MD013, line-length)


51-57: MS/MS Signal Field Details Consistency.

The additional fields for MS/MS signals are clearly described. Ensure that all field names (e.g., precursor_charge, precursor_mz, precursor_rt, and precursor_total_intensity) are formatted uniformly with backticks and consistent punctuation.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)

53-53: Line length
Expected: 80; Actual: 148

(MD013, line-length)


54-54: Line length
Expected: 80; Actual: 137

(MD013, line-length)


55-55: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


55-55: Line length
Expected: 80; Actual: 268

(MD013, line-length)


56-56: Line length
Expected: 80; Actual: 98

(MD013, line-length)


57-57: Line length
Expected: 80; Actual: 112

(MD013, line-length)


58-69: Note Block and Code Fencing Adjustments.

The note section effectively explains how precursor-related information is computed. Consider the following improvements:

  • Ensure that the fenced code block is surrounded by blank lines (per MD031 guidelines).
  • Correct the typo "pyopnems" to "pyopenms" to accurately reference the library.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

60-60: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


60-60: Line length
Expected: 80; Actual: 233

(MD013, line-length)


61-61: Fenced code blocks should be surrounded by blank lines
null

(MD031, blanks-around-fences)


64-64: Line length
Expected: 80; Actual: 97

(MD013, line-length)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b0054e and e8227a4.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • quantmsutils/utils/constants.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • quantmsutils/utils/constants.py
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)
README.md

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)


42-42: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


42-42: Line length
Expected: 80; Actual: 315

(MD013, line-length)


44-44: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


44-44: Line length
Expected: 80; Actual: 335

(MD013, line-length)


53-53: Line length
Expected: 80; Actual: 148

(MD013, line-length)


54-54: Line length
Expected: 80; Actual: 137

(MD013, line-length)


55-55: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


55-55: Line length
Expected: 80; Actual: 268

(MD013, line-length)


56-56: Line length
Expected: 80; Actual: 98

(MD013, line-length)


57-57: Line length
Expected: 80; Actual: 112

(MD013, line-length)


60-60: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


60-60: Line length
Expected: 80; Actual: 233

(MD013, line-length)


61-61: Fenced code blocks should be surrounded by blank lines
null

(MD031, blanks-around-fences)


64-64: Line length
Expected: 80; Actual: 97

(MD013, line-length)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
README.md (1)

71-73: Contributions and Issues Section.

The "Contributions and issues" section is clear and directs users appropriately to the GitHub repository for further engagement. No changes are needed in this part.

🧰 Tools
🪛 LanguageTool

[typographical] ~73-~73: A comma between “please” and the verb is normally not needed. If you want to signal a pause, consider using an em dash instead.
Context: ... Contributions and issues are welcome. Please, open an issue in the [GitHub repository](htt...

(PLEASE_UNNECESSARY_COMMA)

🪛 markdownlint-cli2 (0.17.2)

73-73: Line length
Expected: 80; Actual: 196

(MD013, line-length)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
README.md (3)

35-36: Header Capitalization & Consistency

The new header "#### mzml statistics" is a clear indicator for this section. For consistency and clarity—especially given standard usage in proteomics—it would be better to capitalize it as "#### mzML Statistics".

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-38: Typo & Clarity in Descriptive Text

The introductory text contains a minor typographical error: "gow they are computed" should be corrected to "how they are computed". Additionally, you might consider rephrasing the sentence for clarity. For example, starting with "The quantms-utils package provides multiple scripts…" could improve the flow.

Proposed Diff:

- Here are some details about the formats, the fields they contain and gow they are computed.
+ Here are some details about the formats, the fields they contain and how they are computed.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)


35-38: Markdown Formatting Improvements

Static analysis reports indicate minor markdown formatting issues in these new lines (e.g., trailing spaces on line 35 and an excessive line length on line 37). Addressing these by removing extra whitespace and breaking long lines into shorter, more readable segments will further improve the documentation’s clarity.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8227a4 and a514c16.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~60-~60: Possible missing preposition found.
Context: ...ng the first precursor in the spectrum. The following columns intensity (if not a...

(AI_HYDRA_LEO_MISSING_IN)


[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level: The MS level of the signal, all of them...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Possible missing comma found.
Context: ...el: The MS level of the signal, all of them will be 2. - mz_array`: The m/z array ...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array: The m/z array of the peaks in the MS/MS...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). - intensity_array`: The intensity array of the peaks in the...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)
README.md

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)


42-42: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


42-42: Line length
Expected: 80; Actual: 315

(MD013, line-length)


44-44: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


44-44: Line length
Expected: 80; Actual: 335

(MD013, line-length)


53-53: Line length
Expected: 80; Actual: 148

(MD013, line-length)


54-54: Line length
Expected: 80; Actual: 137

(MD013, line-length)


55-55: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


55-55: Line length
Expected: 80; Actual: 268

(MD013, line-length)


56-56: Line length
Expected: 80; Actual: 98

(MD013, line-length)


57-57: Line length
Expected: 80; Actual: 112

(MD013, line-length)


60-60: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


60-60: Line length
Expected: 80; Actual: 233

(MD013, line-length)


61-61: Fenced code blocks should be surrounded by blank lines
null

(MD031, blanks-around-fences)


64-64: Line length
Expected: 80; Actual: 97

(MD013, line-length)


74-74: Line length
Expected: 80; Actual: 313

(MD013, line-length)


76-76: Line length
Expected: 80; Actual: 334

(MD013, line-length)


78-78: Line length
Expected: 80; Actual: 140

(MD013, line-length)


79-79: Line length
Expected: 80; Actual: 153

(MD013, line-length)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build
  • GitHub Check: build

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
README.md (1)

39-69: Comprehensive documentation of MS info outputs

The detailed explanation of the MS info output format, including column definitions and their computation methods, is excellent. This level of detail will help users understand and properly utilize the data.

However, there is a minor typo in line 61: "pyopnems" should be "pyopenms".

-precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index)
-precursor_rt = precursor_spectrum.getRT()
-purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True)
-precursor_intensity = purity.target_intensity
-total_intensity = purity.total_intensity
+precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index)
+precursor_rt = precursor_spectrum.getRT()
+purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True)
+precursor_intensity = purity.target_intensity
+total_intensity = purity.total_intensity
🧰 Tools
🪛 LanguageTool

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with s...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)

42-42: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


42-42: Line length
Expected: 80; Actual: 315

(MD013, line-length)


44-44: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


44-44: Line length
Expected: 80; Actual: 335

(MD013, line-length)


53-53: Line length
Expected: 80; Actual: 148

(MD013, line-length)


54-54: Line length
Expected: 80; Actual: 137

(MD013, line-length)


55-55: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


55-55: Line length
Expected: 80; Actual: 268

(MD013, line-length)


56-56: Line length
Expected: 80; Actual: 98

(MD013, line-length)


57-57: Line length
Expected: 80; Actual: 112

(MD013, line-length)


60-60: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


60-60: Line length
Expected: 80; Actual: 233

(MD013, line-length)


61-61: Fenced code blocks should be surrounded by blank lines
null

(MD031, blanks-around-fences)


64-64: Line length
Expected: 80; Actual: 97

(MD013, line-length)

quantmsutils/mzml/mzml_statistics.py (2)

209-316: Comprehensive spectrum transformation with better error handling

The transform_mzml_spectrum method provides a more structured approach to processing spectra, with clear separation of MS1 and MS2 handling. The code now properly handles cases where peaks or precursor information might be missing.

However, there's one improvement opportunity: the code could benefit from better-structured exception handling during spectrum processing.

Consider adding a try-except block around the main transformation logic to catch and log any unexpected errors that might occur during spectrum processing:

def transform_mzml_spectrum(
    self,
    i: int,
    spectrum: oms.MSSpectrum,
    mzml_exp: oms.MSExperiment,
) -> None:
+    try:
        # Extract peaks data
        mz_array, intensity_array = spectrum.get_peaks()
        peak_count = len(mz_array)
        scan_id = self._extract_scan_id(spectrum)
        
        # ... remaining implementation ...
        
        self.batch_data.append(row_data)
        
        # Write batch when it reaches specified size
        if len(self.batch_data) >= self.batch_size:
            self._write_batch()
+    except Exception as e:
+        logger.error(f"Error processing spectrum {i}: {e}")
+        # Continue processing other spectra

566-615: Enhanced CLI command with better options and error handling

The CLI command now includes better documentation, support for MS2 file generation, and improved error handling. However, there's an issue with the exception handling in line 615.

When catching and re-raising exceptions, it's better to use raise ... from e to preserve the original traceback:

    except Exception as e:
        logger.error(f"Error processing file {ms_path}: {e}")
-        raise click.Abort()
+        raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)

615-615: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
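The effect of `raise ... from e` can be shown in a minimal standalone example (the function and file names here are illustrative, not taken from quantmsutils):

```python
def parse_file(path: str) -> str:
    """Read a file, wrapping low-level errors in a domain-specific error."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError as e:
        # `from e` records the original error as __cause__, so the
        # traceback shows both the OSError and the RuntimeError,
        # instead of the misleading "During handling of the above
        # exception, another exception occurred" message.
        raise RuntimeError(f"Could not process {path}") from e


try:
    parse_file("/nonexistent/file.txt")
except RuntimeError as err:
    print(type(err.__cause__).__name__)  # FileNotFoundError
```

The same reasoning applies to `raise click.Abort() from e`: the original processing error stays attached to the abort.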

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a514c16 and ef5aeaf.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • quantmsutils/mzml/mzml_statistics.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with s...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~74-~74: Possible missing comma found.
Context: ...o produce a file containing all the MS2 spectra including the intesities and masses of ...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level: The MS level of the signal, all of them...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array: The m/z array of the peaks in the MS/MS...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). - intensity_array`: The intensity array of the peaks in the...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)
README.md

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)


42-42: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


42-42: Line length
Expected: 80; Actual: 315

(MD013, line-length)


44-44: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


44-44: Line length
Expected: 80; Actual: 335

(MD013, line-length)


53-53: Line length
Expected: 80; Actual: 148

(MD013, line-length)


54-54: Line length
Expected: 80; Actual: 137

(MD013, line-length)


55-55: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


55-55: Line length
Expected: 80; Actual: 268

(MD013, line-length)


56-56: Line length
Expected: 80; Actual: 98

(MD013, line-length)


57-57: Line length
Expected: 80; Actual: 112

(MD013, line-length)


60-60: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


60-60: Line length
Expected: 80; Actual: 233

(MD013, line-length)


61-61: Fenced code blocks should be surrounded by blank lines
null

(MD031, blanks-around-fences)


64-64: Line length
Expected: 80; Actual: 97

(MD013, line-length)


74-74: Line length
Expected: 80; Actual: 331

(MD013, line-length)


76-76: Line length
Expected: 80; Actual: 334

(MD013, line-length)


78-78: Line length
Expected: 80; Actual: 140

(MD013, line-length)


79-79: Line length
Expected: 80; Actual: 153

(MD013, line-length)

🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py

29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused

Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW

(F401)


615-615: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (14)
README.md (2)

35-38: Well-structured introduction to mzML statistics functionality

The new section provides a clear introduction to the mzML statistics functionality, highlighting its purpose within the quantms ecosystem. This documentation will be valuable for users.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)


71-81: Clear explanation of MS2 file format

The documentation for the MS2 output format is concise and clearly explains the contents of the MS2 info parquet file. This will be helpful for users working with MS2 data.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~74-~74: Possible missing comma found.
Context: ...o produce a file containing all the MS2 spectra including the intesities and masses of ...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level: The MS level of the signal, all of them...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array: The m/z array of the peaks in the MS/MS...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). - intensity_array`: The intensity array of the peaks in the...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)

74-74: Line length
Expected: 80; Actual: 331

(MD013, line-length)


76-76: Line length
Expected: 80; Actual: 334

(MD013, line-length)


78-78: Line length
Expected: 80; Actual: 140

(MD013, line-length)


79-79: Line length
Expected: 80; Actual: 153

(MD013, line-length)

quantmsutils/mzml/mzml_statistics.py (12)

1-33: Improved logging implementation

Adding proper logging with configurable levels is a good improvement that will help with troubleshooting and monitoring. The module-level logger with a consistent format will make it easier to track the execution flow.

🧰 Tools
🪛 Ruff (0.8.2)

29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused

Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW

(F401)
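The module-level logger pattern described above can be sketched as follows; the logger name and format string are assumptions for illustration, not the exact quantmsutils configuration:

```python
import logging


def configure_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Create a named logger with a consistent format (sketch)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = configure_logger("mzml_statistics")
logger.info("mzML processing started")
```

The `if not logger.handlers` guard keeps the setup idempotent, which matters when the module is imported more than once.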


29-29: Remove unused import

The PRECURSOR_INTENSITY_WINDOW constant is imported but never used in the code.

-    PRECURSOR_INTENSITY_WINDOW,
🧰 Tools
🪛 Ruff (0.8.2)

29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused

Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW

(F401)


36-66: Well-structured schema definitions

The addition of dedicated functions for schema creation improves code organization and maintainability. These schemas clearly define the structure of the output files as documented in the README.


68-141: Enhanced Bruker file processing with better error handling

The updated batch_write_bruker_d function now includes proper logging, batch processing, and robust error handling. The function ensures that all schema fields are properly populated, even when they don't exist in the source data.


143-161: Useful helper function for SQLite column checking

The column_exists function is a good addition that makes the code more robust by allowing it to adapt to different SQLite schema structures.
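A helper of this shape can be written with SQLite's `PRAGMA table_info`; this is a hedged sketch of the pattern, not the exact quantmsutils implementation:

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `column` exists in `table` (sketch; names assumed)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # index 1 is the column name.
    cursor = conn.execute(f"PRAGMA table_info({table})")
    return any(row[1] == column for row in cursor.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (id INTEGER, rt REAL)")
print(column_exists(conn, "frames", "rt"))  # True
print(column_exists(conn, "frames", "mz"))  # False
```

This lets the reader code adapt to Bruker `.d` SQLite files whose schemas vary between instrument/software versions.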


169-202: Improved constructor with better parameter documentation

The updated constructor now includes support for MS2 file generation and has comprehensive parameter documentation. This makes the code more maintainable and easier to understand.


322-340: Improved scan ID extraction

The _extract_scan_id method is a good addition that handles different formats of scan IDs in a consistent way. This will make the output files more consistent and easier to work with.
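Thermo-style mzML nativeID strings look like `controllerType=0 controllerNumber=1 scan=4213`, so a consistent extraction can be sketched with a regex (illustrative only; the real method operates on a pyopenms `MSSpectrum`):

```python
import re


def extract_scan_id(native_id: str) -> str:
    """Pull the scan number out of an mzML nativeID string.

    Falls back to the whole nativeID when no 'scan=' field is present,
    e.g. for vendors using 'index=...' identifiers.
    """
    match = re.search(r"scan=(\d+)", native_id)
    return match.group(1) if match else native_id


print(extract_scan_id("controllerType=0 controllerNumber=1 scan=4213"))  # 4213
print(extract_scan_id("index=12"))  # index=12
```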


341-397: Well-structured precursor data extraction

The _extract_first_precursor_data method properly handles precursor information extraction, including fallbacks for when precursor information is not explicitly provided. The code follows good practices by including comprehensive error handling and logging.


398-436: Efficient batch writing with error handling

The _write_batch method now includes proper error handling and logging, making it more robust. The method efficiently writes batches of data to the output files.
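The accumulate-and-flush pattern behind `_write_batch` can be illustrated in isolation. This sketch decouples the batching logic from the Parquet backend (quantmsutils flushes via pyarrow; here a `flush_fn` callback stands in for the writer, and all names are assumptions):

```python
from typing import Callable, Dict, List


class BatchedSink:
    """Accumulate rows and flush them in fixed-size batches (sketch)."""

    def __init__(self, flush_fn: Callable[[List[Dict]], None], batch_size: int = 10000):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.batch: List[Dict] = []

    def append(self, row: Dict) -> None:
        self.batch.append(row)
        if len(self.batch) >= self.batch_size:
            self._flush()

    def _flush(self) -> None:
        if self.batch:
            # e.g. pa.Table.from_pylist(self.batch) + writer.write_table(...)
            self.flush_fn(self.batch)
            self.batch = []

    def finalize(self) -> None:
        self._flush()  # write any remaining partial batch


written: List[List[Dict]] = []
sink = BatchedSink(written.append, batch_size=2)
for i in range(5):
    sink.append({"scan": str(i)})
sink.finalize()
print([len(b) for b in written])  # [2, 2, 1]
```

The `finalize` step is what guarantees the trailing partial batch is not silently dropped, mirroring the resource-cleanup point made about `finalize` below.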


437-453: Improved finalization process

The finalize method now properly handles resource cleanup, ensuring that all writers are closed and references are cleared, which helps prevent resource leaks.


455-524: Enhanced mzML streaming with comprehensive error handling

The batch_write_mzml_streaming function now includes better error handling, acquisition date extraction, and supports generating MS2 files. The function is well-structured and follows good practices for logging and error reporting.


527-564: Improved file path resolution

The resolve_ms_path function now handles different file extensions and provides better logging for path resolution. This makes the tool more user-friendly by automatically finding the correct file even if the extension is not exactly as specified.
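Extension-tolerant path resolution can be sketched with `pathlib`; the candidate extension list here is an assumption for illustration, not the exact list used by `resolve_ms_path`:

```python
from pathlib import Path


def resolve_ms_path(ms_path: str) -> Path:
    """Resolve an MS file path, trying sibling files with common MS
    extensions when the exact path does not exist (sketch)."""
    path = Path(ms_path)
    if path.exists():
        return path
    for ext in (".mzML", ".mzml", ".d"):
        candidate = path.with_suffix(ext)
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"No MS file found for {ms_path}")
```

For example, a request for `run.mzml` would resolve to an existing `run.mzML` in the same directory.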

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
README.md (4)

35-38: Fix Typos and Grammar in the mzML Statistics Section

The new "mzml statistics" section has a couple of language issues:

  • The phrase “quantms-utils have multiple scripts…” should be corrected to “quantms-utils has multiple scripts…” for proper subject–verb agreement.
  • The word “gow” in “…and gow they are computed” is a typo; it should be “how”.

Consider applying this diff:

-quantms-utils have multiple scripts to generate mzML stats. These files are used by multiple tools and packages within quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain and gow they are computed.
+quantms-utils has multiple scripts to generate mzML stats. These files are used by multiple tools and packages within the quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain and how they are computed.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

35-35: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


37-37: Line length
Expected: 80; Actual: 272

(MD013, line-length)


40-70: Ensure Consistent Library Naming and Punctuation

Within the "MS info and details" block:

  • The note section refers to “pyopnems” which is likely a typo; it should be “pyopenms” to correctly reference the library.
  • Additionally, several lines trigger static analysis warnings (e.g., loose punctuation and lengthy lines). Although these issues do not affect functionality, revising them will enhance readability.

A diff suggestion for the library naming issue:

-For all the precursor-related information, we are using the first precursor in the spectrum. The following columns `intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity` we use the following pyopnems code:
+For all the precursor-related information, we are using the first precursor in the spectrum. The following columns `intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity` we use the following pyopenms code:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level: The MS level of the signal, 1 for MS an...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks: The number of peaks in the MS. Compute ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). - base_peak_intensity`: The max intensity in the spectrum (MS o...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities: The sum of all intensities in the spect...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt: The retention time of the spectrum, cap...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge: The charge of the precursor ion, if the...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). - precursor_mz`: The m/z of the precursor ion, if the si...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with s...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). - precursor_intensity`: The intensity of the precursor ion, if ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt: The retention time of the precursor ion...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity: The total intensity of the precursor io...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)

42-42: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


42-42: Line length
Expected: 80; Actual: 315

(MD013, line-length)


44-44: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


44-44: Line length
Expected: 80; Actual: 335

(MD013, line-length)


53-53: Line length
Expected: 80; Actual: 148

(MD013, line-length)


54-54: Line length
Expected: 80; Actual: 137

(MD013, line-length)


55-55: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


55-55: Line length
Expected: 80; Actual: 268

(MD013, line-length)


56-56: Line length
Expected: 80; Actual: 98

(MD013, line-length)


57-57: Line length
Expected: 80; Actual: 112

(MD013, line-length)


60-60: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


61-61: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


61-61: Line length
Expected: 80; Actual: 233

(MD013, line-length)


62-62: Fenced code blocks should be surrounded by blank lines
null

(MD031, blanks-around-fences)


65-65: Line length
Expected: 80; Actual: 97

(MD013, line-length)


72-82: Correct Typo in MS2 Details

In the "MS2 info and details" section on line 75, the word “intesities” is misspelled. It should be “intensities”:

-`mzmlstats` allows the user to produce a file containing all the MS2 spectra including the intesities and masses of every peak.
+`mzmlstats` allows the user to produce a file containing all the MS2 spectra including the intensities and masses of every peak.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...w they are estimated and used: - scan: The scan accession for each MS and MS/M...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level: The MS level of the signal, all of them...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array: The m/z array of the peaks in the MS/MS...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~80-~80: Loose punctuation mark.
Context: ...ectrum.get_peaks(). - intensity_array`: The intensity array of the peaks in the...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)

75-75: Line length
Expected: 80; Actual: 331

(MD013, line-length)


77-77: Line length
Expected: 80; Actual: 334

(MD013, line-length)


79-79: Line length
Expected: 80; Actual: 140

(MD013, line-length)


80-80: Line length
Expected: 80; Actual: 153

(MD013, line-length)


35-86: Refine Markdown Formatting

Several lines in the new sections are triggering markdownlint warnings (e.g., trailing spaces, excessive line lengths, and loose punctuation). Although these are minor formatting issues, addressing them will improve the overall readability and maintainability of the README. Consider reviewing these aspects using markdownlint tools.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef5aeaf and b4ef7ee.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • quantmsutils/psm/psm_conversion.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • quantmsutils/psm/psm_conversion.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build
  • GitHub Check: build

@ypriverol ypriverol merged commit 66de992 into main Mar 10, 2025
6 checks passed