Improve parameters, logging, example files and feature detection for mzML statistics. #47
Conversation
Walkthrough
This pull request introduces two new entries in the Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant U as User
    participant M as mzML Processor
    participant L as Logger
    participant O as Output Writer
    U->>M: Initiate mzML statistics generation
    M->>M: Read and process spectra (transform_mzml_spectrum)
    M->>L: Log detailed precursor, error, and progress info
    M->>O: Write batch data to Parquet file
    O-->>M: Confirm successful write
    M->>L: Log success message
```
Actionable comments posted: 1
🔭 Outside diff range comments (1)
quantmsutils/sdrf/check_samplesheet.py (1)
184-184: 💡 Verification agent / 🧩 Analysis chain
Consider addressing the TODO comment.
There's a TODO comment about validating the expdesign file, but the `check_expdesign` function already exists and is called below. It appears the TODO might be stale.
🏁 Script executed:
```shell
#!/bin/bash
# Check if there are issues or tasks related to the validation of expdesign files
grep -r "TODO.*expdesign" --include="*.py" .
```
Length of output: 122
Stale TODO Comment: Remove or Update Expdesign Validation Note
The TODO in `quantmsutils/sdrf/check_samplesheet.py` (line 184) appears to be redundant since the `check_expdesign` function is already implemented and used below. Please remove (or update) this comment to avoid confusion.
🧹 Nitpick comments (4)
quantmsutils/diann/dianncfg.py (1)
86-89: Consider replacing remaining print statements with logging.
For consistency, consider replacing the remaining print statements with logging calls, similar to what was done at lines 65-68.
```diff
-            print(
-                "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
-                + mod
-            )
+            logging.info(
+                "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
+                + mod
+            )
```
And similarly for the print statement at lines 101-104.
Also applies to: 101-104
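The print-to-logging switch suggested above can be exercised in isolation. This is a minimal sketch: the message text follows the diff, while the helper name and logger configuration are illustrative (and `logging.warning` is arguably a better fit than `logging.info` for a warning message):

```python
import logging

# Illustrative configuration; the PR uses a similar format string.
logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.INFO)


def warn_skipped_modification(mod: str) -> str:
    # Hypothetical helper: emit the warning through the logging module instead of print.
    message = (
        "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
        + mod
    )
    logging.warning(message)
    return message
```

Unlike `print`, log records can be filtered by level or redirected to files without touching call sites.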
README.md (1)
35-48: Great addition of documentation for mzML statistics.
The detailed explanation of the mzML statistics functionality and output format is very helpful for users.
Small formatting suggestion: ensure consistent spacing in bullet points for better readability.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: "... they are estimated and used: - `scan`: The scan accession for each MS and MS/M..." (UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: "...in/docs/README.adoc#scan). - `ms_level`: The MS level of the signal, 1 for MS an..." (UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: "... 1 for MS and 2 for MS/MS. - `num_peaks`: The number of peaks in the MS. Compute ..." (UNLIKELY_OPENING_PUNCTUATION)
quantmsutils/mzml/mzml_statistics.py (2)
13-13: Remove unused import.
The import `import pyopenms as oms` appears to be unused since pyopenms is already imported as MzMLFile.
```diff
-import pyopenms as oms
```
🧰 Tools
🪛 Ruff (0.8.2)
13-13: `pyopenms` imported but unused. Remove unused import: `pyopenms` (F401)
109-133: Good handling of multiple precursors.
The code now properly handles spectra with multiple precursors, which is important for accurate data processing. However, the manual index incrementing could be simplified using enumerate.
```diff
-        index = 0
-        for precursor in spectrum.getPrecursors():
+        for index, precursor in enumerate(spectrum.getPrecursors()):
             logging.info(
                 "Precursor charge: %s, precursor mz: %s",
                 precursor.getCharge(),
                 precursor.getMZ(),
             )
             charge_state = precursor.getCharge()
             exp_mz = precursor.getMZ()
             intensity = precursor.getIntensity()
             precursors.append(
                 {
                     "charge": charge_state,
                     "mz": exp_mz,
                     "intensity": intensity,
                     "index": index,
                 }
             )
-            index += 1
```
🧰 Tools
🪛 Ruff (0.8.2)
132-132: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
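The SIM113 fix is mechanical; here is a standalone sketch of the pattern, with a hypothetical list of dicts standing in for the pyopenms precursor objects iterated in the real code:

```python
# Hypothetical precursor records standing in for spectrum.getPrecursors() output.
raw_precursors = [{"charge": 2, "mz": 445.12}, {"charge": 3, "mz": 297.08}]

precursors = []
for index, precursor in enumerate(raw_precursors):
    # enumerate() yields (index, item) pairs, removing the manual counter entirely.
    precursors.append({**precursor, "index": index})
```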
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .gitignore (1 hunks)
- README.md (1 hunks)
- pyproject.toml (1 hunks)
- quantmsutils/diann/diann2mztab.py (23 hunks)
- quantmsutils/diann/dianncfg.py (2 hunks)
- quantmsutils/mzml/mzml_statistics.py (11 hunks)
- quantmsutils/psm/psm_conversion.py (2 hunks)
- quantmsutils/sdrf/check_samplesheet.py (2 hunks)
- quantmsutils/sdrf/extract_sample.py (1 hunks)
- quantmsutils/utils/constants.py (1 hunks)
- recipe/meta.yaml (1 hunks)
- tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 154-154: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 154-154: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
13-13: `pyopenms` imported but unused. Remove unused import: `pyopenms` (F401)
132-132: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: "... they are estimated and used: - `scan`: The scan accession for each MS and MS/M..." (UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: "...in/docs/README.adoc#scan). - `ms_level`: The MS level of the signal, 1 for MS an..." (UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: "... 1 for MS and 2 for MS/MS. - `num_peaks`: The number of peaks in the MS. Compute ..." (UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
🔇 Additional comments (33)
.gitignore (1)
169-170: New ignore rule for `.qodo` files.
The addition of `.qodo` to the `.gitignore` file is clear and appropriate to prevent tracking of these files. Please verify that `.qodo` files are indeed generated by a build/tool process and that no other related file extensions need to be ignored.

pyproject.toml (1)
6-6: Version bump updated to 0.0.20.
The version upgrade from "0.0.19" to "0.0.20" is correctly applied. This is consistent with the version bump in related files. Make sure that any CI/CD pipeline or documentation referencing the version is updated accordingly.

recipe/meta.yaml (1)
4-4: Meta file version update.
The package version in `recipe/meta.yaml` has been updated to "0.0.20", which aligns with the update in `pyproject.toml`. Confirm that downstream processes (e.g., packaging and deployment jobs) are aware of this change.

quantmsutils/sdrf/extract_sample.py (1)
1-1: Good addition of logging facilities.
Adding logging configuration improves observability and will help with debugging.
Also applies to: 7-8
quantmsutils/utils/constants.py (1)
14-15: New constants added appropriately.
The new PRECURSORS and INTENSITY constants follow the consistent naming pattern of the existing constants.
quantmsutils/psm/psm_conversion.py (1)
1-1: Good addition of logging facilities.
Adding logging configuration is consistent with other files and enhances debugging capabilities.
Also applies to: 37-38
quantmsutils/sdrf/check_samplesheet.py (1)
5-5: Good addition of logging facilities.
The logging setup is consistent with the implementation in other files and will improve debugging capabilities.
Also applies to: 14-15
quantmsutils/diann/dianncfg.py (2)
6-6: Good addition of logging functionality.
Adding logging capabilities is a good practice for better error handling and traceability.
Also applies to: 13-14
65-68: Consistent use of logging for warnings.
Good job replacing the print statement with logging.info for warnings, which improves observability.
tests/test_commands.py (1)
1-2: Good use of Path for better file handling.
Using `pathlib.Path` for file paths is a good practice and more maintainable than string concatenation.
Also applies to: 7-7
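As a quick sketch of why `Path` beats raw string concatenation for test paths (file names here are illustrative):

```python
from pathlib import Path

# The / operator joins components with the correct separator on every platform,
# and the resulting object carries parsed parts (suffix, stem, parent) for free.
tests_dir = Path("tests")
data_file = tests_dir / "test_data" / "BSA1_F1.mzML"

assert data_file.as_posix() == "tests/test_data/BSA1_F1.mzML"
assert data_file.suffix == ".mzML"
```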
quantmsutils/mzml/mzml_statistics.py (7)
1-1: Good addition of logging functionality.
Adding logging capabilities improves error handling and observability throughout the codebase.
Also applies to: 32-33
73-86: Improved docstring for better code documentation.
The enhanced docstring for the consumeSpectrum method adds clarity about its functionality and parameters.
159-159: Added precursors field to data structure.
Good addition of the PRECURSORS field to both MS level 2 and level 1 spectra data structures.
Also applies to: 176-176
232-232: Improved error handling with better context.
Replacing print statements with logging.error and including the file path in error messages improves debugging capabilities.
Also applies to: 330-330
380-395: Well-structured schema for precursor data.
The new schema for PRECURSORS with a structured list is well-designed and provides good type safety.
420-421: Cleaner path construction.
Using Path.with_name() for creating output paths is more maintainable than string concatenation.
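A minimal sketch of the `Path.with_name()` pattern being praised; the `_ms_info.parquet` suffix mirrors the test-data file name and is an assumption about the exact output naming:

```python
from pathlib import Path

mzml_path = Path("/data/run1/BSA1_F1.mzML")
# with_name() swaps only the final path component, keeping the parent directory,
# so no string slicing of the original path is needed.
ms_info_path = mzml_path.with_name(mzml_path.stem + "_ms_info.parquet")

assert ms_info_path.name == "BSA1_F1_ms_info.parquet"
assert ms_info_path.parent == mzml_path.parent
```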
439-439: Added success logging message.
Good practice to log successful completion of processing, which helps with monitoring and troubleshooting.
quantmsutils/diann/diann2mztab.py (16)
51-59: Method Signature Update in `diann2mztab`:
The function signature now includes a trailing comma after the last parameter. This change helps keep diffs clean and maintains consistency in parameter formatting.

115-121: Lambda Function Formatting for Peptide Conversion:
The lambda used for converting peptide sequences in the MSstats conversion block has been reformatted into a multi-line expression. The logic remains unchanged while readability has been significantly improved.

236-236: List Slicing Formatting in Experimental Design Parsing:
The slicing expression in the creation of `s_table` now uses an extra space for clarity (i.e. `data[empty_row + 1 :][1:]`). There is no functional change, only improved readability.

276-301: Conditional Logic Reformat in Mass Calculation:
The conditional block in `compute_mass_modified_peptide` (checking for allowed amino acids) has been reflowed over multiple lines. This reformat enhances clarity without affecting functionality.

381-387: Method Signature Update in `convert_to_mztab`:
The signature for `convert_to_mztab` now includes a trailing comma after the last parameter, ensuring consistency with other function declarations.

467-471: Update to Remain Columns List:
The list of columns (`remain_cols`) now explicitly includes `"Global.PG.Q.Value"` after `"Precursor.Quantity"`. This addition is clearly formatted and ensures that the report is filtered using all expected columns.

496-499: Precursor m/z Calculation Formatting:
The calculation for `"Calculate.Precursor.Mz"` has been re-indented; wrapping the addition within parentheses improves readability while leaving the underlying logic untouched.

599-609: Concatenation Formatting for mzTab Settings:
The string concatenation for software setting entries (fragment and precursor mass tolerances) in the MTD sub-table is now split over multiple lines. This change increases clarity with no alteration in functionality.

787-794: Protein Abundance Aggregation Refinement:
Within the loop aggregating protein abundance data by study variable, the computation and subsequent drop of columns is now formatted in a more streamlined manner. This refactor improves readability and confirms that the mean is calculated correctly.

762-764: Improved Concatenation in PRH Construction:
The call to `pd.concat` to merge protein detail rows with the existing PRH dataframe has been re-indented and reformatted for better clarity and maintainability.

883-889: Method Signature Update in `mztab_peh`:
The updated definition for `mztab_peh` now shows each parameter on its own line with a trailing comma. This formatting change enhances consistency across the codebase.

1059-1067: Refactored Merge for Decoy Incorporation in PEH:
The merge block that adds the `"Decoy"` column into `out_mztab_peh` has been reformatted over multiple lines. The updated formatting enhances readability while preserving the intended functionality.

1137-1145: DataFrame Column Formatting in PSH Construction:
Within the PSH sub-table, the conversion of the `"opt_global_spectrum_reference"` column (by prepending `"scan="` and ensuring data is cast to string) is now clearly separated. Also, the conversion of the RT values (dividing by 60 and ensuring the group's `"RT"` is of type float64) is now explicit, promoting consistency across DIA-NN versions.

1291-1295: Spectra Reference String Reformatting in PSH:
The lambda that constructs the spectra reference string (concatenating `"ms_run[{}]:"` with the original reference) has been broken into two lines. This reformat makes the string assembly clearer without modifying its behavior.

1592-1594: Method Signature Update in `calculate_protein_coverages`:
The function header now explicitly formats its parameters (with a trailing comma) across multiple lines. This small stylistic update improves consistency with other parts of the code.

1580-1582: Enhanced Readability in Coverage Calculation:
The merging of overlapping intervals during the protein coverage calculation (in `calculate_coverage`) has been reformatted for clarity, making the computation easier to follow.
Actionable comments posted: 1
🧹 Nitpick comments (3)
quantmsutils/mzml/mzml_statistics.py (3)
32-33: Consider adjusting logging level for production.
Setting the logging level to DEBUG might generate excessive output in a production environment. Consider using INFO as the default level and allowing it to be configurable based on the environment.
```diff
-logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.DEBUG)
+logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.INFO)
```
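One way to make the level environment-configurable, as the comment suggests. The `LOG_LEVEL` variable name is illustrative, not something from the PR:

```python
import logging
import os

# Hypothetical: read the level name from an environment variable, default to INFO.
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(
    format="%(asctime)s [%(funcName)s] - %(message)s",
    # getattr falls back to INFO if the env var holds an unknown level name.
    level=getattr(logging, level_name, logging.INFO),
)
```

This keeps DEBUG available for local troubleshooting (`LOG_LEVEL=DEBUG`) without making it the shipped default.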
110-132: Use enumerate() instead of manual index incrementing.
When iterating through a collection and tracking an index, it's more Pythonic to use `enumerate()` rather than manually incrementing an index variable.
```diff
-        index = 0
-        for precursor in spectrum.getPrecursors():
+        for index, precursor in enumerate(spectrum.getPrecursors()):
             logging.info(
                 "Precursor charge: %s, precursor mz: %s",
                 precursor.getCharge(),
                 precursor.getMZ(),
             )
             charge_state = precursor.getCharge()  # TODO: Review by @timo and @julianus
             exp_mz = precursor.getMZ()  # TODO: Review by @timo and @julianus
             intensity = precursor.getIntensity()  # TODO: Review by @timo and @julianus
             rt = spectrum.getRT()  # TODO: Review by @timo and @julianus
             precursors.append(
                 {
                     "charge": charge_state,
                     "mz": exp_mz,
                     "intensity": intensity,
                     "rt": rt,
                     "index": index,
                 }
             )
-            index += 1
```
🧰 Tools
🪛 Ruff (0.8.2)
131-131: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
113-117: Use logger instance instead of logging module directly.
You've created a logger instance but are using the logging module directly here. Use the logger instance for consistency.
```diff
-    logging.info(
+    logger.info(
         "Precursor charge: %s, precursor mz: %s",
         precursor.getCharge(),
         precursor.getMZ(),
     )
```
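A minimal sketch of the module-level logger pattern the comment asks for (the wrapper function is hypothetical):

```python
import logging

# One logger per module, named after the module, created once at import time.
logger = logging.getLogger(__name__)


def log_precursor(charge: int, mz: float) -> None:
    # %-style lazy formatting defers string building until the record is emitted.
    logger.info("Precursor charge: %s, precursor mz: %s", charge, mz)
```

Unlike bare `logging.info`, a named logger lets callers tune verbosity per module, e.g. `logging.getLogger("quantmsutils.mzml").setLevel(logging.WARNING)`.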
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- quantmsutils/__init__.py (1 hunks)
- quantmsutils/diann/dianncfg.py (2 hunks)
- quantmsutils/mzml/mzml_statistics.py (11 hunks)
- quantmsutils/psm/psm_conversion.py (2 hunks)
- quantmsutils/sdrf/check_samplesheet.py (2 hunks)
- quantmsutils/sdrf/extract_sample.py (1 hunks)
- quantmsutils/utils/constants.py (1 hunks)
- tests/test_commands.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- quantmsutils/__init__.py
🚧 Files skipped from review as they are similar to previous changes (6)
- quantmsutils/sdrf/extract_sample.py
- quantmsutils/psm/psm_conversion.py
- quantmsutils/utils/constants.py
- quantmsutils/diann/dianncfg.py
- tests/test_commands.py
- quantmsutils/sdrf/check_samplesheet.py
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
131-131: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (10)
quantmsutils/mzml/mzml_statistics.py (10)
28-29: Added new constants for schema consistency.
Adding PRECURSORS and INTENSITY constants is a good improvement for maintaining schema definition consistency throughout the codebase.

74-87: Improved documentation with clear docstring format.
The enhanced docstring with a parameters section provides much better context for the method's functionality.

157-157: Added new fields to row_data for improved MS2 spectrum information.
The addition of intensity and precursors data to row_data provides more comprehensive information about MS2 spectra.
Also applies to: 159-159

176-176: Consistent handling for ms_level 1 spectra.
Adding the PRECURSORS field with a None value for MS level 1 spectra ensures schema consistency across all spectra records.

232-232: Improved error handling with logging.
Using the logger for error messages with file path context is a significant improvement for debugging and traceability.

292-308: Well-structured schema definition for precursors data.
The nested schema structure for precursors is well-defined and properly includes all necessary fields with appropriate data types.

346-346: Consistent error handling with proper context.
Using logging consistently for error handling with file path information improves the diagnostic capabilities of the code.

397-413: Schema consistency between mzML and Bruker .d file handling.
Maintaining consistent schema structures between different file formats ensures uniform data representation across the system.

437-438: Improved path handling with Path.with_name().
Using `Path.with_name()` is cleaner and more robust than string manipulation for constructing output paths.

456-457: Added success message for better operational visibility.
Adding a success log message after processing provides valuable operational feedback and makes monitoring easier.
Actionable comments posted: 0
♻️ Duplicate comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
105-128: 🛠️ Refactor suggestion
Remove TODO comments and use enumerate for iteration.
The code contains several TODO comments that should be addressed, and the manual index management can be simplified.
```diff
-        precursors = []
-        # capture if more than one precursor
-        index = 0
-        for precursor in spectrum.getPrecursors():
+        precursors = []
+        # capture if more than one precursor
+        for index, precursor in enumerate(spectrum.getPrecursors()):
             logging.info(
                 "Precursor charge: %s, precursor mz: %s",
                 precursor.getCharge(),
                 precursor.getMZ(),
             )
-            charge_state = precursor.getCharge()  # TODO: Review by @timo and @julianus
-            exp_mz = precursor.getMZ()  # TODO: Review by @timo and @julianus
-            intensity = precursor.getIntensity()  # TODO: Review by @timo and @julianus
-            rt = spectrum.getRT()  # TODO: Review by @timo and @julianus
+            charge_state = precursor.getCharge()
+            exp_mz = precursor.getMZ()
+            intensity = precursor.getIntensity()
+            # rt is already calculated outside the loop and doesn't change per precursor
             precursors.append(
                 {
                     "charge": charge_state,
                     "mz": exp_mz,
                     "intensity": intensity,
                     "rt": rt,
                     "index": index,
                 }
             )
-            index += 1
```
🧰 Tools
🪛 Ruff (0.8.2)
127-127: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
175-175: Fix truncated log message.
The log message is incomplete and might cause confusion.
```diff
-    logger.info("Skipping spectrum with MS level %s, MS not supported, or precursors not ", ms_level)
+    logger.info("Skipping spectrum with MS level %s, MS not supported, or precursors not found", ms_level)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- quantmsutils/mzml/mzml_statistics.py (11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
127-127: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (11)
quantmsutils/mzml/mzml_statistics.py (11)
1-1: Good addition of logging configuration.
Adding structured logging with appropriate levels helps with debugging and monitoring application behavior.
Also applies to: 32-34

28-29: Good addition of new constants.
Adding `PRECURSORS` and `INTENSITY` constants maintains consistency with other field naming in the codebase.

73-87: Well-documented function with improved docstring.
The enhanced docstring clearly explains the function's purpose, behavior, and parameters.

89-96: Improved peak extraction and intensity calculations.
The refactored code now directly uses `get_peaks()` to extract mz and intensity arrays, improving readability and potentially performance.
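The review cites `get_peaks()` without showing the downstream computation. A hedged sketch with numpy arrays standing in for the `(mz_array, intensity_array)` pair that pyopenms returns — the derived statistics here are assumptions about what such a file might record, not the PR's exact fields:

```python
import numpy as np

# Stand-ins for the two parallel arrays returned by spectrum.get_peaks() in pyopenms.
mz_array = np.array([100.1, 250.4, 500.7])
intensity_array = np.array([1200.0, 5400.0, 300.0])

num_peaks = len(mz_array)
total_ion_current = float(intensity_array.sum())
# Guard against empty spectra before taking argmax.
base_peak_mz = float(mz_array[intensity_array.argmax()]) if num_peaks else None
```

Working on whole arrays avoids a Python-level loop over individual peaks, which is where the readability and performance gain comes from.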
153-155: Good addition of precursor and intensity information.
Adding precursor details and intensity values to the row data enriches the output and provides more comprehensive information for analysis.
Also applies to: 172-172

229-229: Improved error handling with context information.
Adding file path information to the error message provides better context for troubleshooting.

289-304: Well-structured schema for precursor data.
The schema definition correctly handles the complex nested structure of precursor data, properly defining field types and nullability.

343-343: Enhanced error logging with file path context.
Adding file path information to error logs significantly improves troubleshooting capabilities.

394-409: Consistent schema definition across functions.
The precursor schema definition is consistently applied in the `batch_write_bruker_d` function, maintaining a uniform data structure throughout the application.

434-435: Improved file path handling using Path methods.
Using `Path.with_name()` instead of string manipulation improves readability and reliability of path construction.

453-453: Added success logging.
Adding success logging helps track the application flow and confirms successful file processing.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
151-153: ⚠️ Potential issue
Missing test file causes pipeline failure.
Referencing `BSA1_F1_test_ms_info.parquet` leads to a FileNotFoundError. You must either include this file in `test_data` or remove/change this reference to a valid existing file.
```diff
-table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+# table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
```
🧰 Tools
🪛 GitHub Actions: Python application
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🪛 GitHub Actions: Python package
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🧹 Nitpick comments (4)
tests/test_commands.py (1)
157-157: Consider verifying table contents rather than just length.
A simple length comparison might mask data discrepancies. For more robust testing, consider comparing column-level data or using a DataFrame method like `pd.testing.assert_frame_equal(table1, table2)`.
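The suggested pandas helper in action; this sketch uses toy frames rather than the real Parquet outputs:

```python
import pandas as pd

# Toy stand-ins for the two Parquet tables compared in the test.
table1 = pd.DataFrame({"scan": ["s1", "s2"], "num_peaks": [10, 12]})
table2 = pd.DataFrame({"scan": ["s1", "s2"], "num_peaks": [10, 12]})

# Passes silently here; raises AssertionError on any dtype, index, or value
# mismatch, all of which a bare len() comparison would miss.
pd.testing.assert_frame_equal(table1, table2)
```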
5-5: Remove unused import.
`typing.Optional` is imported but never used in this file.
```diff
-from typing import Optional, Set, Union
+from typing import Set, Union
```
🧰 Tools
🪛 Ruff (0.8.2)
5-5: `typing.Optional` imported but unused. Remove unused import: `typing.Optional` (F401)
207-211: Unify logging usage.
Lines in this method use both `logging.info` and `logger.info`. To maintain consistency and configurability, consider replacing any direct `logging.info` calls with `logger.info`.
```diff
-logging.info(
+logger.info(
```
239-239: Use enumerate() to handle index.
Refactor the loop to leverage Pythonic iteration and avoid manual index increments.
```diff
-index = 0
-for precursor in spectrum.getPrecursors():
+for index, precursor in enumerate(spectrum.getPrecursors()):
```
🧰 Tools
🪛 Ruff (0.8.2)
239-239: Use `enumerate()` for index variable `index` in `for` loop (SIM113)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- quantmsutils/mzml/mzml_statistics.py (10 hunks)
- tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 153-153: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.Optional
imported but unused
Remove unused import: typing.Optional
(F401)
239-239: Use enumerate()
for index variable index
in for
loop
(SIM113)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
tests/test_commands.py (1)
7-7
: Good practice using a project-level constant for tests directory.Defining
TESTS_DIR = Path(__file__).parent
promotes clarity and maintainability when referencing relative testing paths.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
158-158: ⚠️ Potential issue
Missing file breaks the test execution.
The Parquet file `BSA1_F1_test_ms_info.parquet` does not appear to exist, causing a `FileNotFoundError`. To fix, either add the file to `tests/test_data` or compare with `ms_info_path` if the data sets are intended to be the same.
```diff
-    table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+    table1 = pd.read_parquet(ms_info_path)
```
🧰 Tools
🪛 GitHub Actions: Python application
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🪛 GitHub Actions: Python package
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🧹 Nitpick comments (7)
tests/test_commands.py (1)
149-153: Clarify the comment for file removal.
The comment, "check if the file exist, delete it," is a bit ambiguous since it's happening for `ms_info_path` and not `mzml_path`. Consider updating the comment to reflect which file is truly being checked and removed.
```diff
-    # check if the file exist, delete it
+    # check if the ms_info_path file exists and delete it
```

quantmsutils/mzml/mzml_statistics.py (6)
5-5: Remove unused imports `Tuple` and `Union`.
These imports are not used, as flagged by static analysis.
```diff
-from typing import Dict, List, Optional, Set, Tuple, Union
+from typing import Dict, List, Optional, Set
```
🧰 Tools
🪛 Ruff (0.8.2)
5-5: `typing.Tuple` imported but unused. Remove unused import (F401)
5-5: `typing.Union` imported but unused. Remove unused import (F401)
13-13: Remove unused import `Schema`.
The `Schema` import from `pyarrow` is not directly referenced and can be removed.
```diff
-from pyarrow import Schema
```
🧰 Tools
🪛 Ruff (0.8.2)
13-13: `pyarrow.Schema` imported but unused. Remove unused import: `pyarrow.Schema` (F401)
22-22: Remove unused import `MONOISOTOPIC_MZ`.
This constant from `quantmsutils.utils.constants` is not being referenced in the code.
```diff
-    MONOISOTOPIC_MZ,
```
🧰 Tools
🪛 Ruff (0.8.2)
22-22: `quantmsutils.utils.constants.MONOISOTOPIC_MZ` imported but unused. Remove unused import (F401)
30-30: Remove unused import `MAX_INTENSITY`.
It is never used in this file.
```diff
-    MAX_INTENSITY,
```
🧰 Tools
🪛 Ruff (0.8.2)
30-30: `quantmsutils.utils.constants.MAX_INTENSITY` imported but unused. Remove unused import (F401)
351-351: Remove or utilize the `intensity` variable.
Currently, `intensity` is assigned but never used, which is flagged by static analysis. Either remove it or incorporate it into the logic if needed.
```diff
-        intensity = precursor.getIntensity()
```
🧰 Tools
🪛 Ruff (0.8.2)
351-351: Local variable `intensity` is assigned to but never used. Remove assignment to unused variable `intensity` (F841)
609-609: Preserve original exception context.
When re-raising exceptions, consider using `raise click.Abort() from e` to attach the original context.
```diff
-        raise click.Abort()
+        raise click.Abort() from e
```
🧰 Tools
🪛 Ruff (0.8.2)
609-609: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
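The B904 fix in miniature; `AbortError` stands in for `click.Abort` so the sketch needs no click dependency:

```python
class AbortError(Exception):
    """Stand-in for click.Abort in this sketch."""


def read_required(path: str) -> str:
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as e:
        # "from e" records the original error as __cause__, so the traceback
        # shows both the OSError and the AbortError instead of only the latter.
        raise AbortError(f"Failed to read {path}") from e


try:
    read_required("/nonexistent/path/to/file")
except AbortError as err:
    cause_is_oserror = isinstance(err.__cause__, OSError)
```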
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- quantmsutils/mzml/mzml_statistics.py (2 hunks)
- tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.Tuple imported but unused. Remove unused import (F401)
5-5: typing.Union imported but unused. Remove unused import (F401)
13-13: pyarrow.Schema imported but unused. Remove unused import: pyarrow.Schema (F401)
22-22: quantmsutils.utils.constants.MONOISOTOPIC_MZ imported but unused. Remove unused import (F401)
30-30: quantmsutils.utils.constants.MAX_INTENSITY imported but unused. Remove unused import (F401)
351-351: Local variable intensity is assigned to but never used. Remove assignment to unused variable intensity (F841)
609-609: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
tests/test_commands.py (3)
1-1: Good import usage.
Using pathlib.Path for building file paths is more reliable and portable than using raw strings.
7-7: Initialization of TESTS_DIR looks good.
Defining a base directory for tests promotes maintainability and consistency across all test assets.
162-162: Verify the correctness of the final assertion.
If you're comparing two different Parquet outputs, confirm that both files indeed exist and represent the same or comparable data. Otherwise, the test will fail due to the missing second file or mismatched data.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
158-159: ⚠️ Potential issue: Fix the missing test file causing pipeline failures.
The pipeline is failing because the file BSA1_F1_test_ms_info.parquet doesn't exist. You need to either create this file or update the test to use an existing file. Apply this fix:
- table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet")
+ table1 = pd.read_parquet(ms_info_path)
🧰 Tools
🪛 GitHub Actions: Python application
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🪛 GitHub Actions: Python package
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
🧹 Nitpick comments (3)
quantmsutils/mzml/mzml_statistics.py (3)
1-32
: Improved logging configuration and imports.Good addition of proper logging configuration. This will help with debugging and monitoring. However, there are some unused imports from the typing module that should be cleaned up.
-from typing import Dict, List, Optional, Set, Tuple, Union +from typing import Dict, Optional, Set🧰 Tools
🪛 Ruff (0.8.2)
5-5:
typing.List
imported but unusedRemove unused import
(F401)
5-5:
typing.Tuple
imported but unusedRemove unused import
(F401)
5-5:
typing.Union
imported but unusedRemove unused import
(F401)
27-27:
quantmsutils.utils.constants.MAX_INTENSITY
imported but unusedRemove unused import:
quantmsutils.utils.constants.MAX_INTENSITY
(F401)
27-27
: Remove unused constant import.The MAX_INTENSITY constant is imported but never used in this file.
- MAX_INTENSITY, PRECURSOR_RT, PRECURSOR_TOTAL_INTENSITY, PRECURSOR_INTENSITY_WINDOW, + PRECURSOR_RT, PRECURSOR_TOTAL_INTENSITY, PRECURSOR_INTENSITY_WINDOW,🧰 Tools
🪛 Ruff (0.8.2)
27-27:
quantmsutils.utils.constants.MAX_INTENSITY
imported but unusedRemove unused import:
quantmsutils.utils.constants.MAX_INTENSITY
(F401)
599-601
: Improve exception handling with exception chaining.When re-raising exceptions, use explicit exception chaining to preserve the original traceback.
- logger.error(f"Error processing file {ms_path}: {e}") - raise click.Abort() + logger.error(f"Error processing file {ms_path}: {e}") + raise click.Abort() from e🧰 Tools
🪛 Ruff (0.8.2)
601-601: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
quantmsutils/mzml/mzml_statistics.py
(2 hunks)quantmsutils/utils/constants.py
(1 hunks)tests/test_commands.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/utils/constants.py
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 158-158: FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet'
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.List imported but unused. Remove unused import (F401)
5-5: typing.Tuple imported but unused. Remove unused import (F401)
5-5: typing.Union imported but unused. Remove unused import (F401)
27-27: quantmsutils.utils.constants.MAX_INTENSITY imported but unused. Remove unused import: quantmsutils.utils.constants.MAX_INTENSITY (F401)
601-601: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (8)
tests/test_commands.py (3)
1-8: Improved test file organization with Path and constants.
Good refactoring to use pathlib.Path and the TESTS_DIR constant. This makes the test file paths more maintainable and platform-independent.
148-156: Clear test file path management using TESTS_DIR.
The updates to use TESTS_DIR for file paths improve readability and maintainability. The file existence check and deletion before the test is a good practice to ensure clean test runs.
162-162
:❓ Verification inconclusive
Updated assertion to compare table lengths instead of using compare().
This change from checking table1.compare(table2).empty to comparing lengths is more lenient. Make sure this doesn't hide potential differences in the table contents that should be verified.
🏁 Script executed:
#!/bin/bash # Check the previous version of the test to see what was being verified git show HEAD^ -- tests/test_commands.py | grep -A 5 "assert.*table"Length of output: 100
Action: Manual Verification Required on Table Assertion Change
The recent update in
tests/test_commands.py
(line 162) replaces the stricter content comparison (table1.compare(table2).empty
) with a simpler length check (assert len(table2) == len(table1)
). This update is more lenient and might miss discrepancies in the actual table data. Additionally, the earlier version of the test could not be retrieved due to the absence of a valid previous commit (error withHEAD^
), so it's unclear how comprehensive the original check was.
- Review Area: Confirm whether comparing only the table lengths is sufficient or if a more detailed comparison is needed to capture potential data differences.
- Additional Action: Manually inspect the commit history for
tests/test_commands.py
(or use another valid reference) to understand the intended behavior of the previous assertion.quantmsutils/mzml/mzml_statistics.py (5)
34-64: Well-structured schema definitions with clear documentation.
The separation of schemas into create_ms_schema() and create_id_schema() functions with proper documentation is a good practice. This makes the code more maintainable and easier to understand.
207-373: Comprehensive refactoring of spectrum transformation logic.
The transform_mzml_spectrum method is a significant improvement over the previous implementation. It:
- Correctly handles MS level filtering
- Properly extracts peak data using get_peaks()
- Includes detailed precursor data extraction
- Handles edge cases and logs errors appropriately
The code is well-structured with helpful comments explaining the processing logic.
374-420: Enhanced batch writing with improved error handling.
The _write_batch method improvements include better error logging and a more efficient approach to writing batches. The use of RecordBatch.from_pylist is more memory-efficient than converting through pandas first.
481-487: Include batch size in BatchWritingConsumer instantiation.
Good improvement to pass the batch_size parameter to the BatchWritingConsumer. This allows for more flexibility in controlling memory usage during processing.
489-496: Added acquisition datetime retrieval with fallback logic.
The improved acquisition datetime handling with fallback options is a good enhancement that ensures this metadata is captured when available.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_commands.py (1)
137-142
:⚠️ Potential issueFix the reference to the missing test file.
The code is still trying to read
BSA1_F1_test_ms_info.parquet
which was previously identified as missing in the repository. This will cause test failures.- table1 = pd.read_parquet(TESTS_DIR / "test_data" / "BSA1_F1_test_ms_info.parquet") - table2 = pd.read_parquet(ms_info_path) - table2 = table2.set_index("scan") - table1 = table1.set_index("scan") - assert len(table2) == len(table1) + table1 = pd.read_parquet(ms_info_path) + table1 = table1.set_index("scan") + # Additional assertions can be added here to validate the content of the file + assert len(table1) > 0Or, if two different files are truly needed, ensure that
BSA1_F1_test_ms_info.parquet
exists in the test_data directory.
🧹 Nitpick comments (5)
README.md (1)
39-48: Well-structured documentation with clear examples.
The collapsible details section with information about the MS info output format is helpful. Consider adding a space after the bullet points for consistent formatting.
-... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used: +... the following postfix `{file_name}_ms_info.parquet`. Here, the definition of each column and how they are estimated and used: -... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan). +... The scan accession for each MS and MS/MS signal in the mzML, depending on the manufacturer, the scan will have different formats. Example, for thermo (e.g `controllerType=0 controllerNumber=1 scan=43920`). We tried to find the definition of [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan). -... The MS level of the signal, 1 for MS and 2 for MS/MS. +... The MS level of the signal, 1 for MS and 2 for MS/MS. -... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`. +... The number of peaks in the MS. Compute with pyopenms with `spectrum.get_peaks()`.🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level
: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks
: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
quantmsutils/mzml/mzml_statistics.py (4)
1-1: Good addition of logging configuration.
Adding proper logging setup with appropriate formatting is a significant improvement for debugging and monitoring. However, consider removing the unused List import from typing.
- from typing import Dict, List, Optional, Set
+ from typing import Dict, Optional, Set
Also applies to: 5-5, 32-34
387-424: Enhanced batch writing with better error handling.
The batch writing method now includes better error handling and logging of exceptions, which will make debugging issues easier.
When raising exceptions in an except block, consider using raise ... from e to preserve the original exception context:
- logger.error(f"Error during batch writing: {e}, file path: {self.output_path}")
- raise
+ logger.error(f"Error during batch writing: {e}, file path: {self.output_path}")
+ raise Exception(f"Batch writing failed") from e
511-513: Improve exception handling in streaming function.
Similar to the previous comment, consider preserving the exception context when raising exceptions inside exception handlers.
- except Exception as e:
-     logger.error(f"Error during mzML streaming: {e}, file path: {file_name}")
-     return None
+ except Exception as e:
+     logger.error(f"Error during mzML streaming: {e}, file path: {file_name}")
+     return None
602-604: Improve CLI exception handling.
When raising in an except block, use raise ... from e to preserve the exception context:
- except Exception as e:
-     logger.error(f"Error processing file {ms_path}: {e}")
-     raise click.Abort()
+ except Exception as e:
+     logger.error(f"Error processing file {ms_path}: {e}")
+     raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)
604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README.md (1 hunks)
quantmsutils/mzml/mzml_statistics.py (2 hunks)
tests/test_commands.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
5-5: typing.List imported but unused. Remove unused import: typing.List (F401)
604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
🪛 LanguageTool
README.md
[uncategorized] ~37-~37: You might be missing the article “the” here.
Context: ...d by multiple tools and packages within quantms ecosystem for quality control, mzTab ge...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan
: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level
: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks
: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (8)
tests/test_commands.py (2)
1-2
: Good use of thepathlib
module and constant definition.Using Path from the pathlib module and defining a TESTS_DIR constant are good practices to make the test code more maintainable and portable.
Also applies to: 7-8
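A minimal sketch of the pattern being praised; the directory and file names are illustrative, not the repository's exact layout:

```python
from pathlib import Path

# Anchor test assets to the directory containing this file so that tests
# do not depend on the current working directory.
TESTS_DIR = Path(__file__).resolve().parent

ms_info_path = TESTS_DIR / "test_data" / "BSA1_F1_ms_info.parquet"
```

Because Path overloads the / operator, the same code works unchanged on Windows and POSIX systems.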
129-135: Check file path construction and existence.
The file path construction using TESTS_DIR is a good improvement, but there appears to be a check for deleting the main ms_info file before running the test. Make sure this is intentional and that the file is correctly regenerated during the test.
README.md (1)
35-38: Good documentation addition for mzML statistics.
The new section provides valuable context about the mzML statistics functionality. Consider adding "the" before "quantms ecosystem" for improved grammar.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~37-~37: You might be missing the article “the” here.
Context: ...d by multiple tools and packages within quantms ecosystem for quality control, mzTab ge...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
quantmsutils/mzml/mzml_statistics.py (5)
36-66: Well-structured schema definitions with clear type annotations.
Creating dedicated functions for schema definitions improves code organization and reusability. The schemas are well-defined with appropriate field types and nullability settings.
68-141: Enhanced Bruker file processing with better error handling.
The improvements to the Bruker file processing include proper error handling, acquisition datetime retrieval, and better logging. This is a significant enhancement to the code quality.
209-269: Improved spectrum transformation with proper peak extraction.
The spectrum transformation method has been significantly improved with better error handling and extraction of peak data. The TODOs have been properly addressed as suggested in previous reviews.
270-274: Good handling of missing precursor intensity.
Using the calculated precursor intensity when the regular intensity is missing or zero is a good fallback strategy.
330-386: Well-implemented precursor data extraction.
The precursor data extraction is now more robust with proper error handling and logging of missing precursors.
Actionable comments posted: 0
🧹 Nitpick comments (6)
quantmsutils/mzml/mzml_statistics.py (6)
68-141: Consider renaming column_exists for clarity.
Because this function returns a set of columns instead of a boolean, a name like get_table_columns could be more intuitive. Otherwise, the chunk-based reading logic and writing to Parquet is well-implemented.
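A sketch of the suggested get_table_columns shape, using an in-memory SQLite database; the table and columns below are illustrative, not the Bruker .d schema:

```python
import sqlite3
from typing import Set

def get_table_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    """Return the set of column names in `table` via PRAGMA table_info."""
    cursor = conn.execute(f"PRAGMA table_info({table})")
    # Each row is (cid, name, type, notnull, dflt_value, pk); index 1 is the name.
    return {row[1] for row in cursor.fetchall()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (Id INTEGER, Time REAL)")
columns = get_table_columns(conn, "frames")
```

Returning a set makes membership checks cheap when deciding which optional columns to include in a query.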
265-296: Ensure consistent usage of PRECURSOR_INTENSITY_WINDOW.
It's set to None in the MS1 branch. For consistency, consider handling it similarly in MS2 or removing it entirely if not needed.
298-316: MS1 branch appears clean.
If there's overlap with the MS2 branch, consider a shared helper to reduce duplicated logic.
323-329: Regex-based scan ID extraction.
If multiple matches occur, consider logging a warning or handling them specifically.
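A sketch of what such an extraction might look like; the pattern and the fallback-to-raw-ID behavior are assumptions for illustration, not the PR's exact implementation:

```python
import re

SCAN_PATTERN = re.compile(r"scan=(\d+)")

def extract_scan_id(native_id: str) -> str:
    """Return the scan number from a native spectrum ID, or the raw ID if absent."""
    match = SCAN_PATTERN.search(native_id)
    return match.group(1) if match else native_id

# Thermo-style native ID as described in the README section of this PR.
scan = extract_scan_id("controllerType=0 controllerNumber=1 scan=43920")
```

Using search rather than match lets the scan token appear anywhere in the vendor-specific ID string.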
516-553: Path resolution logic.
Consider refining behavior when multiple valid candidates are found. Currently, an error is raised, but you might provide a fallback or clearer warning message.
555-605: Preserve exception context.
When re-raising via raise click.Abort(), consider using raise click.Abort() from e to maintain the original traceback context.
🧰 Tools
🪛 Ruff (0.8.2)
604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsutils/mzml/mzml_statistics.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
604-604: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (13)
quantmsutils/mzml/mzml_statistics.py (13)
1-5
: Imports look good
No issues found with these import additions.
12-12
: No issues with this import
Importing pyopenms is appropriate here.
32-53
: Nice introduction of logging and well-structured schema
Defining the logging configuration and creating a clear PyArrow schema is a solid foundation for handling spectral data.
56-65
: Schema definitions are consistent and well-defined
The ID-only schema is clearly specified and matches expected data structures.
143-161
: Same rename suggestion
The function name can be misleading since it returns a set of columns rather than a boolean.
165-196
: Constructor docstring is comprehensive
All parameters are thoroughly explained, making the initialization straightforward to understand.
209-239
: Implementation of spectrum transformation looks solid
Skipping unsupported MS levels and proceeding with MS1 or MS2 is clear and well-structured.
240-247
: Correct intensity calculations
Base peak and total intensities are properly handled, including the case with zero peaks.
249-264
: Conditional logic for MS2 detection is well-handled
Calling _extract_first_precursor_data for MS2 spectra aligns with the design.
330-386
: Precursor purity logic
Catching and logging exceptions at debug level is a good choice for diagnosing purity calculation issues.
387-425
: Incremental batch writing logic
Lazy initialization of parquet writers is an effective way to manage resources.
426-442
: Finalize method
Properly flushes any remaining data and closes writers. No issues found.
444-514
: Efficient streaming for mzML files
The chunk-based approach is likely performant for large files. Consider adding tests for varied file sizes and missing metadata scenarios.
Actionable comments posted: 1
🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
555-604: CLI command with improved error handling, but missing exception chaining.
The CLI command has been enhanced with improved documentation and error handling. However, there's a static analysis warning about exception chaining being missing.
At line 604, implement proper exception chaining to preserve the original traceback:
  except Exception as e:
      logger.error(f"Error processing file {ms_path}: {e}")
-     raise click.Abort()
+     raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)
604-604: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.markdownlint.json (1 hunks)
quantmsutils/mzml/mzml_statistics.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- .markdownlint.json
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
604-604: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (9)
quantmsutils/mzml/mzml_statistics.py (9)
1-33: Logging configuration added - good improvement!
The addition of proper logging configuration with levels and formatting enhances the overall code quality. This is consistent with similar enhancements made in other files mentioned in the summary.
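A minimal sketch of such a module-level setup; the exact format string and logger name used in the PR may differ:

```python
import logging

# Configure the root logger once, at import time, with level and format.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(funcName)s - %(message)s",
)

# Give the module its own named logger with an explicit level.
logger = logging.getLogger("mzml_statistics_example")
logger.setLevel(logging.INFO)

def process_file(path: str) -> None:
    # DEBUG messages are suppressed at the INFO level configured above.
    logger.info("Processing %s", path)

process_file("example.mzML")
enabled_info = logger.isEnabledFor(logging.INFO)
enabled_debug = logger.isEnabledFor(logging.DEBUG)
```

Setting the level on a named logger keeps the module's verbosity independent of whatever the root logger is configured to.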
36-66
: Well-structured schema definitions with proper type annotations
The new create_ms_schema() and create_id_schema() functions provide clear, reusable schema definitions with proper PyArrow types and nullability settings. This modular approach improves code organization.
68-141
: Improved error handling and batch processing for Bruker files
The enhanced batch_write_bruker_d function now includes proper exception handling, logging, and batch processing. The code handles acquisition datetime retrieval and safely constructs SQL queries.
143-161
: Well-documented helper function for column checking
The column_exists function provides a clean way to check for column existence in SQLite tables with proper typing and documentation.
323-329
: Good extraction of scan ID from native ID
The _extract_scan_id helper method uses a regex pattern to extract scan IDs, improving code readability and maintainability.
330-386
: Comprehensive precursor data extraction with error handling
The _extract_first_precursor_data method thoroughly extracts precursor information with proper error handling using try-except blocks and debug logging.
387-425
: Improved batch writing with error handling
The _write_batch method now includes proper error logging with file path information, which will be valuable for troubleshooting.
444-514
: Enhanced mzML streaming with proper error handling and logging
The batch_write_mzml_streaming function now includes comprehensive error handling, schema creation, and acquisition datetime extraction.
516-553
: Improved path resolution with informative logging
The resolve_ms_path function now includes better logging for path resolution, which helps with debugging path-related issues.
# Conflicts: # quantmsutils/mzml/mzml_statistics.py
Actionable comments posted: 0
🧹 Nitpick comments (4)
quantmsutils/mzml/mzml_statistics.py (4)
29-29: Remove unused import to clean up code.
The constant PRECURSOR_INTENSITY_WINDOW is imported here but never referenced in the file. Consider removing this import to maintain a clean codebase.
- PRECURSOR_INTENSITY_WINDOW,
🧰 Tools
🪛 Ruff (0.8.2)
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused. Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW (F401)
68-137: Secure SQL column handling and robust batch writing.
- The dynamic column selection using column_exists mitigates SQL injection risk.
- Streaming the data in chunks helps with memory usage and is aligned with best practices.
- Consider re-raising the original error context for improved debuggability.
  except Exception as e:
      logger.error(f"Error processing Bruker .d file: {e}")
-     raise
+     raise e from None  # or directly use `raise e` from the original context
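The chunk-based reading mentioned above can be sketched with fetchmany; the table, data, and batch size here are illustrative:

```python
import sqlite3

def iter_chunks(conn: sqlite3.Connection, query: str, batch_size: int = 2):
    """Yield rows in fixed-size batches instead of loading the whole table."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (id INTEGER)")
conn.executemany("INSERT INTO frames VALUES (?)", [(i,) for i in range(5)])
batches = list(iter_chunks(conn, "SELECT id FROM frames ORDER BY id"))
```

Each batch can be converted and written to Parquet before the next is fetched, keeping peak memory bounded by the batch size.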
271-271: Clarify logical condition.
The condition if intensity is None or intensity == 0.0 and first_precursor_calculated: might be ambiguous. If you intended (intensity is None or intensity == 0.0) and first_precursor_calculated, consider adding parentheses to clarify the precedence.
- if intensity is None or intensity == 0.0 and first_precursor_calculated:
+ if (intensity is None or intensity == 0.0) and first_precursor_calculated:
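The ambiguity flagged here comes from Python binding `and` more tightly than `or`; a small demonstration with stand-in values:

```python
# Without parentheses, `a or b and c` parses as `a or (b and c)`.
intensity = None
first_precursor_calculated = False

unparenthesized = intensity is None or intensity == 0.0 and first_precursor_calculated
parenthesized = (intensity is None or intensity == 0.0) and first_precursor_calculated
```

With intensity set to None, the unparenthesized form short-circuits to True while the parenthesized form evaluates to False, so the two readings genuinely diverge.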
603-603: Preserve original exception context.
To distinguish exceptions due to downstream errors from those in the exception handler, consider chaining the exception explicitly when re-raising or aborting.
  logger.error(f"Error processing file {ms_path}: {e}")
- raise click.Abort()
+ raise click.Abort() from e
🧰 Tools
🪛 Ruff (0.8.2)
603-603: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
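What `from e` buys can be demonstrated with plain stdlib exceptions; RuntimeError stands in for click.Abort here:

```python
def run() -> None:
    try:
        raise ValueError("original failure")
    except ValueError as e:
        # `from e` stores the original exception on __cause__, so the
        # traceback shows both errors instead of hiding the first one.
        raise RuntimeError("aborting") from e

cause = None
try:
    run()
except RuntimeError as err:
    cause = err.__cause__
```

Without the `from e`, the original ValueError would only appear as implicit context, and `raise ... from None` would suppress it entirely.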
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsutils/mzml/mzml_statistics.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW imported but unused. Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW (F401)
603-603: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
quantmsutils/mzml/mzml_statistics.py (3)
32-33: Good job adding a global logging configuration.
The chosen logging format provides valuable info (timestamp, function name, message). Make sure to adapt the verbosity level based on the environment (development vs. production).
36-54: Schema definition appears consistent.
The fields defined in create_ms_schema() properly align with how they are populated downstream. This ensures consistent data typing in the Parquet output.
56-66: ID-only schema is well-defined.
Each field (scan, MS level, mz & intensity arrays) has an appropriate type. This separates ID-only data from the main data effectively.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/test_commands.py (2)
16-16: ⚠️ Potential issue: Fix the reference to the missing test file.
The file
BSA1_F1_test_ms_info.parquet
doesn't exist in the test_data directory, causing the pipeline to fail. This is the same issue identified in the previous review.Either create this file in the test_data directory or modify the test to use an existing file.
127-132
:⚠️ Potential issueCreate missing reference file or modify test verification approach.
The test fails because
BSA1_F1_test_ms_info.parquet
doesn't exist, yet the test tries to use it as a reference for comparison. This issue was previously identified but hasn't been resolved.Options to fix this:
- Create the missing reference file by running the command manually and copying the output to the test location
- Modify the test to compare specific attributes instead of comparing with a reference file:
# Compare with reference data output_table = pd.read_parquet(BSA_MS_INFO_FILE).set_index("scan") - reference_table = pd.read_parquet(BSA_TEST_MS_INFO_FILE).set_index("scan") - - assert len(output_table) == len(reference_table), "Number of scans doesn't match" + # Verify specific expectations about the output + assert len(output_table) > 0, "Output table is empty" + + expected_columns = ["scan", "rt", "ms_level", "precursor_charge"] + for col in expected_columns: + assert col in output_table.columns, f"Expected column {col} missing from output"🧰 Tools
🪛 GitHub Actions: Python application
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
🪛 GitHub Actions: Python package
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
🧹 Nitpick comments (4)
tests/test_commands.py (4)
1-2: Remove unused import.
The os module is imported but not used anywhere in the file.
  from pathlib import Path
- import os
  import pandas as pd
  import pytest
🧰 Tools
🪛 Ruff (0.8.2)
2-2: os imported but unused. Remove unused import: os (F401)
66-68: Enhance test assertions for output validation.
The test checks if the command executed successfully but doesn't verify if the output is correct.
Consider adding assertions to verify the output files exist and contain the expected data. For example:
result = run_cli_command("diann2mztab", args) assert result.exit_code == 0 - # Additional assertions could check for expected output files + # Verify output files + output_file = DIANN_TEST_DIR / "combined.mztab" + assert output_file.exists(), "Output mzTab file was not created" + + # Verify file has expected content + with open(output_file, 'r') as f: + content = f.read() + assert "MTD" in content, "Output file doesn't contain MTD section" + assert "PRT" in content, "Output file doesn't contain PRT section"
100-102: Add assertions to verify output files for PSM conversion.

Similar to other tests, this test only checks the exit code but doesn't verify the output. Add assertions to verify that the expected output files exist and contain the correct data:

```diff
 result = run_cli_command("psmconvert", args)
 assert result.exit_code == 0
-# Could add assertions to check for expected output files
+# Verify output files
+expected_output = Path.cwd() / "psm.parquet"  # Adjust path as needed
+assert expected_output.exists(), "Output file was not created"
+
+# Verify file content
+output_df = pd.read_parquet(expected_output)
+assert len(output_df) > 0, "Output file is empty"
+assert "peptide_sequence" in output_df.columns, "Expected column missing"
```
🪛 GitHub Actions: Python application
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
🪛 GitHub Actions: Python package
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.
142-144: Add more thorough verification for TMT statistics.

The test verifies the file exists and is not empty, but could do more to validate its content. Add more specific validations for the TMT statistics output:

```diff
 output_table = pd.read_parquet(TMT_MS_INFO_FILE)
 assert len(output_table) > 0, "Output table is empty"
-
-# Could add assertions for expected columns or data validation
+
+# Validate specific columns and data types
+expected_columns = ["scan", "rt", "ms_level", "precursor_charge"]
+for col in expected_columns:
+    assert col in output_table.columns, f"Expected column {col} missing from output"
+
+# Validate data content
+assert output_table["ms_level"].isin([1, 2]).all(), "MS levels should be either 1 or 2"
+assert (output_table["rt"] >= 0).all(), "Retention times should be non-negative"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/test_data/BSA1_F1_spectrum_df.parquet
is excluded by `!**/*.parquet`
📒 Files selected for processing (1)
tests/test_commands.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_commands.py
2-2: os
imported but unused
Remove unused import: os
(F401)
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
Actionable comments posted: 0
♻️ Duplicate comments (3)
tests/test_commands.py (3)
16-16: ⚠️ Potential issue: Fix the missing test file reference that's causing pipeline failures.

The file `BSA1_F1_test_ms_info.parquet` doesn't exist in the test_data directory according to the pipeline errors. This is causing the test at line 129 to fail. You have two options:
- Create this reference file in the test_data directory
- Modify the test to use an existing file instead
94-102: ⚠️ Potential issue: Fix the failing psmconvert command.

The CI pipeline shows this test is failing with exit code 2 instead of the expected 0. The previous reviewer identified this may be due to a missing dependency (`numpy`). Ensure that:
- All dependencies are correctly installed
- The command arguments are correct and valid
- The implementation of the psmconvert command works with these inputs
🧰 Tools
🪛 GitHub Actions: Python application
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.
🪛 GitHub Actions: Python package
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
128-130: ⚠️ Potential issue: Fix reference data handling.

The test is trying to compare with a reference file (`BSA1_F1_test_ms_info.parquet`) that doesn't exist, causing pipeline failures. Either:
- Add the missing reference file to the repository
- Generate a reference file as part of the test setup
- Use a different approach that doesn't require a preexisting reference file
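If the second option is taken, the reference can be seeded on first use from a fixture. This is only a sketch: `build_reference_frame`, the column set, and the temp-dir location are hypothetical stand-ins for the real `mzmlstats` output, not the project's actual code.

```python
import tempfile
from pathlib import Path

import pandas as pd
import pytest

# Hypothetical location; the real test expects
# tests/test_data/BSA1_F1_test_ms_info.parquet.
REFERENCE_FILE = Path(tempfile.gettempdir()) / "BSA1_F1_reference_ms_info.parquet"


def build_reference_frame() -> pd.DataFrame:
    # Stand-in for real mzmlstats output; columns are illustrative.
    return pd.DataFrame(
        {
            "scan": [
                "controllerType=0 controllerNumber=1 scan=1",
                "controllerType=0 controllerNumber=1 scan=2",
            ],
            "ms_level": [1, 2],
            "rt": [12.5, 13.1],
        }
    )


@pytest.fixture
def reference_table() -> pd.DataFrame:
    # Generate the reference parquet on first use instead of committing it,
    # so the comparison test cannot fail with FileNotFoundError.
    if not REFERENCE_FILE.exists():
        build_reference_frame().to_parquet(REFERENCE_FILE)
    return pd.read_parquet(REFERENCE_FILE)
```

Tests that need the reference data would then take `reference_table` as an argument instead of reading a committed file directly.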
🧰 Tools
🪛 GitHub Actions: Python application
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
🪛 GitHub Actions: Python package
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
🧹 Nitpick comments (1)
tests/test_commands.py (1)
67-68: Consider adding output validation

The test verifies the command executes successfully, but doesn't validate the content of the output files. Consider adding assertions to verify specific output characteristics.

```diff
 result = run_cli_command("diann2mztab", args)
 assert result.exit_code == 0
-# Additional assertions could check for expected output files
+# Verify output file exists and has expected content
+output_file = DIANN_TEST_DIR / "output.mztab"
+assert output_file.exists(), "Output file was not created"
+# Add assertions to check file content
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
`.gitignore` (1 hunks)
`tests/test_commands.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitignore
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' did not execute successfully.
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 101-101: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
[error] 129-129: FileNotFoundError: No such file or directory: '/home/runner/work/quantms-utils/quantms-utils/tests/test_data/BSA1_F1_test_ms_info.parquet' in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (6)
tests/test_commands.py (6)
8-18: Good use of constants for better maintainability!

Defining path constants at the top of the file is an excellent practice that improves readability and makes future path updates easier.
21-28: Well-designed helper function!

Creating a reusable `run_cli_command` function reduces duplication across the test suite and makes the code more maintainable.
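A sketch of what such a helper could look like; the `cli` group and `mzmlstats` command below are toy stand-ins for the real quantms-utils entry point, so only the helper's shape is the point here.

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Toy CLI group standing in for the quantms-utils entry point."""


@cli.command()
def mzmlstats():
    click.echo("mzmlstats ran")


def run_cli_command(command, args=None):
    # Invoke a subcommand by name; default to its --help output so the
    # same helper serves both help-message tests and real invocations.
    runner = CliRunner()
    extra = args if args is not None else ["--help"]
    return runner.invoke(cli, [command] + extra)
```

With this shape, `run_cli_command("mzmlstats")` exercises the help path and `run_cli_command("mzmlstats", [...])` runs the command with arguments.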
31-50: Good use of parameterization!

Consolidating CLI help message tests using pytest's parametrize is a great improvement that follows testing best practices.
🧰 Tools
🪛 GitHub Actions: Python application
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
🪛 GitHub Actions: Python package
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
108-117: Good test fixture setup!

Using a pytest fixture for setup and teardown is an excellent practice. This ensures tests start with a clean state and don't interfere with each other.
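The pattern being praised can be sketched as an autouse fixture; `OUTPUT_FILES` here is a hypothetical list of artifacts, not the suite's actual file names.

```python
from pathlib import Path

import pytest

# Hypothetical artifacts the tests produce; the real suite removes
# its parquet outputs in the same spirit.
OUTPUT_FILES = [Path("BSA1_F1_ms_info.parquet"), Path("BSA1_F1_ms2_info.parquet")]


def remove_files(paths):
    for path in paths:
        Path(path).unlink(missing_ok=True)  # no error if already absent


@pytest.fixture(autouse=True)
def clean_outputs():
    remove_files(OUTPUT_FILES)  # start each test from a clean state
    yield
    remove_files(OUTPUT_FILES)  # drop artifacts created by the test
```

Because the fixture is `autouse=True`, every test in the module gets the cleanup without requesting it explicitly.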
146-149: Good validation of output structure!

Checking for the presence of expected columns is a good validation approach.

150-160: Well-structured skipped test

Marking tests that require large files as skipped by default is a good practice. This allows local testing without slowing down CI pipelines.
Actionable comments posted: 3
🧹 Nitpick comments (1)
quantmsutils/mzml/mzml_statistics.py (1)
613-615: Consider preserving the original exception cause.

Raising `click.Abort()` directly loses the traceback context. Using `raise click.Abort() from e` provides better debugging information, though it's optional.

🧰 Tools
🪛 Ruff (0.8.2)
615-615: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/test_data/tmt_erwinia_1ulsike_top10hcd_isol2_45stepped_60min_01_sage_ms2rescore.idxml.feature_names.tsv
is excluded by `!**/*.tsv`
📒 Files selected for processing (3)
`quantmsutils/mzml/mzml_statistics.py` (1 hunks)
`quantmsutils/psm/psm_conversion.py` (5 hunks)
`tests/test_commands.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/psm/psm_conversion.py
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
[error] 135-135: AssertionError: Expected column scan missing from output in test_mzml_statistics_bsa.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0. The command 'psmconvert' returned an exit code of 2.
[error] 135-135: AssertionError: Expected column scan missing from output.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
29-29: `quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW` imported but unused. Remove unused import (F401)
615-615: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (6)
tests/test_commands.py (6)
9-17
: Good use of constants for maintainability.
Defining all file and directory constants at the top promotes clarity and eases maintenance.
20-28: Helper function is well-structured.

Centralizing CLI invocation logic in `run_cli_command` helps avoid duplication and simplifies testing.
30-49
: CLI help message tests look good.
Parameterizing the test over multiple commands is a neat, scalable approach.

🧰 Tools
🪛 GitHub Actions: Python application
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
🪛 GitHub Actions: Python package
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
51-68
: DIA-NN to mzTab test is clear.
The test arguments and usage align well with the documented functionality.
70-88
: Samplesheet validation tests are properly organized.
The separation of concerns and explicit checks make these tests easy to extend.
120-128
: Solid baseline for mzML statistics testing.
The test covers file creation checks and ensures the broad functionality is correct.
```python
result = run_cli_command("psmconvert", args)
assert result.exit_code == 0
```
Investigate failing 'psmconvert' command.
The pipeline indicates the test returns exit code 2 instead of 0. Confirm that the input files exist and all dependencies (e.g., numpy) are installed. Also ensure the command invocation is correct.
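When a Click test fails with exit code 2 (Click's usage-error code), the quickest diagnostic is to print the runner's captured output and exception instead of asserting on the exit code alone. A minimal sketch with a toy command; the `--idxml` option here is illustrative, not the real `psmconvert` interface:

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--idxml", required=True)
def psmconvert_demo(idxml):
    # Toy stand-in for the real psmconvert command.
    click.echo(f"converting {idxml}")


def run_and_report(command, args):
    # Exit code 2 is Click's usage error (missing/unknown option);
    # result.output then contains the usage message explaining why.
    runner = CliRunner()
    result = runner.invoke(command, args)
    if result.exit_code != 0:
        print(result.output)
        if result.exception:
            print(repr(result.exception))
    return result
```

Running the failing test through such a wrapper would show immediately whether the exit code 2 comes from a bad option name, a missing required argument, or an import error in the command module.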
🧰 Tools
🪛 GitHub Actions: Python application
[error] 102-102: AssertionError: assert 2 == 0 in test_convert_psm. The command 'psmconvert' returned an exit code of 2.
🪛 GitHub Actions: Python package
[error] 102-102: AssertionError: assert 2 == 0. The command 'psmconvert' returned an exit code of 2.
```python
    SUMMED_PEAK_INTENSITY,
    PRECURSOR_RT,
    PRECURSOR_TOTAL_INTENSITY,
    PRECURSOR_INTENSITY_WINDOW,
```
🛠️ Refactor suggestion
Remove unused import.

`PRECURSOR_INTENSITY_WINDOW` is never referenced, so it should be removed to avoid clutter.
```diff
-    PRECURSOR_INTENSITY_WINDOW,
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
PRECURSOR_INTENSITY_WINDOW,
🧰 Tools
🪛 Ruff (0.8.2)
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
imported but unused
Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_commands.py (2)
20-27: Helper function reduces code duplication

The `run_cli_command` function is a good abstraction that eliminates repetitive code. It handles both help message testing and command execution with arguments. Consider adding a return type annotation to make the function's interface clearer:

```diff
-def run_cli_command(command, args=None):
+def run_cli_command(command, args=None) -> click.testing.Result:
```
138-147: Good use of skip marker for local-only tests

Using `pytest.mark.skip` for tests that should only run locally with larger files is a good practice. This prevents these tests from slowing down CI pipelines while still keeping them in the codebase for local verification. Consider adding a more specific reason for skipping:

```diff
-@pytest.mark.skip("Test to be run locally, with big files")
+@pytest.mark.skip(reason="Uses large files that would slow down CI pipelines and consume excessive resources")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_commands.py
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Python application
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
tests/test_commands.py
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (5)
tests/test_commands.py (5)
9-18: Well-organized constants improve maintainability

I like how you've defined all path and file constants at the top of the file. This improves maintainability and makes it easier to update paths if needed. The constants follow a clear naming convention, making their purpose obvious.
As a minor suggestion, consider adding a docstring at the top of the file to explain its purpose and the overall testing strategy.
30-49: Excellent use of parametrized testing

Converting individual test methods into a single parametrized test is a best practice that reduces code duplication and makes adding new command tests easier.
The docstrings are clear and the assertions are appropriate for verifying help messages.
🧰 Tools
🪛 GitHub Actions: Python application
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
🪛 GitHub Actions: Python package
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
106-119: Good test setup with fixture for cleanup

Using a pytest fixture with `autouse=True` for setup and teardown is a clean approach. This ensures each test runs in a clean environment by removing output files before the test and allowing for cleanup after.
1-147
:❓ Verification inconclusive
Address deprecation warnings in the pipeline
The pipeline shows deprecation warnings that should be addressed:
FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
While these don't appear directly in this file, they're triggered during test execution and should be fixed in the relevant implementation files.
🏁 Script executed:
#!/bin/bash # Find where the deprecated methods are used echo "Searching for deprecated DataFrame.applymap usage..." rg -l "\.applymap\(" --type py echo "Searching for deprecated is_categorical_dtype usage..." rg -l "is_categorical_dtype" --type pyLength of output: 305
Deprecation Warnings: Manual Verification Recommended
Our search for direct usages of
DataFrame.applymap
andis_categorical_dtype
in the repository (including in key modules such asquantmsutils/quantmsutilsc.py
) did not yield any matches. Given that these warnings are observed during test runs, it's possible they stem from indirect calls (e.g., through dependencies or deeper library internals).
- Please manually verify if these deprecated calls are coming from our own code or from an external dependency.
- If found within our implementations, update:
DataFrame.applymap
toDataFrame.map
- The usage of
is_categorical_dtype
to an appropriate type-check usingisinstance(dtype, pd.CategoricalDtype)
- Focus especially on the implementation files that transform or manipulate data, such as in
quantmsutils/quantmsutilsc.py
.

🧰 Tools
🪛 GitHub Actions: Python application
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
🪛 GitHub Actions: Python package
[error] 102-102: AssertionError: assert 2 == 0. The test 'test_convert_psm' failed due to unexpected exit code.
[warning] 38-38: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
[warning] 89-89: DeprecationWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead.
120-136
:❓ Verification inconclusive
Test has comprehensive assertions but had past failures
The test has solid assertions checking both the exit code and the existence of expected output files, followed by content validation.
Past review comments indicated an issue with the 'scan' column missing from output. The current code checks for this (line 135), which should prevent that issue from recurring.
🏁 Script executed:
```bash
#!/bin/bash
# Check if TMT files exist
echo "Checking test data files..."
ls -la ${TEST_DATA_DIR}/TMT* 2>/dev/null || echo "TMT files not found"

# Check whether the test can run without errors
echo "Running mzml statistics test..."
python -c "
import os
import sys
from pathlib import Path
import pandas as pd
from click.testing import CliRunner
from quantmsutils.quantmsutilsc import cli

TMT_MZML_FILE = '${TMT_MZML_FILE}'
TMT_MS_INFO_FILE = '${TMT_MS_INFO_FILE}'

# Remove output files if they exist
if os.path.exists(TMT_MS_INFO_FILE):
    os.unlink(TMT_MS_INFO_FILE)

runner = CliRunner()
result = runner.invoke(cli, ['mzmlstats', '--ms2_file', '--ms_path', TMT_MZML_FILE])
print(f'Exit code: {result.exit_code}')
if result.exception:
    print(f'Exception: {result.exception}')
    print(f'Traceback: {result.exc_info}')

if os.path.exists(TMT_MS_INFO_FILE):
    print('Output file created successfully')
    df = pd.read_parquet(TMT_MS_INFO_FILE)
    print(f'Columns: {df.columns.tolist()}')
    print(f'Row count: {len(df)}')
else:
    print('Output file not created')
"
```

Length of output: 1210
Action Required: Verify Test Environment Dependencies
The updated test implementation now robustly validates the output, including checking for the existence of the `scan` column, to address the past issue. However, manual verification is necessary because our test script execution revealed an environment misconfiguration: pandas was missing from the testing environment (i.e. `ModuleNotFoundError: No module named 'pandas'`). Additionally, the test data files (TMT files) were not found during the check.
- Please ensure that all required dependencies (e.g., Pandas) are installed in the testing environment.
- Verify that the test data file paths (e.g., `${TEST_DATA_DIR}/TMT*`) are correctly configured and accessible.

Once these issues are resolved and the environment is properly set up, rerun the tests to confirm that the assertions (including the check for the `scan` column) pass as expected.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/test_data/STATIC_TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01_ms2_info.parquet
is excluded by `!**/*.parquet`
📒 Files selected for processing (1)
tests/test_commands.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (7)
tests/test_commands.py (7)
9-18: Good use of constants for improved maintainability.

Defining constants at the top of the file makes the code more maintainable and reduces the risk of path-related errors. Using the `Path` object from `pathlib` is also a good practice for cross-platform compatibility.
: Well-designed helper function for CLI command execution.The
run_cli_command
function nicely abstracts the common pattern of creating a CLI runner and invoking commands, making the test code more concise and following the DRY principle.
31-50
: Excellent use of pytest parameterization for testing help messages.Converting individual test functions into a parameterized test reduces code duplication and makes it easier to add new commands to test in the future.
107-120
: Good use of pytest fixture for test setup and teardown.Using a fixture with
autouse=True
ensures that test files are properly cleaned up before and after tests run, preventing interference between test runs.
138-148
: Good practice using pytest.mark.skip for resource-intensive tests.Marking tests that require large files or significant resources with
pytest.mark.skip
is a good practice. It allows other tests to run quickly while still keeping the test code available for local debugging.
1-148
: Overall refactoring improves test organization and maintainability.The refactoring of this test file follows best practices by:
- Organizing related tests into classes
- Using fixtures for setup/teardown
- Applying parameterization to reduce duplication
- Providing clear docstrings for tests
- Using consistent naming patterns
This structure makes the tests more maintainable and easier to extend in the future.
134-137
:❓ Verification inconclusive
Ensure 'scan' column is correctly generated in the output.
Previous pipeline failures indicated that the 'scan' column was missing from the output. This test now checks for its presence, but verify that the implementation correctly generates this column.
🏁 Script executed:
#!/bin/bash # Run the mzmlstats command to generate the output file python -m quantmsutils.quantmsutilsc mzmlstats --ms2_file --ms_path "${TMT_MZML_FILE}" # Check if the output file exists and contains the expected columns echo "Checking output file for 'scan' column:" python -c " import pandas as pd file_path = '${TMT_MS_INFO_FILE}' try: df = pd.read_parquet(file_path) if 'scan' in df.columns: print(f\"✓ 'scan' column is present in {file_path}\") print(f\"Column values (first 5 rows): {df['scan'].head().tolist()}\") else: print(f\"✗ 'scan' column is missing from {file_path}\") print(f\"Available columns: {df.columns.tolist()}\") except Exception as e: print(f\"Error reading file: {e}\") "Length of output: 1197
ACTION REQUIRED: Ensure dependency installation and verify 'scan' column generation
The execution failed due to missing modules (
numpy
andpandas
), so we couldn’t confirm whether the output contains the'scan'
column. Please make sure all required dependencies are installed in the test environment and re-run the pipeline. Once dependencies are in place, manually verify that the output file (referenced by${TMT_MS_INFO_FILE}
) correctly includes the'scan'
column as intended.
```python
class TestPSMConversion:
    """
    Test class for PSM conversion commands, it takes an ms2 file in parquet and an idXML file
    and converts it to a parquet file with PSM information.
    """

    def test_convert_psm(self):
        """Test converting PSM data"""
        args = [
            "--idxml", str(TMT_IDXML_FILE),
            "--ms2_file", str(TMT_STATIC_MS2_INFO_FILE),
        ]
        result = run_cli_command("psmconvert", args)
        assert result.exit_code == 0
        # Could add assertions to check for expected output files
```
💡 Verification agent
🧩 Analysis chain
Verify that dependencies for psmconvert are correctly installed.
Previous pipeline failures indicated issues with missing dependencies (e.g., numpy) for the `psmconvert` command. While the test structure has been improved, ensure that all required dependencies are properly installed in the testing environment.
🏁 Script executed:

```bash
#!/bin/bash
# Check if numpy is installed and its version
echo "Checking for numpy installation:"
pip list | grep numpy || echo "numpy not found!"

# Check if the test files exist
echo "Checking if test files exist:"
ls -la ${TMT_IDXML_FILE} ${TMT_STATIC_MS2_INFO_FILE} 2>/dev/null || echo "Test files not found"

# Try importing numpy in a Python script
echo "Attempting to import numpy:"
python -c "import numpy; print(f'Numpy version: {numpy.__version__}')" || echo "Failed to import numpy"
```

Length of output: 1492
Missing Dependency: Ensure numpy is installed in the testing environment
The latest test run confirms that numpy is not installed (output shows "numpy not found!" and a failure when attempting to import numpy). Please update the dependency configuration (e.g., add `numpy` to `requirements.txt` or `environment.yml`) so that the `psmconvert` command has all necessary packages during testing.
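One way to make this failure mode self-explanatory is a small guard that reports absent packages before the CLI test runs; the helper name and default package list below are illustrative, not part of the project.

```python
import importlib.util


def missing_dependencies(required=("numpy", "pandas")):
    # find_spec returns None when a package cannot be imported,
    # without actually importing it (so no side effects).
    return [pkg for pkg in required if importlib.util.find_spec(pkg) is None]
```

A test can then start with `assert not missing_dependencies(), f"install: {missing_dependencies()}"` and fail with a clear message naming the missing packages instead of an opaque exit code 2.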
Actionable comments posted: 0
🧹 Nitpick comments (6)
README.md (6)
35-36
: Header Formatting: Standardize "mzML Statistics" Headline.

Consider capitalizing and formatting the header as "mzML Statistics" (instead of "mzml statistics") to match the conventional capitalization of the mzML format for clarity and consistency.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37
: Typographical Error in Section Description.

There is a typo in the phrase "gow they are computed." It should be corrected to "how they are computed."
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
39-43
: Details Section Clarity.

The details section is well-structured and clearly explains the purpose and output of the `mzmlstats` functionality. For further clarity, consider adding a brief example demonstrating the output filename format.
🪛 markdownlint-cli2 (0.17.2)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315(MD013, line-length)
44-49
: Field Descriptions and Markdown Formatting.

The list describing each field is comprehensive and informative. However, please review the punctuation and spacing to address minor markdownlint warnings (e.g., trailing spaces and line-length issues) as mentioned in the static analysis hints. Paying attention to these details will further improve readability.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level
: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks
: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -
base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities
: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt
: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335(MD013, line-length)
51-57
: MS/MS Signal Field Details Consistency. The additional fields for MS/MS signals are clearly described. Ensure that all field names (e.g., `precursor_charge`, `precursor_mz`, `precursor_rt`, and `precursor_total_intensity`) are formatted uniformly with backticks and consistent punctuation.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge
: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -
precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -
precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt
: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity
: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
53-53: Line length
Expected: 80; Actual: 148(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112(MD013, line-length)
58-69
: Note Block and Code Fencing Adjustments. The note section effectively explains how precursor-related information is computed. Consider the following improvements:
- Ensure that the fenced code block is surrounded by blank lines (per MD031 guidelines).
- Correct the typo "pyopnems" to "pyopenms" to accurately reference the library.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
README.md
(1 hunks)quantmsutils/utils/constants.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/utils/constants.py
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan
: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level
: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks
: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -
base_peak_intensity`: The max intensity in the spectrum (MS o...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities
: The sum of all intensities in the spect...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt
: The retention time of the spectrum, cap...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge
: The charge of the precursor ion, if the...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -
precursor_mz`: The m/z of the precursor ion, if the si...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -
precursor_intensity`: The intensity of the precursor ion, if ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt
: The retention time of the precursor ion...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity
: The total intensity of the precursor io...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
README.md
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233
(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97
(MD013, line-length)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
README.md (1)
71-73
: Contributions and Issues Section. The "Contributions and issues" section is clear and directs users appropriately to the GitHub repository for further engagement. No changes are needed in this part.
🧰 Tools
🪛 LanguageTool
[typographical] ~73-~73: A comma between “please” and the verb is normally not needed. If you want to signal a pause, consider using an em dash instead.
Context: ... Contributions and issues are welcome. Please, open an issue in the [GitHub repository](htt...(PLEASE_UNNECESSARY_COMMA)
🪛 markdownlint-cli2 (0.17.2)
73-73: Line length
Expected: 80; Actual: 196(MD013, line-length)
Actionable comments posted: 0
🧹 Nitpick comments (3)
README.md (3)
35-36
: Header Capitalization & Consistency. The new header "
#### mzml statistics
" is a clear indicator for this section. For consistency and clarity—especially given standard usage in proteomics—it would be better to capitalize it as "#### mzML Statistics
".🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-38
: Typo & Clarity in Descriptive Text. The introductory text contains a minor typographical error: "
gow they are computed
" should be corrected to "how they are computed
". Additionally, you might consider rephrasing the sentence for clarity. For example, starting with "The quantms-utils package provides multiple scripts…
" could improve the flow.Proposed Diff:
- Here are some details about the formats, the fields they contain and gow they are computed. + Here are some details about the formats, the fields they contain and how they are computed.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
35-38
: Markdown Formatting ImprovementsStatic analysis reports indicate minor markdown formatting issues in these new lines (e.g., trailing spaces on line 35 and an excessive line length on line 37). Addressing these by removing extra whitespace and breaking long lines into shorter, more readable segments will further improve the documentation’s clarity.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
README.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan
: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level
: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks
: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -
base_peak_intensity`: The max intensity in the spectrum (MS o...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities
: The sum of all intensities in the spect...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt
: The retention time of the spectrum, cap...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge
: The charge of the precursor ion, if the...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -
precursor_mz`: The m/z of the precursor ion, if the si...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -
precursor_intensity`: The intensity of the precursor ion, if ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt
: The retention time of the precursor ion...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity
: The total intensity of the precursor io...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~60-~60: Possible missing preposition found.
Context: ...ng the first precursor in the spectrum. The following columns intensity
(if not a...
(AI_HYDRA_LEO_MISSING_IN)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: - scan
: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level
: The MS level of the signal, all of them...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Possible missing comma found.
Context: ...el: The MS level of the signal, all of them will be 2. -
mz_array`: The m/z array ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array
: The m/z array of the peaks in the MS/MS...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -
intensity_array`: The intensity array of the peaks in the...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
README.md
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233
(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97
(MD013, line-length)
74-74: Line length
Expected: 80; Actual: 313
(MD013, line-length)
76-76: Line length
Expected: 80; Actual: 334
(MD013, line-length)
78-78: Line length
Expected: 80; Actual: 140
(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 153
(MD013, line-length)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
Actionable comments posted: 0
🧹 Nitpick comments (3)
README.md (1)
39-69
: Comprehensive documentation of MS info outputs. The detailed explanation of the MS info output format, including column definitions and their computation methods, is excellent. This level of detail will help users understand and properly utilize the data.
However, there is a minor typo in line 61: "pyopnems" should be "pyopenms".
-precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index) -precursor_rt = precursor_spectrum.getRT() -purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True) -precursor_intensity = purity.target_intensity -total_intensity = purity.total_intensity +precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index) +precursor_rt = precursor_spectrum.getRT() +purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True) +precursor_intensity = purity.target_intensity +total_intensity = purity.total_intensity🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level
: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks
: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -
base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities
: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt
: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge
: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -
precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with
s...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -
precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt
: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity
: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97(MD013, line-length)
quantmsutils/mzml/mzml_statistics.py (2)
209-316
: Comprehensive spectrum transformation with better error handling. The `transform_mzml_spectrum` method provides a more structured approach to processing spectra, with clear separation of MS1 and MS2 handling. The code now properly handles cases where peaks or precursor information might be missing. However, there's one improvement opportunity: the code could benefit from better-structured exception handling when dealing with spectrum processing.
Consider adding a try-except block around the main transformation logic to catch and log any unexpected errors that might occur during spectrum processing:
```diff
 def transform_mzml_spectrum(
     self,
     i: int,
     spectrum: oms.MSSpectrum,
     mzml_exp: oms.MSExperiment,
 ) -> None:
+    try:
         # Extract peaks data
         mz_array, intensity_array = spectrum.get_peaks()
         peak_count = len(mz_array)
         scan_id = self._extract_scan_id(spectrum)

         # ... remaining implementation ...

         self.batch_data.append(row_data)

         # Write batch when it reaches specified size
         if len(self.batch_data) >= self.batch_size:
             self._write_batch()
+    except Exception as e:
+        logger.error(f"Error processing spectrum {i}: {e}")
+        # Continue processing other spectra
```
566-615
: Enhanced CLI command with better options and error handling. The CLI command now includes better documentation, support for MS2 file generation, and improved error handling. However, there's an issue with the exception handling in line 615.
When catching and re-raising exceptions, it's better to use
raise ... from e
to preserve the original traceback:

```diff
 except Exception as e:
     logger.error(f"Error processing file {ms_path}: {e}")
-    raise click.Abort()
+    raise click.Abort() from e
```
🪛 Ruff (0.8.2)
615-615: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
README.md
(1 hunks)quantmsutils/mzml/mzml_statistics.py
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: - scan
: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). - ms_level
: The MS level of the signal, 1 for MS an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. - num_peaks
: The number of peaks in the MS. Compute ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -
base_peak_intensity`: The max intensity in the spectrum (MS o...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). - summed_peak_intensities
: The sum of all intensities in the spect...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). - rt
: The retention time of the spectrum, cap...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: - precursor_charge
: The charge of the precursor ion, if the...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -
precursor_mz`: The m/z of the precursor ion, if the si...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with
s...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -
precursor_intensity`: The intensity of the precursor ion, if ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. - precursor_rt
: The retention time of the precursor ion...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. - precursor_total_intensity
: The total intensity of the precursor io...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~74-~74: Possible missing comma found.
Context: ...o produce a file containing all the MS2 spectra including the intesities and masses of ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: - scan
: The scan accession for each MS and MS/M...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). - ms_level
: The MS level of the signal, all of them...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. - mz_array
: The m/z array of the peaks in the MS/MS...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -
intensity_array`: The intensity array of the peaks in the...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
README.md
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272
(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315
(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335
(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148
(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137
(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268
(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98
(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112
(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
60-60: Line length
Expected: 80; Actual: 233
(MD013, line-length)
61-61: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
64-64: Line length
Expected: 80; Actual: 97
(MD013, line-length)
74-74: Line length
Expected: 80; Actual: 331
(MD013, line-length)
76-76: Line length
Expected: 80; Actual: 334
(MD013, line-length)
78-78: Line length
Expected: 80; Actual: 140
(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 153
(MD013, line-length)
🪛 Ruff (0.8.2)
quantmsutils/mzml/mzml_statistics.py
29-29: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
imported but unused
Remove unused import: quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
615-615: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (14)
README.md (2)
35-38
: Well-structured introduction to mzML statistics functionality. The new section provides a clear introduction to the mzML statistics functionality, highlighting its purpose within the quantms ecosystem. This documentation will be valuable for users.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
71-81
: Clear explanation of MS2 file format. The documentation for the MS2 output format is concise and clearly explains the contents of the MS2 info parquet file. This will be helpful for users working with MS2 data.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~74-~74: Possible missing comma found.
Context: ...o produce a file containing all the MS2 spectra including the intesities and masses of ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~76-~76: Loose punctuation mark.
Context: ...w they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). -ms_level
: The MS level of the signal, all of them...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...nal, all of them will be 2. -mz_array
: The m/z array of the peaks in the MS/MS...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -
intensity_array`: The intensity array of the peaks in the...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
74-74: Line length
Expected: 80; Actual: 331(MD013, line-length)
76-76: Line length
Expected: 80; Actual: 334(MD013, line-length)
78-78: Line length
Expected: 80; Actual: 140(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 153(MD013, line-length)
quantmsutils/mzml/mzml_statistics.py (12)
1-33
: Improved logging implementation. Adding proper logging with configurable levels is a good improvement that will help with troubleshooting and monitoring. The module-level logger with a consistent format will make it easier to track the execution flow.
🧰 Tools
🪛 Ruff (0.8.2)
29-29:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
imported but unusedRemove unused import:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
29-29
: Remove unused import. The `PRECURSOR_INTENSITY_WINDOW` constant is imported but never used in the code.

```diff
- PRECURSOR_INTENSITY_WINDOW,
```
🧰 Tools
🪛 Ruff (0.8.2)
29-29:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
imported but unusedRemove unused import:
quantmsutils.utils.constants.PRECURSOR_INTENSITY_WINDOW
(F401)
36-66
: Well-structured schema definitions. The addition of dedicated functions for schema creation improves code organization and maintainability. These schemas clearly define the structure of the output files as documented in the README.
68-141
: Enhanced Bruker file processing with better error handling. The updated `batch_write_bruker_d` function now includes proper logging, batch processing, and robust error handling. The function ensures that all schema fields are properly populated, even when they don't exist in the source data.
143-161
: Useful helper function for SQLite column checking. The `column_exists` function is a good addition that makes the code more robust by allowing it to adapt to different SQLite schema structures.
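A helper of this kind is typically built on SQLite's `PRAGMA table_info`; the following is a minimal sketch of the idea (the actual signature in the module may differ):

```python
import sqlite3

def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; the column name is field 1
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (Id INTEGER, Time REAL)")
print(column_exists(conn, "frames", "Time"))    # True
print(column_exists(conn, "frames", "Charge"))  # False
```

Checking the schema up front lets the reader skip optional columns (e.g., vendor-specific fields) instead of failing on a `SELECT` against a column that is absent in older files.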
169-202
: Improved constructor with better parameter documentation. The updated constructor now includes support for MS2 file generation and has comprehensive parameter documentation. This makes the code more maintainable and easier to understand.
322-340
: Improved scan ID extraction. The `_extract_scan_id` method is a good addition that handles different formats of scan IDs in a consistent way. This will make the output files more consistent and easier to work with.
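As an illustration of the idea (not the exact implementation), a Thermo-style native ID such as `controllerType=0 controllerNumber=1 scan=2` can be reduced to its scan number with a regex:

```python
import re

def extract_scan_number(native_id: str):
    # look for the "scan=<digits>" token used by several vendor ID formats
    match = re.search(r"scan=(\d+)", native_id)
    return int(match.group(1)) if match else None

print(extract_scan_number("controllerType=0 controllerNumber=1 scan=1024"))  # 1024
print(extract_scan_number("spectrum=7"))  # None (no scan= token present)
```

Returning `None` for unrecognized formats lets the caller fall back to the raw native ID rather than aborting the run.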
341-397
: Well-structured precursor data extraction. The `_extract_first_precursor_data` method properly handles precursor information extraction, including fallbacks for when precursor information is not explicitly provided. The code follows good practices by including comprehensive error handling and logging.
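The fallback idea can be pictured with a simplified, hypothetical sketch: when no precursor intensity is annotated, take the most intense survey-scan peak inside the isolation window.

```python
def fallback_precursor_intensity(mz_array, intensity_array, precursor_mz,
                                 half_window=0.5):
    # peaks within +/- half_window of the precursor m/z are candidates
    candidates = [
        intensity
        for mz, intensity in zip(mz_array, intensity_array)
        if abs(mz - precursor_mz) <= half_window
    ]
    # None signals that no peak fell inside the isolation window
    return max(candidates, default=None)

print(fallback_precursor_intensity(
    [499.7, 500.0, 500.3, 510.0],
    [100.0, 2500.0, 300.0, 9000.0],
    500.0,
))  # 2500.0
```

The real method delegates this work to pyopenms (see the precursor purity note in the README changes), which also accounts for isotopic peaks; this sketch only shows the windowed-maximum fallback concept.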
398-436
: Efficient batch writing with error handling. The `_write_batch` method now includes proper error handling and logging, making it more robust. The method efficiently writes batches of data to the output files.
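The batch-and-flush pattern used here is independent of the Parquet backend; a stdlib-only sketch (hypothetical class, with a CSV writer standing in for the Parquet writer):

```python
import csv
import io

class BatchWriter:
    """Accumulate rows in memory and flush them in fixed-size batches."""

    def __init__(self, stream, fieldnames, batch_size=3):
        self.writer = csv.DictWriter(stream, fieldnames=fieldnames)
        self.writer.writeheader()
        self.batch = []
        self.batch_size = batch_size

    def add(self, row):
        self.batch.append(row)
        if len(self.batch) >= self.batch_size:
            self._flush()

    def _flush(self):
        # one bulk write per batch instead of one write per spectrum
        self.writer.writerows(self.batch)
        self.batch.clear()

    def finalize(self):
        # write any remaining partial batch before closing
        if self.batch:
            self._flush()

buf = io.StringIO()
w = BatchWriter(buf, ["scan", "ms_level"])
for i in range(5):
    w.add({"scan": str(i), "ms_level": 2})
w.finalize()
print(len(buf.getvalue().splitlines()))  # 6: header + 5 rows
```

The essential points carried over from the reviewed code are the size-triggered flush in `add` and the final partial flush in `finalize`, which together bound memory use for large runs.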
437-453
: Improved finalization process. The `finalize` method now properly handles resource cleanup, ensuring that all writers are closed and references are cleared, which helps prevent resource leaks.
455-524
: Enhanced mzML streaming with comprehensive error handling. The `batch_write_mzml_streaming` function now includes better error handling, acquisition date extraction, and supports generating MS2 files. The function is well-structured and follows good practices for logging and error reporting.
527-564
: Improved file path resolution. The `resolve_ms_path` function now handles different file extensions and provides better logging for path resolution. This makes the tool more user-friendly by automatically finding the correct file even if the extension is not exactly as specified.
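The extension-fallback behaviour can be sketched as follows (hypothetical names; the real function also handles Bruker `.d` directories and logs each resolution step):

```python
from pathlib import Path
import tempfile

def resolve_ms_path(given: Path, extensions=(".mzML", ".mzml")) -> Path:
    # return the path as-is if it exists, otherwise try known extensions
    if given.exists():
        return given
    for ext in extensions:
        candidate = given.with_suffix(ext)
        if candidate.exists():
            return candidate
    raise FileNotFoundError(given)

tmp = Path(tempfile.mkdtemp())
(tmp / "sample.mzML").write_text("")
print(resolve_ms_path(tmp / "sample.raw").name)  # sample.mzML
```

Raising `FileNotFoundError` when nothing matches keeps the failure explicit rather than silently processing the wrong file.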
Actionable comments posted: 0
🧹 Nitpick comments (4)
README.md (4)
35-38
: Fix Typos and Grammar in the mzML Statistics Section. The new "mzml statistics" section has a couple of language issues:
- The phrase “quantms-utils have multiple scripts…” should be corrected to “quantms-utils has multiple scripts…” for proper subject–verb agreement.
- The word “gow” in “…and gow they are computed” is a typo; it should be “how”.
Consider applying this diff:
```diff
-quantms-utils have multiple scripts to generate mzML stats. These files are used by multiple tools and packages within quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain and gow they are computed.
+quantms-utils has multiple scripts to generate mzML stats. These files are used by multiple tools and packages within the quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain and how they are computed.
```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
40-70
: Ensure Consistent Library Naming and Punctuation. Within the "MS info and details" block:
- The note section refers to “pyopnems” which is likely a typo; it should be “pyopenms” to correctly reference the library.
- Additionally, several lines trigger static analysis warnings (e.g., loose punctuation and lengthy lines). Although these issues do not affect functionality, revising them will enhance readability.
A diff suggestion for the library naming issue:
```diff
-For all the precursor-related information, we are using the first precursor in the spectrum. The following columns `intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity` we use the following pyopnems code:
+For all the precursor-related information, we are using the first precursor in the spectrum. The following columns `intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity` we use the following pyopenms code:
```
🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level
: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks
: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -
base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities
: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt
: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge
: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -
precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with
s...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -
precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt
: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity
: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
61-61: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
61-61: Line length
Expected: 80; Actual: 233(MD013, line-length)
62-62: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
65-65: Line length
Expected: 80; Actual: 97(MD013, line-length)
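As a minimal sketch of how the spectrum-level columns discussed in this comment (`num_peaks`, `base_peak_intensity`, `summed_peak_intensities`) relate to the `(mz, intensity)` arrays that pyopenms' `spectrum.get_peaks()` returns — the helper name `spectrum_stats` and the sample values are illustrative, not part of the package:

```python
import numpy as np

def spectrum_stats(mz_array, intensity_array):
    # Derive the spectrum-level columns described in the README from the
    # (mz, intensity) arrays as returned by pyopenms' spectrum.get_peaks().
    intensity_array = np.asarray(intensity_array, dtype=float)
    return {
        "num_peaks": int(len(mz_array)),
        "base_peak_intensity": float(intensity_array.max()) if intensity_array.size else None,
        "summed_peak_intensities": float(intensity_array.sum()),
    }

stats = spectrum_stats([100.1, 200.2, 300.3], [10.0, 50.0, 25.0])
print(stats)  # num_peaks=3, base_peak_intensity=50.0, summed_peak_intensities=85.0
```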
72-82: Correct Typo in MS2 Details

In the "MS2 info and details" section on line 75, the word “intesities” is misspelled. It should be “intensities”:
```diff
-`mzmlstats` allows the user to produce a file containing all the MS2 spectra including the intesities and masses of every peak.
+`mzmlstats` allows the user to produce a file containing all the MS2 spectra including the intensities and masses of every peak.
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...w they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). -ms_level
: The MS level of the signal, all of them...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...nal, all of them will be 2. -mz_array
: The m/z array of the peaks in the MS/MS...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~80-~80: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -
intensity_array`: The intensity array of the peaks in the...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
75-75: Line length
Expected: 80; Actual: 331(MD013, line-length)
77-77: Line length
Expected: 80; Actual: 334(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 140(MD013, line-length)
80-80: Line length
Expected: 80; Actual: 153(MD013, line-length)
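The MS2 output described in this comment can be sketched as one row per MS/MS spectrum. The `ms2_row` helper and the scan string below are assumptions for illustration, not the actual `mzmlstats` implementation; the column names follow the README:

```python
def ms2_row(scan, mz_array, intensity_array):
    # One output row per MS/MS spectrum; the array columns hold the peak
    # m/z values and intensities as returned by spectrum.get_peaks().
    return {
        "scan": scan,
        "ms_level": 2,  # every row in this file is an MS/MS spectrum
        "mz_array": [float(mz) for mz in mz_array],
        "intensity_array": [float(i) for i in intensity_array],
    }

row = ms2_row("controllerType=0 controllerNumber=1 scan=512",
              [101.07, 175.12], [1200.0, 3400.0])
```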
35-86: Refine Markdown Formatting

Several lines in the new sections are triggering markdownlint warnings (e.g., trailing spaces, excessive line lengths, and loose punctuation). Although these are minor formatting issues, addressing them will improve the overall readability and maintainability of the README. Consider reviewing these aspects using markdownlint tools.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Loose punctuation mark.
Context: ... they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~45-~45: Loose punctuation mark.
Context: ...in/docs/README.adoc#scan). -ms_level
: The MS level of the signal, 1 for MS an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~46-~46: Loose punctuation mark.
Context: ... 1 for MS and 2 for MS/MS. -num_peaks
: The number of peaks in the MS. Compute ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~47-~47: Loose punctuation mark.
Context: ...um.get_peaks(). -
base_peak_intensity`: The max intensity in the spectrum (MS o...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...S or MS/MS). -summed_peak_intensities
: The sum of all intensities in the spect...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~49-~49: Loose punctuation mark.
Context: ...es in the spectrum (MS or MS/MS). -rt
: The retention time of the spectrum, cap...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~53-~53: Loose punctuation mark.
Context: ...dditional columns: -precursor_charge
: The charge of the precursor ion, if the...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Loose punctuation mark.
Context: ...sors()[0].getCharge(). -
precursor_mz`: The m/z of the precursor ion, if the si...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~54-~54: Possible missing comma found.
Context: ...: The m/z of the precursor ion, if the signal is MS/MS. Capture with pyopenms with
s...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~55-~55: Loose punctuation mark.
Context: ...s()[0].getMZ(). -
precursor_intensity`: The intensity of the precursor ion, if ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...ation; see note below. -precursor_rt
: The retention time of the precursor ion...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...ntion time of the precursor ion, if the signal is MS/MS. See note below. - `precursor_...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ote below. -precursor_total_intensity
: The total intensity of the precursor io...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...w they are estimated and used: -scan
: The scan accession for each MS and MS/M...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...ain/docs/README.adoc#scan). -ms_level
: The MS level of the signal, all of them...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~79-~79: Loose punctuation mark.
Context: ...nal, all of them will be 2. -mz_array
: The m/z array of the peaks in the MS/MS...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~80-~80: Loose punctuation mark.
Context: ...ectrum.get_peaks(). -
intensity_array`: The intensity array of the peaks in the...(UNLIKELY_OPENING_PUNCTUATION)
[typographical] ~86-~86: A comma between “please” and the verb is normally not needed. If you want to signal a pause, consider using an em dash instead.
Context: ... Contributions and issues are welcome. Please, open an issue in the [GitHub repository](htt...(PLEASE_UNNECESSARY_COMMA)
🪛 markdownlint-cli2 (0.17.2)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
37-37: Line length
Expected: 80; Actual: 272(MD013, line-length)
42-42: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
42-42: Line length
Expected: 80; Actual: 315(MD013, line-length)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
44-44: Line length
Expected: 80; Actual: 335(MD013, line-length)
53-53: Line length
Expected: 80; Actual: 148(MD013, line-length)
54-54: Line length
Expected: 80; Actual: 137(MD013, line-length)
55-55: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
55-55: Line length
Expected: 80; Actual: 268(MD013, line-length)
56-56: Line length
Expected: 80; Actual: 98(MD013, line-length)
57-57: Line length
Expected: 80; Actual: 112(MD013, line-length)
60-60: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
61-61: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
61-61: Line length
Expected: 80; Actual: 233(MD013, line-length)
62-62: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
65-65: Line length
Expected: 80; Actual: 97(MD013, line-length)
75-75: Line length
Expected: 80; Actual: 331(MD013, line-length)
77-77: Line length
Expected: 80; Actual: 334(MD013, line-length)
79-79: Line length
Expected: 80; Actual: 140(MD013, line-length)
80-80: Line length
Expected: 80; Actual: 153(MD013, line-length)
86-86: Line length
Expected: 80; Actual: 196(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- README.md (1 hunks)
- quantmsutils/psm/psm_conversion.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- quantmsutils/psm/psm_conversion.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: build
Summary by CodeRabbit
Documentation
New Features
Chores
- `.qodo` added to `.gitignore` to prevent tracking of specific files.
- `.markdownlint.json` added for Markdown linting rules.