Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Digestion Consolidation and Optimization #823

Merged
merged 27 commits into from
Jan 17, 2025

Conversation

nbollis
Copy link
Member

@nbollis nbollis commented Jan 14, 2025

PR Classification

Code cleanup and performance optimization
Consolidated the code so that Transcriptomics and Proteomics are based upon the same functions.

Pull request was loaded into MetaMorpheus 1.0.6 and profiled using Jenkins Data.

This PR reduces:

  • Total Runtime: -14%
  • Calls to Mod Fits: -4.9 billion calls
  • Calls to GetFixedModsOneIsNTerminus: -75 million calls
  • BioPolymerWithSetModsExtensions for full and essential sequences: -15% runtime due to string interpolation

PR Summary

This pull request focuses on improving code readability, performance, and maintainability by optimizing methods, refactoring logic, and removing unused code.

  • Updated BioPolymerWithSetModsExtensions to use nullable Modification? type and string interpolation.
  • Optimized DetermineFullSequence method by initializing StringBuilder with an initial capacity.
  • Refactored SetFixedModsOneIsNorFivePrimeTerminus method to use a ref parameter and simplified logic.
  • Replaced manual dictionary and list creation with pooled objects in ProteolyticPeptide.cs and NucleolyticOligo.cs.
  • Added try-finally blocks to ensure pooled objects are returned after use in ProteolyticPeptide.cs and NucleolyticOligo.cs.

Added `using` directive for `MzLibUtil`. Introduced a static readonly `HashSetPool<int>` named `HashSetPool` to manage a pool of hash sets. Updated `DigestionAgent` constructor to initialize `HashSetPool`. Refactored `GetDigestionSiteIndices` to use a hash set from `HashSetPool` for storing indices, ensuring no duplicates. Explicitly added start and end of protein sequence as cleavage sites. Implemented `try-finally` block to return hash set to pool after use. Final list of indices is now sorted before returning.
Copy link

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 95.19520% with 16 lines in your changes missing coverage. Please review.

Project coverage is 77.43%. Comparing base (5443e36) to head (51dfbfe).

Files with missing lines Patch % Lines
mzLib/Proteomics/ProteolyticDigestion/Protease.cs 86.11% 1 Missing and 4 partials ⚠️
...teomics/ProteolyticDigestion/ProteolyticPeptide.cs 94.44% 1 Missing and 3 partials ⚠️
mzLib/Omics/Digestion/DigestionProduct.cs 89.65% 1 Missing and 2 partials ⚠️
...zLib/Transcriptomics/Digestion/NucleolyticOligo.cs 95.89% 0 Missing and 3 partials ⚠️
mzLib/Omics/BioPolymerWithSetModsExtensions.cs 94.44% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #823      +/-   ##
==========================================
+ Coverage   77.37%   77.43%   +0.06%     
==========================================
  Files         220      223       +3     
  Lines       33197    33280      +83     
  Branches     3441     3446       +5     
==========================================
+ Hits        25686    25772      +86     
+ Misses       6939     6936       -3     
  Partials      572      572              
Files with missing lines Coverage Δ
mzLib/MzLibUtil/ObjectPools/DictionaryPool.cs 100.00% <100.00%> (ø)
mzLib/MzLibUtil/ObjectPools/HashSetPool.cs 100.00% <100.00%> (ø)
mzLib/MzLibUtil/ObjectPools/ListPool.cs 100.00% <100.00%> (ø)
mzLib/Omics/Digestion/DigestionAgent.cs 90.32% <100.00%> (+0.84%) ⬆️
...ib/Omics/Modifications/ModificationLocalization.cs 97.95% <100.00%> (+0.45%) ⬆️
mzLib/Omics/BioPolymerWithSetModsExtensions.cs 95.50% <94.44%> (+0.15%) ⬆️
mzLib/Omics/Digestion/DigestionProduct.cs 95.96% <89.65%> (+1.19%) ⬆️
...zLib/Transcriptomics/Digestion/NucleolyticOligo.cs 96.63% <95.89%> (+0.24%) ⬆️
...teomics/ProteolyticDigestion/ProteolyticPeptide.cs 92.43% <94.44%> (+0.54%) ⬆️
mzLib/Proteomics/ProteolyticDigestion/Protease.cs 95.78% <86.11%> (+0.22%) ⬆️

- Simplified initial check for `possibleVariableModifications.Count` and replaced `yield return null` with `yield break`.
- Adjusted indentation and loop structure for clarity.
- Refactored nested loop to remove unnecessary braces and streamline logic.
- Simplified construction of `modificationPattern` dictionary by removing redundant checks and directly using `modIndex`.
@nbollis nbollis changed the title [After Pools] Digestion Optimization and Terminal Bug Fix Digestion Optimization and Terminal Bug Fix Jan 15, 2025
@nbollis nbollis changed the title Digestion Optimization and Terminal Bug Fix Digestion Consolidation and Optimization Jan 16, 2025
@nbollis nbollis removed the bug label Jan 16, 2025
@nbollis nbollis added Maintenance The user isn't impacted by it, it's purely behind the scenes and removed enhancement labels Jan 16, 2025
@nbollis nbollis marked this pull request as ready for review January 16, 2025 00:55
Alexander-Sol
Alexander-Sol previously approved these changes Jan 16, 2025
Copy link
Contributor

@Alexander-Sol Alexander-Sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what I'm looking at, but it sure looks cleaner now than it did before

trishorts
trishorts previously approved these changes Jan 16, 2025
@nbollis nbollis dismissed stale reviews from trishorts and Alexander-Sol via bff563e January 16, 2025 18:38
@nbollis nbollis merged commit 1b8b950 into smith-chem-wisc:master Jan 17, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Maintenance The user isn't impacted by it, it's purely behind the scenes ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants