[WIP] Transcriptomics: Digestion, Fragmentation, Decoy Generation #744

Closed
wants to merge 114 commits into from
114 commits
a6b1639
correct Within calculation
Nov 18, 2021
fa4da8b
update unit tests
Nov 18, 2021
3246567
conflicts resolved back to upstream
Feb 4, 2022
a018d4d
Merge remote-tracking branch 'upstream/master'
Feb 15, 2022
15a37d0
Merge remote-tracking branch 'upstream/master'
Feb 17, 2022
892fa45
this is the spot
Feb 18, 2022
211013c
Merge remote-tracking branch 'upstream/master'
Feb 25, 2022
68104ee
Merge branch 'master' of https://github.com/trishorts/mzLib
trishorts Mar 9, 2022
d715a08
Merge remote-tracking branch 'upstream/master'
Mar 16, 2022
3565522
Merge remote-tracking branch 'upstream/master'
Mar 23, 2022
72e7b53
Merge remote-tracking branch 'upstream/master'
Mar 29, 2022
593872a
Merge remote-tracking branch 'upstream/master'
trishorts Apr 13, 2022
42dd034
Merge branch 'master' of https://github.com/trishorts/mzLib
trishorts Apr 13, 2022
fbeaec0
Merge remote-tracking branch 'upstream/master'
trishorts Jun 1, 2022
614ded7
Merge remote-tracking branch 'upstream/master'
Jun 14, 2022
47307c8
Merge branch 'master' of https://github.com/trishorts/mzLib
Jun 14, 2022
28e05ae
Merge remote-tracking branch 'upstream/master'
Jul 6, 2022
0a7c609
Merge remote-tracking branch 'upstream/master'
Jul 26, 2022
630d8c7
Merge remote-tracking branch 'upstream/master'
trishorts Jul 27, 2022
f6a386b
Merge branch 'master' of https://github.com/trishorts/mzLib
trishorts Jul 27, 2022
d673800
Merge remote-tracking branch 'upstream/master'
Sep 11, 2022
675a0ae
Merge branch 'master' of https://github.com/trishorts/mzLib
Sep 11, 2022
15d4baf
Merge remote-tracking branch 'upstream/master'
Sep 27, 2022
03ca9f7
Merge remote-tracking branch 'upstream/master'
Oct 4, 2022
d0a4c79
Merge remote-tracking branch 'upstream/master'
Jan 30, 2023
894b998
Merge remote-tracking branch 'upstream/master'
Mar 15, 2023
88269a1
Merge remote-tracking branch 'upstream/master'
trishorts Apr 24, 2023
9a9b24a
Merge remote-tracking branch 'upstream/master'
trishorts Jun 29, 2023
b4ad231
add space
trishorts Jun 29, 2023
bc59b38
Merge remote-tracking branch 'upstream/master'
trishorts Oct 10, 2023
f3c83ae
first move
trishorts Nov 6, 2023
d6d934b
psmFromTsv unit tests
trishorts Nov 6, 2023
2db71cd
moved library spectrum
trishorts Nov 6, 2023
562f69d
empty unit test for library spectrum
trishorts Nov 6, 2023
d3dcbe9
m
trishorts Nov 6, 2023
2c4334a
library spectrum unit tests
trishorts Nov 7, 2023
a86d68e
lib spec unit tests
trishorts Nov 7, 2023
c7ce32d
PSMTSV unit tests
trishorts Nov 7, 2023
c610791
add tests for variants and localized glycans
trishorts Nov 7, 2023
5e09c14
capitalization convention
trishorts Nov 7, 2023
9055644
read internal ions test
trishorts Nov 7, 2023
74b80ad
uncomment lines
trishorts Nov 7, 2023
d1bc75c
moved fragmentation and library spectrum to new project Omics
trishorts Nov 8, 2023
cec311a
Revert "moved fragmentation and library spectrum to new project Omics"
trishorts Nov 9, 2023
8d88b32
someInterfaces
trishorts Nov 9, 2023
df0f605
good midpoint
trishorts Nov 9, 2023
cad0d1c
omics classes and interfaces seem to be working
trishorts Nov 9, 2023
8991e14
move LibrarySpectrum class to Omics. Create SpectrumMatchFromTsvHeade…
trishorts Nov 10, 2023
02bf807
not working
trishorts Nov 15, 2023
b7d15d6
Fixed up the PR
nbollis Nov 15, 2023
2502322
Merge pull request #2 from trishorts/tempPsmFromTsv
trishorts Nov 16, 2023
924e99f
fix broken test
trishorts Nov 16, 2023
10f53a2
some unit tests
trishorts Nov 16, 2023
d0a55b2
dhg
trishorts Nov 16, 2023
81f9338
Expanded test coverage on file classes
nbollis Nov 16, 2023
382c0da
new header and xlink psmtsv reader unit tests
trishorts Nov 20, 2023
1c779e6
Merge branch 'master' into PsmFrmTsv
trishorts Nov 24, 2023
b1df755
Merge branch 'master' into PsmFrmTsv
nbollis Nov 27, 2023
5833d0f
space update
trishorts Nov 27, 2023
3f1ee5e
update nuspec for omics and added peptide folder to omics fragmentation
trishorts Nov 27, 2023
015ec82
Merge branch 'master' into PsmFrmTsv
trishorts Nov 27, 2023
540f449
Moved around most everything that will need to be for Transcriptomics …
nbollis Nov 27, 2023
b91f11a
Made all tests pass
nbollis Nov 27, 2023
d1c7035
Moved a few methods out of PeptideWithSetModifications and into IBioP…
nbollis Nov 27, 2023
a915405
Moved methods from ProteolyticPeptide to LysisProduct
nbollis Nov 28, 2023
0a2cd99
Marked RNase.tsv to copy always
nbollis Nov 28, 2023
c9f53db
Started Implementation
nbollis Nov 28, 2023
9571870
Cleaned up the code quite a bit
nbollis Nov 28, 2023
cc7a1a6
Merge branch 'RNA_FirstIncorporation' into RNA_SecondIncorporation
nbollis Nov 28, 2023
15c1054
Added in first tests
nbollis Nov 28, 2023
361a0d2
Added in next set of tests including: Fragmentation and fragmentation
nbollis Nov 28, 2023
179ac3c
Updated product class equality members
nbollis Nov 28, 2023
66f4fd7
Merge branch 'RNA_FirstIncorporation' into RNA_SecondIncorporation
nbollis Nov 28, 2023
77f3192
Merge branch 'RNA_FirstIncorporation' into RNA_SecondIncorporation
nbollis Nov 28, 2023
e2c10f6
Merge branch 'RNA_SecondIncorporation' of https://github.com/nbollis/…
nbollis Nov 28, 2023
aec0c19
Merge branch 'RNA_SecondIncorporation' of https://github.com/nbollis/…
nbollis Nov 28, 2023
e5001ce
Merge branch 'RNA_SecondIncorporation' of https://github.com/nbollis/…
nbollis Nov 28, 2023
d81132a
Updated product class equality members
nbollis Nov 28, 2023
022d63e
Merge branch 'RNA_FirstIncorporation' of https://github.com/nbollis/m…
nbollis Nov 28, 2023
58c41cf
Merge branch 'RNA_FirstIncorporation' into RNA_SecondIncorporation
nbollis Nov 28, 2023
1a7356a
This one method keeps fighting me
nbollis Nov 28, 2023
6f751c6
Added decoy Generation
nbollis Nov 28, 2023
f03b4d9
Updated nuspec to carry rnase.tsv and transcriptomics project
nbollis Nov 29, 2023
706ab75
Removed AnyCPU
nbollis Nov 29, 2023
9466cc0
Merge branch 'RNA_FirstIncorporation' into RNA_SecondIncorporation
nbollis Nov 29, 2023
10dee8c
Removed hard coded path in TestDigestion
nbollis Nov 29, 2023
fe50ef2
Removed all unnecessary using directives
nbollis Nov 29, 2023
0ab522d
Removed unused lines from NucleolyticOligo
nbollis Nov 30, 2023
6b1f48d
Added tests to ChemicalFormula operators
nbollis Nov 30, 2023
5d60413
Merge branch 'RNA_FirstIncorporation' into RNA_SecondIncorporation
nbollis Nov 30, 2023
c309a10
Merged in Master
nbollis Nov 30, 2023
a1977f5
Merge
nbollis Nov 30, 2023
107dd43
Increased test coverage
nbollis Nov 30, 2023
386c784
Adjusted Nuspec
nbollis Nov 30, 2023
539c321
Changed pseudouridine to Y to better match the Psi symbol that common…
nbollis Nov 30, 2023
90846f9
Worked on diagram
nbollis Dec 4, 2023
0cef17c
Merged in master
nbollis Jan 8, 2024
35da90f
Updated from master
nbollis Jan 8, 2024
67a0d87
Made changes for final incorporation
nbollis Jan 18, 2024
99a1b75
Made Required Changes to interfaces
nbollis Jan 18, 2024
9fbf33c
Added clone method to digestion params
nbollis Jan 19, 2024
fb28fbe
Added nucleotide and Rnase
nbollis Jan 19, 2024
7dfe9ba
Excluded test classes from code coverage
nbollis Jan 19, 2024
80df342
Removed unused using directives
nbollis Jan 19, 2024
f106f85
Expanded test coverage
nbollis Jan 19, 2024
ab2001d
Merged in changes from Mzlib PR
nbollis Jan 22, 2024
e4a5866
Fixed unit tests by adjusting minLength in digestion params
nbollis Jan 22, 2024
894476f
Added novel terminus parameter to the clone method
nbollis Jan 22, 2024
89bc795
Merged in Rna_baseClasses
nbollis Jan 23, 2024
d25f8ee
Merged in Rna_baseClasses
nbollis Jan 23, 2024
68b4ae2
merge
nbollis Jan 23, 2024
28bdc79
digestionparams change
nbollis Jan 29, 2024
b0a7207
Added transcription method
nbollis Feb 6, 2024
942ab61
changed variable name
nbollis Feb 6, 2024
42 changes: 41 additions & 1 deletion mzLib/MzLibUtil/ClassExtensions.cs
@@ -19,6 +19,7 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace MzLibUtil
{
@@ -100,6 +101,45 @@ public static bool AllSame<T>(this IEnumerable<T> list)

return true;
}


/// <summary>
/// Transcribes a DNA sequence into an RNA sequence
/// </summary>
/// <param name="dna">The input DNA sequence</param>
/// <param name="isCodingStrand">True if the input sequence is the coding strand, False if the input sequence is the template strand</param>
/// <returns>The transcribed RNA sequence</returns>
public static string Transcribe(this string dna, bool isCodingStrand = true)
{
var sb = new StringBuilder();
foreach (var residue in dna)
{
if (isCodingStrand)
{
sb.Append(residue == 'T' ? 'U' : residue);
}
else
{
switch (residue)
{
case 'A':
sb.Append('U');
break;
case 'T':
sb.Append('A');
break;
case 'C':
sb.Append('G');
break;
case 'G':
sb.Append('C');
break;
default:
sb.Append(residue);
break;
}
}
}
return sb.ToString();
}
}
}
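A quick way to check both branches of `Transcribe` is a standalone copy of the same mapping. The sketch below restates it as a local function (`TranscribeSketch` is a hypothetical name, not part of mzLib) so it runs without the library. Like the original, the template branch complements each base without reversing the string, which appears to assume the template strand is supplied 3'→5', aligned base for base with the coding strand.

```csharp
using System;
using System.Text;

// Hypothetical standalone restatement of the Transcribe logic above (not the mzLib method itself).
static string TranscribeSketch(string dna, bool isCodingStrand = true)
{
    var sb = new StringBuilder(dna.Length);
    foreach (char residue in dna)
    {
        char rna = isCodingStrand
            ? (residue == 'T' ? 'U' : residue) // coding strand: same sequence with T replaced by U
            : (residue switch                  // template strand: the RNA base that pairs with each template base
            {
                'A' => 'U',
                'T' => 'A',
                'C' => 'G',
                'G' => 'C',
                _ => residue                   // anything else (N, lowercase, gaps) passes through unchanged
            });
        sb.Append(rna);
    }
    return sb.ToString();
}

Console.WriteLine(TranscribeSketch("ATGC"));        // AUGC
Console.WriteLine(TranscribeSketch("TACG", false)); // AUGC
```

If a template strand is stored 5'→3', as strand sequences are conventionally written, the complemented string would also need to be reversed before it reads 5'→3'.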
2 changes: 2 additions & 0 deletions mzLib/Omics/Digestion/DigestionAgent.cs
@@ -103,5 +103,7 @@ public List<int> GetDigestionSiteIndices(string sequence)
indices.Add(sequence.Length); // The end of the protein is treated as a cleavage site to retain the c-terminal peptide
return indices.Distinct().OrderBy(i => i).ToList();
}


}
}
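The context lines of `GetDigestionSiteIndices` show its key convention: the end of the sequence is added as a cleavage site so the terminal product is retained. The sketch below illustrates that convention and how products with missed cleavages fall out of the sorted site list. It assumes a hypothetical "cleave after these residues" rule (here G, loosely modeled on RNase T1) instead of the PR's RNase definitions from RNase.tsv, also adds index 0 so the first product is bounded, and computes one-based start/end and flanking residues with the same arithmetic as `LysisProduct.PreviousResidue` and `NextResidue`. `SiteIndices` and `Digest` are illustrative names, not mzLib APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified rule: cleave after any residue listed in cutAfter.
static List<int> SiteIndices(string sequence, char[] cutAfter)
{
    var indices = new List<int> { 0 }; // the start of the sequence bounds the first product
    for (int i = 0; i < sequence.Length - 1; i++)
        if (Array.IndexOf(cutAfter, sequence[i]) >= 0)
            indices.Add(i + 1); // cleavage falls between residue i and residue i + 1
    indices.Add(sequence.Length); // the end counts as a site so the final (3'/C-terminal) product is kept
    return indices.Distinct().OrderBy(x => x).ToList();
}

// Enumerate products with up to maxMissed missed cleavages in one-based coordinates.
static List<(int Start, int End, string Sequence, char Previous, char Next)> Digest(
    string sequence, char[] cutAfter, int maxMissed)
{
    var sites = SiteIndices(sequence, cutAfter);
    var products = new List<(int Start, int End, string Sequence, char Previous, char Next)>();
    for (int missed = 0; missed <= maxMissed; missed++)
        for (int i = 0; i + missed + 1 < sites.Count; i++)
        {
            int start = sites[i] + 1;        // one-based first residue
            int end = sites[i + missed + 1]; // one-based last residue
            char previous = start > 1 ? sequence[start - 2] : '-';
            char next = end < sequence.Length ? sequence[end] : '-';
            products.Add((start, end, sequence.Substring(start - 1, end - start + 1), previous, next));
        }
    return products;
}

var rnaseT1LikeRule = new[] { 'G' }; // illustration only: RNase T1 cleaves 3' of G
foreach (var product in Digest("AUGCGAU", rnaseT1LikeRule, 1))
    Console.WriteLine($"{product.Previous}.{product.Sequence}.{product.Next} ({product.Start}-{product.End})");
```

Note that the last product, `AU`, does not end in G; it exists only because the sequence end is treated as a cleavage site.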
12 changes: 10 additions & 2 deletions mzLib/Omics/Digestion/IDigestionParams.cs
@@ -2,7 +2,7 @@

namespace Omics.Digestion
{
public interface IDigestionParams
public interface IDigestionParams
{
int MaxMissedCleavages { get; set; }
int MinLength { get; set; }
@@ -11,5 +11,13 @@ public interface IDigestionParams
int MaxMods { get; set; }
DigestionAgent DigestionAgent { get; }
FragmentationTerminus FragmentationTerminus { get; }
CleavageSpecificity SearchModeType { get; }

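/// <summary>
/// Creates a copy of these digestion parameters. The optional new terminus is used for nonspecific and semispecific searches.
/// </summary>
/// <param name="newTerminus">Optional FragmentationTerminus for the copy</param>
/// <returns>A copy of these digestion parameters</returns>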
IDigestionParams Clone(FragmentationTerminus? newTerminus = null);
}
}
}
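`Clone` is only declared in this interface. Below is a minimal sketch of the copy-with-override behavior the doc comment suggests, assuming a `null` `newTerminus` keeps the source terminus. A named value tuple stands in for a concrete `IDigestionParams` implementation and the terminus is a plain string; both are simplifications for illustration, not mzLib types.

```csharp
using System;

// Hypothetical stand-in for an IDigestionParams implementation: a named value tuple whose
// terminus is a string ("Both", "N", "C", "FivePrime", "ThreePrime") to keep the sketch self-contained.
static (int MaxMissedCleavages, int MinLength, string Terminus) Clone(
    (int MaxMissedCleavages, int MinLength, string Terminus) source, string newTerminus = null)
{
    // copy the settings; replace the terminus only when a new one is supplied (assumed semantics)
    return (source.MaxMissedCleavages, source.MinLength, newTerminus ?? source.Terminus);
}

var fullySpecific = (MaxMissedCleavages: 2, MinLength: 5, Terminus: "Both");
var fivePrimePass = Clone(fullySpecific, "FivePrime");
var threePrimePass = Clone(fullySpecific, "ThreePrime");
Console.WriteLine($"{fullySpecific.Terminus} -> {fivePrimePass.Terminus}, {threePrimePass.Terminus}");
```

A semispecific search could then derive one copy per terminus from a single fully specific parameter set; whether the PR's search code uses `Clone` exactly this way is not visible in this diff.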
187 changes: 187 additions & 0 deletions mzLib/Omics/Digestion/LysisProduct.cs
@@ -0,0 +1,187 @@
using Omics.Modifications;

namespace Omics.Digestion
{
public class LysisProduct
{
protected string _baseSequence;

public LysisProduct(IBioPolymer parent, int oneBasedStartResidue, int oneBasedEndResidue, int missedCleavages,
CleavageSpecificity cleavageSpecificityForFdrCategory, string? description = null, string? baseSequence = null)
{
Parent = parent;
OneBasedStartResidue = oneBasedStartResidue;
OneBasedEndResidue = oneBasedEndResidue;
MissedCleavages = missedCleavages;
CleavageSpecificityForFdrCategory = cleavageSpecificityForFdrCategory;
Description = description;
_baseSequence = baseSequence;
}

[field: NonSerialized] public IBioPolymer Parent { get; protected set; } // BioPolymer that this lysis product is a digestion product of
public string Description { get; protected set; } //unstructured explanation of source
public int OneBasedStartResidue { get; }// the residue number at which the peptide begins (the first residue in a protein is 1)
public int OneBasedEndResidue { get; }// the residue number at which the peptide ends
public int MissedCleavages { get; } // the number of missed cleavages this peptide has with respect to the digesting protease
public virtual char PreviousResidue => OneBasedStartResidue > 1 ? Parent[OneBasedStartResidue - 2] : '-';

public virtual char NextResidue => OneBasedEndResidue < Parent.Length ? Parent[OneBasedEndResidue] : '-';
public string BaseSequence =>
_baseSequence ??= Parent.BaseSequence.Substring(OneBasedStartResidue - 1,
OneBasedEndResidue - OneBasedStartResidue + 1);
public CleavageSpecificity CleavageSpecificityForFdrCategory { get; set; } //structured explanation of source
public int Length => BaseSequence.Length; //how many residues long the peptide is
public char this[int zeroBasedIndex] => BaseSequence[zeroBasedIndex];

protected static IEnumerable<Dictionary<int, Modification>> GetVariableModificationPatterns(Dictionary<int, List<Modification>> possibleVariableModifications, int maxModsForPeptide, int peptideLength)
{
if (possibleVariableModifications.Count == 0)
{
yield return null;
}
else
{
var possible_variable_modifications = new Dictionary<int, List<Modification>>(possibleVariableModifications);

int[] base_variable_modification_pattern = new int[peptideLength + 4];
// each position holds at most one variable modification, so cap the count by the positions that have candidates
var positionsWithMods = possible_variable_modifications.Count(b => b.Value != null && b.Value.Count > 0);
for (int variable_modifications = 0; variable_modifications <= Math.Min(positionsWithMods, maxModsForPeptide); variable_modifications++)
{
foreach (int[] variable_modification_pattern in GetVariableModificationPatterns(new List<KeyValuePair<int, List<Modification>>>(possible_variable_modifications),
possible_variable_modifications.Count - variable_modifications, base_variable_modification_pattern, 0))
{
yield return GetNewVariableModificationPattern(variable_modification_pattern, possible_variable_modifications);
}
}
}
}

protected Dictionary<int, Modification> GetFixedModsOneIsNterminusOrFivePrime(int peptideLength,
IEnumerable<Modification> allKnownFixedModifications)
{
var fixedModsOneIsNterminus = new Dictionary<int, Modification>(peptideLength + 3);
foreach (Modification mod in allKnownFixedModifications)
{
switch (mod.LocationRestriction)
{
case "5'-terminal.":
case "Oligo 5'-terminal.":
case "N-terminal.":
case "Peptide N-terminal.":
//the modification is protease associated and is applied to the N-terminal cleaved residue, not at the beginning of the protein
if (mod.ModificationType == "Protease" && ModificationLocalization.ModFits(mod, Parent.BaseSequence, 1, peptideLength, OneBasedStartResidue))
{
if (OneBasedStartResidue != 1)
{
fixedModsOneIsNterminus[2] = mod;
}
}
//Normal N-terminal peptide modification
else if (ModificationLocalization.ModFits(mod, Parent.BaseSequence, 1, peptideLength, OneBasedStartResidue))
{
fixedModsOneIsNterminus[1] = mod;
}
break;

case "Anywhere.":
for (int i = 2; i <= peptideLength + 1; i++)
{
if (ModificationLocalization.ModFits(mod, Parent.BaseSequence, i - 1, peptideLength, OneBasedStartResidue + i - 2))
{
fixedModsOneIsNterminus[i] = mod;
}
}
break;

case "3'-terminal.":
case "Oligo 3'-terminal.":
case "C-terminal.":
case "Peptide C-terminal.":
//the modification is protease associated and is applied to the c-terminal cleaved residue, not if it is at the end of the protein
if (mod.ModificationType == "Protease" && ModificationLocalization.ModFits(mod, Parent.BaseSequence, peptideLength, peptideLength, OneBasedStartResidue + peptideLength - 1))
{
if (OneBasedEndResidue != Parent.Length)
{
fixedModsOneIsNterminus[peptideLength + 1] = mod;
}

}
//Normal C-terminal peptide modification
else if (ModificationLocalization.ModFits(mod, Parent.BaseSequence, peptideLength, peptideLength, OneBasedStartResidue + peptideLength - 1))
{
fixedModsOneIsNterminus[peptideLength + 2] = mod;
}
break;

default:
throw new NotSupportedException("This terminus localization is not supported.");
}
}
return fixedModsOneIsNterminus;
}


private static IEnumerable<int[]> GetVariableModificationPatterns(List<KeyValuePair<int, List<Modification>>> possibleVariableModifications,
int unmodifiedResiduesDesired, int[] variableModificationPattern, int index)
{
if (index < possibleVariableModifications.Count - 1)
{
if (unmodifiedResiduesDesired > 0)
{
variableModificationPattern[possibleVariableModifications[index].Key] = 0;
foreach (int[] new_variable_modification_pattern in GetVariableModificationPatterns(possibleVariableModifications,
unmodifiedResiduesDesired - 1, variableModificationPattern, index + 1))
{
yield return new_variable_modification_pattern;
}
}
if (unmodifiedResiduesDesired < possibleVariableModifications.Count - index)
{
for (int i = 1; i <= possibleVariableModifications[index].Value.Count; i++)
{
variableModificationPattern[possibleVariableModifications[index].Key] = i;
foreach (int[] new_variable_modification_pattern in GetVariableModificationPatterns(possibleVariableModifications,
unmodifiedResiduesDesired, variableModificationPattern, index + 1))
{
yield return new_variable_modification_pattern;
}
}
}
}
else
{
if (unmodifiedResiduesDesired > 0)
{
variableModificationPattern[possibleVariableModifications[index].Key] = 0;
yield return variableModificationPattern;
}
else
{
for (int i = 1; i <= possibleVariableModifications[index].Value.Count; i++)
{
variableModificationPattern[possibleVariableModifications[index].Key] = i;
yield return variableModificationPattern;
}
}
}
}

private static Dictionary<int, Modification> GetNewVariableModificationPattern(int[] variableModificationArray,
IEnumerable<KeyValuePair<int, List<Modification>>> possibleVariableModifications)
{
var modification_pattern = new Dictionary<int, Modification>();

foreach (KeyValuePair<int, List<Modification>> kvp in possibleVariableModifications)
{
if (variableModificationArray[kvp.Key] > 0)
{
modification_pattern.Add(kvp.Key, kvp.Value[variableModificationArray[kvp.Key] - 1]);
}
}

return modification_pattern;
}


}
}
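`GetVariableModificationPatterns` enumerates every way to place variable modifications on the candidate positions, at most one per position and at most `maxModsForPeptide` in total, by looping over the number of modified positions and recursing with a count of positions to leave unmodified. The sketch below produces the same kind of pattern set with a simpler "modification budget" recursion. This is a swapped-in formulation, not the PR's code, and it yields patterns in a different order. Strings stand in for `Modification` objects and `Patterns` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Enumerate every placement of at most maxMods variable modifications, at most one per position.
// Strings stand in for Modification objects; keys are one-based positions as in LysisProduct.
static IEnumerable<Dictionary<int, string>> Patterns(
    IReadOnlyList<KeyValuePair<int, List<string>>> candidates, int maxMods, int index = 0)
{
    if (index == candidates.Count)
    {
        yield return new Dictionary<int, string>(); // no positions left: start a fresh, empty pattern
        yield break;
    }
    int position = candidates[index].Key;
    // Branch 1: leave this position unmodified
    foreach (var rest in Patterns(candidates, maxMods, index + 1))
        yield return rest;
    if (maxMods == 0)
        yield break; // budget spent: no further modifications allowed
    // Branch 2: place exactly one candidate modification here and spend one unit of the budget
    foreach (var mod in candidates[index].Value)
        foreach (var rest in Patterns(candidates, maxMods - 1, index + 1))
        {
            rest[position] = mod; // each recursive pattern is a fresh dictionary, so mutating it is safe
            yield return rest;
        }
}

var sites = new List<KeyValuePair<int, List<string>>>
{
    new KeyValuePair<int, List<string>>(2, new List<string> { "Oxidation" }),
    new KeyValuePair<int, List<string>>(5, new List<string> { "Methyl", "Acetyl" }),
};
foreach (var pattern in Patterns(sites, 2))
    Console.WriteLine("{" + string.Join(", ", pattern.OrderBy(kv => kv.Key).Select(kv => $"{kv.Key}:{kv.Value}")) + "}");
```

Because each position carries at most one modification, a budget larger than the number of candidate positions adds no new patterns: in this sketch `Patterns(sites, 5)` returns the same six patterns as `Patterns(sites, 2)`.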
19 changes: 6 additions & 13 deletions mzLib/Omics/Fragmentation/FragmentationTerminus.cs
@@ -1,19 +1,12 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Omics.Fragmentation
namespace Omics.Fragmentation
{
public enum FragmentationTerminus
{
Both, //N- and C-terminus
N, //N-terminus only
C, //C-terminus only
{
Both, //N- and C-terminus
N, //N-terminus only
C, //C-terminus only
None, //used for internal fragments, could be used for top down intact mass?
FivePrime, // 5' for NucleicAcids
ThreePrime, // 3' for NucleicAcids
}

}
}
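The new `FivePrime` and `ThreePrime` members play the role of `N` and `C` for nucleic acids: 5'-containing fragment ions (a, b, c, d in McLuckey nomenclature) versus 3'-containing ions (w, x, y, z). The sketch below shows which terminal sequence ladders each value implies. It passes enum member names as strings to stay self-contained, `TerminalFragments` is a hypothetical helper, and treating `Both` as covering both ends of a nucleic acid is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the terminal fragment ladders implied by a FragmentationTerminus value,
// passed by member name so the sketch needs no mzLib types.
static List<string> TerminalFragments(string sequence, string terminus)
{
    bool fromStart = terminus is "Both" or "N" or "FivePrime"; // fragments that contain the first residue
    bool fromEnd = terminus is "Both" or "C" or "ThreePrime";  // fragments that contain the last residue
    var fragments = new List<string>();
    for (int length = 1; length < sequence.Length; length++)
    {
        if (fromStart) fragments.Add(sequence[..length]);
        if (fromEnd) fragments.Add(sequence[^length..]);
    }
    return fragments; // "None" yields no terminal fragments (internal fragments are handled separately)
}

Console.WriteLine(string.Join(" ", TerminalFragments("ACGU", "FivePrime")));  // A AC ACG
Console.WriteLine(string.Join(" ", TerminalFragments("ACGU", "ThreePrime"))); // U GU CGU
```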
@@ -1,6 +1,5 @@
using Chemistry;
using MassSpectrometry;
using Omics.Fragmentation;

namespace Omics.Fragmentation.Peptide
{
@@ -1,4 +1,4 @@
namespace Omics.Fragmentation.Peptide
namespace Omics.Fragmentation
{
public class TerminusSpecificProductTypes
{
13 changes: 6 additions & 7 deletions mzLib/Omics/IBioPolymer.cs
@@ -1,24 +1,23 @@
using Chemistry;
using MassSpectrometry;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Omics.Digestion;
using Omics.Digestion;
using Omics.Modifications;

namespace Omics
{
public interface IBioPolymer
{
string Name { get; }
string FullName { get; }
string BaseSequence { get; }
int Length { get; }
string DatabaseFilePath { get; }
bool IsDecoy { get; }
bool IsContaminant { get; }
string Organism { get; }
string Accession { get; }
/// <summary>
/// The list of gene names consists of tuples, where Item1 is the type of gene name, and Item2 is the name. There may be many genes and names of a certain type produced when reading an XML protein database.
/// </summary>
IEnumerable<Tuple<string, string>> GeneNames { get; }
IDictionary<int, List<Modification>> OneBasedPossibleLocalizedModifications { get; }
char this[int zeroBasedIndex] => BaseSequence[zeroBasedIndex];

3 changes: 3 additions & 0 deletions mzLib/Omics/IBioPolymerWithSetMods.cs
@@ -18,6 +18,7 @@ public interface IBioPolymerWithSetMods : IHasChemicalFormula
{
string BaseSequence { get; }
string FullSequence { get; }
string Description { get; }
double MostAbundantMonoisotopicMass { get; }
string SequenceWithChemicalFormulas { get; }
int OneBasedStartResidue { get; }
@@ -41,6 +42,8 @@ public void Fragment(DissociationType dissociationType, FragmentationTerminus fr
public void FragmentInternally(DissociationType dissociationType, int minLengthOfFragments,
List<Product> products);

public IBioPolymerWithSetMods Localize(int j, double massToLocalize);

public static string GetBaseSequenceFromFullSequence(string fullSequence)
{
StringBuilder sb = new StringBuilder();
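The hunk above shows only the opening of `GetBaseSequenceFromFullSequence`. As a hypothetical round trip (not the mzLib implementation), `Render` writes a full sequence from a one-based modification dictionary using the key convention of `LysisProduct`'s fixed-modification dictionary (1 for the N-/5'-terminus, 2 through L+1 for residues, L+2 for the C-/3'-terminus), and `BaseFromFull` strips it back to the base sequence. The bracket notation, with a `-` before a C-/3'-terminal modification, is an assumption about the full-sequence format, and the modification names are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Render a full sequence from one-based mod keys:
// 1 = N-/5'-terminus, 2..L+1 = residues 1..L, L+2 = C-/3'-terminus.
static string Render(string baseSequence, IReadOnlyDictionary<int, string> mods)
{
    var sb = new StringBuilder();
    if (mods.TryGetValue(1, out var startMod)) sb.Append('[').Append(startMod).Append(']');
    for (int i = 0; i < baseSequence.Length; i++)
    {
        sb.Append(baseSequence[i]);
        if (mods.TryGetValue(i + 2, out var residueMod)) sb.Append('[').Append(residueMod).Append(']');
    }
    if (mods.TryGetValue(baseSequence.Length + 2, out var endMod)) sb.Append("-[").Append(endMod).Append(']');
    return sb.ToString();
}

// Strip bracketed annotations (nested brackets allowed) and the '-' that introduces a terminal mod.
static string BaseFromFull(string fullSequence)
{
    var sb = new StringBuilder();
    int depth = 0;
    for (int i = 0; i < fullSequence.Length; i++)
    {
        char c = fullSequence[i];
        if (c == '[') depth++;
        else if (c == ']') depth--;
        else if (depth == 0 && !(c == '-' && i + 1 < fullSequence.Length && fullSequence[i + 1] == '['))
            sb.Append(c);
    }
    return sb.ToString();
}

var fixedMods = new Dictionary<int, string> { [1] = "Phospho", [3] = "Methyl", [6] = "CyclicPhosphate" };
string full = Render("ACGU", fixedMods);
Console.WriteLine(full);               // [Phospho]AC[Methyl]GU-[CyclicPhosphate]
Console.WriteLine(BaseFromFull(full)); // ACGU
```

Tracking bracket depth rather than stopping at the first `]` keeps the base sequence correct when a modification name itself contains brackets.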
1 change: 0 additions & 1 deletion mzLib/Omics/SpectrumMatch/SpectrumMatchFromTsv.cs
@@ -3,7 +3,6 @@
using System.Globalization;
using System.Text.RegularExpressions;
using Chemistry;
using Omics.Fragmentation.Peptide;

namespace Omics.SpectrumMatch
{