feat: add filterSpectra function (issue #41) #42

Merged
merged 4 commits into from
Jan 25, 2024
2 changes: 1 addition & 1 deletion .github/workflows/check-bioc.yml
@@ -193,7 +193,7 @@ jobs:
BiocManager::install("msdata")
BiocManager::install("lgatto/rpx")
## Manually install dev version(s)
BiocManager::install("RforMassSpectrometry/MsBackendSql")
BiocManager::install("RforMassSpectrometry/ProtGenerics")

# BiocManager::install(c("devtools", "usethis", "vdiffr"), dependencies = TRUE, ask = FALSE, update = FALSE)
## For running the checks
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: MsExperiment
Title: Infrastructure for Mass Spectrometry Experiments
Version: 1.5.2
Version: 1.5.3
Description: Infrastructure to store and manage all aspects related to
a complete proteomics or metabolomics mass spectrometry (MS)
experiment. The MsExperiment package provides light-weight and
@@ -26,7 +26,7 @@ Authors@R: c(person(given = "Laurent", family = "Gatto",
role = "aut"))
Depends:
R (>= 4.2),
ProtGenerics (>= 1.9.1),
ProtGenerics (>= 1.35.2),
Imports:
methods,
S4Vectors,
@@ -53,7 +53,7 @@ VignetteBuilder: knitr
BugReports: https://github.com/RforMassSpectrometry/MsExperiment/issues
URL: https://github.com/RforMassSpectrometry/MsExperiment
biocViews: Infrastructure, Proteomics, MassSpectrometry, Metabolomics, ExperimentalDesign, DataImport
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
Roxygen: list(markdown=TRUE)
Encoding: UTF-8
Collate:
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -52,9 +52,13 @@ importFrom(methods,slotNames)
importFrom(methods,validObject)
importMethodsFrom(BiocGenerics,dbconn)
importMethodsFrom(ProtGenerics,"spectra<-")
importMethodsFrom(ProtGenerics,filterSpectra)
importMethodsFrom(S4Vectors,"mcols<-")
importMethodsFrom(S4Vectors,"metadata<-")
importMethodsFrom(S4Vectors,mcols)
importMethodsFrom(S4Vectors,metadata)
importMethodsFrom(Spectra,Spectra)
importMethodsFrom(Spectra,peaksVariables)
importMethodsFrom(Spectra,selectSpectraVariables)
importMethodsFrom(Spectra,spectraVariables)
importMethodsFrom(methods,show)
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,11 @@
# MsExperiment 1.5

## MsExperiment 1.5.3

- Add `filterSpectra` method to allow filtering of `Spectra` within an
`MsExperiment` while keeping possibly present relationships between samples
and spectra consistent.

## MsExperiment 1.5.2

- Add support to read/write sample data from/to a *MsBackendSql* database
26 changes: 25 additions & 1 deletion R/MsExperiment-functions.R
@@ -28,7 +28,7 @@
}

#' The validity of the link matrix is evaluated only when adding the link. Also,
#' eventually existing links between the same entities will be **overwritten**.
#' possibly existing links between the same entities will be **overwritten**.
#'
#' @param x `LinkedMsExperiment`.
#'
@@ -404,3 +404,27 @@ readMsExperiment <- function(spectraFiles = character(),
spectra(x) <- Spectra(spectraFiles, ...)
linkSampleData(x, with = "sampleData.spectraOrigin = spectra.dataOrigin")
}

#' @title Consolidate links between samples and spectra after filtering
#'
#' @description
#'
#' If @spectra got filtered possibly present *links* between them and samples
#' will no longer be valid and need to be updated/fixed. This function
#' consolidates these links using a spectra variable `"._SPECTRA_IDX"` in
#' `@spectra` that needs to represent/contain the index of the spectra
Collaborator

So this ._SPECTRA_IDX variable already existed before this update ?

Member Author

Ah, no, but very good question! I specifically add this variable before I call the filter function on the Spectra object. That's the only way (at least that I found) for me to know which spectra remain after filtering. So, the workflow is:

  • add the spectra variable ._SPECTRA_IDX, containing the index 1:length(spectra), to the Spectra (I chose this name to avoid overwriting any existing spectra variable; it is unlikely, I assume, that somebody would use this name for a spectra variable)
  • filter the Spectra object
  • after filtering, update/consolidate the link (mapping) between samples and Spectra. It is stored in @sampleDataLinks[["spectra"]] as a two-column matrix: the first column holds the index of the sample, the second the index of the spectrum assigned to this sample. I can use the ._SPECTRA_IDX variable for that, because it gets subset along with the Spectra object. So I only need to keep the rows of the mapping matrix whose spectrum index (second column) is present in ._SPECTRA_IDX.
  • after that, remove the ._SPECTRA_IDX spectra variable again, because it is no longer needed.

Let me know if something is still unclear.
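
The four steps described in this comment can be sketched in base R without any Bioconductor dependency. This is a minimal illustration under stated assumptions, not the package code: a plain `data.frame` stands in for the `Spectra` object, a two-column integer matrix for `@sampleDataLinks[["spectra"]]`, and the names `toy_spectra`, `toy_links`, `keep_rt` and `filter_and_relink` are hypothetical.

```r
## Toy stand-ins (hypothetical names, not part of MsExperiment):
## six "spectra" with a retention time, three per sample.
toy_spectra <- data.frame(rtime = c(100, 205, 300, 110, 207, 209))
toy_links <- cbind(sample = c(1L, 1L, 1L, 2L, 2L, 2L),
                   spectrum = 1:6)

filter_and_relink <- function(spectra, links, filter, ...) {
    ## 1. Remember the original position of every spectrum.
    spectra$._SPECTRA_IDX <- seq_len(nrow(spectra))
    ## 2. Apply the user-supplied filter, forwarding its parameters.
    spectra <- filter(spectra, ...)
    ## 3. Keep only links to surviving spectra and renumber them so that
    ##    they point to positions in the filtered data.
    idx <- match(links[, 2L], spectra$._SPECTRA_IDX)
    keep <- !is.na(idx)
    links <- links[keep, , drop = FALSE]
    links[, 2L] <- idx[keep]
    ## 4. Drop the helper variable again.
    spectra$._SPECTRA_IDX <- NULL
    list(spectra = spectra, links = links)
}

## A toy filter keeping rows within a retention time range.
keep_rt <- function(x, rt)
    x[x$rtime >= rt[1L] & x$rtime <= rt[2L], , drop = FALSE]

res <- filter_and_relink(toy_spectra, toy_links, keep_rt, rt = c(200, 210))
res$links
```

In this toy case one spectrum of sample 1 and two spectra of sample 2 survive, and the second column of `res$links` is renumbered to 1, 2, 3 so that it indexes the filtered data rather than the original.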

#' **before** filtering.
#'
#' @param x `MsExperiment`
#'
#' @author Johannes Rainer
#' @noRd
.update_sample_data_links_spectra <- function(x) {
    sdl <- .sample_data_links(x, "spectra")[[1L]]
    idx <- match(sdl[, 2L], x@spectra$._SPECTRA_IDX)
    keep <- !is.na(idx)
    sdl <- sdl[keep, , drop = FALSE]
    sdl[, 2L] <- idx[keep]
    x@sampleDataLinks[["spectra"]] <- sdl
    x
}
64 changes: 62 additions & 2 deletions R/MsExperiment.R
@@ -161,6 +161,16 @@
#' arbitrary order is supported.
#' See the vignette for details and examples.
#'
#' - `filterSpectra`: subsets the `Spectra` within an `MsExperiment` using a
#' provided filter function (parameter `filter`). Parameters for the filter
#' function can be passed with parameter `...`. Any of the filter functions
#' of a [Spectra()] object can be passed with parameter `filter`. Possibly
#' present relationships between samples and spectra (*links*, see also
#' `linkSampleData`) are updated. Filtering affects only the spectra data
#' of the object, none of the other slots and data (e.g. `sampleData`) are
#' modified.
#' The function returns an `MsExperiment` with the filtered `Spectra` object.
#'
#' @return See help of the individual functions.
#'
#' @param spectra [Spectra()] object with the MS spectra data of the
@@ -171,6 +181,11 @@
#' @param experimentFiles [MsExperimentFiles()] defining (external) files
#' to data or annotation.
#'
#' @param filter for `filterSpectra`: any filter function supported by
#' [Spectra()] to filter the spectra object (such as `filterRt` or
#' `filterMsLevel`). Parameters for the filter function can be passed
#' through `...`.
#'
#' @param i for `[`: an `integer`, `character` or `logical` referring to the
#' indices or names (rowname of `sampleData`) of the samples to subset.
#'
@@ -205,7 +220,8 @@
#'
#' @param x an `MsExperiment`.
#'
#' @param ... optional additional parameters.
#' @param ... optional additional parameters. For `filterSpectra`: parameters
#' to be passed to the filter function (parameter `filter`).
#'
#' @name MsExperiment
#'
@@ -284,12 +300,26 @@
#' experimentFiles(mse[2])[["annotations"]]
#'
#' ## Subsetting will always keep the relationship between samples and linked
#' ## data elements. Subsetting will however eventually duplicate data elements
#' ## data elements. Subsetting will however possibly duplicate data elements
#' ## that are shared among samples. Thus, while in the original object the
#' ## element "annotations" has a single entry, subsetting with [1:2] will
#' ## result in an MsExperiment with duplicated entries in "annotations"
#' experimentFiles(mse)[["annotations"]]
#' experimentFiles(mse[1:2])[["annotations"]]
#'
#' ## Spectra within an MsExperiment can be filtered/subset with the
#' ## `filterSpectra` function and any of the filter functions supported
#' ## by `Spectra` objects. Below we restrict the spectra data to spectra
#' ## with a retention time between 200 and 210 seconds.
#' res <- filterSpectra(mse, filterRt, rt = c(200, 210))
#' res
#'
#' ## The object now contains far fewer spectra. The retention times of the
#' ## remaining spectra are:
#' rtime(spectra(res))
#'
#' ## Relationship between samples and spectra was preserved by the filtering
#' a <- res[1L]
#' spectra(a)
NULL

#' @name MsExperiment-class
@@ -546,3 +576,33 @@ setMethod("[", "MsExperiment", function(x, i, j, ..., drop = FALSE) {
}
.extractSamples(x, i, newx = x)
})

#' @rdname MsExperiment
#'
#' @importMethodsFrom Spectra selectSpectraVariables
#'
#' @importMethodsFrom Spectra spectraVariables
#'
#' @importMethodsFrom Spectra peaksVariables
#'
#' @importMethodsFrom ProtGenerics filterSpectra
setMethod(
    "filterSpectra", c("MsExperiment", "function"),
    function(object, filter, ...) {
        ls <- length(spectra(object))
        if (!ls)
            return(object)
        have_links <- length(.sample_data_links(object, "spectra")) > 0
        if (have_links)
            object@spectra$._SPECTRA_IDX <- seq_len(ls)
        object@spectra <- filter(object@spectra, ...)
        if (have_links) {
            if (ls != length(spectra(object)))
                object <- .update_sample_data_links_spectra(object)
            svs <- unique(c(spectraVariables(spectra(object)),
                            peaksVariables(spectra(object))))
            object@spectra <- selectSpectraVariables(
                object@spectra, svs[svs != "._SPECTRA_IDX"])
        }
        object
    })
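
The method above dispatches on a `function` as its second argument and forwards `...` to that function. The sketch below shows this pattern in isolation. It is only an illustration: the toy class `Box`, the generic `filterBox` and the filter `above` are hypothetical and exist in neither MsExperiment nor ProtGenerics; only the base `methods` package is assumed.

```r
library(methods)

## Hypothetical toy class holding a numeric vector.
setClass("Box", representation(values = "numeric"))

## Hypothetical generic with the same argument layout as `filterSpectra`.
setGeneric("filterBox", function(object, filter, ...)
    standardGeneric("filterBox"))

## This method is selected only if `filter` is a function; anything passed
## through `...` is handed on to `filter`.
setMethod("filterBox", c("Box", "function"), function(object, filter, ...) {
    if (!length(object@values))
        return(object)                  # nothing to filter
    object@values <- filter(object@values, ...)
    object
})

## A filter with its own parameter, supplied through `...`.
above <- function(x, threshold) x[x > threshold]

b <- new("Box", values = c(1, 5, 10))
filterBox(b, above, threshold = 4)@values
```

Because the signature includes `"function"`, a call such as `filterBox(b, "above")` finds no applicable method and fails early instead of failing inside the method body.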
36 changes: 34 additions & 2 deletions man/MsExperiment.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 16 additions & 0 deletions tests/testthat/test_MsExperiment-functions.R
@@ -287,3 +287,19 @@ test_that("readMsExperiment works", {
expect_true(nrow(sampleData(a)) == 2)
expect_equal(sampleData(a)$other_ann, c("a", "b"))
})

test_that(".update_sample_data_links_spectra works", {
    a <- readMsExperiment(fls)
    tmp <- a
    tmp@spectra$._SPECTRA_IDX <- seq_along(tmp@spectra)
    tmp@spectra <- tmp@spectra[c(5, 14, 1000, 2, 200)]
    res <- .update_sample_data_links_spectra(tmp)
    expect_equal(res@sampleDataLinks[["spectra"]][, 1L], c(1L, 1L, 1L, 1L, 2L))
    expect_equal(res@sampleDataLinks[["spectra"]][, 2L], c(4L, 1L, 2L, 5L, 3L))
    expect_equal(res@spectra$scanIndex,
                 a@spectra$scanIndex[c(5, 14, 1000, 2, 200)])
    expect_equal(res@sampleData, tmp@sampleData)

    expect_true(length(spectra(res[1L])) == 4)
    expect_true(length(spectra(res[2L])) == 1)
})
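
The expected values in this test follow from how `match` renumbers the links. The base R sketch below reproduces that arithmetic on a plain matrix. The count of 931 spectra per file is an assumption made for illustration; the test itself only implies that index 1000 belongs to the second file while the other four indices belong to the first.

```r
## Assumed sizes (illustrative only): two files with 931 spectra each.
n_per_file <- 931L
links <- cbind(rep(1:2, each = n_per_file), seq_len(2L * n_per_file))

## Original indices kept by the subsetting in the test, in their new order.
kept <- c(5, 14, 1000, 2, 200)

## Same logic as `.update_sample_data_links_spectra`.
idx <- match(links[, 2L], kept)
keep <- !is.na(idx)
new_links <- links[keep, , drop = FALSE]
new_links[, 2L] <- idx[keep]

new_links[, 1L]  # sample index per remaining link: 1 1 1 1 2
new_links[, 2L]  # position of the linked spectrum in the subset: 4 1 2 5 3
```

Rows keep the order of the original link matrix (original spectra 2, 5, 14, 200, 1000), which is why the second column is not sorted: it gives the position of each of these spectra within the reordered subset.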
38 changes: 38 additions & 0 deletions tests/testthat/test_MsExperiment.R
@@ -224,3 +224,41 @@ test_that("otherData<-,otherData,MsExperiment works", {
otherData(m)[["NUM"]] <- NULL
expect_identical(length(otherData(m)), 0L)
})

test_that("filterSpectra,MsExperiment,function works", {
    ## empty object
    res <- filterSpectra(MsExperiment(), filterMsLevel, 2L)
    expect_s4_class(res, "MsExperiment")
    expect_true(length(res) == 0L)

    ## an object without links.
    res <- filterSpectra(mse, filterMsLevel, 2L)
    expect_s4_class(res, "MsExperiment")
    expect_equal(sampleData(res), sampleData(mse))
    expect_true(length(spectra(res)) == 0L)
    expect_equal(res@sampleDataLinks, mse@sampleDataLinks)

    ## an object with links between samples and spectra
    tmp <- readMsExperiment(fls)
    res <- filterSpectra(tmp, filterMsLevel, 2L)
    expect_s4_class(res, "MsExperiment")
    expect_equal(sampleData(res), sampleData(tmp))
    expect_true(length(spectra(res)) == 0L)
    expect_equal(names(res@sampleDataLinks), names(tmp@sampleDataLinks))
    expect_true(nrow(res@sampleDataLinks[["spectra"]]) == 0L)

    ## Just reducing.
    res <- filterSpectra(tmp, filterRt, c(200, 210))
    expect_s4_class(res, "MsExperiment")
    expect_equal(sampleData(res), sampleData(tmp))
    expect_true(all(rtime(spectra(res)) >= 200 & rtime(spectra(res)) <= 210))
    ## check that sample/spectra links are valid
    a <- spectra(res[1L])
    ref <- filterRt(spectra(tmp[1L]), c(200, 210))
    expect_equal(a$scanIndex, ref$scanIndex)
    expect_equal(a$dataOrigin, ref$dataOrigin)
    a <- spectra(res[2L])
    ref <- filterRt(spectra(tmp[2L]), c(200, 210))
    expect_equal(a$scanIndex, ref$scanIndex)
    expect_equal(a$dataOrigin, ref$dataOrigin)
})
33 changes: 32 additions & 1 deletion vignettes/MsExperiment.Rmd
@@ -281,7 +281,7 @@ plotSpectra(sp[1000])

For some experiments and data analyses an explicit link between data, data files
and respective samples is required. Such links enable an easy (and error-free)
subsetting or re-ordering of a whole experiment by sample and would also
subset or re-ordering of a whole experiment by sample and would also
simplify coloring and labeling of the data depending on the sample or of its
variables or conditions.

@@ -521,6 +521,37 @@ samples to elements does however not affect data consistency. A sample will
always be linked to the correct value/element.


# Subset and filter `MsExperiment`

As already shown above, `MsExperiment` objects can be subset with `[`, which
subsets the data by sample. If relationships (links) between samples and other
data within the object are present, these are subset correctly as well. In
addition to this general subset operation, the spectra data within an
`MsExperiment` can be filtered individually using the `filterSpectra` function.
This function takes any filter function supported by `Spectra` through its
parameter `filter`. Parameters for this filter function can be passed through
`...`. As an example, we filter below the spectra data of our `MsExperiment`,
keeping only spectra with a retention time between 200 and 210 seconds.

```{r}
#' Filter the Spectra using the `filterRt` function providing also the
#' parameters for this function.
res <- filterSpectra(lmse, filterRt, rt = c(200, 210))
res
```

The resulting `MsExperiment` now contains far fewer spectra. `filterSpectra`
filtered only the spectra data, not any of the other data slots. It did,
however, update and consolidate the relationships between samples and spectra
(if present) after filtering:

```{r}
#' Extract spectra of the second sample after filtering
spectra(res[2L])
```


# Using `MsExperiment` with `MsBackendSql`

The `r Biocpkg("MsBackendSql")` provides functionality to store mass