Added seed arguments

Plant-Food-Research-Open · Aug 11, 2024 · commit c06bf36
1 parent 4cd040b

Showing 281 changed files with 348,102 additions and 234 deletions.
diff --git a/_targets.R b/_targets.R
@@ -198,7 +198,8 @@ list(
     mo_set_complete,
     group = "status",
     to_keep_ns = c("snps" = 1000, "rnaseq" = 1000),
-    filtered_set_target_name = "mo_presel_supervised"
+    filtered_set_target_name = "mo_presel_supervised",
+    seed_perf = c("snps" = -1591100874, "rnaseq" = 1791752001)
   ),
 
   ##=================##
@@ -262,7 +263,8 @@ list(
       folds = 10,
       nrepeat = 5,
       dist = "centroids.dist",
-      cpus = 3
+      cpus = 3,
+      seed = 1659021768
     )
   ),
 
@@ -384,7 +386,8 @@ list(
       folds = 10,
       nrepeat = 5,
       measure = "cor",
-      cpus = 3
+      cpus = 3,
+      seed = -584594170
     )
   ),
 
@@ -474,7 +477,8 @@ list(
       ny = so2pls_cv_res["ny"],
       nr_folds = 10,
       keepx_seq = c(seq(5, 30, 5), seq(40, 100, 10)),
-      keepy_seq = c(seq(5, 40, 5))
+      keepy_seq = c(seq(5, 40, 5)),
+      seed = -1138855226
     )
   ),
   tar_target(

diff --git a/_targets/meta/meta b/_targets/meta/meta
diff --git a/diablo.qmd b/diablo.qmd
@@ -284,6 +284,8 @@ The function `diablo_tune()` provides a wrapper around the `mixOmics::tune()` fu
 
 The `keepX_list` argument controls the grid of values to be tested as possible number of features to retain from each dataset. It should be in the form of a named list, with one element per dataset, and where each element is a vector of integers corresponding to the values to test. The names of the list should correspond to the names of the datasets in the `MultiDataSet` object. If no value is provided for `keepX_list`, six values ranging from 5 to 30 (by increments of 5) are tested for each dataset.
 
+We can also set the seed for the computations via the `seed` argument.
+
 ::: {.targets-chunk}
 ```{targets diablo-tune-res}
 tar_target(
@@ -296,7 +298,8 @@ tar_target(
     folds = 10,
     nrepeat = 5,
     dist = "centroids.dist",
-    cpus = 3
+    cpus = 3,
+    seed = 1659021768
   )
 )
 ```
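
The effect of the new `seed` arguments can be illustrated with a minimal base R sketch. It assumes the seed is applied with `set.seed()` before any random fold assignment, which this diff does not confirm. `assign_folds()` is a hypothetical helper, not part of moiraine or mixOmics, and the `keepX_list` values only mirror the default grid described in the text (six values from 5 to 30), using the dataset names that appear in `_targets.R`.

```r
# Hypothetical helper (not a moiraine or mixOmics function): randomly assigns
# samples to cross-validation folds, fixing the RNG seed first when one is given.
assign_folds <- function(n_samples, folds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(seq_len(folds), length.out = n_samples))
}

# With the same seed, the fold assignment is identical across runs.
run1 <- assign_folds(50, folds = 10, seed = 1659021768)
run2 <- assign_folds(50, folds = 10, seed = 1659021768)
identical(run1, run2)

# A grid in the shape described for `keepX_list`: a named list with one
# integer vector per dataset (dataset names taken from `_targets.R`).
keepX_list <- list(
  snps = seq(5, 30, by = 5),
  rnaseq = seq(5, 30, by = 5)
)
```

Without a fixed seed, two calls to `assign_folds()` would generally produce different fold assignments, so tuning results could change between pipeline runs.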

diff --git a/docs/comparison.html b/docs/comparison.html
@@ -62,6 +62,7 @@
 <meta name="quarto:offset" content="./">
 <link href="./references.html" rel="next">
 <link href="./evaluation.html" rel="prev">
+<link href="./images/logo.png" rel="icon" type="image/png">
 <script src="site_libs/quarto-html/quarto.js"></script>
 <script src="site_libs/quarto-html/popper.min.js"></script>
 <script src="site_libs/quarto-html/tippy.umd.min.js"></script>
@@ -1385,7 +1386,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <div class="cell">
 <div class="sourceCode" id="cb9"><pre class="downlit sourceCode r code-with-copy"><code class="sourceCode R"><span><span class="fu"><a href="https://plant-food-research-open.github.io/moiraine/reference/comparison_heatmap_corr.html">comparison_heatmap_corr</a></span><span class="op">(</span><span class="va">output_list</span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/comparison-heatmap-corr-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/comparison-heatmap-corr-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>The function generates two half-heatmaps. The heatmap on the left is a visualisation of the correlation between the features weight of the different latent dimensions; the one on the right shows the correlation between their samples score. Since correlation matrices are symmetric, only one triangle of each matrix is represented. The rows and columns of the heatmaps each correspond to one of the latent dimensions generated by one of the integration methods. Their name is abbreviated (C stands for component, F for Factor, JC for joint component, RSC for rnaseq-specific component and MSC for metabolome-specific component). The method through which each latent dimension was generated is indicated next to its name as a coloured annotation. In each heatmap, the rows and columns have been ordered according to a clustering performed on the correlation matrix, so that the latent dimensions most similar (in terms of samples score or features weight) are next to each other.</p>
@@ -1403,7 +1404,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  <span class="op">)</span></span>
 <span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/comparison-heatmap-corr-subset-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/comparison-heatmap-corr-subset-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>From the heatmaps, we can see that some latent dimensions constructed by the different methods seem to capture similar trends in the data. For example, MOFA factor 1, sPLS component 1, DIABLO component 1 and sO2PLS joint component 1 are all strongly correlated in terms of their samples score. Their correlation in terms of features weight is a bit lower, which is due to the fact that some methods perform features selection, therefore all non-selected features are given a weight of 0. Note that the sign of the correlation is interesting but not very important. We can also see some latent dimensions that seem correlated with respect to one metric but not the other. For example, there is a strong correlation between the samples score of MOFA factor 4 and sPLS component 2, but this is not reflected in their features weight. Again, that can be because one method performs features selection and not the other. On the other hand, the correlation between sPLS component 3 and DIABLO component 2 is stronger when looking at their features weight than at their samples score.</p>
@@ -1418,7 +1419,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  legend_ncol <span class="op">=</span> <span class="fl">1</span></span>
 <span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/comparison-heatmap-corr-mofa-1.png" class="img-fluid" width="672"></p>
+<p><img src="comparison_files/figure-html/comparison-heatmap-corr-mofa-1.svg" class="img-fluid" width="672"></p>
 </div>
 </div>
 </section><section id="a-note-about-missing-features" class="level2" data-number="14.4"><h2 data-number="14.4" class="anchored" data-anchor-id="a-note-about-missing-features">
@@ -1494,7 +1495,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <div class="cell">
 <div class="sourceCode" id="cb13"><pre class="downlit sourceCode r code-with-copy"><code class="sourceCode R"><span><span class="fu"><a href="https://plant-food-research-open.github.io/moiraine/reference/comparison_plot_correlation.html">comparison_plot_correlation</a></span><span class="op">(</span><span class="va">output_list</span><span class="op">[</span><span class="fl">2</span><span class="op">:</span><span class="fl">1</span><span class="op">]</span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/comparison-plot-correlation-1.png" class="img-fluid" width="960"></p>
+<p><img src="comparison_files/figure-html/comparison-plot-correlation-1.svg" class="img-fluid" width="960"></p>
 </div>
 </div>
 <p>As with <code><a href="https://plant-food-research-open.github.io/moiraine/reference/comparison_heatmap_corr.html">comparison_heatmap_corr()</a></code>, the function displays the correlation between the latent dimensions’ samples score on the left, and between their features weight on the right, but using correlation plots rather than heatmaps. By default, only correlation coefficients above 0.2 have their value displayed (for better clarity), but this can be customised through the <code>min_show_corr</code> argument.</p>
@@ -1508,7 +1509,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  <span class="op">)</span></span>
 <span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/comparison-plot-correlation-subset-1.png" class="img-fluid" width="960"></p>
+<p><img src="comparison_files/figure-html/comparison-plot-correlation-subset-1.svg" class="img-fluid" width="960"></p>
 </div>
 </div>
 </section><section id="comparing-samples-score" class="level2" data-number="14.7"><h2 data-number="14.7" class="anchored" data-anchor-id="comparing-samples-score">
@@ -1525,7 +1526,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span><span class="op">)</span> <span class="op">+</span></span>
 <span>  <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/scale_brewer.html">scale_colour_brewer</a></span><span class="op">(</span>palette <span class="op">=</span> <span class="st">"Set1"</span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/plot-samples-score-pair-diablo-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/plot-samples-score-pair-diablo-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>This function can also be used to compare the samples score of two latent dimensions from two different integration methods. This is done by passing to the function a list of length 2 containing the output of two different methods. The names of the latent dimensions to compare are provided as a named list, where each name corresponds to either the name of the method (if the input list is not named) or the name of the element in the input list. So for example, to compare the first latent dimension of MOFA and DIABLO:</p>
@@ -1539,7 +1540,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/scale_brewer.html">scale_colour_brewer</a></span><span class="op">(</span>palette <span class="op">=</span> <span class="st">"Set1"</span><span class="op">)</span></span>
 <span><span class="co">#&gt; Warning: Removed 9 rows containing missing values (`geom_point()`).</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/plot-samples-score-pair-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/plot-samples-score-pair-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>We can see that the samples score of the two latent dimensions are strongly correlated, showing that the two latent dimensions capture a similar trend in the data.</p>
@@ -1557,7 +1558,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  <span class="op">)</span></span>
 <span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/plot-features_weight-pair-mofa-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/plot-features_weight-pair-mofa-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>This function can also be used to compare the features weight of two latent dimensions from two different integration methods. This is done by passing to the function a list of length 2 containing the output of two different methods. The names of the latent dimensions to compare are provided as a named list, where each name corresponds to either the name of the method (if the input list is not named) or the name of the element in the input list. We will again compare MOFA factor 1 and DIABLO component 1:</p>
@@ -1573,7 +1574,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  <span class="op">)</span></span>
 <span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/plot-features-weight-pair-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/plot-features-weight-pair-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>We can see that while MOFA and DIABLO identify the same metabolites as being the most important to separate healthy and infected animals, the genomic markers and genes that are given the highest importance score by MOFA are not selected with DIABLO.</p>
@@ -1591,7 +1592,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  <span class="op">)</span></span>
 <span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/plot-features-weight-raw-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/plot-features-weight-raw-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>By default, the <code><a href="https://plant-food-research-open.github.io/moiraine/reference/plot_features_weight_pair.html">plot_features_weight_pair()</a></code> function uses the geometric consensus importance metric (more details in the next section) to highlight the 5 features identified as most important by both methods. Both the number of features highlighted and the metric used can be controlled, through the <code>top_n</code> and <code>metric</code> arguments, respectively. In the following section, we will expand on the concept of consensus importance and the different metrics available.</p>
@@ -1613,7 +1614,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <div class="cell">
 <div class="sourceCode" id="cb20"><pre class="downlit sourceCode r code-with-copy"><code class="sourceCode R"><span><span class="fu"><a href="https://plant-food-research-open.github.io/moiraine/reference/show_consensus_metrics.html">show_consensus_metrics</a></span><span class="op">(</span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/show-consensus-importance-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/show-consensus-importance-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 <p>In the plots, the consensus importance values have been normalised so that the highest value is 1. As we can see, metrics such as the geometric mean, harmonic mean, product or minimum will give higher consensus scores to features that are consistently assigned a high importance score across all methods, while features that have high importance score with one method but low score with the other will get a lower consensus score. Conversely, metrics such as the L2-norm or maximum prioritise features that are given a high importance score by at least one method, regardless of their importance score with other methods.</p>
@@ -1720,7 +1721,7 @@ <h1 class="title"><span id="sec-comparison" class="quarto-section-identifier"><s
 <span>  <span class="op">)</span></span>
 <span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
-<p><img src="comparison_files/figure-html/plot-features-weight-average-1.png" class="img-fluid" width="768"></p>
+<p><img src="comparison_files/figure-html/plot-features-weight-average-1.svg" class="img-fluid" width="768"></p>
 </div>
 </div>
 </section></section><section id="recap-targets-list" class="level2" data-number="14.10"><h2 data-number="14.10" class="anchored" data-anchor-id="recap-targets-list">
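
The behaviour of the consensus importance metrics described in the comparison chapter can be sketched in base R. This is an illustration under stated assumptions rather than moiraine's implementation: the importance scores are made up, the metric definitions are the textbook ones, and the normalisation (dividing by the highest value) follows the description in the text.

```r
# Made-up importance scores for three features under two methods.
importance <- rbind(
  feature_a = c(method_1 = 0.9, method_2 = 0.9),
  feature_b = c(method_1 = 1.0, method_2 = 0.1),
  feature_c = c(method_1 = 0.3, method_2 = 0.3)
)

# Textbook definitions of the metrics named in the text.
metrics <- list(
  geometric = function(x) exp(mean(log(x))),
  harmonic  = function(x) length(x) / sum(1 / x),
  product   = function(x) prod(x),
  min       = function(x) min(x),
  l2        = function(x) sqrt(sum(x^2)),
  max       = function(x) max(x)
)

# One column per metric, normalised so that the highest value is 1.
consensus <- sapply(metrics, function(f) {
  scores <- apply(importance, 1, f)
  scores / max(scores)
})
round(consensus, 2)
```

In this toy example, the geometric mean ranks `feature_a` (consistently high) first, while the maximum ranks `feature_b` (high with one method only) first.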

diff --git a/docs/comparison_files/figure-html/comparison-heatmap-corr-1.png b/docs/comparison_files/figure-html/comparison-heatmap-corr-1.png