Running DFA on the CV datasets #15

Open: wants to merge 2 commits into master
Conversation

sgjoshi25 (Contributor)

Create filtered lists of the data based on the CV threshold and NaN filters.
Run DFA on the datasheets and Gene/RxnKO knockouts to find the full change between the conditions.

Commits:

  • …filtered datasheets based on CV and NaN filter to put into DFA

  • …heets. Then, it runs the outputted model against GeneKO and RxnKO knockouts to find the full change
@ScottCampit (Member)

The code works and I don't see any apparent bugs, so kudos on getting it to run. Depending on what you have tried out so far, the unchanged results may be due to the way the data is processed to compute the flux activity coefficients, since from your EDA we do expect significant changes between metabolites. I would suggest the following:

  • Play with extreme kappa/kappa2 values, like 1E-12, 1E-6, 1, and 10. If this doesn't affect the model at all, then I would move on to the next point.

  • Check the flux activity coefficients directly. If they are all the same, that would explain why you're getting the same results; in that case, think about how to normalize the data to extract more information. MAV and Quantile norms are built into DFA, but you can try other methods if you think they would work better.

  • Try the -dox model if possible. This is another thing they wanted me to do, as it is a more direct control than NT.
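A minimal MATLAB sketch of the first two checks, assuming a hypothetical `run_dfa(data, kappa, kappa2)` wrapper around the DFA entry point (the function name, signature, and `data` input are placeholders for whatever the livescripts actually call, not the real API):

```matlab
% Sweep extreme kappa values and inspect the resulting flux activity
% coefficients. run_dfa is a placeholder for the real DFA call.
kappas = [1e-12, 1e-6, 1, 10];
for k = kappas
    [model, flux_coeffs] = run_dfa(data, k, k);   % hypothetical signature

    % If the coefficients are all (numerically) identical, that would
    % explain identical downstream results and points at normalization.
    if numel(uniquetol(flux_coeffs, 1e-8)) == 1
        fprintf('kappa = %g: all flux activity coefficients identical\n', k);
    end
end
```

If the coefficients do vary with kappa but the knockout results still don't change, the problem is more likely downstream of the coefficient computation.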

To improve the scripts, here are some of my suggestions you may want to implement:

  • You should keep the formatting for all of your sections and livescripts consistent - for instance, dfa_tu8902.mlx seems to be more polished than dfa_tu8902_filtered_data.mlx.

  • The latter file name is a bit of a misnomer, I feel - that script filters data by CV; it doesn't run DFA on the filtered data, as I initially thought.

  • The latter has a lot of redundant code. You can condense it into a for loop using the following pseudocode:

array_avg = table2array(unfiltered_dataset_avg(:, 12:end));
array_std = table2array(unfiltered_dataset_std(:, 12:end));
CV = array_std ./ array_avg;   % coefficient of variation per measurement

cv_values = [...];    % CV thresholds to sweep
filenames = [...];    % one output file per threshold
for i = 1:length(cv_values)
    % Keep rows where every measurement meets the CV threshold
    keep = all(CV >= cv_values(i), 2);
    filtered_data = unfiltered_dataset_avg(keep, :);

    % NaN filter: keep rows with more than 5 non-NaN measurements
    nonnan = sum(~isnan(table2array(filtered_data(:, 12:end))), 2);
    filtered_data = filtered_data(nonnan > 5, :);

    writetable(filtered_data, filenames(i), ...);
end

@ScottCampit added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on Aug 5, 2020.