Running DFA on the CV datasets #15

Open: wants to merge 2 commits into master
Conversation

sgjoshi25 (Contributor)

Create filtered lists of the data based on the CV threshold and NaN filters.
Run DFA on the datasheets and Gene/RxnKO knockouts to find the full change between the conditions.

Commits:

  • …filtered datasheets based on CV and NaN filter to put into DFA

  • …heets. Then, it runs the outputted model against GeneKO and RxnKO knockouts to find the full change
@ScottCampit (Member)

The code works and I don't see any apparent bugs, so kudos on getting it to run. Depending on what you have tried out so far, the unchanged results may be due to the way the data is processed to compute the flux activity coefficients, since from your EDA we do expect significant changes between metabolites. I would suggest the following:

  • Play with extreme kappa/kappa2 values, like 1E-12, 1E-6, 1, and 10. If this doesn't affect the model at all, then I would move on to the next point.

  • Check the flux activity coefficients directly. If they are all the same, that would explain why you're getting the same results; in that case, think about how to normalize the data to extract more information. MAV and Quantile norms are built into DFA, but you can try other methods if you think they would work better.

  • Try the -dox model if possible. This is another thing they wanted me to do, as it is a more direct control than NT.
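A minimal MATLAB sketch of the first two checks, assuming a hypothetical `run_dfa(data, kappa, kappa2)` wrapper around the DFA entry point (the function name, signature, and `data` input are placeholders for whatever the livescripts actually call, not the real API):

```matlab
% Sweep extreme kappa values and inspect the resulting flux activity
% coefficients. run_dfa is a placeholder for the real DFA call.
kappas = [1e-12, 1e-6, 1, 10];
for k = kappas
    [model, flux_coeffs] = run_dfa(data, k, k);   % hypothetical signature

    % If the coefficients are all (numerically) identical, that would
    % explain identical downstream results and points at normalization.
    if numel(uniquetol(flux_coeffs, 1e-8)) == 1
        fprintf('kappa = %g: all flux activity coefficients identical\n', k);
    end
end
```

If the coefficients do vary with kappa but the knockout results still don't change, the problem is more likely downstream of the coefficient computation.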

To improve the scripts, here are some of my suggestions you may want to implement:

  • You should keep the formatting for all of your sections and livescripts consistent - for instance, dfa_tu8902.mlx seems to be more polished than dfa_tu8902_filtered_data.mlx.

  • The latter file name is a bit of a misnomer, I feel - that script filters data by CV; it doesn't run DFA on the filtered data, as I initially thought.

  • The latter has a lot of redundant code. You can condense it into a for loop using the following pseudocode:

array_avg = table2array(unfiltered_dataset_avg(:, 12:end));
array_std = table2array(unfiltered_dataset_std(:, 12:end));
CV = array_std ./ array_avg;   % coefficient of variation per measurement

cv_values = [...];    % CV thresholds to sweep
filenames = [...];    % one output file per threshold
for i = 1:length(cv_values)
    % Keep rows where every measurement meets the CV threshold
    keep = all(CV >= cv_values(i), 2);
    filtered_data = unfiltered_dataset_avg(keep, :);

    % NaN filter: keep rows with more than 5 non-NaN measurements
    nonnan = sum(~isnan(table2array(filtered_data(:, 12:end))), 2);
    filtered_data = filtered_data(nonnan > 5, :);

    writetable(filtered_data, filenames(i), ...);
end

@ScottCampit added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on Aug 5, 2020.