Merge pull request #211 from chhoumann/kb-299-future-works
[KB-299] Future works
chhoumann authored Jun 12, 2024
2 parents 0f04f53 + 1f1bb1a commit df66ac3
Showing 1 changed file with 23 additions and 1 deletion.
24 changes: 23 additions & 1 deletion report_thesis/src/sections/future_work.tex
@@ -1 +1,23 @@
\section{Future Work}\label{sec:future_work}
The findings of this study present several opportunities for future research.
Firstly, regarding our data partitioning algorithm detailed in Section~\ref{subsubsec:dataset_partitioning}, we observed the significance of identifying the optimal percentile value $p$.
This value is crucial for minimizing extreme values in the test set while preserving its overall representativeness.
Future work should explore quantitative methods for determining this optimal value.
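One possible starting point, given here only as a minimal sketch, is to sweep candidate values of $p$ and record how many samples each cutoff keeps out of the test pool and how far the remaining pool drifts from the full target distribution.
The sketch assumes that samples above the $p$-th percentile of the target are excluded from the test set, considers only the upper tail, and uses the two-sample Kolmogorov-Smirnov statistic as a stand-in for representativeness; none of these choices are taken from the partitioning algorithm itself, and all function names are hypothetical.

```python
from bisect import bisect_right


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def ks_distance(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


def sweep_percentile(targets, candidates):
    """For each candidate p, report the cutoff, the excluded count, and the drift.

    Assumption: values above the p-th percentile are kept out of the test pool.
    """
    rows = []
    for p in candidates:
        cutoff = percentile(targets, p)
        eligible = [y for y in targets if y <= cutoff]
        rows.append(
            {
                "p": p,
                "cutoff": cutoff,
                "excluded": len(targets) - len(eligible),
                "ks": ks_distance(eligible, targets),
            }
        )
    return rows
```

A lower $p$ excludes more extreme values but increases the distance to the full distribution, so the resulting table makes the trade-off explicit without fixing how the two quantities should be weighted.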

Another potential improvement to our validation and testing approach is to add supplementary extreme value testing after the primary evaluation.
This type of testing could be conducted using a small, separate subset of extreme values to assess the model's performance in these critical scenarios.
For example, this might involve slightly reducing the percentile value $p$ and using the extreme values that fall within this reduced range to evaluate the model's effectiveness.
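The band selection described above can be sketched as follows.
This is a minimal illustration that assumes extreme values are those in the upper tail of the target distribution and that the band of interest lies between the reduced percentile $p - \delta$ and the original percentile $p$; the reduction $\delta$, the function names, and the use of RMSE as the metric are assumptions made for the example.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def extreme_band(targets, p, delta):
    """Indices of samples whose target lies in (P_{p - delta}, P_p].

    Assumption: only the upper tail is treated as extreme.
    """
    if not 0 < delta < p <= 100:
        raise ValueError("require 0 < delta < p <= 100")
    low = percentile(targets, p - delta)
    high = percentile(targets, p)
    return [i for i, y in enumerate(targets) if low < y <= high]


def rmse(y_true, y_pred):
    """Root mean squared error over paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))
```

The indices returned for the band can then be used to score an already trained model on those samples alone, for instance by passing the corresponding targets and predictions to the RMSE helper and comparing the result with the primary test score.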

Tackling the challenges of limited data availability has proven important, as we mention throughout our report.
The small dataset size inherently restricts the number of extreme values present.
These extreme values are crucial for enhancing the model's generalizability, as they represent the most challenging cases to predict.
Future research could investigate methods for augmenting the dataset with synthetic data, including extreme values, to provide the model with more exposure to these cases during training.
This is a difficult task, as it requires producing synthetic data for a physics-based process.
We expect that an approximation may be sufficient and could serve, for instance, as initial training material in a transfer-learning project.

Future work should also consider further experimentation with the choices of base estimators and meta-learners.
Our study demonstrated that various model and preprocessor configurations perform well.
However, identifying the optimal configurations and meta-learner for a specific oxide remains a challenging task.
In this study, we used a simple grouping method to ensure diversity in our base estimator selection, choosing from the top-performing configurations.
We also experimented briefly with a grid search approach, which could be examined further.
However, using a variation of the optimization framework we presented is likely to provide better trade-offs, as we have discussed.
