\section{Future Work}\label{sec:future_work}
The findings of this study present several opportunities for future research.
Firstly, regarding our data partitioning algorithm detailed in Section~\ref{subsubsec:dataset_partitioning}, we observed that identifying the optimal percentile value $p$ is significant.
This value is crucial for minimizing extreme values in the test set while preserving its overall representativeness.
Future work should explore quantitative methods for determining this optimal value.
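One quantitative rule for choosing $p$ could score each candidate cutoff by trading off representativeness against the extreme-value mass left in the test set. The following sketch is purely illustrative: the scoring rule, the weight \texttt{lam}, and the 90th-percentile tail reference are assumptions, not the procedure used in this study.

```python
def choose_percentile(y, candidates=(0.90, 0.95, 0.975), lam=0.5):
    """Illustrative selection of the cutoff percentile p.

    Scores each candidate p by (a) the drift of the capped subset's
    mean from the overall mean (a crude representativeness proxy) plus
    (b) lam times the fraction of capped samples that still lie above
    the full data's 90th percentile (residual extreme-value mass).
    """
    s = sorted(y)
    n = len(s)
    overall_mean = sum(s) / n
    tail90 = s[int(0.90 * (n - 1))]  # hypothetical tail reference point
    best_p, best_score = None, float("inf")
    for p in candidates:
        cutoff = s[int(p * (n - 1))]
        kept = [v for v in s if v <= cutoff]  # targets retained below cutoff
        drift = abs(sum(kept) / len(kept) - overall_mean) / (abs(overall_mean) or 1.0)
        tail_mass = sum(v > tail90 for v in kept) / len(kept)
        score = drift + lam * tail_mass
        if score < best_score:
            best_p, best_score = p, score
    return best_p
```

Lowering $p$ removes more extremes (shrinking the tail term) at the cost of a larger mean drift, so the minimizer captures the trade-off described above; any real criterion would need validation against the actual target distributions.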

Another potential improvement to the validation and testing approach we delineate is to incorporate supplementary extreme-value testing after the primary evaluation.
This testing could use a small, separate subset of extreme values to assess the model's performance in these critical scenarios.
For example, this might involve slightly reducing the percentile value $p$ and using the extreme values that fall within this reduced range to evaluate the model's effectiveness.
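A minimal sketch of such a supplementary evaluation, assuming a mean-absolute-error metric (the metric choice and the default reduced percentile are illustrative):

```python
def extreme_value_mae(y_true, y_pred, p_reduced=0.90):
    """Supplementary evaluation on extreme values only.

    Computes MAE restricted to samples whose true target exceeds the
    cutoff given by the reduced percentile p'.
    """
    s = sorted(y_true)
    cutoff = s[int(p_reduced * (len(s) - 1))]
    pairs = [(t, q) for t, q in zip(y_true, y_pred) if t > cutoff]
    return sum(abs(t - q) for t, q in pairs) / len(pairs)
```

Reported alongside the primary metric, this isolates performance on exactly the cases the standard split under-represents.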

Tackling the challenges of limited data availability has proven important, as we mention throughout our report.
The small dataset size inherently restricts the number of extreme values present.
These extreme values are crucial for enhancing the model's generalizability, as they represent the most challenging cases to predict.
Future research could investigate methods for augmenting the dataset with synthetic data, including extreme values, to give the model more exposure to these cases during training.
This is a hard task, as it requires producing synthetic data for a physics-based process.
We contemplate that some approximation may be sufficient and could be used, for instance, as initial training material in a transfer-learning project.
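As a crude stand-in for physics-aware synthesis, one could start with simple jitter-based oversampling of the extreme samples. The sketch below is an assumption about what such an approximation might look like, not a method from this study; relative Gaussian noise on the features is only one of many possible perturbation schemes.

```python
import random

def augment_extremes(X, y, p=0.95, n_copies=3, noise=0.01, seed=0):
    """Jitter-based oversampling of extreme samples (illustrative).

    Each sample whose target exceeds the p-th percentile is duplicated
    n_copies times with small relative Gaussian noise on its features;
    the target value is kept unchanged.
    """
    rng = random.Random(seed)
    s = sorted(y)
    cutoff = s[int(p * (len(s) - 1))]
    X_aug, y_aug = list(X), list(y)
    for xi, yi in zip(X, y):
        if yi > cutoff:
            for _ in range(n_copies):
                # perturb each feature proportionally to its magnitude
                X_aug.append([v + rng.gauss(0, noise * (abs(v) or 1.0)) for v in xi])
                y_aug.append(yi)
    return X_aug, y_aug
```

Keeping the target fixed while perturbing only the features is a deliberate simplification; a physics-based simulator would instead generate consistent feature-target pairs.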

Future work should also consider further experimentation with the choices of base estimators and meta-learners.
Our study demonstrated that various model and preprocessor configurations perform well.
However, identifying the optimal configurations and meta-learner for a specific oxide remains a challenging task.
In this study, we used a simple grouping method to ensure diversity in our base estimator selection, choosing from the top-performing configurations.
We also experimented briefly with a grid search approach, which could be examined further.
However, using a variation of the optimization framework we presented is likely to provide better trade-offs, as we have discussed.
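The diversity-aware grouping step could be sketched as a greedy pass over ranked configurations. This is a hypothetical simplification: the tuple layout, the use of the preprocessor as the diversity key, and the greedy rule are illustrative assumptions, not the grouping method used in the study.

```python
def pick_diverse_ensemble(configs, k=3):
    """Greedy selection of k base estimators with distinct preprocessors.

    configs: list of (model_name, preprocessor_name, cv_score) tuples,
    assumed pre-sorted by descending cv_score. Walks the ranking and
    keeps the best-scoring configuration for each unseen preprocessor.
    """
    chosen, seen_preprocessors = [], set()
    for model, preprocessor, score in configs:
        if preprocessor not in seen_preprocessors:
            chosen.append((model, preprocessor, score))
            seen_preprocessors.add(preprocessor)
        if len(chosen) == k:
            break
    return chosen
```

A grid search would instead score every size-k subset with the meta-learner, which is exhaustive but combinatorially expensive; the greedy pass above trades optimality for a single linear scan.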