From 00999f344f9c8a8a07f46f108a63ad74de329254 Mon Sep 17 00:00:00 2001
From: Ivikhostrup
Date: Wed, 12 Jun 2024 01:48:00 +0200
Subject: [PATCH 1/4] ready for review

---
 report_thesis/src/sections/future_work.tex | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/future_work.tex b/report_thesis/src/sections/future_work.tex
index c45c4e0d..2a4266e5 100644
--- a/report_thesis/src/sections/future_work.tex
+++ b/report_thesis/src/sections/future_work.tex
@@ -1 +1,17 @@
-\section{Future Work}\label{sec:future_work}
\ No newline at end of file
+\section{Future Work}\label{sec:future_work}
+The findings of this study open several avenues for future research.
+Firstly, for our data partitioning algorithm, described in Section~\ref{subsubsec:dataset_partitioning}, we noted that finding the optimal percentile value $p$ that minimizes extreme values in the test set, while maintaining its general representativeness, is important.
+Future work should consider methods of quantitatively assesing and finding this value.
+Such methods could include supplementary extreme value testing to the data partitioning algorithm, where after the primary evaluation, additional testing is conducted using a small, separate subset of extreme values to assess the model's performance on these critical cases.
+For example, this could involve slightly reducing the percentile value $p$ and using the extreme values that fall within this reduced range to evaluate the model.
+
+Another point of interest is limited data availability.
+The small dataset size naturally limits the amount of how many extreme values are present.
+These extreme values are an essential part of improving the model's generalizability, as they are the most challenging cases to predict.
+Future work should investigate methods of augmenting the dataset with synthetic extreme value data to provide the model with more exposure to these cases during training.
+
+Future work should also consider further experimentation with the choices of base estimators and meta-learners.
+Our study highlighted that multiple model and preprocessor configurations perform well.
+However, determining which configurations and meta-learner is optimal for a given oxide is a challenging task.
+In this study, we used a simple grouping to ensure diversity in our base estimator selection, chosen from the top-performing configurations.
+This approach could be improved upon by, for example, developing more advanced selection methods that consider the base estimators and meta-learners in conjunction.
\ No newline at end of file

From c71e24e6031dd5f672d02401ea50218adc9797e4 Mon Sep 17 00:00:00 2001
From: Ivikhostrup <56341364+Ivikhostrup@users.noreply.github.com>
Date: Wed, 12 Jun 2024 01:57:23 +0200
Subject: [PATCH 2/4] Apply suggestions from code review

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/future_work.tex | 14 ++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/report_thesis/src/sections/future_work.tex b/report_thesis/src/sections/future_work.tex
index 2a4266e5..76b9625f 100644
--- a/report_thesis/src/sections/future_work.tex
+++ b/report_thesis/src/sections/future_work.tex
@@ -1,11 +1,13 @@
 \section{Future Work}\label{sec:future_work}
-The findings of this study open several avenues for future research.
-Firstly, for our data partitioning algorithm, described in Section~\ref{subsubsec:dataset_partitioning}, we noted that finding the optimal percentile value $p$ that minimizes extreme values in the test set, while maintaining its general representativeness, is important.
-Future work should consider methods of quantitatively assesing and finding this value.
-Such methods could include supplementary extreme value testing to the data partitioning algorithm, where after the primary evaluation, additional testing is conducted using a small, separate subset of extreme values to assess the model's performance on these critical cases.
-For example, this could involve slightly reducing the percentile value $p$ and using the extreme values that fall within this reduced range to evaluate the model.
+The findings of this study present several opportunities for future research.
+Firstly, regarding our data partitioning algorithm detailed in Section~\ref{subsubsec:dataset_partitioning}, we observed the significance of identifying the optimal percentile value $p$.
+This value is crucial for minimizing extreme values in the test set while preserving its overall representativeness.
+Future work should explore quantitative methods for determining this optimal value.
+Such methods could involve incorporating supplementary extreme value testing into the data partitioning algorithm.
+After the primary evaluation, additional testing could be conducted using a small, separate subset of extreme values to assess the model's performance in these critical scenarios.
+For example, this might involve slightly reducing the percentile value $p$ and using the extreme values that fall within this reduced range to evaluate the model's effectiveness.
 
-Another point of interest is limited data availability.
+Another point of interest is the limited data availability.
 The small dataset size naturally limits the amount of how many extreme values are present.
 These extreme values are an essential part of improving the model's generalizability, as they are the most challenging cases to predict.
 Future work should investigate methods of augmenting the dataset with synthetic extreme value data to provide the model with more exposure to these cases during training.

From 264de7be0fa6a941b2c4d01bb842c806c123dbb8 Mon Sep 17 00:00:00 2001
From: Ivikhostrup <56341364+Ivikhostrup@users.noreply.github.com>
Date: Wed, 12 Jun 2024 02:02:08 +0200
Subject: [PATCH 3/4] Apply suggestions from code review

Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com>
---
 report_thesis/src/sections/future_work.tex | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/report_thesis/src/sections/future_work.tex b/report_thesis/src/sections/future_work.tex
index 76b9625f..66195a1e 100644
--- a/report_thesis/src/sections/future_work.tex
+++ b/report_thesis/src/sections/future_work.tex
@@ -8,12 +8,12 @@ \section{Future Work}\label{sec:future_work}
 For example, this might involve slightly reducing the percentile value $p$ and using the extreme values that fall within this reduced range to evaluate the model's effectiveness.
 
 Another point of interest is the limited data availability.
-The small dataset size naturally limits the amount of how many extreme values are present.
-These extreme values are an essential part of improving the model's generalizability, as they are the most challenging cases to predict.
-Future work should investigate methods of augmenting the dataset with synthetic extreme value data to provide the model with more exposure to these cases during training.
+The small dataset size inherently restricts the number of extreme values present.
+These extreme values are crucial for enhancing the model's generalizability, as they represent the most challenging cases to predict.
+Future research should investigate methods for augmenting the dataset with synthetic extreme value data to provide the model with more exposure to these cases during training.
 
 Future work should also consider further experimentation with the choices of base estimators and meta-learners.
-Our study highlighted that multiple model and preprocessor configurations perform well.
-However, determining which configurations and meta-learner is optimal for a given oxide is a challenging task.
-In this study, we used a simple grouping to ensure diversity in our base estimator selection, chosen from the top-performing configurations.
-This approach could be improved upon by, for example, developing more advanced selection methods that consider the base estimators and meta-learners in conjunction.
\ No newline at end of file
+Our study demonstrated that various model and preprocessor configurations perform well.
+However, identifying the optimal configurations and meta-learner for a specific oxide remains a challenging task.
+In this study, we used a simple grouping method to ensure diversity in our base estimator selection, choosing from the top-performing configurations.
+This approach could be improved upon by, for example, developing more advanced selection methods that consider the interactions between base estimators and meta-learners.
\ No newline at end of file

From 1f1bb1aa40104f769cbd9f76390e72c301a349f3 Mon Sep 17 00:00:00 2001
From: Ivikhostrup <56341364+Ivikhostrup@users.noreply.github.com>
Date: Wed, 12 Jun 2024 02:26:04 +0200
Subject: [PATCH 4/4] Update report_thesis/src/sections/future_work.tex

Co-authored-by: Christian Bager Bach Houmann
---
 report_thesis/src/sections/future_work.tex | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/report_thesis/src/sections/future_work.tex b/report_thesis/src/sections/future_work.tex
index 66195a1e..342304b1 100644
--- a/report_thesis/src/sections/future_work.tex
+++ b/report_thesis/src/sections/future_work.tex
@@ -3,17 +3,21 @@ \section{Future Work}\label{sec:future_work}
 Firstly, regarding our data partitioning algorithm detailed in Section~\ref{subsubsec:dataset_partitioning}, we observed the significance of identifying the optimal percentile value $p$.
 This value is crucial for minimizing extreme values in the test set while preserving its overall representativeness.
 Future work should explore quantitative methods for determining this optimal value.
-Such methods could involve incorporating supplementary extreme value testing into the data partitioning algorithm.
-After the primary evaluation, additional testing could be conducted using a small, separate subset of extreme values to assess the model's performance in these critical scenarios.
+
+Another potential improvement to the validation and testing approach we delineate is incorporating supplementary extreme value testing after the primary evaluation.
+This type of testing could be conducted using a small, separate subset of extreme values to assess the model's performance in these critical scenarios.
 For example, this might involve slightly reducing the percentile value $p$ and using the extreme values that fall within this reduced range to evaluate the model's effectiveness.
 
-Another point of interest is the limited data availability.
+Tackling the challenges of limited data availability has proven important, as we mention throughout our report.
 The small dataset size inherently restricts the number of extreme values present.
 These extreme values are crucial for enhancing the model's generalizability, as they represent the most challenging cases to predict.
-Future research should investigate methods for augmenting the dataset with synthetic extreme value data to provide the model with more exposure to these cases during training.
+Future research could investigate methods for augmenting the dataset with synthetic data, including extreme values, to provide the model with more exposure to these cases during training.
+This is a difficult task, as it requires producing synthetic data for a physics-based process.
+We anticipate that some approximation may be sufficient and could be used, for instance, as initial training material in a transfer learning project.
 
 Future work should also consider further experimentation with the choices of base estimators and meta-learners.
 Our study demonstrated that various model and preprocessor configurations perform well.
 However, identifying the optimal configurations and meta-learner for a specific oxide remains a challenging task.
 In this study, we used a simple grouping method to ensure diversity in our base estimator selection, choosing from the top-performing configurations.
-This approach could be improved upon by, for example, developing more advanced selection methods that consider the interactions between base estimators and meta-learners.
\ No newline at end of file
+We also experimented briefly with a grid search approach, which could be examined further.
+However, using a variation of the optimization framework we presented is likely to provide better trade-offs, as we have discussed.
\ No newline at end of file
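
The supplementary extreme value testing proposed in PATCH 4 is concrete enough to prototype. The sketch below is one possible reading of it, assuming NumPy, scikit-learn, and a single upper-tail percentile threshold; the toy data, the Ridge stand-in model, and the names p and delta are illustrative assumptions, not the thesis' actual partitioning algorithm.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 8))                                 # toy features
    y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=500)  # toy targets

    p, delta = 95.0, 5.0              # original percentile and its reduction
    hi = np.percentile(y, p)          # threshold used by the primary split
    lo = np.percentile(y, p - delta)  # slightly reduced threshold

    core = y <= lo                    # samples for the primary evaluation
    band = (y > lo) & (y <= hi)       # borderline extremes for extra testing

    X_train, X_test, y_train, y_test = train_test_split(
        X[core], y[core], test_size=0.2, random_state=42)
    model = Ridge().fit(X_train, y_train)

    def rmse(t, z):
        return float(np.sqrt(np.mean((t - z) ** 2)))

    print("primary RMSE:", rmse(y_test, model.predict(X_test)))
    print("extreme RMSE:", rmse(y[band], model.predict(X[band])))

Comparing the two scores shows how much performance degrades on the borderline extremes, and sweeping delta would indicate how sensitive that gap is to the choice of p.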
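The synthetic extreme value augmentation idea is harder to pin down, since faithfully synthesizing data for a physics-based process is an open problem. As a deliberately naive stand-in, the hypothetical helper below (its name, parameters, and defaults are invented for illustration) replicates upper-tail extreme samples with small Gaussian feature jitter; the approximation contemplated for transfer learning would presumably be more principled.

    import numpy as np

    def augment_extremes(X, y, p=95.0, n_copies=5, noise_scale=0.01, seed=0):
        """Replicate samples whose target exceeds the p-th percentile,
        perturbing each copy's features with small Gaussian noise."""
        rng = np.random.default_rng(seed)
        mask = y >= np.percentile(y, p)       # upper-tail extremes only
        X_new = np.repeat(X[mask], n_copies, axis=0)
        y_new = np.repeat(y[mask], n_copies)
        # Scale noise per feature so the jitter respects feature magnitudes.
        X_new = X_new + rng.normal(scale=noise_scale * X.std(axis=0),
                                   size=X_new.shape)
        return np.vstack([X, X_new]), np.concatenate([y, y_new])

Applied to the training split only (never the test set), this gives the model more exposure to extreme cases, at the cost of correlated, partly redundant samples.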
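Finally, the grid search over base estimators and meta-learners mentioned in PATCH 4 maps naturally onto scikit-learn's StackingRegressor. The sketch below scores every base-estimator subset against every candidate meta-learner by cross-validation; the estimator pools and the synthetic dataset are placeholder assumptions, not the thesis' top-performing configurations.

    from itertools import combinations

    from sklearn.datasets import make_regression
    from sklearn.ensemble import (GradientBoostingRegressor,
                                  RandomForestRegressor, StackingRegressor)
    from sklearn.linear_model import ElasticNet, Ridge
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=300, n_features=10, noise=5.0,
                           random_state=0)

    base_pool = {
        "gbr": GradientBoostingRegressor(random_state=0),
        "rf": RandomForestRegressor(n_estimators=100, random_state=0),
        "svr": SVR(),
    }
    meta_pool = {"ridge": Ridge(), "enet": ElasticNet()}

    best = None
    for k in (2, 3):                          # base-estimator subset sizes
        for names in combinations(base_pool, k):
            for meta_name, meta in meta_pool.items():
                model = StackingRegressor(
                    estimators=[(n, base_pool[n]) for n in names],
                    final_estimator=meta)
                score = cross_val_score(
                    model, X, y, cv=3,
                    scoring="neg_root_mean_squared_error").mean()
                if best is None or score > best[0]:
                    best = (score, names, meta_name)

    print(f"best RMSE {-best[0]:.3f} with bases={best[1]}, meta={best[2]}")

This brute-force search grows combinatorially with the pool size, which is why the optimization framework discussed in the report is likely the better trade-off; a sketch like this mainly serves as a baseline for such comparisons.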