Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity #7389
Hello,
I get an error that occurs only intermittently, and I can't work out why, when I train a:
mlContext.Auto().CreateRegressionExperiment
(I have over 20 GB of free RAM when the error occurs)
General Exception:
Message: One or more errors occurred.
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Microsoft.ML.AutoML.AutoMLExperiment.Run()
at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
Inner Exception:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity
at Microsoft.ML.AutoML.AutoMLExperiment.d__24.MoveNext()
I will explain step by step what I have done:
Step 1:
I have filled the IDataViews below, where each row has 50 features and a "Label" ground-truth target column.
(Pseudo code) The IDataViews contain the following numbers of rows, with 51 columns each:
Every feature value in every row is a float and has been checked for validity with this function:
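The validation function itself is not reproduced here. As a minimal sketch of what such a check might look like, assuming "valid" means neither NaN nor infinite (the class name `FeatureChecks` and both method names are hypothetical, not taken from the original post):

```csharp
using System;
using System.Collections.Generic;

public static class FeatureChecks
{
    // Assumption: a value counts as valid when it is neither NaN nor +/- Infinity.
    public static bool IsFinite(float value) =>
        !float.IsNaN(value) && !float.IsInfinity(value);

    // Returns the index of the first invalid value in a row, or -1 if every value is valid.
    public static int FirstInvalidIndex(IReadOnlyList<float> row)
    {
        for (int i = 0; i < row.Count; i++)
        {
            if (!IsFinite(row[i]))
            {
                return i;
            }
        }
        return -1;
    }
}
```

The same check could be applied to the "Label" column as well as the 50 features, since the error message refers to metrics that are NaN or Infinity.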
Step 2:
I now use those 2 IDataViews in the function below to train the model. However, training always stops after 0-3 seconds and produces the error shown above. (I also call the function in a loop, shuffling the data on each iteration to see whether that solves it, but it doesn't help.) The error is:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity
As shown, I have set the training time to 240 seconds. Increasing it does not seem to be the fix, since training stops after 0-3 seconds.
Why is this happening when all feature values are valid floats?
If I use fewer rows, for example 150,000, the error usually does not occur and training works fine.
To be clear, though, I also have other cases where I use more than those 250,000 rows and training succeeds.
So the error seems to occur truly at random. How can I find out why it is happening, given that the error message doesn't say where or why it occurs? How can I solve this problem?
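The training function is not reproduced here either. As a rough sketch only, a regression experiment with a 240-second limit and a "Label" column might be set up as follows. It assumes the Microsoft.ML and Microsoft.ML.AutoML packages are referenced; the class and method names `AutoMlSketch` and `Train` are hypothetical, and the progress handler is an addition for diagnosis that was not part of the original description. Whether the handler is invoked for failed trials may depend on the ML.NET version.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

public static class AutoMlSketch
{
    public static ExperimentResult<RegressionMetrics> Train(
        MLContext mlContext, IDataView trainData, IDataView validationData)
    {
        // 240 seconds matches the training time mentioned in the post.
        var settings = new RegressionExperimentSettings
        {
            MaxExperimentTimeInSeconds = 240
        };

        // "Label" is the target column described in Step 1.
        var columnInformation = new ColumnInformation { LabelColumnName = "Label" };

        // Assumption: logging each reported trial may help show which trainer fails and why.
        var progress = new Progress<RunDetail<RegressionMetrics>>(run =>
        {
            Console.WriteLine($"{run.TrainerName}: {run.RuntimeInSeconds:F1}s");
            if (run.Exception != null)
            {
                Console.WriteLine($"  trial failed: {run.Exception.Message}");
            }
        });

        var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
        return experiment.Execute(trainData, validationData, columnInformation,
            preFeaturizer: null, progressHandler: progress);
    }
}
```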
Thank you!