Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity #7389

jackpotcityco · 2025-02-18T17:28:11Z

Hello,

ML.NET: 3.0.1
CPU: i7-12800H
24 MB Intel® Smart Cache
RAM: 64 GB

I get an intermittent error, and I can't work out what triggers it, when I run a regression experiment created with:
mlContext.Auto().CreateRegressionExperiment

(I have over 20 GB of free RAM when the error occurs)

General Exception:
Message: One or more errors occurred.
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.ML.AutoML.AutoMLExperiment.Run()
   at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
Inner Exception:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity
   at Microsoft.ML.AutoML.AutoMLExperiment.d__24.MoveNext()

I will explain step by step what I have done:

Step 1:
I have filled the IDataViews below. Each row has 50 feature columns and a "Label" column holding the ground-truth target.

(Pseudocode) Each IDataView has 51 columns and the following number of rows:

             IDataView trainData          (Has 175000 rows)
             IDataView hold_out_data      (Has 75000 rows)

Every feature value in every row is a float and has been checked with this function to make sure it is valid:

bool IsValid(float value)
{
    // A valid number is not NaN and not Infinity
    return !float.IsNaN(value) && !float.IsInfinity(value);
}
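
The check above is described for the feature values only. A non-finite value in the "Label" column is one plausible way for a trial's metric to come out as NaN, so it may be worth running the same test on the labels too. The following is a minimal sketch, not ML.NET API: `DataChecks` and `FirstInvalidRow` are hypothetical names, and it assumes the rows have already been read into memory as (float[] Features, float Label) tuples.

```csharp
using System;
using System.Collections.Generic;

static class DataChecks
{
    // A valid number is neither NaN nor positive/negative Infinity.
    static bool IsValid(float value) => !float.IsNaN(value) && !float.IsInfinity(value);

    // Returns the zero-based index of the first row whose label or any
    // feature is NaN or Infinity, or -1 when every value is finite.
    public static int FirstInvalidRow(IEnumerable<(float[] Features, float Label)> rows)
    {
        int rowIndex = 0;
        foreach (var (features, label) in rows)
        {
            // Check the label first: it is easy to forget when only features are scanned.
            if (!IsValid(label))
                return rowIndex;

            foreach (float feature in features)
            {
                if (!IsValid(feature))
                    return rowIndex;
            }
            rowIndex++;
        }
        return -1;
    }
}
```

Running this on both trainData and hold_out_data, after they are materialised, would show whether any row in either set still carries a non-finite value.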

Step 2:
I then pass these two IDataViews to the function below to train the model. The training always stops after 0-3 seconds and fails with the error shown above. (I also call the function in a loop and shuffle the data on each iteration to see whether that solves it, but it doesn't help.) The error is:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity

As the code shows, I have set the training time to 240 seconds. Increasing it does not seem to be the answer, since training stops after 0-3 seconds.
Why does this happen when all feature values are valid floats?

If I use fewer rows, for example 150,000, the error usually does not occur and the models train fine.
To be clear, though, I also have other cases where I use more than 250,000 rows and training succeeds.

So the error seems to occur at random, and the message doesn't say where or why it happens. How can I find the cause and solve this problem?

Thank you!

        void Model_Training(IDataView trainData, IDataView hold_out_data)
        {
            var mlContext = new MLContext();
            var cts = new CancellationToken();
            ExperimentBase<RegressionMetrics, RegressionExperimentSettings> regression_Experiment = null;
            regression_Experiment = mlContext.Auto().CreateRegressionExperiment(new RegressionExperimentSettings
            {
                MaxExperimentTimeInSeconds = 240,
                CacheBeforeTrainer = CacheBeforeTrainer.Off,
                CacheDirectoryName = "C:/Aintelligence/temp/cache",
                MaximumMemoryUsageInMegaByte = 16384,
                OptimizingMetric = RegressionMetric.RSquared,
                CancellationToken = cts
            });

            // Progress handler for regression
            var regressionProgressHandler = new Progress<RunDetail<RegressionMetrics>>(ph =>
            {
                if (ph.ValidationMetrics != null) { progress(Math.Round(ph.ValidationMetrics.RSquared, 3), ph.TrainerName, ph.ValidationMetrics, ph.Model); }
            });
            void progress(double metricValue, string TrainerName, object ValidationMetrics, ITransformer Model)
            {
                //Log this info
                var logInfo = (TrainerName, ValidationMetrics, Model);
            }
            try
            {
                //Do something with the results
                var results = regression_Experiment.Execute(trainData, hold_out_data, labelColumnName: "Label", progressHandler: regressionProgressHandler);
            }
            catch (Exception ex)
            {
                //Log this error
                string str = $"General Exception:\nMessage: {ex.Message}\n{ex.StackTrace}\n{(ex.InnerException != null ? $"Inner Exception:\n{ex.InnerException.Message}\n{ex.InnerException.StackTrace}\n" : "")}";
            }
        }
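
The catch block above records only the first InnerException. The outer "One or more errors occurred." message is the usual text of an AggregateException, which can hold several inner exceptions, each with its own chain. A minimal sketch of a logger that walks all of them follows; `ExceptionLogging` and `Describe` are hypothetical helper names, and only standard .NET types are used.

```csharp
using System;
using System.Text;

static class ExceptionLogging
{
    // Builds one string listing the exception and every nested inner
    // exception, indented by nesting depth.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        Append(sb, ex, 0);
        return sb.ToString();
    }

    static void Append(StringBuilder sb, Exception ex, int depth)
    {
        string indent = new string(' ', depth * 2);
        sb.AppendLine($"{indent}{ex.GetType().Name}: {ex.Message}");
        if (ex.StackTrace != null)
            sb.AppendLine($"{indent}{ex.StackTrace}");

        if (ex is AggregateException aggregate)
        {
            // An AggregateException may wrap more than one failure.
            foreach (Exception inner in aggregate.InnerExceptions)
                Append(sb, inner, depth + 1);
        }
        else if (ex.InnerException != null)
        {
            Append(sb, ex.InnerException, depth + 1);
        }
    }
}
```

Inside the catch block this would be called as `string str = ExceptionLogging.Describe(ex);`. Logging `ex.ToString()` is a shorter alternative, since it also includes the inner exceptions.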


dotnet-policy-service bot added the "untriaged" label (new issue has not been triaged) on Feb 18, 2025
