Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity #7389

Open
jackpotcityco opened this issue Feb 18, 2025 · 0 comments

Hello,

  • ML.NET: 3.0.1
  • CPU: i7-12800H
  • 24 MB Intel® Smart Cache
  • RAM: 64 GB

I intermittently get an error, and cannot work out why, when I run a regression experiment created with:
mlContext.Auto().CreateRegressionExperiment

(I have over 20 GB of free RAM when the error occurs)

General Exception:
Message: One or more errors occurred.
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.ML.AutoML.AutoMLExperiment.Run()
   at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
Inner Exception:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity
   at Microsoft.ML.AutoML.AutoMLExperiment.d__24.MoveNext()

I will explain step by step what I have done:

Step 1:
I have filled the two IDataViews below. Each row has 50 feature columns and a "Label" column holding the ground-truth target.

(Pseudocode) Each IDataView has 51 columns and the following number of rows:

             IDataView trainData          (Has 175000 rows)
             IDataView hold_out_data      (Has 75000 rows)

Every feature value in every row is a float and has been checked for validity with this function:

bool IsValid(float value)
{
    // A valid number is not NaN and not Infinity
    return !float.IsNaN(value) && !float.IsInfinity(value);
}
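
The check above is applied to the feature values. A minimal sketch of the same check run over every scalar float column of an IDataView, including "Label", could look like the following. It assumes the 51 columns are stored as separate scalar Single columns rather than one vector column; DataChecks and CountNonFinite are hypothetical names, not part of ML.NET.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class DataChecks
{
    // Hypothetical helper: counts NaN/Infinity values in each visible scalar float column.
    public static Dictionary<string, long> CountNonFinite(IDataView data)
    {
        var counts = new Dictionary<string, long>();
        foreach (DataViewSchema.Column column in data.Schema)
        {
            // Skip hidden columns and anything that is not a scalar Single (float).
            if (column.IsHidden
                || !(column.Type is NumberDataViewType)
                || column.Type.RawType != typeof(float))
            {
                continue;
            }

            // GetColumn<T> streams one column; each call is a separate pass over the data.
            counts[column.Name] = data.GetColumn<float>(column.Name)
                .LongCount(v => float.IsNaN(v) || float.IsInfinity(v));
        }
        return counts;
    }
}
```

Calling DataChecks.CountNonFinite(trainData) and DataChecks.CountNonFinite(hold_out_data) and looking for any non-zero count would confirm whether the label column is as clean as the features.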

Step 2:
I now pass these two IDataViews to the function below to train the model. When the problem occurs, training stops after 0-3 seconds every time and produces the error shown above. (I also call the function in a loop and shuffle the data on each iteration to see whether that helps, but it does not.) The error is:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity

As shown, I have set the training time to 240 seconds. Increasing it does not seem to be the answer, since training stops after 0-3 seconds.
Why does this happen when all feature values are valid floats?
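
For context on the "NaN or Infinity" part of the message: with OptimizingMetric = RSquared, the textbook definition R² = 1 - SS_res / SS_tot is non-finite whenever the labels being scored have zero variance, even if every input is a finite float. The sketch below only illustrates that arithmetic. RSquaredDemo.Compute is a hypothetical helper, and nothing here is a claim about how ML.NET computes or guards the metric internally.

```csharp
using System;
using System.Linq;

public static class RSquaredDemo
{
    // Textbook coefficient of determination: 1 - SS_res / SS_tot.
    // Assumes both arrays are non-empty and have the same length.
    public static double Compute(float[] labels, float[] predictions)
    {
        double mean = labels.Average(l => (double)l);
        double ssTot = labels.Sum(l => Math.Pow(l - mean, 2));
        double ssRes = labels.Zip(predictions, (l, p) => Math.Pow((double)l - p, 2)).Sum();

        // Floating-point division by zero does not throw in C#;
        // it produces NaN (0/0) or an infinity (x/0), which then propagates.
        return 1.0 - ssRes / ssTot;
    }
}
```

Under this definition a validation set whose labels are all identical gives a non-finite score regardless of the model, which is one thing worth ruling out for the hold-out data.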

If I use fewer rows, for example 150,000, the error usually does not occur and training works fine.
To be clear, though, I also have other cases where I use more than 250,000 rows and training succeeds.

So the error seems to occur at random, and the message does not say where or why it happens. How can I find the cause and solve this problem?
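
One way to see more than the final exception is to subscribe to the MLContext log before calling Execute. A minimal sketch follows, assuming the relevant messages are emitted under the "AutoMLExperiment" log source (the source name used in the ML.NET AutoML documentation, which is worth confirming against 3.0.1). AutoMLDiagnostics and AttachAutoMLLogging are hypothetical names.

```csharp
using System;
using Microsoft.ML;

public static class AutoMLDiagnostics
{
    // Hypothetical helper: forwards AutoML log messages to the given sink,
    // for example Console.WriteLine or a file logger.
    public static void AttachAutoMLLogging(MLContext mlContext, Action<string> sink)
    {
        mlContext.Log += (sender, e) =>
        {
            // Assumption: trial-level messages are logged under this source name.
            if (e.Source != null
                && e.Source.StartsWith("AutoMLExperiment", StringComparison.Ordinal))
            {
                sink($"[{e.Kind}] {e.Message}");
            }
        };
    }
}
```

Calling AutoMLDiagnostics.AttachAutoMLLogging(mlContext, Console.WriteLine) right after creating the MLContext should show what each trial reported during the 0-3 seconds before the failure.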

Thank you!

        // Requires: using System; using System.Threading;
        //           using Microsoft.ML; using Microsoft.ML.AutoML; using Microsoft.ML.Data;
        void Model_Training(IDataView trainData, IDataView hold_out_data)
        {
            var mlContext = new MLContext();
            // Default token that is never cancelled (same behavior as new CancellationToken()).
            var cancellationToken = CancellationToken.None;
            ExperimentBase<RegressionMetrics, RegressionExperimentSettings> regression_Experiment =
                mlContext.Auto().CreateRegressionExperiment(new RegressionExperimentSettings
            {
                MaxExperimentTimeInSeconds = 240,
                CacheBeforeTrainer = CacheBeforeTrainer.Off,
                CacheDirectoryName = "C:/Aintelligence/temp/cache",
                MaximumMemoryUsageInMegaByte = 16384,
                OptimizingMetric = RegressionMetric.RSquared,
                CancellationToken = cancellationToken
            });

            // Progress handler for regression: invoked per run, forwards runs that have validation metrics.
            var regressionProgressHandler = new Progress<RunDetail<RegressionMetrics>>(ph =>
            {
                if (ph.ValidationMetrics != null) { progress(Math.Round(ph.ValidationMetrics.RSquared, 3), ph.TrainerName, ph.ValidationMetrics, ph.Model); }
            });
            void progress(double metricValue, string TrainerName, object ValidationMetrics, ITransformer Model)
            {
                // Log this info
                var logInfo = (TrainerName, ValidationMetrics, Model);
            }
            try
            {
                // Do something with the results
                var results = regression_Experiment.Execute(trainData, hold_out_data, labelColumnName: "Label", progressHandler: regressionProgressHandler);
            }
            catch (Exception ex)
            {
                // Log this error, including the inner exception if there is one
                string str = $"General Exception:\nMessage: {ex.Message}\n{ex.StackTrace}\n{(ex.InnerException != null ? $"Inner Exception:\n{ex.InnerException.Message}\n{ex.InnerException.StackTrace}\n" : "")}";
            }
        }
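
The outer message "One or more errors occurred." is the default text of an AggregateException, so logging only the first InnerException can hide additional failures. A minimal sketch that flattens the aggregate and records every inner exception is shown below. ExceptionReport.Describe is a hypothetical helper name, and it omits stack traces for brevity.

```csharp
using System;
using System.Text;

public static class ExceptionReport
{
    // Hypothetical helper: renders an exception and, for AggregateException,
    // every inner exception after flattening nested aggregates.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"{ex.GetType().Name}: {ex.Message}");

        if (ex is AggregateException aggregate)
        {
            foreach (Exception inner in aggregate.Flatten().InnerExceptions)
            {
                sb.AppendLine($"  Inner {inner.GetType().Name}: {inner.Message}");
            }
        }
        else if (ex.InnerException != null)
        {
            sb.AppendLine($"  Inner {ex.InnerException.GetType().Name}: {ex.InnerException.Message}");
        }

        return sb.ToString();
    }
}
```

In the catch block above, string str = ExceptionReport.Describe(ex); would be a drop-in replacement for the interpolated string.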
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged label Feb 18, 2025