Computation consistently gets stuck during the first trial in mipro #1970
Comments
Are you using Ollama? Make sure your Ollama server can handle multiple requests at once; there's a config for that in Ollama. Alternatively, just replace num_threads=16 with num_threads=1.
Thank you for getting back to me. No, I am not using Ollama. I use an Azure Databricks Spark cluster with 64 GB memory, 16 cores, and no GPU. I will run it with num_threads=1.
Could you please advise how to debug MIPROv2? How can I see where it is, what it is doing, and how it is performing? At present, it is a black box to me.
You can set verbose=True, or just paste the stage where it gets stuck here (MIPRO prints a lot of logs) and we can help.
In the stderr file on Azure Databricks, I am getting a message about ANTLR: appcds_setup elapsed time: 0.000
verbose has been True from day one. The problem is that I do not see any logs except those posted in the previous comment. When MIPRO starts the first trial (num_trials=1), everything gets stuck and nothing more is logged. What else should I enable to get more visibility? Please advise.
Hi @GaryChizm, could you share the stack trace and the console messages printed when you run your program with MIPRO?
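One way to get a stack trace out of a cell that hangs without raising is the standard library's `faulthandler` module, which can periodically dump the stack of every Python thread to a file. This is a minimal sketch, not something proposed in this thread: `start_stack_dumps` and `stop_stack_dumps` are hypothetical helper names, and writing to a file instead of `sys.stderr` is an assumption made because notebook stderr streams may not expose a real file descriptor.

```python
import faulthandler
import os
import tempfile
import time


def start_stack_dumps(path, interval_s=60.0):
    """Every `interval_s` seconds, write the stack of every Python thread to `path`.

    Returns the open file object; keep it open until stop_stack_dumps() is
    called, because faulthandler writes straight to its file descriptor.
    """
    f = open(path, "w")
    # repeat=True: keep dumping until cancel_dump_traceback_later() is called.
    # exit=False: the process keeps running after each dump.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=f, exit=False)
    return f


def stop_stack_dumps(f):
    """Cancel the periodic dumps and close the dump file."""
    faulthandler.cancel_dump_traceback_later()
    f.close()


if __name__ == "__main__":
    dump_path = os.path.join(tempfile.gettempdir(), "stack_dumps.txt")
    f = start_stack_dumps(dump_path, interval_s=0.2)
    time.sleep(0.5)  # stand-in for the long-running teleprompter.compile(...) call
    stop_stack_dumps(f)
    with open(dump_path) as dumps:
        print(dumps.read())
```

In the notebook, `start_stack_dumps(...)` would be called just before `teleprompter.compile(...)`. If the call never returns, the file keeps growing and can be read from a separate session; the innermost frames of each thread show which call it is waiting in.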
Here is the cell that has been running for over 2 hours with practically no CPU activity on the cluster:
print("This is before Configure Logging")
# Step 1: Reconfigure existing loggers to capture all events
for handler in logging.root.handlers[:]:
    ...
# Force File Creation
log_file_path = os.path.join(os.getcwd(), "execution_logs.log")
# Configure logging
logging.basicConfig(
    ...
)
logging.debug("Reconfigured logging to capture all events.")
# Step 2: Granular logging for troubleshooting
try:
    print("This is before Initialize metric and student program")
    ...
except Exception as e:
    ...
print("This is before Initialize MIPROv2")
# Step 3: Investigate resource bottlenecks
try:
    ...
# Step 4: Monitor External Dependencies
try:
    ...
# Initialize MIPROv2
try:
    ...
print("This is after Initialize MIPROv2 before final optimized program")
# Compile the final optimized program
try:
    ...
# Step 5: Test CPU utilization with dummy computation
try:
    ...
print("This is before Final logging statements")
# Final logging statements
logging.info("Execution completed successfully.")
I managed to create a log file, execution_logs.log, but it contains nothing except one comment: #Log file initialized. The stderr file on Azure Databricks contains 0 bytes.
Please note that I tried to use verbose=True and Python's logging module to enable logging. However, the output I get is very limited. Stderr:
2025/01/02 21:14:55 INFO dspy.teleprompt.mipro_optimizer_v2:
Could you please advise how to enable additional logging for MIPROv2?
I added this code:
import logging
# Configure logging to write logs to a file
logging.basicConfig(
    ...
)
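One possible reason such a log file stays empty: `logging.basicConfig()` does nothing if the root logger already has handlers, unless `force=True` is passed (Python 3.8+), and notebook environments often configure the root logger before user code runs. A minimal sketch that sidesteps this by attaching a file handler directly to a named logger, assuming DSPy's records come from loggers named under `dspy` (as the `dspy.teleprompt.mipro_optimizer_v2` line in the stderr output suggests). `attach_file_logging` is a hypothetical helper, and if DSPy sets explicit levels on its own child loggers, those would still cap what reaches the file.

```python
import logging
import os
import tempfile


def attach_file_logging(log_path, logger_name="dspy", level=logging.DEBUG):
    """Send records from `logger_name` and its child loggers to `log_path`.

    Unlike logging.basicConfig(), this does not depend on the root logger
    being unconfigured: the handler is attached to the named logger itself.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return handler


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "execution_logs.log")
    handler = attach_file_logging(path)
    # A child logger left at the default level (NOTSET) inherits DEBUG from
    # "dspy", and its records propagate up to the file handler attached there.
    logging.getLogger("dspy.teleprompt.mipro_optimizer_v2").debug("test record")
    handler.flush()
    with open(path, encoding="utf-8") as log_file:
        print(log_file.read())
```

Even with logging working, a call that is blocked (rather than busy) may simply emit nothing further, so this is best combined with a stack dump of the stuck process.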
Hi @GaryChizm, thanks for sharing this code. Could you remove all the logging code, run the base DSPy program without the MIPROv2 optimizer, and share the execution output? Essentially, just run this code cell
If this execution works, another thing I would try is configuring MIPRO with
Let me know if this works or if you run into any other issues!
Thank you, it works. Could you advise how to adjust the MIPRO parameters to get better results without overloading the cluster, since a GPU cannot be used at the moment? I am trying to improve the metrics (the results of MIPRO optimization).
Dear Sirs,
I hope this message finds you well. I am reaching out to seek your guidance regarding an issue I am encountering with the MIPROv2 optimizer in DSPy.
When running the optimizer, the computation consistently gets stuck during the first trial and never progresses to completion. The compute process continues indefinitely without any indication of errors or progress. Despite using a minimal configuration to reduce the computational load, the problem persists. Another optimizer works quite well with this data.
Here are the key settings and context of the issue:
The datasets are not very large: 20 text files (40-70k each).
Parameters are as follows:
print("This is before teleprompt import MIPROv2")
from dspy.teleprompt import MIPROv2
metric = nonlinear_score
RCR_student_program = RCR_Classifier_Pipeline(prompt_model=lm)
print("This is before teleprompter = MIPROv2(")
teleprompter = MIPROv2(
    prompt_model=lm,
    task_model=RCR_student_program,  # NOTE: MIPROv2 expects a language model (dspy.LM) here, not the program being optimized
    metric=metric_with_assertions_retry_5,
    verbose=True,
    auto="light",
    max_bootstrapped_demos=2,  # Set a limit on bootstrapped demos
    max_labeled_demos=2,  # Set a limit on labeled demos
    num_candidates=3,  # Limit the number of candidates
    num_threads=16,
    metric_threshold=0.80,  # Set a metric threshold
)
print("This is after teleprompter = MIPROv2 before teleprompter.compile")
final_optimized_program = teleprompter.compile(
    student=RCR_student_program,
    trainset=trainset,
    valset=valset,
    minibatch=True,  # Enable minibatch processing
    minibatch_size=25,  # Set minibatch size
    num_trials=1,  # Set the number of trials
    minibatch_full_eval_steps=10,  # Set the number of full evaluation steps
)
print("This is after teleprompter.compile")
Models: both prompt_model=gpt4ominiptu and task_model=mrtr-gpt-4-32k are lightweight to ensure computational efficiency. DSPy version is 2.5.43.
Environment: Azure Databricks Spark cluster, 64 GB memory, 16 cores, no GPU. Another optimizer works quite well with this configuration. Everything gets stuck at teleprompter.compile.
The stderr file contains one warning about the ANTLR parser (repeated 3-4 times): ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
I was trying to get to the root cause of the problem, but the documentation for MIPRO is not up to date. I would greatly appreciate your help in diagnosing and resolving this problem. If additional information or logs would help, please let me know, and I would be happy to provide them.
Thank you for your time and support.
Best regards,
Igor Chizhov