Add basic memory logging #4234
Conversation
PR Summary
Added memory monitoring functionality to track and log process memory usage and CPU metrics across the application, with persistent storage and rotation capabilities.
- Added a new `celery_worker_monitoring` entry in `backend/supervisord.conf` to handle dedicated memory monitoring tasks
- Implemented `memory_monitoring.py` with a rotating file handler (10MB limit, 5 backups) and a structured logging format for memory/CPU metrics (see the sketch after this list)
- Added a persistent `log_store` volume mounted at `/var/log/persisted-logs` across all Docker Compose configurations
- Integrated the `emit_process_memory` function in indexing tasks to track worker memory usage every 60 seconds
- Added memory monitoring to both the `api_server` and `background` services in the production configuration for comprehensive coverage
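As a rough illustration of the rotating-file bullet above, here is a minimal sketch of a dedicated memory-monitoring logger; the log file name and format string are assumptions for illustration, not necessarily what the PR ships.

```python
# Minimal sketch of a dedicated memory-monitoring logger.
# The file name and format string below are assumptions for illustration.
import logging
from logging.handlers import RotatingFileHandler

memory_logger = logging.getLogger("memory_monitoring")
memory_logger.setLevel(logging.INFO)
memory_logger.propagate = False  # keep records out of the root logger's handlers

handler = RotatingFileHandler(
    "/var/log/persisted-logs/memory_monitoring.log",  # assumed path under the persisted volume
    maxBytes=10 * 1024 * 1024,  # 10MB per file
    backupCount=5,              # keep 5 rotated backups
)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)
memory_logger.addHandler(handler)
```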
10 file(s) reviewed, 6 comment(s)
Greptile
```python
# Create a dedicated logger for memory monitoring
memory_logger = logging.getLogger("memory_monitoring")
memory_logger.setLevel(logging.INFO)
```
style: Set propagate=False to prevent duplicate logs if the parent logger has handlers.
Suggested change:
```python
memory_logger.setLevel(logging.INFO)
memory_logger.propagate = False
```
```yaml
# optional, only for debugging purposes
volumes:
  - log_store:/var/log/persisted-logs
```
style: Volume mount is marked as optional/debugging but is required for memory monitoring to work. Consider removing the 'optional' comment or clarifying when it's needed.
```python
except Exception as e:
    logger.error(f"Error monitoring worker memory: {e}")
```
style: Generic exception handling masks specific issues like NoSuchProcess or AccessDenied. Consider handling psutil.Error explicitly.
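For reference, a sketch of what more specific handling could look like using psutil's process-level exceptions; the log messages here are illustrative.

```python
import psutil

try:
    process = psutil.Process(pid)
    memory_info = process.memory_info()
except psutil.NoSuchProcess:
    logger.warning(f"Process {pid} exited before memory could be sampled")
except psutil.AccessDenied:
    logger.warning(f"Insufficient permissions to inspect process {pid}")
except psutil.Error as e:
    logger.error(f"psutil error while monitoring worker memory: {e}")
```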
```python
try:
    process = psutil.Process(pid)
    memory_info = process.memory_info()
    cpu_percent = process.cpu_percent(interval=0.1)
```
style: cpu_percent() with interval=0.1 blocks execution. Consider using cpu_percent() without interval for non-blocking operation.
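A sketch of the non-blocking pattern: with interval=None, cpu_percent() returns immediately and reports usage since the previous call, so the first call acts as a priming call.

```python
import psutil

process = psutil.Process(pid)
process.cpu_percent(interval=None)  # priming call; returns 0.0 and starts the measurement window

# ... on the next monitoring tick ...
cpu_percent = process.cpu_percent(interval=None)  # usage since the previous call, no blocking
```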
```yaml
volumes:
  - log_store:/var/log/persisted-logs
```
logic: The volume mount is only added to the background service, but memory_monitoring.py suggests api_server might also need access to write logs.
```yaml
volumes:
  - log_store:/var/log/persisted-logs
```
style: Consider adding size limits to the log_store volume to prevent unbounded disk usage
```python
pid = job.process.pid
if pid is not None:
    # Only emit memory info once per minute (60 seconds)
    current_time = time.time()
```
time.monotonic() is preferable in most cases, since it isn't affected by system clock adjustments.
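A minimal sketch of the once-per-minute gate using time.monotonic(); the variable names here are illustrative rather than the PR's actual ones.

```python
import time

EMIT_INTERVAL_SECONDS = 60
last_emit_time = 0.0  # state kept across iterations of the monitoring loop

current_time = time.monotonic()
if current_time - last_emit_time >= EMIT_INTERVAL_SECONDS:
    last_emit_time = current_time
    # emit memory/CPU metrics here (e.g. via emit_process_memory)
```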
```python
# Regular application logger
logger = setup_logger()

# Set up a dedicated memory monitoring logger
```
Will there be sufficient information in the log line to distinguish between various indexing connector sources so we can zero in on what failed? Is there any intention to use this from sources other than indexing, where we would need more metadata to distinguish those from indexing metrics?
It should already be possible via the process_name param (+ additional_metadata). I'm not sure if that's what you're referring to?
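For illustration only, a hypothetical call showing how process_name and additional_metadata could tag a sample with its source; the actual signature of emit_process_memory in this PR may differ.

```python
# Hypothetical usage; parameter names beyond process_name/additional_metadata
# and the metadata keys are assumptions for illustration.
emit_process_memory(
    pid=job.process.pid,
    process_name="indexing",
    additional_metadata={"connector": "web", "index_attempt_id": 123},
)
```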
Description
Fixes https://linear.app/danswer/issue/DAN-1544/add-basic-memory-loggin
How Has This Been Tested?
Ran Docker Compose locally, verified the logs appeared in the expected directory, and verified that the logs persisted across container restarts.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes; otherwise, resolve the conflicts manually and tag the patches.