Add failed jobs working directory cleanup as a celery periodic task #19594

Merged
37 changes: 37 additions & 0 deletions doc/source/admin/galaxy_options.rst
@@ -5747,4 +5747,41 @@
 :Type: int


+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+``enable_failed_jobs_working_directory_cleanup``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:Description:
+    Enables the cleanup of failed Galaxy jobs' working directories.
+    Runs in a Celery task.
+:Default: ``false``
+:Type: bool
+
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+``failed_jobs_working_directory_cleanup_days``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:Description:
+    The number of days to keep failed Galaxy jobs' working directories
+    before attempting to delete them if
+    enable_failed_jobs_working_directory_cleanup is ``true``. Runs in
+    a Celery task.
+:Default: ``5``
+:Type: int
+
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+``failed_jobs_working_directory_cleanup_interval``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:Description:
+    The interval in seconds between attempts to delete all failed
+    Galaxy jobs' working directories from the filesystem (every 24
+    hours by default) if enable_failed_jobs_working_directory_cleanup
+    is ``true``. Runs in a Celery task.
+:Default: ``86400``
+:Type: int
+
+

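As a rough illustration of how the three documented options relate, the sketch below models them in a small container and derives the age cutoff and the run frequency from the defaults. The `CleanupOptions` class and both helper functions are hypothetical and are not part of Galaxy; which job timestamp Galaxy compares against the cutoff is not visible in this diff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CleanupOptions:
    # Hypothetical container; field names and defaults mirror the documented options.
    enable_failed_jobs_working_directory_cleanup: bool = False
    failed_jobs_working_directory_cleanup_days: int = 5
    failed_jobs_working_directory_cleanup_interval: int = 86400


def cleanup_cutoff(options: CleanupOptions, now: datetime) -> datetime:
    """Return the moment before which a failed job counts as old enough to clean up."""
    return now - timedelta(days=options.failed_jobs_working_directory_cleanup_days)


def runs_per_day(options: CleanupOptions) -> float:
    """Return how many times per day the periodic task would fire."""
    return 86400 / options.failed_jobs_working_directory_cleanup_interval
```

With the defaults, the task would fire once a day and consider failed jobs older than five days; the interval controls how often the check happens, not how long directories are kept.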
3 changes: 3 additions & 0 deletions lib/galaxy/celery/__init__.py
@@ -246,6 +246,9 @@ def schedule_task(task, interval):
     if config.object_store_cache_monitor_driver in ["auto", "celery"]:
         schedule_task("clean_object_store_caches", config.object_store_cache_monitor_interval)

+    if config.enable_failed_jobs_working_directory_cleanup:
+        schedule_task("cleanup_jwds", config.failed_jobs_working_directory_cleanup_interval)
+
     if beat_schedule:
         celery_app.conf.beat_schedule = beat_schedule

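The hunk above registers the cleanup task only when the feature flag is set. A minimal sketch of that gating pattern follows, assuming a schedule stored in the dictionary shape Celery accepts for `beat_schedule` (an entry name mapped to `task` and `schedule` keys). The `build_beat_schedule` function, the entry naming, and the skip on non-positive intervals are assumptions for illustration; the body of Galaxy's own `schedule_task` is not shown in this diff.

```python
from types import SimpleNamespace


def build_beat_schedule(config) -> dict:
    """Hypothetical helper: collect periodic entries in Celery's beat_schedule dict shape."""
    beat_schedule = {}

    def schedule_task(task: str, interval: float) -> None:
        # Assumption: a non-positive interval means "do not schedule".
        if interval > 0:
            beat_schedule[task] = {"task": task, "schedule": interval}

    # Only register the cleanup task when the admin has opted in.
    if config.enable_failed_jobs_working_directory_cleanup:
        schedule_task("cleanup_jwds", config.failed_jobs_working_directory_cleanup_interval)

    return beat_schedule


# Stand-in for the real configuration object, used only for this example.
example_config = SimpleNamespace(
    enable_failed_jobs_working_directory_cleanup=True,
    failed_jobs_working_directory_cleanup_interval=86400,
)
example_schedule = build_beat_schedule(example_config)
```

Because the option defaults to ``false``, an unmodified configuration would produce no entry for the cleanup task.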
3 changes: 2 additions & 1 deletion lib/galaxy/celery/tasks.py
@@ -508,7 +508,7 @@ def dispatch_pending_notifications(notification_manager: NotificationManager):


 @galaxy_task(action="clean up job working directories")
-def cleanup_jwds(sa_session: galaxy_scoped_session, object_store: BaseObjectStore, days: int = 5):
+def cleanup_jwds(sa_session: galaxy_scoped_session, object_store: BaseObjectStore, config: GalaxyAppConfiguration):
     """Cleanup job working directories for failed jobs that are older than X days"""

     def get_failed_jobs():
@@ -530,6 +530,7 @@ def delete_jwd(job):
             log.error(f"Error deleting job working directory: {path} : {e.strerror}")

     failed_jobs = get_failed_jobs()
+    days = config.failed_jobs_working_directory_cleanup_days

     if not failed_jobs:
         log.info("No failed jobs found within the last %s days", days)
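The deletion step that this task performs can be sketched in isolation. The example below is a simplified stand-in, not Galaxy's implementation: it takes plain filesystem paths instead of resolving them through the object store, and the function name `delete_working_directories` is hypothetical. It mirrors the visible behaviour of logging an `OSError` rather than aborting the whole run.

```python
import logging
import shutil
from pathlib import Path
from typing import Iterable, List

log = logging.getLogger(__name__)


def delete_working_directories(paths: Iterable[Path]) -> List[Path]:
    """Remove each directory tree and return the paths that were actually deleted."""
    deleted = []
    for path in paths:
        try:
            shutil.rmtree(path)
            deleted.append(path)
        except FileNotFoundError:
            # The directory is already gone, so there is nothing to clean up.
            continue
        except OSError as e:
            # Log and keep going so one bad directory does not block the rest.
            log.error("Error deleting job working directory: %s : %s", path, e.strerror)
    return deleted
```

Handling errors per directory keeps a single permission problem from stopping the cleanup of every other failed job.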
28 changes: 22 additions & 6 deletions lib/galaxy/config/sample/galaxy.yml.sample
@@ -1,21 +1,21 @@
 # Galaxy is configured by default to be usable in a single-user development
 # environment. To tune the application for a multi-user production
 # environment, see the documentation at:
-#
+#
 # https://docs.galaxyproject.org/en/master/admin/production.html
-#
+#
 # Throughout this sample configuration file, except where stated otherwise,
 # uncommented values override the default if left unset, whereas commented
 # values are set to the default value. Relative paths are relative to the root
 # Galaxy directory.
-#
+#
 # Examples of many of these options are explained in more detail in the Galaxy
 # Community Hub.
-#
+#
 # https://galaxyproject.org/admin/config
-#
+#
 # Config hackers are encouraged to check there before asking for help.
-#
+#
 # Configuration for Gravity process manager.
 # ``uwsgi:`` section will be ignored if Galaxy is started via Gravity commands (e.g. ``./run.sh``, ``galaxy`` or ``galaxyctl``).
 gravity:
@@ -3067,3 +3067,19 @@ galaxy:
   # affects s3fs file sources.
   #file_source_listings_expiry_time: 60

+  # Enables the cleanup of failed Galaxy jobs' working directories. Runs
+  # in a Celery task.
+  #enable_failed_jobs_working_directory_cleanup: false
+
+  # The number of days to keep failed Galaxy jobs' working directories
+  # before attempting to delete them if
+  # enable_failed_jobs_working_directory_cleanup is ``true``. Runs in a
+  # Celery task.
+  #failed_jobs_working_directory_cleanup_days: 5
+
+  # The interval in seconds between attempts to delete all failed Galaxy
+  # jobs' working directories from the filesystem (every 24 hours by
+  # default) if enable_failed_jobs_working_directory_cleanup is
+  # ``true``. Runs in a Celery task.
+  #failed_jobs_working_directory_cleanup_interval: 86400
+
21 changes: 21 additions & 0 deletions lib/galaxy/config/schemas/config_schema.yml
@@ -4240,3 +4240,24 @@ mapping:
           Number of seconds before file source content listings are refreshed. Shorter times will result in more
           queries while browsing a file sources. Longer times will result in fewer requests to file sources but
           outdated contents might be displayed to the user. Currently only affects s3fs file sources.
+
+      enable_failed_jobs_working_directory_cleanup:
+        type: bool
+        default: false
+        required: false
+        desc: |
+          Enables the cleanup of failed Galaxy jobs' working directories. Runs in a Celery task.
+
+      failed_jobs_working_directory_cleanup_days:
+        type: int
+        required: false
+        default: 5
+        desc: |
+          The number of days to keep failed Galaxy jobs' working directories before attempting to delete them if enable_failed_jobs_working_directory_cleanup is ``true``. Runs in a Celery task.
+
+      failed_jobs_working_directory_cleanup_interval:
+        type: int
+        required: false
+        default: 86400
+        desc: |
+          The interval in seconds between attempts to delete all failed Galaxy jobs' working directories from the filesystem (every 24 hours by default) if enable_failed_jobs_working_directory_cleanup is ``true``. Runs in a Celery task.