Check specific (or all) buckets for files older than options.max_age. To be run from cron.
Use case: you have staging buckets that files get pushed to, and some sort of data-processing job comes along to consume them (moving or removing the files as it goes). This is a great way to monitor whether any files are left behind.
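Conceptually, the check amounts to listing a bucket's keys and comparing each object's last-modified timestamp against a cutoff. A minimal sketch of that idea using boto and python-dateutil (count_old_objects and its defaults are illustrative, not the script's actual internals):

    from datetime import datetime, timedelta

    import boto
    from dateutil import parser as dtparser
    from dateutil import tz

    def count_old_objects(bucket_name, max_age_hours=24, prefix=''):
        # boto reads credentials and settings from .boto on its own.
        conn = boto.connect_s3()
        bucket = conn.get_bucket(bucket_name)
        cutoff = datetime.now(tz.tzutc()) - timedelta(hours=max_age_hours)
        old = 0
        for key in bucket.list(prefix=prefix):
            # key.last_modified is an ISO 8601 string,
            # e.g. '2014-03-01T12:00:00.000Z'
            if dtparser.parse(key.last_modified) < cutoff:
                old += 1
        return old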
Check a bucket named 'my-files' for objects older than 48 hours:
python check-s3-age.py -b my-files -a 48 -d
Check all buckets for objects older than 1 year:
python check-s3-age.py -a 8760
Check all buckets for objects older than 24 hours (the default max age), if the object name starts with foo:
python check-s3-age.py -p foo -d
Note: the object-name prefix does not include the bucket name.
Everything goes to stdout. This is meant to be run from cron, which will mail any output to you.
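A crontab entry along these lines would run the check hourly (the schedule and the script path are placeholders, not prescribed by the script):

    # hypothetical crontab entry: check all buckets every hour
    0 * * * * python /path/to/check-s3-age.py -a 24 -d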
If the --deep (-d) option is used, the script will aggregate results per directory.
Given these objects in a bucket named foo: a, a/dir/1, a/dir/2, b/zot, b/bzzt, and foo/bar/baz/123, this is the output if they are all older than max-age:
There are 6 files more than 24 hours old in s3://foo.
The breakdown:
{ '.': 1,
  'a/dir': 2,
  'b': 2,
  'foo/bar/baz': 1
}
If you organize things into a few structured "directories," this works quite well. If --deep is not used, the script simply reports a total count (unless you're crazy enough to use --list; then it will list every matching object in S3).
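The breakdown above can be reproduced by grouping each key under its containing "directory": everything before the last slash, or '.' for top-level objects. A sketch of that aggregation (breakdown is an illustrative name, not necessarily the script's):

    import posixpath
    from collections import Counter

    def breakdown(keys):
        # Group object keys by their containing "directory".
        counts = Counter()
        for key in keys:
            counts[posixpath.dirname(key) or '.'] += 1
        return dict(counts)

    # breakdown(['a', 'a/dir/1', 'a/dir/2', 'b/zot', 'b/bzzt', 'foo/bar/baz/123'])
    # -> {'.': 1, 'a/dir': 2, 'b': 2, 'foo/bar/baz': 1}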
- config: .boto (boto's standard config/credentials file)
- non-standard packages: python-dateutil, boto
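Both packages are on PyPI, so (assuming pip is available) they can be installed with:

    pip install boto python-dateutil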