Ctt 334 reconciliation branch #1

Open
wants to merge 11 commits into
base: master
from
137 changes: 137 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
# Repo specific ignores
*_customized.conf
*_customized.sql


# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Mac
.DS_Store
25 changes: 25 additions & 0 deletions CHANGELOG
@@ -1,3 +1,28 @@
v1.0.4-20241101 JP
- Exclude resource_v4 objects from backups

v1.0.3-20241101 JP
- Also back up the development Service Index database
- Also back up the development Warehouse database

v1.0.2-20231110 JP
- Fix a bunch of sbin/* file problems

v1.0.1-20231110 JP
- Enable backup crontab

v1.0.0-20231110 JP
- Fork of https://github.com/access-ci-org/Operations_Router_Management-Tools
- Rename since these tools are for managing the warehouse and not routers per se
- Primarily cleaning up the warehouse backup tool
- TODO: review/convert bin/es_reload.py

tag-0.8-20241101 JP
- Also back up the development Service Index database

tag-0.7-20241023 JP
- Add model serviceindex.misc_urls to backup script

tag-0.6-20220301 JP
- Implement sbin/database_backup*

5 changes: 1 addition & 4 deletions README
@@ -1,4 +1 @@
Scripts to manage AMIE queues and users have been removed from this package.
In ACCESS-CI, AMIE no longer uses rabbitmq queues.
The scripts can still be found in the original XSEDE repository at
https://github.com/XSEDE/Discovery_Management-Tools
Tools for managing the Information Sharing Platform Warehouse
4 changes: 2 additions & 2 deletions conf/es_reload.conf.example
@@ -2,6 +2,6 @@
"DEBUG": true,
"ELASTIC_HOSTS": ["localhost:9200"],
"LOG_LEVEL": "info",
"LOG_FILE": "/soft/warehouse-apps-1.0/Management-Tools/var/es_reload.log",
"PID_FILE": "/soft/warehouse-apps-1.0/Management-Tools/var/es_reload.pid"
"LOG_FILE": "/soft/applications-2.0/warehouse_management/var/es_reload.log",
"PID_FILE": "/soft/applications-2.0/warehouse_management/var/es_reload.pid"
}
12 changes: 0 additions & 12 deletions sbin/create_amie_source_user.sh

This file was deleted.

2 changes: 1 addition & 1 deletion sbin/database_backup.crontab
@@ -1 +1 @@
0 21 * * * /bin/bash -l -c /soft/warehouse-apps-1.0/Management-Tools/PROD/sbin/database_backup.sh
0 21 * * * /bin/bash -l -c %APP_HOME%/sbin/database_backup.sh
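
The crontab entry and the scripts in this PR carry `%APP_HOME%` and `%PYTHON_BASE%` placeholders instead of hard-coded paths. The diff does not show how the placeholders get filled in, so the following is only a sketch of one possible install-time substitution with `sed`. The function name `render_template` and the two example paths are assumptions made for illustration, not something this PR defines.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): replace the %APP_HOME% and
# %PYTHON_BASE% placeholders in a template read from stdin.
render_template() {
    local app_home="$1" python_base="$2"
    # Use '|' as the sed delimiter because the values contain '/'.
    sed -e "s|%APP_HOME%|${app_home}|g" \
        -e "s|%PYTHON_BASE%|${python_base}|g"
}

# Example: render the crontab line with assumed install locations.
echo '0 21 * * * /bin/bash -l -c %APP_HOME%/sbin/database_backup.sh' \
    | render_template /soft/applications-2.0/warehouse_management \
                      /soft/python/python-3.8.11-base
```

This sketch would break if a path contained `|` or `&`, since both are special in a `sed` replacement; a real installer would need to escape them.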
100 changes: 77 additions & 23 deletions sbin/database_backup.sh
@@ -1,38 +1,92 @@
#!/bin/bash
MY_BASE=/soft/warehouse-apps-1.0/Management-Tools

PYTHON=python3
PYTHON_BASE=/soft/python/python-3.8.11-base
###
# Run %APP_NAME%: Database backup
###

APP_NAME=database_backup
APP_HOME=%APP_HOME%

DBNAME1=warehouse2
DBHOST1=opsdb-dev.cluster-clabf5kcvwmz.us-east-2.rds.amazonaws.com
DBUSER1=info_django

DBNAME2=warehouse3
DBHOST2=opsdb-dev.cluster-clabf5kcvwmz.us-east-2.rds.amazonaws.com
DBUSER2=info_django

S3DIR=s3://backup.operations.access-ci.org/operations-api.access-ci.org/rds.backup/

# Override in shell environment
if [ -z "$PYTHON_BASE" ]; then
PYTHON_BASE=%PYTHON_BASE%
fi

####### Everything else should be standard #######

PYTHON_BIN=python3
export LD_LIBRARY_PATH=${PYTHON_BASE}/lib
PYTHON_ROOT=/soft/awscli/python
source ${PYTHON_ROOT}/bin/activate
source ${APP_HOME}/python/bin/activate

BACKUP_DIR=${MY_BASE}/backups/
BACKUP_DIR=${APP_HOME}/backups/
[ ! -d ${BACKUP_DIR} ] && mkdir ${BACKUP_DIR}

exec 1>> ${BACKUP_DIR}/database_backup.log
exec 1>> ${BACKUP_DIR}/${APP_NAME}.log
echo Starting at `date`

DBHOST=information-warehouse-prod-cluster.cluster-clabf5kcvwmz.us-east-2.rds.amazonaws.com
DATE=`date +'%s'`

/usr/pgsql-13/bin/pg_dump -h ${DBHOST} -U django_owner -n django -d warehouse \
>${BACKUP_DIR}/django.dump.${DATE}
DUMPNAME=django.${DBNAME1}.dump.${DATE}
pg_dump -h ${DBHOST1} -U ${DBUSER1} -n public -d ${DBNAME1} \
--exclude-table=public.resource_v4_resourcev4 \
--exclude-table=public.resource_v4_resourcev4local \
--exclude-table=public.resource_v4_resourcev4relation \
>${BACKUP_DIR}/${DUMPNAME}
gzip -9 ${BACKUP_DIR}/${DUMPNAME}
aws s3 cp ${BACKUP_DIR}/${DUMPNAME}.gz ${S3DIR} --only-show-errors --profile newbackup

/usr/pgsql-13/bin/pg_dump -h ${DBHOST} -U xcsr_owner -n xcsr -d warehouse \
>${BACKUP_DIR}/xcsr.dump.${DATE}
# Minimum backup without history for development environments
MINDUMPNAME=django.${DBNAME1}.mindump.${DATE}
pg_dump -h ${DBHOST1} -U ${DBUSER1} -n public -d ${DBNAME1} \
--exclude-table=public.resource_v4_resourcev4 \
--exclude-table=public.resource_v4_resourcev4local \
--exclude-table=public.resource_v4_resourcev4relation \
--exclude-table=public.glue2_entityhistory \
--exclude-table=public.warehouse_state_processingerror \
>${BACKUP_DIR}/${MINDUMPNAME}
gzip -9 ${BACKUP_DIR}/${MINDUMPNAME}
aws s3 cp ${BACKUP_DIR}/${MINDUMPNAME}.gz ${S3DIR} --only-show-errors --profile newbackup

/usr/pgsql-13/bin/pg_dump -h ${DBHOST} -U glue2_owner -n glue2 -d warehouse --exclude-table-data=glue2_db_entityhistory \
>${BACKUP_DIR}/glue2.dump.${DATE}
###

#zip all dumps to save disk
gzip -9 ${BACKUP_DIR}/django.dump.${DATE}
gzip -9 ${BACKUP_DIR}/xcsr.dump.${DATE}
gzip -9 ${BACKUP_DIR}/glue2.dump.${DATE}
DUMPNAME=django.${DBNAME2}.dump.${DATE}
pg_dump -h ${DBHOST2} -U ${DBUSER2} -n info -n info_django -d ${DBNAME2} \
--exclude-table=info.resource_v4_resourcev4 \
--exclude-table=info.resource_v4_resourcev4local \
--exclude-table=info.resource_v4_resourcev4relation \
--exclude-table=info_django.resource_v4_resourcev4 \
--exclude-table=info_django.resource_v4_resourcev4local \
--exclude-table=info_django.resource_v4_resourcev4relation \
>${BACKUP_DIR}/${DUMPNAME}
gzip -9 ${BACKUP_DIR}/${DUMPNAME}
aws s3 cp ${BACKUP_DIR}/${DUMPNAME}.gz ${S3DIR} --only-show-errors --profile newbackup

aws s3 cp ${BACKUP_DIR}/django.dump.${DATE}.gz s3://xci.xsede.org/info.xsede.org/rds.backup/ --only-show-errors --profile newbackup
aws s3 cp ${BACKUP_DIR}/xcsr.dump.${DATE}.gz s3://xci.xsede.org/info.xsede.org/rds.backup/ --only-show-errors --profile newbackup
aws s3 cp ${BACKUP_DIR}/glue2.dump.${DATE}.gz s3://xci.xsede.org/info.xsede.org/rds.backup/ --only-show-errors --profile newbackup
# Minimum backup without history for development environments
MINDUMPNAME=django.${DBNAME2}.mindump.${DATE}
pg_dump -h ${DBHOST2} -U ${DBUSER2} -n info -n info_django -d ${DBNAME2} \
--exclude-table=info.resource_v4_resourcev4 \
--exclude-table=info.resource_v4_resourcev4local \
--exclude-table=info.resource_v4_resourcev4relation \
--exclude-table=info.glue2_entityhistory \
--exclude-table=info.warehouse_state_processingerror \
--exclude-table=info_django.resource_v4_resourcev4 \
--exclude-table=info_django.resource_v4_resourcev4local \
--exclude-table=info_django.resource_v4_resourcev4relation \
--exclude-table=info_django.glue2_entityhistory \
--exclude-table=info_django.warehouse_state_processingerror \
>${BACKUP_DIR}/${MINDUMPNAME}
gzip -9 ${BACKUP_DIR}/${MINDUMPNAME}
aws s3 cp ${BACKUP_DIR}/${MINDUMPNAME}.gz ${S3DIR} --only-show-errors --profile newbackup

#aws s3 ls s3://xci.xsede.org/info.xsede.org/rds.backup/\*.${DATE} --profile newbackup
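
The four `pg_dump` calls above repeat the same `--exclude-table` flags for each schema. As a sketch only, the list could be generated from a table list and a schema list. The helper name `exclude_flags` and the array names are hypothetical, and `mapfile` assumes bash 4 or later.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): print one --exclude-table flag
# per schema/table pair. The first argument is a space-separated schema list,
# the remaining arguments are table names.
exclude_flags() {
    local schemas="$1"; shift
    local schema table
    for schema in ${schemas}; do
        for table in "$@"; do
            printf -- '--exclude-table=%s.%s\n' "${schema}" "${table}"
        done
    done
}

# Tables the script excludes from every dump.
RESOURCE_V4_TABLES=(
    resource_v4_resourcev4
    resource_v4_resourcev4local
    resource_v4_resourcev4relation
)

# Collect the flags into an array, one element per flag.
mapfile -t EXCLUDES < <(exclude_flags "info info_django" "${RESOURCE_V4_TABLES[@]}")
printf '%s\n' "${EXCLUDES[@]}"
```

The array could then be passed to the dump command as `"${EXCLUDES[@]}"`, which keeps each flag a separate argument.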

@@ -41,14 +95,14 @@ find ${BACKUP_DIR} -mtime +2 -name \*dump\* -exec rm {} \;

# Delete s3 files older than seven days
let maxage=60*60*24*7
aws s3 ls s3://xci.xsede.org/info.xsede.org/rds.backup/ --profile newbackup | awk '{print $4}' | while read filename
aws s3 ls ${S3DIR} --profile newbackup | awk '{print $4}' | while read filename
do
echo "${filename}"
# Dump names are now django.<dbname>.<kind>.<epoch>.gz, so the epoch is field 4
fileepoch="$(cut -d'.' -f4 <<<"${filename}")"
if [ -n "${fileepoch}" ] && [ "${fileepoch}" -eq "${fileepoch}" ] 2>/dev/null; then
let fileage=${DATE}-${fileepoch}
if [ "${fileage}" -gt "${maxage}" ]; then
aws s3 rm s3://xci.xsede.org/info.xsede.org/rds.backup/${filename} --profile newbackup
aws s3 rm ${S3DIR}${filename} --profile newbackup
fi
fi
done
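
The pruning loop above decides whether an S3 object is stale from the epoch embedded in its name. With the new object names (`django.<dbname>.dump.<epoch>.gz`) the epoch is the second-to-last dot-separated field. The following is a minimal sketch of that check as two standalone functions; `dump_epoch`, `is_expired`, and the fixed `NOW` value are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): print the epoch embedded in a
# dump object name, or nothing if the name has no numeric timestamp field.
dump_epoch() {
    local name="$1" field
    # Take the second-to-last dot-separated field, e.g.
    # django.warehouse2.dump.1730500000.gz -> 1730500000
    field="$(awk -F'.' 'NF >= 2 {print $(NF-1)}' <<<"${name}")"
    [[ "${field}" =~ ^[0-9]+$ ]] && echo "${field}"
    return 0
}

# Succeeds when the object is older than max_age seconds relative to now.
is_expired() {
    local name="$1" now="$2" max_age="$3" epoch
    epoch="$(dump_epoch "${name}")"
    [ -n "${epoch}" ] && [ $(( now - epoch )) -gt "${max_age}" ]
}

MAX_AGE=$(( 60 * 60 * 24 * 7 ))   # seven days, as in the script
NOW=1731200000                    # fixed value for illustration only
for f in django.warehouse2.dump.1730500000.gz \
         django.warehouse2.mindump.1731100000.gz \
         README; do
    if is_expired "${f}" "${NOW}" "${MAX_AGE}"; then
        echo "expired: ${f}"
    else
        echo "keep: ${f}"
    fi
done
```

Names without a numeric timestamp field are never reported as expired, which mirrors the script's behavior of skipping objects it cannot date.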
34 changes: 24 additions & 10 deletions sbin/es_reload.sh
@@ -1,18 +1,32 @@
#!/bin/bash

### RePublish Tool
MY_BASE=/soft/warehouse-apps-1.0/Management-Tools
###
# Run %APP_NAME%: OpenSearch reload
###

PYTHON=python3
PYTHON_BASE=/soft/python/python-3.7.7-base
PYTHON_ROOT=/soft/warehouse-apps-1.0/Management-Tools/python
source ${PYTHON_ROOT}/bin/activate
APP_NAME=es_reload
APP_HOME=%APP_HOME%

export PYTHONPATH=$DAEMON_DIR/lib:/soft/warehouse-1.0/PROD/django_xsede_warehouse
export DJANGO_CONF=/soft/warehouse-apps-1.0/Management-Tools/conf/django_xsede_warehouse.conf
export DJANGO_SETTINGS_MODULE=xsede_warehouse.settings
# Override in shell environment
if [ -z "$PYTHON_BASE" ]; then
PYTHON_BASE=%PYTHON_BASE%
fi

$MY_BASE/PROD/bin/es_reload.py -c $MY_BASE/conf/es_reload.conf "$@"
####### Everything else should be standard #######
APP_SOURCE=${APP_HOME}/PROD
APP_BIN=${APP_SOURCE}/bin/${APP_NAME}.py
APP_OPTS="-c ${APP_HOME}/conf/${APP_NAME}.conf"

PYTHON_BIN=python3
export LD_LIBRARY_PATH=${PYTHON_BASE}/lib
source ${APP_HOME}/python/bin/activate

export PYTHONPATH=${APP_SOURCE}/lib:${WAREHOUSE_DJANGO}
export APP_CONFIG=${APP_HOME}/conf/django_prod_router.conf
export DJANGO_SETTINGS_MODULE=Operations_Warehouse_Django.settings

echo "Starting: ${PYTHON_BIN} ${APP_BIN} $* ${APP_OPTS}"
${PYTHON_BIN} ${APP_BIN} "$@" ${APP_OPTS}
RETVAL=$?
echo rc=$RETVAL
exit $RETVAL
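
When a wrapper forwards its command-line arguments to another program, as `es_reload.sh` does, quoting decides whether an argument containing spaces arrives intact. The sketch below contrasts the two forms. `show_args` is a hypothetical stand-in for the real command, and the `--name` option is made up for the example.

```shell
#!/bin/bash
# Hypothetical stand-in for the wrapped command: print each argument received.
show_args() {
    local arg
    for arg in "$@"; do
        printf '[%s]\n' "${arg}"
    done
}

forward_unquoted() { show_args $@; }     # re-splits arguments on whitespace
forward_quoted()   { show_args "$@"; }   # preserves each argument as given

echo "unquoted:"
forward_unquoted --name "two words"
echo "quoted:"
forward_quoted --name "two words"
```

The unquoted form is also subject to pathname expansion, so an argument such as `*` would be replaced by the names of files in the current directory.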