This document is draft technical documentation for maintain_ice_links.py. The purpose of the script
is to scan linked EDD and ICE instances for inconsistencies, then to update ICE's experiment
links to reference the EDD studies that the strains are used in. From within JBEI's network,
see SYNBIO-1190 for a more detailed discussion.
At the time of writing, this script is close to, but not yet ready for, production use. Further work on it has been deferred in favor of higher-priority work, especially because as it nears completion and the remaining bugs become less apparent, each round of testing takes ~3 hours to complete (largely unsupervised).
Notes on the remaining known work on the script have been transferred into this file so they're accessible to anyone with access to the script. Recent tests indicate that, at a minimum, the following should be done before using this script on the production databases:
- Retest legacy Perl URL pattern matching: legacy Perl-style EDD URLs no longer appear to be detected by the script, according to a test on 8/24/17 against edd-test.jbei.org.
- Resolve REST API strain resource intent: this script was initially coded to depend on the /rest/strains/ and /rest/strains/studies/ API resources, which more recently are under debate and have been temporarily disabled in EDD's initial REST API.
- Add a user prompt to confirm target URLs, as in more recent work on create_lines.py. There's currently no confirmation of the targeted deployment URLs captured in configuration files, and developers may change these for other purposes (e.g. for running create_lines.py). This is especially dangerous if the script has gone unused for a while and has new maintenance-generated bugs that could cause problems in production if used there by accident.
- Confirm URL updates are correct: one of the most recent commits to this branch was a tentative correction to the URLs it generates for EDD studies (see EddApi). The change was made only in the context of a single EddApi method to correct an extra '/', but should (or may) also result in a similar addition to maintain_ice_links so that it correctly handles its parameter related to alternate_base_url, if used elsewhere in the script. See the relevant commit.
- Resolve dry run / actual run differences in the detected number of up-to-date links / changes: it appears that since earlier tests, something has changed that broke the -dry_run feature and caused it to start actually updating links in ICE. Despite warnings in the code and documentation, it's dangerous to leave this feature broken, and fixing it may reveal other problems in the immature API / this script.
- Re-confirm summary statistics: with the exception of the known issues listed above, current statistics generated by the script are thought to be correct, and were carefully spot-checked in earlier development. Database contents have changed since that time, and the supporting EDD REST client code has also changed a lot (and much of it is only used here at present). It's worth re-confirming the statistics, particularly since initial tests of them revealed bugs that would otherwise have gone undetected.
- Re-run tests against test EDD/ICE instances (see the suggested example testing process below). Due to the intermediate state of development of both EDD and ICE during initial work on maintain_ice_links.py, most of the initial testing was done against edd-test and ice-test, which are more divergent than the production deployments, and therefore more useful for identifying problems that should be resolved by maintain_ice_links.py and for testing its features (some of which are needed, but don't yet occur in the production databases).
- Clarify / fix -update_strain_text: need to brush up memory of how EDD strains are named (e.g. if the ICE strain has an alias). This feature isn't strictly required for an initial run to add the hundreds of missing links from ICE to EDD noted during recent tests. Can potentially be revisited later.
- Clarify summary output re: ICE scan: on first look following a successful scan, it appears that some strains were skipped. In fact, an optimization was used to avoid reprocessing strains that were used in EDD.
- Update URL pattern matching: recent updates to EDD have added slug-based URLs as the default, and the script has not yet been updated to detect those URLs. Instead, the new URLs are treated as external references, which is incorrect.
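As a rough illustration of the URL pattern matching mentioned in the last item, the sketch below classifies an experiment link as a legacy, primary-key, or slug-based EDD URL, or as an external reference. The three path shapes are placeholders chosen for illustration only; they are assumptions, not EDD's actual routes, and classify_link is a hypothetical helper rather than a function in the script.

```python
import re
from urllib.parse import urlparse

# ASSUMPTION: these path shapes are illustrative placeholders, not EDD's real routes.
LEGACY_PATTERN = re.compile(r"^/study\.cgi$", re.IGNORECASE)
PK_PATTERN = re.compile(r"^/study/(?P<pk>\d+)/?$")
SLUG_PATTERN = re.compile(r"^/s/(?P<slug>[-\w]+)/?$")


def classify_link(url, edd_hostname):
    """Return 'legacy', 'pk', 'slug', 'unknown', or 'external' for a link URL."""
    parsed = urlparse(url.strip())
    # Links that point at a different host are genuine external references
    if (parsed.hostname or "").lower() != edd_hostname.lower():
        return "external"
    if LEGACY_PATTERN.match(parsed.path):
        return "legacy"
    if PK_PATTERN.match(parsed.path):
        return "pk"
    if SLUG_PATTERN.match(parsed.path):
        return "slug"
    # On the EDD host but unrecognized: report it instead of calling it external
    return "unknown"
```

Returning a separate 'unknown' value for unrecognized URLs on the EDD host is a deliberate choice in this sketch: it makes a newly introduced URL style show up in the results rather than being silently counted as an external reference.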
Options: there are many, mostly for helping to test the script in various environments or in
different stages of development. It's best to just run
python -m edd.rest.scripts.maintain_ice_links
and read the help.
Runtime: The script's runtime is heavily dependent on the amount of data in the EDD/ICE databases, as well as the speed of connections to them. As of 10/12/16, test runs on a development laptop take about an hour each, though little effort has gone into optimizing the runtime on this script. It shouldn't have to run often, and should run mostly unsupervised, so it's probably not worth the effort to optimize.
- The -update_strain_text option hasn't been fully tested at present. See EDD-XXX and ICE-XXX. Probably need some additional input on whether / how to go about this (alias?).
- Not optimized. The first pass at this script is just to get it working, and it's unclear whether optimization work will be worth the additional development time / complexity.
The script's -dry_run option is an important feature for speeding up the testing process for large
changes to the script or related REST APIs. However, it depends on wrapper classes that descend
from IceApi and EddApi. If you alter the script to use different methods of those APIs, it's
important to change the method overrides as well so you don't accidentally make database changes.
There's a reminder prompt when you run the script, but it's easy to get in the habit of cutting-and-pasting
commands that have the -no_warn option already set to hide the prompt. Use it carefully!
This option was used heavily during initial testing of the script, but is purposefully removed from
the examples below.
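A minimal sketch of the wrapper approach described above, assuming a stand-in client class. IceApiStandIn, DryRunIceApi, add_missing_link, and the link_entry_to_study / unlink_entry_from_study method names are all hypothetical and only illustrate the idea; the real IceApi and EddApi methods may differ.

```python
class IceApiStandIn:
    """Hypothetical stand-in for a REST client; not the real IceApi."""

    def __init__(self):
        self.links = {}  # entry id -> set of study URLs

    def get_links(self, entry_id):
        return sorted(self.links.get(entry_id, set()))

    def link_entry_to_study(self, entry_id, study_url):
        self.links.setdefault(entry_id, set()).add(study_url)

    def unlink_entry_from_study(self, entry_id, study_url):
        self.links.get(entry_id, set()).discard(study_url)


class DryRunIceApi(IceApiStandIn):
    """Overrides every mutator so a dry run only records what it would do."""

    def __init__(self):
        super().__init__()
        self.skipped_calls = []

    def link_entry_to_study(self, entry_id, study_url):
        self.skipped_calls.append(("link", entry_id, study_url))

    def unlink_entry_from_study(self, entry_id, study_url):
        self.skipped_calls.append(("unlink", entry_id, study_url))


def add_missing_link(ice, entry_id, study_url):
    """Script-level logic is identical for dry runs and real runs."""
    if study_url not in ice.get_links(entry_id):
        ice.link_entry_to_study(entry_id, study_url)
```

If add_missing_link were changed to call a mutator that DryRunIceApi does not override, a dry run would fall through to the parent class and modify data, which is exactly the maintenance risk described above.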
See below for sample instructions for testing maintain_ice_links.py against local
deployments of EDD and ICE. This is a general outline for the initial testing performed
before running this script on the production versions of EDD and ICE for the first time. It's
probably not optimized in every case, though it should give helpful hints on important steps /
problems encountered during some variants of the testing process. Note that the testing commands below
work, but behave a bit strangely with regard to user input when piped to tee. You might want to
run a few times without tee to figure out what's being asked for during the login process.
With current LBNL IT policy and EDD software, you won't be able to connect directly to postgres.jbei.org or to log in on your local EDD instance unless you're connected to the wired network.
This may seem like overkill, but it's very helpful to make results comparable across multiple runs while squashing bugs.
- Create the dump (ICE test database):
  pg_dump -Fp -C -E UTF8 -h postgres.jbei.org -U mark.forrer -d test_regdb -f ice_test_dump.sql
- Replace the database/user names with ice_local_test/reguser to avoid having to change the local ICE config.
- Create the dump (ICE production database):
  pg_dump -Fp -C -E UTF8 -h postgres.jbei.org -U mark.forrer -d regdb -f ice_prod_dump.sql
- Replace the database name with ice_local_test.
- Create the dump (EDD production database):
  pg_dump -Fp -C -E UTF8 -h postgres.jbei.org -U mark.forrer -d eddprod -f edd_prod_dump.sql
- Replace the database name 'eddprod' with 'edd'.
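The database rename can be scripted so that it's repeatable across dumps. The sketch below is a naive whole-word text substitution, streamed line by line so a large dump doesn't have to fit in memory. rename_in_dump is a hypothetical helper, not part of the EDD scripts, and because it also rewrites any matching word that happens to appear inside table data, the result is worth reviewing before restoring.

```python
import re


def rename_in_dump(src_path, dest_path, old_name, new_name):
    """Copy a plain-text SQL dump, replacing whole-word occurrences of old_name.

    Returns the number of replacements made.
    """
    pattern = re.compile(r"\b" + re.escape(old_name) + r"\b")
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dest_path, "w", encoding="utf-8") as dest:
        for line in src:
            new_line, n = pattern.subn(new_name, line)
            count += n
            dest.write(new_line)
    return count
```

For the EDD dump above, a call might look like rename_in_dump('edd_prod_dump.sql', 'edd_local_dump.sql', 'eddprod', 'edd'), where the output file name is an arbitrary choice.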
If EDD/ICE are newly installed in the test environment, look below at "Set Predictable State".
- Start EDD:
  cd edd
  docker-compose up -d
- Start ICE:
  cd ../ice
  mvn jetty:run
- Log in via the web interfaces to confirm your account has admin access.
  - EDD will show an 'Administration' link at top right if your account has administrator or other elevated privileges. The script doesn't currently have fine-grained checks for this, so without those privileges it may fail later (though relatively early in the process).
- ICE (SQL):
  update accounts set type = 'ADMIN' where email = '[email protected]';
- EDD (Django shell):
  from django.contrib.auth import get_user_model
  User = get_user_model()
  user = User.objects.get(username='mark.forrer')
  user.is_superuser = True
  user.save()
Use the included reset_docker_databases.sh script to simplify repeated database restores to a known state. You'll have to copy and edit reset_docker_database.conf-example to match your local configuration, then run the script to drop and restore both EDD and ICE databases.
Edit edd.rest.scripts.local_settings.py to set target deployments, for example, for local EDD/ICE instances:
LOCAL_DOCKER_EDD_URL = 'https://192.168.99.100:443'
DOCKER_CONTAINER_INTERNAL_URL = 'https://localhost:8000'
EDD_URL = LOCAL_DOCKER_EDD_URL
VERIFY_EDD_CERT = False
LOCAL_ICE_URL = 'https://localhost:8443'
ICE_TEST_URL = 'http://registry-test.jbei.org:8443'
ICE_URL = LOCAL_ICE_URL
VERIFY_ICE_CERT = False
DEFAULT_LOCALE = b'en_US.UTF-8' # override Docker container default to work in OSX
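Because these settings decide which deployments the script will modify, a confirmation step like the one proposed in the list of remaining work could echo the targets before anything runs. This is only a sketch: confirm_targets is a hypothetical helper, not part of the script, and the input_fn parameter exists just so the prompt can be exercised without a terminal.

```python
def confirm_targets(edd_url, ice_url, input_fn=input):
    """Show the configured target URLs and return True only on an explicit 'yes'."""
    print("This run will target the following deployments:")
    print("  EDD: %s" % edd_url)
    print("  ICE: %s" % ice_url)
    reply = input_fn("Type 'yes' to continue: ")
    # Anything other than a full 'yes' (including an empty reply) counts as a refusal
    return reply.strip().lower() == "yes"
```

A caller would exit before logging in when this returns False, for example by passing in the EDD_URL and ICE_URL values from the settings above.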
Doing a dry run first helps to quickly identify configuration / software syntax errors without
polluting the test databases with partial changes. If making significant changes to the script, or
following significant changes to EDD / ICE, consider using the -test_edd_strain_limit option to
test progressively larger numbers of EDD strains. Also consider using command line options to test a
single EDD strain / ICE entry, or initially omitting the -scan_ice_entries option to focus just on
the faster/most useful portion of the script that only examines strains found in EDD's database.
Always save the script's output to file, since it takes around an hour for each full run on a
development laptop.
You can help to test the -dry_run option by commenting out the lines that set the write_enabled
property for the EDD/ICE client-side API instances created just after the initial user login.
The base-level API code should raise exceptions if any real mutator methods accidentally get called
(e.g. because of easy-to-miss maintenance oversights).
python -m edd.rest.scripts.maintain_ice_links -username mark.forrer \
-dry_run -scan_ice_entries -test_edd_url https://edd.jbei.org/ 2>&1 | tee 1-dry-run.txt
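The write_enabled guard mentioned before the command above can be pictured as follows. This is a sketch under assumptions, not the actual base API code: GuardedClient, _require_write, and update_link are hypothetical names, and only the write_enabled property is taken from the text.

```python
class GuardedClient:
    """Hypothetical client whose mutators refuse to run unless writes are enabled."""

    def __init__(self):
        self.write_enabled = False  # off by default; the caller opts in explicitly
        self.sent = []

    def _require_write(self, action):
        if not self.write_enabled:
            raise RuntimeError("write_enabled is False; refusing to %s" % action)

    def update_link(self, entry_id, url):
        self._require_write("update link for entry %s" % entry_id)
        self.sent.append((entry_id, url))  # stands in for the real HTTP request
```

With a guard like this, leaving write_enabled unset during a dry run turns any mutator call that slips past the dry-run wrappers into an immediate exception rather than a silent database change.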
Do a full run of the script, and consider using a combination of grep and the summary statistics computed by the script to identify logic errors. Also consider checking for unexpected differences in summary results from the dry run mode (which is a bit brittle). Comparisons of this type nearly always turn up bugs that would otherwise go undetected.
python -m edd.rest.scripts.maintain_ice_links -username mark.forrer -scan_ice_entries \
-test_edd_url https://edd.jbei.org/ 2>&1 | tee 2-first-run.txt
Do a second full run of the script to make sure all of the changes attempted by the first run actually stuck. A lot of effort went into writing the consistency checks that cause the script to update ICE's links. Use them to help.
python -m edd.rest.scripts.maintain_ice_links -username mark.forrer -scan_ice_entries \
-test_edd_url https://edd.jbei.org/ 2>&1 | tee 3-second-run.txt
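One way to compare the saved outputs from the runs above is a plain unified diff. The sketch uses Python's standard difflib module; diff_run_logs is a hypothetical helper, and since the script's output format isn't described here, it compares whole files rather than parsing the summary statistics.

```python
import difflib


def diff_run_logs(path_a, path_b):
    """Return unified-diff lines between two saved run logs (empty if identical)."""
    with open(path_a, encoding="utf-8", errors="replace") as f:
        lines_a = f.readlines()
    with open(path_b, encoding="utf-8", errors="replace") as f:
        lines_b = f.readlines()
    return list(difflib.unified_diff(lines_a, lines_b, fromfile=path_a, tofile=path_b))
```

For example, joining and printing the result of diff_run_logs('2-first-run.txt', '3-second-run.txt') would show what changed between the two full runs. Timestamps or other per-run noise in the logs would need to be filtered out first for the diff to be useful.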
As used presently at JBEI, the EDD/ICE test database contents have diverged significantly from production as a result of testing use. The differences are a significant advantage in this case, since they exercise the script more fully by exhibiting more inconsistencies than the EDD/ICE production databases. As a result, they're better stress tests of the script. Dump files that capture the state of these databases before the script was run to correct them are saved to Google Drive.