maintain_ice_links.py

This document is draft technical documentation for maintain_ice_links.py. The purpose of the script is to scan linked EDD and ICE instances for inconsistencies, then to update ICE's experiment links to reference the EDD studies that the strains are used in. From within JBEI's network, see SYNBIO-1190 for a more detailed discussion.
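At its core, the consistency check described above is a set comparison per strain. The sketch below only illustrates that idea and is not the script's actual code. It assumes the study URLs that use a strain have already been collected from EDD, and the strain's experiment-link URLs from ICE. compare_links and LinkPlan are hypothetical names. This document doesn't say whether stale links are removed or only reported, so the sketch just reports them.

```python
from dataclasses import dataclass, field


@dataclass
class LinkPlan:
    """Hypothetical per-strain result of comparing EDD studies with ICE links."""
    to_add: set = field(default_factory=set)      # EDD studies ICE doesn't link to yet
    stale: set = field(default_factory=set)       # ICE links to EDD studies that don't use the strain
    up_to_date: set = field(default_factory=set)  # links that are already consistent


def compare_links(edd_study_urls, ice_link_urls, edd_base_url):
    """Compare the EDD studies that use one strain with that strain's ICE links.

    Only ICE links under edd_base_url count as EDD links; anything else is
    treated as an external reference and ignored. URLs are compared as exact
    strings, so equivalent forms of the same study URL must be normalized first.
    """
    edd_studies = set(edd_study_urls)
    ice_edd_links = {url for url in ice_link_urls if url.startswith(edd_base_url)}
    return LinkPlan(
        to_add=edd_studies - ice_edd_links,
        stale=ice_edd_links - edd_studies,
        up_to_date=edd_studies & ice_edd_links,
    )
```

Because the comparison uses exact strings, legacy Perl-style and slug-based study URLs would have to be normalized before comparing. That is the URL pattern-matching work listed among the remaining items below.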

Remaining work before use

At the time of writing, this script is close to, but not yet ready for, production use. Further work on it has been deferred in favor of higher-priority work. One reason is that, as the script nears completion and the remaining bugs become harder to spot, each round of testing takes ~3 hours to complete (largely unsupervised).

Notes on the remaining known work have been transferred into this file so they're accessible to anyone with access to the script. Recent tests indicate that, at a minimum, the following should be done before using this script on the production databases:

  • Retest legacy Perl URL pattern matching: legacy Perl-style EDD URLs no longer appear to be detected by the script, according to a test on 8/24/17 against edd-test.jbei.org.
  • Resolve REST API strain resource intent: this script was initially coded to depend on the /rest/strains/ and /rest/strains/studies/ API resources. These resources have more recently been under debate and are temporarily disabled in EDD's initial REST API.
  • Add a user prompt to confirm target URLs, as in more recent work on create_lines.py. There's currently no confirmation of the target deployment URLs captured in configuration files, and developers may change these for other purposes (e.g. for running create_lines.py). This is especially dangerous if the script has gone unused for a while and has picked up new maintenance-generated bugs. Those bugs could cause problems in production if the script is run there by accident.
  • Confirm URL updates are correct: one of the most recent commits to this branch was a tentative correction to the URLs the script generates for EDD studies (see EddApi). The change removed an extra '/' and was made in only a single EddApi method. A similar change may also be needed in maintain_ice_links so that it correctly handles its parameter related to alternate_base_url, if that parameter is used elsewhere in the script. See the relevant commit.
  • Resolve dry run / actual run differences in the detected number of up-to-date links: changes made since earlier tests appear to have broken the -dry_run feature, causing it to start actually updating links in ICE. Despite warnings in the code and documentation, it's dangerous to leave this feature broken. Fixing it may also reveal other problems in the immature API or in this script.
  • Re-confirm summary statistics: except for the known issues listed above, the statistics the script currently generates are thought to be correct, and they were carefully spot-checked earlier in development. Database contents have changed since then, and the supporting EDD REST client code has also changed a lot (much of it is only used here at present). It's worth re-confirming these results, particularly since initial tests of the statistics revealed bugs that would otherwise have gone undetected.
  • Re-run tests against test EDD/ICE instances (see the suggested testing process below). EDD and ICE were both at intermediate stages of development during initial work on maintain_ice_links.py, so most of the initial testing was done against edd-test and ice-test. These instances are more divergent than the production deployments. That makes them more useful for identifying problems that maintain_ice_links.py should resolve, and for testing its features (some of which are needed but don't yet come into play in the production databases).
  • Clarify / fix -update_strain_text: need to refresh memory of how EDD strains are named (e.g. if an ICE strain has an alias). This feature isn't strictly required for an initial run that adds the hundreds of missing links from ICE to EDD noted during recent tests, so it can potentially be revisited later.
  • Clarify summary output re: ICE scan: at first glance after a successful scan, it appears that some strains were skipped. In fact, an optimization avoids reprocessing ICE strains that are used in EDD, because those were already handled while examining EDD's strains.
  • Update URL pattern matching: recent updates to EDD have made slug-based URLs the default, and the script hasn't yet been updated to detect them. Instead, it treats the new URLs as external references, which is incorrect.
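The two URL-matching items above both come down to recognizing every form EDD has used for a study URL. Below is a hedged sketch of that classification. The three patterns are illustrative assumptions only. The legacy Perl-style pattern (Study.cgi with a studyID query parameter) is a hypothetical placeholder, and the real patterns should be taken from EDD's URL configuration and from links actually present in ICE. classify_study_url and STUDY_URL_PATTERNS are hypothetical names.

```python
import re
from urllib.parse import urlparse

# Illustrative patterns only; confirm each against EDD's URL configuration and
# against links that actually exist in ICE before relying on them.
STUDY_URL_PATTERNS = {
    # Hypothetical legacy Perl-style form, e.g. /Study.cgi?action=Detail&studyID=7
    "legacy_perl": re.compile(r"^/Study\.cgi\?(?:.*&)?studyID=(?P<pk>\d+)"),
    # Assumed numeric primary-key form, e.g. /study/42/
    "pk": re.compile(r"^/study/(?P<pk>\d+)/?(?:\?|$)"),
    # Assumed slug-based form, e.g. /s/my-study/
    "slug": re.compile(r"^/s/(?P<slug>[-\w]+)/?(?:\?|$)"),
}


def classify_study_url(url, edd_hosts):
    """Return (kind, match) for a link found in ICE.

    kind is one of the STUDY_URL_PATTERNS keys, or "external" when the host
    isn't in edd_hosts or the path matches none of the patterns.
    """
    parsed = urlparse(url)
    if parsed.hostname not in edd_hosts:
        return "external", None
    # Match against path plus query so the legacy form can see studyID.
    target = parsed.path + ("?" + parsed.query if parsed.query else "")
    for kind, pattern in STUDY_URL_PATTERNS.items():
        match = pattern.match(target)
        if match:
            return kind, match
    return "external", None
```

Each recognized form can then be mapped to a canonical study identifier before links are compared. Only URLs that match none of the patterns would be treated as external references.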

Running the Script

Options: there are many, mostly to help test the script in various environments or at different stages of development. It's best to just run python -m edd.rest.scripts.maintain_ice_links and read the help.

Runtime: The script's runtime depends heavily on the amount of data in the EDD/ICE databases and on the speed of the connections to them. As of 10/12/16, test runs on a development laptop take about an hour each, though little effort has gone into optimizing this script. It shouldn't have to run often and should run mostly unsupervised, so optimizing it probably isn't worth the effort.

Limitations

  1. The -update_strain_text option hasn't been fully tested yet. See EDD-XXX and ICE-XXX. Some additional input is probably needed on whether / how to go about this (alias?).
  2. Not optimized. First pass at this script is just to get it working, and unclear whether optimization work will be worth the additional development time / complexity.

Maintenance Concerns

The script's -dry_run option is an important feature for speeding up testing of large changes to the script or to related REST APIs. However, it depends on wrapper classes that descend from IceApi and EddApi. If you alter the script to use different methods of those APIs, it's important to update the method overrides as well so you don't accidentally make database changes.
There's a reminder prompt when you run the script, but it's easy to get in the habit of cutting and pasting commands that already include the -no_warn option to hide the prompt. Use it carefully! This option was used heavily during initial testing of the script, but has purposely been omitted from the examples below.
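The arrangement described above can be sketched roughly as follows: mutators are guarded in the base API classes and overridden in dry-run subclasses. This is a simplified, hypothetical illustration. BaseApi, DryRunApi, link_entry_to_study, and WriteDisabledError are placeholders, not the actual IceApi/EddApi methods. Only the write_enabled property is named in this document.

```python
class WriteDisabledError(RuntimeError):
    """Raised when a mutator is called on a client that isn't write-enabled."""


class BaseApi:
    """Stand-in for IceApi/EddApi: every mutator checks write_enabled first."""

    def __init__(self):
        self.write_enabled = False  # callers must opt in to database changes

    def _require_write(self, method_name):
        if not self.write_enabled:
            raise WriteDisabledError(f"{method_name}() called without write_enabled")

    def link_entry_to_study(self, entry_id, study_url):
        self._require_write("link_entry_to_study")
        # A real client would send the HTTP request that creates the link here.
        return True


class DryRunApi(BaseApi):
    """Dry-run wrapper: overrides each mutator so nothing reaches the server.

    Any mutator the script starts using must be overridden here too;
    otherwise the base-class guard is the only remaining safety net.
    """

    def __init__(self):
        super().__init__()
        self.recorded_calls = []

    def link_entry_to_study(self, entry_id, study_url):
        # Record what would have happened instead of changing the database.
        self.recorded_calls.append(("link_entry_to_study", entry_id, study_url))
        return True
```

Leaving write_enabled off on the dry-run instances means a forgotten override fails loudly with an exception instead of silently modifying ICE.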

Testing process for maintain_ice_links.py

See below for sample instructions for testing maintain_ice_links.py against local deployments of EDD and ICE. This is a general outline of the initial testing performed before running this script on the production versions of EDD and ICE for the first time. It's probably not optimized in every case, but it should give helpful hints about important steps and about problems encountered in some variants of the testing process. Note that the testing commands below work, but behave a bit strangely with regard to user input when piped to tee. You might want to run them a few times without tee to figure out what's being asked for during the login process.

Be on the wired JBEI network

With current LBNL IT policy and EDD software, you won't be able to connect directly to postgres.jbei.org or to log in to your local EDD instance unless you're connected to the wired network.

Create reference database dumps so tests are repeatable

This may seem like overkill, but it's very helpful to make results comparable across multiple runs while squashing bugs.

Dump the ICE test database:

  • Create the dump:

    pg_dump -Fp -C -E UTF8 -h postgres.jbei.org -U mark.forrer -d test_regdb -f ice_test_dump.sql

  • Replace the database and user names with ice_local_test and reguser, respectively, to avoid having to change the local ICE config

Dump the ICE prod database:

  • Create the dump

    pg_dump -Fp -C -E UTF8 -h postgres.jbei.org -U mark.forrer -d regdb -f ice_prod_dump.sql

  • Replace the database name with ice_local_test

Dump the EDD prod database:

  • Create the dump

    pg_dump -Fp -C -E UTF8 -h postgres.jbei.org -U mark.forrer -d eddprod -f edd_prod_dump.sql

  • Replace database name 'eddprod' with 'edd'
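The replacement steps above can be done in a text editor. A blind global search-and-replace, however, risks also changing row data that happens to contain the database name. A minimal Python sketch that only touches database-level statements is below. The set of statement prefixes is an assumption based on what plain-format pg_dump -C output typically contains, so check the header of your own dump. Renaming roles (as for the ICE test dump) would need similar but separate handling. rename_database is a hypothetical helper.

```python
import re

# Assumed set of statements in a plain-format `pg_dump -C` file that name the
# database itself; check the top of your own dump for anything else.
DB_STATEMENT_PREFIXES = (
    "CREATE DATABASE ",
    "ALTER DATABASE ",
    "COMMENT ON DATABASE ",
    "\\connect ",
)


def rename_database(dump_text, old_name, new_name):
    """Rename the database in database-level statements only.

    Table data and other statements are left untouched, even if they happen
    to contain old_name. Only the first occurrence on each matching line is
    replaced, and an optional pair of double quotes around it is dropped.
    """
    name = re.compile(r'(?<![\w"])"?%s"?(?![\w"])' % re.escape(old_name))
    out = []
    for line in dump_text.splitlines(keepends=True):
        if line.startswith(DB_STATEMENT_PREFIXES):
            line = name.sub(new_name, line, count=1)
        out.append(line)
    return "".join(out)
```

For example, read edd_prod_dump.sql, call rename_database(text, 'eddprod', 'edd'), and write the result to a new file so the original dump stays intact.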

Start local EDD / ICE

If EDD/ICE are newly installed in the test environment, look below at "Set Predictable State".

  • Start EDD

    cd edd
    docker-compose up -d

  • Start ICE

    cd ../ice
    mvn jetty:run
    

Confirm admin access to ICE / EDD

  • Log in via the web interfaces to confirm your account has admin access

    • EDD will have an 'Administration' link at top right if your account has administrator or other elevated privileges. The script doesn't currently have fine-grained checks for this, so without those privileges it may fail later (though relatively early in the process).
  • ICE: if needed, grant admin access in the ICE database:

    update accounts set type = 'ADMIN' where email = '[email protected]';

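  • EDD: if needed, grant superuser access from a Django shell in the EDD environment (e.g. python manage.py shell):

    from django.contrib.auth import get_user_model
    User = get_user_model()
    user = User.objects.get(username='mark.forrer')
    user.is_superuser = True
    user.save()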
    

Set Predictable Database State

Use the included reset_docker_databases.sh script to simplify repeated database restores to a known state. You'll have to copy and edit reset_docker_database.conf-example to match your local configuration, then run the script to drop and restore both EDD and ICE databases.
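For reference, a drop-and-restore cycle amounts to something like the following. This is a hypothetical Python sketch, not the contents of reset_docker_databases.sh. The host, port, admin user, and dump file names are assumptions to adjust. The sketch relies on the dumps having been created with -C, so that the dump itself recreates the database. With EDD running in Docker, these commands may need to run inside the database container or against a port it publishes, depending on your setup.

```python
import subprocess


def restore_commands(host, port, admin_user, db_name, dump_file):
    """Build commands that drop db_name and recreate it from a `pg_dump -C` file."""
    conn = ["-h", host, "-p", str(port), "-U", admin_user]
    return [
        # Drop the database if it exists (no sessions may be connected to it).
        ["dropdb", *conn, "--if-exists", db_name],
        # The dump contains CREATE DATABASE and \connect, so start from the
        # maintenance database and stop at the first error.
        ["psql", *conn, "-d", "postgres", "-v", "ON_ERROR_STOP=1", "-f", dump_file],
    ]


def reset_database(host, port, admin_user, db_name, dump_file, execute=False):
    """Print each command, and run it only when execute is True."""
    for cmd in restore_commands(host, port, admin_user, db_name, dump_file):
        print(" ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```

Defaulting to printing rather than executing keeps an accidental run from dropping a database. The roles that own objects in the dump also have to exist on the target server before the restore.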

Configure which target deployments are searched/modified by the script

Edit edd.rest.scripts.local_settings.py to set target deployments, for example, for local EDD/ICE instances:

LOCAL_DOCKER_EDD_URL = 'https://192.168.99.100:443'
DOCKER_CONTAINER_INTERNAL_URL = 'https://localhost:8000'
EDD_URL = LOCAL_DOCKER_EDD_URL
VERIFY_EDD_CERT = False

LOCAL_ICE_URL = 'https://localhost:8443'
ICE_TEST_URL = 'http://registry-test.jbei.org:8443'
ICE_URL = LOCAL_ICE_URL
VERIFY_ICE_CERT = False

DEFAULT_LOCALE = b'en_US.UTF-8'  # override Docker container default to work in OSX
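One of the remaining-work items above notes that nothing currently asks the user to confirm these targets. A check along the following lines could run right after the settings are loaded. It's a hypothetical sketch. confirm_targets and PRODUCTION_HOSTS are illustrative names, and the hostnames are placeholders to replace with the real production deployments.

```python
from urllib.parse import urlparse

# Placeholder hostnames; replace with the real production deployments.
PRODUCTION_HOSTS = {"edd.example.org", "registry.example.org"}


def confirm_targets(edd_url, ice_url, input_fn=input):
    """Show the configured targets and require an explicit 'yes' to continue.

    Deliberately not skippable with -no_warn, since commands that include
    that option tend to get copied and pasted between environments.
    """
    print("This run will read from and may MODIFY:")
    for label, url in (("EDD", edd_url), ("ICE", ice_url)):
        host = urlparse(url).hostname
        note = "  <-- PRODUCTION" if host in PRODUCTION_HOSTS else ""
        print(f"  {label}: {url}{note}")
    answer = input_fn("Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"
```

For example, the script could exit unless confirm_targets(EDD_URL, ICE_URL) returns True. Requiring the full word "yes" makes it harder to confirm by reflexively pressing a single key.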

Run the script (dry run)

Doing a dry run first helps to quickly identify configuration or syntax errors without polluting the test databases with partial changes. If you're making significant changes to the script, or following significant changes to EDD / ICE, consider using the -test_edd_strain_limit option to test progressively larger numbers of EDD strains. Also consider using command-line options to test a single EDD strain / ICE entry. Another option is to omit -scan_ice_entries at first and focus on the faster, most useful portion of the script, which only examines strains found in EDD's database. Always save the script's output to a file, since each full run takes around an hour on a development laptop.

You can help test the -dry_run option by commenting out the lines that set the write_enabled property for the EDD/ICE client-side API instances created just after the initial user login. The base-level API code should then raise exceptions if any real mutator methods are accidentally called (e.g. because of easy-to-miss maintenance oversights).

python -m edd.rest.scripts.maintain_ice_links -username mark.forrer \
       -dry_run -scan_ice_entries -test_edd_url https://edd.jbei.org/ 2>&1 | tee 1-dry-run.txt

Run the script (actual run)

Do a full run of the script, and consider using a combination of grep and the summary statistics computed by the script to identify logic errors. Also consider checking for unexpected differences from the summary results of the dry run mode (which is a bit brittle). Comparisons of this type nearly always turn up bugs that would otherwise go undetected.

python -m edd.rest.scripts.maintain_ice_links -username mark.forrer -scan_ice_entries \
       -test_edd_url https://edd.jbei.org/ 2>&1 | tee 2-first-run.txt
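Comparing the summary sections of two saved outputs, such as 1-dry-run.txt and 2-first-run.txt, is easy to script. The sketch below assumes, hypothetically, that each summary statistic is printed on its own line in the form label: count. The regular expression will need adjusting to the script's real output format. read_summary and diff_summaries are illustrative names.

```python
import re

# Assumed format of a summary line, e.g. "ICE links added: 12".
SUMMARY_LINE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<count>\d+)\s*$")


def read_summary(lines):
    """Collect 'label: count' pairs from an output file; later labels win."""
    stats = {}
    for line in lines:
        match = SUMMARY_LINE.match(line)
        if match:
            stats[match.group("label")] = int(match.group("count"))
    return stats


def diff_summaries(first, second):
    """Return {label: (first_count, second_count)} for every label that differs.

    A label missing from one side shows up with None for that side.
    """
    labels = first.keys() | second.keys()
    return {
        label: (first.get(label), second.get(label))
        for label in sorted(labels)
        if first.get(label) != second.get(label)
    }
```

For example, open both files and call diff_summaries(read_summary(f1), read_summary(f2)); any label in the result deserves a closer look. The same comparison between 2-first-run.txt and 3-second-run.txt can help show whether the first run's updates stuck.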

Re-Run the script

Do a second full run of the script to make sure all of the changes attempted by the first run actually stuck. A lot of effort went into writing the consistency checks that cause the script to update ICE's links. Use them to help.

python -m edd.rest.scripts.maintain_ice_links -username mark.forrer -scan_ice_entries \
      -test_edd_url https://edd.jbei.org/ 2>&1 | tee 3-second-run.txt

Consider repeating tests on test databases

As used presently at JBEI, the EDD/ICE test database contents have diverged significantly from production as a result of testing use. Here the differences are a significant advantage. The test databases contain more inconsistencies than the EDD/ICE production databases, so they exercise the script more fully and make better stress tests. Dump files that capture the state of these databases before the script was run to correct them are saved to Google Drive.