- Design documentation
- Setup - Docker (recommended)
- Setup - local environment (advanced)
- Testing the API (i.e. submitting tasks)
- Creating database migrations
- Coding Conventions
The design documentation file describes how the API server hands off tasks to compute backends and performs file marshalling.
Although some config defaults are provided in config.py, we do not recommend editing this file. Instead, to override a FOO configuration parameter, simply set APP_FOO as an environment variable and Flask will automatically pick it up.
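For example, to point the application at the Batch compute backend (the COMPUTE_BACKEND parameter also appears in the Docker Compose override example further below), you could set:
export APP_COMPUTE_BACKEND=batch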
The Batch backend will create a pool per TES Task submitted against the API and then run one task on the pool for each executor supplied in the TES Task.
When debugging, pin your requests to a single Batch pool by setting the DEBUG_HARDCODED_BATCH_POOL_ID parameter to your Batch pool ID. If this variable is not set, a new pool will be created for each TES task (preferred in production for data isolation).
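For instance, assuming this parameter follows the same APP_ environment-variable convention described above, a debugging run could be pinned to an existing pool like this (the pool ID is a placeholder):
export APP_DEBUG_HARDCODED_BATCH_POOL_ID=my-debug-pool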
Pools created while Flask uses the development environment will have an SSH user added automatically.
An AKS backend implementation will come at a later date.
A Docker Compose file is available to quickly get set up with the API server. If you do not have Docker, install it now.
Docker will map your local directory into the container so that local changes are automatically reflected (and trigger a reload) in the Flask application running inside the container.
To run the Flask app and all dependencies:
docker-compose up -d
The API is now accessible locally at http://localhost:5000, but additional configuration (below) is required before you submit a task.
The database is run locally as a PostgreSQL container. It uses a persistent volume, so the database only needs to be created on the initial run (or after destroying and re-creating the postgres
container). To seed the database, run:
docker-compose run --rm app flask create-db
docker-compose run --rm app flask db stamp head
The web portal will be missing styling until local assets are generated. You can do so at any time by running:
yarn install --modules-folder ./tesazure/static/node_modules
docker-compose run --rm app flask assets build
If you want to change configuration (such as mapped ports, or the Flask app configuration), you can do so by creating a Docker Compose override file, docker-compose.override.yml, with the following contents:
version: "3.7"
services:
app:
environment:
- FLASK_APP=serve.py
- FLASK_ENV=development
- APP_COMPUTE_BACKEND=batch
...
- PYTHONUNBUFFERED=1 # https://github.com/pallets/flask/issues/1420
Docker Compose will automatically read in the overrides. This avoids accidental check-in of secrets into Git.
We recommend Visual Studio Code's remote debugging feature (more details here).
You will need to set up a configuration inside .vscode/launch.json to tell Code how to connect to the debugger running inside the container. Below is a working configuration for debugging.
{
"name": "Docker + Flask (Remote Debug)",
"type": "python",
"request": "attach",
"port": 5050,
"host": "localhost",
"pathMappings": [
{"localRoot": "${workspaceFolder}", "remoteRoot": "/var/www/tes-azure"}
],
"debugOptions": [
"RedirectOutput"
]
}
Visual Studio Code will not start the containers for you, so make sure to run through the above setup steps first.
If you make a large change to the application (such as adding packages to requirements.txt), you will need to rebuild and restart the containers like this:
docker-compose build
docker-compose up -d
We highly recommend using virtual environments with pipenv. Once pipenv is installed, set up your local environment with a simple:
pipenv install -d
You will need to bring your own PostgreSQL server and configure the SQLALCHEMY_DATABASE_URI configuration variable accordingly. Copy api-server.env.sample to api-server.env and edit it:
FLASK_APP=serve.py
...
APP_SQLALCHEMY_DATABASE_URI=...
Source those environment variables into your current shell:
set -o allexport
. api-server.env
set +o allexport
Seed the database and build assets directly:
pipenv run flask create-db
pipenv run flask db stamp head
yarn install --modules-folder ./tesazure/static/node_modules
pipenv run flask assets build
Source your environment variables (see above) then kick it off with:
pipenv run flask run
The API is now available at http://localhost:5000.
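To quickly check that the server is responding, you can list tasks with curl; on a freshly seeded database this should return an empty task list (this assumes the standard TES /v1/tasks route used for task submission below):
curl http://localhost:5000/v1/tasks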
Add the following to your .vscode/launch.json file:
{
"name": "Flask (Remote Debug)",
"type": "python",
"request": "attach",
"port": 5050,
"host": "localhost",
"debugOptions": [
"RedirectOutput"
]
}
You will then need to start Flask with the ptvsd
debugger instead:
pipenv run python -m ptvsd --host 0.0.0.0 --port 5050 -m flask run --host 0.0.0.0 --port 5000 --no-debugger --no-reload
To test locally, install dependencies into your virtual environment:
pipenv install -d
Running tox will build a virtual environment to run tests and check code style.
Tox is currently configured to test against the versions of Python available on Azure DevOps CI/CD build agents. If you have a different version installed locally, simply edit tox.ini:
envlist = py37,pep8
...
basepython = python3.7
...
If you add a dependency, remember to refresh the Tox environment with pipenv run tox -r.
Run pipenv run pytest to start unit tests. A specific test file's path can be specified to run only that test. If you are debugging and want to prevent output capture (i.e. to permit print() calls in unit tests), use the -s argument.
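For example, to run only a single test file with output capture disabled (the file path is a hypothetical placeholder):
pipenv run pytest tests/test_backend_batch.py -s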
Use Postman to submit requests against the API, such as POST http://localhost:5000/v1/tasks to create a task. You will find the JSON body for sample/reference TES tasks in the resources folder.
See also the task-execution-schemas swagger documentation for more details on API endpoints and expected parameters.
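If you prefer the command line over Postman, the same request can be made with curl. The inline JSON below is only a minimal sketch of a TES task body; prefer the sample bodies from the resources folder for real testing:
curl -X POST http://localhost:5000/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{"name": "hello-world", "executors": [{"image": "ubuntu", "command": ["echo", "hello"]}]}'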
Database migrations can be auto-generated using Flask-Migrate.
Check the current migration:
pipenv run flask db current
After updating the database models, re-create the tables and auto-generate a new migration:
pipenv run flask db migrate -m "add foo table"
Note that the message gets serialized into the migration filename (e.g. somehash_add_foo_table.py
) so keep it descriptive of the changes but brief.
Upgrade an old DB to the current migration revision:
pipenv run flask db upgrade
Tell Flask that the current DB is fully upgraded:
pipenv run flask db stamp
- Imports should always be done at the top of a file. Group imports into the following three groups, alphabetically sorted:
- Standard library imports
- Third-party imports
- Application-specific imports
Within each group, all import foo lines should come first, followed by all from foo import bar. For example:
import json
import uuid
from datetime import datetime, timedelta
import azure.batch.batch_service_client as batch
import azure.batch.batch_auth as batchauth
import azure.batch.models as batchmodels
from azure.storage.blob import BlockBlobService, BlobPermissions
from flask import current_app
from .. import common as backend_common
from ... import models as tesmodels
- Prefer absolute references when reasonable (for example, datetime.timedelta instead of timedelta).
- Import top-level Azure SDK modules and reference relatively from there.
Many pieces of the code use the Azure SDKs, which can have naming conflicts (e.g. tes-azure has its own models module, as do many of the Azure sub-modules like azure.batch.models). To avoid naming conflicts, we recommend importing only top-level Azure SDK packages:
import azure.batch as azbatch
import azure.batch.batch_auth as azbatch_auth
import azure.storage as azstorage
This way models, unless otherwise noted, always refers to the TES application models. Azure SDK models can be referenced easily via e.g. azbatch.models.
With the above guidance, the first example can be cleaned up significantly:
import datetime
import uuid
import azure.batch as azbatch
import azure.batch.batch_auth as azbatch_auth
import azure.storage.blob as azblob
from flask import current_app
from .. import common as backend_common
from ... import models as tesmodels
and it becomes very clear which models are in use:
# Default state inheritance, with 'active' as QUEUED unless we get more detailed info from tasks
state_map = {
azbatch.models.JobState.active: tesmodels.TaskStatus.QUEUED,
azbatch.models.JobState.completed: tesmodels.TaskStatus.COMPLETE,
azbatch.models.JobState.deleting: tesmodels.TaskStatus.CANCELED,
azbatch.models.JobState.disabled: tesmodels.TaskStatus.PAUSED,
azbatch.models.JobState.disabling: tesmodels.TaskStatus.PAUSED,
azbatch.models.JobState.enabling: tesmodels.TaskStatus.PAUSED,
azbatch.models.JobState.terminating: tesmodels.TaskStatus.CANCELED,
}
tes_task.state = state_map.get(batch_job.state, tesmodels.TaskStatus.UNKNOWN)
Test code that needs the Flask app context should request the app fixture, and monkey-patching mocks should be done via the mocker fixture:
from tesazure.extensions import compute_backend
class TestCase:
def test_mything_scenariodescriptor(self, app, mocker):
mocked_batch_client = mocker.patch('azure.batch.batch_service_client.BatchServiceClient')
# this call to the flask extension needs app context
compute_backend.backend.foo()
Never use the mock built-in methods to create assertions; Yelp did a great write-up on why in their post assert_called_once: Threat or Menace. Essentially, mock methods will happily carry on if you make a typo or if the API changes in the future.
To ensure correctness, use assert()
checks on mock properties like mock.call_args
, mock.call_args_list
, mock.mock_calls
, and mock.call_count
.
Note that mock.mock_calls and mock.call_args return a mock._Call object, which is a wrapped tuple. The correct way to interact with it is call[0], which returns a tuple of the positional args passed (i.e. suitable for use with *args), while call[1] returns the keyword args passed (i.e. suitable for use with **kwargs).
For example:
mocked_object = mocker.patch('RestThing.client.ThingClient')
args, kwargs = mocked_object.do_something.call_args
assert(isinstance(args[0], MyClass))
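The same pattern applies to keyword arguments (the timeout keyword below is a hypothetical placeholder):
args, kwargs = mocked_object.do_something.call_args
assert(kwargs['timeout'] == 30)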
If you need to mock configuration, do not attempt to edit app.config directly. This will fail for extensions like the backend, which initialize only at app creation time. Instead, mutate the configuration via pytest decorator:
@pytest.mark.options(CONFIG_VAR='value')
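For example, here is a sketch of a test that relies on the overridden value; CONFIG_VAR, the test name and the backend call mirror the illustrative examples above and are placeholders:
import pytest

from tesazure.extensions import compute_backend

class TestCase:
    @pytest.mark.options(CONFIG_VAR='value')
    def test_mything_with_config_override(self, app, mocker):
        # the app fixture is created with CONFIG_VAR='value' already in app.config,
        # so extensions that initialize at app creation time see the override
        compute_backend.backend.foo()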
When a mocked class is instantiated (common in the Azure SDKs), be sure to use mock chaining (mocked_client.return_value
) to ensure you retrieve the Mock
instance created as a result of the instantiation. For example the BatchServiceClient
might be mocked, but running client = BatchServiceClient()
results in a new mock (that mocked_client.return_value
returns).
If you do not do mock chaining per above, call tracking behavior will fail since e.g. BatchServiceClient().job.add
is actually being called on the new mock from instantiation, not mocked_client
.
Here's a concrete example of what won't work:
def test_initialize_pool(self, app, mocker):
mocked_batch_client = mocker.patch('azure.batch.batch_service_client.BatchServiceClient')
compute_backend.backend._initializePool()
# fails, since we are operating on the mocked client class, not the Mock instance consumed by the target code
assert(mocked_batch_client.pool.add.call_count == 1)
This example does work:
def test_initialize_pool(self, app, mocker):
mocked_batch_client = mocker.patch('azure.batch.batch_service_client.BatchServiceClient')
compute_backend.backend._initializePool()
# this retrieves the new Mock instance returned by BatchServiceClient() in the target code, so it succeeds
assert(mocked_batch_client.return_value.pool.add.call_count == 1)