Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 #781

Closed
wants to merge 16 commits into from
Closed

S3 #781

wants to merge 16 commits into from

Conversation

fritzdavenport
Copy link
Contributor

@fritzdavenport fritzdavenport commented Aug 3, 2020

Status

Ready for review

Description of Changes

Adds an S3-compatible API container using Minio

Fixes #171 .

Changes proposed in this pull request:

  • Adds minio to docker-compose
  • Updates app to upload to minio
  • switches docker-compose to use host network (needed to access minio from both front and backend, http://localhost:9000/bla vs http://minio/bla)
  • switches test_functional.py to access site via docker-compose rather than a pytest flask, uses docker-compose database rather than a pytest database.

Notes for Deployment

Shouldn't affect prod. Should be QA'd in staging to ensure that the normal S3 upload and viewing is unaffected.

Screenshots (if appropriate)

Screenshot from 2020-08-03 17-37-30

Tests and linting

  • I have rebased my changes on current develop

  • pytests pass in the development environment on my local machine

  • flake8 checks pass

@@ -18,11 +18,11 @@ create_db: start
done
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host network change required a change from run to exec, not entirely sure why....

if os.getenv("S3_ENDPOINT"):
return boto3.client('s3', config=config, endpoint_url=os.getenv('S3_ENDPOINT'))
else:
return boto3.client('s3', config=config)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use local minio if an env var is set, otherwise use S3 as normal

@@ -226,8 +234,22 @@ def compute_hash(data_to_hash):
return hashlib.sha256(data_to_hash).hexdigest()


def get_presigned_image_url(s3_path: str):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulled code from upload_obj_to_s3 out to it's own function

img_id = random.choice(images).id
return models.Face(officer_id=officer.id,
img_id=img_id,
original_image_id=img_id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found + fixed a bug. We only ever used the one face on the dev data, probably unintentional.

department_id=1) for x in range(5)] + \
[models.Image(
filepath=utils.get_presigned_image_url('/images/test_cop{}.png'.format(x + 1)),
department_id=2) for x in range(5)]
Copy link
Contributor Author

@fritzdavenport fritzdavenport Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the /static/... image path to the presigned url

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet, we're no longer hackily relying on two different routes between testing and production!

# shutdown server
driver.get("http://localhost:5000/shutdown")
time.sleep(3)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the special pytest flask invocation from the browser fixture (it was starting a new flask with all the defaults, including a default database connection that I couldn't override)

total = Officer.query.filter_by(department_id=dept_id).count()
perpage = BaseConfig.OFFICERS_PER_PAGE
total = list(engine.execute('select count(1) from officers where department_id = 1'))[0][0]
perpage = DevelopmentConfig.OFFICERS_PER_PAGE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly couldn't figure out how to get Officer.query to use my new engine pointing to the docker container... if anyone has an idea, lemme know.

/usr/bin/mc config host add minio http://localhost:9000 ${AWS_ACCESS_KEY_ID} ${AWS_SECRET_ACCESS_KEY};
/usr/bin/mc policy set public minio/${S3_BUCKET_NAME};
"
network_mode: "host"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this last mc container sets an access policy on the "s3" minio bucket to public, and then shuts down, like a pre-init container in kube

EXPOSE 3000

ENV PATH="/usr/src/app/geckodriver:${PATH}"
ENV SECRET_KEY 4Q6ZaQQdiqtmvZaxP1If
ENV SQLALCHEMY_DATABASE_URI postgresql://openoversight:terriblepassword@postgres/openoversight-dev
ENV SQLALCHEMY_DATABASE_URI postgresql://openoversight:terriblepassword@localhost:5432/openoversight-dev
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host network swap

Copy link
Contributor

@r4v5 r4v5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a real good start! Thank you for introducing us to this project, it has some great potential for both local dev and also self-contained production use by people hosting small departments.

I left a few inline comments.

I think CI is failing because we no longer are using the static routes and thus are invoking Boto in the tests / test-data-creation parts. You can add a fallback to the current_app.config['S3_BUCKET'] getenv() call, or specify a default in the development and testing configs and that should at least let the tests get further.

I'd have to yield to @redshiftzero on the test_functional.py changes -- she knows more about testing Flask with Selenium than I do by a large margin -- but otherwise this looks real promising.

@@ -29,6 +30,13 @@
SAVED_UMASK = os.umask(0o077)


def get_s3_connection(config=None):
if os.getenv("S3_ENDPOINT"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about moving this into app/config.py and setting a constant there, so we are pulling all our env vars from one place? Could plop it right above https://github.com/lucyparsons/OpenOversight/blob/develop/OpenOversight/app/config.py#L31 and then do a if current_app.config['S3_ENDPOINT']: here (and in the endpoint_url setting on the next line)?

def upload_obj_to_s3(file_obj, dest_filename):
s3_client = boto3.client('s3')
if os.getenv('S3_CONNECTION'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above re: os.getenv :)

return models.Face(officer_id=officer.id,
img_id=num,
original_image_id=random.choice(images).id)
# for num in range(1, len(images)): # <-- bug here, we always return on the first one, the for-loop is unneeded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# for num in range(1, len(images)): # <-- bug here, we always return on the first one, the for-loop is unneeded

Good catch!

department_id=1) for x in range(5)] + \
[models.Image(
filepath=utils.get_presigned_image_url('/images/test_cop{}.png'.format(x + 1)),
department_id=2) for x in range(5)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet, we're no longer hackily relying on two different routes between testing and production!

- postgres:/var/lib/postgresql/data
ports:
- "5432:5432"
network_mode: "host"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change all the containers to network_mode: "host"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we refer to the other docker services from both the frontend (unsigned url image lookups in HTML) and the backend (image uploading) - can't hit http://minio:1234 from the browser, and it's not exposed in a format the browser can hit otherwise. The alternative that I was considering was making a proxy route from the backend - but I think that'd introduce more "are we in testing?" conditionals in the codebase that I was trying to avoid.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I see you already mentioned that in the PR summary and I missed it, sorry.
So basically we only save one string for the URL of an image, but the backend would need a different host to connect to minio than the front-end. That makes sense!
The docker-documentation says "The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server." Would this mean that with those changes, the dev-environment is only functional on Linux? In that case, I think we should definitely explore other options

Copy link
Contributor Author

@fritzdavenport fritzdavenport Sep 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I never knew that! That sucks! I code on linux. I'll look into the proxy endpoint if that's okay? The image can be http://localhost/proxy/file123.png or something? and the backend can fetch it from S3/minio?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I never knew that! That sucks! I code on linux.

I am also on Linux, so I cannot confirm that it doesn't work, but that's what I read from the documentation.

I'll look into the proxy endpoint if that's okay? The image can be http://localhost/proxy/file123.png or something? and the backend can fetch it from S3/minio?

Sounds good to me. Thank you for investigating this!

@fritzdavenport
Copy link
Contributor Author

Can't see the @abandoned-prototype comment re: replicating the tests that failed in CI/CD, but I'll try to wipe my env clean to see if I can get the tests to fail locally. I was a bit stuck and put this down for a while due to it. I'll also clear out the merge conflicts in the next few days.

@fritzdavenport
Copy link
Contributor Author

Closing until proxy endpoint and passing tests

@abandoned-prototype
Copy link
Collaborator

abandoned-prototype commented Sep 6, 2020

Can't see the @abandoned-prototype comment re: replicating the tests that failed in CI/CD, but I'll try to wipe my env clean to see if I can get the tests to fail locally. I was a bit stuck and put this down for a while due to it. I'll also clear out the merge conflicts in the next few days.

Sorry, I added that to the issue instead: #171 (comment) So the first thing that happened locally was that the minio image wouldn't start because of missing MINIO_ACCESS_KEY so it complains that the key is too short. So I added the same default as in the web-environment for both services.

Now, if I want to upload an image it throws an error with the following stacktrace:

web_1       | 127.0.0.1 - - [06/Sep/2020 23:36:42] "POST /upload/department/1 HTTP/1.1" 500 -
web_1       | Traceback (most recent call last):
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 2309, in __call__
web_1       |     return self.wsgi_app(environ, start_response)
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 2295, in wsgi_app
web_1       |     response = self.handle_exception(e)
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1741, in handle_exception
web_1       |     reraise(exc_type, exc_value, tb)
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/_compat.py", line 35, in reraise
web_1       |     raise value
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 2292, in wsgi_app
web_1       |     response = self.full_dispatch_request()
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1815, in full_dispatch_request
web_1       |     rv = self.handle_user_exception(e)
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1718, in handle_user_exception
web_1       |     reraise(exc_type, exc_value, tb)
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/_compat.py", line 35, in reraise
web_1       |     raise value
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1813, in full_dispatch_request
web_1       |     rv = self.dispatch_request()
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1799, in dispatch_request
web_1       |     return self.view_functions[rule.endpoint](**req.view_args)
web_1       |   File "/usr/local/lib/python3.5/site-packages/flask_limiter/extension.py", line 544, in __inner
web_1       |     return obj(*a, **k)
web_1       |   File "/usr/src/app/OpenOversight/app/main/views.py", line 1009, in upload
web_1       |     image = upload_image_to_s3_and_store_in_db(file_to_upload, current_user.get_id(), department_id=department_id)
web_1       |   File "/usr/src/app/OpenOversight/app/utils.py", line 525, in upload_image_to_s3_and_store_in_db
web_1       |     url = upload_obj_to_s3(image_buf, new_filename)
web_1       |   File "/usr/src/app/OpenOversight/app/utils.py", line 250, in upload_obj_to_s3
web_1       |     s3_client = get_s3_connection(Config(signature_version='s3v4'))
web_1       |   File "/usr/src/app/OpenOversight/app/utils.py", line 35, in get_s3_connection
web_1       |     return boto3.client('s3', config=config, endpoint_url=current_app.config['S3_ENDPOINT'])
web_1       |   File "/usr/local/lib/python3.5/site-packages/boto3/__init__.py", line 83, in client
web_1       |     return _get_default_session().client(*args, **kwargs)
web_1       |   File "/usr/local/lib/python3.5/site-packages/boto3/session.py", line 263, in client
web_1       |     aws_session_token=aws_session_token, config=config)
web_1       |   File "/usr/local/lib/python3.5/site-packages/botocore/session.py", line 861, in create_client
web_1       |     client_config=config, api_version=api_version)
web_1       |   File "/usr/local/lib/python3.5/site-packages/botocore/client.py", line 76, in create_client
web_1       |     verify, credentials, scoped_config, client_config, endpoint_bridge)
web_1       |   File "/usr/local/lib/python3.5/site-packages/botocore/client.py", line 295, in _get_client_args
web_1       |     verify, credentials, scoped_config, client_config, endpoint_bridge)
web_1       |   File "/usr/local/lib/python3.5/site-packages/botocore/args.py", line 79, in get_client_args
web_1       |     timeout=(new_config.connect_timeout, new_config.read_timeout))
web_1       |   File "/usr/local/lib/python3.5/site-packages/botocore/endpoint.py", line 289, in create_endpoint
web_1       |     raise ValueError("Invalid endpoint: %s" % endpoint_url)
web_1       | ValueError: Invalid endpoint: 'http://localhost:9000'

I can access localhost:9000 from my browser. And even from within the web-container I can access localhost:9000, although on curl I only get

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied.</Message>
<Resource>/</Resource><RequestId>163256429D8FAB5F</RequestId>
<HostId>68b327b4-1cec-4960-a7d5-a2b962e2213d</HostId></Error>

@abandoned-prototype abandoned-prototype mentioned this pull request Nov 10, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mock S3 with folder in dev/testing
3 participants