
Download collected data #62

Merged - @nikromen merged 3 commits into fedora-copr:main from the download-data branch on Jan 8, 2024
Conversation

@nikromen (Member) commented Dec 15, 2023

Fixes #49

This solution raises two concerns for the future once we have a
lot of data collected:

  • it will take a while until the data is tarred, creating a delay before the
    actual download (after ca. 200 feedbacks the delay is already really noticeable :/)
    look at this example with only 300 (tiny!!) feedbacks:
    [root@backend persistent]# time tar -zcf results.tar.gz results/
    real 0m9.473s
    user 0m9.328s
    sys  0m0.483s

  • the download itself also takes some time

-> both of these block the whole worker while they run. IIRC we have 8 workers,
so 8 concurrent downloads and the API is unresponsive.

Solution: tar the data periodically from a cron job, so the download endpoint
only has to serve an archive that already exists (see the sketch below).

This also creates a good base for resolving #33 - the only thing needed is to
periodically download the data to somewhere.
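
For illustration, here is a minimal sketch of how a download endpoint can stream
an already-built archive chunk by chunk instead of loading it into memory
(FastAPI is what the backend uses, but the paths and names below are
illustrative, not the exact code from this PR):

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    CHUNK_SIZE = 1024 * 1024  # stream in 1 MiB chunks instead of reading the whole file

    def iter_tarball(path: str):
        # yield the archive piece by piece so the response can start immediately
        with open(path, "rb") as archive:
            while chunk := archive.read(CHUNK_SIZE):
                yield chunk

    @app.get("/download")
    def download():
        # illustrative path; assumes a cron job already produced the tarball
        return StreamingResponse(
            iter_tarball("/persistent/results.tar.gz"),
            media_type="application/gzip",
            headers={"Content-Disposition": "attachment; filename=results.tar.gz"},
        )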

@nikromen force-pushed the download-data branch 2 times, most recently from 2f1a783 to fc88a80 on December 15, 2023 00:08
@nikromen (Member, Author) commented:

Hmm, flake8 complains about backend/api.py:270:53: E702 multiple statements on one line (semicolon), but that semicolon is inside a string - this isn't an error. Running pre-commit or flake8 locally doesn't produce this error - it is perhaps some weird bug in our CI

@jpodivin (Collaborator) left a comment

Great work, except for the issue on line 270 in api.py

@nikromen force-pushed the download-data branch 2 times, most recently from 3f13b1d to d0631d6 on December 15, 2023 15:28
Review comment (outdated, resolved) on docker/cron/Dockerfile
@FrostyX (Member) left a comment

Thank you very much for the PR.
It is really good overall, but I pointed out some details worth fixing.

Review comments (outdated, resolved) on: files/cron/tar_persistent.sh,
backend/api.py, docker/cron/Dockerfile, docker/production/Dockerfile.cron,
docker/production/Dockerfile.website
@FrostyX (Member) commented Dec 19, 2023

Also, please note there is a merge conflict in backend/api.py

The only part left to do is to offer a download button on the frontend page that will call the /frontend/download endpoint - I tried to do it in Clojure but failed :D could you please implement one, @FrostyX?

Sure, I will add it in a separate PR :-)

but we will need to switch to an ASGI server - from gunicorn to uvicorn.

I thought it was recommended to use gunicorn for production
https://www.uvicorn.org/deployment/

As a general rule, you probably want to:

  • Run uvicorn --reload from the command line for local development.
  • Run gunicorn -k uvicorn.workers.UvicornWorker for production.
  • Additionally run behind Nginx for self-hosted deployments.
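
For reference, the production recommendation above amounts to a command like
this (the backend.api:app module path is only illustrative):

    gunicorn -k uvicorn.workers.UvicornWorker backend.api:app

i.e. gunicorn keeps managing the worker processes while each worker speaks ASGI
through uvicorn, so async endpoints can work without dropping gunicorn entirely.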

@TomasTomecek (Collaborator) commented:

Also, this solution raises a slight concern for the future - once we have a lot
of data collected, it will take a while until the data downloads, blocking the
whole worker during that time. IIRC we have 8 workers, so 8 concurrent downloads
and the API is unresponsive. This screams for an async/await solution - fastapi
directly supports this (just add async before the endpoint; the iterator is
already written in this commit), but we will need to switch to an ASGI server -
from gunicorn to uvicorn.

One small concern - we should probably create another container for production
(cron) instead of running it all at once with the start.sh script, but I don't
know yet how to deploy multiple containers to openshift.

Very good points, Jirka, and valid concerns.

At this stage of our prototype we need to balance between production and
prototype code. Production code is ideal but takes much more time to write. In
this case, it would take a few more days to address both of these concerns. At
the same time, your proposed solution will work well for the foreseeable
future. It just won't scale once we get big.

The best thing to do now is to make sure we track all of this debt in our backlog.

Once this is merged, please open 2 separate issues for both of these.

@nikromen (Member, Author) commented Jan 2, 2024

thanks for the reviews! PR updated, PTAL.

I thought it was recommended to use gunicorn for production

yes, it's one of the recommended setups, but since gunicorn itself is not an asynchronous server, we couldn't use async/await in the backend code with plain gunicorn if we wanted to (that's one of the features of uvicorn and fastapi) - though right now it doesn't matter, since we don't use async yet. And since we will probably use nginx in prod (#61), we can freely switch to uvicorn when needed, since uvicorn + nginx is one of the recommended production setups

Once this is merged, please open 2 separate issues for both of these.

+1. I somehow figured out how to do the multi-container solution, so only the gunicorn concern remains - but switching along with nginx (#61) shouldn't be hard to do.

@nikromen force-pushed the download-data branch 4 times, most recently from 5e337f0 to 43c9189 on January 3, 2024 17:24
@TomasTomecek (Collaborator) commented:

I spent most of my day today trying to deploy this, and unfortunately it can't
work in the current design in the openshift environment we got :/

  1. we have only a single RWO PV, which means that only a single pod can
     access it - this rules out using k8s cronjobs
  2. cronie can't work in an openshift pod because /run is not writable:
     $ oc logs log-detective-website-5887cf64f-j6jfb -c log-detective-cron
     crond: can't open or create /run/crond.pid: Permission denied

     this can't be configured: the /run location is a compile-time option

So... what can we do?

  1. Change the architecture of this feature and do the tar stuff in the def download() Python code
  2. Try to create a RWX PV and utilize k8s cronjobs
  3. ditch cronie and do while true; do sleep $one_day; $do_stuff; done
  4. I'm open to more ideas

@jpodivin (Collaborator) commented Jan 4, 2024

I spent most of my day today trying to deploy this, and unfortunately it can't
work in the current design in the openshift environment we got :/ [...]

I would argue for 1. It isn't necessarily the most elegant approach, but it's definitely the fastest to implement.
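
For context, option 1 boils down to building the archive inside the endpoint
itself. A rough sketch of that approach (paths and names illustrative, not the
merged implementation):

    import tarfile
    import tempfile

    from fastapi import FastAPI
    from fastapi.responses import FileResponse

    app = FastAPI()

    @app.get("/download")
    def download():
        # build the tarball on demand; this is the step that gets slow as the
        # amount of collected data grows (see #64)
        tmp = tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False)
        with tarfile.open(tmp.name, "w:gz") as tar:
            tar.add("/persistent/results", arcname="results")
        # a real implementation should also clean up the temporary file afterwards
        return FileResponse(tmp.name, media_type="application/gzip",
                            filename="results.tar.gz")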

@nikromen force-pushed the download-data branch 3 times, most recently from 23e94f5 to 4105ed2 on January 7, 2024 15:37
This solution raises two concerns for the future once we have a
lot of data collected:
- it will take a while until the data is tarred, creating a delay before the
  actual download (after ca. 200 feedbacks the delay is already really noticeable :/)
  look at this example with only 300 (tiny!!) feedbacks:
  [root@backend persistent]# time tar -zcf results.tar.gz results/
  real 0m9.473s
  user 0m9.328s
  sys  0m0.483s
- the download itself also takes some time

-> both of these block the whole worker while they run. IIRC we have 8 workers,
so 8 concurrent downloads and the API is unresponsive.

Solution:
- how to solve the delay before the download:
  fedora-copr#64
- the issue above is the slowest part; once that is resolved, if people
  still complain, do this:
  fedora-copr#65
@nikromen (Member, Author) commented Jan 7, 2024

I wanted to avoid solution 1 because of #64, but yes, it is the only solution I can use right now (more details in #64)

Also, this PR is already deployed on log-detective.com so you can have a look

@nikromen mentioned this pull request on Jan 7, 2024
@TomasTomecek (Collaborator) left a comment

LGTM, let's deploy and test it out :)

Makefile (outdated)
@@ -1,6 +1,9 @@
CONTAINER_ENGINE ?= $(shell command -v podman 2> /dev/null || echo docker)
A Member commented:

This doesn't look right to me. At least, the default must be docker-compose, not docker.

[jkadlcik@zeratul log-detective-website]$ make build-prod
/usr/bin/podman -f docker-compose.prod.yaml build --no-cache
Error: unknown shorthand flag: 'f' in -f
make: *** [Makefile:5: build-prod] Error 125

This works though

CONTAINER_ENGINE=docker-compose make build-prod

@nikromen (Member, Author) replied:

you are right, I forgot to add -compose to it, thanks for noticing :)
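
The fixed default presumably ends up looking something like this (a guess at
the fix based on the comment above, not the exact merged line):

    CONTAINER_ENGINE ?= $(shell command -v podman-compose 2> /dev/null || echo docker-compose)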

@nikromen (Member, Author) replied:

fixed, could you please check whether it works with docker-compose for you?

- ... creating env files (TODO: environment variables are still hardcoded
  in some places)
- moving useful tools from the base image to the result image in
  production, to have these tools available in the terminal in openshift
- adjusting the openshift deployment documentation
@nikromen (Member, Author) commented Jan 8, 2024

LGTM, let's deploy and test it out :)

already deployed at https://log-detective.com/ :)

@nikromen merged commit 574d6dd into fedora-copr:main on Jan 8, 2024. 1 check passed.
Merging this pull request closed the issue: Offer download of the raw collected data (#49)