
Download collected data #62

Merged - @nikromen merged 3 commits into fedora-copr:main from the download-data branch on Jan 8, 2024
Conversation

@nikromen (Member) commented Dec 15, 2023

Fixes #49

This solution raises two concerns for the future once we have a
lot of data collected:

  • it will take a while until the data is tarred, creating a delay before the
    actual download (after ca. 200 feedbacks the delay is already really noticeable :/)
    look at this example with only 300 (tiny!!) feedbacks:
    [root@backend persistent]# time tar -zcf results.tar.gz results/
    real 0m9.473s
    user 0m9.328s
    sys  0m0.483s

  • the download itself also takes some time

-> both of these block the whole worker while they run. IIRC we have 8 workers,
so 8 concurrent downloads and the API is unresponsive.

Solution: tar the data periodically from a cron job, so the download endpoint
only has to serve an archive that already exists (see the sketch below).

This also creates a good base for resolving #33 - the only thing needed is to
periodically download the data to somewhere.
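
For illustration, here is a minimal sketch of how a download endpoint can stream
an already-built archive chunk by chunk instead of loading it into memory
(FastAPI is what the backend uses, but the paths and names below are
illustrative, not the exact code from this PR):

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    CHUNK_SIZE = 1024 * 1024  # stream in 1 MiB chunks instead of reading the whole file

    def iter_tarball(path: str):
        # yield the archive piece by piece so the response can start immediately
        with open(path, "rb") as archive:
            while chunk := archive.read(CHUNK_SIZE):
                yield chunk

    @app.get("/download")
    def download():
        # illustrative path; assumes a cron job already produced the tarball
        return StreamingResponse(
            iter_tarball("/persistent/results.tar.gz"),
            media_type="application/gzip",
            headers={"Content-Disposition": "attachment; filename=results.tar.gz"},
        )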

@nikromen force-pushed the download-data branch 2 times, most recently from 2f1a783 to fc88a80 on December 15, 2023 00:08
@nikromen (Member, Author) commented:

Hmm, flake8 complains about backend/api.py:270:53: E702 multiple statements on one line (semicolon), but that semicolon is inside a string - this isn't an error. Running pre-commit or flake8 locally doesn't produce this error - it is perhaps some weird bug in our CI

@jpodivin (Collaborator) left a comment

Great work, except for the issue on line 270 in api.py

@nikromen force-pushed the download-data branch 2 times, most recently from 3f13b1d to d0631d6 on December 15, 2023 15:28
Review comment (outdated, resolved) on docker/cron/Dockerfile
@FrostyX (Member) left a comment

Thank you very much for the PR.
It is really good overall, but I pointed out some details worth fixing.

Review comments (outdated, resolved) on: files/cron/tar_persistent.sh,
backend/api.py, docker/cron/Dockerfile, docker/production/Dockerfile.cron,
docker/production/Dockerfile.website
@FrostyX (Member) commented Dec 19, 2023

Also, please note there is a merge conflict in backend/api.py

The only part left to do is to offer a download button on the frontend page that will call the /frontend/download endpoint - I tried to do it in Clojure but failed :D could you please implement one, @FrostyX?

Sure, I will add it in a separate PR :-)

but we will need to switch to an ASGI server - from gunicorn to uvicorn.

I thought it was recommended to use gunicorn for production
https://www.uvicorn.org/deployment/

As a general rule, you probably want to:

  • Run uvicorn --reload from the command line for local development.
  • Run gunicorn -k uvicorn.workers.UvicornWorker for production.
  • Additionally run behind Nginx for self-hosted deployments.
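
For reference, the production recommendation above amounts to a command like
this (the backend.api:app module path is only illustrative):

    gunicorn -k uvicorn.workers.UvicornWorker backend.api:app

i.e. gunicorn keeps managing the worker processes while each worker speaks ASGI
through uvicorn, so async endpoints can work without dropping gunicorn entirely.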

@TomasTomecek (Collaborator) commented:

Also, this solution raises a slight concern for the future - once we have a lot
of data collected, it will take a while until the data downloads, blocking the
whole worker during that time. IIRC we have 8 workers, so 8 concurrent downloads
and the API is unresponsive. This screams for an async/await solution - fastapi
directly supports this (just add async before the endpoint; the iterator is
already written in this commit), but we will need to switch to an ASGI server -
from gunicorn to uvicorn.

One small concern - we should probably create another container for production
(cron) instead of running it all at once with the start.sh script, but I don't
know yet how to deploy multiple containers to openshift.

Very good points, Jirka, and valid concerns.

At this stage of our prototype we need to balance between production and
prototype code. Production code is ideal but takes much more time to write. In
this case, it would take a few more days to address both of these concerns. At
the same time, your proposed solution will work well for the foreseeable
future. It just won't scale once we get big.

The best thing to do now is to make sure we track all of this debt in our backlog.

Once this is merged, please open 2 separate issues for both of these.

@nikromen (Member, Author) commented Jan 2, 2024

thanks for the reviews! PR updated, PTAL.

I thought it was recommended to use gunicorn for production

yes, it's one of the recommended setups, but since gunicorn itself is not an asynchronous server, we couldn't use async/await in the backend code with plain gunicorn if we wanted to (that's one of the features of uvicorn and fastapi) - though right now it doesn't matter, since we don't use async yet. And since we will probably use nginx in prod (#61), we can freely switch to uvicorn when needed, since uvicorn + nginx is one of the recommended production setups

Once this is merged, please open 2 separate issues for both of these.

+1. I somehow figured out how to do the multi-container solution, so only the gunicorn concern remains - but switching along with nginx (#61) shouldn't be hard to do.

@nikromen force-pushed the download-data branch 4 times, most recently from 5e337f0 to 43c9189 on January 3, 2024 17:24
@TomasTomecek (Collaborator) commented:

I spent most of my day today trying to deploy this, and unfortunately it can't
work in the current design in the openshift environment we got :/

  1. we have only a single RWO PV, which means that only a single pod can
     access it - this rules out using k8s cronjobs
  2. cronie can't work in an openshift pod because /run is not writable:
     $ oc logs log-detective-website-5887cf64f-j6jfb -c log-detective-cron
     crond: can't open or create /run/crond.pid: Permission denied

     this can't be configured: the /run location is a compile-time option

So... what can we do?

  1. Change the architecture of this feature and do the tar stuff in the def download() Python code
  2. Try to create a RWX PV and utilize k8s cronjobs
  3. ditch cronie and do while true; do sleep $one_day; $do_stuff; done
  4. I'm open to more ideas

@jpodivin (Collaborator) commented Jan 4, 2024

I spent most of my day today trying to deploy this, and unfortunately it can't
work in the current design in the openshift environment we got :/ [...]

I would argue for 1. It isn't necessarily the most elegant approach, but it's definitely the fastest to implement.
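
For context, option 1 boils down to building the archive inside the endpoint
itself. A rough sketch of that approach (paths and names illustrative, not the
merged implementation):

    import tarfile
    import tempfile

    from fastapi import FastAPI
    from fastapi.responses import FileResponse

    app = FastAPI()

    @app.get("/download")
    def download():
        # build the tarball on demand; this is the step that gets slow as the
        # amount of collected data grows (see #64)
        tmp = tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False)
        with tarfile.open(tmp.name, "w:gz") as tar:
            tar.add("/persistent/results", arcname="results")
        # a real implementation should also clean up the temporary file afterwards
        return FileResponse(tmp.name, media_type="application/gzip",
                            filename="results.tar.gz")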

@nikromen force-pushed the download-data branch 3 times, most recently from 23e94f5 to 4105ed2 on January 7, 2024 15:37
This solution raises two concerns for the future once we have a
lot of data collected:
- it will take a while until the data is tarred, creating a delay before the
  actual download (after ca. 200 feedbacks the delay is already really noticeable :/)
  look at this example with only 300 (tiny!!) feedbacks:
  [root@backend persistent]# time tar -zcf results.tar.gz results/
  real 0m9.473s
  user 0m9.328s
  sys  0m0.483s
- the download itself also takes some time

-> both of these block the whole worker while they run. IIRC we have 8 workers,
so 8 concurrent downloads and the API is unresponsive.

Solution:
- how to solve the delay before the download:
  fedora-copr#64
- the issue above is the slowest part; once that is resolved, if people
  still complain, do this:
  fedora-copr#65
@nikromen (Member, Author) commented Jan 7, 2024

I wanted to avoid solution 1 because of #64, but yes, it is the only solution I can use right now (more details in #64)

Also, this PR is already deployed on log-detective.com so you can have a look

@nikromen mentioned this pull request on Jan 7, 2024
@TomasTomecek (Collaborator) left a comment

LGTM, let's deploy and test it out :)

Makefile (outdated)
@@ -1,6 +1,9 @@
CONTAINER_ENGINE ?= $(shell command -v podman 2> /dev/null || echo docker)
A Member commented:

This doesn't look right to me. At least, the default must be docker-compose, not docker.

[jkadlcik@zeratul log-detective-website]$ make build-prod
/usr/bin/podman -f docker-compose.prod.yaml build --no-cache
Error: unknown shorthand flag: 'f' in -f
make: *** [Makefile:5: build-prod] Error 125

This works though

CONTAINER_ENGINE=docker-compose make build-prod

@nikromen (Member, Author) replied:

you are right, I forgot to add -compose to it, thanks for noticing :)
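
The fixed default presumably ends up looking something like this (a guess at
the fix based on the comment above, not the exact merged line):

    CONTAINER_ENGINE ?= $(shell command -v podman-compose 2> /dev/null || echo docker-compose)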

@nikromen (Member, Author) replied:

fixed, could you please check whether it works with docker-compose for you?

- ... creating env files (TODO: environment variables are still hardcoded
  in some places)
- moving useful tools from the base image to the result image in
  production, to have these tools available in the terminal in openshift
- adjusting the openshift deployment documentation
@nikromen (Member, Author) commented Jan 8, 2024

LGTM, let's deploy and test it out :)

already deployed at https://log-detective.com/ :)

@nikromen merged commit 574d6dd into fedora-copr:main on Jan 8, 2024. 1 check passed.
Merging this pull request closed the issue: Offer download of the raw collected data (#49)