v1.0b #25
base: main
Conversation
…h:*-cuda*-cudnn*-runtime & update Dockerfile
@vzip, the latest solution launched with errors, here are full logs:
inca-smc-mlops-challenge-solution-84c9b6cf74-z5jwt.log
@rsolovev, found a problem with a wrong /dir in supervisord.conf - solved.
Amazon approved the g4dn.2xlarge instance for me; I will try to run it and optimise the number of workers in the solution.
Please run the test.
@vzip here are the logs for the latest commit:
inca-smc-mlops-challenge-solution-758765f579-pwhwd.log
The pod is running without restarts, but every curl request (even from the pod's localhost) hangs with no response. There seem to be no problems with GPU/CUDA --
root@inca-smc-mlops-challenge-solution-758765f579-pwhwd:/solution# nvidia-smi
Thu Jun 1 10:14:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 44C P0 39W / 70W | 7632MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@inca-smc-mlops-challenge-solution-758765f579-pwhwd:/solution# python
Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name(0)
'Tesla T4'
>>>
although I can't see any logs related to Redis (not even unsuccessful ones), and the Redis-related env vars are set:
root@inca-smc-mlops-challenge-solution-758765f579-pwhwd:/solution# echo "$REDIS_HOST $REDIS_PASSWORD"
inca-redis-master.default.svc.cluster.local <redacted>
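One quick way to rule Redis out from inside the pod would be a direct ping with redis-py; this is only a debugging sketch, assuming password auth and the default Redis port 6379 (not part of the solution itself):

import os
import redis  # redis-py

# Hypothetical connectivity check using the env vars already set in the pod;
# assumes the default Redis port 6379 unless REDIS_PORT overrides it.
client = redis.Redis(
    host=os.environ["REDIS_HOST"],
    password=os.environ.get("REDIS_PASSWORD"),
    port=int(os.environ.get("REDIS_PORT", 6379)),
    socket_connect_timeout=5,
)
print(client.ping())  # True if the pod can reach the Redis master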
Thank you for the run. Checked: the problem was that the server did not validate the data. Solved.
@rsolovev Added validation of input data in incoming requests; this should fix the previous issue. P.S. Launched on g4dn.2xlarge, and it looks like I can try to fit in one more cluster of workers. From some tests, only 8 workers (models) fit into 16 GB of GPU memory, but that should give better results under a large volume of tasks. The queues are laid out like 1,2,3,4,5 and 7,8,9,4,5 (where workers 4 and 5 handle tasks for both chains); I will try to pick the 2 fastest of these 5 models and put them on double duty.
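The exact validation added in this commit isn't shown here; a minimal sketch of the kind of check described (accept only a non-empty JSON-encoded string), using a hypothetical validate_payload helper:

import json

def validate_payload(raw_body: bytes) -> str:
    # Hypothetical check: the request body must be a JSON-encoded, non-empty string.
    try:
        data = json.loads(raw_body)
    except ValueError:
        raise ValueError("body is not valid JSON")
    if not isinstance(data, str) or not data.strip():
        raise ValueError("body must be a non-empty JSON string")
    return data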
@vzip thank you! The API now responds with the intended output, but the key names are a bit off -- it is expected to identify model answers by the model's author rather than the model's name. Please check this section of the readme. Without this format we won't be able to properly execute the tests, so can you please change the output? Thank you in advance.
P.S. The result I got for my curl was:
curl --request POST \
> --url http://localhost:8000/process \
> --header 'Content-Type: application/json' \
> --data '"I live in London"'
{"twitter-xlm-roberta-base-sentiment": {"score": 0.759354829788208, "label": "NEUTRAL"}, "language-detection-fine-tuned-on-xlm-roberta-base": {"score": 0.9999200105667114, "label": "English"}, "twitter-xlm-roberta-crypto-spam": {"score": 0.8439149856567383, "label": "SPAM"}, "xlm_roberta_base_multilingual_toxicity_classifier_plus": {"score": 0.9999451637268066, "label": "LABEL_0"}, "Fake-News-Bert-Detect": {"score": 0.95546954870224, "label": "LABEL_0"}}
@rsolovev The output has been changed; it is now keyed by the model's author.
@rsolovev Thank you so much for running the test. Yes, I will update it now; I have prepared a config to fill all the video memory and will compare results.
@rsolovev Please run another test; we will see what more workers can improve. P.S. Please allow 2 minutes after starting the Docker instance so that all workers are fully loaded into GPU memory before starting the test.
Question: I noticed that some participants use model optimization approaches, but the task notes that "Model's performance optimization is not allowed." From the architecture side, the bottleneck is the models and the memory they occupy. If exceptions are allowed here, please confirm: I could then fit the second group of workers completely into memory, or reduce the time to compute its answers. I see the possibility of improving results on the maximum volume of incoming requests by 2x thanks to optimization. Of course, I still have a backup plan to completely split the queues of the worker groups, but when 2 workers from the first group have to help the second group only because 2 GB of GPU memory was not enough to fit the whole group, it's a shame :)
Hey there, @vzip, we had an extensive internal debate regarding this, and a compromise we agreed on is: But please bear in mind, that there won't be any "bonus points" just for a very performant model-optimized solution, as we have various determining factors when choosing the best solution. |
@rsolovev Thank you for running the test. It is strange that I see different results on my infra, but that is OK. I swapped some workers; please run the test again, and I will keep trying to find a more efficient solution. P.S. I think that Redis not being on the same host may affect the timings; I will cut Redis out and check. But in my solution Redis provides something important: over a long run, if something crashes, the tasks stay safe and are guaranteed to be completed. Next I will test ONNX, because it changes how fast the models solve tasks.
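The crash-safety argument usually relies on moving each task onto a per-worker in-flight list so it can be recovered after a failure; a generic redis-py sketch of that pattern (list names and worker ids are made up, and this is not necessarily how the PR implements it):

import redis

r = redis.Redis(host="inca-redis-master.default.svc.cluster.local")

def pop_task(worker_id: str, timeout: int = 5):
    # Atomically move one task from the shared queue to this worker's
    # in-flight list, so a crash mid-task does not lose it (BLMOVE, Redis >= 6.2).
    return r.blmove("tasks", f"processing:{worker_id}", timeout,
                    src="LEFT", dest="RIGHT")  # bytes, or None on timeout

def ack_task(worker_id: str, raw_task: bytes):
    # Drop the task from the in-flight list once its result has been stored.
    r.lrem(f"processing:{worker_id}", 1, raw_task)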
Hi everybody. This is a first example based on Tornado for holding requests until their responses are ready, Redis with multiple DBs and pipelines, and ioredis and asyncio for the tasker, and that's all. It's pretty simple but works stably and fast. It is easy to extend the clusters of workers to add processing power for ML tasks, because only one part of the whole app builds the queue. I ran tests on a t2.xlarge AWS EC2 instance with all processing on CPU: 5 ML worker instances used a stable 8 GB of RAM, and 10 ML workers as 2 clusters made average responses about 2x faster. I plan to run tests on GPU in the next few days.
P.S. config.py needs to be updated to add settings for running more clusters together - this will be released soon.
Thank you all and have a good time!
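A minimal sketch of the request-holding pattern described above: a Tornado handler pushes the text onto a Redis queue and blocks until a worker publishes the result. The endpoint path, key names, and timeout are assumptions for illustration, not the actual solution code:

import asyncio
import json
import uuid

import redis.asyncio as aioredis
import tornado.web

redis_client = aioredis.Redis(host="localhost")

class ProcessHandler(tornado.web.RequestHandler):
    async def post(self):
        task_id = str(uuid.uuid4())
        text = json.loads(self.request.body)  # body is a JSON string, e.g. "I live in London"
        # Push the task onto the shared queue consumed by the ML workers.
        await redis_client.rpush("tasks", json.dumps({"id": task_id, "text": text}))
        # Hold the request open until a worker pushes the aggregated result.
        result = await redis_client.blpop(f"result:{task_id}", timeout=30)
        if result is None:
            raise tornado.web.HTTPError(504)
        self.set_header("Content-Type", "application/json")
        self.write(result[1])  # (key, value) tuple; value is the JSON response

def make_app():
    return tornado.web.Application([("/process", ProcessHandler)])

async def main():
    make_app().listen(8000)
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())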