support batch processing #62

Open
wants to merge 5 commits into main
Conversation

@tianshanghong commented Mar 5, 2023

Support batch processing for translation.

  • It keeps the epub format the same as the previous version, avoiding the issue mentioned in add batch func #2
  • It speeds up translation significantly. In my local test, it shortened the whole translation time of test_books/animal_farm.epub to around 4.5 to 6 minutes with --batch_size=20.
    • I have not tuned the batch size yet. It should run faster if we increase the param.
  • It adds a new flag, --batch_size. Users can set their own batch size according to their OpenAI API plan (details in OpenAI API rate limits).
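The batching idea above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `translate_one` is a hypothetical stand-in for the real API call (the real implementation calls OpenAI's chat completion API), and `batch_size` plays the role of the new `--batch_size` flag, firing that many requests concurrently instead of one at a time.

```python
import asyncio

# Hypothetical stand-in for the real translation API call;
# here it just echoes its input after yielding to the event loop.
async def translate_one(text):
    await asyncio.sleep(0)  # a real network call would await here
    return f"translated: {text}"

async def translate_batch(paragraphs, batch_size=20):
    """Translate paragraphs in concurrent chunks of batch_size."""
    results = []
    for i in range(0, len(paragraphs), batch_size):
        chunk = paragraphs[i:i + batch_size]
        # All requests in a chunk run concurrently; order is preserved,
        # which keeps the epub paragraph order intact.
        results.extend(await asyncio.gather(*(translate_one(p) for p in chunk)))
    return results

print(asyncio.run(translate_batch(["a", "b", "c"], batch_size=2)))
```

Because `asyncio.gather` returns results in submission order, the translated paragraphs can be written back into the epub in their original positions.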

@yihong0618 (Owner)

Can you document it in both README and README-CN?

@yihong0618 (Owner) commented Mar 5, 2023

I think this kind of batch will also hit the OpenAI API rate limit?
(screenshot)

@tianshanghong tianshanghong mentioned this pull request Mar 5, 2023
@yihong0618 (Owner)

And it will exit in my env.

I added -b 10. (screenshot)

@yihong0618 (Owner)

Same problem as #2; I wonder if it may be an issue for some users.

@yihong0618 (Owner) commented Mar 5, 2023

> And it will exit in my env.
>
> I added -b 10. (screenshot)

This may be caused by my using multiple keys.

(screenshot)

@yihong0618 (Owner)

After testing, it is probably a multiple-keys problem.

@tianshanghong (Author) commented Mar 5, 2023

Thanks for the reply! I have not tested it with multiple keys yet. I only have one key, set in the global OPENAI_API_KEY env variable on an Ubuntu 22.04 server.

Just ran a benchmarking test again to get more precise data:

```
$ rm -rf __pycache__/ && rm -rf test_books/.animal_farm.temp.bin && time python3 make_book.py --book_name test_books/animal_farm.epub --no_limit --language "Simplified Chinese" --batch_size 20

......
real    6m6.450s
user    0m4.659s
sys     0m0.297s
```

I will update the README files later today.

@zengzzzzz commented Mar 6, 2023

If you want to use async IO, asyncio.sleep is better than time.sleep. Also, most users still have the rate limit, so that's not good news.
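The asyncio.sleep vs. time.sleep point matters because time.sleep blocks the entire event loop, serializing every in-flight request, while asyncio.sleep suspends only the current coroutine. A small self-contained demonstration (independent of the PR's code): three workers each "sleep" 0.1 s; run concurrently with asyncio.sleep, the total is roughly 0.1 s, not 0.3 s.

```python
import asyncio
import time

async def worker(i, delay):
    # asyncio.sleep yields to the event loop, so other coroutines run.
    # time.sleep(delay) here would block the loop and serialize everything.
    await asyncio.sleep(delay)
    return i

async def main():
    start = time.monotonic()
    results = await asyncio.gather(*(worker(i, 0.1) for i in range(3)))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))  # three workers, but ~0.1 s total
```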

@tianshanghong (Author) commented Mar 6, 2023

@yihong0618 I tested with two API keys in my global env variable, but did not have the issue in your screenshot. Could you verify your API keys are valid?

@zengzzzzz

> if you want to use async io, asyncio.sleep is better than time.sleep

Sounds good. Will update this.

> and for most users they still have the limit, that's not good news

I'm a bit confused here. I assume every user with an API key gets the same rate limit from OpenAI. Did I miss anything?

@yihong0618 (Owner) commented Mar 6, 2023

Yes, one of my keys has a problem...
Can we ignore the error so the code does not fail?

@tianshanghong (Author)

@yihong0618

I did not change the error handling part of the code. But I guess I got your point after reading #14 - to skip the malformed API key automatically. Is that correct?

@tianshanghong (Author) commented Mar 6, 2023

@yihong0618 I added a lock for def get_key and fixed the retry mechanism for malformed keys. Here is my screenshot from testing.

Screenshot 2023-03-05 at 10 04 33 PM
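The lock-plus-retry idea can be sketched like this. Everything here is hypothetical illustration, not the PR's code: `KeyRing`, `discard`, and the key strings are made-up names; the point is only that concurrent batch workers must not race inside the key-rotation function, and that a key rejected by the API can be dropped so the request is retried with a remaining key.

```python
import itertools
import threading

class KeyRing:
    """Hypothetical sketch: rotate multiple API keys behind a lock."""

    def __init__(self, keys):
        self._lock = threading.Lock()
        self._keys = list(keys)
        self._cycle = itertools.cycle(self._keys)

    def get_key(self):
        # Serialize rotation so concurrent workers never interleave
        # inside the generator and each call gets exactly one key.
        with self._lock:
            return next(self._cycle)

    def discard(self, bad_key):
        # Drop a key the API rejected; callers then retry get_key().
        with self._lock:
            if bad_key in self._keys:
                self._keys.remove(bad_key)
                self._cycle = itertools.cycle(self._keys)

ring = KeyRing(["key-good", "key-bad"])
ring.discard("key-bad")   # e.g. after the API reports it as invalid
print(ring.get_key())     # remaining keys keep serving requests
```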

@yihong0618 (Owner)

Cool, will test tonight (timezone +8).

@yihong0618 (Owner) commented Mar 6, 2023

The code hangs in my env...
(screenshot)

@GOWxx commented Mar 10, 2023

Feedback

Summary

I merged #62 into the main branch on my fork at https://github.com/GOWxx/bilingual_book_maker/tree/test_batch_processing, resolved the conflicts, and used the --batch_size=100 parameter. The effect was very good, and the lemo.epub was translated in just a few seconds.

However, the same code runs normally on macOS 13.1 but not on CentOS 8.6.

Normal operation on macOS: (screenshot: macos_bilingual_book)

Not working on CentOS: (screenshot: centos_bilingual_book)

Status: (screenshot)

@GOWxx commented Mar 11, 2023

It seems to be an issue with openai.ChatCompletion.acreate. The same code can run on macOS, but not on CentOS.

(screenshots)

wayhome pushed a commit to wayhome/bilingual_book_maker that referenced this pull request Aug 29, 2024