Improvement: Added terminology function. #125

Open
wants to merge 10 commits into base: main

Conversation

goldengrape

I have made some changes to the translation tool that I think would benefit the project. Specifically, I have added a terminology function, which can be loaded with the `--terminology terminology_filename` command-line option. This feature may slow down translation, but it significantly improves the accuracy of translated professional terminology.
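For readers who want to picture the option, here is a minimal sketch of how a `--terminology` flag could be wired up with `argparse`. Only the flag name and its `terminology_filename` argument come from the description above; the `build_parser` helper, the help text, and the sample filename `terms.csv` are hypothetical, and the actual parser in this PR may look different.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the --terminology flag name is taken from the PR text.
    parser = argparse.ArgumentParser(description="translation tool (illustrative)")
    parser.add_argument(
        "--terminology",
        dest="terminology",
        metavar="terminology_filename",
        default=None,  # feature stays off unless a file is given
        help="path to a terminology file (may slow translation and use more tokens)",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--terminology", "terms.csv"])
print(args.terminology)
```

Defaulting to `None` keeps the feature opt-in, so users who do not pass the flag pay no extra cost.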

Please consider merging these changes into the main branch. Let me know if you have any questions or concerns.

Thank you for your time and attention.

@yihong0618
Owner

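I'll reply in Chinese. Thank you very much, but I probably won't merge this right away; I need to test it first. Sorry for the inconvenience.

A few small issues:

  1. The red X is there because CI did not pass; you need to run black as described in the README.
  2. I'd rather not use pandas here, as it is a bit heavy; you can simply read the file and parse it directly.
  3. If terminology is used, users need to be warned that it consumes more tokens.
  4. Please also add this to the README.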
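As a rough illustration of item 2, the sketch below loads a terminology table with only the standard library. The file format is an assumption: the PR conversation does not say how the terminology file is laid out, so this assumes a two-column CSV of source term and target term. The function name `load_terminology` and the sample terms are hypothetical.

```python
import csv
import io


def load_terminology(lines):
    """Parse (source term, target term) pairs from an iterable of CSV lines.

    Assumes a two-column CSV; the real file format in the PR may differ.
    """
    terms = {}
    for row in csv.reader(lines):
        # Skip blank lines and rows that do not have both columns.
        if len(row) < 2:
            continue
        source, target = row[0].strip(), row[1].strip()
        if source and target:
            terms[source] = target
    return terms


# In-memory sample standing in for a real file.
sample = io.StringIO("cornea,角膜\nretina,视网膜\n\nincomplete\n")
terminology = load_terminology(sample)
print(terminology)
```

With a real file, the same function could be called as `load_terminology(open(path, newline="", encoding="utf-8"))`, ideally inside a `with` block.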

@goldengrape
Author

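No worries, I'll keep playing with it on my own for now.
pandas is also used for batch processing and sorting when computing similarity, not only for reading the file.
I'll find a book and test how many extra tokens this uses. The embeddings use ada, whose unit price is 1/5 of ChatGPT's, but I don't know how the round trips and similar overhead will play out.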
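To make the similarity step concrete, here is a small sketch of ranking terms by cosine similarity in plain Python. It is not the code in this PR, which uses pandas for this step; the function names, the toy two-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for illustration. Real embeddings would come from the embedding API and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_terms(query_vec, term_vecs, k=2):
    """Return the k terms whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), term) for term, vec in term_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [term for _, term in scored[:k]]


# Toy vectors standing in for real embeddings.
term_vecs = {"cornea": [1.0, 0.0], "retina": [0.0, 1.0], "lens": [1.0, 1.0]}
best = top_terms([1.0, 0.1], term_vecs, k=2)
print(best)
```

Only the top-ranked terms would then need to be added to the translation prompt, which is where the extra token cost discussed above comes from.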

@ConanChou
Contributor

3 participants