Merge pull request #7 from SUSYUSTC/dev
Dev
SUSYUSTC authored Mar 22, 2023
2 parents 229fe80 + 46e1f6c commit 1c46732
Showing 10 changed files with 332 additions and 174 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -3,3 +3,6 @@ example_old
**egg**
build
**pycache**
**TENCENT**
**DEFAULT**
dist/*
16 changes: 9 additions & 7 deletions README.md
@@ -24,8 +24,6 @@

<p align="center"> English | <a href="README.zh.md"> 简体中文 </a></p>

**Since Google Translate cannot be used in mainland China, we will add support for other translation engines as soon as possible.**

This project translates scientific papers with heavy math notation from any language to any language while keeping the math symbols unchanged. Most translation software cannot preserve equations, which is frustrating.
This project is based on the following two tools:
1. [mathpix](https://mathpix.com/): it provides an interface to convert text+equation images to LaTeX code. Unfortunately, it is not totally free; pricing is listed at https://mathpix.com/pricing. In future development, we will try our best to reduce the number of requests to save you money. (This project itself is 100% free and open-source!)
@@ -42,13 +40,16 @@ Here's an example of what you get finally.
Although it is currently a small project, we are aware that it has received much more attention than we expected. We are planning further development for a better user experience.

## Releases
### Mar 21, 2023
We added a Tencent translation option for users with IP addresses in mainland China.
### Mar 16, 2023
We are now supporting all operating systems! Now you can install simply by `pip install mathtranslate`.
We are now supporting all operating systems! Now you can install simply by `pip install --upgrade mathtranslate`.

## Requirements
1. A [mathpix](https://mathpix.com/) account. Unfortunately, it is not totally free. Currently the first 100 screenshots are free (an educational email is required at registration), and 5000 screenshots cost $5 per month.
2. Python3 and pip.
3. texlive (or any other tool to generate pdf from tex). For Chinese you need the CJK package.
4. (For users with an IP address in mainland China): A [Tencent translation API account](https://cloud.tencent.com/product/tmt). After registering, you can get the secret ID and secret key at the [Tencent console](https://console.cloud.tencent.com/cam/capi). To our knowledge, Tencent Translate has the highest free quota of any translation API besides Google Translate: 5 million characters per month. No fee is charged unless you manually add funds, so there is no need to worry about accidental charges.

## Installation
`pip install --upgrade mathtranslate`
@@ -57,10 +58,11 @@ We are now supporting all operating systems! Now you can install simply by `pip
1. Download mathpix. In Settings-Formatting, change "Inline math delimiters" and "Block mode delimiters" to "\\( ... \\)" and "\\[ ... \\]", respectively.
<img src="https://user-images.githubusercontent.com/30529122/225747242-07b89c34-4f16-40f9-bebc-d0c0b1c4c8e8.png" width="600">

2. Use mathpix to screenshot what you want to translate, copy the output LaTeX code, and save it in a txt file. Mathpix currently recognizes continuous text (which can be one or more paragraphs). You can also screenshot and copy multiple separate pieces of text into the same txt file; the next step automatically identifies and merges paragraphs separated by pictures or page breaks.
3. Assume the file you saved in the previous step is named `main.txt`. Run `translate_tex.py main.txt`. You will get a translated tex file `main.tex` and, if `xelatex` is installed on your machine, a corresponding pdf file `main.pdf`.
4. Since this project is small, you sometimes need to edit the final tex file slightly before it compiles.
5. The default behavior is to translate English into Chinese. To translate from/to other languages, use `translate_tex.py --list` to find the code of the language you want, then run `translate_tex.py main.txt -from <code_from> -to <code_to>`.
2. (For Tencent translation API users) Run `translate_tex.py --setkey` to store the API ID and key.
3. Use mathpix to screenshot what you want to translate, copy the output LaTeX code, and save it in a txt file. Mathpix currently recognizes continuous text (which can be one or more paragraphs). You can also screenshot and copy multiple separate pieces of text into the same txt file; the next step automatically identifies and merges paragraphs separated by pictures or page breaks.
4. Assume the file you saved in the previous step is named `main.txt`. Run `translate_tex.py main.txt`. You will get a translated tex file `main.tex` and, if `xelatex` is installed on your machine, a corresponding pdf file `main.pdf`.
5. Since this project is small, you sometimes need to edit the final tex file slightly before it compiles.
6. You can override the default translation languages and engine with the command-line arguments `-engine`, `-from`, and `-to`, for example `translate_tex.py -engine tencent main.txt`. You can also change the defaults permanently with `translate_tex.py --setdefault`. See `translate_tex.py --help` for more details.
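
The override behavior in the last step can be sketched in Python with `argparse`. This is only an illustration: the `DEFAULTS` dictionary and `build_parser` are hypothetical names, and the actual `translate_tex.py` may parse its arguments differently. Only the option names `-engine`, `-from`, and `-to` come from the text above.

```python
import argparse

# Hypothetical fallback values; the project keeps its real defaults in
# mathtranslate/config.py.
DEFAULTS = {"engine": "google", "language_from": "en", "language_to": "zh-CN"}


def build_parser():
    """Build a parser whose options fall back to DEFAULTS when omitted."""
    parser = argparse.ArgumentParser(prog="translate_tex.py")
    parser.add_argument("file", help="text file holding the mathpix output")
    parser.add_argument("-engine", default=DEFAULTS["engine"])
    # `from` is a Python keyword, so the value is stored under another name.
    parser.add_argument("-from", dest="language_from",
                        default=DEFAULTS["language_from"])
    parser.add_argument("-to", dest="language_to",
                        default=DEFAULTS["language_to"])
    return parser


# Only the engine is overridden; the languages keep their defaults.
args = build_parser().parse_args(["-engine", "tencent", "main.txt"])
print(args.engine, args.language_from, args.language_to, args.file)
# tencent en zh-CN main.txt
```

Keeping the defaults in one place means a permanent change (as `--setdefault` offers) and a one-off command-line override can share the same lookup.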

## Examples
In the example directory, you can see `main.txt` which is the mathpix output of a part of `paper.pdf`. Run `translate_tex.py main.txt` and you will get the `main.tex` and `main.pdf`. `translated.png` is what you should expect to see in the `main.pdf`.
16 changes: 9 additions & 7 deletions README.zh.md
@@ -24,8 +24,6 @@

<p align="center"> <a href="README.md">English</a> | 简体中文 </p>

**Since Google Translate cannot be used in mainland China, we will add support for other translation engines as soon as possible.**

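Most translation software cannot handle the mathematical formulas in papers well, which troubles many researchers. This project can translate scientific papers containing many mathematical formulas between any languages.
This project is based on the following two tools:
1. [mathpix](https://mathpix.com/): provides an interface that converts text+equation images to LaTeX code. Unfortunately, it is not completely free; pricing is listed at https://mathpix.com/pricing. In future development we will try to reduce the number of requests needed to save you money. (Our software itself is completely free and open-source!)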
@@ -42,13 +40,16 @@
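Although it is currently a small project, we know that it has received far more attention than we expected. We are planning more development for a better user experience.

## Releases
### Mar 21, 2023
We added a Tencent translation option for users with IP addresses in mainland China.
### Mar 16, 2023
We now support all operating systems. Installation now only requires `pip install mathtranslate`.
We now support all operating systems. Installation now only requires `pip install --upgrade mathtranslate`.

## Requirements
1. A [mathpix](https://mathpix.com/) account. Unfortunately, it is not completely free. Currently mathpix offers 100 screenshots for free (an edu email is required at registration) or 5000 screenshots for $5 per month.
2. Python3 and pip.
3. texlive (or any tool that can generate pdf from tex). Chinese output requires the CJK package.
4. (For users with IP addresses in mainland China): A [Tencent translation API](https://cloud.tencent.com/product/tmt) account. After registering, you can get the secret ID and secret key at the [Tencent console](https://console.cloud.tencent.com/cam/capi). To our knowledge, Tencent Translate is the translation API with the highest free quota besides Google Translate, with a free quota of 5 million characters per month, and no fee is charged unless you manually add funds (so there is no need to worry about accidental charges).

## Installation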
`pip install --upgrade mathtranslate`
@@ -57,10 +58,11 @@
1. Download mathpix. In Settings-Formatting, change "Inline math delimiters" and "Block mode delimiters" to "\\( ... \\)" and "\\[ ... \\]", respectively.
<img src="https://user-images.githubusercontent.com/30529122/225747242-07b89c34-4f16-40f9-bebc-d0c0b1c4c8e8.png" width="600">

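2. Use mathpix to screenshot the content you want to translate, copy the output LaTeX code, and save it in a txt file. Mathpix currently recognizes continuous text (which can be one or more paragraphs). You can also screenshot and copy several separate pieces of text in a row into the same txt file; the translation in the next step automatically identifies and merges paragraphs separated by pictures or page breaks.
3. Assume the file you saved in the previous step is named `main.txt`. Run `translate_tex.py main.txt` in this folder. You will get a translated tex file `main.tex`, and a pdf file is also generated if `xelatex` is installed on your machine.
4. Since this project is small, the final tex file sometimes needs slight changes before it compiles.
5. The default is to translate English into Chinese. If you want to use other languages, use `translate_tex.py --list` to find the code of the language you want, then run `translate_tex.py main.txt -from <code_from> -to <code_to>`
2. (For Tencent translation API users) Run `translate_tex.py --setkey` to store the API ID and key.
3. Use mathpix to screenshot the content you want to translate, copy the output LaTeX code, and save it in a txt file. Mathpix currently recognizes continuous text (which can be one or more paragraphs). You can also screenshot and copy several separate pieces of text in a row into the same txt file; the translation in the next step automatically identifies and merges paragraphs separated by pictures or page breaks.
4. Assume the file you saved in the previous step is named `main.txt`. Run `translate_tex.py main.txt` in this folder. You will get a translated tex file `main.tex`, and a pdf file is also generated if `xelatex` is installed on your machine.
5. Since this project is small, the final tex file sometimes needs slight changes before it compiles.
6. You can override the default translation languages and engine with the command-line arguments `-engine`, `-from`, and `-to`, for example `translate_tex.py -engine tencent main.txt`. You can also change the settings permanently with `translate_tex.py --setdefault`. See `translate_tex.py --help` for more details.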

## Examples
In the example directory, you can see `main.txt`, which is the mathpix output for part of `paper.pdf`. Run `translate_tex.py main.txt` and you will get `main.tex` and `main.pdf`. `translated.png` is what you should expect to see in `main.pdf`.
2 changes: 1 addition & 1 deletion example/clear.sh
@@ -1 +1 @@
rm -f ./main.aux ./main.log ./main.tex ./main.pdf
rm -f ./main.aux ./main.log ./main.tex ./main.pdf ./text_new ./text_old
6 changes: 4 additions & 2 deletions mathtranslate/__init__.py
@@ -1,7 +1,9 @@
__version__ = "1.1.3"
__version__ = "1.2.0"
__author__ = "Jiace Sun"

import os
ROOT = os.path.dirname(os.path.abspath(__file__))
from . import config
from .translate_tex import translate_tex
from . import tencent
from . import translation
from .translation import translate
50 changes: 47 additions & 3 deletions mathtranslate/config.py
@@ -1,3 +1,47 @@
default_engine = 'google'
default_language_from = 'en'
default_language_to = 'zh-CN'
import os
from . import ROOT


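def read_variable(path, default):
    # Return the stored value with spaces and newlines stripped,
    # or `default` if the file does not exist under ROOT.
    if os.path.exists(f'{ROOT}/{path}'):
        with open(f'{ROOT}/{path}') as f:
            return f.read().replace(' ', '').replace('\n', '')
    else:
        return default


def set_variable(path, default):
    # Read a value from stdin; a non-empty value is written to the file
    # under ROOT (returning None), otherwise `default` is returned.
    var = input().replace(' ', '').replace('\n', '')
    if var != '':
        with open(f'{ROOT}/{path}', 'w') as f:
            return print(var, file=f)
    else:
        return default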


def reread():
    global default_engine, default_language_from, default_language_to, tencent_secret_id, tencent_secret_key
    default_engine = read_variable(default_engine_path, default_engine_default)
    default_language_from = read_variable(default_language_from_path, default_language_from_default)
    default_language_to = read_variable(default_language_to_path, default_language_to_default)
    tencent_secret_id = read_variable(tencent_secret_id_path, tencent_secret_id_default)
    tencent_secret_key = read_variable(tencent_secret_key_path, tencent_secret_key_default)


default_engine_path = 'DEFAULT_ENGINE'
default_language_from_path = 'DEFAULT_LANGUAGE_FROM'
default_language_to_path = 'DEFAULT_LANGUAGE_TO'
tencent_secret_id_path = 'TENCENT_ID'
tencent_secret_key_path = 'TENCENT_KEY'

default_engine_default = 'google'
default_language_from_default = 'en'
default_language_to_default = 'zh-CN'
tencent_secret_id_default = None
tencent_secret_key_default = None

default_engine = read_variable(default_engine_path, default_engine_default)
default_language_from = read_variable(default_language_from_path, default_language_from_default)
default_language_to = read_variable(default_language_to_path, default_language_to_default)
tencent_secret_id = read_variable(tencent_secret_id_path, tencent_secret_id_default)
tencent_secret_key = read_variable(tencent_secret_key_path, tencent_secret_key_default)

math_code = 'XMATHX'
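
A self-contained sketch of the file-backed defaults pattern used above: each setting lives in a small file, and a missing file falls back to a built-in default. Here the helpers take an explicit `root` directory, and `write_variable` is a hypothetical non-interactive stand-in for `set_variable`, so the example runs in a temporary directory instead of the package folder.

```python
import os
import tempfile


def read_variable(root, name, default):
    """Return the stripped contents of root/name, or default if it is absent."""
    path = os.path.join(root, name)
    if os.path.exists(path):
        with open(path) as f:
            return f.read().replace(" ", "").replace("\n", "")
    return default


def write_variable(root, name, value):
    """Persist value in root/name so that later reads pick it up."""
    with open(os.path.join(root, name), "w") as f:
        print(value, file=f)


with tempfile.TemporaryDirectory() as root:
    # No file yet, so the built-in default is returned.
    before = read_variable(root, "DEFAULT_ENGINE", "google")
    write_variable(root, "DEFAULT_ENGINE", "tencent")
    # The stored value now takes precedence over the default.
    after = read_variable(root, "DEFAULT_ENGINE", "google")

print(before, after)
# google tencent
```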
19 changes: 19 additions & 0 deletions mathtranslate/tencent.py
@@ -0,0 +1,19 @@
from tencentcloud.common import credential
from tencentcloud.tmt.v20180321 import tmt_client
from .config import tencent_secret_id, tencent_secret_key, math_code


class Translator:
    def __init__(self):
        self.cred = credential.Credential(tencent_secret_id, tencent_secret_key)
        self.client = tmt_client.TmtClient(self.cred, 'ap-shanghai')

    def translate(self, text, language_to, language_from):
        request = tmt_client.models.TextTranslateRequest()
        request.Source = language_from
        request.Target = language_to
        request.SourceText = text
        request.ProjectId = 0
        request.UntranslatedText = math_code
        result = self.client.TextTranslate(request)
        return result.TargetText
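
The `UntranslatedText` field above tells the Tencent API to leave the `math_code` placeholder untouched. One way such a placeholder can be used, assuming formulas are swapped out before translation and swapped back afterwards, is sketched below. The helpers `mask_math`, `unmask_math`, and `fake_translate` are hypothetical and not taken from this commit, and the regular expression assumes the `\( ... \)` and `\[ ... \]` delimiters.

```python
import re

MATH_CODE = "XMATHX"  # same placeholder string as math_code in config.py
# Assumed delimiters: \( ... \) for inline math and \[ ... \] for display math.
MATH_PATTERN = re.compile(r"\\\(.*?\\\)|\\\[.*?\\\]", re.DOTALL)


def mask_math(text):
    """Replace every formula with MATH_CODE; return (masked_text, formulas)."""
    formulas = MATH_PATTERN.findall(text)
    return MATH_PATTERN.sub(MATH_CODE, text), formulas


def unmask_math(text, formulas):
    """Put the saved formulas back in order, one per placeholder."""
    remaining = iter(formulas)
    # A function replacement is inserted literally, so backslashes survive.
    return re.sub(MATH_CODE, lambda _match: next(remaining), text)


def fake_translate(text):
    """Stand-in for a real engine; it only upper-cases the text."""
    return text.upper()


source = r"The energy is \(E = mc^2\) and \[a^2 + b^2 = c^2\] holds."
masked, formulas = mask_math(source)
restored = unmask_math(fake_translate(masked), formulas)
print(masked)
print(restored)
```

The sketch relies on the engine returning every placeholder unchanged and in the original order; a real pipeline would need to handle placeholders that are dropped or reordered.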