test GH Action
liao961120 committed Feb 9, 2022
1 parent e88891c commit d16f28b
Showing 7 changed files with 179 additions and 9 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,31 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Update Corpus Data

on:
  push:
  repository_dispatch:

jobs:
  build:
    if: "!contains(github.event.commits[0].message, '[skip ci]')"
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.8']

    steps:
    - uses: actions/checkout@v2
    - name: Build data
      run: |
        sudo timedatectl set-timezone Asia/Taipei
        pip install -r requirements.txt
        python3 GlossProcessor.py ${{ secrets.GDURL2022 }}
        cp -r 2022_LANG 2022_LANG.log archive/
    - name: Deploy
      uses: peaceiris/actions-gh-pages@v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./archive
        enable_jekyll: false
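
Note on the job condition: the `if:` expression in this workflow can be approximated in Python as below. This is a minimal sketch, assuming GitHub's documented behaviour that `contains()` on strings is case-insensitive and that a missing value is cast to an empty string. The function name `should_run` and the sample payloads are hypothetical and are not part of this commit.

```python
def should_run(event: dict) -> bool:
    """Approximate !contains(github.event.commits[0].message, '[skip ci]')."""
    commits = event.get("commits") or []
    message = ""
    if commits:
        # Only the first commit in the push payload is inspected.
        message = commits[0].get("message") or ""
    # Assumption: contains() ignores case when comparing strings.
    return "[skip ci]" not in message.lower()


# Hypothetical event payloads, trimmed to the fields the expression reads.
push_skipped = {"commits": [{"message": "update docs [skip ci]"}]}
push_mixed = {"commits": [{"message": "fix parser"}, {"message": "typo [skip ci]"}]}
dispatch = {"action": "update"}  # repository_dispatch carries no commits

print(should_run(push_skipped))  # False
print(should_run(push_mixed))    # True
print(should_run(dispatch))      # True
```

Under these assumptions the job still runs on `repository_dispatch`, and a `[skip ci]` tag on any commit other than the first one in a push does not skip the build.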
4 changes: 0 additions & 4 deletions .gitignore
@@ -1,4 +1,3 @@
-*.docx
 *.doc
 *.ipynb
 *.zip
@@ -8,6 +7,3 @@ test-corp
 test*
 curl_download.log
 BUDAI_RUKAI
-2020_Budai_Rukai
-2020_Budai_Rukai.log
-*.json
6 changes: 3 additions & 3 deletions GlossProcessor.py
@@ -385,7 +385,7 @@ def read_with_guessed_encoding(fp: str):


 if __name__ == "__main__":
-    DOCX_FOLDER_PATH = r'2020_Budai_Rukai/'
+    DOCX_FOLDER_PATH = r'2022_LANG/' # 2020_Budai_Rukai/
     GDRIVE_URL = sys.argv[1]

     logging.basicConfig(level=logging.INFO, format='%(message)s', filemode='w', filename=f'{DOCX_FOLDER_PATH.strip("/")}.log')
@@ -425,7 +425,7 @@ def read_with_guessed_encoding(fp: str):

     # Write to json
     with open("data.json", "w", encoding="utf-8") as f:
-        json.dump(output_glosses, f, ensure_ascii=False)
+        json.dump(output_glosses, f, ensure_ascii=False, separators=(',', ':'))


     #-------- Get glossary --------#
@@ -475,4 +475,4 @@ def read_with_guessed_encoding(fp: str):


     with open('glossary.json', 'w') as f:
-        json.dump(sorted_glossary, f, ensure_ascii=False)
+        json.dump(sorted_glossary, f, ensure_ascii=False, separators=(',', ':'))
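
Note on the `separators` change: passing `separators=(',', ':')` drops the spaces that `json.dump` otherwise writes after each comma and colon, so the file gets smaller while the parsed data stays the same. A minimal sketch of the effect follows; the `glosses` record is made-up sample data, not taken from the corpus.

```python
import json

# Hypothetical gloss record, only to have something to serialize.
glosses = [{"ori": ["ka", "lalake"], "gloss": ["DET", "child"], "num": 1}]

# Default separators are (', ', ': ') when indent is None.
default = json.dumps(glosses, ensure_ascii=False)
# Compact form as used in this commit.
compact = json.dumps(glosses, ensure_ascii=False, separators=(",", ":"))

# Both strings decode to the same data; only the whitespace differs.
same_data = json.loads(default) == json.loads(compact)
print(same_data, len(default) - len(compact))
```

Separately, because `ensure_ascii=False` writes non-ASCII characters as-is, the `glossary.json` writer shown in the diff context depends on the platform's default encoding. Passing `encoding="utf-8"` to `open`, as the `data.json` writer already does, would be the safer choice.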
143 changes: 143 additions & 0 deletions archive/2020_Budai_Rukai.log
@@ -0,0 +1,143 @@
2021-04-03 10:38:37

Balenge/20200528.docx #5 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Balenge/20200531.docx #11: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Balenge/20200531.docx #11: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavakaw/20200528.docx #6 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavakaw/2020053001.docx #4 : invalid GLOSS formatting
Lavakaw/2020053001.docx #13: invalid GLOSS formatting
Lavakaw/20200415.docx #9 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavakaw/20200415.docx #9 : ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Lavakaw/20200429.docx #17: ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Lavurase/20200325.docx #3 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #3 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #12: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #12: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #14: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #14: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #15: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #15: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #16: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #16: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #17: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200325.docx #17: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200528.docx #9 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200528.docx #9 : ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Lavurase/20200531.docx #1 : ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Lavurase/20200508.docx #10: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200508.docx #10: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200508.docx #19: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200508.docx #19: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200508.docx #21: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200508.docx #21: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #4 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #4 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #5 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #5 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #7 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #7 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #16: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200422.docx #16: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200513.docx #9 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200513.docx #9 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200415.docx #2 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200415.docx #2 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200415.docx #3 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200415.docx #3 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200415.docx #4 : ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200415.docx #4 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200429.docx #6 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200429.docx #14: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200429.docx #14: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavurase/20200429.docx #16: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavurase/20200429.docx #16: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Kui/2020053101.docx #19: ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Kui/20200528.docx #2 : ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Muni/20200325.docx #1 : invalid GLOSS formatting
Muni/20200325.docx #2 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Muni/20200325.docx #3 : invalid GLOSS formatting
Muni/20200325.docx #4 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Muni/20200325.docx #5 : invalid GLOSS formatting
Muni/20200325.docx #6 : invalid GLOSS formatting
Muni/20200325.docx #7 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Muni/20200325.docx #8 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Muni/20200325.docx #9 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Muni/20200325.docx #10: invalid GLOSS formatting
Muni/20200325.docx #27: invalid GLOSS formatting
Muni/20200318.docx #9 : invalid GLOSS formatting
Muni/20200408.docx #1 : invalid GLOSS formatting
Muni/20200408.docx #4 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Muni/20200408.docx #7 : invalid GLOSS formatting
Muni/20200408.docx #8 : invalid GLOSS formatting
Muni/20200408.docx #9 : invalid GLOSS formatting
Muni/20200408.docx #10: invalid GLOSS formatting
Muni/20200408.docx #12: invalid GLOSS formatting
Muni/20200408.docx #14: invalid GLOSS formatting
Lavasu/0528.docx #1 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #2 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #3 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #4 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #5 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #6 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #13: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #18: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0528.docx #19: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #1 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #2 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #3 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #4 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #5 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #6 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #7 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #14: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #15: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #16: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/032504110421.docx #19: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0513.docx #2 : ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Lavasu/0513.docx #3 : invalid GLOSS formatting
Lavasu/0513.docx #6 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0513.docx #6 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/0513.docx #13: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavasu/0513.docx #13: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/0513.docx #14: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavasu/0513.docx #14: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/0513.docx #18: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavasu/0513.docx #18: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/0513.docx #19: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0513.docx #19: ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Lavasu/0510.docx #5 : invalid GLOSS formatting
Lavasu/0510.docx #11: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavasu/0510.docx #11: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/0510.docx #12: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavasu/0510.docx #12: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/0510.docx #13: ALIGNMENT > English gloss line has LESS tokens than Original langauge
Lavasu/0510.docx #13: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/0510.docx #14: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/040804150421.docx #4 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/040804150421.docx #5 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/040804150421.docx #5 : ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/040804150421.docx #6 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/040804150421.docx #7 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/040804150421.docx #13: ALIGNMENT > Chinese gloss line has LESS tokens than Original langauge
Lavasu/040804150421.docx #15: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/040804150421.docx #17: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #2 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #3 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #4 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #5 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #7 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #9 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #10: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #12: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0419.docx #13: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/03180421.docx #2 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/03180421.docx #3 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0429.docx #6 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0429.docx #7 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0429.docx #12: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0429.docx #15: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0429.docx #16: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0429.docx #16: ALIGNMENT > Chinese gloss line has MORE tokens than Original langauge
Lavasu/0422.docx #10: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0422.docx #11: ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavasu/0422.docx #12: ALIGNMENT > English gloss line has MORE tokens than Original langauge
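
Note on reading this log: each entry has the shape `<folder>/<file>.docx #<gloss number>: <message>`, so it can be tallied per folder and problem type. The sketch below is a hedged illustration only. `LINE_RE` and `summarize` are hypothetical names that do not exist in `GlossProcessor.py`, and the pattern is inferred from the entries above rather than from the code that writes them.

```python
import re
from collections import Counter

# Pattern inferred from the log entries above (an assumption, not a spec).
LINE_RE = re.compile(
    r"^(?P<folder>[^/\s]+)/(?P<file>\S+\.docx)\s+#(?P<gloss>\d+)\s*:\s*(?P<msg>.+)$"
)


def summarize(log_text: str) -> Counter:
    """Count log entries per (folder, problem type)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skips the timestamp line and blank lines
        # Collapse all ALIGNMENT variants into one problem type.
        kind = "ALIGNMENT" if m["msg"].startswith("ALIGNMENT") else m["msg"]
        counts[(m["folder"], kind)] += 1
    return counts


# A few entries copied from the log above.
sample = """2021-04-03 10:38:37

Balenge/20200528.docx #5 : ALIGNMENT > English gloss line has MORE tokens than Original langauge
Lavakaw/2020053001.docx #4 : invalid GLOSS formatting
Lavakaw/2020053001.docx #13: invalid GLOSS formatting
"""
summary = summarize(sample)
for (folder, kind), n in sorted(summary.items()):
    print(folder, kind, n)
```

Run over the full log, a tally like this would show which folders account for most of the alignment and formatting problems.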
1 change: 1 addition & 0 deletions archive/2020_Budai_Rukai/data.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions archive/2020_Budai_Rukai/glossary.json

Large diffs are not rendered by default.

2 changes: 0 additions & 2 deletions requirements.txt
@@ -1,5 +1,3 @@
-falcon==2.0.0
-falcon-cors==1.1.7
 python-docx==0.8.10
 chardet==3.0.4
 beautifulsoup4==4.9.1
