An experimental study of Standard-Chinese to Cantonese translation models.
As a continuation of the previous project, this project focuses on Neural Machine Translation (NMT) between Standard Chinese and Cantonese, with the former as the source language and the latter as the target language.
Two sequence-to-sequence models were studied:
- A Transformer model, which follows the Encoder-Decoder architecture and uses stacked self-attention and point-wise, fully connected layers in both the encoder and the decoder (Vaswani et al., 2017).
- A vanilla RNN model, whose encoder and decoder are recurrent neural networks composed of Gated Recurrent Units (Chung et al., 2014), with Bahdanau attention (Bahdanau et al., 2015) letting the decoder attend over the encoder outputs (a sketch follows this list).
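For concreteness, below is a minimal PyTorch sketch of the second model: a GRU encoder-decoder with Bahdanau (additive) attention. All class names and dimensions are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch (assumed names/shapes): GRU encoder-decoder with
# Bahdanau additive attention, trained with teacher forcing.
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_enc = nn.Linear(hidden_dim, hidden_dim)  # scores encoder states
        self.W_dec = nn.Linear(hidden_dim, hidden_dim)  # scores decoder state
        self.v = nn.Linear(hidden_dim, 1)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, hidden), enc_outputs: (batch, src_len, hidden)
        scores = self.v(torch.tanh(
            self.W_enc(enc_outputs) + self.W_dec(dec_state).unsqueeze(1)
        ))                                              # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)
        context = (weights * enc_outputs).sum(dim=1)    # (batch, hidden)
        return context, weights

class Seq2SeqGRU(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRUCell(emb_dim + hidden_dim, hidden_dim)
        self.attention = BahdanauAttention(hidden_dim)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        enc_outputs, h = self.encoder(self.src_emb(src_ids))
        dec_state = h.squeeze(0)                        # (batch, hidden)
        logits = []
        for t in range(tgt_ids.size(1)):                # teacher forcing
            context, _ = self.attention(dec_state, enc_outputs)
            step_in = torch.cat([self.tgt_emb(tgt_ids[:, t]), context], dim=-1)
            dec_state = self.decoder(step_in, dec_state)
            logits.append(self.out(dec_state))
        return torch.stack(logits, dim=1)               # (batch, tgt_len, vocab)
```

The Transformer model can be sketched analogously, e.g. with PyTorch's built-in `nn.Transformer` module.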
Preliminary results:
Abstract: Dialect as a Low-Resource Language: A Study on Standard-Chinese to Cantonese Translation with Movie Transcripts
Cantonese, a major spoken Chinese dialect, can be viewed as a low-resource language given that collections of its raw written form are scarce. This project develops a pipeline to accomplish the low-resource Cantonese translation task with the help of its closely related rich-resource counterpart, Standard Chinese (SC). The pipeline consists of two major translation methods: (1) the sequence-to-sequence neural-network approach suggested by Jhamtani et al. (2017), and (2) the translation-matrix approach suggested by Mikolov et al. (2013). Our implementation of machine translation from SC to Cantonese, in a simplified setting, does not achieve satisfying results nor outperform the baselines. This report describes the similarities and differences between our implementation and the original approaches, and discusses possible future improvements.
Two major approaches are included:
- Copy-Enriched Seq2Seq Models (Jhamtani et al., 2017)
- Dictionary table enriched via a Translation Matrix (Mikolov et al., 2013); see the sketch after this list.
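As a rough illustration of the second approach, the sketch below learns a linear map `W` from SC word embeddings to Cantonese word embeddings over a seed dictionary, then translates unseen words by nearest neighbour in the target space. Mikolov et al. (2013) fit `W` by stochastic gradient descent; the closed-form least-squares solution used here minimizes the same objective. All names and shapes are illustrative assumptions.

```python
# Minimal sketch (assumed names/shapes) of the translation-matrix approach.
import numpy as np

def fit_translation_matrix(X, Z):
    """Solve min_W ||X W - Z||_F^2 over seed dictionary pairs by least squares.

    X: (n_pairs, d_src) source-side embeddings of dictionary entries.
    Z: (n_pairs, d_tgt) target-side embeddings of the same entries.
    """
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W                                    # (d_src, d_tgt)

def translate(x, W, tgt_vectors, tgt_words, k=5):
    """Map a source vector into target space; return k nearest target words."""
    z = x @ W
    # Cosine similarity against every target word vector.
    sims = (tgt_vectors @ z) / (
        np.linalg.norm(tgt_vectors, axis=1) * np.linalg.norm(z) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return [tgt_words[i] for i in top]
```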
Check here for more instructions. This version of the code is based on the work by Jhamtani et al.