Can't use docker image to train #38

Open
leogitpro opened this issue Aug 20, 2024 · 0 comments

Comments

@leogitpro
Issue 1: I pulled the docker image on a Linux server (no GPU machine, just for testing). Running source ~/argos-train-init succeeds, but running argos-train fails with "No command .... argos-train" (the command is not on PATH).
If I run bin/argos-train instead, it throws a module not found error: ModuleNotFoundError: No module named 'argostrain'
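
A guess at a workaround, assuming bin/argos-train is a plain script inside the ~/argos-train checkout and the argostrain package sits next to it (I have not verified the image layout):

# Run from the repo root so Python can find the argostrain package
cd ~/argos-train
PYTHONPATH=. bin/argos-train

# Or install the package into the active environment so the
# argos-train entry point lands on PATH (assumes the repo ships
# a setup.py/pyproject.toml)
pip install -e ~/argos-train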

Issue 2: I copied the bin/argos-train script to the parent directory and ran ./argos-train. It runs and accepts the input arguments, but then the following errors occur:

From code (ISO 639): ja
To code (ISO 639): en
From name: Japanese
To name: English
Version: 2.2
cnkdj-jp-addr-ja_en
['\n', '\n', 'Data compiled by [Opus](https://opus.nlpl.eu/).\n', '\n', 'Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).\n', '\n', 'Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).\n', '\n', 'Credits:\n', '\n']
Done splitting data
sentencepiece_trainer.cc(78) LOG(INFO) Starts training with :
trainer_spec {
  input: run/split_data/all.txt
  input_format:
  model_prefix: run/sentencepiece
  model_type: UNIGRAM
  vocab_size: 50000
  self_test_sample_size: 0
  character_coverage: 0.9995
  input_sentence_size: 1000000
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  pretokenization_delimiter:
  treat_whitespace_as_suffix: 0
  allow_whitespace_only_pieces: 0
  required_chars:
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  seed_sentencepieces_file:
  hard_vocab_limit: 1
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: -1
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇
  enable_differential_privacy: 0
  differential_privacy_noise_level: 0
  differential_privacy_clipping_threshold: 0
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv:
}
denormalizer_spec {}
trainer_interface.cc(353) LOG(INFO) SentenceIterator is not specified. Using MultiFileSentenceIterator.
trainer_interface.cc(185) LOG(INFO) Loading corpus: run/split_data/all.txt
trainer_interface.cc(409) LOG(INFO) Loaded all 873 sentences
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: <unk>
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: <s>
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: </s>
trainer_interface.cc(430) LOG(INFO) Normalizing sentences...
trainer_interface.cc(539) LOG(INFO) all chars count=28413
trainer_interface.cc(550) LOG(INFO) Done: 99.9507% characters are covered.
trainer_interface.cc(560) LOG(INFO) Alphabet size=404
trainer_interface.cc(561) LOG(INFO) Final character coverage=0.999507
trainer_interface.cc(592) LOG(INFO) Done! preprocessed 873 sentences.
unigram_model_trainer.cc(265) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(269) LOG(INFO) Extracting frequent sub strings... node_num=14236
unigram_model_trainer.cc(312) LOG(INFO) Initialized 2485 seed sentencepieces
trainer_interface.cc(598) LOG(INFO) Tokenizing input sentences with whitespace: 873
trainer_interface.cc(609) LOG(INFO) Done! 1143
unigram_model_trainer.cc(602) LOG(INFO) Using 1143 sentences for EM training
unigram_model_trainer.cc(618) LOG(INFO) EM sub_iter=0 size=1324 obj=19.6514 num_tokens=4413 num_tokens/piece=3.33308
unigram_model_trainer.cc(618) LOG(INFO) EM sub_iter=1 size=1239 obj=15.5659 num_tokens=4415 num_tokens/piece=3.56336
trainer_interface.cc(687) LOG(INFO) Saving model: run/sentencepiece.model
spm_train_main.cc(282) [_status.ok()] Internal: src/trainer_interface.cc(662) [(trainer_spec_.vocab_size()) == (model_proto->pieces_size())] Vocabulary size too high (50000). Please set it to a value <= 1309.
Program terminated with an unrecoverable error.
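
If I read the error above correctly, only 873 sentences were loaded, so sentencepiece cannot build a 50,000-piece vocabulary and caps it at 1309. A minimal sketch of retraining the tokenizer by hand with a smaller vocabulary, reusing the parameters from the trainer_spec in the log (this assumes the spm_train CLI is available inside the container; argos-train may set vocab_size somewhere else):

# Retrain sentencepiece with a vocab size the corpus can support
spm_train --input=run/split_data/all.txt \
  --model_prefix=run/sentencepiece \
  --model_type=unigram \
  --vocab_size=1309 \
  --character_coverage=0.9995

The later errors then continue from the same run: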
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:48: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, dim=0):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:68: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:13: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, target):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:37: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:397: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(self, u, x, bias, init=None, mask_h=None):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:443: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(self, grad_h, grad_last):
Corpus corpus_1's weight should be given. We default it to 1 for you.
Traceback (most recent call last):
  File "/home/argosopentech/env/bin/onmt_build_vocab", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_build_vocab')())
  File "/home/argosopentech/OpenNMT-py/onmt/bin/build_vocab.py", line 71, in main
    build_vocab_main(opts)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/build_vocab.py", line 32, in build_vocab_main
    transforms = make_transforms(opts, transforms_cls, fields)
  File "/home/argosopentech/OpenNMT-py/onmt/transforms/transform.py", line 235, in make_transforms
    transform_obj.warm_up(vocabs)
  File "/home/argosopentech/OpenNMT-py/onmt/transforms/tokenize.py", line 147, in warm_up
    load_src_model.Load(self.src_subword_model)
  File "/home/argosopentech/env/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/home/argosopentech/env/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "run/sentencepiece.model": No such file or directory Error #2
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:48: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, dim=0):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:68: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:13: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, target):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:37: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:397: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(self, u, x, bias, init=None, mask_h=None):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:443: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(self, grad_h, grad_last):
[2024-08-20 06:20:54,713 WARNING] Corpus corpus_1's weight should be given. We default it to 1 for you.
[2024-08-20 06:20:54,714 INFO] Parsed 2 corpora from -data.
Traceback (most recent call last):
  File "/home/argosopentech/env/bin/onmt_train", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_train')())
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 172, in main
    train(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 106, in train
    checkpoint, fields, transforms_cls = _init_train(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 58, in _init_train
    ArgumentParser.validate_prepare_opts(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 197, in validate_prepare_opts
    cls._validate_fields_opts(opt, build_vocab_only=build_vocab_only)
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 151, in _validate_fields_opts
    cls._validate_file(opt.src_vocab, info='src vocab')
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 18, in _validate_file
    raise IOError(f"Please check path of your {info} file!")
OSError: Please check path of your src vocab file!
Traceback (most recent call last):
  File "/home/argosopentech/argos-train/./argos-train", line 18, in <module>
    train.train(from_code, to_code, from_name, to_name, version, package_version, argos_version, data_exists)
  File "/home/argosopentech/argos-train/argostrain/train.py", line 163, in train
    str(opennmt_checkpoints[-2].f),
IndexError: list index out of range

Why is the list index out of range? I prepared 2,436 rows of training data in total. From the traceback, train.py indexes opennmt_checkpoints[-2], so it looks like no checkpoints were ever written: sentencepiece aborted before saving run/sentencepiece.model, onmt_build_vocab then failed to load that file, and onmt_train failed because the src vocab file was never built.
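
A quick sanity check I could run, since I expected 2,436 rows but the log reports only 873 sentences (paths copied from the log above):

# How many lines actually reached the sentencepiece input file?
wc -l run/split_data/all.txt

# Was the tokenizer model ever written?
ls -l run/sentencepiece.model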
Please help me resolve the error.
Thanks.
