Train.py not working anymore : self.max_token_len = len(max(tokens, key=len)) + 1 #65

PonteIneptique · 2017-10-10T13:25:20Z

From #55

PonteIneptique · 2017-10-10T13:25:42Z

Thanks for the checks, I will look into this in more detail. There is an issue preventing me from debugging this. After the merges, I can't run any script as I keep getting:

Traceback (most recent call last):
  File "train.py", line 6, in <module>
    cli_train()
  File "/home/manjavacas/code/python/pandora/pandora/cli.py", line 144, in cli_train
    train_func(**vars(parser.parse_args()))
  File "/home/manjavacas/code/python/pandora/pandora/cli.py", line 94, in train_func
    tagger.setup_to_train(**data_sets)
  File "/home/manjavacas/code/python/pandora/pandora/tagger.py", line 215, in setup_to_train
    min_lem_cnt=self.min_lem_cnt)
  File "/home/manjavacas/code/python/pandora/pandora/preprocessing.py", line 378, in fit
    self.max_token_len = len(max(tokens, key=len)) + 1
ValueError: max() arg is an empty sequence

command being: python train.py config_12c.txt --train data/capitula_classic/ --dev data/capitula_classic/

PonteIneptique · 2017-10-10T13:26:09Z

Written by @Jean-Baptiste-Camps Oct 9

I think there might be a typo in your command, it's

python train.py config_12c.txt --train data/capitula_classic --dev data/capitula_classic

Note the absence of a trailing / at the end. This assumes that your files are at the root of the folder data/capitula_classic and that you are using exactly the same files for train and dev. Otherwise:

python train.py config_12c.txt --train data/capitula_classic/train --dev data/capitula_classic/dev

PonteIneptique · 2017-10-10T13:26:55Z

Trailing slashes in folder names are usually meaningless, so I doubt (and can confirm) that this solves the issue. I imagine it has to do with the way the data is loaded (the error hints that the input is empty). The dataset I am working with used to work before the last 2 PRs were merged.
@emanjavacas Oct 9
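
To illustrate why an empty input produces exactly this traceback: `max()` raises `ValueError` when given an empty sequence, so the failing line in `preprocessing.py` only surfaces a problem that happened earlier, at load time. Below is a minimal sketch of a guard around the same expression. The function name `longest_token_len` and the error message are hypothetical, not part of pandora.

```python
def longest_token_len(tokens):
    """Return the length of the longest token plus one.

    Hypothetical stand-in for the failing expression
    `len(max(tokens, key=len)) + 1`; not pandora's actual code.
    """
    if not tokens:
        # Fail early with a message that points at data loading
        # instead of the bare "max() arg is an empty sequence".
        raise ValueError(
            "no tokens were loaded; check the --train/--dev paths "
            "and the file extension of the input files"
        )
    return len(max(tokens, key=len)) + 1


if __name__ == "__main__":
    print(longest_token_len(["in", "principio", "erat"]))
    try:
        longest_token_len([])
    except ValueError as err:
        print("error:", err)
```

With a guard like this, the message names the likely cause (nothing was loaded) rather than the symptom.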

Yes, that's why I thought it could be a path problem. Also, I believe I encountered an error at some point when including the final / (though it's not causing an issue now).
That leaves the second point: spelling out the train and dev paths down to the directory that contains the .tab files for each (unless you are using the same .tab files for train and dev). On the other hand, if I try it, it does not trigger this bug; the only effect is that all files from the subfolders are used for both train and dev.
@Jean-Baptiste-Camps Oct 9

mikekestemont · 2017-10-18T11:22:53Z

I located the error: train_func currently assumes that your files have a .tab extension, which the files in the capitula corpus do not have. I will push a sanity check for this on a separate branch.
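
A sanity check of this kind could look roughly like the sketch below. It is an assumption about the general shape of such a check, not the code that was actually pushed: the helper name `collect_files`, its `extension` parameter, and the error text are all hypothetical.

```python
import os


def collect_files(directory, extension=".tab"):
    """Hypothetical helper: list files under `directory` ending in `extension`.

    Raises ValueError when nothing matches, so the caller gets a clear
    message instead of a later failure on empty data.
    """
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith(extension):
                matches.append(os.path.join(root, name))
    if not matches:
        raise ValueError(
            "no '{}' files found under '{}'".format(extension, directory)
        )
    return sorted(matches)
```

Called on a folder whose files lack the expected extension, this fails immediately with the folder and extension in the message, which is the situation described above.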

PonteIneptique · 2017-10-18T11:24:20Z

I might be saying something stupid, but would it not make sense to provide a glob.glob path here instead of the current folder?

ie path/to/dev/**/* ?
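
As a rough sketch of what that could mean in practice: in Python 3.5+, `glob.glob` accepts `recursive=True`, which makes `**` match any number of nested directories. The wrapper name `expand_pattern` is hypothetical and only for illustration; it is not something pandora provides.

```python
import glob
import os


def expand_pattern(pattern):
    """Expand a glob pattern such as 'path/to/dev/**/*' into a sorted file list.

    `recursive=True` is required for '**' to descend into subfolders;
    without it, '**' behaves like a plain '*'.
    """
    paths = glob.glob(pattern, recursive=True)
    # The pattern also matches directories, so keep regular files only.
    return sorted(p for p in paths if os.path.isfile(p))
```

One design consequence is that the caller, not the tool, decides which extensions count as input, for example by passing `path/to/dev/**/*.tab`.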

mikekestemont · 2017-10-18T11:29:50Z

The code now effectively does the same, but the issue was that the CLI has a .tab extension hardcoded into it, which is why it didn't load anything in the case described by @emanjavacas

PonteIneptique · 2017-10-18T11:31:30Z

Next time, please leave issues open until the fix is merged :)

PonteIneptique added the bug label Oct 10, 2017

PonteIneptique changed the title ~~Train.py not working anymore~~ Train.py not working anymore : self.max_token_len = len(max(tokens, key=len)) + 1 Oct 10, 2017

PonteIneptique mentioned this issue Oct 18, 2017

Epoch's evaluation takes an unusual long time #55

Open

mikekestemont closed this as completed Oct 18, 2017

PonteIneptique mentioned this issue Oct 18, 2017

check whether input data was loaded #68

Merged
