too long too train new model use mitie #166

autopost-get · 2017-12-27T06:50:48Z

how can i improve this situation
give me some suggestion ，please

grafael · 2018-01-03T01:58:00Z

As far as I know, it will depend on the size of your datasets. Also, check the memory usage (maybe you are out of memory and the system starts to use swap)

davisking · 2018-01-03T02:22:00Z

You should also make sure that your labels are consistent. Datasets that are harder to label take longer to train. So if, for example, you have a huge number of labeling mistakes training will take a long time.

munaAchyuta · 2018-02-28T06:50:36Z

@davisking @grafael could you please elaborate or give a little more examples
what is inconsistent labeling ? - is it mean number of labels varies from sentence to sentence !! like one sentence has 3 labels and other sentence has 5 labels..etc
how labeling mistakes affecting application performance ?
can MITIE NER run on a distributed architecture ?

davisking · 2018-02-28T12:04:19Z

It just means you are labeling your data with incorrect labels. Like maybe sometimes you label references to the city of Boston as "city" and other times as "place". Or maybe other times you don't label it at all.

munaAchyuta · 2018-03-07T10:39:54Z

Thanks @davisking for your quick reply. this time i made sure that my annotated training data have consistent labels. but still i don't see any progress in performance.

from log what i found, In mitie there are two trainings going on..

PART-I( Train Segmenter) : which is working fine as it is using all cores and using required memory.
PART-II(Train Segment Classifier) : this is not using all cores. and also using huge memory compare to data size. CRITICAL - taking huge time.

here is some log ..
Training example annotated sentences count : 373
Machine Details : RAM - 8GB, cores - 4 cores
Training to recognize 3 labels: 'B-act', 'B-sub', 'B-org'
Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C: 20
epsilon: 0.01
num threads: 4
cache size: 5
max iterations: 2000
loss per missed segment: 3
C: 20 loss: 3 0.431569
C: 35 loss: 3 0.437956
C: 20 loss: 4.5 0.443431
C: 5 loss: 3 0.440693
C: 20 loss: 1.5 0.415146
C: 6.5555 loss: 5.16517 0.457117
C: 0.1 loss: 8.09489 0.327555
C: 0.1 loss: 3.81119 0.30292
C: 0.549491 loss: 5.61437 0.430657
C: 7.8466 loss: 5.43597 0.457117
C: 10.2883 loss: 5.1293 0.452555
C: 6.0607 loss: 5.02357 0.456204
C: 7.41861 loss: 5.28618 0.457117
C: 7.13806 loss: 5.20935 0.458029
C: 6.98092 loss: 5.16756 0.455292
best C: 7.13806
best loss: 5.20935
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.703608 0.747263 0.724779
Part I: elapsed time: 1027 seconds.

Part II: train segment classifier
now do training
num training samples: 1441
C: 200 f-score: 0.734335
C: 400 f-score: 0.735081
C: 300 f-score: 0.731994
C: 500 f-score: 0.735241
C: 700 f-score: 0.734709
C: 520 f-score: 0.733273
C: 450.957 f-score: 0.733804
C: 483.4 f-score: 0.736308
C: 480.156 f-score: 0.735241
C: 490.078 f-score: 0.734653
C: 484.607 f-score: 0.735241
C: 482.381 f-score: 0.732305
C: 483.799 f-score: 0.734653
C: 483.236 f-score: 0.732149
best C: 483.4

test on train:
286 2 0 3
0 759 0 3
0 0 43 0
4 6 0 335

overall accuracy: 0.987509
Part II: elapsed time: 19417 seconds.

total time took in hour : 5

not sure what's going on !!
yes memory always available.

Thanks in Advance.

davisking · 2018-03-07T12:03:02Z

Sometimes it takes a while. Be patient. What's happening is MITIE is repeatedly training a classifier and doing hyper parameter selection to find the best one. So MITIE training is always going to take longer than other systems since it does a whole lot of internal validation and retraining so that you never have to fiddle with any parameters.

munaAchyuta · 2018-03-08T10:45:04Z

Thanks @davisking .

could you please help me to understand why large value of "C" takes more time compare to small value of "C" ? where Accuracy and F1/F score are mostly same for different values of "C".

from my understanding "C" just a regularisation parameter which helps to reduce/avoid mis-classification. so it doesn't have any effect on Accuracy and F/F1 score. if my understanding is correct , can i use small value of "C" !!. if so, then what is the max minimum value of "C" i can use ? (i mean what is the minimum threshold value of "C" i can use ?) and specially in this problem ?

for above problem please find log..

=============================================== C=300
num training samples: 1441
C: 200 f-score: 0.734335
C: 400 f-score: 0.735081
C: 300 f-score: 0.731994
C: 500 f-score: 0.735241
C: 700 f-score: 0.734709
C: 520 f-score: 0.733273
C: 450.957 f-score: 0.733804
C: 483.4 f-score: 0.736308
C: 480.156 f-score: 0.735241
C: 490.078 f-score: 0.734653
C: 484.607 f-score: 0.735241
C: 482.381 f-score: 0.732305
C: 483.799 f-score: 0.734653
C: 483.236 f-score: 0.732149
best C: 483.4

test on train:
286 2 0 3
0 759 0 3
0 0 43 0
4 6 0 335

overall accuracy: 0.987509
Part II: elapsed time: 19417 seconds.
============================================== C=100
num training samples: 1420
C: 0.01 f-score: 0.673219
C: 200 f-score: 0.75807
C: 100 f-score: 0.758977
C: 148.954 f-score: 0.758783
C: 124.134 f-score: 0.759333
C: 121.721 f-score: 0.757521
C: 136.154 f-score: 0.760752
C: 134.952 f-score: 0.756639
C: 142.253 f-score: 0.757164
C: 138.668 f-score: 0.758945
C: 137.088 f-score: 0.756806
C: 136.031 f-score: 0.759333
C: 136.479 f-score: 0.759459
best C: 136.154
test on train:
286 2 0 3
0 761 0 1
0 0 43 0
4 9 0 311

overall accuracy: 0.98662
Part II: elapsed time: 6148 seconds.
============================================== C=50
num training samples: 1432
C: 0.01 f-score: 0.670678
C: 200 f-score: 0.754349
C: 100 f-score: 0.755016
C: 149.215 f-score: 0.753461
C: 121.914 f-score: 0.755938
C: 118.753 f-score: 0.753097
C: 134.168 f-score: 0.75631
C: 129.929 f-score: 0.756474
C: 129.128 f-score: 0.755917
C: 131.916 f-score: 0.754349
C: 130.128 f-score: 0.755402
C: 129.586 f-score: 0.755938
best C: 129.929
test on train:
286 2 0 3
0 761 0 1
0 0 43 0
5 10 0 321

overall accuracy: 0.985335
Part II: elapsed time: 5562 seconds.
df.number_of_classes(): 4
============================================== C=300
num training samples: 1455
C: 200 f-score: 0.73822
C: 400 f-score: 0.736475
C: 300 f-score: 0.738895
C: 271.805 f-score: 0.737705
C: 326.638 f-score: 0.735243
C: 292.355 f-score: 0.738378
C: 302.664 f-score: 0.733705
C: 296.35 f-score: 0.736475
C: 298.977 f-score: 0.737146
C: 300.35 f-score: 0.736944
C: 299.649 f-score: 0.738933
C: 299.804 f-score: 0.735961
best C: 299.649
test on train:
288 2 0 1
0 760 0 2
0 0 43 0
5 8 0 346

overall accuracy: 0.987629
Part II: elapsed time: 11576 seconds.
df.number_of_classes(): 4

============================================== C=500
Part II: train segment classifier
now do training
num training samples: 1358
PART-II C: 500
PART-II epsilon: 0.0001
PART-II num threads: 4
PART-II max iterations: 2000
C: 400 f-score: 0.774171
C: 600 f-score: 0.778615
C: 500 f-score: 0.779291
C: 538.343 f-score: 0.774471
C: 470.021 f-score: 0.779522
C: 480.425 f-score: 0.776386
C: 443.145 f-score: 0.774217
C: 463.96 f-score: 0.775954
C: 472.435 f-score: 0.775831
C: 468.168 f-score: 0.770751
C: 470.707 f-score: 0.772416
C: 469.493 f-score: 0.770333
C: 470.138 f-score: 0.779291
best C: 470.021
test on train:
287 2 0 2
0 761 0 1
0 0 43 0
6 9 0 247

overall accuracy: 0.985272
Part II: elapsed time: 18762 seconds.
df.number_of_classes(): 4

==============================================

from above log : why best C is coming nearer value of given "C" value ? no matter what C value i choose. you can see above log. my point is what is minimum best C or any threshold value of C which can be used for starting point ?

what is "num features" and why is it always 271 ?

correct me whether my interpretation is wrong !! --> "number of samples" is ( sum of number of labels in each sentence ). e.g : 2 sentence each has 3 label then number of samples is 6. right !!

Thanks in Advance. @grafael @lopuhin @baali @davisking @autopost-get @autopost-

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

too long too train new model use mitie #166

too long too train new model use mitie #166

autopost-get commented Dec 27, 2017

grafael commented Jan 3, 2018

davisking commented Jan 3, 2018 via email

munaAchyuta commented Feb 28, 2018 •

edited

Loading

davisking commented Feb 28, 2018 via email

munaAchyuta commented Mar 7, 2018 •

edited

Loading

davisking commented Mar 7, 2018 via email

munaAchyuta commented Mar 8, 2018 •

edited

Loading

too long too train new model use mitie #166

too long too train new model use mitie #166

Comments

autopost-get commented Dec 27, 2017

grafael commented Jan 3, 2018

davisking commented Jan 3, 2018 via email

munaAchyuta commented Feb 28, 2018 • edited Loading

davisking commented Feb 28, 2018 via email

munaAchyuta commented Mar 7, 2018 • edited Loading

davisking commented Mar 7, 2018 via email

munaAchyuta commented Mar 8, 2018 • edited Loading

munaAchyuta commented Feb 28, 2018 •

edited

Loading

munaAchyuta commented Mar 7, 2018 •

edited

Loading

munaAchyuta commented Mar 8, 2018 •

edited

Loading