
Too long to train a new model using MITIE #166

Open
autopost-get opened this issue Dec 27, 2017 · 7 comments

@autopost-get

How can I improve this situation? Please give me some suggestions.


grafael commented Jan 3, 2018

As far as I know, it will depend on the size of your datasets. Also, check the memory usage (maybe you are out of memory and the system starts to use swap)
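For example (just a rough sketch; psutil is a third-party package, not part of MITIE), you could poll RAM and swap usage while the training runs:

```python
# Rough sketch: poll system RAM and swap usage while MITIE trains.
# Requires the third-party psutil package (pip install psutil).
import time
import psutil

while True:
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print("RAM used: %s%%  swap used: %s%%" % (mem.percent, swap.percent))
    time.sleep(30)  # check every 30 seconds
```

If swap usage climbs while training, the machine is probably running out of RAM.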


davisking commented Jan 3, 2018 via email


munaAchyuta commented Feb 28, 2018

@davisking @grafael could you please elaborate or give a few more examples?
What is inconsistent labeling? Does it mean the number of labels varies from sentence to sentence, e.g. one sentence has 3 labels and another sentence has 5 labels? (See my rough sketch below.)
How do labeling mistakes affect application performance?
Can MITIE NER run on a distributed architecture?
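For instance, is something like the following what you would call inconsistent labeling? (A rough, hypothetical sketch using the Python API from examples/python/train_ner.py; "org" and "sub" are made-up labels, not my real data.)

```python
# Hypothetical sketch: the same kind of mention labeled two different ways.
from mitie import ner_training_instance

s1 = ner_training_instance(["ACME", "Corp", "announced", "results", "."])
s1.add_entity(range(0, 2), "org")    # "ACME Corp" labeled as org

s2 = ner_training_instance(["We", "visited", "ACME", "Corp", "yesterday", "."])
s2.add_entity(range(2, 4), "sub")    # same mention labeled as sub -- inconsistent?
```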


davisking commented Feb 28, 2018 via email


munaAchyuta commented Mar 7, 2018

Thanks @davisking for your quick reply. This time I made sure that my annotated training data has consistent labels, but I still don't see any improvement in performance.

From the log, I found that MITIE runs two training stages (my training call is roughly the sketch after this list):

  1. Part I (train segmenter): this works fine; it uses all cores and a reasonable amount of memory.
  2. Part II (train segment classifier): this does not use all cores, and it uses a huge amount of memory compared to the data size. CRITICAL - it takes a huge amount of time.
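For reference, my training call is roughly the following (based on MITIE's examples/python/train_ner.py; the tokens, span, label and paths here are placeholders, not my real data):

```python
# Rough sketch of the training driver; data and paths are placeholders.
from mitie import ner_trainer, ner_training_instance

sample = ner_training_instance(["ACME", "filed", "the", "report", "."])
sample.add_entity(range(0, 1), "B-org")   # token span for one entity

trainer = ner_trainer("total_word_feature_extractor.dat")
trainer.add(sample)          # ...repeated for all 373 annotated sentences
trainer.num_threads = 4      # matches "num threads: 4" in the log below
ner = trainer.train()        # runs Part I and Part II shown in the log
ner.save_to_disk("new_ner_model.dat")
```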

Here is some of the log:
Annotated training sentences count: 373
Machine details: 8 GB RAM, 4 cores
Training to recognize 3 labels: 'B-act', 'B-sub', 'B-org'
Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C: 20
epsilon: 0.01
num threads: 4
cache size: 5
max iterations: 2000
loss per missed segment: 3
C: 20 loss: 3 0.431569
C: 35 loss: 3 0.437956
C: 20 loss: 4.5 0.443431
C: 5 loss: 3 0.440693
C: 20 loss: 1.5 0.415146
C: 6.5555 loss: 5.16517 0.457117
C: 0.1 loss: 8.09489 0.327555
C: 0.1 loss: 3.81119 0.30292
C: 0.549491 loss: 5.61437 0.430657
C: 7.8466 loss: 5.43597 0.457117
C: 10.2883 loss: 5.1293 0.452555
C: 6.0607 loss: 5.02357 0.456204
C: 7.41861 loss: 5.28618 0.457117
C: 7.13806 loss: 5.20935 0.458029
C: 6.98092 loss: 5.16756 0.455292
best C: 7.13806
best loss: 5.20935
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.703608 0.747263 0.724779
Part I: elapsed time: 1027 seconds.

Part II: train segment classifier
now do training
num training samples: 1441
C: 200 f-score: 0.734335
C: 400 f-score: 0.735081
C: 300 f-score: 0.731994
C: 500 f-score: 0.735241
C: 700 f-score: 0.734709
C: 520 f-score: 0.733273
C: 450.957 f-score: 0.733804
C: 483.4 f-score: 0.736308
C: 480.156 f-score: 0.735241
C: 490.078 f-score: 0.734653
C: 484.607 f-score: 0.735241
C: 482.381 f-score: 0.732305
C: 483.799 f-score: 0.734653
C: 483.236 f-score: 0.732149
best C: 483.4

test on train:
286 2 0 3
0 759 0 3
0 0 43 0
4 6 0 335

overall accuracy: 0.987509
Part II: elapsed time: 19417 seconds.

Total time taken: about 5 hours.

Not sure what's going on!
Yes, memory is always available.

Thanks in advance.


davisking commented Mar 7, 2018 via email


munaAchyuta commented Mar 8, 2018

Thanks @davisking.

Could you please help me understand why a large value of "C" takes more time compared to a small value of "C", when accuracy and F1 score stay mostly the same across different values of "C"?

From my understanding, "C" is just a regularisation parameter that helps to reduce/avoid mis-classification, so it shouldn't have much effect on accuracy and F1 score. If my understanding is correct, can I use a small value of "C"? If so, what is the minimum threshold value of "C" I can use, especially for this problem?
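My mental model here is the usual soft-margin SVM objective (just an assumption on my part that MITIE's segment classifier is a linear SVM of this general form):

$$\min_{w}\ \tfrac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_{i}\,w^{\top}x_{i}\bigr)$$

where a larger C weights the training errors more heavily relative to the regularizer.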

For the above problem, please find the log below:

=============================================== C=300
num training samples: 1441
C: 200 f-score: 0.734335
C: 400 f-score: 0.735081
C: 300 f-score: 0.731994
C: 500 f-score: 0.735241
C: 700 f-score: 0.734709
C: 520 f-score: 0.733273
C: 450.957 f-score: 0.733804
C: 483.4 f-score: 0.736308
C: 480.156 f-score: 0.735241
C: 490.078 f-score: 0.734653
C: 484.607 f-score: 0.735241
C: 482.381 f-score: 0.732305
C: 483.799 f-score: 0.734653
C: 483.236 f-score: 0.732149
best C: 483.4

test on train:
286 2 0 3
0 759 0 3
0 0 43 0
4 6 0 335

overall accuracy: 0.987509
Part II: elapsed time: 19417 seconds.
============================================== C=100
num training samples: 1420
C: 0.01 f-score: 0.673219
C: 200 f-score: 0.75807
C: 100 f-score: 0.758977
C: 148.954 f-score: 0.758783
C: 124.134 f-score: 0.759333
C: 121.721 f-score: 0.757521
C: 136.154 f-score: 0.760752
C: 134.952 f-score: 0.756639
C: 142.253 f-score: 0.757164
C: 138.668 f-score: 0.758945
C: 137.088 f-score: 0.756806
C: 136.031 f-score: 0.759333
C: 136.479 f-score: 0.759459
best C: 136.154
test on train:
286 2 0 3
0 761 0 1
0 0 43 0
4 9 0 311

overall accuracy: 0.98662
Part II: elapsed time: 6148 seconds.
============================================== C=50
num training samples: 1432
C: 0.01 f-score: 0.670678
C: 200 f-score: 0.754349
C: 100 f-score: 0.755016
C: 149.215 f-score: 0.753461
C: 121.914 f-score: 0.755938
C: 118.753 f-score: 0.753097
C: 134.168 f-score: 0.75631
C: 129.929 f-score: 0.756474
C: 129.128 f-score: 0.755917
C: 131.916 f-score: 0.754349
C: 130.128 f-score: 0.755402
C: 129.586 f-score: 0.755938
best C: 129.929
test on train:
286 2 0 3
0 761 0 1
0 0 43 0
5 10 0 321

overall accuracy: 0.985335
Part II: elapsed time: 5562 seconds.
df.number_of_classes(): 4
============================================== C=300
num training samples: 1455
C: 200 f-score: 0.73822
C: 400 f-score: 0.736475
C: 300 f-score: 0.738895
C: 271.805 f-score: 0.737705
C: 326.638 f-score: 0.735243
C: 292.355 f-score: 0.738378
C: 302.664 f-score: 0.733705
C: 296.35 f-score: 0.736475
C: 298.977 f-score: 0.737146
C: 300.35 f-score: 0.736944
C: 299.649 f-score: 0.738933
C: 299.804 f-score: 0.735961
best C: 299.649
test on train:
288 2 0 1
0 760 0 2
0 0 43 0
5 8 0 346

overall accuracy: 0.987629
Part II: elapsed time: 11576 seconds.
df.number_of_classes(): 4

============================================== C=500
Part II: train segment classifier
now do training
num training samples: 1358
PART-II C: 500
PART-II epsilon: 0.0001
PART-II num threads: 4
PART-II max iterations: 2000
C: 400 f-score: 0.774171
C: 600 f-score: 0.778615
C: 500 f-score: 0.779291
C: 538.343 f-score: 0.774471
C: 470.021 f-score: 0.779522
C: 480.425 f-score: 0.776386
C: 443.145 f-score: 0.774217
C: 463.96 f-score: 0.775954
C: 472.435 f-score: 0.775831
C: 468.168 f-score: 0.770751
C: 470.707 f-score: 0.772416
C: 469.493 f-score: 0.770333
C: 470.138 f-score: 0.779291
best C: 470.021
test on train:
287 2 0 2
0 761 0 1
0 0 43 0
6 9 0 247

overall accuracy: 0.985272
Part II: elapsed time: 18762 seconds.
df.number_of_classes(): 4

==============================================

From the above log: why does the best C come out near whatever "C" value I give, no matter which C value I choose? You can see this in the log above. My point is: what is the minimum best C, or any threshold value of C, that can be used as a starting point?

What is "num features" and why is it always 271?

Correct me if my interpretation is wrong: "number of samples" is the sum of the number of labels in each sentence. E.g., if 2 sentences each have 3 labels, then the number of samples is 6. Right? (A tiny sketch of this interpretation is below.)
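Just to make my interpretation concrete, a tiny hypothetical sketch of how I am counting (the data here is made up):

```python
# Hypothetical illustration of my interpretation: "number of samples" for
# Part II = total number of labeled entities across all annotated sentences.
annotated = [
    {"tokens": ["..."], "entities": [("B-act", (0, 1)), ("B-sub", (2, 3)), ("B-org", (4, 5))]},
    {"tokens": ["..."], "entities": [("B-act", (0, 1)), ("B-sub", (2, 3)), ("B-org", (4, 5))]},
]
num_samples = sum(len(s["entities"]) for s in annotated)
print(num_samples)  # 6 -> 2 sentences x 3 labels each, under this interpretation
```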

Thanks in advance. @grafael @lopuhin @baali @davisking @autopost-get
