2021-02-13 Deniz Yuret <dyuret@WS001>
* todo:
+ atype: find a solution to the atype mess.
2021-02-10 dyuret <[email protected]>
* cudnn-debug:
this is using CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stdout (normally JULIA_CUDA=CUDNN accomplishes the same, but there's a bug in CUDNN's logging)
2020-08-14 dyuret <[email protected]>
* todo:
+ finish train20
+ test/distributions and update
+ update readmes
+ util -> deprecate.jl: globals no longer needed
+ check import exports, match final Knet exports to 1.3.9
+ clean deprecated directories
+ update all examples and tutorials
++ tutorial
++ cifar10-cnn
++ dcgan-mnist
++ DeepLearningFrameworks
++ dynet-benchmark
++ fashion-mnist
++ housing-linreg
xx julia-tutorial
++ lenet
++ mnist-mlp
++ optimizers
xx README.md
++ reinforcement-learning
++ resnet
++ rnnlm
xx rnn-tutorial
++ synthetic-linreg
++ variational-autoencoder
++ vgg
+ update documentation
- push 1.4.0
- start ops21, layers21
= get rid of data, use MLDatasets and Artifacts
= check prof/ops20.jl, improve cuarrays
2020-08-11 dyuret <[email protected]>
* globals: need to decide where to put them. want modules independent of each other and of globals, eliminate cross-dependencies.
- training: both users and code
- atype: both users and code
- seed!: for users, not code
- dir: for users, not code
- cuallocator: can stay in KnetArrays, no need to export or publicise
- gpu: deprecate this
- libknet8, @knet8: only code, put in own module
2020-08-07 Deniz Yuret <deniz@razer>
* TODO: for the ops20 release:
+ ops20 is done
+ test with fast cuda.jl
- benchmark ops20 with knetarray and cuarray
- test ops20 with knetarray and cuarray
+ gcnode with cuarrays
- jld with cuarrays
- change all examples, tutorials to use explicit module imports (test with Knet that doesn't export anything).
- refactor ops20-ka and ops20-cu?
- integrate layers20 from KnetLayers
- redesign rnn and bn for ops21
- get rid of dependencies jld, datastructures, timeroutput etc
- get rid of data/*: just use MLDatasets when you can. move data.jl into train.
- gpu init
2020-01-04 dyuret <[email protected]>
* issue#513: downloading artifacts: DataDeps.jl vs Artifacts.toml.
Shortcomings of Artifacts.toml:
- Can specify urls but automatic download only possible for tar.gz.
No multiple files, no binary files (mnist, iris etc).
- The nice artifact"iris" string only available from within package.
DataDeps.jl:
- Extra dependency.
- Solves the two issues above.
- Can optionally unpack.
- No unique hash etc.
- Should use Knet prefix for names?
* data-wishlist:
- mnist
- cifar
- ...
* model-wishlist:
- yolov
- lenet
- resnet
- rnnlm
- ...
* interface:
- using Knet: LeNet
- using Knet.LeNet
- using KnetModels.LeNet
- using Knet.Models: LeNet
- should come with init that can download or randomly initialize for various model sizes.
- standard predict / loss interface.
- data read/write functions? training functions?
- (sub)package name vs struct/constructor name?
- if single symbol sufficient, can go with the first one:
-- LeNet() or ResNet(:small) => pretrained model
-- LeNet(784, 128, 256, 10) => randomly initialized model
-- (m::LeNet)(x,y) => loss
-- (m::LeNet)(x) => predict
-- (m::LeNet)(d) => average loss over dataset: need to specify type for x above, or do this with separate func
-- what about accuracy?
-- RNNLM has single input. Need different interface for prediction.
-- use pred/loss? need to revise training routines like adam.
- any standard interface for datasets?
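A minimal sketch of the single-symbol interface listed above, using only Base Julia. The `MLP` struct, the `nll` helper and the 0.1 initialization scale are hypothetical stand-ins chosen for illustration; this is not a Knet model. Dispatching on matrix vs. vector-of-tuples is one possible answer to the "need to specify type for x" remark.

```julia
# Hypothetical stand-in for a model such as LeNet; not part of Knet.
struct MLP
    w1::Matrix{Float64}
    b1::Vector{Float64}
    w2::Matrix{Float64}
    b2::Vector{Float64}
end

# MLP(784, 128, 10) => randomly initialized model
MLP(nin::Int, nhid::Int, nout::Int) =
    MLP(0.1 .* randn(nhid, nin), zeros(nhid), 0.1 .* randn(nout, nhid), zeros(nout))

# (m::MLP)(x) => predict: one column of class scores per instance
(m::MLP)(x::AbstractMatrix) = m.w2 * max.(0, m.w1 * x .+ m.b1) .+ m.b2

# Average negative log likelihood of the correct classes (illustrative helper)
function nll(scores::AbstractMatrix, labels::AbstractVector{<:Integer})
    z = scores .- maximum(scores, dims=1)      # subtract column max for stability
    logp = z .- log.(sum(exp.(z), dims=1))     # log softmax per column
    return -sum(logp[labels[i], i] for i in eachindex(labels)) / length(labels)
end

# (m::MLP)(x, y) => loss
(m::MLP)(x::AbstractMatrix, y::AbstractVector{<:Integer}) = nll(m(x), y)

# (m::MLP)(d) => average loss over a dataset given as a vector of (x, y) minibatches
(m::MLP)(data::AbstractVector{<:Tuple}) = sum(m(x, y) for (x, y) in data) / length(data)
```

Usage under these assumptions: `m = MLP(784, 128, 10); m(x)` predicts, `m(x, y)` returns the loss, and `m([(x1, y1), (x2, y2)])` averages the loss over minibatches.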
2019-09-04 dyuret <[email protected]>
* Knet.jl: Calling CuArrays on __init__ helps with the stability of some devices.
2019-09-03 dyuret <[email protected]>
* cuarrays: failing on gitlab-ci sometimes, related to gpu type?
Tesla-K20m-sm_35.log: CUDAnative:pass CuArrays:fail(1)/pass(4451) Knet:pass
gebrd!: Test Failed at /dev/shm/dyuret/julia/packages/CuArrays/wXQp8/test/solver.jl:172
Tesla-K80-sm_37.log: CUDAnative:pass CuArrays:pass Knet:pass
Quadro-M2000-sm_52.log: CUDAnative:pass CuArrays:pass Knet:pass
GeForce-GTX-1080-Ti-sm_61.log: CUDAnative:pass CuArrays:fail(1)/pass(4451) Knet:fail(1)/pass
gebrd!: Test Failed at /dev/shm/dyuret/julia/packages/CuArrays/wXQp8/test/solver.jl:172
cpuconv: Test Failed at /dev/shm/dyuret/julia/dev/Knet/test/conv.jl:44
Tesla-P4-sm_61.log: CUDAnative:pass CuArrays:pass Knet:hang/hang(11:18)
Tesla-V100-PCIE-32GB-sm_70.log: CUDAnative:pass CuArrays:fail(4446/5) Knet:pass
Batch 2D (in 4D): Test Failed at /dev/shm/dyuret/julia/packages/CuArrays/wXQp8/test/fft.jl:62
2D: Test Failed at /dev/shm/dyuret/julia/packages/CuArrays/wXQp8/test/fft.jl:165 (Float32)
3D: Test Failed at /dev/shm/dyuret/julia/packages/CuArrays/wXQp8/test/fft.jl:165 (Float32)
2D: Test Failed at /dev/shm/dyuret/julia/packages/CuArrays/wXQp8/test/fft.jl:165 (Float64)
3D: Test Failed at /dev/shm/dyuret/julia/packages/CuArrays/wXQp8/test/fft.jl:165 (Float64)
GeForce-RTX-2080-Ti-sm_70 *gitlab-ci* CUDAnative:fail/pass CuArrays:pass(4446 pass,5 broken) Knet:hang(10:56)
https://gitlab.com/JuliaGPU/Knet.jl/pipelines/80040815
/builds/JuliaGPU/Knet.jl/.julia/packages/CUDAnative/LkH1v/test/device/execution.jl:545
https://gitlab.com/JuliaGPU/Knet.jl/pipelines/80044768, 80054621
CuArrays:5-broken, Knet:hangs
2018-09-18 Deniz Yuret <[email protected]>
* gc: The current gc has three problems:
1. Waiting until memory is out to call gc.
- try to GC at 1GB, if you don't get half the memory back, increase to 2GB etc.
- need to keep track of how much available.
2. Hanging on to arrays that do not get reused.
- need a way to measure last used age for each bucket.
3. Not being able to reuse arrays for slightly smaller arrays.
- exponential buckets waste too much memory.
- we may not need this if #2 resolved.
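The threshold idea in item 1 could be sketched as below. This is a hypothetical policy helper, not the Knet allocator; the name `next_gc_limit` is made up, and doubling the limit is an assumption (the note only says "increase to 2GB etc.").

```julia
const GB = 2^30

# Given the current gc trigger `limit` (bytes) and the bytes in use before and
# after a collection, keep the limit if at least half of the memory came back,
# otherwise double it (assumed growth rule).
function next_gc_limit(limit::Integer, used_before::Integer, used_after::Integer)
    freed = used_before - used_after
    return 2 * freed >= used_before ? limit : 2 * limit
end
```

The caller still has to track how much memory is in use, which is the bookkeeping the second bullet under item 1 asks for.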
2018-09-05 dyuret <[email protected]>
* examples:
? Knet/tutorial
+ charlm: deprecate, in tutorial
+ cifar10-cnn
+ dcgan-mnist
+ DeepLearningFrameworks: Knet.CNN example has accuracy regression.
? dynet-benchmark
+ fashion-mnist
+ housing-linreg
+ julia-tutorial: check old commands, turn into notebook.
+ knet-tutorial: deprecate -> Knet/tutorial
+ lenet
+ mnist-mlp
+ optimizers
x overfitting: deprecate this, it is replicated in tutorial.
? reinforcement-learning
+ resnet
= rnnlm: update this with new interface, check for performance regression
+ rnn-tutorial: check for performance regression
+ synthetic-linreg
+ variational-autoencoder
+ vgg
2018-08-11 Deniz Yuret <[email protected]>
* julia7-compat-todo:
- fix JLD.
- unsafe_copy! is not in base any more? fix unsafe_copy!, unsafe_convert in karray looking in base
- unary_nd, indexed_function, isequivalent, _dbg, ssize not in AutoGrad any more.
- fix KnetDisplay (summary line should show KnetArray) and other display/show problems.
- separate and move KnetArray to KnetML.
- reorganize unary.jl and broadcast.jl
- check TODO.
+ seed! gc dir -- just use the same names but have Knet versions.
- data
+ deps
- docs
- examples
- prof
+ src: CPU:20/20
+ test: CPU:14/14
- limit max memory allocated by kptr
- AutoGrad only uses broadcasted now, compare performance with using broadcast
- try making karray <: AbstractArray and overriding get/setindex.
- search for TODOs.
- new AutoGrad interface.
- test on other AD and GPUarray pkgs.
- add using LinearAlgebra: lmul!, rmul! to test/linalg.jl
- use global keyword in the for loops in tests
- update travis.yml (and even better add gpu testing through #312)
- add Project.toml
- add Manifest.toml to .gitignore
- update readme badges
- eventually, slim down update! and rnn gpu tests
2018-08-09 Deniz Yuret <[email protected]>
* broadcast: we override broadcast and broadcasted for Rec and KnetArray.
- dot operations turn into broadcasted expressions.
- Rec overrides broadcasted to call broadcast_r directly.
- KnetArray should override broadcasted to call broadcast directly.
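A small illustration of the mechanism, assuming Julia 0.7 or later. `Wrapped` is a hypothetical type standing in for KnetArray, and `map` stands in for whatever kernel would actually run. Because `broadcasted` returns a finished value here rather than a lazy `Broadcasted` object, each dot call is evaluated eagerly and nothing is fused.

```julia
import Base.Broadcast: broadcasted

# Hypothetical wrapper standing in for KnetArray; not the Knet type.
struct Wrapped
    data::Vector{Float64}
end

# Eager overrides: compute the result immediately for each dot call.
broadcasted(f, x::Wrapped) = Wrapped(map(f, x.data))
broadcasted(f, x::Wrapped, y::Wrapped) = Wrapped(map(f, x.data, y.data))
broadcasted(f, x::Wrapped, y::Number) = Wrapped(map(v -> f(v, y), x.data))

a = Wrapped([1.0, 4.0, 9.0])
# sqrt.(a) .+ a lowers to materialize(broadcasted(+, broadcasted(sqrt, a), a));
# materialize is the identity for values that are not Broadcasted objects.
b = sqrt.(a) .+ a
c = a .* 2
```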
2017-09-06 EC2 Default User <[email protected]>
* julia6-compat-todo:
+ branches for cuarrays and gpu arrays
+ checkout autograd master in travis
+ need to figure out how to handle cat in autograd.
+ notebooks, vgg, resnet, prof
+ test autograd examples
x test all examples on 4,5,6
+ update news: autograd done, knet done.
- go thru issues: autograd done, issues left.
- branch for reversediff?
- test on latest
- speed test
- examples/optimizers.jl too slow.
- examples/charlm.jl does not pass gradcheck.
- new autograd interface
- broadcast without broadcast_func symbols
2017-09-01 EC2 Default User <[email protected]>
* julia6-compat-todo:
test           julia4  julia5  julia6
kptr           1       1       1
gpu            1       1       1
distributions  1       1       1
update         1       1       1
karray         1       1       1
linalg         1       1       1
conv           1       1       1
broadcast      1       1       1
unary          1       1       1
reduction      1       1       1
2017-07-29 dyuret <[email protected]>
* julia6-compat-todo:
- fix precompile warnings: WARNING: deprecated syntax "Expr(:ccall)". Use "Expr(:call, :ccall)" instead.
- Pkg.test passes: kptr, gpu, distributions, conv
- fix Pkg.test warnings: linalg
- fix Pkg.test errors: update, karray, broadcast, reduction, unary
- fix Pkg.build errors: Pkg.build does not work.
2017-05-17 Deniz Yuret <[email protected]>
* TODO-KUparser:
Speed issues with KUparser.
First epoch slower: do we need to pre-allocate and not use cudaMalloc?
Slow-down of long runs: is it gpu copy or parsing algorithm? (dynamic-oracle gives clues)
update! slows down in runtests if at the end!
Fix dynet benchmarks, incorporate more (lstm, logp) from cudnn.
Check out GPUArray and ReverseDiff.
Start testing with Julia6 maybe better with speed.
Need to figure out the new Julia6 broadcasting syntax.
Also try the recommendation to detect one-out-of-k argument signature.
2017-04-07 Deniz Yuret <[email protected]>
* DONE:
* TODO8:
# handle KnetFree==nothing in knetgc()
# gpu(false) does not clear out Knet memory structures?
# add mean/std for KnetArray.
# implement nce. debug nce8 branch. try on large data/vocab. why doesn't it converge on ptb?
# broadcast kernels: debug 16/17, add tests, add benchmarks.
# check batch normalization in resnet, add it to src. -- waiting for ilker.
# document load/save via JLD.
# recover kuparser.
# navigation: implement world representation.
# apply new rnn ideas to doc, readme, tutorial, slides, ijulia.
# fix reduction kernels for large matrices (10K-100K rows start giving bad answers): enis looking
# Latest master julia6 failing on nightly; AutoGrad also fails.
# try feeding rnnlm subset of embed matrix as weight.
# functional lstm interface defining multi-output grad2 function.
# Setup attobot for next release: https://github.com/attobot/attobot.
# prepare and submit julia4/5 benchmarks: find out why benchmarks are slower on julia5/6: https://github.com/JuliaLang/julia/issues/18135 ?
# Issue 89: reduced_dims -> reduced_indices in 0.5.1, stop using unexported functions from base. Other examples: to_indexes, dims2string, deepcopy_internal, LinearFast, show_backtrace, decolon (in AutoGrad).
# AutoGrad docs convert from comments to docstrings in core.jl and include in Knet manual. wip in newdocs branch.
# AutoGrad tests convert to the new test system.
# add some test for deconv vs conv, gclip.
# gpu switching back and forth does not seem to work. do we really need multiple handles for cublas, cudnn? do we need them for libknet8?
# AutoGrad: run all tests with KnetArrays
# AutoGrad: support for keys, values, next for dictionaries.
# add benchmarks with new dynamic frameworks: Yoav tweet, Volkan email. assigned to Ilker, Enis.
# docs todo: Julia tutorial. simplify examples. Baris Bozkurt's comments.
# docs todo: perceptron. kernel perceptron. svm. lenet and vgg in cnn section.
# mnist2d: implement/test sparse arrays
# time doing a single im2col instead of N for conv4
# replace T<: conditions in functions with generated code for each type
# 0.7: rename the 73 functions, cpu tests (add conv), v0.5 compat. check old todo list
# DL439: If one has access to numerical computation on complex numbers, then there is a very efficient way to numerically estimate the gradient by using complex numbers as input to the function (Squire and Trapp, 1998)
# optimization:
## make BLK,THR dependent on the input size? may improve final sum which is only 10x100 in this example.
## extend benchmark tests to cover all combinations of 10,100,1000 dimensions.
## optimize reduction and broadcasting kernels.
## optimize logp / softmax.
## optimize conv4 / matmul - arrayfire? cudnn conv instead of matmul? cudnn conv algorithms? fft paper from https://arxiv.org/abs/1601.06815.
## try fusion: we can do layers in one kernel call: relu(wx+b).
## try streams or multiple gpus.
## for general arrays: broadcast, get/setindex, h/v/cat. Enis working on this.
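The DL439 item above refers to the complex-step derivative, f'(x) ≈ Im(f(x + ih)) / h. A minimal scalar sketch follows; the function names and step sizes are illustrative choices. It assumes `f` is real-analytic and written with operations that accept `Complex` input.

```julia
# Complex-step estimate of f'(x). There is no subtraction of nearly equal
# numbers, so h can be made tiny without cancellation error.
complexstep(f, x::Real; h=1e-20) = imag(f(complex(x, h))) / h

# Central finite difference, for comparison.
centraldiff(f, x::Real; h=1e-6) = (f(x + h) - f(x - h)) / (2 * h)

f(x) = exp(x) * sin(x)
df(x) = exp(x) * (sin(x) + cos(x))   # analytic derivative used as reference

err_complex = abs(complexstep(f, 1.5) - df(1.5))
err_central = abs(centraldiff(f, 1.5) - df(1.5))
```

For a gradcheck over n inputs this still costs one function evaluation per input, the same as forward differences.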
2017-03-28 Deniz Yuret <[email protected]>
* examples/rnnlm.jl: Trying to replicate one of:
http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf
http://www.fit.vutbr.cz/~imikolov/rnnlm/asru_large_v4.pdf
https://arxiv.org/abs/1312.6026
https://arxiv.org/abs/1409.2329
https://arxiv.org/abs/1508.06615
(:epoch,28,:perp,103.928986f0,173.2658f0,157.65414f0) RNNLM.main("--seed 1 --epochs 30 --dropout 0.5 --hidden 200 --embed 200 --batch 100 --best foo.jld")
(:epoch,14,:perp,52.10398f0,140.70792f0,130.77716f0) RNNLM.main("--seed 1 --epochs 30 --dropout 0.5 --hidden 650 --embed 650 --batch 100 --best foo650.jld")
(:epoch,27,:perp,75.37883f0,135.43819f0,127.377f0) RNNLM.main("--seed 1 --epochs 30 --dropout 0.5 --hidden 200 --embed 200 --batch 100 --optim Adagrad()")
(:epoch,5,:perp,69.288506f0,184.24977f0,171.84555f0) RNNLM.main("--seed 1 --epochs 30 --hidden 200 --embed 200 --batch 100 --optim Adagrad()")
(:epoch,30,:perp,113.68978f0,166.22272f0,154.77658f0) RNNLM.main("--seed 1 --epochs 30 --hidden 200 200 --embed 200 --batch 100 --dropout 0.5 --optim Adagrad()")
(:epoch,25,:perp,181.603f0,239.55917f0,225.10878f0) RNNLM.main("--seed 1 --epochs 30 --hidden 200 200 --embed 200 --batch 100 --dropout 0.5 --optim Sgd(lr=1,gclip=5)")
Paper claims perp < 100. Potential reasons we don't get this:
x Dropout not working. significant drop when turned off.
- Difft way to measure (exclude eos etc)
- Adam bad, sgd/gclip good. (adagrad works better). Paper uses SGD(lr=1,gclip=5) halving lr if devperp does not go down by 1.
- Difft lstm type?
- Initialization different. [-0.05,0.05]
- Batchsize different = 20.
- Highway network.
- BPTT 35 time steps: charlm style not sentence bound?
- Number of layers = 2! Table 2 gives 2x300 for char model. 2x200 for word model?
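On the "different way to measure" possibility: perplexity is the exponential of the average negative log probability per token, so the set of tokens counted changes the number. The per-token probabilities below are made up for illustration, with the last entry playing the role of an easily predicted end-of-sentence token.

```julia
# Perplexity from a vector of per-token natural-log probabilities.
perplexity(logprobs) = exp(-sum(logprobs) / length(logprobs))

# Made-up probabilities for one sentence; the last token is eos.
logp = log.([0.1, 0.02, 0.05, 0.5])

with_eos = perplexity(logp)               # eos counted
without_eos = perplexity(logp[1:end-1])   # eos excluded
```

In this toy case counting eos lowers the perplexity, so two systems can differ on paper purely through the counting convention.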
2017-03-19 Deniz Yuret <[email protected]>
* DONE:
2017-03-17 Deniz Yuret <[email protected]>
* DONE:
# charlm: Implement rnns using separate embedding vectors and a @primitive concat operation: fix charlm demo, implement efficient s2s demo, s2s with attention. Need to first solve minibatching. Is v/hcat efficient? KnetArray implements only 2-arg version. AutoGrad calls cat, supports multiple args, grad uses uncat which uses getindex with an array of indices. KnetArray probably does not support an array of indices.
# charlm: try indexing op instead of sparse matmul.
# charlm: try lstm time concatenation.
# charlm: add adam: sgd(3.5) works faster. short bptt in initial epochs works faster.
# Add dropout as a primitive. Need kernel.
# charlm: add dropout
# s2s profile.
# check julia4 and julia6 compat, find out why julia pkgs marks knet as broken.
# fix intermittent test errors: https://github.com/JuliaLang/julia/pull/20736#issuecomment-283834724
2017-03-15 Deniz Yuret <[email protected]>
* profile: %41 forw (%20 logp), %44 back (%21 logp), %14 sum_outgrads
* flat: 2027 logp, 1761 sum, 1410 gemm, 1373 sum_outgrads
4115 ...et/.julia/v0.5/AutoGrad/src/core.jl:88; forward_pass(::Function, ::Tuple{Dict{Symbol,Any},Arr...
12 /Users/dyuret/knet/master/prof/s2s.jl:83; state = initstate(inputs[1], model[:state0]) # 14
92 /Users/dyuret/knet/master/prof/s2s.jl:86; input = lstm_input(model[:embed1], input) # 85
717 /Users/dyuret/knet/master/prof/s2s.jl:87; state = lstm(model[:encode], state, input) # 723
9 /Users/dyuret/knet/master/prof/s2s.jl:91; input = lstm_input(model[:embed2], EOS) # 3
692 /Users/dyuret/knet/master/prof/s2s.jl:95; state = lstm(model[:decode], state, input) # 702
80 /Users/dyuret/knet/master/prof/s2s.jl:100; input = lstm_input(model[:embed2],output) # 61
39 /Users/dyuret/knet/master/prof/s2s.jl:102; state = lstm(model[:decode], state, input) # 30
1 /Users/dyuret/knet/master/prof/s2s.jl:106; gold = vcat(outputs..., EOS) # 1
2473 /Users/dyuret/knet/master/prof/s2s.jl:107; sumlogp = lstm_output(model[:output], preds, gold) # 2441
43 /Users/dyuret/knet/master/prof/s2s.jl:114; pred1 = vcat(preds...) # 46
248 /Users/dyuret/knet/master/prof/s2s.jl:115; pred2 = pred1 * param[1] # 242
144 /Users/dyuret/knet/master/prof/s2s.jl:116; pred3 = pred2 .+ param[2] # 145
2037 /Users/dyuret/knet/master/prof/s2s.jl:117; sumlogp = logprob(gold, pred3) # 2006
2027 ...rs/dyuret/knet/master/prof/s2s.jl:135; o1 = logp(ypred,2) # 1999
833 ...ret/.julia/v0.5/Knet/src/unary.jl:176; x1 = maximum(x,d...)
120 ...ret/.julia/v0.5/Knet/src/unary.jl:177; x2 = x .- x1
122 ...ret/.julia/v0.5/Knet/src/unary.jl:178; x3 = exp(x2)
827 ...ret/.julia/v0.5/Knet/src/unary.jl:179; x4 = sum(x3,d...)
1 ...ret/.julia/v0.5/Knet/src/unary.jl:180; x5 = log(x4)
123 ...ret/.julia/v0.5/Knet/src/unary.jl:181; x6 = x2 .- x5
6 ...rs/dyuret/knet/master/prof/s2s.jl:136; o2 = o1[index] # 4
4 ...rs/dyuret/knet/master/prof/s2s.jl:137; o3 = sum(o2) # 2
4409 ...et/.julia/v0.5/AutoGrad/src/core.jl:231; backward_pass(::AutoGrad.Rec{Dict{Symbol,Any}}, ::Aut...
932 ./<missing>:0; *(::Type{AutoGrad.Grad{2}}, ::Knet.KnetArray{Float32...
1 ./<missing>:0; +(::Type{AutoGrad.Grad{1}}, ::Knet.KnetArray{Float32...
410 ./<missing>:0; .*(::Type{AutoGrad.Grad{2}}, ::Knet.KnetArray{Float3...
148 ./<missing>:0; .+(::Type{AutoGrad.Grad{2}}, ::Knet.KnetArray{Float3...
21 ./<missing>:0; getindex(::Type{AutoGrad.Grad{1}}, ::Array{Any,1}, :...
2142 ./<missing>:0; logp(::Type{AutoGrad.Grad{1}}, ::Knet.KnetArray{Floa...
818 ...uret/.julia/v0.5/Knet/src/unary.jl:195; dx1 = sum(dy,d...)
120 ...uret/.julia/v0.5/Knet/src/unary.jl:196; dx2 = exp(y)
1038 ...uret/.julia/v0.5/Knet/src/unary.jl:197; dx3 = dx2 .* dx1
166 ...uret/.julia/v0.5/Knet/src/unary.jl:198; dx4 = dy - dx3
245 ./<missing>:0; lstm_input(::Type{AutoGrad.Grad{1}}, ::Knet.KnetArra...
143 ./<missing>:0; sigm(::Type{AutoGrad.Grad{1}}, ::Knet.KnetArray{Floa...
6 ./<missing>:0; sum(::Type{AutoGrad.Grad{1}}, ::Float32, ::Float32, ...
106 ./<missing>:0; tanh(::Type{AutoGrad.Grad{1}}, ::Knet.KnetArray{Floa...
10 ./base.jl:151; vector_any()
2 ...AutoGrad/src/base/abstractarray.jl:0; cat(::Type{AutoGrad.Grad{17}}, ::Knet.KnetArray{Floa...
85 ...AutoGrad/src/base/abstractarray.jl:85; cat(::Type{AutoGrad.Grad{11}}, ::Knet.KnetArray{Floa...
1373 ...et/.julia/v0.5/AutoGrad/src/core.jl:233; backward_pass(::AutoGrad.Rec{Dict{Symbol,Any}}, ::Aut...
360 ...lia/v0.5/AutoGrad/src/interfaces.jl:71; sum_outgrads(::Void, ::AutoGrad.UngetIndex)
569 ...lia/v0.5/AutoGrad/src/interfaces.jl:87; sum_outgrads(::Dict{Symbol,Any}, ::AutoGrad.UngetIndex)
3 ...lia/v0.5/AutoGrad/src/interfaces.jl:92; sum_outgrads(::Array{Any,1}, ::AutoGrad.UngetIndex)
77 ...uret/.julia/v0.5/Knet/src/karray.jl:905; sum_outgrads{T}(a::KnetArray{T},b::KnetArray{T})=(a+b)
346 ...uret/.julia/v0.5/Knet/src/karray.jl:908; c = sum_outgrads_karray(a, b.value, b.index...)
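For reference, the logp steps that dominate the profile (unary.jl lines 176-181 forward, 195-198 backward) are re-typed here as a standalone sketch in current Julia syntax (`dims` keyword, dotted `exp`/`log`). The function names are made up and this is not the Knet source. In the forward counts above, the two reductions (`maximum`, `sum`) account for most of the logp time.

```julia
# Forward: numerically stable log softmax along `dims`.
function logp_sketch(x; dims=1)
    x1 = maximum(x, dims=dims)   # reduction 1
    x2 = x .- x1
    x3 = exp.(x2)
    x4 = sum(x3, dims=dims)      # reduction 2
    x5 = log.(x4)
    x6 = x2 .- x5
    return x6
end

# Backward: dx = dy - exp(y) .* sum(dy), where y is the forward output.
function logp_back(dy, y; dims=1)
    dx1 = sum(dy, dims=dims)
    dx2 = exp.(y)
    dx3 = dx2 .* dx1
    dx4 = dy .- dx3
    return dx4
end
```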
2017-03-12 Deniz Yuret <[email protected]>
* DONE:
# fix repeated index gradient bug.
# s2s: Add a generic s2s example to Knet/examples. Need to solve minibatching / concat first.
# docs: write rnn chapter. fix earlier chapters.
2017-03-11 Deniz Yuret <[email protected]>
* indexing: repeated indices are not handled properly by
ungetindex, they need to be summed in backward pass but setindex
just overwrites. Let's understand how Julia indexing works:
a[:,i]
522 ./abstractarray.jl:752; getindex(::Array{Float64,2}, ::Colon, ::Array{Int64,1})
? ./multidimensional.jl:270; _getindex(::Base.LinearFast, ::Array{Float64,2},...
522 ./multidimensional.jl:291; _unsafe_getindex(::Base.LinearFast, ::Array{Float64,2},...
149 ./multidimensional.jl:296; macro expansion
373 ./multidimensional.jl:298; macro expansion
373 ./multidimensional.jl:340; _unsafe_getindex!
373 ./multidimensional.jl:348; macro expansion
6 ./cartesian.jl:62; macro expansion
367 ./cartesian.jl:64; macro expansion
45 ./cartesian.jl:62; macro expansion
7 ./multidimensional.jl:349; macro expansion
315 ./multidimensional.jl:350; macro expansion
_unsafe_getindex is a @generated function which uses @nexprs, @ncall, to_index.
http://docs.julialang.org/en/latest/manual/metaprogramming.html#Generated-functions-1
@generated functions return quoted expressions that get compiled
at runtime. Only the types of expressions can be accessed in their
body.
@ncall 3 func a
==> func(a_1, a_2, a_3)
@ncall 2 func a b i->c[i]
==> func(a, b, c[1], c[2])
@nexprs 4 i -> y[i] = A[i+j]
==> y[1]=A[1+j]; y[2]=A[2+j]...
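A runnable check of these expansions using the macros from `Base.Cartesian`; the function name and the values are arbitrary.

```julia
using Base.Cartesian: @ncall, @nexprs

function cartesian_demo()
    # expands to: a_1 = 10 * 1; a_2 = 10 * 2; a_3 = 10 * 3
    @nexprs 3 i -> (a_i = 10 * i)
    # expands to: tuple(a_1, a_2, a_3)
    t = @ncall 3 tuple a
    # leading arguments are passed through unchanged: tuple(0, 2 * a_1, 2 * a_2)
    u = @ncall 2 tuple 0 i -> 2 * a_i
    return t, u
end
```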
to_index defined in operators.jl, converts to Array,Colon,Int
Numbers -> Int
Colon is kept
BitArray -> find(I) -> Vector{Int} # although there is a special _unsafe_getindex which skips using find
Other arrays are kept (int or cartesianindex)
index_shape(A,I_1,I_2,...) gives the destination shape
indices(b) => (Base.OneTo(10000),Base.OneTo(10000))
eachindex(b) => Base.OneTo(100000000)
Finally we call _unsafe_getindex!(dest,A,I_1,I_2,...) at multidimensional.jl:340
J = decolon(src,I_1,I_2,...) converts colons to explicit indices.
@nloops N itersym rangeexpr bodyexpr
@nloops N itersym rangeexpr preexpr bodyexpr
@nloops N itersym rangeexpr preexpr postexpr bodyexpr
# this indexes dest with linear, src with cartesian indices
D = eachindex(dest)
Ds = start(D)
for j_2 in J_2
    for j_1 in J_1
        d,Ds = next(D,Ds)
        dest[d] = getindex(src,j_1,j_2)
    end
end
We need to write an accumulating version of setindex! (addindex!) for ungetindex.
_setindex! for N-D indices L364
_setindex! for 1-D indices L368 uses _maybe_reshape(A)
_unsafe_setindex! is called
_unsafe_batchsetindex! is called with (A,_iterable(x),to_indexes(J...)...) L420
Here is macroexpanded version for N=2:
X is iterated over. A is called with individual indices.
We just need to zero out A and add for AutoGrad.
We need to write a KnetArray specific version in Knet.
quote # none, line 2:
    begin
        I_1 = I[1]
        I_2 = I[2]
    end # none, line 3:
    idxlens = index_lengths(A,I_1,I_2) # none, line 4:
    setindex_shape_check(X,idxlens[1],idxlens[2]) # none, line 5:
    J = decolon(A,I_1,I_2) # none, line 6:
    begin
        J_1 = J[1]
        J_2 = J[2]
    end # none, line 7:
    Xs = start(X) # none, line 8:
    begin
        $(Expr(:inbounds, true))
        begin # cartesian.jl, line 62:
            for j_2 = J_2 # cartesian.jl, line 63:
                nothing # cartesian.jl, line 64:
                begin # cartesian.jl, line 62:
                    for j_1 = J_1 # cartesian.jl, line 63:
                        nothing # cartesian.jl, line 64:
                        begin # none, line 9:
                            (v,Xs) = next(X,Xs) # none, line 10:
                            setindex!(A,v,j_1,j_2)
                        end # cartesian.jl, line 65:
                        nothing
                    end
                end # cartesian.jl, line 65:
                nothing
            end
        end
        $(Expr(:inbounds, :pop))
    end # none, line 12:
    A
end
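A minimal sketch of the accumulating `addindex!` proposed above, in current Julia and for plain Arrays only. Just two index patterns are covered (an integer vector into a vector, and `a[:,i]` with an integer vector `i`); the signatures are illustrative and neither the general N-d case nor a KnetArray version is attempted.

```julia
# Plain setindex! overwrites on repeated indices, so a contribution is lost:
a = zeros(3)
a[[1, 1, 2]] = [1.0, 2.0, 3.0]     # the 1.0 written to a[1] is overwritten by 2.0

# Hypothetical accumulating counterpart: A[I] += X, summing over repeats.
function addindex!(A::AbstractVector, X::AbstractVector, I::AbstractVector{<:Integer})
    length(X) == length(I) || throw(DimensionMismatch("X and I must have equal length"))
    for (x, i) in zip(X, I)
        A[i] += x
    end
    return A
end

# Same idea for the a[:,i] pattern profiled above: A[:, J] += X column by column.
function addindex!(A::AbstractMatrix, X::AbstractMatrix, ::Colon, J::AbstractVector{<:Integer})
    size(X) == (size(A, 1), length(J)) || throw(DimensionMismatch("X must have the shape of A[:, J]"))
    for (k, j) in enumerate(J)
        A[:, j] .+= view(X, :, k)
    end
    return A
end
```

For the backward pass of getindex, the gradient starts as `zeros(size(A))` and the output gradient is added in with `addindex!`, matching the "zero out A and add" remark.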
2017-03-10 Deniz Yuret <[email protected]>
* DONE:
# Added KnetArray indexing support for: Int, Colon, UnitRange, StepRange, CartesianIndex, Array{Int}, Array{Bool}, Array{CartesianIndex}. Multidimensional indexing incomplete.
2017-03-03 Deniz Yuret <[email protected]>
* DONE:
# Implement a decent hill climbing algorithm for hyperopt: do not repeat, have acceleration, independent step sizes, guarantee local minima, one dimension at a time, pick dimension/move using bandits or a smart queue. Wikipedia has good pseudocode. Bandits may work better.
2017-03-01 Deniz Yuret <[email protected]>
* DONE:
# Add gclip to update!.
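A sketch of norm-based gradient clipping as it might sit in front of an update. The semantics assumed here (rescale when the L2 norm exceeds `gclip`, with 0 meaning off) and the name `gclip!` are assumptions for illustration; the entry does not spell them out.

```julia
# Scale g in place so that its L2 norm does not exceed gclip.
# gclip == 0 is taken to mean "no clipping" (assumed convention).
function gclip!(g::AbstractArray, gclip::Real)
    gclip == 0 && return g
    gnorm = sqrt(sum(abs2, g))
    if gnorm > gclip
        g .*= gclip / gnorm
    end
    return g
end
```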
2017-02-26 Deniz Yuret <[email protected]>
* DONE:
# Issue 88: better handling of difft elt types on data vs weights in update!
2017-02-23 Deniz Yuret <[email protected]>
* DONE:
# bug in AutoGrad: broadcast.jl: $f(x1::Rec,x2::AbstractArray)=$f(x1,x2.value)
# gradcheck fails on: bmax(x,y)=broadcast(max,x,y) or bmin(x,y)=((y.<x).*y+(x.<y).*x)
# grad of convert (AutoGrad issue), implemented in karray.jl, need to port.
2017-02-22 Deniz Yuret <[email protected]>
* DONE:
# automatically generate README.md from tutorial.md.
# make examples easier to load (turn off Pkg install for vgg etc, docs take too long)
## no easy way to do it without affecting source links.
2017-02-19 Deniz Yuret <[email protected]>
* DONE:
# use mocha for cpu conv/pool. wip.
# cpuconv todo:
# ok: need to get rid of mask in pool_back
# ok: reimplement conv4 in terms of im2col
# ok: need low level blas call with pointers
# ok: reimplement conv4x conv4w using col2im?
# ok: need separate cpu and gpu libraries: condition makefile on finding nvcc, also cond openmpi like mocha/dep
2017-02-18 Deniz Yuret <[email protected]>
* DONE:
# add tests for new update interface.
2017-02-16 Deniz Yuret <[email protected]>
* DONE:
# permutedims ambiguity in AutoGrad in Julia 0.4.
# docs imagesize: The ![]() syntax doesn't support image size. However, you should be able to use the @raw block to insert custom HTML with an <img> tag as follows:
```@raw html
<img src="..." height="..." width="...">
```
# add x=unpool(pool(x)) test.
# add davide fix to knet before release
# Contents does not show on README.md, @ref doesn't work etc. New README?
# fix unclear docs for optimization, tutorial example? better yet, impl improved update interface.
2017-02-15 Deniz Yuret <[email protected]>
* DONE:
# KnetArray ==, ===, isapprox etc. missing.
# copy! for KnetArray missing. (we have fill!, rand! etc. is copy! supported by AutoGrad? should refuse for Rec)
# deepcopy does not work for KnetArray
# override KnetArray and Array instead of cpu2gpu etc.
# add KnetArray{Float32}(3) type init like Array and deprecate the other.
# Make BaseTestNext conditional on julia version. If not possible, create header.jl in test/ and check version/pkg.
2017-02-13 Deniz Yuret <[email protected]>
* DONE:
# Latest master failing on Julia 0.4 (convtest) and nightly distros.
# TonyKelman on Compat: If you use Compat in your tests, it needs to be in test/REQUIRE or REQUIRE.
# TonyKelman on removing nightly from travis: you can make this an allowed failure so it'll run but won't make your status red
2017-02-12 Deniz Yuret <[email protected]>
* unit-test-TODO:
+ unary.jl: cpu=25s gpu=41s
+ broadcast.jl: cpu=19s gpu=34s
+ reduction.jl: cpu=18s gpu=28s (6 fail)
+ karray.jl: cpu=(fail) gpu=12s
+ linalg.jl: cpu=(fail) gpu=15s
+ update.jl: cpu=6.6s gpu=19s
+ kptr.jl: cpu=2.5s gpu=2.5s
+ gpu.jl: cpu=3s gpu=2.4s
+ distributions.jl: cpu=3s gpu=3s
+ conv.jl: cpu=(fail) gpu=13s (10 broken)
+ runtests.jl: switch to new tests, figure out why so slow
+ karray,linalg: cpu tests failing
+ reduction: gpu tests failing
+ conv.jl: debug unpool, add cpuconv
2017-02-10 Deniz Yuret <[email protected]>
* DONE:
# doc warnings about missing broadcast, reduction, restricted cat, indexing, no bool array etc.
# figure out rtfd doc setup or forwarding.
# figure out no KnetArray in cpu problem
# KnetArray does not work on cpu-only (should it? no. operators overloaded assuming gpu)
# add back cpu convolution
# gputests.jl and cputests.jl are broken.
# resolve issues
2017-02-09 Deniz Yuret <[email protected]>
* DONE:
# missing docs for distributions
# the update methods not documented yet.
# 0.8.1: tag new version for the paper when all is done.
# add transpose to KnetArray.
# source links? (especially for examples)
# docs todo: warn against overwriting arrays.
# use original docstrings for examples readme, put examples readme in docs.
2017-02-08 Deniz Yuret <[email protected]>
* Documenter.jl: (TODO) Julia documentation moved back to MD
supported by Documenter.jl. Supports latex, doctest, xrefs,
search. Hosted on github pages built through Travis (no need for
Python). PDF output? PDF:
https://juliadocs.github.io/Documenter.jl/stable/lib/internals/writers.html
Automatic conversion from rst? Try Pandoc. Use markdown_github as
output.
Using sips command on osx to resize images.
- doctest, does it accept ... ?
- how do we generate docs from docstrings? ```@docs
- can use function f end to document a zero method function
- you can document fields of a type.
- how do you refer to julia function docs from knet docstrings?
- @doc "..." foo is used when foo is defined by generated code
- can we default link text to link content?
- $(EXPORTS) used in module doc to list exported symbols.
- using ```@contents, @index, @docs
- where are the source links?
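A minimal sketch of two of the docstring notes above: documenting a zero-method function with `function f end`, and attaching docs with `@doc "..." foo` when `foo` is defined by generated code. All names ending in `_sketch` are hypothetical and only serve the illustration. A Documenter page would then list such names inside an `@docs` block.

```julia
"""
    relu_sketch(x)

Return `max(0, x)`. Hypothetical function, used only to illustrate docstrings.
"""
function relu_sketch end   # zero-method function: documented before any method exists

relu_sketch(x::Real) = max(zero(x), x)

# Functions defined by generated code: define them in a loop with @eval ...
for (f, k) in ((:double_sketch, 2), (:triple_sketch, 3))
    @eval $f(x) = $k * x
end

# ... then attach the documentation afterwards with @doc.
@doc "    double_sketch(x)\n\nReturn `2x` (hypothetical, defined via `@eval`)." double_sketch
```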
2017-02-07 Deniz Yuret <[email protected]>
* src/Makefile (CFLAGS): code refactoring:
Function lists (cuda??.jl) and cuda code generators
(cuda??_gen.jl) do not go well together. The reason is that the same
function list (e.g. binary array ops) is used by more than one
type of cuda kernel (same size vs broadcasting kernels). It makes
more sense to collect function lists in semantically named files
(unary, broadcasting, reduce etc.).
cuda1: abs2,abs,acos,... => unary
cuda10: add,sub,mul,... => broadcast
cuda11: add,sub,mul,...
cuda12: add,sub,mul,...
cuda20: sum,prod,... => reduction
cuda21: sum,prod,...
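The idea of the refactoring can be sketched as one semantically named function list that several kernel generators consume. This is an illustration only: `binary_ops`, `same_size_kernel` and `broadcast_kernel` are hypothetical names, and the generated CUDA text is simplified, not the actual Knet source.

```julia
# One shared list of binary ops: (name, C expression on xi and yi).
const binary_ops = [("add", "xi+yi"), ("sub", "xi-yi"), ("mul", "xi*yi")]

# Generator for same-size kernels (the role cuda11 plays in the note above).
function same_size_kernel(name, expr; T="float")
    """
    __global__ void _$(name)_11($T *x, $T *y, $T *z, int n) {
      for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += blockDim.x * gridDim.x) {
        $T xi = x[i]; $T yi = y[i];
        z[i] = $expr;
      }
    }
    """
end

# Generator for broadcasting kernels (the role of cuda12), reusing the same list.
function broadcast_kernel(name, expr; T="float")
    """
    __global__ void _$(name)_12($T *x, int nx, $T *y, int ny, $T *z, int n) {
      for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += blockDim.x * gridDim.x) {
        $T xi = x[i % nx]; $T yi = y[i % ny];
        z[i] = $expr;
      }
    }
    """
end

same_size_src = join([same_size_kernel(n, e) for (n, e) in binary_ops], "\n")
broadcast_src = join([broadcast_kernel(n, e) for (n, e) in binary_ops], "\n")
```

Adding a new binary op then means touching only the shared list, and both kernel families pick it up.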
2016-10-24 dyuret <[email protected]>
* julia.st: for pretty-printing Julia use:
enscript -Ejulia -M A4 -2rGC -o foo1.ps core.jl
ps2pdf foo1.ps
enscript does not come with a julia format. You can create one
using matlab.st (for keywords) and python.st (for strings,
comments) under /usr/share/enscript/hl/julia.st.
2016-10-21 Deniz Yuret <[email protected]>
* DONE:
# implement adam.
# gpu(true) leaves memory footprint on each device.
# Paulito old cudnn interface support.
# update the rest of the documentation.
# vgg: change default padding for conv4 to be (n-1)/2
# Davide push/vcat bug.
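A small worked check of the (n-1)/2 padding rule from the vgg item above, assuming stride 1 and an odd window size n: with that padding the output has the same spatial size as the input. `conv_outsize` and `same_pad` are hypothetical helper names, and the size formula is the usual one for convolution, not code taken from Knet.

```julia
# Output length along one dimension: floor((x + 2*pad - w) / stride) + 1.
conv_outsize(x, w; pad=0, stride=1) = div(x + 2pad - w, stride) + 1

# (n-1)/2 from the note above, for an odd window size n.
same_pad(w) = div(w - 1, 2)

# With stride 1 the input size is preserved for every odd window size.
for w in (1, 3, 5, 7)
    @assert conv_outsize(224, w; pad=same_pad(w)) == 224
end
```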
2016-10-05 dyuret <[email protected]>
* DONE:
# write paper for nips
# document KnetArray in readme. finish the under the hood section.
# put function references in documentation.
# try to measure back functions one by one as well for gpu profiling.
# implement axpy! for KnetArray if it is worth it to get faster updates.
2016-10-03 Deniz Yuret <[email protected]>
* DONE:
# delete knet.ji in build script: no need.
# use Knet.dir set in Knet.jl.
# check if benchmarks keep minibatches in gpu.
# implement an install_extras command: no need, instead we do the following:
# automatic loading of packages by demos (use Pkg.add or introduce installExtras)
# implement vggnet demo.
# housing.jl bias fails gradcheck (because of Float32)
# check warnings in cpu-only knet.
# change charlm default winit.
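Why a bias gradient can fail gradcheck in Float32 can be sketched with a finite-difference check on a tiny linear model. Everything here is a hypothetical stand-in for housing.jl (made-up data, made-up function names). The loss is quadratic in the bias, so the central difference is exact in theory and any mismatch with the analytic gradient is rounding error.

```julia
# Quadratic loss of a linear model; b is the scalar bias.
loss(w, b, x, y) = sum(abs2, w * x .+ b .- y) / size(x, 2)

# Analytic gradient of the loss with respect to the bias.
dbias(w, b, x, y) = 2 * sum(w * x .+ b .- y) / size(x, 2)

# Central finite difference, computed entirely in the element type T.
function numgrad_bias(w, b::T, x, y; delta=T(1e-3)) where T
    (loss(w, b + delta, x, y) - loss(w, b - delta, x, y)) / (2 * delta)
end

# Absolute gap between numeric and analytic gradient for element type T.
function bias_gradcheck_error(T)
    x = T.(reshape(1:60, 3, 20)) ./ T(60)   # 3 features, 20 instances
    w = T[0.5 -0.25 0.1]                    # 1x3 weight matrix
    y = T.(reshape(1:20, 1, 20))            # 1x20 targets
    b = T(0.3)
    abs(numgrad_bias(w, b, x, y) - dbias(w, b, x, y))
end

println(bias_gradcheck_error(Float32), " vs ", bias_gradcheck_error(Float64))
```

The Float32 gap comes from subtracting two nearly equal loss values that each carry only about seven significant digits, which is one reason a tolerance tuned for Float64 may reject a correct Float32 gradient.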
2016-09-30 Deniz Yuret <[email protected]>
* DONE:
# use Float32 in housing by default.
# charlm can transfer all minibatches to gpu before timing (slows down due to gc)
# reimplement lenet using loop.
# charlm: profile speed, add dropout, nlayer.
2016-09-20 Deniz Yuret <[email protected]>
* DONE:
# broadcasting KnetArray bug when array size 1x1.
# extend readme with examples. revise intro. publish intro on blog. use the presentation.
2016-09-18 Deniz Yuret <[email protected]>
* DONE:
# new amazon aws image
# finish cudnn, curand etc. if possible eliminate dependence on them.
# logp should take a second argument like sum.
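A sketch of what a `logp` with a second argument could look like: normalized log probabilities along the given dimensions, computed with the usual max-subtraction trick for numerical stability. `logp_sketch` is a hypothetical name and this is not the Knet implementation; it uses the keyword `dims` form of current Julia reductions internally.

```julia
# Treat x as unnormalized log probabilities and normalize along dims.
# dims=: (the default) normalizes over the whole array, like sum(x).
function logp_sketch(x, dims=:)
    xmax = maximum(x; dims=dims)                             # guards against overflow in exp
    lse = xmax .+ log.(sum(exp.(x .- xmax); dims=dims))      # log-sum-exp along dims
    x .- lse
end

p = logp_sketch([1.0 2.0; 3.0 4.0], 1)   # each column of exp.(p) sums to 1
q = logp_sketch([1.0, 2.0, 3.0])         # whole vector normalized
```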
2016-09-16 Deniz Yuret <[email protected]>
* DONE:
# lenet: move cudnn calls to cuda44.jl
# karray: checkbounds without abstractarray.
# document KnetArray inline.
# remove importall Base.
# implement more efficient lstm making sure getindex does a view (transpose?)
# citation
# write examples/README
# charlm fails on cpu. change its default data file to something smaller. small files fail gradcheck.
# test gpu/cpu, osx/linux, 0.4, 0.5, 0.6. (currently failing on Travis).
# clean up old tags and register Knet
* Release:
Eliminated old tags. Just tag one 0.7 and one 0.8 version.
For each tagged version:
- minimize REQUIRE
- test on v0.4 v0.5 v0.6
- test on cpu vs gpu
julia 0.4 knet 0.7 cpu: ok
julia 0.4 knet 0.7 gpu: fail: mnist4d,copyseq
julia 0.4 knet 0.8 cpu: ok
julia 0.4 knet 0.8 gpu: ok
julia 0.4 autograd cpu: ok
julia 0.4 autograd gpu: ok
julia 0.5 knet 0.7 cpu: error
julia 0.5 knet 0.7 gpu: error
julia 0.5 knet 0.8 cpu: ok
julia 0.5 knet 0.8 gpu: ok
julia 0.5 autograd cpu: ok
julia 0.5 autograd gpu: ok
julia 0.6 knet 0.7 cpu: error
julia 0.6 knet 0.7 gpu: error
julia 0.6 knet 0.8 cpu: ok
julia 0.6 knet 0.8 gpu: ok
julia 0.6 autograd cpu: error, wip compat0.6, 0-dim array problem
julia 0.6 autograd gpu: error, wip compat0.6
julia 0.6 knet 0.8 cpu:
using Knet
WARNING: log{T <: Number}(x::AbstractArray{T}) is deprecated, use log.(x) instead.
WARNING: cos{T <: Number}(x::AbstractArray{T}) is deprecated, use cos.(x) instead.
WARNING: abs2{T <: Number}(x::AbstractArray{T}) is deprecated, use abs2.(x) instead.
Tests pass.
julia 0.6 autograd master cpu:
using AutoGrad: ok
Pkg.test("AutoGrad")
WARNING: max{T1 <: Real}(x::Real,y::AbstractArray{T1}) is deprecated, use max.(x,y) instead.
WARNING: abs{T <: Number}(x::AbstractArray{T}) is deprecated, use abs.(x) instead.
This is a common error I need to fix for other Julia versions as well:
WARNING: (AutoGrad.ungetindex,0.18437601415789295,[0.228639,0.085112],(2,),"MethodError(convert,(Array{Float64,N},OH416_A660_2_(2,)_0.18437601415789295))")
ERROR: LoadError: MethodError: no method matching erfinv(::Array{Float64,1})
julia 0.4 knet 0.7 gpu: fails mnist4d and copyseq
julia 0.4 knet 0.7 cpu: should add cpu conv test but they pass.
julia 0.5 knet 0.7 cpu:
using Knet
WARNING: Method definition randn!(Base.Random.AbstractRNG, AbstractArray{#T<:Any, N<:Any}) in module Random at random.jl:1207 overwritten in module Knet at /mnt/ai/home/dyuret/.julia/v0.5/Knet/src/util/array.jl:36.
WARNING: could not import Base.lastidx into LegacyStrings
WARNING: Base.writemime is deprecated. likely near /mnt/ai/home/dyuret/.julia/v0.5/Knet/src/net.jl:186
WARNING: deprecated syntax "[a=>b for (a,b) in c]". Use "Dict(a=>b for (a,b) in c)" instead.
Pkg.test("Knet")
WARNING: symbol is deprecated, use Symbol instead./mnt/ai/home/dyuret/.julia/v0.5/Knet/examples/linreg.jl:25
WARNING: Knet.Kfun.(:wdot) is deprecated; use Knet.Kfun.:wdot or getfield(Knet.Kfun, :wdot) instead./mnt/ai/home/dyuret/.julia/v0.5/Knet/src/compiler.jl:33
ERROR: expecting assignment expression got # /mnt/ai/home/dyuret/.julia/v0.5/Knet/src/kfun.jl, line 43:
in _comp(::Expr, ::Dict{Symbol,Symbol}, ::Dict{Symbol,Any}, ::Expr) at /mnt/ai/home/dyuret/.julia/v0.5/Knet/src/compiler.jl:97
julia 0.6 knet 0.7 cpu: (similar to julia 0.5)
using Knet
WARNING: Method definition randn!(Base.Random.AbstractRNG, AbstractArray{#T<:Any, N<:Any}) in module Random at random.jl:1281 overwritten in module Knet at /mnt/ai/home/dyuret/.julia/v0.6/Knet/src/util/array.jl:36.
WARNING: could not import Base.lastidx into LegacyStrings
WARNING: Base.writemime is deprecated. likely near /mnt/ai/home/dyuret/.julia/v0.6/Knet/src/net.jl:186
WARNING: deprecated syntax "[a=>b for (a,b) in c]". Use "Dict(a=>b for (a,b) in c)" instead.
Pkg.test("Knet")
WARNING: Method definition randn!(Base.Random.AbstractRNG, AbstractArray{#T<:Any, N<:Any}) in module Random at random.jl:1281 overwritten in module Knet at /mnt/ai/home/dyuret/.julia/v0.6/Knet/src/util/array.jl:36.
WARNING: symbol is deprecated, use Symbol instead.
WARNING: Knet.Kfun.(:wdot) is deprecated; use Knet.Kfun.:wdot or getfield(Knet.Kfun, :wdot) instead.
ERROR: expecting assignment expression got # /mnt/ai/home/dyuret/.julia/v0.6/Knet/src/kfun.jl, line 43:
in _comp(::Expr, ::Dict{Symbol,Symbol}, ::Dict{Symbol,Any}, ::Expr) at /mnt/ai/home/dyuret/.julia/v0.6/Knet/src/compiler.jl:97
julia 0.5 knet 0.7 gpu:
WARNING: Base.SparseMatrix is deprecated. (in CUSPARSE)
WARNING: Method definition (::Type{Knet._CudaArray})(CUDArt.CudaArray{#T<:Any, #N<:Any}) in module Knet at /state/partition1/dyuret/knet/publish/v0.5/Knet/src/util/cudart.jl:127 overwritten at /state/partition1/dyuret/knet/publish/v0.5/Knet/src/util/cudart.jl:128.
WARNING: Base.writemime is deprecated. likely near /state/partition1/dyuret/knet/publish/v0.5/Knet/src/util/cudart.jl:133
ERROR: LoadError: LoadError: LoadError: UndefVarError: TopNode not defined /state/partition1/dyuret/knet/publish/v0.5/Knet/src/util/deepcopy.jl, in expression starting on line 21
WARNING: Method definition randn!(Base.Random.AbstractRNG, AbstractArray{#T<:Any, N<:Any}) in module Random at random.jl:1207 overwritten in module Knet at /state/partition1/dyuret/knet/publish/v0.5/Knet/src/util/array.jl:36.
WARNING: deprecated syntax "[a=>b for (a,b) in c]". Use "Dict(a=>b for (a,b) in c)" instead.
WARNING: could not import Test.default_handler into Main
WARNING: could not import Test.Success into Main
WARNING: could not import Test.Failure into Main
ERROR: LoadError: LoadError: UndefVarError: Success not defined
WARNING: symbol is deprecated, use Symbol instead. /state/partition1/dyuret/knet/publish/v0.5/Knet/examples/linreg.jl:25
WARNING: Knet.Kfun.(:wdot) is deprecated; use Knet.Kfun.:wdot or getfield(Knet.Kfun, :wdot) instead. /state/partition1/dyuret/knet/publish/v0.5/Knet/src/compiler.jl:33
catastrophic failure of tests.
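The replacements suggested by the deprecation warnings in the logs above, collected in one place. This is a sketch on toy values: `Base` stands in for `Knet.Kfun` and `:sum` for `:wdot`, since the real module is not available here.

```julia
x = [1.0, 2.0, 4.0]

# log(x), abs2(x), max(a, x) on arrays: use the dotted (broadcast) forms.
y = log.(x)
z = abs2.(x)
m = max.(2.0, x)

# [a=>b for (a,b) in c]: build the dictionary from a generator instead.
c = [(:a, 1), (:b, 2)]
d = Dict(a => b for (a, b) in c)

# symbol("...") became Symbol("...").
s = Symbol("wdot")

# Knet.Kfun.(:wdot): look the name up with getfield (Base and :sum are stand-ins).
f = getfield(Base, :sum)
```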
2016-09-14 Deniz Yuret <[email protected]>
* DONE:
# lenet broken: chased bug down to KnetArray <: AbstractArray, what do we inherit?
# charlm: slowed down
# charlm: experiment with other forms of lstm
# try cpu sum and vcat see if better.
# check multi-gpu support on KnetArrays: can we copy, free, etc with a non-active device?
# charlm: optimize params
# 0.7: upper limit julia 0.5, remove downloading mnist from runtests.
# 0.7 has tests failing on Julia 0.4 and errors on Julia 0.5.
2016-09-13 Deniz Yuret <[email protected]>
* DONE:
## support cat and sub with KnetArray.
2016-09-12 Deniz Yuret <[email protected]>
* DONE:
# mnist2d: sampling gradcheck
## need a good gradcheck to make sure, other TODO items in charlm.
# mnist2d: implement efficient softmax -- seems ok for now.
# load/save using JLD: need to write KnetArray handlers.
## turn gpu on by default if exists.
# eliminate dependence on CUDArt.
## support multiple gpus with KnetArrays
# need an efficient softmax, logsumexp.
## test on mnist --fast:
## cpu before: 2.75 gpu before: 2.25
## cpu after : 2.75 gpu after : 2.10
## test on lenet --fast:
## gpu before: 10.15 gpu after: 9.84 (mostly because of relu)
## after using 10^8 for knetgc limit: 9.30
## test on charlm with 10k lines of shakespeare (one epoch time) (10,2.419187890389808):
## compare with 10.6 secs/epoch for train in Knet7:
## gpu before: test: 2.32 train: 6.58
## gpu after : test: 2.33 train: 6.30
## bitarrays : test: 2.15 train: 6.20
## fixes : test: 2.16 train: 6.14
## charlm:
# loss results do not match Knet7 because of epsbump
# cpu=gpu tested but knet7=knet8 only forward up to epsbump, knet8 does not have keepstate or gclip
## implemented maximum but AutoGrad.ungetindex failing tests
2016-09-10 Deniz Yuret <[email protected]>
* DONE:
# charlm keepstate problem: state has Values so gc does not work unless we getval or reinit state between iterations!
# charlm loss problem: knet7!=knet8 because of epsbump.
2016-09-06 Deniz Yuret <[email protected]>
* DONE:
# mnist2d: cpu support, in general for all of Knet; test with v0.4 and v0.5
# mnist2d: cpu/gpu, Float16,32,64 options
# mnist2d: update documentation
# put info about additional packages for examples and gpu in README
# mnist4d
# gc problem.
* gc-problem: mnist4d explodes memory.
2016-09-04 Deniz Yuret <[email protected]>
* DONE:
# mnist2d: add hidden option
# mnist2d: comparison with knet7
2016-08-29 Deniz Yuret <[email protected]>
* DONE:
# profile AutoGrad more, figure out memory problem, good look at unbroadcast. memory use with/without forw. closures?
# make AutoGrad completely generic?
# tmpfree is dangerous, user visible variables from a=b*c may be overwritten!
# figure out why forw adds 0.38
# determine and minimize autograd overhead.
2016-08-28 Deniz Yuret <[email protected]>
* DONE:
# move cuda -> src, src -> src7
# should we introduce our own CudaArray type?
# define KnetArray. (KA) use instead of CudaArray.
# test cuda version compare with af.
# we could just support 0,1,2 dims or 0,1,N dims for map and reduce.
# finish cuda2arg, cublas
# need to write cuda21 vector reductions before full AutoGrad test.
* ArrayFire: gave up on it, my code is faster:
# can we get AF kernels to dump out?
# try c++ ArrayFire mnist example?
# will arrayfire memory management still work with rnns?
# eventually look at arrayfire convolution.
# thrust and tensorflow are also open source.
* JIT: could look at this if libknet8 gets too big:
# JIT compile kernels as needed: https://blog.maleadt.net/2015/01/15/julia-cuda/
## http://docs.nvidia.com/cuda/nvrtc/index.html#basic-usage
## http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#mathematical-functions-appendix
2016-08-27 Deniz Yuret <[email protected]>
* cuda: timing tests on 100K repetitions on 1000x100 output array with simple ops:
BLK=256,THR=256 for all (except reductions cuda20,cuda21 use 128,128)
F32 F64 where
0.73 1.40 cuda1 (unary)
0.73 1.40 cuda01 (scalar,array)
0.80 2.06 cuda11 (same size array)
0.86 2.08 cuda12 (same size array, broadcasting)
broadcasting (cuda12):
F32 F64
mat+mat 0.86 mat+mat 2.08