Training the VAE with default parameters is unstable: every metric becomes NaN between steps 800 and 1000 of epoch 0 and stays NaN afterwards. Logs:
train: epoch 0, step 200, nll 150.3577, klw: 0.1153, KL 0.2740, rc 150.3278, log_ppl 6.7723, ppl 873.2936, time elapsed: 19.7s
train: epoch 0, step 400, nll 148.7724, klw: 0.1305, KL 1.0032, rc 148.6502, log_ppl 6.6977, ppl 810.5463, time elapsed: 36.8s
train: epoch 0, step 600, nll 148.2995, klw: 0.1457, KL 1.5545, rc 148.0989, log_ppl 6.6933, ppl 806.9564, time elapsed: 53.9s
train: epoch 0, step 800, nll 147.0893, klw: 0.1609, KL 1.3469, rc 146.9111, log_ppl 6.6390, ppl 764.3558, time elapsed: 70.9s
train: epoch 0, step 1000, nll nan, klw: 0.1761, KL nan, rc nan, log_ppl nan, ppl nan, time elapsed: 87.8s
train: epoch 0, step 1200, nll nan, klw: 0.1914, KL nan, rc nan, log_ppl nan, ppl nan, time elapsed: 104.0s
train: epoch 0, nll nan, KL nan, rc nan, log_ppl nan, ppl nan
valid: epoch 0, nll nan, KL nan, rc nan, log_ppl nan, ppl nan
test: epoch 0, nll nan, KL nan, rc nan, log_ppl nan, ppl nan
train: epoch 1, step 0, nll nan, klw: 0.2002, KL nan, rc nan, log_ppl nan, ppl nan, time elapsed: 0.2s
train: epoch 1, step 200, nll nan, klw: 0.2154, KL nan, rc nan, log_ppl nan, ppl nan, time elapsed: 15.5s
train: epoch 1, step 400, nll nan, klw: 0.2306, KL nan, rc nan, log_ppl nan, ppl nan, time elapsed: 32.6s
train: epoch 1, step 600, nll nan, klw: 0.2458, KL nan, rc nan, log_ppl nan, ppl nan, time elapsed: 49.4s
train: epoch 1, step 800, nll nan, klw: 0.2610, KL nan, rc nan, log_ppl nan, ppl nan, time elapsed: 64.2s
Logs from the `vae-train` example in texar-pytorch, for reference:
```
train: epoch 0, step 0, nll 202.0137, klw 0.1002, KL 0.0218, rc 202.0115, log_ppl 9.2349, ppl 10248.7397, time_cost 0.6
train: epoch 0, step 200, nll 145.4623, klw 0.1154, KL 1.2449, rc 145.3253, log_ppl 6.5687, ppl 712.4463, time_cost 23.2
train: epoch 0, step 400, nll 139.3877, klw 0.1306, KL 1.8096, rc 139.1726, log_ppl 6.3007, ppl 544.9752, time_cost 45.7
train: epoch 0, step 600, nll 135.5956, klw 0.1458, KL 2.1700, rc 135.3190, log_ppl 6.1319, ppl 460.2903, time_cost 68.1
train: epoch 0, step 800, nll 133.1483, klw 0.1610, KL 2.3624, rc 132.8281, log_ppl 6.0154, ppl 409.6862, time_cost 90.7
train: epoch 0, step 1000, nll 130.8279, klw 0.1762, KL 2.4912, rc 130.4704, log_ppl 5.9178, ppl 371.5942, time_cost 112.9
train: epoch 0, step 1200, nll 129.0042, klw 0.1914, KL 2.5828, rc 128.6131, log_ppl 5.8383, ppl 343.1805, time_cost 134.9
train: epoch 0, nll 128.1585, KL 2.6265, rc 127.7491, log_ppl 5.7997, ppl 330.2135
valid: epoch 0, nll 119.1858, KL 2.9482, rc 116.2376, log_ppl 5.4454, ppl 231.7005
test: epoch 0, nll 118.1185, KL 2.8654, rc 115.2531, log_ppl 5.3893, ppl 219.0593
train: epoch 1, step 0, nll 117.8860, klw 0.2003, KL 3.2506, rc 117.2353, log_ppl 5.1465, ppl 171.8215, time_cost 0.1
train: epoch 1, step 200, nll 117.5880, klw 0.2155, KL 3.3343, rc 116.8944, log_ppl 5.2860, ppl 197.5439, time_cost 22.2
train: epoch 1, step 400, nll 115.9266, klw 0.2307, KL 3.5084, rc 115.1690, log_ppl 5.2572, ppl 191.9527, time_cost 44.4
train: epoch 1, step 600, nll 115.5976, klw 0.2459, KL 3.6699, rc 114.7753, log_ppl 5.2438, ppl 189.3889, time_cost 66.2
train: epoch 1, step 800, nll 115.2580, klw 0.2611, KL 3.8097, rc 114.3733, log_ppl 5.2201, ppl 184.9485, time_cost 88.1
train: epoch 1, step 1000, nll 114.8173, klw 0.2763, KL 3.9137, rc 113.8769, log_ppl 5.1968, ppl 180.7011, time_cost 109.8
train: epoch 1, step 1200, nll 114.5588, klw 0.2915, KL 3.9752, rc 113.5725, log_ppl 5.1819, ppl 178.0216, time_cost 130.7
train: epoch 1, nll 114.3053, KL 4.0000, rc 113.2953, log_ppl 5.1728, ppl 176.4117
valid: epoch 1, nll 113.7273, KL 4.2768, rc 109.4505, log_ppl 5.1961, ppl 180.5585
test: epoch 1, nll 112.7315, KL 4.1861, rc 108.5454, log_ppl 5.1436, ppl 171.3240
train: epoch 2, step 0, nll 107.0595, klw 0.3004, KL 4.1894, rc 105.8015, log_ppl 4.9940, ppl 147.5299, time_cost 0.1
train: epoch 2, step 200, nll 108.8126, klw 0.3156, KL 4.3093, rc 107.4859, log_ppl 4.9412, ppl 139.9346, time_cost 20.1
```