GPU is not used #64

mothguib · 2020-07-16T12:00:54Z

I set up Pegasus following the instructions in a Docker container with CUDA 10, but it seems that the GPU is not used, whether I run train.py or evaluate.py.

Commands run:

python3 pegasus/bin/train.py --params=aeslc_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model --train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 --model_dir=ckpt/pegasus_ckpt/aeslc

python3 pegasus/bin/evaluate.py --params=aeslc_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=ckpt/pegasus_ckpt/aeslc

Both programs run on my 16-core CPU, but when I monitor the GPU with nvidia-smi it shows that the GPU is not used (0% utilisation).
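The nvidia-smi check can also be scripted so it can be polled while train.py or evaluate.py is running. This is a minimal sketch, not part of Pegasus: the helper names `parse_utilization` and `gpu_utilization` are made up for illustration, and it assumes `nvidia-smi` is on the PATH inside the container.

```python
import shutil
import subprocess


def parse_utilization(csv_text):
    """Parse one integer percentage per GPU from nvidia-smi CSV output.

    Lines that are not plain integers (for example "[N/A]") are skipped.
    """
    values = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def gpu_utilization():
    """Return a list of GPU utilisation percentages, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi not installed or not on PATH
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        return None  # driver not reachable from this environment
    return parse_utilization(proc.stdout)


if __name__ == "__main__":
    print(gpu_utilization())
```

A result of `None` would suggest the container cannot reach the driver at all, which is a different problem from a visible GPU sitting at 0%.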

My Python env:

asn1crypto==0.24.0
astor==0.8.1
attrs==19.3.0
bz2file==0.98
cachetools==4.1.1
certifi==2020.6.20
chardet==3.0.4
click==7.1.2
cloudpickle==1.3.0
cryptography==2.1.4
decorator==4.4.2
dill==0.3.2
dopamine-rl==3.0.1
Flask==1.1.2
future==0.18.2
gast==0.2.2
gevent==20.6.2
gin-config==0.3.0
google-api-core==1.21.0
google-api-python-client==1.10.0
google-auth==1.19.1
google-auth-httplib2==0.0.4
google-pasta==0.1.8
googleapis-common-protos==1.52.0
greenlet==0.4.16
grpcio==1.26.0
gunicorn==20.0.4
gym==0.17.2
h5py==2.10.0
httplib2==0.18.1
idna==2.6
itsdangerous==1.1.0
Jinja2==2.11.2
joblib==0.16.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==10.6.0
keyrings.alt==3.0
kfac==0.2.0
Markdown==3.1.1
MarkupSafe==1.1.1
mecab-python3==0.996.5
mesh-tensorflow==0.1.16
mock==4.0.2
mpmath==1.1.0
nltk==3.5
numpy==1.18.1
oauth2client==4.1.3
opencv-python==4.3.0.36
opt-einsum==3.1.0
Pillow==7.2.0
portalocker==1.7.0
promise==2.3
protobuf==3.11.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycrypto==2.6.1
pyglet==1.5.0
pygobject==3.26.1
pypng==0.0.20
python-apt==1.6.4
pytz==2020.1
pyxdg==0.25
regex==2020.7.14
requests==2.24.0
rouge-score==0.0.4
rsa==4.6
sacrebleu==1.4.12
scipy==1.5.1
SecretStorage==2.3.1
sentencepiece==0.1.91
six==1.12.0
sympy==1.6.1
tensor2tensor==1.15.0
tensorboard==1.15.0
tensorflow==1.15.3
tensorflow-datasets==3.2.1
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gpu==1.15.2
tensorflow-hub==0.8.0
tensorflow-metadata==0.22.2
tensorflow-probability==0.7.0
tensorflow-text==1.15.0rc0
termcolor==1.1.0
tf-slim==1.1.0
tfds-nightly==3.0.0.dev202004160105
tqdm==4.47.0
uritemplate==3.0.1
urllib3==1.25.9
Werkzeug==0.16.1
wrapt==1.11.2
zope.event==4.4
zope.interface==5.1.0


JingqingZ · 2020-07-16T13:19:56Z

Hi, one thing worth checking is whether Python actually imports tensorflow-gpu rather than tensorflow (CPU) when you run import tensorflow as tf. I notice that your tensorflow-gpu and tensorflow packages have different versions, so printing tf.__version__ should tell you which package is actually imported.
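As a rough sketch of how to spot this situation from a `pip freeze` listing (the helper name `tensorflow_variants` is hypothetical, and the sample text just reuses three lines from the environment posted above):

```python
def tensorflow_variants(freeze_text):
    """Return {package_name: version} for the tensorflow / tensorflow-gpu pins.

    Other packages whose names merely start with "tensorflow"
    (tensorflow-estimator, tensorflow-hub, ...) are ignored on purpose.
    """
    variants = {}
    for line in freeze_text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep and name.lower() in ("tensorflow", "tensorflow-gpu"):
            variants[name.lower()] = version
    return variants


sample = (
    "tensorflow==1.15.3\n"
    "tensorflow-estimator==1.15.1\n"
    "tensorflow-gpu==1.15.2\n"
)
found = tensorflow_variants(sample)
print(found)
if len(found) == 2:
    print("Both CPU and GPU packages are installed; check tf.__version__.")
```

If both entries show up, the version printed by `tf.__version__` indicates which of the two is being picked up by the import.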

mothguib · 2020-07-16T13:22:45Z

tf.__version__ returns '1.15.2', so tensorflow-gpu is the package being imported.
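The version string only shows which package was imported, not whether that build can see a GPU. A minimal sketch for listing the devices TensorFlow detects, assuming the `device_lib` module that ships with TF 1.x; the wrapper name `list_gpu_devices` is made up here:

```python
def list_gpu_devices():
    """Return the names of GPU devices TensorFlow can see.

    Returns None when TensorFlow cannot be imported at all.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Each entry is a DeviceAttributes proto with device_type and name fields.
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


if __name__ == "__main__":
    print(list_gpu_devices())
```

An empty list would mean TensorFlow was imported but found no usable GPU, which usually points at the CUDA/cuDNN libraries or the container's GPU passthrough rather than at Pegasus itself.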

JingqingZ · 2020-07-16T19:47:55Z

Hi, could you post the output from your terminal so that we can check it for any errors or warnings?

mothguib · 2020-07-17T09:54:05Z

Hi, here are the logs:

WARNING:tensorflow:From pegasus/bin/train.py:94: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/ops/public_parsing_ops.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0717 09:45:40.229595 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/ops/public_parsing_ops.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:49: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W0717 09:45:40.341865 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:49: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f55ad064400>) includes params argument, but params are not passed to Estimator.
W0717 09:45:40.342288 140007967700800 estimator.py:1994] Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f55ad064400>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': 'ckpt/pegasus_ckpt/aeslc', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f55ad065400>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0717 09:45:40.342869 140007967700800 estimator.py:212] Using config: {'_model_dir': 'ckpt/pegasus_ckpt/aeslc', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f55ad065400>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0717 09:45:40.343273 140007967700800 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0717 09:45:40.343642 140007967700800 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0717 09:45:40.352694 140007967700800 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0717 09:45:40.353030 140007967700800 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2020-07-17 09:45:50.315351: W tensorflow/core/platform/cloud/google_auth_provider.cc:178] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Aborted: All 10 retry attempts failed. The last failure: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
I0717 09:45:51.519790 140007967700800 dataset_info.py:427] Load pre-computed DatasetInfo (eg: splits, num examples,...) from GCS: aeslc/1.0.0
I0717 09:45:52.170372 140007967700800 dataset_info.py:358] Load dataset info from /tmp/tmp67upfey1tfds
I0717 09:45:52.171418 140007967700800 dataset_info.py:398] Field info.description from disk and from code do not match. Keeping the one from code.
I0717 09:45:52.171481 140007967700800 dataset_info.py:398] Field info.citation from disk and from code do not match. Keeping the one from code.
I0717 09:45:52.171828 140007967700800 dataset_builder.py:346] Generating dataset aeslc (/home/pegasus/tensorflow_datasets/aeslc/1.0.0)
Downloading and preparing dataset aeslc/1.0.0 (download: 11.10 MiB, generated: Unknown size, total: 11.10 MiB) to /home/pegasus/tensorflow_datasets/aeslc/1.0.0...
Dl Completed...: 0 url [00:00, ? url/s]          I0717 09:45:52.930225 140007967700800 download_manager.py:477] Downloading https://github.com/ryanzhumich/AESLC/archive/master.zip into /home/pegasus/tensorflow_datasets/downloads/ryanzhumich_AESLC_archive_masterACSpoxw627Ay4UrkswMeyz6RrOey8kKfkhEM4VySJWU.zip.tmp.c52f0c6613d4472baecd575aaed90f5e...
Dl Completed...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:14<00:00,  9.60s/ url]
Extraction completed...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:14<00:00, 14.69s/ file]
Extraction completed...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:14<00:00, 14.68s/ file]
Dl Size...: 11 MiB [00:14,  1.34s/ MiB]

Dl Completed...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:14<00:00, 14.69s/ url]
I0717 09:46:07.615057 140007967700800 dataset_builder.py:947] Generating split train
Shuffling and writing examples to /home/pegasus/tensorflow_datasets/aeslc/1.0.0.incompleteCMITBV/aeslc-train.tfrecord
  0%|                                                                                                                                                | 0/14436 [00:00<?, ? examples/s]I0717 09:46:10.858341 140007967700800 tfrecords_writer.py:230] Done writing /home/pegasus/tensorflow_datasets/aeslc/1.0.0.incompleteCMITBV/aeslc-train.tfrecord. Shard lengths: [14436]
I0717 09:46:10.860188 140007967700800 dataset_builder.py:947] Generating split validation                                                                                             
Shuffling and writing examples to /home/pegasus/tensorflow_datasets/aeslc/1.0.0.incompleteCMITBV/aeslc-validation.tfrecord
  0%|                                                                                                                                                 | 0/1960 [00:00<?, ? examples/s]I0717 09:46:11.275590 140007967700800 tfrecords_writer.py:230] Done writing /home/pegasus/tensorflow_datasets/aeslc/1.0.0.incompleteCMITBV/aeslc-validation.tfrecord. Shard lengths: [1960]
I0717 09:46:11.276207 140007967700800 dataset_builder.py:947] Generating split test                                                                                                   
Shuffling and writing examples to /home/pegasus/tensorflow_datasets/aeslc/1.0.0.incompleteCMITBV/aeslc-test.tfrecord
  0%|                                                                                                                                                 | 0/1906 [00:00<?, ? examples/s]I0717 09:46:11.702754 140007967700800 tfrecords_writer.py:230] Done writing /home/pegasus/tensorflow_datasets/aeslc/1.0.0.incompleteCMITBV/aeslc-test.tfrecord. Shard lengths: [1906]
I0717 09:46:11.703861 140007967700800 dataset_builder.py:401] Skipping computing stats for mode ComputeStatsMode.AUTO.                                                                
Dataset aeslc downloaded and prepared to /home/pegasus/tensorflow_datasets/aeslc/1.0.0. Subsequent calls will reuse this data.
I0717 09:46:11.705291 140007967700800 dataset_builder.py:500] Constructing tf.data.Dataset for split train, from /home/pegasus/tensorflow_datasets/aeslc/1.0.0
I0717 09:46:11.973816 140007967700800 datasets.py:215] Number of examples for config aeslc train is 14436
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/ragged/ragged_tensor.py:1586: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0717 09:46:12.506592 140007967700800 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/ragged/ragged_tensor.py:1586: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:tensorflow:Calling model_fn.
I0717 09:46:12.680724 140007967700800 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Running train on CPU
I0717 09:46:12.680862 140007967700800 tpu_estimator.py:3124] Running train on CPU
WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:78: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

W0717 09:46:12.681185 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:78: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/layers/attention.py:41: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.

W0717 09:46:12.681609 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/layers/attention.py:41: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/layers/embedding.py:57: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0717 09:46:12.715725 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/layers/embedding.py:57: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/layers/embedding.py:57: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

W0717 09:46:12.715814 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/layers/embedding.py:57: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/layers/embedding.py:61: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

W0717 09:46:12.716027 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/layers/embedding.py:61: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/layers/embedding.py:64: calling RandomNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0717 09:46:12.716216 140007967700800 deprecation.py:506] From /home/pegasus/pegasus/pegasus/layers/embedding.py:64: calling RandomNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/layers/attention.py:106: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.

W0717 09:46:14.478103 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/layers/attention.py:106: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/models/transformer.py:108: The name tf.losses.softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.softmax_cross_entropy instead.

W0717 09:46:17.322627 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/models/transformer.py:108: The name tf.losses.softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.softmax_cross_entropy instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:113: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead.

W0717 09:46:17.351484 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:113: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:114: The name tf.rsqrt is deprecated. Please use tf.math.rsqrt instead.

W0717 09:46:17.351696 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:114: The name tf.rsqrt is deprecated. Please use tf.math.rsqrt instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:115: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0717 09:46:17.351840 140007967700800 deprecation.py:323] From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:115: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/adafactor.py:315: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

W0717 09:46:17.356784 140007967700800 module_wrapper.py:139] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/adafactor.py:315: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/quantization.py:147: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0717 09:46:17.359377 140007967700800 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/quantization.py:147: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/quantization.py:156: The name tf.mod is deprecated. Please use tf.math.mod instead.

W0717 09:46:17.360769 140007967700800 module_wrapper.py:139] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/quantization.py:156: The name tf.mod is deprecated. Please use tf.math.mod instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/adafactor.py:244: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

W0717 09:46:25.700572 140007967700800 module_wrapper.py:139] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/adafactor.py:244: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:189: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

W0717 09:46:33.827164 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:189: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

WARNING:tensorflow:From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:205: The name tf.train.init_from_checkpoint is deprecated. Please use tf.compat.v1.train.init_from_checkpoint instead.

W0717 09:46:33.833476 140007967700800 module_wrapper.py:139] From /home/pegasus/pegasus/pegasus/params/estimator_utils.py:205: The name tf.train.init_from_checkpoint is deprecated. Please use tf.compat.v1.train.init_from_checkpoint instead.

I0717 09:46:34.638834 140007967700800 estimator_utils.py:207] **** Trainable Variables ****
I0717 09:46:34.638930 140007967700800 estimator_utils.py:212]   name = embeddings/weights:0, shape = (96103, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.638993 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639040 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639110 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639156 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639247 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639290 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639355 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639397 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639439 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.639482 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.639523 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639566 140007967700800 estimator_utils.py:212]   name = encoder/layer_0/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639607 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639648 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639688 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639731 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639773 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639816 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.639858 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639900 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.639941 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.639983 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.640024 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640066 140007967700800 estimator_utils.py:212]   name = encoder/layer_1/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640107 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640148 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640189 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640232 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640274 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640317 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640360 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640400 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640441 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.640483 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.640524 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640566 140007967700800 estimator_utils.py:212]   name = encoder/layer_2/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640607 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640647 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640688 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640730 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640773 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640815 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.640858 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640899 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.640940 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.640982 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.641023 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641064 140007967700800 estimator_utils.py:212]   name = encoder/layer_3/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641105 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641146 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641186 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641228 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641270 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641311 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641354 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641395 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641435 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.641476 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.641517 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641559 140007967700800 estimator_utils.py:212]   name = encoder/layer_4/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641613 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641655 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641695 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641737 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641779 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641821 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.641864 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641905 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.641945 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.641986 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.642026 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642069 140007967700800 estimator_utils.py:212]   name = encoder/layer_5/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642109 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642148 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642188 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642229 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642270 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642311 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642352 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642393 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642432 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.642474 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.642514 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642556 140007967700800 estimator_utils.py:212]   name = encoder/layer_6/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642595 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642635 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642675 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642716 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642757 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642798 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.642840 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642879 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.642918 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.642959 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.642999 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643040 140007967700800 estimator_utils.py:212]   name = encoder/layer_7/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643079 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643119 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643158 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643199 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643240 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643282 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643323 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643363 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643402 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.643444 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.643483 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643524 140007967700800 estimator_utils.py:212]   name = encoder/layer_8/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643563 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643604 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643643 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643685 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643726 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643767 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.643809 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643848 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.643888 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.643929 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.643968 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644009 140007967700800 estimator_utils.py:212]   name = encoder/layer_9/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644049 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644090 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644130 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644171 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644214 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644257 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644299 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644340 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644381 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.644423 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.644463 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644505 140007967700800 estimator_utils.py:212]   name = encoder/layer_10/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644546 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644586 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644626 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644668 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644711 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644753 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.644795 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644836 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.644876 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.644918 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.644963 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645006 140007967700800 estimator_utils.py:212]   name = encoder/layer_11/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645046 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645087 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645128 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645169 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645211 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645254 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645297 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645338 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645379 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.645420 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.645461 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645502 140007967700800 estimator_utils.py:212]   name = encoder/layer_12/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645542 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645582 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645622 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645664 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645706 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645748 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645790 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645830 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.645870 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.645911 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.645952 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.645994 140007967700800 estimator_utils.py:212]   name = encoder/layer_13/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646034 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646075 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646115 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646156 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646198 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646239 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646281 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646321 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646360 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.646402 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.646442 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646484 140007967700800 estimator_utils.py:212]   name = encoder/layer_14/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646524 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646564 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646604 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646646 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646687 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646730 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646772 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646812 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.646852 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.646893 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.646933 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.646975 140007967700800 estimator_utils.py:212]   name = encoder/layer_15/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647015 140007967700800 estimator_utils.py:212]   name = encoder/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647056 140007967700800 estimator_utils.py:212]   name = encoder/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647096 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647135 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647176 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647218 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647260 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647302 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647344 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647384 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647424 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647465 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647506 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647548 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647590 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647631 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647671 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.647713 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.647753 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647795 140007967700800 estimator_utils.py:212]   name = decoder/layer_0/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647835 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647876 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.647916 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.647958 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648000 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648042 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648084 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648125 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648165 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648207 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648249 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648296 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648339 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648379 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648419 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.648461 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.648501 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648542 140007967700800 estimator_utils.py:212]   name = decoder/layer_1/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648583 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648623 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648663 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648705 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648746 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648788 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648831 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648871 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.648911 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648953 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.648995 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649037 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649079 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649119 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649159 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.649201 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.649241 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649283 140007967700800 estimator_utils.py:212]   name = decoder/layer_2/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649323 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649364 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649404 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649446 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649488 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649530 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649572 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649613 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649653 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649695 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649737 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649779 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.649822 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649863 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.649903 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.649944 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.649984 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650026 140007967700800 estimator_utils.py:212]   name = decoder/layer_3/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650066 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650106 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650146 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650188 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650230 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650273 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650315 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650356 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650396 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650439 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650480 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650522 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650564 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650605 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650645 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.650687 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.650728 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650770 140007967700800 estimator_utils.py:212]   name = decoder/layer_4/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650810 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650851 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.650891 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650932 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.650974 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651017 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651059 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651099 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651139 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651182 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651223 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651265 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651308 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651348 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651388 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.651429 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.651469 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651512 140007967700800 estimator_utils.py:212]   name = decoder/layer_5/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651551 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651592 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651636 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651678 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651721 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651763 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651805 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651846 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.651886 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651929 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.651970 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652012 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652055 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652095 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652135 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.652177 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.652218 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652260 140007967700800 estimator_utils.py:212]   name = decoder/layer_6/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652300 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652340 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652381 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652423 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652466 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652508 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652551 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652592 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652632 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652674 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652716 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652758 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.652800 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652839 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.652879 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.652921 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.652961 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653003 140007967700800 estimator_utils.py:212]   name = decoder/layer_7/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653043 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653083 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653122 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653164 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653207 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653249 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653290 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653330 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653370 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653412 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653454 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653496 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653537 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653578 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653618 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.653659 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.653700 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653741 140007967700800 estimator_utils.py:212]   name = decoder/layer_8/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653782 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653822 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.653862 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653904 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653946 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.653988 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654030 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654071 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654110 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654152 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654194 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654237 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654279 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654320 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654360 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.654402 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.654442 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654484 140007967700800 estimator_utils.py:212]   name = decoder/layer_9/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654525 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654565 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654605 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654646 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654688 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654730 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654773 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654813 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.654854 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654896 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654942 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.654986 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655029 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655069 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655109 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.655151 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.655191 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655233 140007967700800 estimator_utils.py:212]   name = decoder/layer_10/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655273 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655314 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655355 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655398 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655440 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655484 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655526 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655567 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655608 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655650 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655692 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655735 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655777 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655817 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.655858 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.655900 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.655940 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.655982 140007967700800 estimator_utils.py:212]   name = decoder/layer_11/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656022 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656062 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656103 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656145 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656187 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656229 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656271 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656311 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656351 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656393 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656435 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656477 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656519 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656558 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656599 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.656641 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.656682 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656723 140007967700800 estimator_utils.py:212]   name = decoder/layer_12/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656764 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656804 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.656844 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656885 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656927 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.656970 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657011 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657051 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657092 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657133 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657175 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657217 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657259 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657299 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657340 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.657382 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.657423 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657464 140007967700800 estimator_utils.py:212]   name = decoder/layer_13/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657505 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657544 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657585 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657627 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657669 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657711 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657754 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657794 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.657834 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657876 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657917 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.657959 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658002 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658043 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658083 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.658124 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.658165 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658207 140007967700800 estimator_utils.py:212]   name = decoder/layer_14/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658247 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/self_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658292 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/self_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658332 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/self_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658375 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/self_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658416 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/self_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658459 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/self_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658501 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/memory_attention/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658542 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/memory_attention/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658583 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/memory_attention/q_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658625 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/memory_attention/k_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658667 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/memory_attention/v_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658710 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/memory_attention/output_proj/kernel:0, shape = (1024, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658752 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/ffn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658793 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/ffn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658833 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/ffn/dense/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0717 09:46:34.658875 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/ffn/dense/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0717 09:46:34.658916 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/ffn/dense_1/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0717 09:46:34.658958 140007967700800 estimator_utils.py:212]   name = decoder/layer_15/ffn/dense_1/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.658999 140007967700800 estimator_utils.py:212]   name = decoder/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0717 09:46:34.659040 140007967700800 estimator_utils.py:212]   name = decoder/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:Done calling model_fn.
I0717 09:46:34.665250 140007967700800 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0717 09:46:34.666114 140007967700800 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0717 09:46:39.734592 140007967700800 monitored_session.py:240] Graph was finalized.
2020-07-17 09:46:39.734843: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-17 09:46:39.751882: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3601000000 Hz
2020-07-17 09:46:39.752496: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x651d8c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-17 09:46:39.752523: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
INFO:tensorflow:Restoring parameters from ckpt/pegasus_ckpt/aeslc/model.ckpt-0
I0717 09:46:39.753648 140007967700800 saver.py:1284] Restoring parameters from ckpt/pegasus_ckpt/aeslc/model.ckpt-0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W0717 09:46:42.819490 140007967700800 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I0717 09:46:43.995445 140007967700800 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0717 09:46:44.636074 140007967700800 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ckpt/pegasus_ckpt/aeslc/model.ckpt.
I0717 09:47:01.281232 140007967700800 basic_session_run_hooks.py:606] Saving checkpoints for 0 into ckpt/pegasus_ckpt/aeslc/model.ckpt.
INFO:tensorflow:global_step/sec: 0.0172225
I0717 09:49:06.726892 140007967700800 tpu_estimator.py:2307] global_step/sec: 0.0172225
INFO:tensorflow:examples/sec: 0.13778
I0717 09:49:06.727284 140007967700800 tpu_estimator.py:2308] examples/sec: 0.13778
INFO:tensorflow:global_step/sec: 0.0401332
I0717 09:49:31.643851 140007967700800 tpu_estimator.py:2307] global_step/sec: 0.0401332
INFO:tensorflow:examples/sec: 0.321066
I0717 09:49:31.643971 140007967700800 tpu_estimator.py:2308] examples/sec: 0.321066
INFO:tensorflow:global_step/sec: 0.0397918
I0717 09:49:56.774654 140007967700800 tpu_estimator.py:2307] global_step/sec: 0.0397918
INFO:tensorflow:examples/sec: 0.318334
I0717 09:49:56.774774 140007967700800 tpu_estimator.py:2308] examples/sec: 0.318334
INFO:tensorflow:global_step/sec: 0.0393763
I0717 09:50:22.170644 140007967700800 tpu_estimator.py:2307] global_step/sec: 0.0393763
INFO:tensorflow:examples/sec: 0.31501
I0717 09:50:22.170766 140007967700800 tpu_estimator.py:2308] examples/sec: 0.31501
INFO:tensorflow:global_step/sec: 0.0383158
I0717 09:50:48.269510 140007967700800 tpu_estimator.py:2307] global_step/sec: 0.0383158
INFO:tensorflow:examples/sec: 0.306527
I0717 09:50:48.269627 140007967700800 tpu_estimator.py:2308] examples/sec: 0.306527
INFO:tensorflow:global_step/sec: 0.0386849
I0717 09:51:14.119487 140007967700800 tpu_estimator.py:2307] global_step/sec: 0.0386849
INFO:tensorflow:examples/sec: 0.30948
I0717 09:51:14.120115 140007967700800 tpu_estimator.py:2308] examples/sec: 0.30948

We can see that it runs the training on the CPU:

I0717 09:46:12.680724 140007967700800 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Running train on CPU
I0717 09:46:12.680862 140007967700800 tpu_estimator.py:3124] Running train on CPU

JingqingZ · 2020-07-17T11:15:36Z

Hi, I'm sorry, but this problem is new to me and I'm not sure where the error actually is. It looks as though TensorFlow didn't locate (or didn't try to locate) any GPU resources.

akashjaswal · 2020-07-17T16:59:55Z

@mothguib - I'm facing a similar issue as well. Could you try the following to check whether the problem lies with the CUDA driver itself? That could be why TensorFlow couldn't find the GPU devices, as mentioned above:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print("GPUs: ", tf.config.experimental.list_physical_devices('GPU'))

mothguib · 2020-07-19T17:25:34Z

@akashjaswal - It does seem that this issue comes from the CUDA driver itself; my GPU is not detected:

pegasus@pegasus-run:/src$ python
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Num GPUs Available:  0
>>> print("GPUs: ", tf.config.experimental.list_physical_devices('GPU'))
GPUs:  []

mothguib · 2020-08-01T17:10:12Z

I finally solved the problem. It's still something of a mystery to me, but when using the Docker image tensorflow/tensorflow:1.15.2-gpu, the GPUs are detected only by root with the native Python interpreter, or by any user with Anaconda's Python interpreter, but not by a non-root user with the native interpreter. I opened a pull request proposing a Docker image that lets one run Pegasus directly, with all the required packages and materials installed: #70.
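
Because detection depended on which user and which interpreter ran the check, one way to narrow things down is to print the same facts under each combination and compare them. The snippet below is a minimal diagnostic sketch, not part of Pegasus. The function name `describe_environment` is hypothetical, and the things it inspects (the `/dev/nvidia*` device nodes, `LD_LIBRARY_PATH`, `CUDA_VISIBLE_DEVICES`) are only assumptions about what might differ between the cases; the thread does not confirm the actual cause.

```python
import glob
import os
import sys


def describe_environment():
    """Collect facts that may differ between users or Python interpreters.

    Assumption: the items inspected here are plausible suspects only;
    the thread does not establish which of them (if any) is the cause.
    """
    device_nodes = sorted(glob.glob("/dev/nvidia*"))
    return {
        "uid": os.getuid(),
        "interpreter": sys.executable,
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Map each NVIDIA device node to whether this user can read and write it.
        "device_access": {
            node: os.access(node, os.R_OK | os.W_OK) for node in device_nodes
        },
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it once as root and once as the non-root user, with each interpreter, gives outputs that can be compared line by line.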

jasonwu0731 · 2020-09-03T18:23:18Z

I faced the same issue and fixed it this way:

I am using this GCP image, nvcr.io/nvidia/tensorflow:20.08-tf1-py3, which has tensorflow-gpu==1.15.3. At this stage, calling tf.test.is_gpu_available returns True. If I then run pip3 install -r requirements.txt directly, it creates a conflict between the tensorflow and tensor2tensor libraries, so that tf.test.is_gpu_available returns False.

I found that pip install tensor2tensor==1.15.0 automatically installs tensorflow>2.0. So what I did was download the tensor2tensor library from source instead. You may also need to run pip install tensorflow-probability==0.8.0, because version 0.11.0 has a dependency on tf>2.0.
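
To check whether an install step has silently pulled in a TensorFlow 2.x release, the installed versions can be listed before and after running pip. This is a minimal sketch under stated assumptions: the helper names (`installed_version`, `is_tf1`, `find_conflicts`) are hypothetical, and the list of packages is taken from the ones mentioned in this comment.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.6/3.7 fallback via setuptools
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version

# Packages mentioned in the comment above.
PACKAGES = ["tensorflow", "tensorflow-gpu", "tensor2tensor", "tensorflow-probability"]


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def is_tf1(version_string):
    """Return True if the major version is 1 (for example '1.15.3')."""
    return version_string.split(".")[0] == "1"


def find_conflicts(versions):
    """Return warnings for a {package: version-or-None} mapping."""
    warnings = []
    for name in ("tensorflow", "tensorflow-gpu"):
        found = versions.get(name)
        if found is not None and not is_tf1(found):
            warnings.append(f"{name}=={found} is not a 1.x release")
    return warnings


if __name__ == "__main__":
    versions = {name: installed_version(name) for name in PACKAGES}
    for name, found in versions.items():
        print(f"{name}: {found or 'not installed'}")
    for warning in find_conflicts(versions):
        print("WARNING:", warning)
```

If a warning appears after installing the requirements, that suggests the original 1.15 build was replaced or shadowed.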

cindycandy · 2020-10-04T03:13:17Z

> I finally solved the problem. It's somewhat a mystery for me, but when using Docker image tensorflow/tensorflow:1.15.2-gpu under Python, the GPUs are only detected either by root with the native Python interpreter, or by any user with Anaconda's Python interpreter, but not by a non-root user with the native one. I made a pull request to propose a Docker image that enables one to run Pegasus directly with all needed packages and materials installed: #70.

Hello, I've run into the same problem. I also use Docker with tensorflow-gpu == 1.15.2, but I don't understand your solution. Could you explain it?
