The bug about using hooks and MirroredStrategy in tf.estimator.Estimator

I was using MirroredStrategy with my tf.estimator.Estimator:

Python
distribution = tf.contrib.distribute.MirroredStrategy(
    ["/device:GPU:0", "/device:GPU:1"])
config = tf.estimator.RunConfig(train_distribute=distribution,
                                eval_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=build_model_fn_optimizer(),
                                   config=config)
estimator.train(input_fn=input_fn, steps=10)

and added a training hook inside the model function:

Python
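# LoggingTensorHook needs every_n_iter or every_n_secs; 1 is an illustrative value.
logging_hook = tf.train.LoggingTensorHook({'logits': logits}, every_n_iter=1)
return tf.estimator.EstimatorSpec(mode, loss=loss_fn(), train_op=train_op,
                                  training_hooks=[logging_hook])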

TensorFlow then reported the following error:

  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1179, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1309, in _train_model_distributed
    grouped_estimator_spec.training_hooks)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1305, in get_hooks_from_the_first_device
    for per_device_hook in per_device_hooks
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1305, in <listcomp>
    for per_device_hook in per_device_hooks
AttributeError: 'Estimator' object has no attribute '_distribution'

I could not find any answers on Google, so I had to look into the code of ‘estimator.py’ in TensorFlow. Fortunately, the defect is obvious:

Python
scaffold = _combine_distributed_scaffold(
    grouped_estimator_spec.scaffold, self._train_distribution)

# TODO(yuefengz): add a test for unwrapping per_device_hooks.
def get_hooks_from_the_first_device(per_device_hooks):
  return [
      self._distribution.unwrap(per_device_hook)[0]
      for per_device_hook in per_device_hooks
  ]

training_hooks = get_hooks_from_the_first_device(
    grouped_estimator_spec.training_hooks)

The Estimator class has no private attribute named ‘_distribution’; it only has ‘_train_distribution’ and ‘_eval_distribution’. So the fix is simply to change ‘self._distribution.unwrap(per_device_hook)[0]’ to ‘self._train_distribution.unwrap(per_device_hook)[0]’.
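To make the failure mode concrete, here is a minimal sketch in plain Python with no TensorFlow dependency. `FakeStrategy`, `MiniEstimator`, `hooks_buggy`, `hooks_fixed` and `demo` are hypothetical names invented for illustration; they are not part of TensorFlow. The sketch only models the attribute lookup described above, and it assumes `unwrap` returns one entry per device.

```python
class FakeStrategy:
    """Hypothetical stand-in for a distribution strategy (not a TensorFlow class)."""

    def unwrap(self, per_device_value):
        # Assumption: unwrap returns the per-device copies, one entry per device.
        return list(per_device_value)


class MiniEstimator:
    """Hypothetical, stripped-down model of the attributes involved in the bug."""

    def __init__(self, strategy):
        # Only these two attributes are set; `_distribution` never is.
        self._train_distribution = strategy
        self._eval_distribution = strategy

    def hooks_buggy(self, per_device_hooks):
        # Mirrors the faulty line: looks up the missing `_distribution`.
        return [self._distribution.unwrap(h)[0] for h in per_device_hooks]

    def hooks_fixed(self, per_device_hooks):
        # Mirrors the proposed fix: uses `_train_distribution` instead.
        return [self._train_distribution.unwrap(h)[0] for h in per_device_hooks]


def demo():
    estimator = MiniEstimator(FakeStrategy())
    # One hook replicated on two devices, represented here by a tuple of names.
    per_device_hooks = [("logging_hook@GPU:0", "logging_hook@GPU:1")]
    try:
        estimator.hooks_buggy(per_device_hooks)
        buggy_error = None
    except AttributeError as err:
        buggy_error = str(err)
    return buggy_error, estimator.hooks_fixed(per_device_hooks)


if __name__ == "__main__":
    error, hooks = demo()
    print("buggy:", error)
    print("fixed:", hooks)
```

Note that the faulty lookup only runs when the hook list is non-empty, because the list comprehension never evaluates `self._distribution` otherwise. That would be consistent with the error showing up only after I added a training hook.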

I have submitted a pull request to TensorFlow to fix this bug in the 1.11 branch.