MonitoredTrainingSession keeps logging "tensorflow:Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized" after is_chief is specified

After is_chief is specified for MonitoredTrainingSession, the following message is logged: tensorflow:Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized

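Cause: once again, master=server.target was not specified in MonitoredTrainingSession. A likely explanation is that without master, the session connects to a local in-process runtime instead of the cluster server, so a non-chief worker never sees the variables that the chief initializes. After adding master=server.target, training runs normally.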

with tf.train.MonitoredTrainingSession(
        master=server.target,  # connect to this task's tf.train.Server
        is_chief=is_chief,
        checkpoint_dir=checkpoint_dir,
        save_checkpoint_secs=FLAGS.save_interval_secs,
        save_summaries_steps=100,
        save_summaries_secs=None,
        config=sess_config,
        hooks=hooks) as sess:
    pass  # training loop goes here

However, the message is still logged once: tensorflow:Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized

In that case, you can have every worker other than worker 0 sleep for 5 seconds:

time.sleep(5)
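
The delay can be wrapped in a small helper so that only non-chief workers wait. This is a minimal sketch, not code from the original setup: the function name wait_if_not_chief and its parameters are hypothetical, and it assumes the chief is the worker with task index 0, as in the note above.

```python
import time


def wait_if_not_chief(task_index, delay_secs=5.0, sleep=time.sleep):
    """Pause non-chief workers so the chief can initialize variables first.

    Assumption: the chief is the worker whose task index is 0.
    Returns the number of seconds this worker was asked to sleep.
    """
    is_chief = (task_index == 0)
    if is_chief:
        # The chief starts immediately; it is the one that initializes variables.
        return 0.0
    # Every other worker waits before continuing.
    sleep(delay_secs)
    return delay_secs
```

A worker would call wait_if_not_chief(task_index) with its own task index. The sleep parameter exists only so the delay can be replaced in tests.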

Reference: https://stackoverflow.com/questions/42397370/distributed-tensorflow-save-fails-no-device