jiangoforit / yellowfin
auto-tuning momentum SGD optimizer
License: Apache License 2.0
Just gave YellowFin a try yesterday and it works nicely! A minor suggestion:
what do you think about renaming lr to learning_rate for consistency with other TensorFlow optimizers? I can open a small PR.
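A minimal sketch of how the rename could stay backward compatible, assuming the constructor keeps accepting the old name as a deprecated alias. resolve_learning_rate is a hypothetical helper, not part of yellowfin.py, and its default of 1.0 is only a placeholder.

```python
import warnings


def resolve_learning_rate(learning_rate=None, lr=None, default=1.0):
    """Return the learning rate, accepting the old `lr` keyword as an alias.

    Hypothetical helper; not part of yellowfin.py. `default` is a placeholder.
    """
    if learning_rate is not None and lr is not None:
        raise ValueError("Pass either learning_rate or lr, not both.")
    if lr is not None:
        # Old spelling still works but nudges callers toward the new name.
        warnings.warn("`lr` is deprecated; use `learning_rate` instead.",
                      DeprecationWarning, stacklevel=2)
        return lr
    if learning_rate is not None:
        return learning_rate
    return default
```

A constructor taking both keywords (again hypothetical) would call this once and store the result, so existing code passing lr= keeps working while emitting a warning.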
self.opt_q = YFOptimizer().minimize(self.vae_discriminator_loss, var_list=q_vars)
File "xxx\src\yellowfin.py", line 215, in apply_gradients
after_apply_op = self.after_apply()
File "xxx\src\yellowfin.py", line 139, in after_apply
self._grad_squared.append(tf.square(g) )
File "C:\Miniconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 412, in square
return gen_math_ops.square(x, name=name)
File "C:\Miniconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2585, in square
result = _op_def_lib.apply_op("Square", x=x, name=name)
File "C:\Miniconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 509, in apply_op
(input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.
If I switch back to Adam, it works fine. Not sure what is up.
It would be interesting to try out YFOptimizer, but it's too tricky to install the package at the moment. Is a PyPI release in the works, so we can do pip install yellowfin?
The mu_update_interval parameter is not used by the optimizer. Is it declared for future work, maybe to schedule mu during training?
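If the parameter was meant to throttle how often mu is re-estimated, one plausible reading looks like the plain-Python sketch below. This is an assumption; the issue does not confirm the intended semantics, and IntervalHyperparam and compute_fn are hypothetical names.

```python
class IntervalHyperparam:
    """Hypothetical sketch: recompute a hyperparameter only every `interval` steps.

    Between updates the last computed value is reused.
    """

    def __init__(self, initial, interval):
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.value = initial
        self.interval = interval

    def step(self, step, compute_fn):
        # Only call the (possibly expensive) estimator on multiples of `interval`.
        if step % self.interval == 0:
            self.value = compute_fn()
        return self.value
```

With interval=1 this degenerates to re-estimating mu on every step.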
I tried replacing the optimizer with YellowFin in the CIFAR-10 model from the TensorFlow tutorials, but it performed much worse than the original SGD with learning-rate decay.
The original code is:
with tf.control_dependencies([loss_averages_op]):
  opt = tf.train.GradientDescentOptimizer(lr)
  grads = opt.compute_gradients(total_loss)

# Apply gradients.
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
My code is:
with tf.control_dependencies([loss_averages_op]):
  opt = YFOptimizer(lr=1.0, mu=0.0)
  # opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
  grads = opt.compute_gradients(total_loss)
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
I simply copied yellowfin.py from Zehaos's version, which adds a compute_gradients function.
Did I miss something?
Hello,
I want to use YF in my own code, so I first tried to run yellowfin_test.py, but it raised an AssertionError at line 88. Any help is appreciated!
As seen in Zehaos/MobileNet#27, the global step does not update after each training step. Is a fix coming soon? I have tried both the older version of yellowfin.py in that issue and the latest one available; in both cases the global step doesn't update.
I believe the issue is that the global step variable exists only within the optimizer, not globally. As a quick fix, I moved the definition of the global step (at https://github.com/JianGoForIt/YellowFin/blob/master/tuner_utils/yellowfin.py#L60) out of the optimizer and directly into the graph, and then passed this variable back to the optimizer.
Is there a cleaner solution to this?
Error is thrown during the after_apply operation in the yellowfin class. My suggestion is to screen for Nones during the apply_gradients operation. This is similar to the apply_gradients operation in the official repo:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/optimizer.py#L426
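The suggested fix can be sketched as a filtering step that runs before any per-gradient statistics (such as the tf.square(g) call in after_apply) are built. This is plain Python over (gradient, variable) pairs; filter_none_gradients is a hypothetical name, and inside the optimizer it would be applied to grads_and_vars at the top of apply_gradients.

```python
def filter_none_gradients(grads_and_vars):
    """Keep only (gradient, variable) pairs whose gradient is not None.

    Hypothetical helper. compute_gradients yields None for variables the loss
    does not depend on, and building tf.square(None) later fails.
    """
    kept = [(g, v) for g, v in grads_and_vars if g is not None]
    if not kept:
        # In the spirit of tf.train.Optimizer, refuse a list with no gradients at all.
        raise ValueError("No gradients provided for any variable.")
    return kept
```

Skipping the None pairs means those variables are simply left unchanged for that step, which matches what the stock TensorFlow optimizers do.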
I used YellowFin to train ResNet-50 on ImageNet using 4 K80 GPUs and got poor results. After 50k steps the training loss was about 6, while SGD without momentum and learning-rate decay reached about 4.7. Any idea what causes this?
How do I use it for ordinary logistic regression?
Did you test it on a simple logistic regression task to show that your optimizer does better?
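YellowFin tunes the learning rate and momentum of momentum SGD, so for reference here is that update applied to plain logistic regression in NumPy. The lr and mu values are fixed, hand-picked illustrations, not YellowFin's tuned values, and this is not the YFOptimizer API.

```python
import numpy as np


def train_logreg_momentum(X, y, lr=0.5, mu=0.9, steps=200):
    """Logistic regression trained with heavy-ball momentum SGD (full batch).

    lr and mu are fixed, hand-picked values; YellowFin's job would be to tune them.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    vw = np.zeros_like(w)
    vb = 0.0
    eps = 1e-12
    losses = []
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        losses.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        # Gradient of the mean binary cross-entropy.
        gw = X.T @ (p - y) / len(y)
        gb = np.mean(p - y)
        # Heavy-ball update: v <- mu * v - lr * g, then w <- w + v.
        vw = mu * vw - lr * gw
        vb = mu * vb - lr * gb
        w = w + vw
        b = b + vb
    return w, b, losses
```

In TensorFlow the equivalent would be to build the same cross-entropy loss and hand it to the optimizer's minimize, letting YellowFin pick lr and mu instead of fixing them.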
Below is a simple piece of code to try YellowFin on my dataset.
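import tensorflow as tf
import yellowfin

# train_x, train_y, hidden_dim and epochs are defined elsewhere.
x = tf.placeholder( tf.float32, [ None, train_x.shape[ 1 ] ] )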
y = tf.placeholder( tf.float32, [ None, train_y.shape[ 1 ] ] )
m = tf.layers.dense( x, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, train_y.shape[ 1 ] )
prediction = tf.nn.softmax( m )
loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits( labels=y, logits=m ) )
optimizer = yellowfin.YFOptimizer().minimize( loss )
s = tf.Session()
s.run( tf.global_variables_initializer() )
for epoch in range( epochs ):
    _, h = s.run( [ optimizer, loss ], feed_dict={ x: train_x, y: train_y } )
Usually, it crashes and throws the following exception.
Caused by op 'update_hyper/cond/PyFuncStateless', defined at:
File "test2.py", line 47, in <module>
optimizer = yf.YFOptimizer( learning_rate=1., momentum=0. ).minimize( loss )
File "/data/python-mp-test/libs/yellowfin.py", line 268, in minimize
return self.apply_gradients(grads_and_vars)
File "/data/python-mp-test/libs/yellowfin.py", line 223, in apply_gradients
update_hyper_op = self.update_hyper_param()
File "/data/python-mp-test/libs/yellowfin.py", line 191, in update_hyper_param
lambda: self._mu_var) )
File "/usr/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1814, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1689, in BuildCondBranch
original_result = fn()
File "/data/python-mp-test/libs/yellowfin.py", line 190, in <lambda>
self._mu = tf.identity(tf.cond(self._do_tune, lambda: self.get_mu_tensor(),
File "/data/python-mp-test/libs/yellowfin.py", line 173, in get_mu_tensor
roots = tf.py_func(np.roots, [coef], Tout=tf.complex64, stateful=False)
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 201, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 56, in _py_func_stateless
Tout=Tout, name=name)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
UnknownError (see above for traceback): LinAlgError: Array must not contain infs or NaNs
[[Node: update_hyper/cond/PyFuncStateless = PyFuncStateless[Tin=[DT_FLOAT], Tout=[DT_COMPLEX64], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](update_hyper/cond/ScatterUpdate)]]
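The traceback shows np.roots, called through tf.py_func in get_mu_tensor, receiving non-finite coefficients, most likely because a NaN or inf has already appeared in the loss or in the gradient statistics. One defensive option, sketched here under that assumption, is to check the coefficients before calling np.roots and let the caller keep the previous mu when the check fails. safe_roots is a hypothetical helper; it hides the symptom and does not fix the upstream NaN.

```python
import numpy as np


def safe_roots(coef, fallback=None):
    """Return np.roots(coef), or `fallback` if any coefficient is inf/NaN.

    np.roots raises LinAlgError on non-finite input, which inside tf.py_func
    surfaces as the UnknownError shown above.
    """
    coef = np.asarray(coef, dtype=np.float64)
    if not np.all(np.isfinite(coef)):
        return fallback
    return np.roots(coef)
```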
Can you update it for the new version of TF?
I am trying to adapt YellowFin for use as an optimizer in tensor2tensor (which uses tensorflow>=1.2.0rc1), but unfortunately I cannot debug this error.
To reproduce, run the starter.sh script (preferably inside a Docker container):
nvidia-docker run -it -v $(pwd):/t2t -p 6006:6006 -w /t2t tensorflow/tensorflow:latest-devel-gpu
Using YellowFin
INFO:tensorflow:Computing gradients for global model_fn.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Operation'>):
<tf.Operation 'training/update_hyper/cond/assert_equal/Assert/Assert' type=Assert>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "/usr/local/bin/t2t-trainer", line 6, in <module>\n exec(compile(open(__file__).read(), __file__, \'exec\'))', 'File "/t2t/tensor2tensor/bin/t2t-trainer", line 83, in <module>\n tf.app.run()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthrough))', 'File "/t2t/tensor2tensor/bin/t2t-trainer", line 79, in main\n schedule=FLAGS.schedule)', 'File "/t2t/tensor2tensor/utils/trainer_utils.py", line 247, in run\n run_locally(exp_fn(output_dir))', 'File "/t2t/tensor2tensor/utils/trainer_utils.py", line 537, in run_locally\n exp.train_and_evaluate()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate\n self.train(delay_secs=0)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train\n hooks=self._train_monitors + extra_hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train\n monitors=hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func\n return func(*args, **kwargs)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit\n loss = self._train_model(input_fn=input_fn, hooks=hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 955, in _train_model\n model_fn_ops = self._get_train_ops(features, labels)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops\n return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in 
_call_model_fn\n model_fn_results = self._model_fn(features, labels, **kwargs)', 'File "/t2t/tensor2tensor/utils/trainer_utils.py", line 520, in model_fn\n colocate_gradients_with_ops=True)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/optimizers.py", line 293, in optimize_loss\n name="train")', 'File "/t2t/tensor2tensor/utils/trainer_utils.py", line 1154, in apply_gradients\n gradients, global_step=global_step, name=name)', 'File "/t2t/tensor2tensor/utils/yellowfin.py", line 222, in apply_gradients\n update_hyper_op = self.update_hyper_param()', 'File "/t2t/tensor2tensor/utils/yellowfin.py", line 190, in update_hyper_param\n lambda: self._mu_var) )', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func\n return func(*args, **kwargs)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1814, in cond\n orig_res_t, res_t = context_t.BuildCondBranch(true_fn)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1689, in BuildCondBranch\n original_result = fn()', 'File "/t2t/tensor2tensor/utils/yellowfin.py", line 189, in <lambda>\n self._mu = tf.identity(tf.cond(self._do_tune, lambda: self.get_mu_tensor(),', 'File "/t2t/tensor2tensor/utils/yellowfin.py", line 180, in get_mu_tensor\n tf.assert_equal(tf.size(root), tf.constant(1) )', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/check_ops.py", line 318, in assert_equal\n return control_flow_ops.Assert(condition, data, summarize=summarize)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File 
"/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__\n stack = [s.strip() for s in traceback.format_stack()]']
==================================
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-07-06 14:31:31.807218: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-06 14:31:31.807260: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-06 14:31:31.807285: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-06 14:31:31.855132: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-06 14:31:31.855471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 670MX
major: 3 minor: 0 memoryClockRate (GHz) 0.601
pciBusID 0000:01:00.0
Total memory: 2.94GiB
Free memory: 2.60GiB
2017-07-06 14:31:31.855541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-06 14:31:31.855567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-07-06 14:31:31.855606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 670MX, pci bus id: 0000:01:00.0)
2017-07-06 14:31:32.895272: W tensorflow/core/framework/op_kernel.cc:1158] Failed precondition: Attempting to use uninitialized value global_step
[[Node: global_step/read = Identity[T=DT_INT64, _class=["loc:@global_step"], _device="/job:localhost/replica:0/task:0/cpu:0"](global_step)]]
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/t2t/tensor2tensor/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/t2t/tensor2tensor/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/t2t/tensor2tensor/utils/trainer_utils.py", line 247, in run
run_locally(exp_fn(output_dir))
File "/t2t/tensor2tensor/utils/trainer_utils.py", line 537, in run_locally
exp.train_and_evaluate()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate
self.train(delay_secs=0)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model
config=self._session_config
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 412, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 279, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value global_step
[[Node: global_step/read = Identity[T=DT_INT64, _class=["loc:@global_step"], _device="/job:localhost/replica:0/task:0/cpu:0"](global_step)]]
Caused by op u'global_step/read', defined at:
File "/usr/local/bin/t2t-trainer", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/t2t/tensor2tensor/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/t2t/tensor2tensor/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/t2t/tensor2tensor/utils/trainer_utils.py", line 247, in run
run_locally(exp_fn(output_dir))
File "/t2t/tensor2tensor/utils/trainer_utils.py", line 537, in run_locally
exp.train_and_evaluate()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate
self.train(delay_secs=0)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 952, in _train_model
global_step = contrib_framework.create_global_step(g)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 133, in create_global_step
return training_util.create_global_step(graph)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/training_util.py", line 119, in create_global_step
collections=[ops.GraphKeys.GLOBAL_VARIABLES, ops.GraphKeys.GLOBAL_STEP])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 725, in _get_single_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 200, in __init__
expected_shape=expected_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 319, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1303, in identity
result = _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value global_step
[[Node: global_step/read = Identity[T=DT_INT64, _class=["loc:@global_step"], _device="/job:localhost/replica:0/task:0/cpu:0"](global_step)]]
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables_1/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "/usr/local/bin/t2t-trainer", line 6, in <module>\n exec(compile(open(__file__).read(), __file__, \'exec\'))', 'File "/t2t/tensor2tensor/bin/t2t-trainer", line 83, in <module>\n tf.app.run()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthrough))', 'File "/t2t/tensor2tensor/bin/t2t-trainer", line 79, in main\n schedule=FLAGS.schedule)', 'File "/t2t/tensor2tensor/utils/trainer_utils.py", line 247, in run\n run_locally(exp_fn(output_dir))', 'File "/t2t/tensor2tensor/utils/trainer_utils.py", line 537, in run_locally\n exp.train_and_evaluate()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate\n self.train(delay_secs=0)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train\n hooks=self._train_monitors + extra_hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train\n monitors=hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func\n return func(*args, **kwargs)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit\n loss = self._train_model(input_fn=input_fn, hooks=hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model\n config=self._session_config', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession\n stop_grace_period_secs=stop_grace_period_secs)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__\n stop_grace_period_secs=stop_grace_period_secs)', 'File 
"/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__\n self._sess = _RecoverableSession(self._coordinated_creator)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__\n _WrappedSession.__init__(self, self._create_session())', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session\n return self._sess_creator.create_session()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session\n self.tf_sess = self._session_creator.create_session()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 403, in create_session\n self._scaffold.finalize()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 192, in finalize\n default_ready_for_local_init_op)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 254, in get_or_default\n op = default_constructor()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 189, in default_ready_for_local_init_op\n variables.global_variables())', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__\n stack = [s.strip() for s in traceback.format_stack()]']
==================================
If you do not want to help or contribute, please close the issue and forgive me. Otherwise, I would appreciate any help :)
I've also tried to write YellowFin as a tf.train.Optimizer, but going down to the C++ level seems beyond my skills at the moment...
@JianGoForIt As I mentioned in other issues, I was trying to adapt YF to be usable in tensor2tensor, and my PR to integrate YF into T2T raised a license problem. Once the PR is accepted it will override your MIT License, so the T2T authors need your approval to keep the PR; otherwise we cannot use your code. This is the PR.
It seems that there is an assert operation which might never be evaluated:
tf.assert_equal(tf.size(root), tf.constant(1) )
https://github.com/JianGoForIt/YellowFin/blob/master/tuner_utils/yellowfin.py#L180
Newer versions of TF produce an error with that, and it was probably not the intended behaviour.
Related issue: tensorflow/tensorflow#11315
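The assert presumably exists to guarantee that exactly one candidate root survives filtering. In graph-mode TensorFlow an op created by tf.assert_equal only executes if something depends on it, which is why the usual pattern wraps the downstream computation in tf.control_dependencies([assert_op]). In NumPy the same check runs eagerly; the sketch below assumes the filtering criteria (real part in (0, 1), negligible imaginary part) rather than quoting yellowfin.py, and select_single_root is a hypothetical name.

```python
import numpy as np


def select_single_root(roots, tol=1e-5):
    """Pick the unique real root in (0, 1), raising if there is not exactly one.

    Eager counterpart of the size-1 assert: the check cannot be silently skipped.
    """
    roots = np.asarray(roots)
    real, imag = np.real(roots), np.imag(roots)
    mask = (real > 0.0) & (real < 1.0) & (np.abs(imag) < tol)
    candidates = real[mask]
    if candidates.size != 1:
        raise ValueError("expected exactly one real root in (0, 1), got %d" % candidates.size)
    return float(candidates[0])
```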
As title.
Hi! Thanks for posting this code. I thought that I would give YF a try as a drop-in optimizer. Currently, I am using Keras, and I was able to modify your code to run on Keras models by doing the following:
- compute_gradients as a standalone method
- apply_gradients and after_apply
- wrapping the YFOptimizer object in a Keras TFOptimizer
However, while it runs and my loss goes down, I am not 100% sure I did everything properly. Would you consider adding Keras support?
In https://github.com/JianGoForIt/YellowFin/blob/master/char-rnn-tensorflow/model.py#L92 the lr is set to 1 instead of the command-line argument value.
Later, in https://github.com/JianGoForIt/YellowFin/blob/master/char-rnn-tensorflow/train_YF.py#L138, the learning rate is set to the command-line argument value, but for YF this has no effect because the variable model.lr is never connected to YF. (For Adam and SGD this works, because model.lr is passed as the learning rate.)
pip install YellowFin
Collecting YellowFin
Using cached Yellowfin-1.0.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-jykvuD/YellowFin/setup.py", line 7, in <module>
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
File "/home/canoe/Project/tensorflow/lib/python2.7/codecs.py", line 896, in open
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: '/tmp/pip-build-jykvuD/YellowFin/README.md'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-jykvuD/YellowFin/
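The failure is setup.py opening README.md, which is absent from the downloaded Yellowfin-1.0.2.tar.gz. The conventional fix is to ship the file (an include README.md line in MANIFEST.in); a defensive setup.py can also tolerate its absence. A minimal sketch, assuming setup.py only reads the README for long_description; read_long_description is a hypothetical helper.

```python
import io
import os


def read_long_description(here, filename="README.md"):
    """Return the README contents, or "" if the file was not shipped in the sdist."""
    path = os.path.join(here, filename)
    if not os.path.exists(path):
        return ""
    # io.open accepts `encoding` on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as f:
        return f.read()
```

In setup.py this would be used as long_description=read_long_description(here), so a missing README degrades to an empty description instead of aborting egg_info.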
Running on a GPU device, I get the following error:
Cannot assign a device for operation 'apply_updates/exDeepFm/embedding/embedding_layer/YellowFin': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
SparseApplyMomentum: CPU
Shape: GPU CPU
Square: GPU CPU
Unique: GPU CPU
Cast: GPU CPU
UnsortedSegmentSum: GPU CPU
Identity: GPU CPU
Assign: GPU CPU
StridedSlice: GPU CPU
Const: GPU CPU
VariableV2: GPU CPU
TruncatedNormal: GPU CPU
Gather: GPU CPU
Fill: GPU CPU
Mul: GPU CPU
Add: GPU CPU
Thanks for sharing the YellowFin code on GitHub. I just tried it in one of my projects and got good results. Are you planning to add an open-source license, so that people can use it in their projects (with proper acknowledgement, of course) without having to remove the YellowFin code when sharing those projects, e.g., on GitHub?