
lm's People

Contributors

rafaljozefowicz

lm's Issues

0.11/0.12 compatibility fix

I haven't validated whether accuracy is affected, but for the model to run on TF 0.12, the tf.nn.rnn_cell.LSTMCell construction needs to be changed to:
cell = tf.nn.rnn_cell.LSTMCell(hps.state_size, hps.emb_size, num_proj=hps.projected_size, state_is_tuple=False)

Full Softmax Crash

Using the default LSTM-2048-512 configuration, we're able to run sampled softmax. However, when running eval with full softmax (num_sampled = 0) we hit a crash.

When running on CPU we get a segmentation fault during the first call to sess.run (run_utils.py, line 121).

With GPU, execution reaches the first call to sess.run (as above), but the error traces back to an earlier line (run_utils.py, line 94).

...
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 24.21GiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[8192,793470]
...

The environment is an Ubuntu 14.04 box with 128GB RAM, 4x GTX 1080s (6GB), and TF 0.10.

Is there a way to run a full softmax on this hardware?
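For context on the OOM: full softmax materializes the logits tensor of shape [8192, 793470], which at 4 bytes per float32 is roughly the 24.21GiB the allocator reports, far more than a 6GB GPU can hold. A common workaround is to evaluate the full softmax in chunks so the whole logits matrix never exists at once. Here is a minimal NumPy sketch of the idea (function and variable names are illustrative, not the repo's API):

```python
import numpy as np

def full_softmax_xent_chunked(hidden, softmax_w, softmax_b, targets, chunk=1024):
    # Cross-entropy over the full vocabulary without materializing the whole
    # [batch, vocab] logits tensor. hidden: [batch, dim], softmax_w: [vocab, dim],
    # softmax_b: [vocab], targets: [batch] integer class ids.
    losses = []
    for start in range(0, hidden.shape[0], chunk):
        h = hidden[start:start + chunk]               # [chunk, dim]
        logits = h @ softmax_w.T + softmax_b          # [chunk, vocab]
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        log_z = np.log(np.exp(logits).sum(axis=1))    # log partition per row
        tgt = targets[start:start + chunk]
        losses.extend(log_z - logits[np.arange(len(h)), tgt])
    return np.array(losses)
```

Chunking trades one huge matmul for several smaller ones; the same trick would apply in TF by running the softmax over slices of the batch, at the cost of more sess.run overhead.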

uniform_unit_scaling_initializer full_shape

Hi, I'm trying to run the code on Google Colab but I'm facing the following error:

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /content/drive/app/lm-master/model_utils.py:18: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Traceback (most recent call last):
  File "drive/app/lm-master/single_lm_train.py", line 38, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "drive/app/lm-master/single_lm_train.py", line 27, in main
    run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
  File "/content/drive/app/lm-master/run_utils.py", line 14, in run_train
    model = LM(hps, "train", ps_device)
  File "/content/drive/app/lm-master/language_model.py", line 24, in __init__
    loss = self._forward(i, xs[i], ys[i], ws[i])
  File "/content/drive/app/lm-master/language_model.py", line 62, in _forward
    emb_vars = sharded_variable("emb", [hps.vocab_size, hps.emb_size], hps.num_shards)
  File "/content/drive/app/lm-master/model_utils.py", line 18, in sharded_variable
    initializer = tf.uniform_unit_scaling_initializer(dtype=dtype, full_shape=shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
    return func(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'full_shape'

I see that in the new version of TF, full_shape is no longer an argument of uniform_unit_scaling_initializer. I tried removing the shape argument to test, but I faced another error:


/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /content/drive/app/lm-master/model_utils.py:18: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1036, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 879, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("model/model/dropout/mul:0", shape=(512,), dtype=float32, device=/gpu:0)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "drive/app/lm-master/single_lm_train.py", line 38, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "drive/app/lm-master/single_lm_train.py", line 27, in main
    run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
  File "/content/drive/app/lm-master/run_utils.py", line 14, in run_train
    model = LM(hps, "train", ps_device)
  File "/content/drive/app/lm-master/language_model.py", line 24, in __init__
    loss = self._forward(i, xs[i], ys[i], ws[i])
  File "/content/drive/app/lm-master/language_model.py", line 68, in _forward
    inputs = [tf.squeeze(v, [1]) for v in tf.split(1, hps.num_steps, x)]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 1366, in split
    axis=axis, num_split=num_or_size_splits, value=value, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5069, in _split
    "Split", split_dim=axis, value=value, num_split=num_split, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 533, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.

Then I tried converting num_steps to int32, which was also unsuccessful.

The above summarizes my unsuccessful attempts at fixing this error. What should I do, and how can I handle the shape argument in uniform_unit_scaling_initializer?
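My reading of the two tracebacks (hedged, not a verified fix): two TF 1.x API changes are in play. First, uniform_unit_scaling_initializer dropped full_shape, so the call in model_utils.py can simply omit it (or switch to tf.variance_scaling_initializer(distribution="uniform"), as the deprecation warning suggests). Second, tf.split's signature changed from tf.split(split_dim, num_split, value) to tf.split(value, num_or_size_splits, axis=...), so the old call in language_model.py line 68 passes the float tensor x where the int32 axis is now expected; that is exactly the "split_dim has type float32" error, and it has nothing to do with num_steps itself. A NumPy sketch of both points (helper names are mine, not the repo's):

```python
import numpy as np

# (1) uniform_unit_scaling_initializer draws from
# [-factor*sqrt(3/fan_in), +factor*sqrt(3/fan_in)], where fan_in is the first
# dimension of the variable; without full_shape, each shard just uses its own
# first dimension.
def uniform_unit_scaling(shape, factor=1.0, rng=None):
    rng = rng or np.random.default_rng()
    limit = factor * np.sqrt(3.0 / shape[0])
    return rng.uniform(-limit, limit, size=shape)

# (2) The old tf.split(1, hps.num_steps, x) becomes
# tf.split(x, hps.num_steps, axis=1) in TF 1.x. NumPy analog of the fixed line:
def split_steps(x, num_steps):
    # x: [batch, num_steps, emb] -> list of num_steps arrays of [batch, emb]
    return [np.squeeze(v, axis=1) for v in np.split(x, num_steps, axis=1)]
```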

Error in sampled_softmax_loss after converting code to tf 1.0+

Hi Rafal, thank you for the code! Not sure if you are still supporting it.

But I keep getting errors using it with tf 1.2, after converting your code to be compatible with the new version of tf using tf_upgrade.py. At first it complained that targets is int32 and doesn't match float32 in

            loss = tf.nn.sampled_softmax_loss(softmax_w, softmax_b, tf.to_float(inputs),
                                              targets, hps.num_sampled, hps.vocab_size)

So I changed targets to tf.to_float(targets); now I am getting the error shown below:

$ python single_lm_train.py --logdir log --num_gpus 1 --datadir data --hpconfig emb_size=100,state_size=256,projected_size=128
Traceback (most recent call last):
  File "single_lm_train.py", line 38, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "single_lm_train.py", line 27, in main
    run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
  File "/home/ccrmad/Code/lm-master/run_utils.py", line 14, in run_train
    model = LM(hps, "train", ps_device)
  File "/home/ccrmad/Code/lm-master/language_model.py", line 24, in __init__
    loss = self._forward(i, xs[i], ys[i], ws[i])
  File "/home/ccrmad/Code/lm-master/language_model.py", line 100, in _forward
    tf.to_float(targets), hps.num_sampled, hps.vocab_size)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1247, in sampled_softmax_loss
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1007, in _compute_sampled_logits
    inputs, sampled_w, transpose_b=True) + sampled_b
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1825, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1242, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2536, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1818, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1768, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 595, in call_cpp_shape_fn
    require_shape_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 659, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 1 and 128 for 'model/model/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [5120,1], [?,128].

Any idea why?
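A hedged guess at the cause: in TF 1.x, tf.nn.sampled_softmax_loss's signature is (weights, biases, labels, inputs, num_sampled, num_classes, ...), whereas the old order placed inputs before labels. The positional call above therefore feeds the targets tensor into the inputs slot; labels should stay integer class ids of shape [batch, 1], so casting them with tf.to_float only masks the real problem. Internally the op computes inputs @ sampled_w^T, which explains the reported [5120,1] x [?,128] MatMul mismatch. A NumPy sketch of the internal shapes (names are illustrative):

```python
import numpy as np

# Internals of sampled softmax, shapes only: inputs are projected activations
# of shape [batch, dim]; sampled_w holds the weight rows of the sampled
# classes, shape [num_sampled, dim]. Passing a [batch, 1] labels tensor in the
# inputs slot makes this matmul fail with exactly the error above.
def sampled_logits(inputs, sampled_w, sampled_b):
    # [batch, dim] @ [dim, num_sampled] -> [batch, num_sampled]
    return inputs @ sampled_w.T + sampled_b
```

With projected_size=128 and batch_size*num_steps = 5120, the correct call sees inputs of shape [5120, 128]; a [5120, 1] operand there is the signature of swapped labels/inputs, so reordering the arguments (and reverting the tf.to_float(targets) cast in favor of int64 targets reshaped to [batch, 1]) is the likely fix.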

Cannot understand the reasons for several implementation details, e.g. sharded embedding, LSTM state between batches

Hi, I have read your code here, but several implementation details confuse me. Hope for your help.

  • sharded embedding - emb_vars = sharded_variable("emb", [hps.vocab_size, hps.emb_size], hps.num_shards)
    Could you tell me why you split the embeddings into several shards? What are the benefits of doing this?

  • carrying LSTM state between batches - self.initial_states[i].assign(state)
    Since the sentences are shuffled before training, I can't see any link between training examples of adjacent batches, so I can't understand why the LSTM state needs to be carried from one batch to the next.
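On the first question, a plausible answer (hedged, since only the author can confirm the intent): sharding splits one huge vocab_size x emb_size matrix into num_shards separate variables so that each can be placed on a different parameter-server device, spreading both storage and gradient-update traffic for a ~793k-row table. A NumPy sketch of the idea, using contiguous row blocks (the repo's actual partitioning scheme may differ):

```python
import numpy as np

# Split a [vocab_size, emb_size] embedding table into num_shards variables.
# Earlier shards get one extra row when vocab_size % num_shards != 0.
def make_shards(vocab_size, emb_size, num_shards, rng):
    sizes = [vocab_size // num_shards + (1 if i < vocab_size % num_shards else 0)
             for i in range(num_shards)]
    return [rng.standard_normal((n, emb_size)) for n in sizes]

def sharded_lookup(shards, ids):
    # Conceptually the shards concatenate back into the full table; TF's
    # embedding_lookup routes each id to its shard without materializing this.
    full = np.concatenate(shards, axis=0)
    return full[np.asarray(ids)]
```

On the second question, carrying LSTM state between batches is a common trick when the data pipeline feeds each batch row a continuous stream of concatenated sentences, in which case adjacent batches really are contiguous text; whether that holds here depends on how the data reader builds batches.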
