
lm's People

Contributors

rafaljozefowicz

lm's Issues

0.11/0.12 compatibility fix

I haven't validated whether accuracy is affected, but for the model to run on TF 0.12, the tf.nn.rnn_cell.LSTMCell construction needs to be changed to:
cell = tf.nn.rnn_cell.LSTMCell(hps.state_size, hps.emb_size, num_proj=hps.projected_size, state_is_tuple=False)

Full Softmax Crash

Using the default LSTM-2048-512 configuration, we're able to run sampled softmax. However, when running eval with full softmax (num_sampled = 0) we hit a crash.

When running on CPU we get a segmentation fault during the first call to sess.run (run_utils.py, line 121).

With GPU, execution reaches the first call to sess.run (as above), but the error traces back to an earlier line (run_utils.py, line 94).

...
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 24.21GiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[8192,793470]
...

The environment is an Ubuntu 14.04 box with 128GB RAM, 4x GTX 1080s (6GB), and TF 0.10.

Is there a way to run a full softmax on this hardware?
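For context on the OOM: full softmax materializes the logits tensor of shape [8192, 793470], which at 4 bytes per float32 is roughly the 24.21GiB the allocator reports, far more than a 6GB GPU can hold. A common workaround is to evaluate the full softmax in chunks so the whole logits matrix never exists at once. Here is a minimal NumPy sketch of the idea (function and variable names are illustrative, not the repo's API):

```python
import numpy as np

def full_softmax_xent_chunked(hidden, softmax_w, softmax_b, targets, chunk=1024):
    # Cross-entropy over the full vocabulary without materializing the whole
    # [batch, vocab] logits tensor. hidden: [batch, dim], softmax_w: [vocab, dim],
    # softmax_b: [vocab], targets: [batch] integer class ids.
    losses = []
    for start in range(0, hidden.shape[0], chunk):
        h = hidden[start:start + chunk]               # [chunk, dim]
        logits = h @ softmax_w.T + softmax_b          # [chunk, vocab]
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        log_z = np.log(np.exp(logits).sum(axis=1))    # log partition per row
        tgt = targets[start:start + chunk]
        losses.extend(log_z - logits[np.arange(len(h)), tgt])
    return np.array(losses)
```

Chunking trades one huge matmul for several smaller ones; the same trick would apply in TF by running the softmax over slices of the batch, at the cost of more sess.run overhead.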

uniform_unit_scaling_initializer full_shape

Hi, I'm trying to run the code on Google Colab but I'm facing the following error:

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /content/drive/app/lm-master/model_utils.py:18: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Traceback (most recent call last):
  File "drive/app/lm-master/single_lm_train.py", line 38, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "drive/app/lm-master/single_lm_train.py", line 27, in main
    run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
  File "/content/drive/app/lm-master/run_utils.py", line 14, in run_train
    model = LM(hps, "train", ps_device)
  File "/content/drive/app/lm-master/language_model.py", line 24, in __init__
    loss = self._forward(i, xs[i], ys[i], ws[i])
  File "/content/drive/app/lm-master/language_model.py", line 62, in _forward
    emb_vars = sharded_variable("emb", [hps.vocab_size, hps.emb_size], hps.num_shards)
  File "/content/drive/app/lm-master/model_utils.py", line 18, in sharded_variable
    initializer = tf.uniform_unit_scaling_initializer(dtype=dtype, full_shape=shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
    return func(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'full_shape'

I see that in the new version of TF, full_shape is no longer an argument of uniform_unit_scaling_initializer. I tried removing the shape argument to test, but I faced another error:


/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /content/drive/app/lm-master/model_utils.py:18: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1036, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 879, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("model/model/dropout/mul:0", shape=(512,), dtype=float32, device=/gpu:0)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "drive/app/lm-master/single_lm_train.py", line 38, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "drive/app/lm-master/single_lm_train.py", line 27, in main
    run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
  File "/content/drive/app/lm-master/run_utils.py", line 14, in run_train
    model = LM(hps, "train", ps_device)
  File "/content/drive/app/lm-master/language_model.py", line 24, in __init__
    loss = self._forward(i, xs[i], ys[i], ws[i])
  File "/content/drive/app/lm-master/language_model.py", line 68, in _forward
    inputs = [tf.squeeze(v, [1]) for v in tf.split(1, hps.num_steps, x)]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 1366, in split
    axis=axis, num_split=num_or_size_splits, value=value, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5069, in _split
    "Split", split_dim=axis, value=value, num_split=num_split, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 533, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.

Then I tried converting num_steps to int32, which was also unsuccessful.

The above summarizes my unsuccessful attempts at fixing this error. What should I do, and how can I handle the shape argument in uniform_unit_scaling_initializer?
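My reading of the two tracebacks (hedged, not a verified fix): two TF 1.x API changes are in play. First, uniform_unit_scaling_initializer dropped full_shape, so the call in model_utils.py can simply omit it (or switch to tf.variance_scaling_initializer(distribution="uniform"), as the deprecation warning suggests). Second, tf.split's signature changed from tf.split(split_dim, num_split, value) to tf.split(value, num_or_size_splits, axis=...), so the old call in language_model.py line 68 passes the float tensor x where the int32 axis is now expected; that is exactly the "split_dim has type float32" error, and it has nothing to do with num_steps itself. A NumPy sketch of both points (helper names are mine, not the repo's):

```python
import numpy as np

# (1) uniform_unit_scaling_initializer draws from
# [-factor*sqrt(3/fan_in), +factor*sqrt(3/fan_in)], where fan_in is the first
# dimension of the variable; without full_shape, each shard just uses its own
# first dimension.
def uniform_unit_scaling(shape, factor=1.0, rng=None):
    rng = rng or np.random.default_rng()
    limit = factor * np.sqrt(3.0 / shape[0])
    return rng.uniform(-limit, limit, size=shape)

# (2) The old tf.split(1, hps.num_steps, x) becomes
# tf.split(x, hps.num_steps, axis=1) in TF 1.x. NumPy analog of the fixed line:
def split_steps(x, num_steps):
    # x: [batch, num_steps, emb] -> list of num_steps arrays of [batch, emb]
    return [np.squeeze(v, axis=1) for v in np.split(x, num_steps, axis=1)]
```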

Error in sampled_softmax_loss after converting code to tf 1.0+

Hi Rafal, thank you for the code! Not sure if you are still supporting it.

But I keep getting errors using it with tf 1.2, after converting your code to be compatible with the new version of tf using tf_upgrade.py. At first it complained that targets is int32 and doesn't match float32 in

            loss = tf.nn.sampled_softmax_loss(softmax_w, softmax_b, tf.to_float(inputs),
                                              targets, hps.num_sampled, hps.vocab_size)

So I changed targets to tf.to_float(targets); now I am getting the error shown below:

$ python single_lm_train.py --logdir log --num_gpus 1 --datadir data --hpconfig emb_size=100,state_size=256,projected_size=128
Traceback (most recent call last):
  File "single_lm_train.py", line 38, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "single_lm_train.py", line 27, in main
    run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
  File "/home/ccrmad/Code/lm-master/run_utils.py", line 14, in run_train
    model = LM(hps, "train", ps_device)
  File "/home/ccrmad/Code/lm-master/language_model.py", line 24, in __init__
    loss = self._forward(i, xs[i], ys[i], ws[i])
  File "/home/ccrmad/Code/lm-master/language_model.py", line 100, in _forward
    tf.to_float(targets), hps.num_sampled, hps.vocab_size)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1247, in sampled_softmax_loss
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1007, in _compute_sampled_logits
    inputs, sampled_w, transpose_b=True) + sampled_b
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1825, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1242, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2536, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1818, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1768, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 595, in call_cpp_shape_fn
    require_shape_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 659, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 1 and 128 for 'model/model/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [5120,1], [?,128].

Any idea why?
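A hedged guess at the cause: in TF 1.x, tf.nn.sampled_softmax_loss's signature is (weights, biases, labels, inputs, num_sampled, num_classes, ...), whereas the old order placed inputs before labels. The positional call above therefore feeds the targets tensor into the inputs slot; labels should stay integer class ids of shape [batch, 1], so casting them with tf.to_float only masks the real problem. Internally the op computes inputs @ sampled_w^T, which explains the reported [5120,1] x [?,128] MatMul mismatch. A NumPy sketch of the internal shapes (names are illustrative):

```python
import numpy as np

# Internals of sampled softmax, shapes only: inputs are projected activations
# of shape [batch, dim]; sampled_w holds the weight rows of the sampled
# classes, shape [num_sampled, dim]. Passing a [batch, 1] labels tensor in the
# inputs slot makes this matmul fail with exactly the error above.
def sampled_logits(inputs, sampled_w, sampled_b):
    # [batch, dim] @ [dim, num_sampled] -> [batch, num_sampled]
    return inputs @ sampled_w.T + sampled_b
```

With projected_size=128 and batch_size*num_steps = 5120, the correct call sees inputs of shape [5120, 128]; a [5120, 1] operand there is the signature of swapped labels/inputs, so reordering the arguments (and reverting the tf.to_float(targets) cast in favor of int64 targets reshaped to [batch, 1]) is the likely fix.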

Cannot understand the reasons for several implementation details, e.g. sharded embedding, LSTM state between batches

Hi, I have read your code here, but several implementation details confuse me. Hope for your help.

  • sharded embedding - emb_vars = sharded_variable("emb", [hps.vocab_size, hps.emb_size], hps.num_shards)
    Could you tell me why you split the embeddings into several shards? What are the benefits of doing this?

  • carrying LSTM state between batches - self.initial_states[i].assign(state)
    Since the sentences are shuffled before training, I can't see any link between training examples of adjacent batches, so I can't understand why the LSTM state needs to be carried from one batch to the next.
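On the first question, a plausible answer (hedged, since only the author can confirm the intent): sharding splits one huge vocab_size x emb_size matrix into num_shards separate variables so that each can be placed on a different parameter-server device, spreading both storage and gradient-update traffic for a ~793k-row table. A NumPy sketch of the idea, using contiguous row blocks (the repo's actual partitioning scheme may differ):

```python
import numpy as np

# Split a [vocab_size, emb_size] embedding table into num_shards variables.
# Earlier shards get one extra row when vocab_size % num_shards != 0.
def make_shards(vocab_size, emb_size, num_shards, rng):
    sizes = [vocab_size // num_shards + (1 if i < vocab_size % num_shards else 0)
             for i in range(num_shards)]
    return [rng.standard_normal((n, emb_size)) for n in sizes]

def sharded_lookup(shards, ids):
    # Conceptually the shards concatenate back into the full table; TF's
    # embedding_lookup routes each id to its shard without materializing this.
    full = np.concatenate(shards, axis=0)
    return full[np.asarray(ids)]
```

On the second question, carrying LSTM state between batches is a common trick when the data pipeline feeds each batch row a continuous stream of concatenated sentences, in which case adjacent batches really are contiguous text; whether that holds here depends on how the data reader builds batches.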
