Comments (22)
I faced the same issue with Hub2; the workaround is to use Hub1.
from albert.
You need to specify vocab_file or spm_model_file (the SentencePiece tokenization model) on the command line.
How do you get them?
You can download https://tfhub.dev/google/albert_base/1 and untar it. You can then find the model at "../assets/30k-clean.model".
Add the command-line argument:
"--spm_model_file=YOUR_PATH/assets/30k-clean.model"
Note: this only works for Hub1.
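The untar step described above can be sketched in Python. This is a minimal sketch, assuming the module archive has already been downloaded to a local file; the function name find_spm_models and the file names in the usage note are hypothetical, and only the standard library is used.

```python
import tarfile
from pathlib import Path


def find_spm_models(archive_path, extract_dir):
    """Extract a hub module tarball and list the SentencePiece *.model files in it."""
    extract_dir = Path(extract_dir)
    extract_dir.mkdir(parents=True, exist_ok=True)
    # Only extract archives you trust: extractall writes whatever paths the archive names.
    with tarfile.open(archive_path) as tar:
        tar.extractall(extract_dir)
    # The comment above reports the file under assets/, e.g. assets/30k-clean.model.
    return sorted(extract_dir.rglob("*.model"))
```

For example, find_spm_models("albert_base_1.tar.gz", "albert_base_1") (hypothetical file names) would return the path to pass as --spm_model_file.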
from albert.
I am still seeing the same issue with TF 1.15 using the "run_classifier" command mentioned above. The v1 module works fine.
LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)
from albert.
I faced the same issue with Hub2; the workaround is to use Hub1.
FYI, Rachnas means using version 1 of the base model rather than 2. If someone finds a way to use version 2 please tell us the secret!
from albert.
I faced the same issue with Hub2; the workaround is to use Hub1.
Thank you for the advice, Rachnas. It worked with Hub1. However, I am still wondering how to make it work with Hub2 :)
from albert.
astrongstorm, Rachnas, have you been able to get reasonable results from any training? Even when I repeat the same example they have provided, I get pretty bad results.
from albert.
astrongstorm, Rachnas, have you been able to get reasonable results from any training? Even when I repeat the same example they have provided, I get pretty bad results.
I am yet to get results.
from albert.
run_classifier_with_tfhub.py \
  --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 \
  --data_dir=glue_data/MNLI \
  --task_name=mnli \
  --spm_model_file=30k-clean.model \
  --output_dir=output \
  --do_train=true \
  --do_eval=true \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=1e-4 \
  --num_train_epochs=5 \
  --eval_batch_size=32 \
  --predict_batch_size=32 \
  --use_tpu=False
I got poor results too:
INFO:tensorflow:***** Eval results *****
I1109 21:40:18.838561 139722163410752 run_classifier_with_tfhub.py:273] ***** Eval results *****
INFO:tensorflow: eval_accuracy = 0.8169129
I1109 21:40:18.838666 139722163410752 run_classifier_with_tfhub.py:275] eval_accuracy = 0.8169129
INFO:tensorflow: eval_loss = 0.57061106
I1109 21:40:18.838964 139722163410752 run_classifier_with_tfhub.py:275] eval_loss = 0.57061106
INFO:tensorflow: global_step = 61359
from albert.
For this problem, I believe we are talking about v2: there are some problems with tensor lookup on Hub2, right?
from albert.
I am facing the same issue with version 2, but it works fine with version 1 after defining spm_model_file on the command line.
from albert.
I'm getting bad results on both version 1 and 2, though better on 1 than on 2. In my prior experience with other models, I found that LAMB was very sensitive to its parameters. I'm thinking of trying Adam to see if that is the problem. Has anyone tried using Adam instead of LAMB and seen whether they get better results?
from albert.
I am also having the same issue.
from albert.
The training problem is still not solved even after using Hub1 (version 1 of ALBERT). It gives the following error:
ValueError: Variable <tf.Variable 'albert_layer_module/cls/predictions/output_bias:0' shape=(30000,) dtype=float32> has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
from albert.
I'm getting bad results on both version 1 and 2, though better on 1 than on 2. In my prior experience with other models, I found that LAMB was very sensitive to its parameters. I'm thinking of trying Adam to see if that is the problem. Has anyone tried using Adam instead of LAMB and seen whether they get better results?
Have you solved the problem on v2? Could you share how to make it work?
from albert.
The issue with hub v2 modules is not fixed yet (v1 is good)
from albert.
The "no gradient defined for operation Einsum" was found to be caused by using an old version of TF. The full investigation is here. I've modified requirements.txt to explicitly request TF 1.15. Please run pip install -r requirements.txt and verify that you are running TF 1.15. If you still see the problem, let me know by posting to this thread.
BTW, I merged the TF-hub functionality into run_classifier.py in this commit. The reason is that run_classifier_with_tfhub.py got out of sync. Please use run_classifier.py with --albert_hub_module_handle=XXX when fine-tuning from TF-Hub. Sorry for any inconvenience.
I tested this with TF1.15 using the v2 hub modules and it seems to be working at HEAD.
python3 -m run_classifier --data_dir="$HOME/ALBERT/glue" --task_name=cola --output_dir=/tmp/testing_ttt --vocab_file=vocab.txt --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2 --do_train=True --do_eval=True --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-05 --train_step=50 --spm_model_file="$HOME/ALBERT/spm_vocab/30k-clean.model"
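The request above to verify that TF 1.15 is running can be checked from Python. This is a minimal sketch; the helper name is_tf_1_15 is hypothetical, and the script assumes TensorFlow is importable in the same environment that runs run_classifier.py.

```python
def is_tf_1_15(version_string):
    """Return True if a version string such as '1.15.0' belongs to the TF 1.15 series."""
    return version_string.split(".")[:2] == ["1", "15"]


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        status = "OK" if is_tf_1_15(tf.__version__) else "expected 1.15.x"
        print(tf.__version__, status)
```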
from albert.
The "no gradient defined for operation Einsum" was found to be caused by using an old version of TF. The full investigation is here. I've modified requirements.txt to explicitly request TF 1.15. Please run pip install -r requirements.txt and verify that you are running TF 1.15. If you still see the problem, let me know by posting to this thread.
BTW, I merged the TF-hub functionality into run_classifier.py in this commit. The reason is that run_classifier_with_tfhub.py got out of sync. Please use run_classifier.py with --albert_hub_module_handle=XXX when fine-tuning from TF-Hub. Sorry for any inconvenience.
I tested this with TF1.15 using the v2 hub modules and it seems to be working at HEAD.
python3 -m run_classifier --data_dir="$HOME/ALBERT/glue" --task_name=cola --output_dir=/tmp/testing_ttt --vocab_file=vocab.txt --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2 --do_train=True --do_eval=True --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-05 --train_step=50 --spm_model_file="$HOME/ALBERT/spm_vocab/30k-clean.model"
With TensorFlow version 1.15, we are still facing the same error.
from albert.
Ah, now I'm able to reproduce it. There appears to be an issue with the way that the V2 modules were generated. I'm looking into it with the TF team and will get back with an answer soon hopefully.
from albert.
It looks like V2 modules were generated with a different version of TF, which contains native ops not present in TF 1.X releases. We will have to regenerate and re-release them with TF 1.15. Apologies for the inconvenience. I'll update this thread when the new modules are uploaded.
from albert.
We have regenerated the hub modules using TF1.15.
Please use hub modules with the "/3" suffix. Hub modules with the "/2" suffix will remain broken. TF-Hub links in the readme have been updated accordingly.
See Jan 7 update in the readme for more info.
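For scripts that still hard-code a "/2" handle, the switch to the regenerated modules can be sketched as a small string rewrite. This is a minimal sketch; the helper name use_regenerated_module is hypothetical, and it assumes handles follow the https://tfhub.dev/google/albert_base/N pattern seen earlier in this thread.

```python
def use_regenerated_module(handle):
    """Rewrite a TF-Hub handle ending in '/2' to the regenerated '/3' release."""
    base, sep, version = handle.rstrip("/").rpartition("/")
    if sep and version == "2":
        return base + "/3"
    # Handles with any other version suffix (for example '/1') are returned unchanged.
    return handle
```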
from albert.
I am facing the same issue with the traditional BERT on Colab.
Here are all the specs:
TF --> '1.15.0'
Colab --> '0.7.0'
Code for loading BERT
import tensorflow as tf
import tensorflow_hub as hub

# Three int32 inputs with a fixed sequence length of 20.
input_word_ids = tf.keras.layers.Input(shape=(20,), dtype=tf.int32, name="input_word_ids")
input_mask = tf.keras.layers.Input(shape=(20,), dtype=tf.int32, name="input_mask")
segment_ids = tf.keras.layers.Input(shape=(20,), dtype=tf.int32, name="segment_ids")
#BERt = BERtLayer()([input_word_ids, input_mask, segment_ids])
# Load the multilingual cased BERT encoder from TF-Hub as a trainable Keras layer.
bert = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/1", trainable=True)
pooled_output, sequence_output = bert([input_word_ids, input_mask, segment_ids])
Exception thrown
`Call initializer instance with the dtype argument instead of passing it to the constructor
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_word_ids (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
input_mask (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
segment_ids (InputLayer) [(None, 20)] 0
__________________________________________________________________________________________________
keras_layer (KerasLayer) [(None, 768), (None, 177853441 input_word_ids[0][0]
input_mask[0][0]
segment_ids[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional) [(None, None, 512), 2099200 keras_layer[0][1]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 512) 0 bidirectional[0][1]
bidirectional[0][3]
__________________________________________________________________________________________________
repeat_vector (RepeatVector) (None, None, 512) 0 concatenate[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, None, 1) 513 repeat_vector[0][0]
__________________________________________________________________________________________________
activation (Activation) (None, None, 1) 0 dense[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, 512) 0 bidirectional[0][0]
activation[0][0]
__________________________________________________________________________________________________
multiply (Multiply) (None, None, 512) 0 bidirectional[0][0]
lambda[0][0]
__________________________________________________________________________________________________
babelnet (Dense) (None, None, 26221) 13451373 multiply[0][0]
__________________________________________________________________________________________________
domain (Dense) (None, None, 9916) 5086908 multiply[0][0]
__________________________________________________________________________________________________
lexicon (Dense) (None, None, 9916) 5086908 multiply[0][0]
==================================================================================================
Total params: 203,578,343
Trainable params: 203,578,342
Non-trainable params: 1
__________________________________________________________________________________________________
enter in train...
WARNING:tensorflow:From /content/Progetto/code/tokenizer.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
Done train preparation...
Done label preparatiomn
ciao
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Train on 29740 samples, validate on 7436 samples
2020-02-22 08:36:39.829236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-22 08:36:39.829902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-02-22 08:36:39.830005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-22 08:36:39.830039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-22 08:36:39.830074: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-22 08:36:39.830103: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-22 08:36:39.830127: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-22 08:36:39.830154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-22 08:36:39.830182: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-22 08:36:39.830309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-22 08:36:39.830960: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-22 08:36:39.831507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-02-22 08:36:39.831561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-22 08:36:39.831575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-02-22 08:36:39.831603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-02-22 08:36:39.831760: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-22 08:36:39.832342: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-22 08:36:39.832866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2020-02-22 08:36:41.766438: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 367248384 exceeds 10% of system memory.
2020-02-22 08:36:42.267189: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 367248384 exceeds 10% of system memory.
2020-02-22 08:36:43.576576: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 367248384 exceeds 10% of system memory.
2020-02-22 08:36:43.654042: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 367248384 exceeds 10% of system memory.
2020-02-22 08:36:44.099220: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 367248384 exceeds 10% of system memory.
Epoch 1/4
2020-02-22 08:37:05.283924: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-22 08:37:08.127005: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at function_ops.cc:250 : Not found: No gradient defined for op: Einsum
Traceback (most recent call last):
File "model.py", line 128, in <module>
modello.train(train,label,vocab_label_bn,vocab_label_wndmn,vocab_label_lex, train_dev, label_dev)
File "model.py", line 92, in train
callbacks = [checkpoint, early_stopper],
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 675, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
run_metadata=self.run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.NotFoundError: [_Derived_]No gradient defined for op: Einsum
[[{{node Func/_36}}]]
[[training/Adam/gradients/gradients/keras_layer/cond/StatefulPartitionedCall_grad/PartitionedCall/gradients/StatefulPartitionedCall_grad/PartitionedCall/gradients/StatefulPartitionedCall_grad/SymbolicGradient]]`
from albert.
Does the hub module have multiple tags? If so, did you try any of the others?
I faced a similar error with a different hub module. It turns out I was using the incorrect tag.
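For TF1-format hub modules, the graph variant is selected by a set of tags passed when the module is loaded. The following is a minimal sketch of choosing that set; the helper name module_tags is hypothetical, and whether a given module exports a "train" tag is an assumption that should be checked on its TF-Hub page.

```python
def module_tags(is_training):
    """Return the tag set selecting which graph variant of a TF1 hub module to load."""
    tags = set()
    if is_training:
        # Assumption: the module exports a "train" graph (for example with dropout
        # enabled) alongside its default inference graph.
        tags.add("train")
    return tags


# Assumed usage with TensorFlow 1.x and tensorflow_hub installed (not executed here):
#   import tensorflow_hub as hub
#   module = hub.Module(handle, tags=module_tags(is_training=True), trainable=True)
```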
from albert.