Comments (5)
I think SimpleLossCompute should work fine on CPU if you have enough memory. Do you get an error?
There is a variant you could use, where you split into chunks like MultiGPULossCompute but do not use data parallelism. Let me know if SimpleLossCompute fails.
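A minimal sketch of what that chunked, single-device variant might look like, assuming the notebook's generator/criterion interfaces and the torch 0.3 API used elsewhere in this thread (ChunkedLossCompute is a hypothetical name, not code from the notebook):

import torch
from torch.autograd import Variable

class ChunkedLossCompute:
    "Chunked loss compute and train function on a single device (no data parallelism)."
    def __init__(self, generator, criterion, opt=None, chunk_size=5):
        self.generator = generator
        self.criterion = criterion
        self.opt = opt
        self.chunk_size = chunk_size

    def __call__(self, out, targets, normalize):
        total = 0.0
        out_grad = []
        for i in range(0, out.size(1), self.chunk_size):
            # Re-wrap each chunk so backward() stops at the decoder output;
            # the collected chunk gradients are pushed through the
            # transformer in one backward pass at the end.
            out_column = Variable(out[:, i:i + self.chunk_size].data,
                                  requires_grad=self.opt is not None)
            gen = self.generator(out_column)
            y = targets[:, i:i + self.chunk_size].contiguous().view(-1)
            loss = self.criterion(gen.contiguous().view(-1, gen.size(-1)), y) / normalize
            total += loss.data[0]
            if self.opt is not None:  # no backward() at eval time
                loss.backward()
                out_grad.append(out_column.grad.data.clone())
        if self.opt is not None:
            out.backward(gradient=torch.cat(out_grad, dim=1))
            self.opt.step()
            self.opt.optimizer.zero_grad()
        return total * normalize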
from annotated-transformer.
It seems like MultiGPULossCompute does a good job of separating data into chunks. SimpleLossCompute fails, I think because there is not enough RAM. But the adapted MultiGPULossCompute fails because of something else. The log follows.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-38-240a851d50f5> in <module>()
9 MultiGPULossCompute(model.generator, criterion,
10 # devices=devices,
---> 11 opt=model_opt))
12 model.eval()
13 loss = run_epoch((rebatch(pad_idx, b) for b in valid_iter),
<ipython-input-26-3c250c9d9ec4> in run_epoch(data_iter, model, loss_compute)
8 out = model.forward(batch.src, batch.trg,
9 batch.src_mask, batch.trg_mask)
---> 10 loss = loss_compute(out, batch.trg_y, batch.ntokens)
11 total_loss += loss
12 total_tokens += batch.ntokens
<ipython-input-36-46d56c947589> in __call__(self, out, targets, normalize)
37 for o in out]
38
---> 39 gen = generator(out_column)
40 # gen = nn.parallel.parallel_apply(generator, out_column)
41
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
355 result = self._slow_forward(*input, **kwargs)
356 else:
--> 357 result = self.forward(*input, **kwargs)
358 for hook in self._forward_hooks.values():
359 hook_result = hook(self, input, result)
<ipython-input-4-96c707961385> in forward(self, x)
6
7 def forward(self, x):
----> 8 return F.log_softmax(self.proj(x), dim=-1)
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
355 result = self._slow_forward(*input, **kwargs)
356 else:
--> 357 result = self.forward(*input, **kwargs)
358 for hook in self._forward_hooks.values():
359 hook_result = hook(self, input, result)
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
53
54 def forward(self, input):
---> 55 return F.linear(input, self.weight, self.bias)
56
57 def __repr__(self):
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
831 - Output: :math:`(N, *, out\_features)`
832 """
--> 833 if input.dim() == 2 and bias is not None:
834 # fused op is marginally faster
835 return torch.addmm(bias, input, weight.t())
AttributeError: 'list' object has no attribute 'dim'
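Judging from the traceback, out_column is still the list of lists that MultiGPULossCompute builds for nn.parallel.parallel_apply; with the parallel call commented out, the generator receives a Python list instead of a tensor, hence 'list' object has no attribute 'dim'. A single-device adaptation would need to build each chunk as a plain Variable before the call, roughly (hypothetical fragment of the chunk loop):

out_column = Variable(out[:, i:i + chunk_size].data,
                      requires_grad=self.opt is not None)
gen = self.generator(out_column)  # a tensor, not a list of lists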
from annotated-transformer.
Did you figure this out? I would like to leave it open.
from annotated-transformer.
Unfortunately SimpleLossCompute still does not work (and this is not because of RAM). It fails on the validation step with the following error message:
loss.backward()
File "/home/melpuser/miniconda2/envs/py36/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/melpuser/miniconda2/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: element 0 of variables tuple is volatile
For some reason it doesn't like calculating loss.backward() in SimpleLossCompute while calculating the loss on the validation set.
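A minimal workaround sketch, assuming the notebook's SimpleLossCompute: guard the backward() call so the validation pass never back-propagates (only the guard is new; the rest follows the notebook):

class SimpleLossCompute:
    "A simple loss compute and train function, with backward() guarded for eval."
    def __init__(self, generator, criterion, opt=None):
        self.generator = generator
        self.criterion = criterion
        self.opt = opt

    def __call__(self, x, y, norm):
        x = self.generator(x)
        loss = self.criterion(x.contiguous().view(-1, x.size(-1)),
                              y.contiguous().view(-1)) / norm
        if self.opt is not None:
            loss.backward()  # only during training; avoids the volatile error
            self.opt.step()
            self.opt.optimizer.zero_grad()
        return loss.data[0] * norm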
I've tried to adapt MultiGPULossCompute to a simple CPU version, but so far with no success; I cannot deal with x and y: TypeError: forward() missing 1 required positional argument: 'target'
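That TypeError likely comes from calling the criterion the way parallel_apply does, with a single tuple argument; called directly, it needs both positional arguments. Roughly (hypothetical fragment, with g the generator output and t the target chunk):

loss = self.criterion(g.contiguous().view(-1, g.size(-1)),
                      t.contiguous().view(-1))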
from annotated-transformer.
I tackled the same problem and found that the following code worked.
- PyTorch version:
pip install http://download.pytorch.org/whl/cpu/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
After the ## Multi-GPU Training section of the notebook:
pad_idx = TGT.vocab.stoi["<blank>"]
model = make_model(len(SRC.vocab), len(TGT.vocab), N=6)
criterion = LabelSmoothing(size=len(TGT.vocab), padding_idx=pad_idx, smoothing=0.1)
BATCH_SIZE = 100
train_iter = MyIterator(train, batch_size=BATCH_SIZE,
                        repeat=False, sort_key=lambda x: (len(x.src), len(x.trg)),
                        batch_size_fn=batch_size_fn, train=True)
valid_iter = MyIterator(val, batch_size=BATCH_SIZE,
                        repeat=False, sort_key=lambda x: (len(x.src), len(x.trg)),
                        batch_size_fn=batch_size_fn, train=False)
(I changed BATCH_SIZE for my environment.)
model_opt = NoamOpt(model.src_embed[0].d_model, 1, 2000,
                    torch.optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9))
for epoch in range(10):
    model.train()
    run_epoch((rebatch(pad_idx, b) for b in train_iter), model,
              SimpleLossCompute(model.generator, criterion, model_opt))
    model.eval()
    # evaluate on the validation set
    print(run_epoch((rebatch(pad_idx, b) for b in valid_iter), model,
                    SimpleLossCompute(model.generator, criterion, None)))
NOTE: I only checked that the script runs without errors, so I'm not sure whether training actually goes well (I mean I didn't check the performance of the trained model).
from annotated-transformer.