Comments (13)
Hello! Thank you for reporting!
We will resolve this issue quickly.
from petals.
We resolved this issue in a recent master update. Just pull the new updates.
Thank you for noticing the issue and waiting for the fixes.
Thank you for the information. It seems the only change required is this: #574.
We will merge it into main soon.
Hi!
How is the work on the fixes going? Is everything all right?
We are really looking forward to the merge.
Sorry for taking so long; the fix is merged into master.
Hello!
I observe the same problem. I have tried to diagnose the issue a bit by myself.
As I understand it (in case you haven't found it already), the problem is in how the block size (its parameters) is calculated. The layer_idx mentioned above is used in load_pretrained_block, but it is not used when calculating block_size or when calculating RPS in throughput.
Really looking forward to a fix.
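To illustrate the diagnosis above: if some layers of a model (e.g. MoE layers) have a different parameter count than others, a size estimate that ignores layer_idx will be wrong for exactly those layers. The sketch below is purely illustrative; estimate_block_size, the config keys, and the factor of 3 FFN matrices per expert are assumptions for the example, not the actual petals code.

```python
# Illustrative sketch (hypothetical names, not the petals API):
# a per-block parameter count that depends on layer_idx, the way the
# diagnosis above suggests block_size estimation should.

def estimate_block_size(config, layer_idx):
    """Rough parameter count for one transformer block.

    If only some depths are MoE layers, the count differs per layer,
    so a single layer-agnostic estimate misprices those blocks.
    """
    hidden = config["hidden_size"]
    # attention projections: q, k, v, o
    params = 4 * hidden * hidden
    ffn = 3 * hidden * config["ffn_size"]  # gate/up/down matrices
    if layer_idx in config.get("moe_layers", []):
        # each expert carries its own FFN weights
        params += config["num_experts"] * ffn
    else:
        params += ffn
    return params


config = {"hidden_size": 4, "ffn_size": 8, "moe_layers": [1], "num_experts": 2}
dense = estimate_block_size(config, 0)   # 4*4*4 + 3*4*8      = 160
moe = estimate_block_size(config, 1)     # 4*4*4 + 2*(3*4*8)  = 256
```

The same reasoning applies to the throughput RPS measurement: benchmarking one representative block only works if every block is shaped the same.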
Thank you for your quick response!
Hi!
The original error from this issue doesn't appear anymore, but I've got another error when I try launching a private swarm with Mixtral (on GPU; CPU is fine). It also doesn't appear when I do the same with StableBeluga2.
System:
- Python 3.10
- Torch 2.2.2
- CUDA 12.3
- Ubuntu 22.04
Reproduce
python3 -m petals.cli.run_server SanjiWatsuki/TinyMixtral-32x248M --new_swarm
Error
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 235, in <module>
main()
File "/home/qessia/.local/lib/python3.10/site-packages/petals/cli/run_server.py", line 219, in main
server = Server(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/server.py", line 237, in __init__
throughput_info = get_server_throughput(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 83, in get_server_throughput
cache[cache_key] = measure_throughput_info(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 123, in measure_throughput_info
"inference_rps": measure_compute_rps(
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 218, in measure_compute_rps
cache = step(cache)
File "/home/qessia/.local/lib/python3.10/site-packages/petals/server/throughput.py", line 215, in step
outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache_ if inference else None)
File "/home/qessia/.local/lib/python3.10/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/petals/models/mixtral/block.py", line 74, in forward
outputs = super().forward(
File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 934, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 356, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
File "/home/qessia/.local/lib/python3.10/site-packages/transformers/cache_utils.py", line 131, in update
self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
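For context, the RuntimeError at the bottom of the traceback is torch.cat refusing to concatenate tensors that live on different devices: here the KV cache sits on "cpu" while the new key_states arrive on "cuda:0". The sketch below mimics that invariant in plain Python (FakeTensor and cat are toy stand-ins for torch.Tensor and torch.cat, not torch itself) and shows the general remedy of keeping the cache on the states' device.

```python
# Plain-Python stand-ins for the torch behavior in the traceback above.

class FakeTensor:
    def __init__(self, data, device):
        self.data = list(data)
        self.device = device

    def to(self, device):
        # like torch.Tensor.to: returns a copy on the target device
        return FakeTensor(self.data, device)


def cat(tensors):
    # like torch.cat: all inputs must share one device
    devices = {t.device for t in tensors}
    if len(devices) > 1:
        raise RuntimeError(
            "Expected all tensors to be on the same device, "
            f"but found {sorted(devices)}"
        )
    merged = []
    for t in tensors:
        merged.extend(t.data)
    return FakeTensor(merged, tensors[0].device)


cache = FakeTensor([1, 2], "cpu")        # KV cache allocated on CPU
key_states = FakeTensor([3], "cuda:0")   # new states computed on GPU

# remedy: move the cache to the states' device before concatenating
cache = cache.to(key_states.device)
merged = cat([cache, key_states])
```

Where exactly the CPU-resident cache is created in the petals throughput benchmark is what the linked tickets track; the sketch only shows the shape of the failure.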
Hello!
This is a strange error. Can you also provide the transformers version?
Can you also provide the transformers version?
4.38.2
I had the same error on master as well and had a ticket open for it: #575.
I was able to get the branch mentioned above running and rebased my Docker work.
TinyMixtral is now running locally on GPU.
https://github.com/meta-introspector/petals
Thank you for the fixes! It works.