Hello, I am learning the implementation of the inference part and trying to run it


Cannot use direct server-to-server communication (petals, 6 comments, closed)

miaoqijun commented on August 28, 2024

Cannot use direct server-to-server communication.

Comments (6)

cybershrapnel commented on August 28, 2024

I think they abandoned open source for the paid model lol

mryab commented on August 28, 2024

@cybershrapnel, I'm not sure where you got this impression, but Petals has no paid model and we have no intention of developing it :)
It's true that our current team of maintainers is short-staffed, but we are trying to develop additional features for the library and are happy to support contributions (e.g., PRs) from the community.

mryab commented on August 28, 2024

@miaoqijun, thanks for the observation! Tagging @justheuristic just to be sure, but it looks like a very good catch: we may have overlooked how the request metadata is built when making multiple inference steps. We'll try to look into it.

justheuristic commented on August 28, 2024

Hi, @mryab, @miaoqijun

Let me look into this; will write back soon.

justheuristic commented on August 28, 2024

Okay, so the problem the OP described is certainly still there, and it's a shame that it took us so long to get to it 😅

@miaoqijun, thank you a lot for the work you put into writing this issue.

For reproducibility, here's how I tested it:

After this line: https://github.com/bigscience-workshop/petals/blob/c08d09c/src/petals/server/handler.py#L324, I added the debug line print("checking if should push to next_servers:", next_servers)

Note: the commit ID points to the main branch as of now.
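The check that this debug line instruments can be pictured with the toy model below. It is a sketch, not the code in handler.py: the function name pick_push_target and the assumption that next_servers arrives in the request metadata as a list of (peer_id, session_id, start_block, end_block) tuples are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical shape of one hop: (peer_id, session_id, start_block, end_block).
Hop = Tuple[str, str, int, int]


def pick_push_target(metadata: Dict[str, Any]) -> Optional[Hop]:
    """Toy model of the server-side check: push only if the client sent next_servers."""
    next_servers: Optional[List[Hop]] = metadata.get("next_servers")
    # Same information as the debug line added to the handler.
    print("checking if should push to next_servers:", next_servers)
    if not next_servers:
        return None  # nothing to push to; outputs go back to the client
    return next_servers[0]  # the immediate next hop in the chain


print(pick_push_target({"step_id": "a"}))
print(pick_push_target({"step_id": "b", "next_servers": [("peerB", "sess-2", 4, 8)]}))
```

In this model, a client that never includes next_servers in its step metadata would send every step down the no-push branch.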

Test setup:

# terminal 1: initial peer (bootstrap DHT node)
python -m petals.cli.run_dht --identity_path tests/bootstrap.id --host_maddrs /ip4/127.0.0.1/tcp/31337

# terminals 2 and 3: run the same command in each (two servers, 4 blocks each)
# MODEL_NAME and INITIAL_PEERS must be exported beforehand; INITIAL_PEERS is the
# multiaddr of the peer started in terminal 1
python -m petals.cli.run_server $MODEL_NAME --num_blocks 4 --device cpu --torch_dtype float32 --initial_peers $INITIAL_PEERS

# terminal 4: run the inference test
pytest test_full_model.py::test_full_model_exact_match -s

Outputs from the first server match what @miaoqijun reported earlier.

Note that the client already knows the next servers during the first request, but it withholds them because processing the prefix with pushes can be invalid in some cases (e.g., if the client wishes to modify intermediate activations via prefix tuning).
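The rule described above, withholding next_servers during the prefix step and sending it on later steps, can be sketched roughly as follows. This is a hypothetical model, not the Petals client: build_step_metadata, the is_prefix_step flag, and the metadata keys are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

Hop = Tuple[str, str, int, int]  # hypothetical (peer_id, session_id, start_block, end_block)


def build_step_metadata(session_id: str, step_id: int, is_prefix_step: bool,
                        next_servers: List[Hop]) -> Dict[str, Any]:
    """Toy sketch: attach next_servers to every request except the prefix step."""
    metadata: Dict[str, Any] = {"session_id": session_id, "step_id": step_id}
    # During the prefix step the client may still modify intermediate activations
    # (e.g. prefix tuning), so server-to-server pushes are withheld.
    if not is_prefix_step and next_servers:
        metadata["next_servers"] = list(next_servers)
    return metadata


route = [("peerB", "sess-2", 4, 8)]
for step in range(3):
    md = build_step_metadata("sess-1", step, is_prefix_step=(step == 0), next_servers=route)
    print(step, "next_servers" in md)
```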

Of the two alternative solutions (sending next_servers in the first request, or what @miaoqijun proposed), the latter is more general because it covers cases where next_servers changes during inference (e.g., the next server experiences a hardware failure).
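A small hypothetical simulation shows why re-sending next_servers on every step covers more cases than fixing it at the first request. The route representation, next_servers_for, and replace_hop (a stand-in for whatever failover the client actually performs) are all illustrative assumptions.

```python
from typing import List, Tuple

Hop = Tuple[str, str, int, int]  # hypothetical (peer_id, session_id, start_block, end_block)


def next_servers_for(route: List[Hop], current_index: int) -> List[Hop]:
    """Servers downstream of route[current_index], computed from the live route."""
    return route[current_index + 1:]


def replace_hop(route: List[Hop], index: int, new_hop: Hop) -> List[Hop]:
    """Hypothetical failover: swap a failed server for one covering the same blocks."""
    updated = list(route)
    updated[index] = new_hop
    return updated


route = [("peerA", "s1", 0, 4), ("peerB", "s2", 4, 8)]

# Option 1: next_servers captured once, at the first request.
frozen = next_servers_for(route, 0)

# peerB fails mid-inference and is replaced by peerC serving the same blocks.
route = replace_hop(route, 1, ("peerC", "s3", 4, 8))

# Option 2: next_servers recomputed from the current route on every step.
fresh = next_servers_for(route, 0)

print("frozen:", frozen)  # still points at the failed peerB
print("fresh: ", fresh)   # points at the replacement peerC
```

With the list frozen at the first request, a push would target the failed server; recomputing it per step picks up the replacement.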

I will now reopen this as a PR and find a way to properly credit @miaoqijun in that pull request.

miaoqijun commented on August 28, 2024

Thanks for fixing this! I'm happy with my contribution, although it's a bit late :)
