Comments (16)

aramallo commented on September 26, 2024

Oh, I see what might be going on. See my comments in the snippet.

setup_node_id(#{host := _Host, port := _Port, client_id := NodeID} = State) ->
    _CurrName = node(),
    %% >>>>
    %% At this point Partisan has already started and adopted a dynamic name.
    %% If a discovery agent was configured in sys.config, it might have
    %% already joined the cluster under that dynamic name.
    %% <<<<

    %% >>>>
    %% Stop will not do a leave, so you need to call
    %% partisan_peer_service:leave() before stopping the app.
    %% <<<<
    partisan:stop(),

    net_kernel:stop(),
    {NodeName, HostName} =
        case net_kernel:start(NodeID, #{name_domain => shortnames}) of
            {ok, _Pid} ->
                partisan:start(),
                {ok, LocalHost} = inet:gethostname(),
                {node(), LocalHost};
            {error, Reason} ->
                io:format("error starting net_kernel: ~p~n", [Reason]),
                %% Note: list_to_binary(undefined) below raises badarg on this path
                {undefined, undefined}
        end,
    State#{node => atom_to_binary(NodeName), hostname => list_to_binary(HostName)}.

Overall, is there any reason why you are starting disterl? I wouldn't do it like this :-)

I tend to disable it completely and set the node name via vm.args, but you can also set it via sys.config using {partisan, [{name, NODENAME}, ...]}.

You can use vm.args.src in rebar3 to take the value from an ENV VAR too.

## Name of the node (used by Partisan)
-name ${ERLANG_NODENAME}

## Cookie for disterl (used for remote_console only)
-setcookie my_cookie

## Explicit connections only (Deprecated in latest versions)
-connect_all false
-auto_connect never

## Disable disterl
-start_epmd false
-hidden

You can also tell relx to load partisan but not start it, so that you can do your custom name configuration and then do something like the following:

%% Assuming partisan has been loaded but not started
Nodename = ...do your magic here
application:set_env(partisan, name, Nodename),
_ = application:ensure_all_started(partisan).

from partisan.

aramallo commented on September 26, 2024

Hi @mcesaro, I will look into that tomorrow morning and let you know when it is fixed. Thanks a lot!

aramallo commented on September 26, 2024

Which version/tag are you using?

mcesaro commented on September 26, 2024

It's the one included in tag "leapsight-1.5.1" in erleans, which is in my settings "v5.0.0-rc.11".
Now that you mention it, one of the other issues I should address is a generic handling of the partisan versions among applications including erleans and applications using partisan directly.

aramallo commented on September 26, 2024

Hi @mcesaro, just to clarify: when you say you would like the server to keep running, which one are you referring to? The server that is crashing is partisan_peer_service_client, the TCP client connection that Partisan is trying to create with 'pebble@max-a5'. During the handshake the peer identifies itself with a different name, hence the error. This must be the case when the node has a membership view in which 'pebble@max-a5' listens on IP X, but IP X is now associated with a node with a different name. In this case the partisan_peer_service_client should crash, and the Peer Service Manager (in this case partisan_pluggable_peer_service_manager) should continuously retry that connection until its view is updated (replacing 'pebble@max-a5' with a new spec).

Although the partisan_peer_service_client server crashes and it's linked to the Peer Service Manager, the latter should still be running, as it traps exits and handles them in handle_info.
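
To illustrate the exit-trapping pattern described above, here is a minimal, generic OTP sketch. It is not Partisan's actual code: the module name exit_trapping_manager and the functions spawn_client/1 and crashed/0 are hypothetical names used only for this example. It shows a gen_server that links to short-lived workers, traps their exits, and stays alive when one of them crashes.

```erlang
-module(exit_trapping_manager).
-behaviour(gen_server).

-export([start_link/0, spawn_client/1, crashed/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Spawn a worker linked to the manager; it stands in for a client connection.
spawn_client(Fun) when is_function(Fun, 0) ->
    gen_server:call(?MODULE, {spawn_client, Fun}).

%% Return the abnormal exit reasons recorded so far, oldest first.
crashed() ->
    gen_server:call(?MODULE, crashed).

init([]) ->
    %% With trap_exit set, exits of linked processes arrive as
    %% {'EXIT', Pid, Reason} messages instead of killing this process.
    process_flag(trap_exit, true),
    {ok, #{crashed => []}}.

handle_call({spawn_client, Fun}, _From, State) ->
    Pid = spawn_link(Fun),
    {reply, Pid, State};
handle_call(crashed, _From, #{crashed := Crashed} = State) ->
    {reply, lists:reverse(Crashed), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({'EXIT', _Pid, normal}, State) ->
    {noreply, State};
handle_info({'EXIT', _Pid, Reason}, #{crashed := Crashed} = State) ->
    %% A real manager would schedule a reconnection attempt here.
    {noreply, State#{crashed := [Reason | Crashed]}};
handle_info(_Other, State) ->
    {noreply, State}.
```

If the manager itself dies in a setup like this, the cause is likely to be somewhere other than the linked client's exit, since that exit is delivered as an ordinary message.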

mcesaro commented on September 26, 2024

Hi,
I see the logic, although it's not clear to me how the peer service view is updated (should it be automatic or triggered somehow?).
I was concerned because the peer service manager itself crashed, bringing down the rest of the application.
My test environment is made of a bunch of LXC containers with 2/3 different bridged LANs, and the local DNS (managed by LXD) might be part of the problem.

aramallo commented on September 26, 2024

Hi,

Can you provide the log entries where the Peer Service crashes, together with the preceding log entries for context?
If the Peer Service crashes, there must be something else going on beyond the above. In any case, that would be a bug.

"it's not clear to me how the peer service view is updated"

The view is updated by each node gossiping its state, a CRDT object, which is merged on every node at every round.

This is done by the Peer Service Manager, in your case partisan_pluggable_peer_service_manager (a process), which delegates the details to partisan_full_membership_strategy (a module). The former periodically calls the latter's periodic function. The strategy module uses partisan_membership_set, a module wrapping access to a CRDT, to maintain its view of the cluster. The periodic function returns to the Peer Service Manager the messages it should send to all the other peers. On reception, the remote Peer Service Manager merges the received CRDT with the one maintained by its local partisan_full_membership_strategy module.
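
As a rough illustration of what "merging a CRDT view" means, here is a minimal sketch of a state-based set with a merge function. It is an assumption for illustration only: the internal representation of partisan_membership_set is not shown in this thread, so the sketch swaps in a simple two-phase set (one grow-only set of added nodes, one of removed nodes), and the module name membership_sketch and all its functions are hypothetical.

```erlang
-module(membership_sketch).
-export([new/0, add/2, remove/2, merge/2, members/1]).

%% A view is {Added, Removed}: two sets that only ever grow.
new() ->
    {sets:new(), sets:new()}.

add(Node, {Added, Removed}) ->
    {sets:add_element(Node, Added), Removed}.

remove(Node, {Added, Removed}) ->
    {Added, sets:add_element(Node, Removed)}.

%% Merging is a union of both components, so it is commutative,
%% associative and idempotent: gossip order and duplicates do not matter.
merge({A1, R1}, {A2, R2}) ->
    {sets:union(A1, A2), sets:union(R1, R2)}.

%% The current members are the added nodes that were never removed.
members({Added, Removed}) ->
    lists:sort(sets:to_list(sets:subtract(Added, Removed))).
```

For example, if one node's view has added a and b, and another's has added b and c and removed b, merging the two in either order yields the members [a, c]. Note that in this simplified variant a removed name can never be re-added, which is one reason real membership CRDTs use richer representations.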

mcesaro commented on September 26, 2024

UPDATE: the problem affects all the members of a cluster.

Hi,
I can't replicate the crash, so it probably was due to a different reason in my own code.
However, I'm experiencing the following behavior, which renders the system unusable: when one of the cluster members restarts and tries to join again, a kind of loop condition is generated.
See the following logs from 2 nodes, pebble and cwork.
On the pebble peer:

=ERROR REPORT==== 25-Oct-2023::11:28:49.915373 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::11:28:49.915515 ===
** Generic server <0.3063.0> terminating 
** Last message in was {tcp,#Port<0.1137>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.1137>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            partisan_membership,
                            #{monotonic => false,parallelism => 1,
                              compression => true},
                            [compressed],
                            <0.1066.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=CRASH REPORT==== 25-Oct-2023::11:28:49.915713 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.3063.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.1059.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1066.0>]
    dictionary: [{{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,compression => true}},
                  {{partisan_peer_service_client,from},<0.1066.0>},
                  {{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,channel},partisan_membership},
                  {{partisan_peer_service_client,egress_delay},0}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13853
  neighbours:

=ERROR REPORT==== 25-Oct-2023::11:28:50.917261 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::11:28:50.917498 ===
** Generic server <0.3080.0> terminating 
** Last message in was {tcp,#Port<0.1145>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.1145>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            undefined,
                            #{monotonic => false,parallelism => 1,
                              compression => false},
                            [],<0.1066.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=ERROR REPORT==== 25-Oct-2023::11:28:50.917912 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=CRASH REPORT==== 25-Oct-2023::11:28:50.917764 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.3080.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.1059.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1066.0>]
    dictionary: [{{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,
                     compression => false}},
                  {{partisan_peer_service_client,from},<0.1066.0>},
                  {{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,channel},undefined},
                  {{partisan_peer_service_client,egress_delay},0}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13644
  neighbours:

=ERROR REPORT==== 25-Oct-2023::11:28:50.918026 ===
** Generic server <0.3084.0> terminating 
** Last message in was {tcp,#Port<0.1148>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.1148>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            data,
                            #{monotonic => false,parallelism => 1,
                              compression => false},
                            [],<0.1066.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=ERROR REPORT==== 25-Oct-2023::11:28:50.918281 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=CRASH REPORT==== 25-Oct-2023::11:28:50.918334 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.3084.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.1059.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1066.0>]
    dictionary: [{{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,
                     compression => false}},
                  {{partisan_peer_service_client,from},<0.1066.0>},
                  {{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,channel},data},
                  {{partisan_peer_service_client,egress_delay},0}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13471
  neighbours:

=ERROR REPORT==== 25-Oct-2023::11:28:50.918440 ===
** Generic server <0.3087.0> terminating 
** Last message in was {tcp,#Port<0.1150>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.1150>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            partisan_membership,
                            #{monotonic => false,parallelism => 1,
                              compression => true},
                            [compressed],
                            <0.1066.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=CRASH REPORT==== 25-Oct-2023::11:28:50.918656 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.3087.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.1059.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1066.0>]
    dictionary: [{{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,compression => true}},
                  {{partisan_peer_service_client,from},<0.1066.0>},
                  {{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,channel},partisan_membership},
                  {{partisan_peer_service_client,egress_delay},0}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13852
  neighbours:
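
The "Last message in" payload in the reports above starts with byte 131, the version tag of the Erlang external term format, so it can be decoded with binary_to_term/1. A small sketch, assuming the bytes are exactly as printed in the log; the module name handshake_decode and its functions are hypothetical helpers for this example:

```erlang
-module(handshake_decode).
-export([payload/0, decode/1]).

%% The bytes copied from the "Last message in" line of the error report.
payload() ->
    <<131,104,2,119,5,104,101,108,108,111,119,13,112,
      101,98,98,108,101,64,109,97,120,45,97,53>>.

%% Decode an external-term-format binary into an Erlang term.
%% Only use binary_to_term/1 on data from a trusted source.
decode(Bin) when is_binary(Bin) ->
    binary_to_term(Bin).
```

Calling handshake_decode:decode(handshake_decode:payload()) returns {hello,'pebble@max-a5'}, which matches the "got" name in the reports: the remote side introduces itself as 'pebble@max-a5' while the connecting client expected a different name.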

On the cwork peer:

=ERROR REPORT==== 25-Oct-2023::09:28:04.263766 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::09:28:04.264086 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::09:28:04.264151 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::09:28:04.264024 ===
** Generic server <0.1739.0> terminating 
** Last message in was {tcp,#Port<0.467>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.467>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            partisan_membership,
                            #{monotonic => false,parallelism => 1,
                              compression => true},
                            [compressed],
                            <0.855.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=ERROR REPORT==== 25-Oct-2023::09:28:04.264304 ===
** Generic server <0.1738.0> terminating 
** Last message in was {tcp,#Port<0.466>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.466>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            data,
                            #{monotonic => false,parallelism => 1,
                              compression => false},
                            [],<0.855.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=ERROR REPORT==== 25-Oct-2023::09:28:04.264353 ===
** Generic server <0.1737.0> terminating 
** Last message in was {tcp,#Port<0.465>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.465>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            undefined,
                            #{monotonic => false,parallelism => 1,
                              compression => false},
                            [],<0.855.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=CRASH REPORT==== 25-Oct-2023::09:28:04.264295 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.1739.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.845.0>]
    message_queue_len: 0
    messages: []
    links: [<0.855.0>]
    dictionary: [{{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,egress_delay},0},
                  {{partisan_peer_service_client,channel},partisan_membership},
                  {{partisan_peer_service_client,from},<0.855.0>},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,compression => true}},
                  {{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13864
  neighbours:

=CRASH REPORT==== 25-Oct-2023::09:28:04.264475 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.1738.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.845.0>]
    message_queue_len: 0
    messages: []
    links: [<0.855.0>]
    dictionary: [{{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,egress_delay},0},
                  {{partisan_peer_service_client,channel},data},
                  {{partisan_peer_service_client,from},<0.855.0>},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,
                     compression => false}},
                  {{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13632
  neighbours:

=CRASH REPORT==== 25-Oct-2023::09:28:04.264567 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.1737.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.845.0>]
    message_queue_len: 0
    messages: []
    links: [<0.855.0>]
    dictionary: [{{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,egress_delay},0},
                  {{partisan_peer_service_client,channel},undefined},
                  {{partisan_peer_service_client,from},<0.855.0>},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,
                     compression => false}},
                  {{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13508
  neighbours:

=ERROR REPORT==== 25-Oct-2023::09:28:04.436731 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::09:28:04.437083 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::09:28:04.437245 ===
    description: Unexpected peer, aborting
    expected: '[email protected]'
    got: 'pebble@max-a5'
=ERROR REPORT==== 25-Oct-2023::09:28:04.437387 ===
** Generic server <0.1752.0> terminating 
** Last message in was {tcp,#Port<0.470>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.470>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            partisan_membership,
                            #{monotonic => false,parallelism => 1,
                              compression => true},
                            [compressed],
                            <0.855.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=ERROR REPORT==== 25-Oct-2023::09:28:04.437584 ===
** Generic server <0.1751.0> terminating 
** Last message in was {tcp,#Port<0.469>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.469>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            data,
                            #{monotonic => false,parallelism => 1,
                              compression => false},
                            [],<0.855.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=ERROR REPORT==== 25-Oct-2023::09:28:04.436970 ===
** Generic server <0.1750.0> terminating 
** Last message in was {tcp,#Port<0.468>,
                            <<131,104,2,119,5,104,101,108,108,111,119,13,112,
                              101,98,98,108,101,64,109,97,120,45,97,53>>}
** When Server state == {state,
                            {partisan_peer_socket,#Port<0.468>,gen_tcp,inet,
                                false},
                            #{port => 10202,ip => {192,168,1,120}},
                            undefined,
                            #{monotonic => false,parallelism => 1,
                              compression => false},
                            [],<0.855.0>,
                            #{name =>
                                  '[email protected]',
                              listen_addrs =>
                                  [#{port => 10202,ip => {192,168,1,120}}],
                              channels =>
                                  #{undefined =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    data =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => false},
                                    partisan_membership =>
                                        #{monotonic => false,parallelism => 1,
                                          compression => true}}}}
** Reason for termination ==
** {unexpected_peer,'pebble@max-a5',
                    '[email protected]'}

=CRASH REPORT==== 25-Oct-2023::09:28:04.437814 ===
  crasher:
    initial call: partisan_peer_service_client:init/1
    pid: <0.1752.0>
    registered_name: []
    exception exit: {unexpected_peer,'pebble@max-a5',
                                     '[email protected]'}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1208)
    ancestors: [partisan_pluggable_peer_service_manager,
                  partisan_peer_service_sup,partisan_sup,<0.845.0>]
    message_queue_len: 0
    messages: []
    links: [<0.855.0>]
    dictionary: [{{partisan_peer_service_client,listen_addr},
                   #{port => 10202,ip => {192,168,1,120}}},
                  {{partisan_peer_service_client,egress_delay},0},
                  {{partisan_peer_service_client,channel},partisan_membership},
                  {{partisan_peer_service_client,from},<0.855.0>},
                  {{partisan_peer_service_client,channel_opts},
                   #{monotonic => false,parallelism => 1,compression => true}},
                  {{partisan_peer_service_client,peer},
                   #{name => '[email protected]',
                     listen_addrs => [#{port => 10202,ip => {192,168,1,120}}],
                     channels =>
                         #{undefined =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           data =>
                               #{monotonic => false,parallelism => 1,
                                 compression => false},
                           partisan_membership =>
                               #{monotonic => false,parallelism => 1,
                                 compression => true}}}}]
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 13855
  neighbours:

It looks like this is related to the handling of unexpected peers, which triggers this loop condition.

aramallo commented on September 26, 2024

Hi @mcesaro, yes, the Peer Service will keep trying to open a connection to the nodes in the membership list forever. Maybe in this case we need to do something different, but before doing that I would like to understand what might be going on.

Umm maybe I have introduced something weird in the latest update to the listen_addrs and IP resolution.

The unexpected_peer error seems to be triggered by your node trying to connect to peer [email protected], which is listening on #{port => 10202,ip => {192,168,1,120}} according to the membership set. The fact that the name is {UUID}@127.0.0.1 means that the node couldn't resolve its Erlang nodename on init and defaulted to a dynamically generated one [1].

When the node at #{port => 10202,ip => {192,168,1,120}} is contacted, it identifies itself as pebble@max-a5, hence the error. This could be a bug, or the result of a crash (without a leave) followed by another node starting on the same IP:Port as the one that crashed.

Let's check if it could be the latter. Is it possible that you are experiencing the following?

  1. You start two nodes: this one and the peer. The peer is started at #{port => 10202,ip => {192,168,1,120}}.
  2. For some reason the peer could not resolve its Erlang nodename and adopted a dynamic one, i.e. {UUID}@127.0.0.1. It joined the cluster with the first node, then was stopped (maybe you stopped it because the name was wrong?) or crashed without issuing a partisan_peer_service:leave(). So it remains a member in the cluster membership set on this node.
  3. A new peer starts using the same IP:Port, i.e. #{port => 10202,ip => {192,168,1,120}}, but now correctly taking its nodename pebble@max-a5.
  4. pebble@max-a5 joins the cluster, and now Partisan has 3 members in the cluster: this node, pebble@max-a5 and [email protected].
  5. As a result, this node and pebble@max-a5 will try to connect to [email protected] at #{port => 10202,ip => {192,168,1,120}}, but the peer running there is called pebble@max-a5!

If I am correct, you should get the three node names as a result of calling partisan_peer_service:members() on this node.

If this is the case, you should be able to resolve the issue by calling:

partisan_peer_service:leave('[email protected]')

[1] Here I see a bug since the generated name should have been [email protected].
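
The scenario above (two node names registered for the same listen address) can be checked mechanically. This is a minimal sketch, assuming the member specs are available as maps shaped like the peer maps in the crash reports (`name` and `listen_addrs` keys). The module `stale_peers` and the function `conflicting_addrs/1` are hypothetical helpers, not part of Partisan, and 'stale@127.0.0.1' in the usage note stands in for the dynamically generated name.

```erlang
-module(stale_peers).
-export([conflicting_addrs/1]).

%% Hypothetical helper, not part of Partisan.
%% Specs are maps shaped like the peer maps in the crash reports:
%%   #{name => Node, listen_addrs => [#{ip => IP, port => Port}]}
%% Returns every listen address that is claimed by more than one
%% node name, together with the sorted list of names claiming it.
-spec conflicting_addrs([map()]) -> [{map(), [atom()]}].
conflicting_addrs(Specs) ->
    %% Flatten to {Addr, Name} pairs; specs missing either key are skipped.
    Pairs = [{Addr, Name}
             || #{name := Name, listen_addrs := Addrs} <- Specs,
                Addr <- Addrs],
    %% Group the names by listen address.
    Grouped = lists:foldl(
        fun({Addr, Name}, Acc) ->
            maps:update_with(Addr, fun(Ns) -> [Name | Ns] end, [Name], Acc)
        end,
        #{},
        Pairs),
    %% Keep only the addresses with more than one distinct name.
    [{Addr, lists:usort(Names)}
     || {Addr, Names} <- maps:to_list(Grouped),
        length(lists:usort(Names)) > 1].
```

With specs for 'stale@127.0.0.1' and 'pebble@max-a5' both listening on #{port => 10202,ip => {192,168,1,120}}, the function reports that address with both names, which points at the entry that may need a leave.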

mcesaro commented on September 26, 2024

Hi @aramallo,
since I had some doubts about how the Erlang nodename is resolved, in my code I usually do something like this:

setup_node_id(#{host := _Host, port := _Port, client_id := NodeID} = State) ->
    _CurrName = node(),
    partisan:stop(),
    net_kernel:stop(),
    {NodeName, HostName} = 
    case net_kernel:start(NodeID, #{name_domain => shortnames}) of
        {ok, _Pid} ->
            partisan:start(),
            {ok , LocalHost} = inet:gethostname(),
            {node(), LocalHost};
        {error, Reason} ->
            io:format("error starting net_kernel: ~p~n", [Reason]),
            {undefined, undefined}
    end,
    State#{node => atom_to_binary(NodeName), hostname => list_to_binary(HostName)}
    .

This tries to make sure that the short name of the node is the one I assigned.
That is why I can't see an obvious reason for a dynamic name to be needed.

mcesaro commented on September 26, 2024

I see.
Actually I do not want to use disterl (that's the reason why I like partisan!), but I thought that the only way to assign a reliable shortname to the Erlang node was to start net_kernel in a controlled way.
Since I need to name the nodes dynamically, I guess I might address this by deferring partisan startup until after the node name has been assigned.

aramallo commented on September 26, 2024

Obviously, setting the name param will not affect disterl; that is, Partisan will not set the node name in net_kernel. If you want both net_kernel and partisan to have the same node name, i.e. node() == partisan:node(), then the best option is to set the name in your vm.args or vm.args.src file.

mcesaro commented on September 26, 2024

I guess I will just rely on partisan doing the right thing, i.e. take away any reference to disterl. However, I will need to do that independently of the vm.args settings.

aramallo commented on September 26, 2024

@mcesaro I guess the ideal would be for each node to have a persistent name, so that the same node in a fleet takes the same name after each restart. Not sure how you deploy your system; if you are using k8s, you would do that using StatefulSets and exporting the pod name as an env var to use in the vm.args.
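
When the name has to be chosen at runtime rather than in vm.args, the same env var idea can be sketched in Erlang. This is only an illustration: the module `env_name` and the function `from_env/2` are hypothetical, and the variable name passed in is whatever your deployment exports.

```erlang
-module(env_name).
-export([from_env/2]).

%% Hypothetical helper: reads a node name from the environment
%% variable VarName and falls back to Default when the variable
%% is unset or empty.
-spec from_env(string(), atom()) -> atom().
from_env(VarName, Default) when is_list(VarName), is_atom(Default) ->
    case os:getenv(VarName) of
        false -> Default;   %% variable not set
        ""    -> Default;   %% variable set but empty
        Value -> list_to_atom(Value)
    end.
```

list_to_atom/1 creates an atom that is never garbage collected, which is acceptable here because the name is resolved once per boot.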

mcesaro commented on September 26, 2024

@aramallo apparently a combination of methods works.
I changed the setup of a cluster node like this:

setup_node_id(#{host := _Host, port := _Port, client_id := NodeID} = State) ->
    partisan:stop(),
    {ok , HostName} = inet:gethostname(),
    NodeName = 
        [io_lib:format("~p@~s", [NodeID, HostName])]
        / lists:flatten
        / list_to_atom,
         
    io:format("nodename: ~p~n", [NodeName]),
    application:set_env(partisan, name, NodeName),
    _ = application:ensure_all_started(partisan),

    State#{node => atom_to_binary(NodeName), hostname => list_to_binary(HostName)}
    .

Seems to work great with my container setup and also on bare metal. No DNS issues (hopefully):

nodename: 'ska@max-a5'
=NOTICE REPORT==== 25-Oct-2023::22:00:17.637182 ===
    name: 'ska@max-a5'
    description: Partisan node name configured
    disterl_enabled: false
=NOTICE REPORT==== 25-Oct-2023::22:00:17.641555 ===
    addr: {0,0,0,0,0,0,0,1}
    family: inet6
    host: max-a5
    description: Resolved IP address for host
setup: partisan peer service join ok
setup: partisan try rpc call
setup: partisan rpc call success

Dynamic names seem to work with partisan!

P.S. I use the great epipe parse transform to emulate the Elixir |> operator.
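
For readers without the epipe parse transform, the name construction can be sketched in plain Erlang. The module `node_name` and the function `build/2` are hypothetical names used for illustration only.

```erlang
-module(node_name).
-export([build/2]).

%% Hypothetical helper: builds the atom 'NodeID@HostName'.
%% NodeID may be an atom, a binary or a string; HostName is a
%% string such as the one returned by inet:gethostname/0.
-spec build(atom() | binary() | string(), string()) -> atom().
build(NodeID, HostName) when is_atom(NodeID) ->
    build(atom_to_list(NodeID), HostName);
build(NodeID, HostName) when is_binary(NodeID) ->
    build(binary_to_list(NodeID), HostName);
build(NodeID, HostName) when is_list(NodeID), is_list(HostName) ->
    list_to_atom(NodeID ++ "@" ++ HostName).
```

The result can then be passed to application:set_env(partisan, name, NodeName) before partisan is started, as in the snippet above.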

aramallo commented on September 26, 2024

@mcesaro Very nice!

BTW I was not aware of epipe, thanks for the tip. I will start working on the other issues and close this one.
