cyraft's People

Contributors

maksimdrachov

cyraft's Issues

TimerHandle.cancel() not working as expected?

The leader election "works", but there are a couple of small issues I'm still struggling with. Fundamentally, they are all related to scheduling and cancelling the delayed callbacks (for term/election timeouts).

Running tests/raft_leader_election.py:

  1. pytest -k _unittest_raft_fsm_1 --pdb

image

The above warning message appears, but I don't see why. At the end I'm using cancel(), which should take care of cancelling both callbacks:

cyraft/cyraft/node.py

Lines 717 to 727 in a8e2121

def close(self) -> None:
    """
    Cancel the timers and close the node.
    """
    if hasattr(self, "_election_timer"):
        self._election_timer.cancel()
        assert self._election_timer.cancelled()
    if hasattr(self, "_term_timer"):
        self._term_timer.cancel()
        assert self._term_timer.cancelled()
    self._node.close()

The thing is, it does take care of cancelling those callbacks. For example, if I add an additional 10 seconds of sleep (expecting that a callback that was not cancelled properly would have enough time to execute), nothing executes:

image

image

(Nothing happens.)
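
This matches how asyncio timers behave in isolation: once the TimerHandle returned by loop.call_later() is cancelled, its callback is never invoked. A minimal standalone sketch (plain asyncio, not the cyraft code; the 0.05 s delay and the list-append callback are made up for illustration):

```python
import asyncio


async def main() -> list:
    fired = []  # records every callback invocation
    loop = asyncio.get_running_loop()

    # Schedule a delayed callback, as one would for an election timeout.
    handle = loop.call_later(0.05, fired.append, "election timeout")

    # Cancel it before it is due.
    handle.cancel()
    assert handle.cancelled()

    # Wait well past the original deadline; a live callback would run here.
    await asyncio.sleep(0.2)
    return fired


fired = asyncio.run(main())
print(fired)
```

If the list stays empty, the cancelled callback did not run, which is the same observation as the 10-second sleep above.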

  2. Now comes the confusing part: the previous unit tests should indicate that the heartbeat mechanism works correctly, yet it starts to misbehave in the last 2 unit tests. For example, running:

pytest -k _unittest_raft_fsm_3 --pdb

image

Even though node 43 gets the heartbeat message as expected and resets the election timeout timer, it still somehow ends up timing out on the election timeout that was set at the start of the unit test.

Am I missing something?
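
One possible explanation, offered only as a guess since the relevant cyraft code is not shown here, is that the reset path schedules a new timer without cancelling the handle that was created at the start. The sketch below shows a reset that cancels the old handle first; ElectionTimer and its methods are hypothetical names, and the timings are arbitrary.

```python
import asyncio


class ElectionTimer:
    """Hypothetical helper: owns exactly one pending timeout at a time."""

    def __init__(self, timeout: float, on_timeout) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._handle = None

    def reset(self) -> None:
        # Cancel the pending timeout (if any) BEFORE replacing the handle;
        # otherwise the old callback stays scheduled and still fires.
        if self._handle is not None:
            self._handle.cancel()
        loop = asyncio.get_running_loop()
        self._handle = loop.call_later(self._timeout, self._on_timeout)

    def close(self) -> None:
        if self._handle is not None:
            self._handle.cancel()
            self._handle = None


async def demo() -> tuple:
    timeouts = []
    timer = ElectionTimer(0.3, lambda: timeouts.append("election timeout"))
    timer.reset()  # armed at the start of the test

    # Heartbeats arrive every 50 ms, each one resetting the timer.
    for _ in range(8):
        await asyncio.sleep(0.05)
        timer.reset()
    fired_while_heartbeating = len(timeouts)

    # Heartbeats stop; the last scheduled timeout is allowed to expire.
    await asyncio.sleep(0.6)
    timer.close()
    return fired_while_heartbeating, len(timeouts)


result = asyncio.run(demo())
print(result)
```

Removing the cancel() call inside reset() makes the very first timeout fire during the heartbeat phase, which looks like the symptom described above.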

Simplifying the Raft algorithm

So:

image

Is there any reason why we can't remove the marked state transition? After all, if we assume that upon election timeout the node returns to the follower state, basically the same result is achieved. (It starts another election timeout, allowing some other node to get elected in the meantime.)

This would simplify both the RaftNode implementation and the testing (where it is difficult to catch a node as it transitions between candidate and candidate state).

PS: I'm gonna proceed as if the answer is yes.
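
The proposal can be written down as a small transition table. This is only a sketch of the idea: the State and Event names and the next_state helper are hypothetical and are not taken from the cyraft code. The one entry that differs from the Raft paper is marked in a comment.

```python
from enum import Enum, auto


class State(Enum):
    FOLLOWER = auto()
    CANDIDATE = auto()
    LEADER = auto()


class Event(Enum):
    ELECTION_TIMEOUT = auto()
    MAJORITY_VOTES = auto()
    LEADER_OR_HIGHER_TERM_SEEN = auto()


TRANSITIONS = {
    (State.FOLLOWER, Event.ELECTION_TIMEOUT): State.CANDIDATE,
    (State.CANDIDATE, Event.MAJORITY_VOTES): State.LEADER,
    # Proposed simplification: in the Raft paper this entry is CANDIDATE
    # (the node immediately starts a new election).
    (State.CANDIDATE, Event.ELECTION_TIMEOUT): State.FOLLOWER,
    (State.CANDIDATE, Event.LEADER_OR_HIGHER_TERM_SEEN): State.FOLLOWER,
    (State.LEADER, Event.LEADER_OR_HIGHER_TERM_SEEN): State.FOLLOWER,
}


def next_state(state: State, event: Event) -> State:
    """Return the new state; events with no table entry leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


print(next_state(State.CANDIDATE, Event.ELECTION_TIMEOUT))
```

One consequence visible in the table: a candidate whose election fails needs a second election timeout (as a follower) before it stands again.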

No response from request_vote

I'm trying to get a response from request_vote, after sending a request using Yakut.

The issue is that I'm not getting any response, and the request times out:

issue-1-no-response

The demo node does appear to receive the request correctly:

issue-1-request-received

(When running in debug mode with a breakpoint at the entry of _serve_request_vote, execution appears to hang on the return statement.)

The code implementing this request_vote functionality is as follows:

  • Add srv_request_vote to be served in the background:

    # Create an RPC-server. (RequestVote)
    try:
        _logger.info("Request vote service is enabled")
        srv_request_vote = self._node.get_server(
            sirius_cyber_corp.RequestVote_1, "request_vote"
        )
        srv_request_vote.serve_in_background(self._serve_request_vote)
    except pycyphal.application.register.MissingRegisterError:
        _logger.info(
            "The request vote service is disabled by configuration (UAVCAN__SRV__REQUEST_VOTE__ID missing)"
        )
  • The _serve_request_vote function itself:

    @staticmethod
    async def _serve_request_vote(
        # self,
        request: sirius_cyber_corp.RequestVote_1.Request,
        metadata: pycyphal.presentation.ServiceRequestMetadata,
    ) -> sirius_cyber_corp.RequestVote_1.Response:
        _logger.info(
            "\033[94m Request vote request %s from node %d \033[0m",
            request,
            metadata.client_node_id,
        )
    
        return sirius_cyber_corp.RequestVote_1.Response(
            term=1,
            vote_granted=True,
        )

This is pretty much the same way it is done in the demo example from pycyphal.

The only thing I'm unclear about is whether it might have something to do with the way run() is implemented:

async def run(self) -> None:
    """
    The main method that runs the business logic. It is also possible to use the library in an IoC-style
    by using receive_in_background() for all subscriptions if desired.
    """
    _logger.info("Application Node started!")
    _logger.info("Running. Press Ctrl+C to stop.")

    while True:
        await asyncio.sleep(0.1)

In the demo example, there's more setup code here, however nothing related to the service (least_square in their case.)

I suspect I may need to use the library in "IoC-style by using receive_in_background()", but I'm not sure how this looks in code. Please bear with me and use simple words.
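
Whether run() could be the culprit can be checked without pycyphal: a loop that only awaits asyncio.sleep() yields to the event loop on every iteration, so other tasks on the same loop keep being served. The sketch below uses plain asyncio queues as a stand-in for the transport; serve_requests, the queues, and the dict-shaped response are hypothetical and say nothing about pycyphal's internals.

```python
import asyncio


async def serve_requests(requests: asyncio.Queue, responses: asyncio.Queue) -> None:
    """Stand-in for a background server task (not the pycyphal API)."""
    while True:
        await requests.get()
        await responses.put({"term": 1, "vote_granted": True})


async def run(stop: asyncio.Event) -> None:
    """Same shape as the run() above: sleep in a loop, yielding each time."""
    while not stop.is_set():
        await asyncio.sleep(0.1)


async def main() -> dict:
    requests, responses = asyncio.Queue(), asyncio.Queue()
    stop = asyncio.Event()
    server = asyncio.create_task(serve_requests(requests, responses))
    runner = asyncio.create_task(run(stop))

    # Send one request while run() is looping and wait for the reply.
    await requests.put("request_vote")
    response = await asyncio.wait_for(responses.get(), timeout=1.0)

    # Shut both tasks down cleanly.
    stop.set()
    await runner
    server.cancel()
    try:
        await server
    except asyncio.CancelledError:
        pass
    return response


response = asyncio.run(main())
print(response)
```

Since the reply arrives while run() is still looping, a sleep-only run() does not by itself starve background handlers in this simplified model, so the cause may well lie elsewhere.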

Add name resolution service

# NameToIDRequest.0.1.dsdl
# This message is published when a node desires to map a computational graph name to a numerical identifier.
# The Raft node that is currently elected as the Leader should find the entry in the log and send the response.
# If there is no such entry in the log, a new one needs to be created by the Leader ad-hoc;
# the response with the new value is then published as soon as the Raft consensus is reached (replication completed).
ResourceKind.0.1 kind
uavcan.primitive.String.1.0 name
@extent 512*8

# NameToIDResponse.0.1.dsdl
# This message is published by a name service node to inform the subscribers of the identifier associated with the named resource.
# If there is no known association, an ID has to be chosen automatically and the name table be extended ad-hoc.
# If there is no known association and it is impossible to create one at the moment, no response should be published.
uint32 id
# The requested identifier value.
ResourceKind.0.1 kind
uavcan.primitive.String.1.0 name
@extent 512*8

# ResourceKind.0.1.dsdl
# Kind of a named resource.
uint4 value
uint4 SUBJECT = 0
@sealed
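
The lookup rules in the comments above can be sketched on the Leader side as follows. This is a rough illustration under stated assumptions: NameTable, resolve, and the can_extend flag are hypothetical names, the "next unused integer" allocation policy is an assumption, and Raft replication of new entries is left out entirely.

```python
from typing import Dict, Optional, Tuple

SUBJECT = 0            # ResourceKind.0.1 SUBJECT
UINT32_MAX = 2**32 - 1  # range of the uint32 id field


class NameTable:
    """Hypothetical in-memory name table kept by the Leader."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[int, str], int] = {}
        self._next_id = 0  # assumed policy: hand out the next unused integer

    def resolve(self, kind: int, name: str, can_extend: bool = True) -> Optional[int]:
        """Return the ID for (kind, name), creating one ad hoc if allowed.

        Returns None when there is no association and none can be created,
        in which case no response should be published.
        """
        key = (kind, name)
        if key in self._ids:
            return self._ids[key]
        if not can_extend or self._next_id > UINT32_MAX:
            return None
        self._ids[key] = self._next_id
        self._next_id += 1
        return self._ids[key]


table = NameTable()
first = table.resolve(SUBJECT, "airspeed")
again = table.resolve(SUBJECT, "airspeed")
other = table.resolve(SUBJECT, "altitude")
missing = table.resolve(SUBJECT, "unknown", can_extend=False)
print(first, again, other, missing)
```

In a real node the new entry would only be answered after the log entry has been replicated, as the request definition notes.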

Definition AppendEntries.1.0.dsdl

For AppendEntries, what type should be used for the entries field?

From the paper:

image

I'm thinking something like this:

# This service is used for the AppendEntries RPC

uint64 term
uint64 leaderID
uint64 prevLogIndex
uint64 prevLogTerm
LogEntry.1.0[<64] entries
uint64 leaderCommit

@sealed

---

uint64 term
bool success

@sealed

# This type is used to define a LogEntry for AppendEntries RPC
uint64 term
uint8[<=256] name
uint64 value
@sealed
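
To make the proposed layout easier to reason about, here is the same structure mirrored as Python dataclasses with the array bounds checked by hand. This is only a sketch: the class names and snake_case field names are my own, and real code would use the classes generated from the DSDL instead.

```python
from dataclasses import dataclass, field
from typing import List

MAX_ENTRIES = 63       # LogEntry.1.0[<64] allows at most 63 elements
MAX_NAME_BYTES = 256   # uint8[<=256] allows at most 256 bytes
UINT64_MAX = 2**64 - 1


def _check_uint64(label: str, value: int) -> None:
    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"{label} does not fit in uint64: {value}")


@dataclass
class LogEntry:
    term: int
    name: bytes
    value: int

    def __post_init__(self) -> None:
        _check_uint64("term", self.term)
        _check_uint64("value", self.value)
        if len(self.name) > MAX_NAME_BYTES:
            raise ValueError("name is longer than 256 bytes")


@dataclass
class AppendEntriesRequest:
    term: int
    leader_id: int
    prev_log_index: int
    prev_log_term: int
    entries: List[LogEntry] = field(default_factory=list)
    leader_commit: int = 0

    def __post_init__(self) -> None:
        if len(self.entries) > MAX_ENTRIES:
            raise ValueError("too many entries for one AppendEntries request")


# A heartbeat is an AppendEntries request that carries no entries.
heartbeat = AppendEntriesRequest(term=1, leader_id=41, prev_log_index=0, prev_log_term=0)
print(heartbeat)
```

Note that `[<64]` is an exclusive bound, so a single request carries at most 63 entries; `[<=64]` would be needed to allow 64.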
