
sofastack / sofa-jraft


A production-grade java implementation of RAFT consensus algorithm.

Home Page: https://www.sofastack.tech/projects/sofa-jraft/

License: Apache License 2.0

Languages: Java 99.93%, Shell 0.07%
Topics: raft-algorithm, raft-java, raft, sofastack, sofa-jraft, sofa-bolt, distributed-consensus-algorithms, java, consensus

sofa-jraft's Introduction

SOFAJRaft


Chinese (中文)

Overview

SOFAJRaft is a production-grade, high-performance Java implementation of the RAFT consensus algorithm that supports MULTI-RAFT-GROUP for high-load, low-latency scenarios. With SOFAJRaft you can focus on your business logic while SOFAJRaft handles all of the RAFT-related technical challenges. SOFAJRaft is user-friendly and ships with several examples, making it easy to understand and use.
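
For a feel of the API, here is a minimal sketch of bootstrapping one node of a three-node group, modeled on the bundled counter example (the data path, ports, and CounterStateMachine are illustrative assumptions):

    import java.io.File;
    import com.alipay.sofa.jraft.JRaftUtils;
    import com.alipay.sofa.jraft.Node;
    import com.alipay.sofa.jraft.RaftGroupService;
    import com.alipay.sofa.jraft.option.NodeOptions;

    String dataPath = "/tmp/server1";
    NodeOptions nodeOptions = new NodeOptions();
    nodeOptions.setFsm(new CounterStateMachine());                       // your StateMachine implementation
    nodeOptions.setLogUri(dataPath + File.separator + "log");            // raft log storage
    nodeOptions.setRaftMetaUri(dataPath + File.separator + "raft_meta"); // term / votedFor metadata
    nodeOptions.setSnapshotUri(dataPath + File.separator + "snapshot");  // snapshot storage
    nodeOptions.setInitialConf(JRaftUtils.getConfiguration("127.0.0.1:8081,127.0.0.1:8082,127.0.0.1:8083"));

    RaftGroupService service = new RaftGroupService(
        "counter", JRaftUtils.getPeerId("127.0.0.1:8081"), nodeOptions);
    Node node = service.start(); // starts the RPC server and the raft node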

Features

  • Leader election and priority-based semi-deterministic leader election
  • Log replication and recovery
  • Read-only member (learner)
  • Snapshot and log compaction
  • Cluster membership management: adding, removing, and replacing nodes, etc.
  • Leader transfer mechanism for reboots, load balancing, etc.
  • Symmetric network partition tolerance
  • Asymmetric network partition tolerance
  • Fault tolerance: a minority of failed nodes doesn't affect the overall availability of the system
  • Manual cluster recovery is available after a majority failure
  • Linearizable reads via ReadIndex/LeaseRead (see the sketch after this list)
  • Replication pipeline
  • Rich statistics to analyze the performance based on Metrics
  • Passed Jepsen consistency verification test
  • SOFAJRaft includes an embedded distributed KV storage implementation
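
As a sketch of the linearizable-read feature above, a read can be served through Node#readIndex without going through the raft log (the done and fsm names are illustrative assumptions; ReadIndexClosure is JRaft's callback type):

    node.readIndex(new byte[0], new ReadIndexClosure() {
        @Override
        public void run(Status status, long index, byte[] reqCtx) {
            if (status.isOk()) {
                // This replica has applied at least up to `index`,
                // so reading local state here is linearizable.
                done.success(fsm.getValue());
            } else {
                // Fall back, e.g. submit the read as a task through the raft log.
                done.failure(status.getErrorMsg());
            }
        }
    });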

Requirements

Compile requirement: JDK 8+ and Maven 3.2.5+.

Documents

Contribution

How to contribute

Acknowledgement

SOFAJRaft was ported from Baidu's braft with some optimizations and improvements. Thanks to the Baidu braft team for open-sourcing such a great C++ RAFT implementation.

License

SOFAJRaft is licensed under the Apache License 2.0. SOFAJRaft relies on some third-party components whose open-source licenses are also Apache License 2.0. In addition, SOFAJRaft directly references some code (possibly with minor changes) that is licensed under the Apache License 2.0, including:

  • NonBlockingHashMap/NonBlockingHashMapLong in JCTools
  • HashedWheelTimer in Netty; Netty's Pipeline design is also referenced
  • Efficient encoding/decoding of UTF8 String in Protobuf

Community

See our community materials.

Join the user group on Slack

Scan the QR code below with DingTalk (钉钉) to join the SOFAStack user group.

Scan the QR code below with WeChat (微信) to follow our official accounts.

Known Users

These are the companies using SOFAStack (in no particular order). Please leave a comment here to tell us about your scenario so we can make SOFAStack better.

蚂蚁集团 网商银行 恒生电子 数立信息 Paytm 天弘基金 **人保 信美相互 南京银行 民生银行 重庆农商行 中信证券 富滇银行 挖财 拍拍贷 OPPO金融 运满满 译筑科技 杭州米雅信息科技 邦道科技 申通快递 深圳大头兄弟文化 烽火科技 亚信科技 成都云智天下科技 上海溢米辅导 态赋科技 风一科技 武汉易企盈 极致医疗 京东 小象生鲜 北京云族佳 欣亿云网 山东网聪 深圳市诺安赛威 上扬软件 长沙点三 网易云音乐 虎牙直播 **移动 无纸科技 黄金钱包 独木桥网络 wueasy 北京攸乐科技 易宝支付 威马汽车 亿通国际 新华三 klilalagroup

sofa-jraft's People

Contributors

1294566108, alchemyding, brotherlu-xcq, caicancai, claire9910, cmonkey, fengjiachun, funky-eyes, gakkiyomi, haoyann, horizonzy, howie-xu, huangyunbin, hzh0425, javacodercff, killme2008, lfygh, masaimu, nobodyiam, pifuant, qiujiayu, seeflood, shibd, shihuili1218, slievrly, stenicholas, xiaoheng1, ye-xiaowei, yuyang0423, zongtanghu

sofa-jraft's Issues

Getting the list of live followers

As the leader, I want to get the live followers so that I can notify them to process data.

There is com.alipay.sofa.jraft.core.ReplicatorGroupImpl#failureReplicators, but the interface provides no method to retrieve it.
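
Since failureReplicators is not exposed, one possible workaround today (a sketch assuming a leader-side Node handle and a CliClientService; note it probes RPC connectivity, not replication progress):

    // On the leader, Node#listPeers() returns the current configuration.
    final List<PeerId> alive = new ArrayList<>();
    for (final PeerId peer : node.listPeers()) {
        // connect() returns false when the peer is unreachable.
        if (cliClientService.connect(peer.getEndpoint())) {
            alive.add(peer);
        }
    }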

System testing for the Windows environment

Compatibility issues found so far:

  1. File path names: some tests use temporary directories and directories with special characters.
  2. ProtoBufFile save returns false when the file already exists #54

1.2.5 release

  • Unit tests pass
  • Jepsen verification passes
  • Documentation updates
  • Publishing and release notes

Change the JDK version in Travis

We should use OpenJDK instead of Oracle JDK, because Oracle JDK now requires a paid license. OpenJDK 8 and OpenJDK 11 will be used, as these are the two officially long-term-maintained versions. OpenJDK 11 also brings many language changes, so the corresponding compatibility tests are needed.

Question: can one port serve multiple raft groups?

A few questions:
If I want multiple raft groups on a single node, can they all listen on one port and be distinguished by group id? Or does each raft group need to listen on its own port?
I also see an idx field in PeerId; is that field currently used for anything?

One more thing I don't understand: in the CounterClient example, the client's RPC call only uses the Endpoint (IP + port); the groupId isn't used at all. Does that mean one port can only correspond to one groupId?

cliClientService.getRpcClient().invokeWithCallback(leader.getEndpoint().toString(), request...
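
For reference, multiple groups can share one port, since the group id is carried in every request. A sketch, assuming the RaftGroupService constructor that accepts a shared RpcServer (the exact server-startup call differs across JRaft versions):

    final Endpoint addr = new Endpoint("127.0.0.1", 8081);
    final PeerId serverId = new PeerId(addr, 0);
    // One RPC server shared by every raft group on this node.
    final RpcServer rpcServer = RaftRpcServerFactory.createRaftRpcServer(addr);
    final RaftGroupService groupA = new RaftGroupService("group-a", serverId, optsA, rpcServer);
    final RaftGroupService groupB = new RaftGroupService("group-b", serverId, optsB, rpcServer);
    final Node nodeA = groupA.start(false); // false: don't start the shared server per group
    final Node nodeB = groupB.start(false);
    rpcServer.init(null); // start the shared server once (RpcServer implements Lifecycle<Void>)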

jraft build failure

OS: macOS
JDK: Oracle JDK 8
IDE: IDEA 2018

With the jmh benchmark framework's scope set to test, the project fails to build;
after changing it to provided, running the test cases fails.
(screenshot omitted)

Asynchronous calls on the same RpcClient are applied by the state machine in a different order

I want to submit a large number of requests from a single client. These requests are ordered and must not be reordered. If I call invokeWithCallback in a loop on the same client, jraft presumably cannot guarantee that the state machine applies them in the same order as the invocations. That leaves me with one-at-a-time synchronous calls, which are very slow.
Is there a recommended solution for this?
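
A common client-side pattern is to preserve ordering by chaining: send request N+1 only after request N completes, without blocking a thread. A sketch (invokeAsync is an assumed helper that wraps invokeWithCallback and completes the future from its callback; it is not a JRaft API):

    import java.util.concurrent.CompletableFuture;

    final class OrderedSubmitter {
        private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

        // Requests go out strictly one after another, so the raft log (and
        // therefore the state machine) sees them in submission order.
        synchronized CompletableFuture<Object> submit(final Object request) {
            final CompletableFuture<Object> result = tail.thenCompose(ignored -> invokeAsync(request));
            tail = result.handle((r, t) -> null); // keep the chain alive even on failure
            return result;
        }

        private CompletableFuture<Object> invokeAsync(final Object request) {
            // Assumed helper: call invokeWithCallback and complete the
            // returned future from onResponse/onException.
            throw new UnsupportedOperationException("wire this to your rpc client");
        }
    }

This keeps at most one request in flight, trading throughput for ordering; true ordered pipelining would need sequencing support on the server side.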

CounterServer: after killing the leader process and restarting it a while later, it keeps flooding the logs

Sofa-Middleware-Log SLF4J : Actual binding is of type [ com.alipay.remoting Log4j2 ]
2019-03-19 11:35:54 [main] INFO log:30 - Sofa-Middleware-Log SLF4J : Actual binding is of type [ com.alipay.remoting Log4j2 ]
2019-03-19 11:35:56 [main] INFO FSMCallerImpl:188 - Starts FSMCaller successfully.
2019-03-19 11:35:56 [Jraft-FSMCaller-disruptor-0] INFO StateMachineAdapter:79 - onConfigurationCommitted: 127.0.0.1:8081,127.0.0.1:8082,127.0.0.1:8083
2019-03-19 11:35:56 [Jraft-FSMCaller-disruptor-0] INFO SnapshotExecutorImpl:435 - Node <counter/127.0.0.1:8081> onSnapshotLoadDone, last_included_index: 1
last_included_term: 1
peers: "127.0.0.1:8081"
peers: "127.0.0.1:8082"
peers: "127.0.0.1:8083"

2019-03-19 11:35:56 [main] INFO NodeImpl:793 - Node <counter/127.0.0.1:8081> init, term: 1, lastLogId: LogId [index=1, term=1], conf: 127.0.0.1:8081,127.0.0.1:8082,127.0.0.1:8083, old_conf:
2019-03-19 11:35:57 [main] INFO RaftGroupService:139 - Start the RaftGroupService successfully.
Started counter server at port:8081
2019-03-19 11:35:57 [Rpc-netty-server-worker-1-thread-1] WARN RaftRpcServerFactory:263 - JRaft SET bolt.rpc.dispatch-msg-list-in-default-executor to be false for replicator pipeline optimistic.
2019-03-19 11:35:57 [counter/127.0.0.1:8081-AppendEntriesThread0] INFO LocalRaftMetaStorage:121 - Save raft meta, path=/tmp/server1\raft_meta, term=2, votedFor=0.0.0.0:0, cost time=15 ms
2019-03-19 11:35:57 [counter/127.0.0.1:8081-AppendEntriesThread0] WARN NodeImpl:1589 - Node <counter/127.0.0.1:8081> reject term_unmatched AppendEntriesRequest from 127.0.0.1:8082 in term 2 prevLogIndex 2 prevLogTerm 2 localPrevLogTerm 0 lastLogIndex 1 entriesSize 0
2019-03-19 11:35:57 [Jraft-FSMCaller-disruptor-0] INFO StateMachineAdapter:89 - onStartFollowing: LeaderChangeContext [leaderId=127.0.0.1:8082, term=2, status=Status[ENEWLEADER<10011>: Raft node receives message from new leader with higher term.]]
2019-03-19 11:35:57 [Bolt-default-executor-6-thread-2] INFO NodeImpl:2695 - Node <counter/127.0.0.1:8081> received InstallSnapshotRequest lastIncludedLogIndex 2 lastIncludedLogTerm 2 from 127.0.0.1:8082 when lastLogId=LogId [index=1, term=1]
2019-03-19 11:35:57 [Bolt-conn-event-executor-5-thread-1] INFO ClientServiceConnectionEventProcessor:50 - Peer 127.0.0.1:8082 is connected
2019-03-19 11:35:57 [JRaft-Closure-Executor-1] INFO LocalSnapshotStorage:167 - Deleting snapshot /tmp/server1\snapshot\temp
2019-03-19 11:35:57 [JRaft-Closure-Executor-1] ERROR Utils:138 - Fail to close
java.io.IOException
at com.alipay.sofa.jraft.storage.snapshot.local.LocalSnapshotStorage.close(LocalSnapshotStorage.java:251) ~[classes/:?]
at com.alipay.sofa.jraft.storage.snapshot.local.LocalSnapshotWriter.close(LocalSnapshotWriter.java:98) ~[classes/:?]
at com.alipay.sofa.jraft.storage.snapshot.local.LocalSnapshotWriter.close(LocalSnapshotWriter.java:93) ~[classes/:?]
at com.alipay.sofa.jraft.util.Utils.closeQuietly(Utils.java:135) [classes/:?]
at com.alipay.sofa.jraft.storage.snapshot.local.LocalSnapshotCopier.internalCopy(LocalSnapshotCopier.java:113) [classes/:?]
at com.alipay.sofa.jraft.storage.snapshot.local.LocalSnapshotCopier.startCopy(LocalSnapshotCopier.java:85) [classes/:?]
at com.alipay.sofa.jraft.storage.snapshot.local.LocalSnapshotCopier$$Lambda$9/945205179.run(Unknown Source) [classes/:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_45]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
2019-03-19 11:35:57 [Jraft-FSMCaller-disruptor-0] ERROR CounterStateMachine:124 - Raft error: %s
com.alipay.sofa.jraft.error.RaftException: ERROR_TYPE_SNAPSHOT
at com.alipay.sofa.jraft.storage.snapshot.SnapshotExecutorImpl.reportError(SnapshotExecutorImpl.java:660) ~[classes/:?]
at com.alipay.sofa.jraft.storage.snapshot.SnapshotExecutorImpl.loadDownloadingSnapshot(SnapshotExecutorImpl.java:499) ~[classes/:?]
at com.alipay.sofa.jraft.storage.snapshot.SnapshotExecutorImpl.installSnapshot(SnapshotExecutorImpl.java:484) ~[classes/:?]
at com.alipay.sofa.jraft.core.NodeImpl.handleInstallSnapshot(NodeImpl.java:2699) ~[classes/:?]
at com.alipay.sofa.jraft.rpc.impl.core.InstallSnapshotRequestProcessor.processRequest0(InstallSnapshotRequestProcessor.java:51) ~[classes/:?]
at com.alipay.sofa.jraft.rpc.impl.core.InstallSnapshotRequestProcessor.processRequest0(InstallSnapshotRequestProcessor.java:1) ~[classes/:?]
at com.alipay.sofa.jraft.rpc.impl.core.NodeRequestProcessor.processRequest(NodeRequestProcessor.java:58) ~[classes/:?]
at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:53) ~[classes/:?]
at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:1) ~[classes/:?]
at com.alipay.remoting.rpc.protocol.RpcRequestProcessor.dispatchToUserProcessor(RpcRequestProcessor.java:224) ~[bolt-1.5.3.jar:?]
at com.alipay.remoting.rpc.protocol.RpcRequestProcessor.doProcess(RpcRequestProcessor.java:145) ~[bolt-1.5.3.jar:?]
at com.alipay.remoting.rpc.protocol.RpcRequestProcessor$ProcessTask.run(RpcRequestProcessor.java:366) ~[bolt-1.5.3.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
2019-03-19 11:35:57 [Jraft-FSMCaller-disruptor-0] WARN NodeImpl:2014 - Node <counter/127.0.0.1:8081> got error=Error [type=ERROR_TYPE_SNAPSHOT, status=Status[EIO<1014>: Fail to sync writer]]
2019-03-19 11:35:57 [Jraft-FSMCaller-disruptor-0] INFO StateMachineAdapter:84 - onStopFollowing: LeaderChangeContext [leaderId=127.0.0.1:8082, term=2, status=Status[EBADNODE<10009>: Raft node(leader or candidate) is in error.]]
2019-03-19 11:35:58 [counter/127.0.0.1:8081-AppendEntriesThread0] WARN NodeImpl:1540 - Node <counter/127.0.0.1:8081> is not in active state, current term 2
2019-03-19 11:35:58 [Bolt-default-executor-6-thread-6] WARN NodeImpl:2663 - Node <counter/127.0.0.1:8081> ignore InstallSnapshotRequest as it is not in active state STATE_ERROR
(the two WARN lines above keep repeating continuously)

[PR] Fix a suspected bug in the PreVote flow?

I've recently been studying and implementing Raft myself. I was delighted to discover your team's open-source Java Raft library, read through the code overnight, and learned a great deal.

However, the pre-vote flow looks slightly off. As I understand Raft, pre-vote hinges on two points:
1. The term in a pre-vote request should be the initiator's own term, without term+1 (pre-vote exists precisely to stop a partitioned node from endlessly incrementing its term and disturbing the whole cluster).
2. A node receiving a pre-vote request should check whether lastLeaderTimestamp has exceeded the minimum election timeout; if not, the current leader is still considered valid and the pre-vote should be rejected (to preserve stability).

JRaft's implementation seems problematic on both counts, though the fault may lie in my understanding:
1. The initiator issues the pre-vote with term+1; why?
2. On the receiver, in handlePreVoteRequest, see the highlighted comment in the middle of the code below:

    do {
        if (request.getTerm() < this.currTerm) {
            LOG.info("Node {} ignore PreVote from {} in term {} currTerm {}", this.getNodeId(),
                request.getServerId(), request.getTerm(), this.currTerm);
            // A follower replicator may not be started when this node become leader, so we must check it.
            checkReplicator(candidateId);
            break;
        } else if (request.getTerm() == this.currTerm + 1) {

            // !!! From this point on, granting the vote depends only on the
            // log's term and index. Whether this node is a follower or the
            // leader, any pre-vote whose log is not older gets granted,
            // which defeats the purpose of pre-vote. !!!

            // A follower replicator may not be started when this node become leader, so we must check it.
            // check replicator state
            checkReplicator(candidateId);
        }
        doUnlock = false;
        this.writeLock.unlock();

        final LogId logId = this.logManager.getLastLogId(true);

        doUnlock = true;
        this.writeLock.lock();
        final LogId requestLastLogId = new LogId(request.getLastLogIndex(), request.getLastLogTerm());
        granted = (requestLastLogId.compareTo(logId) >= 0);

        LOG.info(
            "Node {} received PreVote from {} in term {} currTerm {} granted {}, request last logId: {}, current last logId: {}",
            this.getNodeId(), request.getServerId(), request.getTerm(), this.currTerm, granted,
            requestLastLogId, logId);
    } while (false);

    final RequestVoteResponse.Builder responseBuilder = RequestVoteResponse.newBuilder();
    responseBuilder.setTerm(this.currTerm);
    responseBuilder.setGranted(granted);
    return responseBuilder.build();

That is my point of confusion; if I've simply misread the code, I'd appreciate hearing your reasoning.

Based on my understanding, I've modified the source and would like to open a PR. The main changes:
1. handlePreVoteRequest: add an if (Utils.nowMs() - lastLeaderTimestamp <= options.getElectionTimeoutMs()) check at the top (implementing core point 2 above); see the sketch below.
2. handlePreVoteRequest: if this node is the leader and grants the pre-vote (term > currTerm && the request's log is not older), step down.
3. handleElectionTimeout (before calling preVote): remove the if (Utils.nowMs() - lastLeaderTimestamp <= options.getElectionTimeoutMs()) check; it seems redundant there since the election timeout has already fired, and that logic belongs on the receiver side.
4. preVote: req.setTerm(this.currTerm); without the +1!

Please let me know whether my understanding is off.
If it isn't, I hope you'll accept this PR as my small contribution.
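
For concreteness, the guard in change 1 would sit at the top of handlePreVoteRequest's do-while and look roughly like this (a sketch of the reporter's proposal, not the shipped code):

    // Reject the pre-vote if we heard from a valid leader within the
    // election timeout: the current leader is presumed still alive.
    if (Utils.nowMs() - this.lastLeaderTimestamp <= this.options.getElectionTimeoutMs()) {
        granted = false;
        break;
    }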

Make LogStorage extensible via SPI

Your question

Your scenes

Let users plug in their own LogStorage implementation; indeed, LogStorage is probably not the only component that should be SPI-extensible.

Your advice

Implement an SPI loading mechanism; ServiceLoader may need to be extended slightly.

Environment

1.2.4 release

  • Unit tests pass
  • Jepsen verification passes
  • Documentation updates
  • Publishing and release notes

Separate data and index in the LogStorage implementation

Stores like rocksdb/leveldb are generally only suitable for small key/value pairs; performance is not ideal when keys/values get large.

The current default LogStorage is RocksdbLogStorage, built directly on Rocksdb. Users can supply their own implementation, but the default still has room for optimization:

  1. Append log entries into segments, splitting files at a fixed size (say 1 GB). Each segment file is named after the log index of the first entry written to it.
  2. Keep the rocksdb key as the log index, but change the value from the raw log data to the entry's offset within a segment, (file_index, write_position); the value then shrinks to 12 bytes (8 + 4). See the sketch after this list.
  3. Truncating the log also becomes straightforward: delete at segment granularity (locating segments by file name) and remove the corresponding index entries. Some recovery handling is needed here.
  4. Segments can use mmap and group-commit fsync to improve performance.
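
As referenced in item 2, a sketch of the 12-byte index value (the names here are illustrative, not JRaft code):

    import java.nio.ByteBuffer;

    // The rocksdb key stays the log index; the value becomes a fixed
    // 12-byte pointer into a segment file instead of the raw log data.
    final class SegmentPointer {
        final long fileIndex;     // segment file name = first log index it holds
        final int  writePosition; // offset of the entry within the segment

        SegmentPointer(final long fileIndex, final int writePosition) {
            this.fileIndex = fileIndex;
            this.writePosition = writePosition;
        }

        byte[] encode() { // 8 + 4 = 12 bytes
            return ByteBuffer.allocate(12).putLong(this.fileIndex).putInt(this.writePosition).array();
        }

        static SegmentPointer decode(final byte[] value) {
            final ByteBuffer buf = ByteBuffer.wrap(value);
            return new SegmentPointer(buf.getLong(), buf.getInt());
        }
    }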

Documentation translation

All documentation needs to be translated into English, and the readme should be internationalized as well.

ignore PreVote because the leader's lease is still valid

hi,
Operating with two nodes, after several add-peer/remove-peer operations the log below appeared. How can the cluster be recovered from this state? Thanks.

2019-03-29 15:34:27,010 INFO
Bolt-default-executor-6-thread-16 - Node <test/127.0.0.1:8080> ignore PreVote from 127.0.0.1:8081 in term 5 currTerm 4, because the leader 127.0.0.1:8080's lease is still valid.

How mature is jraft, and how widely is it used?

I've read through the jraft source; it looks quite good, with many details carefully considered and a fairly complete Raft implementation.
I'd like to adopt jraft, but how do I convince my colleagues and managers?
Success stories, especially large-scale production deployments, would be persuasive.

Put metrics behind an interface

Allow pluggable metric libraries. The default is dropwizard; users could implement the interface to plug in alternatives such as micrometer.

The ReadLease handling does not look rigorous

After a quick read of the read-lease code, it appears that a lease read only checks whether the node's state is STATE_LEADER.

Consider this case. A {A,B,C} raft group starts out healthy. After a while, A and C become network-partitioned; {A,B} can still reach a majority, so the group keeps working. No new log is written after the partition, so all replicas hold exactly the same data. C's election timeout fires, it starts an election, reaches agreement with B, and becomes the new leader. At that moment A's heartbeat hasn't reached B yet, so A is a stale leader. Before A discovers the new leader, new log entries are written. A's state is still STATE_LEADER, so under the current code it keeps serving reads, and a stale read occurs.

After a new leader comes up, it should wait for a while before serving requests, to make sure the old leader's lease has expired; this wait should exceed the lease period.

I don't see such handling in the code at present, though I may not have looked carefully enough. A sketch of the suggestion follows.
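
A sketch of that suggestion (illustrative names, not JRaft code):

    // A fresh leader refuses lease-based reads until the previous
    // leader's lease must have expired.
    long leaderStartMs;  // recorded when this node becomes leader
    long leasePeriodMs;  // should exceed the old leader's lease period

    boolean safeForLeaseRead(final long nowMs) {
        // Checking state == STATE_LEADER alone is not enough: also wait
        // out one full lease period after winning the election.
        return nowMs - this.leaderStartMs > this.leasePeriodMs;
    }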

Why are nearly half of the 30 most recent stargazers brand-new accounts registered within the last two days?

https://zhuanlan.zhihu.com/p/61034386

Detailed screenshots of everything described below are provided in the Zhihu article above. I hope we can discuss this on the basis of facts; the merits will speak for themselves.

Last weekend and the weekend two weeks before, large numbers of accounts registered within the previous day or two, with no avatars and no real GitHub activity, starred the project in bursts. For example, at 4:18 p.m. Beijing time today, 13 of the 30 most recent stargazers were brand-new accounts registered today or yesterday.

These new accounts all star the same fixed handful of projects that differ completely in category and language, for example jraft plus an IT-management tool with no relation to the internet at all. Some accounts were even registered within two or three hours of each other and starred exactly the same four projects, jraft included.

The net effect of this dense burst of new-account stars is implausible star growth that pushed the project onto that day's trending list.

So: is Alipay the target of a "retaliatory star attack" by gray- or black-market operators? And why does Alipay appear to be the beneficiary of such an "attack"?

ProtoBufFile#save(Message msg, boolean sync) returns false

If the file at ProtoBufFile.path already exists, save will keep returning false.
LocalSnapshotCopier.internalCopy() calls both filter() and copyFile(file),
which can cause LocalSnapshotWriter.sync() to be called twice: the first call returns true and the second returns false.
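
One possible shape of a fix (a sketch, not the shipped code): write to a temporary file and atomically move it over the destination, so an existing file no longer makes the save fail:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    static void atomicSave(final byte[] bytes, final Path target) throws IOException {
        final Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, bytes); // CREATE + TRUNCATE_EXISTING by default
        // REPLACE_EXISTING lets a second save over an existing file succeed.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }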

A question about multiple RheaKV regions sharing one StateMachine

I've just read through the RheaKV code and learned a great deal, but one thing puzzles me and I'd appreciate some guidance.

RheaKV uses a multi-raft design, yet all regions on a node share a single RocksDB state machine.
That means snapshots are taken at the granularity of all regions on the node.
So when a raft group's leader installs a snapshot on a follower, does it ship all of the data on the leader's node, covering every raft group?

I've looked at this for a long time without finding where data from different raft groups is kept apart. Any pointers would be appreciated.

Error returns from RaftMetaStorage are not handled

Thanks to the developers for their hard work.

Describe the bug

Raft's meta information is critical to its correct operation. LocalRaftMetaStorage does catch IO exceptions when persisting the meta, but it merely logs them and returns false, and none of the call sites of RaftMetaStorage check for a false return (for example here, here, and here). So an IO failure leaves nothing but a log line, and once the Raft node restarts, the recovered meta is wrong.

For example, suppose a node has already voted for node A as leader in term 100, but the meta save fails and the process crashes and restarts. After restarting it has forgotten that it voted for A in term 100 (it may recover a votedFor from an older term), so it can vote for a different node in term 100 again, violating Raft's guarantees.

I may be missing something; apologies if I've misunderstood.

Expected behavior

Handle the false return value when RaftMetaStorage fails to save the meta; perhaps the whole node should be stopped on a save failure. A sketch follows.
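
A sketch of that handling (illustrative, not shipped code):

    // Treat a failed meta save as fatal instead of ignoring the boolean,
    // since recovering a stale term/votedFor can lead to double voting.
    if (!this.metaStorage.setTermAndVotedFor(term, this.votedId)) {
        onError(new RaftException(EnumOutter.ErrorType.ERROR_TYPE_META,
            new Status(RaftError.EIO, "Fail to save raft meta.")));
    }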

Actual behavior

Nothing appears to handle the false returned by RaftMetaStorage after a failed meta save.

Steps to reproduce

For example, after the Raft node starts, delete the meta storage path entirely, or change its permissions so the Raft process cannot write to it.

That said, actually causing damage this way may not be easy: even if the meta save fails, the process stays correct as long as it doesn't restart, and even after a restart it can still recover the correct term as long as the leader is alive, no election happens, and it receives data from the leader. Still, the hazard is there.

Environment

  • SOFAJRaft version: v1.2.5
  • JVM version (e.g. java -version): any
  • OS version (e.g. uname -a): any
  • Maven version: any
  • IDE version: any

Sequence diagram

Could you provide a sequence diagram for counter-incrementAndGet?

Leader balance at Multi-Raft initialization

When implementing multi-raft with JRaft, there is currently no way at initialization time to designate which node should preferentially become leader. Node start order and other factors may therefore concentrate leaders on a single node. Is that possible, and could it be improved?
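
One possible mitigation is the priority-based election listed under Features: give the preferred node a higher election priority so it tends to win. A sketch, assuming the ip:port:idx:priority peer format that accompanies priority-based election:

    final Configuration conf = new Configuration();
    // Node :8081 gets priority 100, the others 60, so :8081 tends to become leader.
    conf.parse("127.0.0.1:8081:0:100,127.0.0.1:8082:0:60,127.0.0.1:8083:0:60");
    nodeOptions.setInitialConf(conf);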

Performance testing tools?

Is there a performance testing tool? We have several candidate raft implementations and need to pick one; a benchmarking tool or published benchmark results would help.

Suspected bug in NodeImpl#executeApplyingTasks

    if (!this.ballotBox.appendPendingTask(this.conf.getConf(),
            conf.isStable() ? null : conf.getOldConf(), task.done)) {
        Utils.runClosureInThread(task.done, new Status(RaftError.EINTERNAL, "Fail to append task."));
        return;
    }

Wouldn't continue be more appropriate than return here?

Questions about AddPeer

In the addPeer flow, suppose the new node has no data at all. You first start the new peer's node; note that its initialServerList must not contain only the new peer itself, and may include the existing cluster list,
because if it contained only the new peer, the node would elect itself leader.
If initialServerList includes the existing cluster list, then before the addPeer request is sent to the existing cluster, the new peer may hit its election timeout and initiate a pre-vote. The existing cluster's nodes will reject it because its term is smaller; the new peer steps down, waits for the next election timeout, pre-votes again, and so on. At any point in this loop, issuing addPeer to the old cluster's leader will bring the node into the group.

Is my understanding correct? Guidance appreciated.
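
For reference, the membership change itself goes through CliService against the current configuration, roughly like this (group name and addresses are illustrative):

    final CliService cliService = RaftServiceFactory.createAndInitCliService(new CliOptions());
    final Configuration conf = JRaftUtils.getConfiguration("127.0.0.1:8081,127.0.0.1:8082,127.0.0.1:8083");
    final PeerId newPeer = JRaftUtils.getPeerId("127.0.0.1:8084");
    // Asks the group's leader to replicate the configuration change and
    // catch the new peer up before it joins the quorum.
    final Status status = cliService.addPeer("counter", conf, newPeer);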

Question: what does the redirect() method in the example sub-project's CounterServer do?

Hi, while reading the example sub-project's source I couldn't quite understand the redirect function in the CounterServer class:

public ValueResponse redirect() {
    final ValueResponse response = new ValueResponse();
    response.setSuccess(false);
    if (this.node != null) {
        final PeerId leader = this.node.getLeaderId();
        if (leader != null) {
            response.setRedirect(leader.toString());
        }
    }
    return response;
}

It sets response.setRedirect(leader.toString()), but I don't see ValueResponse's getRedirect() method being called anywhere.

In the IncrementAndGetRequestProcessor class:

@Override
public void handleRequest(final BizContext bizCtx, final AsyncContext asyncCtx, final IncrementAndGetRequest request) {
    if (!this.counterServer.getFsm().isLeader()) {
        // This should redirect to the leader, but I don't see
        // ValueResponse#getRedirect() used anywhere to obtain the leader,
        // so how is the redirect actually accomplished?
        asyncCtx.sendResponse(this.counterServer.redirect());
        return;
    }
    ...
}

It feels like the redirect field in ValueResponse is never used. Is my understanding correct?
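
The field only matters if the client chooses to honor it. A sketch of such client-side handling (hypothetical; the bundled counter client instead refreshes the leader via the RouteTable):

    final ValueResponse resp = ...; // response returned by a follower
    if (!resp.isSuccess() && resp.getRedirect() != null) {
        final PeerId leader = new PeerId();
        if (leader.parse(resp.getRedirect())) {
            // Retry the original request against the real leader.
            rpcClient.invokeWithCallback(leader.getEndpoint().toString(), request, callback, timeoutMs);
        }
    }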

[Question] About ReadOnlyLeaseBased

Thanks to sofa for contributing sofa-jraft (jraft below). I've been reading the ReadIndex-related code and have a few questions.

tikv describes the following scenario,
with lease-based reads as the premise:
during a leader transfer, a network partition can leave the old leader without stepping down, so it can still serve read requests.
(Original description of the scenario.)

This scenario doesn't appear to be handled in jraft? (I've read the transferLeadershipTo-related methods.)

So, may I ask whether jraft has a problem in this scenario?

And could this scenario be optimized? (jraft's comments do state explicitly that lease read is unsafe, but lease read is, after all, more efficient than read index.)
