Comments (9)
Judging from this log, part of the PaxosLog has been lost.
Turn on the full log level and paste the complete startup log so we can take a look.
from phxpaxos.
We have bSync set to false.
2016-11-22 09:03:51.11s CheckpointInstanceID 6497752
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8LogStoreE::ParseFileID fileid 19 offset 52932087 checksum 2691965949
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos8LogStoreE::RebuildIndex START fileid 19 offset 52932087 checksum 2691965949
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::OpenFile ok, path ../storage/paxoslog/g0/vfile/19.f
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::RebuildIndexForOneFile rebuild one index ok, fileid 19 offset 52932087 instanceid 6570821 checksum 2691965949 buffer size 399 )
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos8LogStoreE::RebuildIndexForOneFile File Data End, fileid 19 offset 52932498
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8LogStoreE::RebuildIndexForOneFile file not exist, filepath ../storage/paxoslog/g0/vfile/20.f
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::RebuildIndex END rebuild ok, nowfileid 20
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::OpenFile ok, path ../storage/paxoslog/g0/vfile/19.f
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos8LogStoreE::Init ok, path ../storage/paxoslog/g0/vfile fileid 19 meta checksum 1676158829 nowfilesize 104857600 nowfilewriteoffset 52932498 )
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8DatabaseE::Init OK, db_path ../storage/paxoslog/g0
2016-11-22 09:03:51.11s Showy: PN8phxpaxos13MultiDatabaseE::Init OK, DBPath ../storage/paxoslog groupcount 1
2016-11-22 09:03:51.11s Showy: PN8phxpaxos5PNodeE::InitLogStorage OK, use default logstorage
2016-11-22 09:03:51.11s Showy: PN8phxpaxos5PNodeE::InitNetWork OK, use default network
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos18MasterStateMachineE::Init OK, master nodeid 2095519671909359521 version 6560819 expiretime 1222895188
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8DatabaseE::GetFromLevelDB LevelDB.Get not found, instanceid 18446744073709551614
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos20SystemVariablesStoreE::Read DB.Get not found, groupidx 0
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos9SystemVSME::Init variables not exist
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos9SystemVSME::RefleshNodeID ip 10.10.122.228 port 8097 nodeid 16463482425872228257
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos9SystemVSME::RefleshNodeID ip 10.10.123.130 port 8097 nodeid 9402119685132001185
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos9SystemVSME::RefleshNodeID ip 10.10.123.153 port 8097 nodeid 11059444348004343713
2016-11-22 09:03:51.11s Imp(0): PN8phxpaxos6ConfigE::Init OK
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos11MasterDamonE::TryBeMaster Ohter as master, can't try be master, masterid 2095519671909359521 myid 9402119685132001185
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos11MasterDamonE::run TryBeMaster, sleep time 3299ms
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8PaxosLogE::GetMaxInstanceIDFromLog OK, MaxInstanceID 6570821 groupidsx 0
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8LogStoreE::ParseFileID fileid 19 offset 52932087 checksum 2691965949
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::OpenFile ok, path ../storage/paxoslog/g0/vfile/19.f
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::Read ok, fileid 19 offset 52932087 instanceid 6570821 buffer size 399
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos13AcceptorStateE::Load GroupIdx 0 InstanceID 6570821 PromiseID 246 PromiseNodeID 2095519671909359521 AccectpedID 246 AcceptedNodeID 2095519671909359521 ValueLen 359 Checksum 275208196
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8AcceptorE::Init OK
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8DatabaseE::GetFromLevelDB LevelDB.Get not found, instanceid 18446744073709551615
2016-11-22 09:03:51.11s ERR(0): PN8phxpaxos8DatabaseE::GetMinChosenInstanceID no min chosen instanceid
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8LogStoreE::ParseFileID fileid 0 offset 34 checksum 4187667134
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::OpenFile ok, path ../storage/paxoslog/g0/vfile/0.f
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::Read ok, fileid 0 offset 34 instanceid 0 buffer size 70
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos7CleanerE::FixMinChosenInstanceID ok, old minchosen 0 fix minchosen 0
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8InstanceE::Init Acceptor.OK, Log.InstanceID 6570821 Checkpoint.InstanceID 6497753
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8LogStoreE::ParseFileID fileid 19 offset 24157577 checksum 2042402563
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::OpenFile ok, path ../storage/paxoslog/g0/vfile/19.f
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8LogStoreE::Read ok, fileid 19 offset 24157577 instanceid 6497753 buffer size 399
2016-11-22 09:03:51.11s no need to sync checkpoint, skiptimes 1
2016-11-22 09:03:51.11s DEBUG(0): PN8phxpaxos8DatabaseE::GetFromLevelDB LevelDB.Get not found, instanceid 6497754
2016-11-22 09:03:51.11s Showy(0): PN8phxpaxos8PaxosLogE::ReadState DB.Get not found, groupidx 0
2016-11-22 09:03:51.11s ERR(0): PN8phxpaxos8InstanceE::PlayLog log read fail, instanceid 6497754 ret 1
from phxpaxos.
The leveldb data is missing one instance, 6497754; the cause is unknown.
Build the paxos_log_tools tool under the src/tools directory and check which data after 6497753 has been lost.
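The diagnosis above can be read directly off the startup log: replay begins right after the checkpoint instance and stops at the first instance that cannot be read. A minimal sketch of pulling those two numbers out of a log like the one quoted earlier; the function name `first_replay_failure` and the regular expressions are illustrative assumptions based only on the log lines shown in this thread, not part of phxpaxos.

```python
import re

# Patterns modelled on the startup log quoted in this thread (assumed format).
CHECKPOINT_RE = re.compile(r"Checkpoint\.InstanceID (\d+)")
READ_FAIL_RE = re.compile(r"PlayLog log read fail, instanceid (\d+) ret (-?\d+)")


def first_replay_failure(log_text):
    """Return (checkpoint_id, failed_id) parsed from a startup log.

    Either value is None when the corresponding line is absent.
    """
    checkpoint = CHECKPOINT_RE.search(log_text)
    failure = READ_FAIL_RE.search(log_text)
    checkpoint_id = int(checkpoint.group(1)) if checkpoint else None
    failed_id = int(failure.group(1)) if failure else None
    return checkpoint_id, failed_id


# Two lines taken from the log above.
sample = (
    "Showy(0): PN8phxpaxos8InstanceE::Init Acceptor.OK, "
    "Log.InstanceID 6570821 Checkpoint.InstanceID 6497753\n"
    "ERR(0): PN8phxpaxos8InstanceE::PlayLog log read fail, "
    "instanceid 6497754 ret 1\n"
)
print(first_replay_failure(sample))
```

If the failed ID is exactly one past the checkpoint ID, as here, replay could not read even the first instance after the checkpoint.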
from phxpaxos.
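I checked the data between 6497753 and 6570821 (the maximum).
The missing instances are 6497754, 6498905, 6530370 and 6540384, and then roughly 16660 entries are missing between 65707722 and 6553500.
The disk is an ordinary SATA disk, and bSync is set to false.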
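A check like the one described above amounts to finding the holes in a sequence of instance IDs. A minimal sketch, assuming the set of instance IDs that are still readable has already been collected by some means (the output format of paxos_log_tools is not shown in this thread, so that step is left out); `missing_ranges` is a hypothetical helper name.

```python
def missing_ranges(present_ids, low, high):
    """Yield inclusive (start, end) ranges within [low, high] that are
    absent from present_ids."""
    present = set(present_ids)
    start = None
    for instance_id in range(low, high + 1):
        if instance_id not in present:
            if start is None:
                start = instance_id  # a new gap opens here
        elif start is not None:
            yield (start, instance_id - 1)  # the gap closed just before this ID
            start = None
    if start is not None:
        yield (start, high)  # the gap runs to the end of the range


# Small made-up example, not data from this thread.
gaps = list(missing_ranges([10, 11, 13, 14, 18], 10, 20))
print(gaps)
```

Reporting gaps as ranges rather than single IDs keeps the output short when thousands of consecutive instances are missing.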
from phxpaxos.
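Was the machine rebooted during this period? If bSync is set to false and the machine reboots, this kind of problem can occur.
For now the only fix is to delete the paxos log data and restart.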
from phxpaxos.
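With bSync set to false, performance on SATA disks is rather poor.
Could phxpaxos provide a stop interface, so that the paxos log can be flushed to disk before a restart? Also, it would be better if only data at the tail could be lost rather than data in the middle. That way service startup would not be affected, and the lost data could be synced over from other machines in the cluster.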
from phxpaxos.
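When a machine goes down there may be no chance to write to disk at all, for example on a sudden power loss.
Also, under normal circumstances data in the middle should not be lost; what happened here is an extreme abnormal case.
To guarantee that no data is lost, there is currently no good approach other than setting bSync to true. In addition, phxpaxos performs very poorly on SATA disks; if you need to run it on SATA disks, I suggest rewriting the storage module yourself.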
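The trade-off discussed here comes down to whether each write is forced to stable storage before it is acknowledged. A minimal sketch of that difference using the operating system's `fsync` call; this is not phxpaxos's storage code, and the assumption that bSync=true behaves like a flush after every write is an inference from this thread. `append_record` is a hypothetical helper.

```python
import os
import tempfile


def append_record(path, payload, sync):
    """Append payload to path. When sync is True, ask the OS to flush the
    file to stable storage before returning (slower, survives power loss)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(payload)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        if sync:
            os.fsync(fd)  # without this, data may sit only in the page cache
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "0.f")
    append_record(log_path, b"instance-1;", sync=True)
    append_record(log_path, b"instance-2;", sync=False)
    with open(log_path, "rb") as handle:
        contents = handle.read()
print(contents)
```

Both calls look identical while the machine stays up; the difference only shows after a crash or power loss, when unsynced writes may be missing.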
from phxpaxos.
For now I have added a method on top of paxos_log_tools that deletes everything from the first missing instance up to the maximum.
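The workaround above can be sketched as a planning step: find the first missing instance, then list every surviving instance from that point up to the maximum so it can be deleted. The actual change to paxos_log_tools is not shown in this thread; `truncation_plan` and its arguments are hypothetical names, and the deletion itself is left out.

```python
def truncation_plan(present_ids, start_id, max_id):
    """Return (first_missing, ids_to_delete).

    first_missing is the lowest ID in [start_id, max_id] that is absent;
    ids_to_delete lists the surviving IDs from there up to max_id.
    Returns (None, []) when the range has no gap.
    """
    present = set(present_ids)
    first_missing = next(
        (i for i in range(start_id, max_id + 1) if i not in present), None
    )
    if first_missing is None:
        return None, []
    doomed = sorted(i for i in present if first_missing <= i <= max_id)
    return first_missing, doomed


# Small made-up example, not data from this thread.
plan = truncation_plan([100, 101, 103, 104, 106], 100, 106)
print(plan)
```

After such a truncation the log ends just before the first gap, so only tail data is missing, which is the situation described earlier in the thread as recoverable from other machines in the cluster.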
from phxpaxos.