hangc0276 / bookkeeper
This project forked from apache/bookkeeper
Apache Bookkeeper
Home Page: https://bookkeeper.apache.org
License: Apache License 2.0
For branch https://github.com/hangc0276/bookkeeper/tree/chenhag/4.14.4-for-pmem, when the bookie starts and replays the journal files, it throws an OOM and the bookie fails to start.
2022-03-02T17:08:12,889+0800 [main] ERROR org.apache.bookkeeper.common.component.AbstractLifecycleComponent - Failed to start Component: bookie-server
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61) ~[?:?]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:349) ~[?:?]
at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:842) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:995) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:961) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1015) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:156) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:83) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$4(LifecycleComponentStack.java:144) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.common.component.LifecycleComponentStack$$Lambda$207/0x00000001003e0040.accept(Unknown Source) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:406) [com.google.guava-guava-30.1-jre.jar:?]
at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:144) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:85) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.server.Main.doMain(Main.java:234) [org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
at org.apache.bookkeeper.server.Main.main(Main.java:208) [org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
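An OOM at ByteBuffer.allocate inside Journal.scanJournal is typically hit when the scanner reads a record-length prefix from a corrupt or partially written journal file and allocates a buffer of that size without a sanity check. Below is a minimal sketch of such a guard; the class, method, and size cap are hypothetical illustrations, not Journal's actual code:

```java
import java.nio.ByteBuffer;

public class JournalScanSketch {
    // Hypothetical cap; a real limit would come from the server configuration.
    static final int MAX_RECORD_SIZE = 64 * 1024 * 1024; // 64 MiB

    // Validate a length prefix read from the journal before allocating a heap buffer.
    static ByteBuffer allocateForRecord(int recordLen) {
        if (recordLen <= 0 || recordLen > MAX_RECORD_SIZE) {
            // Corrupt/partial record: fail replay explicitly instead of allocating blindly.
            throw new IllegalStateException("Suspicious journal record length: " + recordLen);
        }
        return ByteBuffer.allocate(recordLen);
    }

    public static void main(String[] args) {
        System.out.println(allocateForRecord(1024).capacity()); // prints 1024
        try {
            allocateForRecord(Integer.MAX_VALUE); // a corrupt length triggers the guard
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```

With such a guard, a damaged journal entry surfaces as a clear replay error rather than an opaque Java heap space OOM at startup.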
We need a read latency metric for the RocksDB index to indicate whether the RocksDB cache is sufficient.
Adding the following test to TestRackawareEnsemblePlacementPolicy.java reproduces the issue.
@Test
public void testNewEnsemblePolicyWithMultipleRacks() throws Exception {
    BookieSocketAddress addr1 = new BookieSocketAddress("127.0.0.1", 3181);
    BookieSocketAddress addr2 = new BookieSocketAddress("127.0.0.2", 3181);
    BookieSocketAddress addr3 = new BookieSocketAddress("127.0.0.3", 3181);
    BookieSocketAddress addr4 = new BookieSocketAddress("127.0.0.4", 3181);
    BookieSocketAddress addr5 = new BookieSocketAddress("127.0.0.5", 3181);
    // update dns mapping
    StaticDNSResolver.addNodeToRack(addr1.getHostName(), "/default-region/r1");
    StaticDNSResolver.addNodeToRack(addr2.getHostName(), "/default-region/r1");
    StaticDNSResolver.addNodeToRack(addr3.getHostName(), "/default-region/r2");
    StaticDNSResolver.addNodeToRack(addr4.getHostName(), "/default-region/r3");
    //StaticDNSResolver.addNodeToRack(addr5.getHostName(), "/default-region/r1");
    // Update cluster
    Set<BookieId> addrs = new HashSet<BookieId>();
    addrs.add(addr1.toBookieId());
    addrs.add(addr2.toBookieId());
    addrs.add(addr3.toBookieId());
    //addrs.add(addr5.toBookieId());
    addrs.add(addr4.toBookieId());
    repp.onClusterChanged(addrs, new HashSet<BookieId>());
    try {
        int ensembleSize = 3;
        int writeQuorumSize = 3;
        int ackQuorumSize = 2;
        Set<BookieId> excludeBookies = new HashSet<>();
        //excludeBookies.add(addr4.toBookieId());
        //excludeBookies.add(addr3.toBookieId());
        for (int i = 0; i < 50; ++i) {
            EnsemblePlacementPolicy.PlacementResult<List<BookieId>> ensembleResponse =
                    repp.newEnsemble(ensembleSize, writeQuorumSize,
                            ackQuorumSize, null, excludeBookies);
            List<BookieId> ensemble = ensembleResponse.getResult();
            if (ensemble.contains(addr1.toBookieId()) && ensemble.contains(addr2.toBookieId())) {
                LOG.error("The same ensemble.");
                ensemble.forEach(t -> {
                    LOG.info("[hangc] {}", t);
                });
            }
            LOG.info("==========");
        }
    } catch (Exception e) {
        LOG.error("failed ", e);
    }
}
Currently, Pulsar has many components, including the broker, bookie client, bookkeeper server, zookeeper, etc., and each component has its own dashboard. When there are performance issues, it is a little difficult to check the metrics across components.
I plan to create a performance dashboard to hold all the performance-related metrics.
Ensemble: 2-2-2
Run bin/bookkeeper shell readledger -fe 1 -le 1 -l 1 -r -m to read messages and force recover the ledger: the message can be read out and the ledger stays in CLOSED state.
Run bin/bookkeeper shell readledger -fe 160 -le 160 -l 1 -r -m to read messages and recover the ledger: the read throws a read-failed exception and the ledger stays in CLOSED state. It means the ledger's last segment can't be replicated. We can't deal with it except by deleting this ledger. When the ledger runs into IN_RECOVERY state and can't recover to CLOSED state, it means part of the ledger's data has been lost.
The only issue is that Pulsar's SkipUnRecoverableLedger flag can't cover this case, and loading the topic into the Pulsar broker fails.
Let me walk through the steps:
BP-34 introduced a metadata checker to validate whether a ledger fragment has the right placement policy. After the check, it just reports the result without providing recovery actions when the ledger's placement policy is not satisfied.
Related PR: apache#1902
Set the corresponding flag in conf/bk_server.conf to turn it on. When the replication worker gets a LedgerId to replicate, it will run getUnderreplicatedFragments to get the fragments of the target missed bookies. After getting the fragments, it will trigger the replication operation.
BUG REPORT
Describe the bug
For example, a particular ledger no longer exists in the bookie cluster, but it still stays in ZooKeeper's under-replication list, so the replication worker thread keeps trying to replicate it. Since the data is gone, replication keeps failing, and this failure message can keep appearing in the logs for months.
A few points need to be confirmed:
Regarding the issue that minor and major compaction stop when ledger disk usage reaches 95%: this behavior protects the ledger disk, because garbage collection temporarily occupies extra storage. We have two parameters to control this behavior:
isForceGCAllowWhenNoSpace
and forceAllowCompaction
With these two flags enabled, the bookie won't disable minor and major compaction when ledger disk usage reaches 90% and 95%; it will trigger major compaction at those thresholds instead. However, this risks pushing disk usage to 100% and may introduce other unexpected issues. I prefer to enable these two flags, but we need the following changes: (Will discuss with @fantapsody @tuteng )
diskUsageThreshold=0.90, diskUsageWarnThreshold=0.85 and diskUsageLwmThreshold=0.85
forceAllowCompaction: If we only enable this flag and disable isForceGCAllowWhenNoSpace, the bookie will still disable minor and major compaction. But we can use the REST API command curl -XPUT http://<bookie-ip>:<port>/api/v1/bookie/gc -d '{"forceMajor": true}' to trigger major or minor compaction when the bookie runs into read-only mode. (This feature is only supported since BookKeeper 4.15.0+ and Pulsar 2.11.0+.)
Minor Compaction: If an entry log file's remaining data ratio is lower than this threshold, the entry log file will be compacted. The default value is 0.2. For example, if the total entry log file size is 1 GB and 900 MB of data has expired, the remaining data size is 100 MB, whose ratio is lower than 0.2, so this entry log file will be compacted. The compaction process follows several steps.
Major Compaction: If an entry log file's remaining data ratio is lower than this threshold, the entry log file will be compacted. The default value is 0.5. The compaction process is the same as minor compaction. The more remaining data in the entry log file, the more extra disk space is used during compaction.
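The threshold check described above is just a ratio of remaining (live) data to total entry log size. A small sketch of the comparison, using the example numbers from the text (illustrative only, not BookKeeper's actual compactor code):

```java
public class CompactionThresholdSketch {
    // Decide whether an entry log should be compacted given a threshold
    // (0.2 for minor compaction, 0.5 for major compaction by default).
    static boolean shouldCompact(long remainingBytes, long totalBytes, double threshold) {
        return (double) remainingBytes / totalBytes < threshold;
    }

    public static void main(String[] args) {
        long total = 1024L * 1024 * 1024;    // 1 GiB entry log
        long remaining = 100L * 1024 * 1024; // 100 MiB live data left
        // 100 MiB / 1 GiB ≈ 0.098 < 0.2, so minor compaction picks this log up.
        System.out.println(shouldCompact(remaining, total, 0.2)); // true
        // 0.098 < 0.5 as well, so major compaction would also compact it.
        System.out.println(shouldCompact(remaining, total, 0.5)); // true
        // A log with 600 MiB of live data (ratio ≈ 0.586) is skipped by both.
        System.out.println(shouldCompact(600L * 1024 * 1024, total, 0.5)); // false
    }
}
```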
diskUsageWarnThreshold: When the ledger disk usage reaches this threshold, the bookie will suspend major compaction. The default value is 0.90
diskUsageThreshold: When the ledger disk usage reaches this threshold, the bookie will run into read-only mode and suspend minor and major compaction. When the disk usage is lower than this threshold, resume minor compaction. The default value is 0.95
diskUsageLwmThreshold: When the ledger disk usage is lower than this threshold, the bookie will recover to read-write mode and resume major and minor compaction. The default value is 0.95
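Putting the proposal together, the relevant conf/bk_server.conf fragment would look roughly like this. The values come from the discussion above; treat this as a sketch to adapt, not as recommended defaults:

```properties
# Allow GC/compaction to keep running when the disk is short on space
isForceGCAllowWhenNoSpace=true
forceAllowCompaction=true

# Proposed threshold changes
diskUsageThreshold=0.90
diskUsageWarnThreshold=0.85
diskUsageLwmThreshold=0.85
```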
There are duplicated ledger usage metrics; remove the unused one.
Hello Everyone, after upgrading my pulsar cluster to 2.8.2 I have noticed that index deletion sometimes takes around 60 seconds, which causes the CPU to spike to 100%. Is this normal, or is there any parameter I need to tune accordingly? TIA
[2022-02-28T07:25:42.531Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385184, 3385239, 3385159, 3385142, 3385124, 3385193, 3384879, 3385165, 3385916]
[2022-02-28T07:26:34.089Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 201065 entries from 9 ledgers in 51.557 seconds
[2022-02-28T07:40:42.534Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385379, 3385367, 3385718, 3385365, 3385412, 3385167, 3385357, 3386141]
[2022-02-28T07:41:47.867Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 134590 entries from 8 ledgers in 65.332 seconds
Reserve part of the disk space for compaction; otherwise, once the ledger disk reaches the threshold, compaction cannot work.
Currently a bookie's GC can only be triggered manually via the REST API, one bookie at a time. We need a command-line tool to trigger GC on all bookies with one command.
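A minimal sketch of what such a tool could do, reusing the GC REST endpoint shown elsewhere on this page. The bookie list is hypothetical, and the sketch only prints the calls it would issue (a dry run) rather than sending them; a real tool would discover the bookies from the metadata store:

```java
import java.util.List;

public class TriggerGcAll {
    // Build the REST call that triggers GC on one bookie's HTTP endpoint.
    static String gcCommand(String bookieHttpAddress) {
        return "curl -XPUT http://" + bookieHttpAddress
                + "/api/v1/bookie/gc -d '{\"forceMajor\": true}'";
    }

    public static void main(String[] args) {
        // Hypothetical bookie HTTP addresses.
        List<String> bookies = List.of("bk1:8000", "bk2:8000", "bk3:8000");
        for (String b : bookies) {
            System.out.println(gcCommand(b)); // dry run: print, don't send
        }
    }
}
```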
Currently, the bookie's lastAddConfirmed logic is a little complicated, and in some cases the lastAddConfirmed result is not a fixed value. When users notice this, they may be confused about the bookie's protocol. We need a blog that goes deep into how the bookie's lastAddConfirmed works and why a non-fixed lastAddConfirmed is reasonable.
YanZhao will take over this blog.
BUG REPORT
Describe the bug
r1 -> bk1, bk2
r2 -> bk3
r3 -> bk4
Enable EnforceMinNumRacksPerWriteQuorum and set minNumRacksPerWriteQuorumConfValue=2.
When bk3 or bk4 is quarantined, the new ensemble selection should succeed, because the minimum of 2 racks is still fulfilled.
However, the new ensemble selection sometimes fails.
Running the following unit test in TestRackawareEnsemblePlacementPolicy.java reproduces the bug.
final int minNumRacksPerWriteQuorumConfValue = 2;
conf.setMinNumRacksPerWriteQuorum(minNumRacksPerWriteQuorumConfValue);
conf.setEnforceMinNumRacksPerWriteQuorum(true);
@Test
public void testNewEnsemblePolicyWithMultipleRacksV2() throws Exception {
    BookieSocketAddress addr1 = new BookieSocketAddress("127.0.0.1", 3181);
    BookieSocketAddress addr2 = new BookieSocketAddress("127.0.0.2", 3181);
    BookieSocketAddress addr3 = new BookieSocketAddress("127.0.0.3", 3181);
    BookieSocketAddress addr4 = new BookieSocketAddress("127.0.0.4", 3181);
    BookieSocketAddress addr5 = new BookieSocketAddress("127.0.0.5", 3181);
    // update dns mapping
    StaticDNSResolver.addNodeToRack(addr1.getHostName(), "/default-region/r1");
    StaticDNSResolver.addNodeToRack(addr2.getHostName(), "/default-region/r1");
    StaticDNSResolver.addNodeToRack(addr3.getHostName(), "/default-region/r2");
    StaticDNSResolver.addNodeToRack(addr4.getHostName(), "/default-region/r3");
    //StaticDNSResolver.addNodeToRack(addr5.getHostName(), "/default-region/r1");
    // Update cluster
    Set<BookieId> addrs = new HashSet<BookieId>();
    addrs.add(addr1.toBookieId());
    addrs.add(addr2.toBookieId());
    addrs.add(addr3.toBookieId());
    //addrs.add(addr5.toBookieId());
    addrs.add(addr4.toBookieId());
    repp.onClusterChanged(addrs, new HashSet<BookieId>());
    try {
        int ensembleSize = 3;
        int writeQuorumSize = 3;
        int ackQuorumSize = 2;
        Set<BookieId> excludeBookies = new HashSet<>();
        excludeBookies.add(addr4.toBookieId());
        //excludeBookies.add(addr3.toBookieId());
        for (int i = 0; i < 50; ++i) {
            EnsemblePlacementPolicy.PlacementResult<List<BookieId>> ensembleResponse =
                    repp.newEnsemble(ensembleSize, writeQuorumSize,
                            ackQuorumSize, null, excludeBookies);
            List<BookieId> ensemble = ensembleResponse.getResult();
            ensemble.forEach(t -> {
                LOG.info("[hangc] {}", t);
            });
            LOG.info("==========");
        }
    } catch (Exception e) {
        LOG.error("failed ", e);
    }
}
When one bookie is lost, the auditor marks the ledgers belonging to the lost bookie as under-replicated and waits for the replication worker to replicate them.
However, if the lost bookie comes back soon, is there any way to cancel those under-replicated ledgers?
I guess it will be filtered by https://github.com/apache/bookkeeper/blob/8eb26dbeb0988d04136d63805eccd9d466d309b9/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java#L371
Hello everyone, I am getting the below exception in autorecovery and have 2570 ledgers to recover.
I also have autoskipnonrecoverabledata set to true. Can anyone please help?
07:30:00.326 [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadEntryProcessorV3 - IOException while reading entry: 0 from ledger 1402504
java.io.IOException: org.apache.bookkeeper.bookie.EntryLogger$EntryLookupException$MissingLogFileException: Missing entryLog 78 for ledgerId 1402504, entry 0 at offset 693965588
at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:836) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:860) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.getEntry(SingleDirectoryDbLedgerStorage.java:452) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
For https://github.com/hangc0276/bookkeeper/tree/chenhag/4.14.4-for-pmem, we use the pmem plugin integrated with Pulsar; it blocks the whole process when the write cache triggers a flush.
pulsarctl-bookie_rackinfo -b 127.0.0.1:3181 -z 127.0.0.1:2281 -r /test-region/test-rack
(Note: for RackAwarePlacementPolicy, the rack name /test-region/test-rack is not allowed in bin/pulsar-admin, but it is allowed in the pulsarctl-bookie-rackinfo command. So we use the pulsarctl-bookie-rackinfo command to set the rack info.)
After executing the above command, the broker throws the following exception, but a new ledger can still be created.
2023-07-13T18:33:03,327+0800 [main-EventThread] INFO org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T18:33:03,406+0800 [main-EventThread] INFO org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T18:33:03,438+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 1
Expected number of leaves:1
/default-rack/127.0.0.1:3182
2023-07-13T18:33:03,441+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
2023-07-13T18:34:48,787+0800 [pulsar-registration-client-33-1] INFO org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3182
2023-07-13T18:34:56,751+0800 [main-EventThread] INFO org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 127.0.0.1:3182 -> BookieServiceInfo{properties={}, endpoints=[
EndpointInfo{id=bookie, port=3182, host=127.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-07-13T18:34:56,764+0800 [main-EventThread] INFO org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 127.0.0.1:3182 -> BookieServiceInfo{properties={}, endpoints=[
EndpointInfo{id=bookie, port=3182, host=127.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-33-1] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3182> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-16-1] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3182> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-33-1] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Unexpected exception while handling joining bookie 127.0.0.1:3182
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:719) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:249) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:665) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:233) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient.lambda$updatedBookies$6(PulsarRegistrationClient.java:183) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.87.Final.jar:4.1.87.Final]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-16-1] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Unexpected exception while handling joining bookie 127.0.0.1:3182
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:719) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:249) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:665) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:233) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient.lambda$updatedBookies$6(PulsarRegistrationClient.java:183) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
Ledger creation failed with the following logs:
2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/tt/persistent/t2-partition-0] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 116, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 50, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 48], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
pulsarctl-bookie_rackinfo -b 127.0.0.1:3182 -z 127.0.0.1:2281 -r /test-region/test-rack
The broker will throw the following exception and a new ledger still can't be created.
2023-07-13T19:12:10,051+0800 [main-EventThread] INFO org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-rack, hostname=null), 127.0.0.1:3182=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T19:12:10,061+0800 [main-EventThread] INFO org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T19:12:10,061+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0
2023-07-13T19:12:10,062+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
2023-07-13T19:12:10,075+0800 [main-EventThread] INFO org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-rack, hostname=null), 127.0.0.1:3182=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T19:12:10,078+0800 [main-EventThread] INFO org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T19:12:10,078+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0
2023-07-13T19:12:10,078+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
Broker logs
pulsar-broker-MacBook-Pro-3.lan.log
There are a lot of tests using Thread.sleep(), which may lead to flaky tests. Use await-style polling to make them more robust.
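The idea is to poll for the condition with a timeout instead of sleeping a fixed amount; this is what libraries like Awaitility provide. A stdlib-only sketch of the pattern (the helper name and timings are illustrative):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class AwaitSketch {
    // Poll `condition` every pollMillis until it holds or timeoutMillis elapses.
    static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return condition.getAsBoolean(); // final check at the deadline
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean done = new AtomicBoolean(false);
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            done.set(true);
        }).start();
        // Returns as soon as the flag flips, instead of a fixed Thread.sleep(1000).
        System.out.println(await(done::get, 2000, 10)); // true
    }
}
```

The test completes as soon as the condition holds, and fails fast with a bounded wait when it never does, which is why this pattern is less flaky than a fixed sleep.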