
hangc0276 / bookkeeper


This project is a fork of apache/bookkeeper

Repository size: 64.23 MB

Apache Bookkeeper

Home Page: https://bookkeeper.apache.org

License: Apache License 2.0

Shell 1.11% Java 96.68% Python 1.29% Makefile 0.04% JavaScript 0.07% Groovy 0.29% C++ 0.11% C 0.07% Roff 0.24% Thrift 0.01% Dockerfile 0.08% SCSS 0.01%

bookkeeper's People

Contributors

acoburn, arvindevel, breed, dlg99, eolivelli, fpj, gaozhangmin, ghatage, hangc0276, horizonzy, ivankelly, jiazhai, jvrao, karanmehta93, lamberken, lhotari, lordcheng10, merlimat, nicoloboschi, pkumar-singh, rdhabalia, reddycharan, shoothzj, sijie, stevenlumt, sursingh, vanlightly, wenbingshen, zhaijack, zymap


Forkers

lujiwen

bookkeeper's Issues

Migrate BookieRackAffinityMapping into BookKeeper repo


Bookie fails to start when replaying the journal log

Description

On branch https://github.com/hangc0276/bookkeeper/tree/chenhag/4.14.4-for-pmem, when the bookie starts and replays the journal files, it throws an OutOfMemoryError and the bookie fails to start.

2022-03-02T17:08:12,889+0800 [main] ERROR org.apache.bookkeeper.common.component.AbstractLifecycleComponent - Failed to start Component: bookie-server
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61) ~[?:?]
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:349) ~[?:?]
        at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:842) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:995) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:961) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1015) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:156) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:83) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$4(LifecycleComponentStack.java:144) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.common.component.LifecycleComponentStack$$Lambda$207/0x00000001003e0040.accept(Unknown Source) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:406) [com.google.guava-guava-30.1-jre.jar:?]
        at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:144) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:85) [org.apache.bookkeeper-bookkeeper-common-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.server.Main.doMain(Main.java:234) [org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
        at org.apache.bookkeeper.server.Main.main(Main.java:208) [org.apache.bookkeeper-bookkeeper-server-4.14.4-pmem-SNAPSHOT.jar:4.14.4-pmem-SNAPSHOT]
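
The stack trace points at ByteBuffer.allocate in Journal.scanJournal, which suggests an oversized or corrupted record-length field in the journal. A minimal sketch of a defensive guard, assuming a hypothetical MAX_JOURNAL_ENTRY_SIZE cap (an illustration, not the actual fix):

import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical guard: cap the record length before allocating, so a corrupted
// length field fails fast with an IOException instead of an OutOfMemoryError.
final class JournalScanGuard {
    // Assumed cap for illustration; the real journal has its own limits.
    private static final int MAX_JOURNAL_ENTRY_SIZE = 10 * 1024 * 1024;

    static ByteBuffer allocateRecordBuffer(int len, long offset) throws IOException {
        if (len < 0 || len > MAX_JOURNAL_ENTRY_SIZE) {
            throw new IOException("Invalid journal record length " + len
                    + " at offset " + offset + "; journal may be corrupted");
        }
        return ByteBuffer.allocate(len);
    }
}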

RackAware placement policy does not make a best effort to place replicas on different racks.

Describe

Adding the following test to TestRackawareEnsemblePlacementPolicy.java reproduces the issue.

    @Test
    public void testNewEnsemblePolicyWithMultipleRacks() throws Exception {
        BookieSocketAddress addr1 = new BookieSocketAddress("127.0.0.1", 3181);
        BookieSocketAddress addr2 = new BookieSocketAddress("127.0.0.2", 3181);
        BookieSocketAddress addr3 = new BookieSocketAddress("127.0.0.3", 3181);
        BookieSocketAddress addr4 = new BookieSocketAddress("127.0.0.4", 3181);
        BookieSocketAddress addr5 = new BookieSocketAddress("127.0.0.5", 3181);
        // update dns mapping
        StaticDNSResolver.addNodeToRack(addr1.getHostName(), "/default-region/r1");
        StaticDNSResolver.addNodeToRack(addr2.getHostName(), "/default-region/r1");
        StaticDNSResolver.addNodeToRack(addr3.getHostName(), "/default-region/r2");
        StaticDNSResolver.addNodeToRack(addr4.getHostName(), "/default-region/r3");
        //StaticDNSResolver.addNodeToRack(addr5.getHostName(), "/default-region/r1");
        // Update cluster
        Set<BookieId> addrs = new HashSet<BookieId>();
        addrs.add(addr1.toBookieId());
        addrs.add(addr2.toBookieId());
        addrs.add(addr3.toBookieId());
        //addrs.add(addr5.toBookieId());
        addrs.add(addr4.toBookieId());
        repp.onClusterChanged(addrs, new HashSet<BookieId>());

        try {
            int ensembleSize = 3;
            int writeQuorumSize = 3;
            int ackQuorumSize = 2;

            Set<BookieId> excludeBookies = new HashSet<>();
            //excludeBookies.add(addr4.toBookieId());
            //excludeBookies.add(addr3.toBookieId());

            for (int i = 0; i < 50; ++i) {
                EnsemblePlacementPolicy.PlacementResult<List<BookieId>> ensembleResponse =
                    repp.newEnsemble(ensembleSize, writeQuorumSize,
                        ackQuorumSize, null, excludeBookies);
                List<BookieId> ensemble = ensembleResponse.getResult();

                if (ensemble.contains(addr1.toBookieId()) && ensemble.contains(addr2.toBookieId())) {
                    LOG.error("The same ensemble.");
                    ensemble.forEach(t -> {
                        LOG.info("[hangc] {}", t);
                    });
                }
                LOG.info("==========");
            }
        } catch (Exception e) {
            LOG.error("failed ", e);
        }
    }

Create a performance dashboard for Pulsar

Motivation

Currently, Pulsar has many components, including the broker, bookie client, bookkeeper server, ZooKeeper, etc., and each component has its own dashboard. When there are performance issues, it is somewhat difficult to check the metrics of each component separately.

I plan to create a performance dashboard to hold all the performance-related metrics.

How to generate IN_RECOVERY state ledgers


Ensemble: 2-2-2

Case 1

  • Ledger1 (0={bk1, bk2}, 150={bk2, bk3}) OPEN state
  • Shutdown bk2 and bk3
  • Use the command bin/bookkeeper shell readledger -fe 1 -le 1 -l 1 -r -m to force recover the ledger
  • Force recovery fails, and the ledger stays in the IN_RECOVERY state and can't be recovered

Case 2

  • Ledger1 (0={bk1, bk2}) OPEN state
  • Shutdown bk1 and bk2
  • Use the command bin/bookkeeper shell readledger -fe 1 -le 1 -l 1 -r -m to force recover the ledger
  • Force recovery fails, and the ledger stays in the IN_RECOVERY state and can't be recovered

Case3

  • Ledger 1 (0={bk1, bk2}, 150={bk2, bk3})
  • Shutdown bk2 and bk3
  • Use the command bin/bookkeeper shell readledger -fe 1 -le 1 -l 1 -r -m to force recover the ledger
  • Force recovery fails, and the ledger stays in the IN_RECOVERY state and can't be recovered
  • Unload the topic
  • Ledger 1 will be marked as CLOSED and lastEntry will be set (for example, to 220). The ledger's CLOSED state is updated by the BookKeeper client when the topic is unloaded
  • Loading the topic fails even though we enabled SkipUnRecoverableLedger
  • Use the command bin/bookkeeper shell readledger -fe 1 -le 1 -l 1 -r -m to read messages and recover the ledger: the messages can be read and the ledger stays in the CLOSED state
  • Use the command bin/bookkeeper shell readledger -fe 160 -le 160 -l 1 -r -m to read messages and recover the ledger: the read throws a read-failed exception and the ledger stays in the CLOSED state. This means the ledger's last segment can't be replicated.

How to deal with IN_RECOVERY state ledgers in decommission

We can't deal with it except by deleting the ledger. When a ledger enters the IN_RECOVERY state and can't be recovered to CLOSED, part of its data has been lost.

The only issue is that Pulsar's SkipUnRecoverableLedger flag can't cover this case, so loading the topic into the Pulsar broker fails.
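
To confirm a ledger is stuck, its metadata state can be inspected without triggering recovery. A minimal sketch, assuming ZooKeeper at localhost:2181 and the 4.14 admin client (error handling omitted):

import org.apache.bookkeeper.client.BookKeeperAdmin;
import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.bookkeeper.client.api.LedgerMetadata;

// Prints a ledger's metadata state (OPEN / IN_RECOVERY / CLOSED) without
// opening it for recovery, so the check itself doesn't change the state.
public class CheckLedgerState {
    public static void main(String[] args) throws Exception {
        long ledgerId = Long.parseLong(args[0]);
        BookKeeperAdmin admin = new BookKeeperAdmin("localhost:2181");
        try {
            LedgerHandle lh = admin.openLedgerNoRecovery(ledgerId);
            LedgerMetadata md = lh.getLedgerMetadata();
            System.out.println("ledger " + ledgerId + " state=" + md.getState()
                    + " lastEntryId=" + md.getLastEntryId());
        } finally {
            admin.close();
        }
    }
}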

Pulsar SkipUnRecoverableLedger doesn't work

  • Ledger 1 (0={bk1, bk2}, 150={bk2, bk3})
  • Shutdown bk2 and bk3
  • Unload the topic
  • Ledger 1 will be marked as CLOSED
  • Loading the topic fails even though we enabled SkipUnRecoverableLedger

BookKeeper auto recovery should support placement policy detection and recovery

Motivation

Let me walk through the steps:

  1. They have two zones, they have a rack aware policy that ensures it writes across two zones
  2. They had some data on a topic with long retention
  3. They ran a disaster recovery test, during this test, they shut down one zone
  4. During the DR test, auto-recovery ran. Because the DR test left only one zone active, and because auto-recovery defaults to best-effort rack awareness, it recovered up to the expected number of replicas
  5. They stopped the DR test and all was well, but now that ledger was only on one zone
  6. They ran another DR test, this time basically moving data to the prod zone, but now data is missing because it is all only on one zone

Solutions

BP-34 introduced a metadata checker to validate whether each ledger fragment satisfies the placement policy. After the check, it only reports the result; it does not take any recovery action when a ledger's placement policy is not satisfied.

Related PR: apache#1902

Modifications

In Auditor

  1. The placement checker is disabled by default; add a parameter in conf/bk_server.conf to turn it on (see the configuration sketch below).
  2. Add a flag to control whether ledgers whose placement policy is unsatisfied are marked as under-replicated.
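
A minimal conf/bk_server.conf sketch for the two knobs above. auditorPeriodicPlacementPolicyCheckInterval is an existing auditor setting; the repair flag name below is an assumption for illustration, since the exact parameter depends on the BookKeeper version:

# Run the placement policy check periodically (interval in seconds; 0 disables it).
auditorPeriodicPlacementPolicyCheckInterval=3600
# Assumed flag name: mark ledgers that violate the placement policy as
# under-replicated so the replication worker repairs them.
repairedPlacementPolicyNotAdheringBookieEnable=true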

Auto Recovery

  1. For the current replicator, after getting the ledger ID to replicate, it runs getUnderreplicatedFragments to find the fragments with missing bookies; after getting a fragment, it triggers the replication operation. A sketch of an additional placement-adherence check follows the link below.

https://github.com/apache/bookkeeper/blob/8eb26dbeb0988d04136d63805eccd9d466d309b9/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java#L371
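
One possible shape for the placement-adherence part, reusing the BP-34 adherence API from EnsemblePlacementPolicy; a hedged sketch (needsPlacementRepair is a hypothetical helper, not existing ReplicationWorker code):

import java.util.List;

import org.apache.bookkeeper.client.EnsemblePlacementPolicy;
import org.apache.bookkeeper.net.BookieId;

// A fragment whose ensemble fails the adherence check would be treated as
// needing repair, in addition to fragments with missing bookies.
final class PlacementRepairCheck {
    static boolean needsPlacementRepair(EnsemblePlacementPolicy policy,
                                        List<BookieId> ensemble,
                                        int writeQuorumSize,
                                        int ackQuorumSize) {
        return policy.isEnsembleAdheringToPlacementPolicy(ensemble,
                writeQuorumSize, ackQuorumSize)
                == EnsemblePlacementPolicy.PlacementPolicyAdherence.FAIL;
    }
}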

Under-replicated ledger can never recover

BUG REPORT

Describe the bug

For example, consider a ledger that no longer exists in the bookie cluster but is still present in ZooKeeper's under-replication list. The replication worker thread keeps trying to replicate its data, but since the data is gone, replication always fails, and the failure messages can keep appearing in the log for months.

Points to confirm:

  • For a ledger in the under-replicated state: if the ledger has expired, can it never leave the under-replicated state?
  • The replication worker needs to handle this case: if the ledger no longer exists, its under-replicated state should be removed

How to deal with ledger disk usage reaching 95%

When ledger disk usage reaches 95%, minor and major compaction stop. This is meant to protect the ledger disk, because garbage collection temporarily occupies extra storage. We have two parameters to control this behavior.

  • isForceGCAllowWhenNoSpace and forceAllowCompaction: With both flags enabled, the bookie won't disable minor and major compaction when ledger disk usage reaches 90% or 95%; instead it will trigger major compaction at those thresholds. However, this carries the risk of driving disk usage to 100% and may introduce other unexpected issues. I prefer to enable these two flags, but we need the following changes (will discuss with @fantapsody @tuteng; see the configuration sketch at the end of this section):
    • Change diskUsageThreshold=0.90, diskUsageWarnThreshold=0.85 and diskUsageLwmThreshold=0.85
  • forceAllowCompaction: If we enable only this flag and leave isForceGCAllowWhenNoSpace disabled, the bookie will still disable minor and major compaction, but we can use the REST API command curl -XPUT http://<bookie-ip>:<port>/api/v1/bookie/gc -d '{"forceMajor": true}' to trigger major or minor compaction while the bookie is in read-only mode. (This feature is only supported since BookKeeper 4.15.0+ and Pulsar 2.11.0+.)

Knowledge

  • Minor Compaction: If an entrylog file's remaining-data ratio is lower than this threshold, the entrylog file will be compacted. The default value is 0.2. For example, if the total entry log file size is 1 GB and 900 MB of data has expired, the remaining 100 MB gives a ratio below 0.2, so the entrylog file will be compacted. Compaction follows these steps:

    • The bookie reads the remaining 100 MB of data from the entrylog file and writes it to a new entrylog file
    • The old entrylog file is deleted.
  • Major Compaction: If an entrylog file's remaining-data ratio is lower than this threshold, the entrylog file will be compacted. The default value is 0.5. The compaction process is the same as for minor compaction. The more data remaining in the entry log file, the more extra disk space will be used during compaction.

  • diskUsageWarnThreshold: When the ledger disk usage reaches this threshold, the bookie will suspend major compaction. The default value is 0.90

  • diskUsageThreshold: When the ledger disk usage reaches this threshold, the bookie will run into read-only mode and suspend minor and major compaction. When the disk usage is lower than this threshold, resume minor compaction. The default value is 0.95

  • diskUsageLwmThreshold: When the ledger disk usage is lower than this threshold, the bookie will recover to read-write mode and resume major and minor compaction. The default value is 0.95
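
Putting the pieces together, a minimal conf/bk_server.conf sketch with the values proposed above (these are the proposals from this issue, not the shipped defaults):

# Keep compaction running even when disk space is low.
isForceGCAllowWhenNoSpace=true
forceAllowCompaction=true
# Thresholds proposed in this issue:
diskUsageWarnThreshold=0.85
diskUsageLwmThreshold=0.85
diskUsageThreshold=0.90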

Auditor add rack/region placement policy check

  • Making a target ledger fulfill the rack/region placement policy depends on auto recovery:
    1. How to trigger ledger recovery
    2. After recovery, how to clean up redundant replicas
    3. How to mark a ledger as under-replicated
  • Auditor support for the rack/region placement policy check

Website doc for direct IO


BK simple test support read and delete written ledgers


Index deletion makes CPU spike to 100%

Hello everyone, after upgrading my Pulsar cluster to 2.8.2 I have noticed that index deletion sometimes takes around 60 seconds, which causes the CPU to spike to 100%. Is this normal, or is there any parameter I need to tune? TIA
[2022-02-28T07:25:42.531Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385184, 3385239, 3385159, 3385142, 3385124, 3385193, 3384879, 3385165, 3385916]
[2022-02-28T07:26:34.089Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 201065 entries from 9 ledgers in 51.557 seconds
[2022-02-28T07:40:42.534Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:191 Deleting indexes for ledgers: [3385379, 3385367, 3385718, 3385365, 3385412, 3385167, 3385357, 3386141]
[2022-02-28T07:41:47.867Z] INFO db-storage-cleanup-10-1 EntryLocationIndex:266 Deleted indexes for 134590 entries from 8 ledgers in 65.332 seconds

[Blog] Explain how bookie lastAddConfirmed works

Motivation

Currently, the bookie's lastAddConfirmed (LAC) logic is a little complicated, and in some cases the lastAddConfirmed result is not a fixed value. When users notice that the value is not fixed, they may be confused about the bookie's protocol. We need a blog that goes deep into the details of how lastAddConfirmed works and why it is reasonable for it not to be a fixed value.
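
A minimal reader-side sketch of the behavior, assuming ZooKeeper at localhost:2181, a ledger still being written by another client, and a known digest/password ("passwd" is an assumption):

import static java.nio.charset.StandardCharsets.UTF_8;

import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

// Two successive LAC reads on a ledger that is still being written may return
// different values: LAC advances as the writer's adds are acknowledged.
public class LacDemo {
    public static void main(String[] args) throws Exception {
        long ledgerId = Long.parseLong(args[0]);
        try (BookKeeper bk = new BookKeeper("localhost:2181")) {
            LedgerHandle reader = bk.openLedgerNoRecovery(ledgerId,
                    BookKeeper.DigestType.CRC32, "passwd".getBytes(UTF_8));
            long lac1 = reader.readLastAddConfirmed();
            Thread.sleep(1000); // the writer keeps appending in the meantime
            long lac2 = reader.readLastAddConfirmed();
            System.out.println("lac1=" + lac1 + ", lac2=" + lac2); // lac2 >= lac1
            reader.close();
        }
    }
}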

YanZhao will take over this blog.

RackAware placement policy cannot ensure new ensemble selection succeeds when one rack goes down.

BUG REPORT

Describe the bug

r1 -> bk1, bk2
r2 -> bk3
r3 -> bk4

enable EnforceMinNumRacksPerWriteQuorum and set minNumRacksPerWriteQuorumConfValue=2

When bk3 or bk4 is quarantined, new ensemble selection should succeed, because the remaining racks still satisfy the minimum of 2 racks.
However, new ensemble selection sometimes fails.

Running the following unit test in TestRackawareEnsemblePlacementPolicy.java reproduces the bug.

final int minNumRacksPerWriteQuorumConfValue = 2;
conf.setMinNumRacksPerWriteQuorum(minNumRacksPerWriteQuorumConfValue);
conf.setEnforceMinNumRacksPerWriteQuorum(true);

    @Test
    public void testNewEnsemblePolicyWithMultipleRacksV2() throws Exception {
        BookieSocketAddress addr1 = new BookieSocketAddress("127.0.0.1", 3181);
        BookieSocketAddress addr2 = new BookieSocketAddress("127.0.0.2", 3181);
        BookieSocketAddress addr3 = new BookieSocketAddress("127.0.0.3", 3181);
        BookieSocketAddress addr4 = new BookieSocketAddress("127.0.0.4", 3181);
        BookieSocketAddress addr5 = new BookieSocketAddress("127.0.0.5", 3181);
        // update dns mapping
        StaticDNSResolver.addNodeToRack(addr1.getHostName(), "/default-region/r1");
        StaticDNSResolver.addNodeToRack(addr2.getHostName(), "/default-region/r1");
        StaticDNSResolver.addNodeToRack(addr3.getHostName(), "/default-region/r2");
        StaticDNSResolver.addNodeToRack(addr4.getHostName(), "/default-region/r3");
        //StaticDNSResolver.addNodeToRack(addr5.getHostName(), "/default-region/r1");
        // Update cluster
        Set<BookieId> addrs = new HashSet<BookieId>();
        addrs.add(addr1.toBookieId());
        addrs.add(addr2.toBookieId());
        addrs.add(addr3.toBookieId());
        //addrs.add(addr5.toBookieId());
        addrs.add(addr4.toBookieId());
        repp.onClusterChanged(addrs, new HashSet<BookieId>());

        try {
            int ensembleSize = 3;
            int writeQuorumSize = 3;
            int ackQuorumSize = 2;

            Set<BookieId> excludeBookies = new HashSet<>();
            excludeBookies.add(addr4.toBookieId());
            //excludeBookies.add(addr3.toBookieId());

            for (int i = 0; i < 50; ++i) {
                EnsemblePlacementPolicy.PlacementResult<List<BookieId>> ensembleResponse =
                    repp.newEnsemble(ensembleSize, writeQuorumSize,
                        ackQuorumSize, null, excludeBookies);
                List<BookieId> ensemble = ensembleResponse.getResult();

                ensemble.forEach(t -> {
                    LOG.info("[hangc] {}", t);
                });
                LOG.info("==========");
            }
        } catch (Exception e) {
            LOG.error("failed ", e);
        }
    }

Need to cancel under-replicated ledgers when a lost bookie comes back

Motivation

When one bookie is lost, the auditor marks the ledgers that belong to the lost bookie as under-replicated and waits for the replication worker to replicate them.

However, if the lost bookie comes back soon, is there any way to cancel those under-replicated ledgers?

I guess it will be filtered by https://github.com/apache/bookkeeper/blob/8eb26dbeb0988d04136d63805eccd9d466d309b9/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java#L371

BK AutoRecovery failed

Hello everyone, I am getting the exception below in auto-recovery and have 2570 ledgers to recover.
I also have autoSkipNonRecoverableData set to true. Can anyone please help?

07:30:00.326 [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadEntryProcessorV3 - IOException while reading entry: 0 from ledger 1402504
java.io.IOException: org.apache.bookkeeper.bookie.EntryLogger$EntryLookupException$MissingLogFileException: Missing entryLog 78 for ledgerId 1402504, entry 0 at offset 693965588
	at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:836) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
	at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:860) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
	at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.getEntry(SingleDirectoryDbLedgerStorage.java:452) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]

Need to confirm

  • Whether the bookie flushes the data in the memtable on shutdown
  • After a bookie restart, compaction is very fast, but when triggered normally, compaction deletes data very slowly
  • The BK REST server port is the same as the Prometheus port; if they are configured differently, it will override the Prometheus port
  • bin/bookkeeper shell should support triggering compaction

Failed to create new Ensemble

Steps to reproduce

  1. Set up a Pulsar cluster with 1 Zookeeper, 2 Bookies (bookie0 -> 127.0.0.1:3181, bookie1 -> 127.0.0.1:3182) and 1 Broker
  2. Configure the bookies with RackAwarePlacementPolicy and don't configure any rack info
  3. Set Broker's E-W-A to 2-2-2
  4. Use the pulsarctl-bookie-rackinfo command to set rack info for bookie0: pulsarctl-bookie_rackinfo -b 127.0.0.1:3181 -z 127.0.0.1:2281 -r /test-region/test-rack (Note: for RackAwarePlacementPolicy, the rack name /test-region/test-rack is not allowed by bin/pulsar-admin but is allowed by the pulsarctl-bookie-rackinfo command, so we use it to set the rack info)

After executing the above command, the broker throws the following exception, but new ledgers can still be created.

2023-07-13T18:33:03,327+0800 [main-EventThread] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T18:33:03,406+0800 [main-EventThread] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T18:33:03,438+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 1
Expected number of leaves:1
/default-rack/127.0.0.1:3182

2023-07-13T18:33:03,441+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
  5. Restart bookie1. The broker throws the following exception, and new ledgers can't be created:
2023-07-13T18:34:48,787+0800 [pulsar-registration-client-33-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3182
2023-07-13T18:34:56,751+0800 [main-EventThread] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 127.0.0.1:3182 -> BookieServiceInfo{properties={}, endpoints=[
EndpointInfo{id=bookie, port=3182, host=127.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-07-13T18:34:56,764+0800 [main-EventThread] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 127.0.0.1:3182 -> BookieServiceInfo{properties={}, endpoints=[
EndpointInfo{id=bookie, port=3182, host=127.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-33-1] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3182> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T18:34:56,765+0800 [pulsar-registration-client-16-1] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3182> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T18:34:56,765+0800 [pulsar-registration-client-33-1] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Unexpected exception while handling joining bookie 127.0.0.1:3182
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:719) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:249) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:665) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:233) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient.lambda$updatedBookies$6(PulsarRegistrationClient.java:183) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.87.Final.jar:4.1.87.Final]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-16-1] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Unexpected exception while handling joining bookie 127.0.0.1:3182
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:719) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:249) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:665) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:233) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient.lambda$updatedBookies$6(PulsarRegistrationClient.java:183) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]

Ledger creation failed with the following logs:

2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/tt/persistent/t2-partition-0] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 116, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 50, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 48], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
  6. Use the following command to set bookie1's rack info with the same format as bookie0: pulsarctl-bookie_rackinfo -b 127.0.0.1:3182 -z 127.0.0.1:2281 -r /test-region/test-rack. The broker throws the following exception, and new ledgers still can't be created.
2023-07-13T19:12:10,051+0800 [main-EventThread] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-r
ack, hostname=null), 127.0.0.1:3182=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T19:12:10,061+0800 [main-EventThread] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T19:12:10,061+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T19:12:10,062+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
2023-07-13T19:12:10,075+0800 [main-EventThread] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-r
ack, hostname=null), 127.0.0.1:3182=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T19:12:10,078+0800 [main-EventThread] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T19:12:10,078+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T19:12:10,078+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
  7. Restart the broker; the issue is resolved and new ledgers can be created.

Broker logs
pulsar-broker-MacBook-Pro-3.lan.log

Add website doc for BookieStateReadOnlyService


Replace Thread.sleep with Await

Motivation

There are a lot of tests using Thread.sleep(), which can lead to flaky tests. Using await to poll for the expected condition makes them more robust.
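
A minimal sketch with Awaitility (the read-only flag below is a stand-in for real bookie state, used only to make the example self-contained):

import static org.awaitility.Awaitility.await;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

// Before: Thread.sleep(5000) guesses how long a state change takes and is
// flaky under load. After: poll the condition until it holds, or time out.
public class AwaitExample {
    public static void main(String[] args) {
        AtomicBoolean readOnly = new AtomicBoolean(false); // stand-in for bookie state
        new Thread(() -> {
            try {
                Thread.sleep(500); // the state change lands at some unknown point
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            readOnly.set(true);
        }).start();

        // Replaces: Thread.sleep(5000); assertTrue(readOnly.get());
        await().atMost(Duration.ofSeconds(10)).until(readOnly::get);
        System.out.println("condition reached without a fixed sleep");
    }
}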
