Comments (3)
Yes, this is another area where we need to add more documentation.
We use the BookKeeper rack-aware placement policy. A "rack" can be anything: a physical rack, a region, or an availability zone. The basic idea is that, when forming the ensemble for a new ledger, the policy picks bookies from different racks to reduce the chance of data unavailability.
If bookies from different racks are not available, the policy falls back to choosing randomly across the available bookies.
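To make the idea concrete, here is a minimal sketch of "prefer distinct racks, otherwise fall back to random". This is not BookKeeper's actual algorithm; the function name `choose_ensemble` and the input shape (a dict of bookie address to rack name) are assumptions made for illustration only.

```python
import random


def choose_ensemble(bookie_racks, ensemble_size, rng=random):
    """Pick `ensemble_size` bookies, preferring one bookie per rack.

    bookie_racks: dict mapping bookie address -> rack name
                  (hypothetical input shape, not a BookKeeper API).
    """
    if ensemble_size > len(bookie_racks):
        raise ValueError("not enough bookies available")

    # Group the available bookies by rack.
    by_rack = {}
    for bookie, rack in bookie_racks.items():
        by_rack.setdefault(rack, []).append(bookie)

    # First pass: take at most one bookie from each rack, in random rack order.
    racks = list(by_rack)
    rng.shuffle(racks)
    ensemble = []
    for rack in racks:
        if len(ensemble) == ensemble_size:
            break
        ensemble.append(rng.choice(by_rack[rack]))

    # Fallback: not enough distinct racks, so fill the rest randomly.
    remaining = [b for b in bookie_racks if b not in ensemble]
    rng.shuffle(remaining)
    ensemble.extend(remaining[: ensemble_size - len(ensemble)])
    return ensemble
```

With three or more racks and an ensemble size of 3, every member of the result comes from a different rack. With a single rack, the result is simply a random subset.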
The ZkBookieRackAffinityMapping is a way to feed the rack information into the rack-aware policy. The JSON z-node content you've attached looks correct. The first level is the group name, which can be used to provide bookie isolation (strictly picking from a subset of bookies).
You only need to update the /bookies z-node when you add new bookies. There is no need to update it when processes go up or down; that is controlled by the list in /ledgers/available.
If the /bookies z-node is not there, the BK client will choose randomly. The same thing happens if all bookies are in the same "rack" (whatever that means).
So, in AWS, you'd probably want to create instances in different AZs and update the /bookies z-node to reflect that.
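As a rough sketch, the z-node content for such a setup could be generated like this. The layout (group name, then bookie address, then `rack` and `hostname`) follows the JSON quoted in this thread; the helper name `build_rack_mapping` is hypothetical, and writing the payload into ZooKeeper is deliberately left out.

```python
import json


def build_rack_mapping(group, bookies):
    """Build the mapping dict for one group.

    bookies: iterable of (address, rack, hostname) tuples, where the
             rack is the AZ name. (Hypothetical helper for illustration.)
    """
    return {
        group: {
            addr: {"rack": rack, "hostname": host}
            for addr, rack, host in bookies
        }
    }


# Sample entries taken from the z-node content posted in this thread.
mapping = build_rack_mapping(
    "us-east-1",
    [
        ("10.64.103.105:3181", "us-east-1d", "ip-10-64-103-105.ec2.internal"),
        ("10.64.102.228:3181", "us-east-1c", "ip-10-64-102-228.ec2.internal"),
    ],
)

# Compact JSON, ready to be stored as the z-node content.
payload = json.dumps(mapping, separators=(",", ":"))
print(payload)
```

Generating the JSON from a list of bookies avoids hand-editing mistakes each time a bookie is added.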
My main question here is: are we doing it right? 😛
Yes, just make sure you have bookies from different "racks" in the configuration.
And how can we check if it's working?
You should be able to verify that each ledger's ensemble contains bookies from different AZs:
bin/bookkeeper shell ledgermetadata -ledgerid $MY_LEDGER_ID
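To avoid comparing addresses by eye, a small script can look up the rack of each ensemble member. This is only a sketch: it assumes you copy the ensemble addresses out of the command output by hand (the output format is not parsed here), and the function names are hypothetical.

```python
import json


def ensemble_racks(mapping_json, ensemble):
    """Return {bookie address: rack}, with None for bookies not in the mapping.

    mapping_json: the /bookies z-node content as a JSON string.
    ensemble: list of bookie addresses copied from the ledger metadata.
    """
    mapping = json.loads(mapping_json)
    rack_of = {}
    # The first level is the group name; the second is the bookie address.
    for group in mapping.values():
        for addr, info in group.items():
            rack_of[addr] = info["rack"]
    return {addr: rack_of.get(addr) for addr in ensemble}


def spans_distinct_racks(mapping_json, ensemble):
    """True if every ensemble member is mapped and no two share a rack."""
    racks = list(ensemble_racks(mapping_json, ensemble).values())
    return None not in racks and len(set(racks)) == len(racks)
```

Running this over several ledgers, rather than a single one, gives a better indication of whether the placement is rack-aware or just lucky.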
Should we tell Pulsar to use this class?
It's used automatically when bookkeeperClientRackawarePolicyEnabled=true is set in conf/broker.conf.
from pulsar.
@merlimat We're getting lots of these messages:
2017-01-30 19:44:40,500 - WARN - [pulsar-io-39-5:RackawareEnsemblePlacementPolicy@526] - Failed to choose a bookie from /default-rack : excluded [], fallback to choose bookie randomly from the cluster.
Is this normal? It looks like our topology is not being used. Contents of our znode /bookie:
{"us-east-1":{"10.64.103.105:3181":{"rack":"us-east-1d","hostname":"ip-10-64-103-105.ec2.internal"},"10.64.102.228:3181":{"rack":"us-east-1c","hostname":"ip-10-64-102-228.ec2.internal"},"10.64.102.146:3181":{"rack":"us-east-1c","hostname":"ip-10-64-102-146.ec2.internal"},"10.64.103.213:3181":{"rack":"us-east-1e","hostname":"ip-10-64-103-213.ec2.internal"},"10.64.102.121:3181":{"rack":"us-east-1a","hostname":"ip-10-64-102-121.ec2.internal"},"10.64.102.43:3181":{"rack":"us-east-1a","hostname":"ip-10-64-102-43.ec2.internal"}}}
Looking at a random ledger metadata, we can see the ensemble members are all from different AZs, but we don't know if it's just a coincidence or if the affinity is really working and that WARN is expected.