Comments (3)
You shouldn't need to modify the entrypoint, as most of these configuration options already accept environment variables.
As an example, you can set the number of workers per process using the environment variable BYTEWAX_WORKERS_PER_PROCESS.
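To illustrate the general pattern of a command-line option that falls back to an environment variable, here is a minimal sketch. It is not Bytewax's actual implementation: the `parse_workers` helper, the long flag name, and the default of 1 are assumptions made for the example.

```python
import argparse
import os


def parse_workers(argv, environ=None):
    """Parse a worker-count flag, falling back to an environment variable.

    Hypothetical helper for illustration only; not part of Bytewax.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-w",
        "--workers-per-process",  # assumed long name for this sketch
        type=int,
        # Assumption: default to 1 when neither the flag nor the variable is set.
        default=int(environ.get("BYTEWAX_WORKERS_PER_PROCESS", "1")),
    )
    return parser.parse_args(argv).workers_per_process


# An explicit flag wins; otherwise the environment variable is used.
from_env = parse_workers([], {"BYTEWAX_WORKERS_PER_PROCESS": "10"})
from_flag = parse_workers(["-w4"], {"BYTEWAX_WORKERS_PER_PROCESS": "10"})
```

With a pattern like this, setting the variable in the container environment is enough, and the entrypoint script does not need to pass the flag.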
from bytewax.
Hi rafael.
The connector does not let Kafka handle the offset with the group.id mechanism by default because we keep track of it for each partition internally. Bytewax does that so that, in the event of a restart, the dataflow can resume from the latest known checkpoint and all messages are processed at least once.
If you don't care about recovery, you can set group.id and enable auto-commit (enable.auto.commit). But that means that when your dataflow restarts, you lose all the messages that were received but not processed, losing the at-least-once semantics the operator normally offers.
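As a rough sketch of what that opt-in could look like, the snippet below builds the extra consumer properties as a plain dictionary. `group.id` and `enable.auto.commit` are standard librdkafka property names; the helper name, the group name, and the way the dictionary is handed to the connector (shown only in a comment) are assumptions, so check the connector's documentation for your Bytewax version.

```python
def kafka_consumer_overrides(group_id):
    """Build extra consumer properties that hand offset tracking to Kafka.

    Hypothetical helper for illustration; values are strings, as librdkafka
    configuration properties are commonly passed.
    """
    return {
        "group.id": group_id,
        # Let the Kafka client commit offsets automatically. This trades away
        # the at-least-once guarantee provided by Bytewax's own recovery.
        "enable.auto.commit": "true",
    }


add_config = kafka_consumer_overrides("my-consumer-group")  # hypothetical name

# Assumption: the connector accepts extra consumer properties, for example
#   KafkaSource(brokers, topics, add_config=add_config)
```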
The nice thing is that you don't need to use group.id to achieve the parallelization level you want. The connector already does that: it creates a Bytewax input partition for each partition in each topic, so if you have a single topic with 10 partitions, you can run the dataflow with 10 workers (or 10 processes), and each one will handle a single Kafka partition.
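The counting argument can be sketched in a few lines. The round-robin assignment below is only an illustration of how input partitions spread over workers; Bytewax's actual assignment strategy may differ, and `assign_partitions` and the topic name `events` are hypothetical.

```python
def assign_partitions(topic_partitions, worker_count):
    """Spread (topic, partition) pairs over workers round-robin.

    Illustrative only; not Bytewax's real assignment logic.
    """
    assignment = {worker: [] for worker in range(worker_count)}
    for i, part in enumerate(sorted(topic_partitions)):
        assignment[i % worker_count].append(part)
    return assignment


# One topic with 10 partitions, as in the example above.
partitions = [("events", p) for p in range(10)]

# 10 workers: each worker handles exactly one Kafka partition.
one_each = assign_partitions(partitions, 10)

# Fewer workers than partitions: some workers handle more than one.
shared = assign_partitions(partitions, 4)
```

Running more workers than partitions in this sketch would leave the extra workers with empty lists, which is why the worker count is usually matched to the partition count.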
The warning is related to the fact that we pass the same config used in the consumer to an AdminClient, which we use when bootstrapping the dataflow to retrieve the list of partitions for each topic (see here). That client is seen as a "producer", so you get the warning, but you can safely ignore it.
Thanks for the response, @Psykopear.
If I understand correctly, I will have to modify the entrypoint.sh used by the image bytewax/bytewax:0.18.1-python3.11:
#!/bin/sh
cd $BYTEWAX_WORKDIR
. /venv/bin/activate
python -m bytewax.run -w$NUM_WORKERS $BYTEWAX_PYTHON_FILE_PATH
echo 'Process ended.'
if [ "$BYTEWAX_KEEP_CONTAINER_ALIVE" = true ]
then
echo 'Keeping container alive...';
while :; do sleep 1; done
fi
NUM_WORKERS is the known number of partitions of a given topic, so in a given Dockerfile I will just need to:
FROM bytewax/bytewax:0.18.1-python3.11
ENV NUM_WORKERS=10
COPY entrypoint.sh /bytewax/entrypoint.sh