Comments (4)
In short, if implementing the batch Iceberg source takes much time due to its complexity, a Parquet file source with decent performance is good enough to help move the POC forward. The user will consider switching to RW only if RW's Iceberg batch source is fast enough.
Must the file format be Parquet? Is it possible to use a CSV file that has been supported in our file source? If it is ok to test a CSV file first, we can support file source batch read first, to test the performance. BTW, I tested insert select from a RisingWave table to another RisingWave table last week. Can we just compare the streaming load from Kafka to a table with insert select from a table to another table?
from risingwave.
Just to provide the complete context, the details of the POC user request can be found here: https://www.notion.so/risingwave-labs/optimize-parquet-source-for-batch-load-dc498a043d504621bf56461690b14bd7?d=84ebdf5d7469412680278059c5898be8
In short, if implementing the batch Iceberg source takes much time due to its complexity, a Parquet file source with decent performance is good enough to help move the POC forward. The user will consider switching to RW only if RW's Iceberg batch source is fast enough.
Is it possible to use a CSV file that has been supported in our file source? If it is ok to test a CSV file first,
Good point. I think it is OK, since CSV is a less efficient format than Parquet in terms of read and write performance.
If we achieve decent enough performance with CSV files, we can be even faster with Parquet. I guess that is a very convincing argument for the POC user.
BTW, I tested insert select from a RisingWave table to another RisingWave table last week. Can we just compare the streaming load from Kafka to a table with insert select from a table to another table?
I can try to communicate this first. The closer to the user's real use case, the better, but there is definitely nothing wrong with using what we have at the moment. Could you post the link to last week's results? 🙏
I am also thinking of adding this to the daily performance tests.