Comments (5)
OK, thanks! I'm swamped right now as well; once things settle down, if I add any features I'll submit them then.
from flink-connector-clickhouse.
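Hi, I downloaded the code to test it and found that when the scan.partition.* options are not set, the Source reads N copies of the data when running with parallelism N. How can this be fixed? Thanks!
Hi, could you share your exact configuration and some details about the table?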
Hi!
====================Flink code====================
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment()
        .setRuntimeMode(RuntimeExecutionMode.BATCH);
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
tableEnv.executeSql(
        "CREATE TABLE event_jian_ying_2 (\n" +
        "  `arg` STRING,\n" +
        "  `user_id` DECIMAL(11,0),\n" +
        "  `event_time` TIMESTAMP\n" +
        ") WITH (\n" +
        "  'connector' = 'clickhouse',\n" +
        "  'username' = 'default',\n" +
        "  'password' = '...',\n" +
        "  'url' = '...',\n" +
        "  'database-name' = '...',\n" +
        "  'table-name' = 'event_jian_ying_2',\n" +
        "  'sink.batch-size' = '500',\n" +
        "  'sink.flush-interval' = '1000',\n" +
        "  'sink.max-retries' = '3'\n" +
        ")");
Table table = tableEnv.sqlQuery("select * from event_jian_ying_2");
table.execute().print();
====================Table schema====================
CREATE TABLE dxp.event_jian_ying_2
(
`user_id` UInt64,
`event_time` DateTime,
`arg` String
)
ENGINE = MergeTree
ORDER BY event_time
SETTINGS index_granularity = 8192
Looking at the createInputSplits method in ClickHouseBatchInputFormat:
@Override
public InputSplit[] createInputSplits(int minNumSplits) {
int splitNum = parameterValues != null ? parameterValues.length : minNumSplits;
return createGenericInputSplits(splitNum);
}
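I haven't read through all of the code yet and I'm not sure what parameterValues contains. But minNumSplits is the job's parallelism, so if parameterValues is null and my query is not partitioned, every split executes the same query, and the data ends up being read N times. In this case, shouldn't splitNum be set to 1?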
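The change proposed above can be sketched as follows. This is a minimal, hypothetical stand-in for the split-count logic, not the connector's actual code: the class and method names are made up for illustration, and it assumes, as described in the question, that minNumSplits is the job parallelism and that parameterValues is null when scan.partition.* is not configured.

```java
// Hypothetical sketch of the proposed split-count rule; the class and method
// names are illustrative and do not come from flink-connector-clickhouse.
public class SplitCountSketch {

    /**
     * @param parameterValues one [lower, upper] row per split, or null when
     *                        scan.partition.* is not configured
     * @param minNumSplits    the value Flink passes in (the parallelism);
     *                        deliberately ignored for unpartitioned scans
     */
    public static int splitCount(Object[][] parameterValues, int minNumSplits) {
        if (parameterValues != null) {
            // Partitioned scan: one split per BETWEEN range.
            return parameterValues.length;
        }
        // Unpartitioned scan: each split would run the identical query,
        // so use a single split instead of minNumSplits copies.
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(splitCount(null, 4)); // 1
        System.out.println(splitCount(new Object[][] {{0L, 9L}, {10L, 19L}}, 4)); // 2
    }
}
```

The thread does not say whether the eventual fix took exactly this approach.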
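In addition, here are a few personal suggestions:
1. Could scan.partition.column support numeric and date columns?
2. If scan.partition.num is not set, default it to the parallelism.
3. If scan.partition.lower-bound and scan.partition.upper-bound are not provided, use scan.partition.column to query the minimum and maximum values, then split that range.
Thanks!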
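Suggestions 2 and 3 could work roughly like this. A minimal sketch, assuming an integer partition column and that min and max have already been fetched (for example with SELECT min(user_id), max(user_id) FROM event_jian_ying_2); the class PartitionBoundsSketch and its between method are hypothetical, not existing connector options or APIs. The range is split into contiguous [lower, upper] pairs shaped like the rows of parameterValues.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical helper for suggestions 2 and 3; names and defaults are
// assumptions, not part of flink-connector-clickhouse.
public class PartitionBoundsSketch {

    /**
     * Splits the inclusive range [min, max] into contiguous [lower, upper]
     * pairs, one per BETWEEN ? AND ? query. If numPartitions is null, the
     * job parallelism is used instead (suggestion 2).
     */
    public static Serializable[][] between(long min, long max, Integer numPartitions, int parallelism) {
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        int requested = numPartitions != null ? numPartitions : parallelism;
        // Number of distinct values; throws ArithmeticException on overflow.
        long span = Math.addExact(Math.subtractExact(max, min), 1L);
        // Never create more partitions than distinct values, so none is empty.
        int partitions = (int) Math.min(Math.max(requested, 1), span);
        long base = span / partitions;
        long remainder = span % partitions;

        Serializable[][] bounds = new Serializable[partitions][];
        long lower = min;
        for (int i = 0; i < partitions; i++) {
            // The first `remainder` partitions take one extra value each.
            long size = base + (i < remainder ? 1 : 0);
            long upper = lower + size - 1;
            bounds[i] = new Serializable[] {lower, upper};
            lower = upper + 1;
        }
        return bounds;
    }

    public static void main(String[] args) {
        // min/max as they might come back from a min()/max() query (suggestion 3).
        for (Serializable[] b : between(1L, 10L, null, 3)) {
            System.out.println(Arrays.toString(b));
        }
    }
}
```

With min 1, max 10 and parallelism 3 this yields [1, 4], [5, 7] and [8, 10]; spreading the remainder over the first partitions keeps their sizes within one value of each other.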
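@clisho,
- parameterValues holds the two parameters required by the "between ? and ?" clause;
- scan.partition.column currently supports only integer columns, and there are no plans yet to support other types;
- Without scan.partition.num, the source cannot run with parallelism greater than 1, because there is no way to know how the data should be split for parallel reads. Distributed tables are the exception: their underlying local tables can be read in parallel;
- Giving lower-bound and upper-bound default values would be a nice feature, but I have no spare time to work on it in the near future; contributions implementing it are welcome;
- Reading the same data multiple times is a bug, and I plan to fix it tonight.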
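To illustrate the first point of this reply, here is a hedged sketch of how one row of parameterValues could fill the two placeholders of a "between ? and ?" filter through plain JDBC. The query template, the choice of user_id as the column, and the class name are assumptions for illustration; the connector builds its own SQL. A java.lang.reflect.Proxy stands in for a real PreparedStatement so the sketch runs without a ClickHouse server.

```java
import java.io.Serializable;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch: binds one split's [lower, upper] pair to a BETWEEN filter.
public class SplitBindingSketch {

    // Assumed query template for illustration; not taken from the connector.
    static final String QUERY =
            "SELECT * FROM event_jian_ying_2 WHERE user_id BETWEEN ? AND ?";

    /** Binds parameterValues[splitIndex] to the placeholders of QUERY. */
    public static void bindSplit(PreparedStatement stmt, Serializable[][] parameterValues,
                                 int splitIndex) throws SQLException {
        Serializable[] params = parameterValues[splitIndex];
        for (int i = 0; i < params.length; i++) {
            // JDBC parameter indexes start at 1, so params[0] fills the first '?'.
            stmt.setObject(i + 1, params[i]);
        }
    }

    public static void main(String[] args) throws SQLException {
        Serializable[][] parameterValues = {{1L, 4L}, {5L, 7L}, {8L, 10L}};

        // Stand-in PreparedStatement that only prints setObject calls,
        // so the example runs without a database connection.
        PreparedStatement printing = (PreparedStatement) Proxy.newProxyInstance(
                SplitBindingSketch.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("setObject")) {
                        System.out.println("setObject(" + methodArgs[0] + ", " + methodArgs[1] + ")");
                    }
                    return null;
                });

        System.out.println(QUERY);
        // The subtask handling split 1 would read rows with 5 <= user_id <= 7.
        bindSplit(printing, parameterValues, 1);
    }
}
```

Because each split binds a different range, the splits read disjoint rows; when parameterValues is null there is nothing to bind, which is why an unpartitioned query should run in a single split.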
Related Issues (20)
- Error when configuring properties.* options
- Contribute Clickhouse Connector To Apache Flink
- Support Flink 1.17
- Support ClickHouse JDBC 0.6.0
- properties.*
- Question about usage
- I also get java.lang.ClassNotFoundException: org.apache.flink.connector.clickhouse.internal.ClickHouseBatchInputFormat when querying ClickHouse via SQL
- ClassNotFound after putting the jar into the Flink lib directory
- flink cdc to clickhouse
- Which jars need to go in the flink/lib directory to run it?
- Does the latest 1.71 require JDK 11? I get errors with JDK 8
- How do I use the sink in flink-connector-clickhouse? Are there any examples?
- java.lang.ClassCastException: class java.lang.Boolean cannot be cast to class java.lang.Number
- I cannot use OPTION 'properties.*'.
- Execution default of goal io.github.zentol.japicmp:japicmp-maven-plugin:0.16.0_m325:cmp failed: version can neither be null, empty nor blank
- Add Apache Listen to File hard
- sink exception e.flink.connector.clickhouse.internal.ClickHouseBatchOutputFormat
- how to pack a fat jar
- Has anyone built it successfully? Could you share it?
- Error when using Flink CDC with Flink 1.19, probably a serialization issue: Caused by: java.lang.ClassNotFoundException: org.apache.flink.connector.clickhouse.internal.ClickHouseBatchOutputFormat