
ClickHouse source pulls all of the table data when the job starts; the TM will crash if the table is too big · about flink-connector-clickhouse (closed)

LinhaiXu commented on July 19, 2024

from flink-connector-clickhouse.

Comments (3)

itinycheng commented on July 19, 2024

Sorry, that's not supported for now; you can read the data by partition or limit the number of rows read.
In batch mode, it seems all the data is read at once; is there any way to read the data in multiple batches within one source instance?


LinhaiXu commented on July 19, 2024

Sorry, that's not supported for now; you can read the data by partition or limit the number of rows read. In batch mode, it seems all the data is read at once; is there any way to read the data in multiple batches within one source instance?

Thank you for your reply.
But my job is in streaming mode, and the source DDL looks like this:

tableEnv.executeSql(
      """
        |CREATE TABLE stock_tbl (
        | `id` string,
        | `stock_id` string,
        | `display_type` int,
        | `display_name` string,
        | `data_type` int,
        | `compare_type` int,
        | `value` double,
        | `data_source_id` string,
        | `date_range_begin` string,
        | `date_range_end` string
        |) WITH (
        | 'connector' = 'clickhouse',
        | 'url' = 'clickhouse://xxx:8123',
        | 'username' = 'xxx',
        | 'password' = 'xxx',
        | 'database-name' = 'xxx',
        | 'table-name' = 'xxx',
        | 'scan.partition.column' = 'xxx',
        | 'scan.partition.num' = '4',
        | 'scan.partition.lower-bound' = '1',
        | 'scan.partition.upper-bound' = '4'
        |)
        |""".stripMargin)

The ClickHouse table has 130 million records; how can I limit the number of rows read? I didn't see a relevant option in your examples.


itinycheng commented on July 19, 2024
  1. I meant select * from table limit n, but that doesn't meet your needs.
  2. The scan.partition.* options help read data in parallel; each subtask reads a part of the whole table.
  3. In streaming mode, records are consumed one by one and sent downstream, so it shouldn't cause an out-of-memory error. I ran a test in streaming mode that reads from ClickHouse and writes to MySQL:
    [screenshot of the streaming test attached in the original comment]
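For illustration, point 2 above can be sketched as range splitting over the partition column: the bounded range [lower-bound, upper-bound] is divided into partition.num slices, and each subtask reads one slice. This is a hypothetical mock, not the connector's actual code; the helper name and the BETWEEN-predicate form are assumptions that mirror JDBC-style partitioned scans.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // Hypothetical sketch: split [lower, upper] into `num` range predicates,
    // one per subtask, in the spirit of the scan.partition.* options.
    static List<String> partitionPredicates(String column, long lower, long upper, int num) {
        long stride = Math.max(1L, (upper - lower + 1) / num);
        List<String> predicates = new ArrayList<>();
        for (int i = 0; i < num; i++) {
            long start = lower + i * stride;
            // The last partition absorbs any remainder of the range.
            long end = (i == num - 1) ? upper : start + stride - 1;
            predicates.add(column + " BETWEEN " + start + " AND " + end);
        }
        return predicates;
    }

    public static void main(String[] args) {
        // With the DDL above: lower-bound = 1, upper-bound = 4, partition.num = 4,
        // so each of the 4 subtasks would scan one value of the key range.
        partitionPredicates("scan_col", 1, 4, 4).forEach(System.out::println);
    }
}
```

With the DDL's settings, each subtask would read roughly a quarter of the table, assuming the partition column's values are evenly distributed over the bounds.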

About your problem:
You can set pipeline.operator-chaining = false to disable operator chaining; this can help you locate which operator causes the problem.
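For reference, the chaining switch mentioned above is a standard Flink option (pipeline.operator-chaining); in the SQL client it can be set like this so that each operator shows up separately in the web UI:

```sql
-- Disable operator chaining to see each operator as its own task,
-- which makes it easier to spot where the failure occurs.
SET 'pipeline.operator-chaining' = 'false';
```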

from flink-connector-clickhouse.
