Coder Social home page Coder Social logo

Comments (13)

st1page avatar st1page commented on May 27, 2024 1

Can this just be expressed with

create materialized view xxx as generate_series(
    '2020-01-01 00:00:00'::TIMESTAMP, -- start 
    NOW(), -- end
    interval '1' minute  -- step
);

Internally, NowExecutor will be used

The issue is that it will generate too many data in the table and also it is a big generate_series batch query.

from risingwave.

st1page avatar st1page commented on May 27, 2024 1
```sql
create materialized view xxx as generate_series(
    '2020-01-01 00:00:00'::TIMESTAMP, -- start 
    NOW(), -- end
    interval '1' minute  -- step
);

Oh, create mv with a generate_series is even worse

create materialized view xxx as generate_series(
    '2020-01-01 00:00:00'::TIMESTAMP, -- start 
    NOW(), -- end
    interval '1' minute  -- step
);

WIth this query, every time the now()'s value changes, RW will retract all the result of the old TVF generate_series and send the new value.

from risingwave.

st1page avatar st1page commented on May 27, 2024 1

Ohh so sorry I misunderstand it ... LGTM to use the syntax.

generate_series(
    '2020-01-01 00:00:00'::TIMESTAMP, -- start 
    NOW(), -- end
    interval '1' minute  -- step
);

from risingwave.

fuyufjh avatar fuyufjh commented on May 27, 2024 1

By the way, share the time_bucket_gapfill from TimescaleDB.

Compared with Join, it has 2 advantages:

  1. It allows specifying how to interpolate values.
  2. Streaming Join is very expensive

Regarding of the user interface, I slightly prefer this than JOIN, also because it's more straigh-forward

from risingwave.

fuyufjh avatar fuyufjh commented on May 27, 2024 1

Oh, I just realized the discussion around gap filling and interpolation has been a little bit off-topic. Let's focus on the original requirement for this issue. That is, supporting

create materialized view xxx as generate_series(
    '2020-01-01 00:00:00'::TIMESTAMP, -- start 
    NOW(), -- end
    interval '1' minute  -- step
);

with some efficient incremental way.

from risingwave.

fuyufjh avatar fuyufjh commented on May 27, 2024

Can this just be expressed with

create materialized view xxx as generate_series(
    '2020-01-01 00:00:00'::TIMESTAMP, -- start 
    NOW(), -- end
    interval '1' minute  -- step
);

Internally, NowExecutor will be used

from risingwave.

fuyufjh avatar fuyufjh commented on May 27, 2024

also it is a big generate_series batch query.

Didn't get this. Can you explain it? I don't see any batch query here.

By the way, regarding the original query in PR description

CREATE VIEW timer_per_minutes AS SELECT * FROM per_minutes where minute > now();

Do you mean where minute < now()?

from risingwave.

st1page avatar st1page commented on May 27, 2024

CREATE VIEW timer_per_minutes AS SELECT * FROM per_minutes where minute > now();

Do you mean where minute < now()?

Yes, I will change it.

from risingwave.

fuyufjh avatar fuyufjh commented on May 27, 2024

Oh, create mv with a generate_series is even worse

create materialized view xxx as generate_series(
    '2020-01-01 00:00:00'::TIMESTAMP, -- start 
    NOW(), -- end
    interval '1' minute  -- step
);

WIth this query, every time the now()'s value changes, RW will retract all the result of the old TVF generate_series and send the new value.

Yeah, by the current implementation, it is. I am just proposing to use the syntax. We should make it incremental internally.

from risingwave.

st1page avatar st1page commented on May 27, 2024

Streaming Join is very expensive

So, Even a streaming time_bucket_gapfill is better because it exploits the property that the time columns are basically ordered and that the gaps are not very large? 🤔 It is interesting.

from risingwave.

fuyufjh avatar fuyufjh commented on May 27, 2024

Streaming Join is very expensive

So, Even a streaming time_bucket_gapfill is better because it exploits the property that the time columns are basically ordered and that the gaps are not very large? 🤔 It is interesting.

Let me explain why a (naive) outer join is not the optimal approach.

For example, let's consider the hardest case first - the only left row in a time window was deleted, in this case, a gap is created, and we need to fill the gap with a row of NULLs.

No matter which approach we are using, to implement this, the operator must memorize how many rows were in this window before the delete. Actually, this is exactly what the degree table of left outer join does.

-- imagine it as 
upstream_data LEFT OUTER JOIN per_minutes_series

However, the rest state tables, such as the materialized rows of both left & right sides, were useless, because

  • The left state table (i.e. the materialized rows from upstream_data) is not necessary because every time an upstream_data row comes, it can be emitted immediately and doesn't need to wait for per_minutes_series
  • The right statble table (i.e. the materialized rows from per_minutes_series) is wasteful because it's very simple. Just keeping the start_time and end_time would be enough.

I am thinking that we might need to introduce a special operator, saying, like a much simplified outer join operator with a degree table. Of course, this can be a second step and naive join can be used before it's completed, but before this we need to carefully consider the syntax: join, aggregate function, or TVF.

from risingwave.

st1page avatar st1page commented on May 27, 2024

dolphinDB allow a interval syntax in the group by clause, which supports the "fill" argument

fill indicates how to fill the missing values of the result. It can take the value of “prev”, “post”, “linear”, “null”, a specific numeric value and “none”.
“prev”: the previous value
“post”: the next value
“linear”: linear interpolation. For non-numerical data, linear interpolation cannot be used and the “prev” method will be used instead.
“null”: null value
a specific numeric value.
“none”: do not interpolate

https://docs.dolphindb.cn/en/help/SQLStatements/interval.html

from risingwave.

fuyufjh avatar fuyufjh commented on May 27, 2024

https://docs.dolphindb.cn/en/help/SQLStatements/interval.html

Interesting. While for RisingWave, the closest syntax is actually the time window function i.e. TUMBLE and HOP. But regarding of implementation, it seems that reusing the state in Agg is the most efficient way.

from risingwave.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.