Coder Social home page Coder Social logo

rsqldb's Introduction

RSQLDB

The database build for stream processing.

Overview

RSQLDB is a database for stream processing build on the of RocketMQ. It is distributed, highly available,scalable. SQL can be used to define a stream processing task, and RSQLDB parse it into a stream-processing task. RSQLDB offers the fellow core features:

  • Restful API - create, query, stop, stream-processing tasks;
  • Standard SQL - describe the stream processing task with standard sql;
  • Materialized views - incremental calculation on the of stream;

Deploy

Run RocketMQ 5.0 locally

Steps are as follows:

  • Install Java

  • Download RocketMQ

  • Start NameServer

  • Start Broker

More details can be obtained at quick start;

Run RSQLDB

From source code:

  • Git clone and compile
git clone [email protected]:alibaba/rsqldb.git

#compile antlr4 file to source code:
mvn clean compile -DskipTests
  • Run the entrance method:
  com.alibaba.rsqldb.rest.Application

From distribution package

  • Download distribution package
  • Unzip package
unzip rsqldb-distribution.zip
  • Start rsqldb
cd rsqldb && sh bin/start.sh

Use Cases and Examples

Filter

select *
from sourceTable where age>20 and name like '%mack';

Join

SELECT Websites.name as `count`, Websites.url as url, SUM(access_log.count) AS nums 
FROM access_log 
WHERE access_log.`count` > 100
INNER JOIN Websites ON access_log.site_id=Websites.id and access_log.url=Websites.url

Window

select 
TUMBLE_START(ts, INTERVAL '5' SECOND)       AS window_start,
TUMBLE_END(ts, INTERVAL '5' SECOND)         AS window_end,
position                                    AS position,
sum(num)                                    AS sumNum
from  sourceTable
where num > 5
group by TUMBLE(ts, INTERVAL '5' SECOND), position
having sum(num) < 20;
  • More examples can be found here.

rsqldb's People

Contributors

duhenglucky avatar j-ching avatar linfan avatar ni-ze avatar speak2me avatar yuanxiaodong avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

rsqldb's Issues

support TOPN in sql

test sql

CREATE TABLE source_table (
  IP VARCHAR,
  `TIME` VARCHAR 
)WITH(
  ...
);

CREATE TABLE result_table (
  rownum BIGINT,
  start_time VARCHAR,
  IP VARCHAR,
  cc BIGINT,
  PRIMARY KEY (start_time, IP)
) WITH (
  ...
);

INSERT INTO result_table
SELECT rownum,start_time,IP,cc
FROM (
  SELECT *,
     ROW_NUMBER() OVER (PARTITION BY start_time ORDER BY cc DESC) AS rownum
  FROM (
        SELECT SUBSTRING(`TIME`,1,2) AS start_time, 
        COUNT(IP) AS cc,
        IP
        FROM  source_table
        GROUP BY SUBSTRING(`TIME`,1,2), IP
    )a
) t
WHERE rownum <= 3  

test data:

ip(VARCHAR) time(VARCHAR)
192.168.1.1 100000000
192.168.1.2 100000000
192.168.1.2 100000000
192.168.1.3 100030000
192.168.1.3 100000000
192.168.1.3 100000000

expected result:

rownum(BIGINT) start_time(VARCHAR) ip(VARCHAR) cc(BIGINT)
1 10 192.168.1.3 3
2 10 192.168.1.2 2
3 10 192.168.1.1 1

make having filter right

  1. make having filter right;
  2. add unit test for having filter;
  3. add code for print operator: print the result in log for select ... from..

support nested join sql

Inner join and left join are support by rsqldb, However, nested join is not well support in rsqldb.

Three is a nested join sql in below:

CREATE TABLE ticket
(
    `id`           BIGINT,
    `perform_id`   BIGINT,
    `position`     VARCHAR,
    `gmt_modified` TIMESTAMP,
    primary key (id)
) WITH (
      type = 'rocketmq',
      topic = 'rsqldb-ticket',
      groupName = 'rsqldb-ticket',
      namesrvAddr = '127.0.0.1:9876',
      isJsonData = 'true',
      msgIsJsonArray = 'false'
      );


CREATE TABLE odeum
(
    `id`           BIGINT,
    `name`         VARCHAR,
    `gmt_modified` TIMESTAMP,
    primary key (id)
) WITH (
      type = 'rocketmq',
      topic = 'rsqldb-odeum',
      groupName = 'rsqldb-odeum',
      namesrvAddr = '127.0.0.1:9876',
      isJsonData = 'true',
      msgIsJsonArray = 'false'
      );

CREATE TABLE perform
(
    `id`           BIGINT,
    `name`         VARCHAR,
    `odeum_id`     BIGINT,
    `gmt_modified` TIMESTAMP,
    primary key (id)
) WITH (
      type = 'rocketmq',
      topic = 'rsqldb-perform',
      groupName = 'rsqldb-perform',
      namesrvAddr = '127.0.0.1:9876',
      isJsonData = 'true',
      msgIsJsonArray = 'false'
      );

CREATE TABLE task_sink
(
    ticket_id    BIGINT,
    `position`   VARCHAR,
    odeum_name   VARCHAR,
    perform_name VARCHAR
) WITH (
      type = 'print'
      );

CREATE VIEW test_view AS
SELECT
    a.ticket_id     AS ticket_id,
    a.`position`    AS `position`,
    b.name          AS odeum_name,
    a.perform_name  AS perform_name
FROM (SELECT t.id         AS ticket_id,
             t.`position` AS `position`,
             p.name       AS perform_name,
             p.odeum_id   AS odeum_id
      FROM ticket AS t JOIN perform AS p ON t.perform_id = p.id
      ) a
JOIN odeum AS b ON a.odeum_id = b.id;

INSERT INTO task_sink
SELECT ticket_id,
       `position`,
       odeum_name,
       perform_name
FROM test_view;

RSQLDB support submit streams task in SDK

在目前的版本中,还没有实现客户端,仅暴露HTTP接口作为提交、管理流处理任务用。需要一个client的方式提交SQL,这样用于可以依赖这个client SDK直接提交流计算任务。另外,目前流处理任务的数据处理流向是,RocketMQ(源数据) -> RSQLDB ->RocketMQ(结果数据),也就是说RSQLDB将流处理结果数据会写回到RocketMQ的另一个topic中。在某些使用场景中,使用者希望能将计算结果数据返回到客户端,而不是写入到topic中,因此也需要通过client SDK将计算结果返回;

  1. 将计算结果写入到Topic中,已经实现:
CREATE TABLE `rocketmq_source`
(
  field_1 INT,
  field_2 VARCHAR,
  field_3 VARCHAR,
  field_4 VARCHAR
) WITH (
    topic = 'rsqldb-source',
    data_format='json'
    );


CREATE TABLE `task_sink_2`
(
  field_1 INT,
  field_2 VARCHAR,
  field_3 VARCHAR,
  field_4 VARCHAR
) WITH (
    topic = 'rsqldb-sink',
    data_format='json'
    );

//将计算结果写入到task_sink_2对应topic rsqldb-sink中
insert into task_sink_2
select field_1, sum(field_2) 
from rocketmq_source where field_1=1;
  1. 将计算结果返回给客户端,未实现:
CREATE TABLE `rocketmq_source`
(
  field_1 INT,
  field_2 VARCHAR,
  field_3 VARCHAR,
  field_4 VARCHAR
) WITH (
    topic = 'rsqldb-source',
    data_format='json'
    );


//将计算结果返回给客户端
select field_1, sum(field_2) 
from rocketmq_source where field_1=1 EMIT CHANGES;

support UNION ALL in sql

The UNION ALL statement merges two streaming data. The fields of the two streaming data must be exactly the same, including the field type and field order.

test data:

  1. test_source_union1
a(varchar) b(bigint) c(bigint)
test1 1 10
  1. test_source_union2
a(varchar) b(bigint) c(bigint)
test1 1 10
test2 2 20
  1. test_source_union3
    a(varchar) | b(bigint) | c(bigint)
    -- | -- | --
    test1 | 1 | 10
    test2 | 2 | 20
    test1 | 1 | 10

test sql:

SELECT
    a,
    sum(b) as d,
    sum(c) as e
FROM 
    (SELECT * from test_source_union1
    UNION ALL
    SELECT * from test_source_union2
    UNION ALL
    SELECT * from test_source_union3
    )t
 GROUP BY a;   

Expected result:

a(varchar) d(bigint) e(bigint)
test1 1 10
test2 2 20
test1 2 20
test1 3 30
test2 4 40
test1 4 40

make count(*) right

  1. make count(*) right;
  2. modify the configure of ObjectMapper;
  3. fix the error of integer cast to BigDecimal

support having in sql

sql tamplate:

  SELECT [ ALL | DISTINCT ]{ * | projectItem [, projectItem ]* }
  FROM tableExpression
  [ WHERE booleanExpression ]
  [ GROUP BY { groupItem [, groupItem ]* } ]
  [ HAVING booleanExpression ];

test sql:

SELECT Customer,SUM(OrderPrice) FROM XXX
GROUP BY Customer
HAVING SUM(OrderPrice)<2000;

test data:

Customer OrderPrice
Bush 1000
Carter 1600
Bush 700
Bush 300
Adams 2000
Carter 100

expected result:

Customer SUM(OrderPrice)
Carter 1700

Can not find parent artifact org.apache.rocketmq:rocketmq-streams:pom:1.0.0-SNAPSHOT

➜  rsqldb git:(main) mvn clean install
[INFO] Scanning for projects...
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for com.alibaba:rsqldb:1.0.0-SNAPSHOT: Could not find artifact org.apache.rocketmq:rocketmq-streams:pom:1.0.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 7, column 13
 @ 
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]   
[ERROR]   The project com.alibaba:rsqldb:1.0.0-SNAPSHOT (/home/lisirrx/code/rsqldb/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM for com.alibaba:rsqldb:1.0.0-SNAPSHOT: Could not find artifact org.apache.rocketmq:rocketmq-streams:pom:1.0.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 7, column 13 -> [Help 2]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.