datafibers-community / df_data_service

DataFibers Data Service

Home Page: http://www.datafibers.com

License: Apache License 2.0

Shell 0.01% Java 72.71% JavaScript 12.78% HTML 4.47% CSS 10.02%
flink hadoop kafka mongo streaming vertx

df_data_service's People

Contributors

datafibers, mikehao, schubertzhu, willddy


df_data_service's Issues

Documentation refactoring

readthedocs.org seems to have an official way to host documentation.
We may consider releasing our final docs in this format.

DF Permission Design and POC

One of the value-added features is to enable permissions on sending data to and consuming data from Kafka topics. Since we use MongoDB, we can enable it there. However, the current design is to commit to Mongo once the Kafka forward is complete; I think we need to check this.

In addition, user login and permissions need to be discussed here as well.

Schema View has a schema update issue

The Schema View has the following issues:

  • sometimes there is an additional string, such as "connect.name = 'test'", after the schema string
  • updating the schema string fails with "Conflict"

Dockerize DF Environment

This will be a subtask of #32.
Use a Docker image to build the DF developer environment.
Use Docker Compose to build the DF developer service.

When connectConfig supports multiple defaults, all content ends up in the JSON body

When we use the function below to map a different default value to the same field entity, all entities are sent: both connectorConfig and connectorConfig_2 appear in the JSON body.

myApp.config(function(RestangularProvider) {
  RestangularProvider.addElementTransformer('posts', function(element) {
    element.connectorConfig_2 = element.connectorConfig;
    return element;
  });
});

Get jobId from Flink Stream Execute

Currently, the jobId is captured from the console output in a separate thread, and exceptions are sometimes observed on the console output stream. The proposed change may come from one of the following solutions.

  • Create a series of Flink execution classes so that we can get the jobID from job objects into Mongo
  • Query the Flink REST API to find the jobID (to be checked whether such an API exists)
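The second option above can be sketched against Flink's monitoring REST API (served by the JobManager, default port 8081), whose /jobs/overview endpoint lists jobs with their ids. The endpoint path, the "jid" field name, and the regex-based parsing (to avoid a JSON library dependency) are assumptions of this sketch, not code from the DF codebase:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlinkJobIds {

    // Fetch the raw JSON from the Flink REST endpoint,
    // e.g. http://localhost:8081/jobs/overview
    public static String fetchJobsOverview(String flinkRestUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(flinkRestUrl + "/jobs/overview").openConnection();
        conn.setRequestMethod("GET");
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) sb.append(line);
        }
        return sb.toString();
    }

    // Extract the "jid" values from the overview payload without a JSON library.
    public static List<String> parseJobIds(String json) {
        List<String> ids = new ArrayList<>();
        Matcher m = Pattern.compile("\"jid\"\\s*:\\s*\"([0-9a-f]+)\"").matcher(json);
        while (m.find()) ids.add(m.group(1));
        return ids;
    }

    public static void main(String[] args) {
        // Sample payload shaped like a /jobs/overview response.
        String sample = "{\"jobs\":[{\"jid\":\"ab1cd2\",\"name\":\"df-job\",\"state\":\"RUNNING\"}]}";
        System.out.println(parseJobIds(sample)); // [ab1cd2]
    }
}
```

If the REST route works out, the jobId could then be written to Mongo directly instead of being scraped from the console stream.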

Kafka Topic Management

We'll add a new tab in the web UI to manage Kafka topics. We can use either the Kafka Client or Kafka REST (DELETE is not supported yet). #21 #18
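As a small building block for the topic management tab, topic names can be validated before any Kafka call. Kafka restricts topic names to the characters [a-zA-Z0-9._-], a maximum length of 249, and forbids "." and ".."; the class and method names here are hypothetical, not part of the DF codebase:

```java
public class TopicNameCheck {

    // Kafka's legal topic characters are [a-zA-Z0-9._-], max length 249,
    // and the names "." and ".." are reserved/invalid.
    public static boolean isValidTopicName(String name) {
        if (name == null || name.isEmpty() || name.length() > 249) return false;
        if (name.equals(".") || name.equals("..")) return false;
        return name.matches("[a-zA-Z0-9._-]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidTopicName("df_meta"));   // true
        System.out.println(isValidTopicName("bad topic")); // false (contains a space)
    }
}
```

Rejecting bad names in the UI avoids surfacing raw broker errors to the user.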

Improve DF Environment Setup

We need to create environment setup scripts with the following features:

  • Support different profiles to easily choose/configure which version to install
  • Create a Linux guide script (installvm.sh) to generate profiles and call Vagrant
  • Create a Windows guide script (installvm.sh) to generate profiles and call Vagrant
  • Create general shell scripts for Linux outside of Vagrant; the scripts can be reused from the Vagrant shell
  • Create a Docker installation

Evaluate UI Framework

We are looking for a better UI framework, if possible, that is:

  • Easy to work with REST, like NG-Admin
  • Easy to extend
  • Built on a dynamic update approach and MVC

The options that come to mind are as follows.

  • NG-Admin (using now)
  • Admin-on-rest. This is likely where we'll go.
  • Other

Fix Meta View (History) for DF Ingestion Issues

We need to show a view that maps schemas to topics so that we know which topic uses which schema.
This can be added to the topic management view.
#24
We also need to identify the list of data attributes, physical or logical/business, to keep in MongoDB / the Schema Registry.

  • df_meta is added as the topic for df metadata in the latest branch
  • a df_meta collection is created in Mongo to accept the metadata from the Mongo sink
  • a function is added to genericFileSource to send metadata for each file processed

Add Admin Tool in DF CLI

We can add a new command option, -a (admin tool), to support calling admin tools, such as:

  • clean up mongodb
  • launch a flink jar
  • kill a flink job
  • delete a topic
  • launch/stop connects
  • launch the df environment: kafka, flink, etc.
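A minimal sketch of how the -a option could dispatch to admin tools; the tool names and returned strings are placeholders, not the real DF CLI:

```java
public class AdminCli {

    // Hypothetical dispatch for "-a <tool>"; tool names mirror the list above.
    // Returns a status string so the behavior is easy to test.
    public static String dispatch(String[] args) {
        if (args.length < 2 || !args[0].equals("-a")) return "usage: df -a <tool>";
        switch (args[1]) {
            case "cleanup_mongo":    return "cleaning mongodb";
            case "launch_flink_jar": return "launching flink jar";
            case "kill_flink_job":   return "killing flink job";
            case "delete_topic":     return "deleting topic";
            default:                 return "unknown admin tool: " + args[1];
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new String[]{"-a", "delete_topic"})); // deleting topic
    }
}
```

Keeping each tool behind a single switch arm makes it cheap to add new admin actions later.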

Certified connects - Improvement on Generic File Connect

  • Rename processed files - currently in memory, and files being processed
  • CSV, JSON, TXT format processing
  • Collect and write file/data metadata to another meta topic in Kafka
    • Connect start and end meta_data, config, status, timestamp, cuid
    • DONE - File start and end meta_data, file meta, status, timestamp, cuid
  • Able to turn off metadata collection to df_meta; enabled by default
  • Create file sinks
  • Support disabling schema validation
  • Support file extension override
  • Support reading files from HDFS and partitioning
  • XML format processing
  • Mainframe format processing
  • Delimited file format processing
  • Avro data format processing
  • Data validation design
    • row_count validation
    • biz_date validation
    • validation in the header/trailer section of the file
    • validation in control files
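For the header/trailer validation item, a sketch assuming a hypothetical layout where the header starts with "H|" and the trailer is "T|<row_count>"; the real DF file formats may differ:

```java
import java.util.Arrays;
import java.util.List;

public class TrailerValidation {

    // Assumed layout: header "H|...", then data rows, then trailer "T|<row_count>".
    // Returns true only when the declared count matches the actual data row count.
    public static boolean rowCountMatches(List<String> lines) {
        if (lines.size() < 2) return false;
        String trailer = lines.get(lines.size() - 1);
        if (!trailer.startsWith("T|")) return false;
        int declared = Integer.parseInt(trailer.substring(2).trim());
        int actual = lines.size() - 2; // exclude header and trailer
        return declared == actual;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("H|20160101", "a,1", "b,2", "T|2");
        System.out.println(rowCountMatches(file)); // true
    }
}
```

biz_date validation would follow the same pattern, comparing a date field in the header against the expected business date.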

Polish error msg and do documentation

HelpFun.errorMsg() is used for reporting errors. We need to polish its usage and document the error standard, as follows, in the DF complete guide.

error_id, error_class_name, error_method_name, error_details
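A sketch of how the proposed standard could be rendered by a formatter; the "DF-ERR" prefix and the exact layout are assumptions for illustration, not the current HelpFun.errorMsg() behavior:

```java
public class ErrorMsgFormat {

    // Hypothetical formatter for the proposed standard:
    // error_id, error_class_name, error_method_name, error_details
    public static String errorMsg(int errorId, Class<?> cls, String method, String details) {
        return String.format("DF-ERR-%04d [%s.%s] %s", errorId, cls.getSimpleName(), method, details);
    }

    public static void main(String[] args) {
        System.out.println(errorMsg(17, String.class, "valueOf", "null input"));
        // DF-ERR-0017 [String.valueOf] null input
    }
}
```

Passing the Class object rather than a raw string keeps the class name correct through refactoring.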

Agent hangs there without response.

From @datafibers on October 13, 2016 14:28

From @datafibers on August 10, 2016 1:20

The agent hangs without receiving responses from the server. Once the agent is restarted, it works again. In this case, it is not an issue on the server side.

  • Add some logging to your code and also set an exceptionHandler on HttpClientRequest and HttpClientResponse to see eventual errors.
  • Make sure the file handles are correctly closed after each iteration.
  • If the number of files is large, check that the maximum number of open file descriptors on the client side is set to a sufficiently high number (i.e., ulimit -n).

Copied from original issue: datafibers/df_demo#5

Copied from original issue: datafibers-community/df_demo#1

parquet data format support

We need to research the possibility of supporting data processing in Parquet format, since it is more efficient and supports schema evolution.

Redesign connect name in POPJ

Right now, the connect name is input by the user. However, it must be unique. In addition, we need an additional attribute that can be referenced in the connect for meta information delivery.

  • Assign MongoDB's new ObjectId() as the connect/transform id, which is like a uid
  • Refactor taskId to taskSeq
  • Refactor connect to connectUid
  • Add jobUID to POPJ, default 'NOT_ASSIGNED'
  • Inject a cid attribute which is the alias for name in connectConfig
  • When adding a connect to Mongo, reuse this id instead of generating a new one:
    _id = connectUid = connectConfig.name = connectConfig.cid
  • Updates to a certified connect will use this cid in the metadata message as the pk
  • Update the UDF jar name to use ObjectID #61
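To illustrate the ObjectId-as-uid idea, a minimal stand-in that mimics MongoDB's ObjectId shape (a 4-byte epoch-seconds prefix plus random bytes, rendered as 24 hex characters, so ids sort roughly by creation time); real code would use the Mongo driver's ObjectId class instead:

```java
import java.security.SecureRandom;

public class ConnectUid {
    private static final SecureRandom RND = new SecureRandom();

    // Stand-in for MongoDB's ObjectId: 4 bytes of epoch seconds followed by
    // 8 random bytes, as 24 lowercase hex characters.
    public static String newUid() {
        long seconds = System.currentTimeMillis() / 1000L;
        byte[] tail = new byte[8];
        RND.nextBytes(tail);
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%08x", (int) seconds));
        for (byte b : tail) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(newUid()); // 24 hex chars, e.g. "58f2a1c0..."
    }
}
```

The timestamp prefix is what makes such ids useful as connectUid: they are unique without user input yet still ordered by creation time.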

Support delete topic

Since we integrate the topic and the schema (subject), we need to verify:

  • whether we can delete a topic
  • whether we can also delete the schema as well
