
Reporting framework

Introduction

Reporting framework for real-time streaming data and visualization.

Installation in Kubernetes cluster

Prerequisites

  • MOSIP cluster installed as given here
  • Elasticsearch and Kibana already running in the cluster.
  • Postgres installed with extended.conf. (The MOSIP default install has this configured.) A quick sanity check is sketched just after this list.
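Debezium streams changes through Postgres logical replication, which is what extended.conf enables (wal_level = logical plus replication slots and senders). As a hedged sanity check before installing the pipeline, something like the following can be run; the namespace, pod name, and user below are assumptions, not values taken from this repository:

kubectl -n postgres exec -it postgres-postgresql-0 -- psql -U postgres -c "SHOW wal_level;"
# Expect "logical"; without it the Debezium source connector cannot stream the WAL.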

Install data pipeline

  • Inspect scripts/values.yaml for modules to be installed.
  • Inspect scripts/values-init.yaml for connector configs.
  • Run
cd scripts
./install.sh [kube-config-file]

All components will be installed in the reporting namespace of the cluster.
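To verify the installation, a quick check along these lines can be used; the Kafka Connect service name below is an assumption, so adjust it to your deployment:

kubectl -n reporting get pods
# Kafka Connect exposes a REST API; listing connectors shows the Debezium source and
# Elasticsearch sink connectors registered by the reporting-init step.
kubectl -n reporting port-forward svc/kafka-connect 8083:8083 &
curl -s http://localhost:8083/connectors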

  • NOTE: use the superuser (postgres) as the db_user for now, because any other user would require database ownership, CREATE permission, and REPLICATION permission. (TODO: solve the problems with using a different user.)
  • NOTE: before installing the reporting-init Debezium configuration, make sure all tables under that DB are included beforehand; adding another table from the same DB later can be harder. (TODO: develop a script that adds additional tables under the same DB.) A hedged connector-config sketch follows below.
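To illustrate the second note, here is a hedged sketch of a Debezium Postgres source connector registered through the Kafka Connect REST API, with all tables of one database listed up front in table.include.list. The connector name, hostnames, database, plugin, and table names are placeholders; the actual configurations live in this repository's reference connector files and scripts/values-init.yaml.

curl -s -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "registration-db-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "<db-password>",
    "database.dbname": "mosip_regprc",
    "database.server.name": "mosip",
    "plugin.name": "pgoutput",
    "slot.name": "reporting_slot",
    "slot.drop.on.stop": "false",
    "table.include.list": "regprc.registration,regprc.registration_transaction"
  }
}'
# Listing every table needed from this DB up front matches the note above; adding tables later is harder.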

Upload Kibana dashboards

Various Kibana dashboards are available in the dashboards folder. Upload them all with the following script:

cd scripts
./load_kibana_dashboards.sh

The dashboards may also be uploaded manually through the Kibana UI.
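The script most likely wraps Kibana's saved-objects import API. As a hedged alternative for uploading a single dashboard file from a terminal (the Kibana address and file name below are assumptions):

curl -s -X POST "http://kibana.reporting:5601/api/saved_objects/_import?overwrite=true" -H "kbn-xsrf: true" --form file=@dashboards/registration_dashboard.ndjson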

Custom connectors

Install your own connectors as given here.

Cleanup/uninstall

To clean up the entire data pipeline, follow the steps given here.

CAUTION: Know what you are doing!

Notes - Kafka Connectors & Transforms

  • Debezium, the Kafka "source" connector, puts all the WAL data into Kafka in a raw manner, without modifying anything, so there are no "transformations" on the source connector side.
  • It is therefore the job of whatever reads these Kafka topics to reshape the data into the form in which it should be put into Elasticsearch.
  • Currently we use Confluent's Elasticsearch Kafka connector, a "sink" connector (like Debezium, also a Kafka Connect connector), to put data from Kafka topics into Elasticsearch indices. On its own, this method also puts the data into Elasticsearch raw. (Right now we have multiple sink "connections" between Kafka and Elasticsearch, roughly one per topic.)
  • To modify the data, we use Kafka Connect's SMTs (Single Message Transforms). Each transform changes every Kafka record in a particular way, and each individual connection can be configured so that these SMTs are applied in a chain. A hedged configuration sketch follows after these notes.
  • Please note that from here on the terms "connector" and "connection" are used interchangeably; both mean a single Kafka Connect source/sink connection. All the reference connector configuration files can be found here.
  • For how each of these ES connections configures and uses the transforms, and how they are chained, refer to any of the files here. For more information on the connectors themselves, refer here.
  • Side note: we have explored Spark (i.e., an approach that does not use Kafka sink connectors) to stream these Kafka topics and write the data into Elasticsearch manually, but that approach has many complications, so we are continuing with the ES Kafka Connect + SMT approach.
  • The custom transforms that are written just need to be available in the Docker image of the es-kafka-connector. The code for these transforms, along with details on how to develop and build them and how to build the es-kafka-connector Docker image, can be found here.
  • Please note that the slot.drop.on.stop property in the file debez-sample-conn.api under here should be false in production, so that the logical replication slot is not deleted when the connector stops in a graceful, expected way. Set it to true only in testing or development environments: dropping the slot allows the database to discard WAL segments, and when the connector restarts it either performs a new snapshot or continues from a persistent offset in the Kafka Connect offsets topic.
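To make the SMT chaining concrete, below is a hedged sketch of one Elasticsearch sink connection with two transforms applied in a chain; the connector name, topic, field names, and transform choices are illustrative rather than copied from this repository's reference files.

curl -s -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "es-sink-registration",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "mosip.regprc.registration",
    "connection.url": "http://elasticsearch:9200",
    "key.ignore": "true",
    "schema.ignore": "true",
    "transforms": "unwrap,ts",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.ts.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.ts.field": "cr_dtimes",
    "transforms.ts.target.type": "Timestamp"
  }
}'

The transforms listed under "transforms" run in order on every record of this connection before it is written to Elasticsearch. As noted above, any transform class referenced this way, whether Debezium's or a custom one, must be present in the es-kafka-connector Docker image.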

License

This project is licensed under the terms of the Mozilla Public License 2.0.
