intertwin-eu / dcnios

dCNiOS is a data connector that uses Apache NiFi to link a storage system to OSCAR, facilitating the creation of event-driven processes.

Home Page: https://intertwin-eu.github.io/dcnios/

License: Apache License 2.0

dCNiOS

dCache is a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.

Apache NiFi is a reliable system to process and distribute data through powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

OSCAR is an open-source platform for serverless event-driven data processing of containerized applications across the computing continuum.

With dCNiOS (dCache + NiFi + OSCAR), you can manage the creation of event-driven data processing flows. As shown in the figure, when files are uploaded to dCache, events are ingested into Apache NiFi, which can queue them depending on the ingestion rate (modifiable at runtime). The events are then delegated for processing to a scalable OSCAR cluster, where a user-defined application based on a Docker image can process the data file.

dCNiOS Workflow
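The queueing step in the workflow above can be pictured with a small sketch. This is not NiFi code: IngestionQueue, its rate attribute, and the tick method are hypothetical names used only to illustrate how queued events could be released at a rate that is changed while the flow is running.

```python
from collections import deque


class IngestionQueue:
    """Hypothetical stand-in for the NiFi queue (not part of dCNiOS or NiFi)."""

    def __init__(self, rate):
        # Maximum number of events released per tick; may be changed at runtime.
        self.rate = rate
        self._pending = deque()

    def ingest(self, event):
        """Store an incoming storage event until it can be processed."""
        self._pending.append(event)

    def tick(self):
        """Release up to `rate` queued events, oldest first."""
        batch = []
        while self._pending and len(batch) < self.rate:
            batch.append(self._pending.popleft())
        return batch


queue = IngestionQueue(rate=2)
for name in ["a.dat", "b.dat", "c.dat", "d.dat", "e.dat"]:
    queue.ingest(name)

first = queue.tick()   # two events are released
queue.rate = 3         # the rate is raised while events are still queued
second = queue.tick()  # the remaining three events are released
```

Each released batch would then be handed to the processing side, which in dCNiOS is the OSCAR cluster.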

dCNiOS was therefore built to interact with NiFi and deploy a complete dataflow. It uses HTTP calls to communicate with a NiFi cluster, which can be automatically deployed by the Infrastructure Manager (IM). Apache NiFi is deployed on a dynamically provisioned Kubernetes cluster using a custom Docker image named ghcr.io/grycap/nifi-sse:latest. This image includes a client for the dCache SSE Event Interface, kindly provided by Paul Millar on GitHub. It does not require a NiFi Registry.

All the dataflow information is described in a YAML file, and executing the dCNiOS command-line interface deploys this dataflow on NiFi.

Starting from predefined recipes (NiFi process groups stored as .json files), dCNiOS inserts a general flow and changes its variables to create a concrete workflow.
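The idea of turning a general recipe into a concrete workflow can be sketched as follows. The recipe layout shown here (a "name" plus a flat "variables" object) is a simplified assumption, not the actual structure of the dCNiOS recipe files, and instantiate_recipe is a hypothetical helper rather than a dCNiOS function.

```python
import copy
import json

# Simplified, hypothetical recipe; real NiFi process group files hold much more.
RECIPE_JSON = """
{
  "name": "dcache",
  "variables": {"endpoint": "", "folder": "", "user": ""}
}
"""


def instantiate_recipe(recipe, name, values):
    """Return a copy of `recipe` renamed to `name` with `values` filled in."""
    unknown = set(values) - set(recipe["variables"])
    if unknown:
        raise KeyError(f"recipe has no variables named {sorted(unknown)}")
    flow = copy.deepcopy(recipe)  # leave the general recipe untouched
    flow["name"] = name
    flow["variables"].update(values)
    return flow


recipe = json.loads(RECIPE_JSON)
flow = instantiate_recipe(
    recipe,
    "dcache-input",
    {"endpoint": "https://dcache.example.org:3880", "folder": "/data/incoming"},
)
```

Copying the recipe before changing it keeps the general flow reusable, so the same recipe can back several concrete workflows.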

By default, two process group recipes have been created:

  1. dcache, which is an active listener for a dCache instance. The Server-Sent Events (SSE) client actively listens for events in a user-defined folder in dCache. When a file is uploaded to that folder, NiFi introduces the event into the dataflow.
  2. InvokeOSCAR, which makes an HTTP call to invoke an OSCAR service asynchronously. OSCAR supports this events specification, which lets the user decide whether the file should be pre-staged into the execution sandbox to process the data locally within an OSCAR job, or whether processing of the event should be delegated to an external tool, such as a workflow orchestration platform, thus reducing data movement across systems.
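The two recipes can be pictured together in a short sketch: parse a Server-Sent Events stream, then prepare an asynchronous invocation for each event. Several details are assumptions rather than facts from this README: the "inotify" event type and the payload shape of the sample event, the /job/<service> path and Bearer token header used for the OSCAR call, and the example host names. The function names are hypothetical, and the request is only built, never sent.

```python
import json
import urllib.request


def parse_sse(lines):
    """Yield (event_type, data) pairs from text lines in SSE format."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event_type = value
            elif field == "data":
                data.append(value)


def build_oscar_request(endpoint, service, token, payload):
    """Build (but do not send) a POST request; path and header are assumptions."""
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/job/{service}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical event, shaped loosely like a file-creation notification.
SAMPLE_STREAM = [
    ": keep-alive comment\n",
    "event: inotify\n",
    'data: {"event": {"name": "image.png", "mask": ["IN_CREATE"]}}\n',
    "\n",
]

events = list(parse_sse(SAMPLE_STREAM))
event_type, data = events[0]
request = build_oscar_request(
    "https://oscar.example.org", "my-service", "TOKEN", json.loads(data)
)
```

In dCNiOS itself both steps run inside NiFi process groups, so this sketch only mirrors the logic outside NiFi.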

Getting Started

Prerequisites

  • OSCAR cluster with services deployed
  • NiFi cluster deployed
  • A package provider such as Anaconda

Installation

Create an environment with conda and use it.

conda create --name dcnios python=3.7.6
conda activate dcnios

Install all the requirements defined in requirements.txt

pip install -r requirements.txt

Or only install the minimal requirements that dCNiOS needs.

pip install pyyaml==6.0 requests==2.28.2 oscar_python==1.0.3

Authors

Instituto de Instrumentación para Imagen Molecular (I3M), Centro Mixto CSIC — Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, España

Versions and Maintenance

Only one version is maintained:

  • The main branch of the source code repository holds a working version of the software component.
  • Documentation is updated with each new software version that changes the application's behavior, whether substantially or minimally. Issues can be reported in the Issues section of the GitHub project.
  • Documentation is updated whenever it is reported as inaccurate or unclear.

Licensing

dCNiOS is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Acknowledgements

This work was supported by the project “An interdisciplinary Digital Twin Engine for science” (interTwin), which has received funding from the European Union’s Horizon Europe Programme under Grant 101058386.

More information

You can find more information on the OSCAR blog.

Contributors

esparig, gmolto, sergiolangaritabenitez
