splunk / deep-learning-toolkit

Deep Learning Toolkit for Splunk

License: Other

Shell 0.09% Python 89.99% Ruby 0.02% CSS 0.07% JavaScript 1.41% C++ 0.34% C 0.02% Dockerfile 0.05% Smarty 0.01% Jupyter Notebook 8.01%
splunk kubernetes spark tensorflow pytorch dask

deep-learning-toolkit's Introduction

Deep Learning Toolkit for Splunk v4

Deep Learning Toolkit for Splunk (DLTK) is a Splunk App that allows users to better deploy, observe and scale advanced data science and machine learning projects with Splunk. In a cloud-native way, DLTK lets you elastically and dynamically deploy models to CPU and GPU based container environments such as Docker, Kubernetes or OpenShift to run algorithms in libraries like Spark, TensorFlow, PyTorch, Rapids and Dask.

🔴 Important Note 🔴

This project (DLTK v4) was created in 2020 as a new app architecture for DLTK. Due to low traction in the open source world, this project is currently not actively maintained or continued. However, DLTK version 3.x is available on Splunkbase and is actively maintained and developed. Parts of the v4 project may be ported into the Splunkbase version of DLTK. If you want to get started with DLTK, we recommend using DLTK 3.x from Splunkbase. If you are using this app version v4, you might want to consider migrating to the Splunkbase version. Finally, if you want to install DLTK v4 from this repository, please see the Installation Guide.

Documentation

This repository is used to build DLTK. If you are looking for documentation, please see the following:

Community Supported

Deep Learning Toolkit for Splunk is an open source project developed by Splunkers with contributions from the community of partners and customers. It will be enhanced, maintained and supported by the community, led by Splunkers with deep subject matter expertise. This is not an official Splunk supported product.

License

See LICENSE.

Contributors

Andreas Greeske, Anthony Tellez, Greg Ainslie-Malik, Philipp Drieger, Robert Fujara

If you are interested in contributing to this project, please see the Contributing Guidelines.


deep-learning-toolkit's Issues

Wrong conf file encoding on Windows

On Windows, the app writes the conf files with UTF-8-BOM encoding. This leads to errors in the parse method in conf_files_splunk_client.py.

A possible solution would be to detect the encoding before parsing the files:

    # from requests.utils (guess_json_utf(data))
    # requires: import codecs, os; from configparser import ConfigParser
    def detect_encoding(self, path):
        _null = "\x00".encode("ascii")  # encoding to ASCII for Python 3
        _null2 = _null * 2
        _null3 = _null * 3
        with open(path, "rb") as fp:
            data = fp.readline()
        sample = data[:4]
        if sample in (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE):
            return "utf-32"  # BOM included
        if sample[:3] == codecs.BOM_UTF8:
            return "utf-8-sig"  # BOM included, MS style (discouraged)
        if sample[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
            return "utf-16"  # BOM included
        nullcount = sample.count(_null)
        if nullcount == 0:
            return "utf-8"
        if nullcount == 2:
            if sample[::2] == _null2:  # 1st and 3rd are null
                return "utf-16-be"
            if sample[1::2] == _null2:  # 2nd and 4th are null
                return "utf-16-le"
            # Did not detect 2 valid UTF-16 ascii-range characters
        if nullcount == 3:
            if sample[:3] == _null3:
                return "utf-32-be"
            if sample[1:] == _null3:
                return "utf-32-le"
            # Did not detect a valid UTF-32 ascii-range character
        return "utf-8"

    def parse(self, path):
        if os.path.exists(path):
            parser = ConfigParser(
                delimiters=("=",),
                strict=False,
                default_section="__default__",
            )
            encoding = self.detect_encoding(path)
            with open(path, "r", encoding=encoding) as fp:
                content = fp.read()

            content = content.replace("\\\n", "")
            parser.read_string(content)
            return parser
        else:
            return None
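
The BOM detection above can be sanity-checked outside of Splunk. The following standalone sketch (file name and stanza content are made up for illustration) writes a conf file with a UTF-8 BOM, as Windows does, and shows why a plain UTF-8 read breaks parsing while "utf-8-sig" does not:

```python
import codecs
import configparser
import os
import tempfile

# Simulate a .conf file written on Windows with a UTF-8 BOM.
conf_text = "[stanza]\nname = example\n"
fd, path = tempfile.mkstemp(suffix=".conf")
with os.fdopen(fd, "wb") as fp:
    fp.write(codecs.BOM_UTF8 + conf_text.encode("utf-8"))

# A naive UTF-8 read leaves the BOM in front of the first section name ...
with open(path, "r", encoding="utf-8") as fp:
    raw = fp.read()
print(raw.startswith("\ufeff"))  # True: the BOM leaks into the parsed data

# ... while "utf-8-sig" (what detect_encoding returns for this file) strips it.
with open(path, "r", encoding="utf-8-sig") as fp:
    parser = configparser.ConfigParser()
    parser.read_string(fp.read())
print(parser.sections())  # ['stanza']

os.remove(path)
```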

PM4Py dashboard example

Create a simple process mining dashboard with input data and extracted graph visualisation based on existing OSS custom viz.

Overview page

High-level overview of status with regard to environments and algorithms.
Example content page (similar to the DLTK 3 content overview)

Update and restructure base golden image

  • add /srv/notebooks/data folder for staging
  • change the algorithm name to make clear what's edited (notebook)
  • sync current edited algo to root folder
  • rename notebooks to examples
  • add README welcome page that explains how the sync of the algorithm with Splunk works

H2O runtime test image

As a user I want to easily spin up and connect to an H2O image and have a Flow notebook.

Version 3 syntax in the Jupyter notebook examples

For example, in stage 1 of the "Named Entity Recognition" notebook, the version 3 syntax | fit MLTKContainer is used instead of the compute statement:

| compute algorithm="Named Entity Recognition and Extraction" environment="DockerDesktop" method="fit" fields="text" feature_variables="text" mode=stage

Causalnex integration and example

As a user I want to use causalnex for causal reasoning with Bayesian networks and visualize findings in an example analysis dashboard.

  • pip install causalnex in base image
  • add example notebook
  • add example dashboard

User cannot change basic algorithm information from UI

Currently, this is only possible by manually editing the dltk_algorithms.conf file.

Maybe add an item to the "actions" menu for opening a dialog:
(screenshot)

The dialog could allow the user to enter/change the following algorithm fields:

  • Name
  • Description
  • Category

We should add a REST endpoint for GET and PUT of this information, e.g. dltk/algorithm_details.
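
Until such an endpoint exists, the manual edit of the conf file can at least be scripted. A minimal sketch of updating the three proposed fields with the standard library's ConfigParser (the stanza name and key layout here are assumptions for illustration, not the app's actual schema):

```python
import configparser
import io

# Hypothetical dltk_algorithms.conf content; the real stanza layout may differ.
conf_text = """[my_algorithm]
name = My Algorithm
description = Old description
category = Examples
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)

# The fields the proposed dialog would expose: name, description, category.
parser["my_algorithm"]["description"] = "Updated description"
parser["my_algorithm"]["category"] = "NLP"

# Serialize the modified configuration back out.
out = io.StringIO()
parser.write(out)
print(out.getvalue())
```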

Focus on text boxes in modal dialogs

When opening a modal dialog that asks for a name, the focus should automatically be set to the text box.

Currently it looks like this:
(screenshot)

This is the goal:
(screenshot)

Integrate basic MLflow for model management

As a user I want to use MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow should be accessible from the base image.

Error in 'compute' command: Error handling chunk: urllib.error.HTTPError: HTTP Error 413: Request Entity Too Large

Reading from a lookup file and running the compute command returns an error:
    01-04-2021 16:07:22.970 INFO DispatchExecutor - END OPEN: Processor=compute
    01-04-2021 16:07:22.970 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=PREVIEW
    01-04-2021 16:07:22.979 ERROR ChunkedExternProcessor - stderr: begin command handler (chunk_size=3131679)
    01-04-2021 16:07:22.979 ERROR ChunkedExternProcessor - stderr: is final chunk from Splunk
    01-04-2021 16:07:22.979 ERROR ChunkedExternProcessor - stderr: call execution handler (3131679 bytes of data)
    01-04-2021 16:07:22.986 ERROR ChunkedExternProcessor - stderr: sending 3101728 bytes
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: Traceback (most recent call last):
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/etc/apps/dltk/bin/dltk/core/execution/command.py", line 390, in handle_chunk
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: result = self.execution.handle(self.buffer, final_chunk_from_splunk)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/etc/apps/dltk/bin/dltk/runtime/base/base_execution.py", line 82, in handle
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: response = urllib.request.urlopen(request)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/lib/python3.7/urllib/request.py", line 222, in urlopen
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: return opener.open(url, data, timeout)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/lib/python3.7/urllib/request.py", line 531, in open
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: response = meth(req, response)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/lib/python3.7/urllib/request.py", line 641, in http_response
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: 'http', request, response, code, msg, hdrs)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/lib/python3.7/urllib/request.py", line 569, in error
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: return self._call_chain(*args)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/lib/python3.7/urllib/request.py", line 503, in _call_chain
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: result = func(*args)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/lib/python3.7/urllib/request.py", line 649, in http_error_default
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: raise HTTPError(req.full_url, code, msg, hdrs, fp)
    01-04-2021 16:07:23.103 ERROR ChunkedExternProcessor - stderr: urllib.error.HTTPError: HTTP Error 413: Request Entity Too Large
    01-04-2021 16:07:23.104 ERROR ChunkedExternProcessor - stderr: Error handling chunk: urllib.error.HTTPError: HTTP Error 413: Request Entity Too Large
    01-04-2021 16:07:23.104 ERROR ChunkedExternProcessor - stderr: NoneType: None
    01-04-2021 16:07:23.105 ERROR ChunkedExternProcessor - Error in 'compute' command: Error handling chunk: urllib.error.HTTPError: HTTP Error 413: Request Entity Too Large
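
The traceback shows the ~3.1 MB request being rejected with HTTP 413 by the container's HTTP server. Besides raising the server's body-size limit, one possible client-side mitigation is to split the payload into sub-requests below that limit. A pure-Python sketch of the splitting logic only (the 1 MB limit is an arbitrary illustration; the actual limit depends on the container's HTTP server configuration):

```python
# Split a payload into pieces below a server's request-size limit.
MAX_BODY_BYTES = 1024 * 1024  # illustrative limit, not the real server setting

def split_payload(data: bytes, limit: int = MAX_BODY_BYTES):
    """Yield consecutive slices of `data`, each at most `limit` bytes."""
    for offset in range(0, len(data), limit):
        yield data[offset:offset + limit]

# A payload roughly the size reported in the traceback (~3.1 MB).
payload = b"x" * 3_101_728
chunks = list(split_payload(payload))
print(len(chunks))                                    # 3 requests instead of 1
print(all(len(c) <= MAX_BODY_BYTES for c in chunks))  # True
print(b"".join(chunks) == payload)                    # True: lossless split
```

Each chunk would then be sent as its own request, with the receiving side reassembling them; the reassembly protocol is not sketched here.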

Test GPU algorithm

Test run of GPU-enabled algorithms on:

  • AWS GPU node (cloud)
  • DGX GPU workstation (on-prem)

Migrate missing DLTK 3.3 algorithms

Continue migrating a few remaining algorithms from DLTK 3.3 into the new framework. Adjust dashboard searches and refactor notebooks a bit.

PM4Py algorithm example

Create a simple process mining notebook to retrieve the process flow graph from a given log data source with the help of pm4py algorithms.

H2O example

As a user I want to have a simple example for the H2O runtime so I can send data, train a model and return predictions.
