
Grok Library

A Java library for extracting structured data from unstructured data

This library was inspired by the Logstash grok interceptor (filter), available here:

http://logstash.net/docs/1.4.0/filters/grok

This grok library comes with predefined patterns:

https://github.com/aicer/grok/tree/master/src/main/resources/grok_built_in_patterns

However, you can also create your own custom named patterns.

SYNTAX

The syntax for patterns is as follows:

%{PATTERN_NAME:NAMED_GROUP_IN_RESULT}

For example, the following pattern

%{EMAIL:username} %{USERNAME:password} %{INT:yearOfBirth}

will extract an email address, a password, and a year of birth from the following string:

55BB778 - [email protected] secret123 4439 Valid Data Stream

PATTERN_NAME must be defined in the dictionary. The group names (username, password, and yearOfBirth in this example) are the keys used to retrieve the values from the extraction results.
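
To make the syntax concrete, the following is a minimal sketch of how a %{PATTERN_NAME:NAMED_GROUP_IN_RESULT} reference could be expanded into a Java regular expression with named capture groups. It uses only the JDK and is not the library's implementation: the class GrokSyntaxSketch, the method expand, and the two dictionary entries are hypothetical, and nested references inside dictionary entries are not resolved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the grok library
final class GrokSyntaxSketch {

  // Matches %{PATTERN_NAME} or %{PATTERN_NAME:groupName}
  private static final Pattern REFERENCE = Pattern.compile("%\\{(\\w+)(?::(\\w+))?\\}");

  // Replaces each reference with its dictionary regex, wrapped in a
  // named group when a group name is given (one level only, no nesting)
  static String expand(String expression, Map<String, String> dictionary) {
    Matcher m = REFERENCE.matcher(expression);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String regex = dictionary.get(m.group(1));
      if (regex == null) {
        throw new IllegalArgumentException("Undefined pattern: " + m.group(1));
      }
      String replacement = (m.group(2) == null)
          ? "(?:" + regex + ")"
          : "(?<" + m.group(2) + ">" + regex + ")";
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // Assumed stand-ins for dictionary entries
    Map<String, String> dictionary = new LinkedHashMap<>();
    dictionary.put("WORD", "\\w+");
    dictionary.put("INT", "[+-]?\\d+");

    String regex = expand("%{WORD:name} %{INT:age}", dictionary);
    System.out.println(regex);

    Matcher m = Pattern.compile(regex).matcher("id=7 alice 42 ok");
    if (m.find()) {
      System.out.println("name=" + m.group("name")); // name=alice
      System.out.println("age=" + m.group("age"));   // age=42
    }
  }
}
```

The group name after the colon becomes the key under which the matched text is retrieved, which is the role username, password and yearOfBirth play in the expression above.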

How to Include It as a Maven Dependency

<dependency>
    <groupId>org.aicer.grok</groupId>
    <artifactId>grok</artifactId>
    <version>0.9.0</version>
</dependency>

How to Use the Library

Patterns can be loaded in four ways by invoking the following methods on the dictionary object:

GrokDictionary.addBuiltInDictionaries()

This loads all the built-in dictionaries from the classpath.

GrokDictionary.addDictionary(File)

final GrokDictionary dictionary = new GrokDictionary();

// Load the built-in dictionaries
dictionary.addBuiltInDictionaries();

// Add custom pattern
dictionary.addDictionary(new File(patternDirectoryOrFilePath));

// Resolve all expressions loaded
dictionary.bind();

Here custom patterns are loaded into the dictionary by passing in a File object representing the file or directory where the patterns are stored.

GrokDictionary.addDictionary(InputStream)

Here custom patterns are loaded into the dictionary by passing in an InputStream containing the named expressions.

GrokDictionary.addDictionary(Reader)

Here a custom pattern is added by passing in a Reader containing the named pattern.

final GrokDictionary dictionary = new GrokDictionary();

// Load the built-in dictionaries
dictionary.addBuiltInDictionaries();

// Add custom pattern
dictionary.addDictionary(new StringReader("DOMAINTLD [a-zA-Z]+"));
dictionary.addDictionary(new StringReader("EMAIL %{NOTSPACE}@%{WORD}\\.%{DOMAINTLD}"));

// Resolve all expressions loaded
dictionary.bind();
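
The StringReader calls above suggest that each definition is a pattern name, then whitespace, then the expression. As a rough sketch of that layout, the code below reads such definitions from a Reader using only the JDK. It is not the library's parser: DictionaryLineSketch and parse are hypothetical names, and skipping blank lines and lines that start with # is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the grok library
final class DictionaryLineSketch {

  // Reads "NAME <whitespace> EXPRESSION" lines into an insertion-ordered map
  static Map<String, String> parse(Reader reader) {
    Map<String, String> definitions = new LinkedHashMap<>();
    try (BufferedReader in = new BufferedReader(reader)) {
      String line;
      while ((line = in.readLine()) != null) {
        String trimmed = line.trim();
        // Assumption: blank lines and lines starting with '#' carry no definition
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
          continue;
        }
        // Split on the first run of whitespace only; the expression may contain spaces
        String[] parts = trimmed.split("\\s+", 2);
        if (parts.length < 2) {
          throw new IllegalArgumentException("Missing expression for pattern: " + parts[0]);
        }
        definitions.put(parts[0], parts[1]);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return definitions;
  }

  public static void main(String[] args) {
    String source = "DOMAINTLD [a-zA-Z]+\n"
        + "EMAIL %{NOTSPACE}@%{WORD}\\.%{DOMAINTLD}\n";
    for (Map.Entry<String, String> entry : parse(new StringReader(source)).entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```

Note the doubled backslash in the Java string literal: the expression the parser sees contains a single backslash followed by a dot.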

Example of How to Use The Library

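import java.util.Map;

// GrokDictionary and Grok are provided by the grok library and must also be imported

public final class GrokStage {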

  private static final void displayResults(final Map<String, String> results) {
    if (results != null) {
      for(Map.Entry<String, String> entry : results.entrySet()) {
        System.out.println(entry.getKey() + "=" + entry.getValue());
      }
    }
  }

  public static void main(String[] args) {

    final String rawDataLine1 = "1234567 - [email protected] cc55ZZ35 1789 Hello Grok";
    final String rawDataLine2 = "98AA541 - [email protected] mmddgg22 8800 Hello Grok";
    final String rawDataLine3 = "55BB778 - [email protected] secret123 4439 Valid Data Stream";

    final String expression = "%{EMAIL:username} %{USERNAME:password} %{INT:yearOfBirth}";

    final GrokDictionary dictionary = new GrokDictionary();

    // Load the built-in dictionaries
    dictionary.addBuiltInDictionaries();

    // Resolve all expressions loaded
    dictionary.bind();

    // Take a look at how many expressions have been loaded
    System.out.println("Dictionary Size: " + dictionary.getDictionarySize());

    Grok compiledPattern = dictionary.compileExpression(expression);

    displayResults(compiledPattern.extractNamedGroups(rawDataLine1));
    displayResults(compiledPattern.extractNamedGroups(rawDataLine2));
    displayResults(compiledPattern.extractNamedGroups(rawDataLine3));
  }
}

Which gives the following output:

Dictionary Size: 91

username=[email protected]
password=cc55ZZ35
yearOfBirth=1789

username=[email protected]
password=mmddgg22
yearOfBirth=8800

username=[email protected]
password=secret123
yearOfBirth=4439
password=secret123
yearOfBirth=4439
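
Every extracted value comes back as a String, and the null check in displayResults suggests the results map may be null when nothing matches. The sketch below shows one way to read a numeric group such as yearOfBirth defensively. It uses only the JDK, with a hand-built map standing in for the extraction results; ResultConversionSketch and intValue are hypothetical names, not library API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical illustration only; not part of the grok library
final class ResultConversionSketch {

  // Returns the named value as an int, or empty when it is absent or not a valid integer
  static OptionalInt intValue(Map<String, String> results, String groupName) {
    if (results == null) {
      return OptionalInt.empty();
    }
    String raw = results.get(groupName);
    if (raw == null) {
      return OptionalInt.empty();
    }
    try {
      return OptionalInt.of(Integer.parseInt(raw.trim()));
    } catch (NumberFormatException e) {
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    // Hand-built stand-in for the map returned by extractNamedGroups
    Map<String, String> results = new HashMap<>();
    results.put("password", "secret123");
    results.put("yearOfBirth", "4439");

    System.out.println(intValue(results, "yearOfBirth")); // OptionalInt[4439]
    System.out.println(intValue(results, "password"));    // OptionalInt.empty
  }
}
```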
