marusyk / grok.net

.NET implementation of grok

License: MIT License

C# 94.46% PowerShell 5.48% Shell 0.06%
grok grok-parser grok-patterns grokking c-sharp-library dotnet-standard nuget nuget-package hacktoberfest csharp

grok.net's Introduction

Grok

Cross-platform .NET grok implementation as a NuGet package


How to Install

Install as a library from NuGet:

Grok.Net

PM> Install-Package Grok.Net

Install as a PowerShell module from the PowerShell Gallery:

Grok

Install-Module -Name Grok

Dependency

Since v2.0.0, grok.net uses the PCRE.NET library for regular expressions.

What is grok

Grok is a great way to parse unstructured log data into something structured and queryable. It sits on top of regular expressions (regex) and uses text patterns to match lines in log files.

A great way to get started with building your grok filters is this grok debug tool: https://grokdebugger.com

What can I use Grok for?

  • reporting errors and other patterns from logs and processes
  • parsing complex text output and converting it to JSON for external processing
  • applying 'write-once use-everywhere' to regular expressions
  • automatically providing patterns for unknown text inputs (logs you want patterns generated for future matching)

The syntax for a grok pattern is %{SYNTAX:SEMANTIC}

SYNTAX is the name of the pattern that will match your text. SEMANTIC is the key under which the matched text is stored.

For example, 3.44 will be matched by the NUMBER pattern, and 55.3.244.1 will be matched by the IP pattern. 3.44 could be the duration of an event, so you could call it simply duration. Further, a string 55.3.244.1 might identify the client making a request. For the above example, your grok filter would look something like this:

%{NUMBER:duration} %{IP:client}

Examples: With that idea of syntax and semantics, we can pull out useful fields from a sample log like this fictional HTTP request log:

55.3.244.1 GET /index.html 15824 0.043

The pattern for this could be:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
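
To make the %{SYNTAX:SEMANTIC} idea concrete, here is a minimal sketch of how such tokens could be expanded into named capture groups using only the built-in .NET Regex class. It is not how grok.net is implemented: the Expand helper and the four simplified pattern definitions are assumptions made for illustration, and the real grok pattern files are far more thorough.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, heavily simplified pattern definitions (assumptions for this sketch).
var patterns = new Dictionary<string, string>
{
    ["IP"] = @"\d{1,3}(?:\.\d{1,3}){3}",
    ["WORD"] = @"\w+",
    ["URIPATHPARAM"] = @"/\S*",
    ["NUMBER"] = @"\d+(?:\.\d+)?",
};

// Replace each %{SYNTAX:SEMANTIC} token with a named capture group.
// An unknown SYNTAX name throws KeyNotFoundException.
string Expand(string grokPattern) =>
    Regex.Replace(grokPattern, @"%\{(\w+):(\w+)\}",
        m => $"(?<{m.Groups[2].Value}>{patterns[m.Groups[1].Value]})");

string regex = Expand("%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}");
Match match = Regex.Match("55.3.244.1 GET /index.html 15824 0.043", regex);

// Print every field captured from the sample request line.
foreach (string name in new[] { "client", "method", "request", "bytes", "duration" })
{
    Console.WriteLine($"{name} : {match.Groups[name].Value}");
}
```

Each SEMANTIC name becomes a regex group name, which is why the same NUMBER pattern can be reused for both bytes and duration.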

More about grok

How to use

Create a new instance with grok pattern:

Grok grok = new Grok("%{MONTHDAY:month}-%{MONTHDAY:day}-%{MONTHDAY:year} %{TIME:timestamp};%{WORD:id};%{LOGLEVEL:loglevel};%{WORD:func};%{GREEDYDATA:msg}");

Then prepare some logs to parse:

string logs = @"06-21-19 21:00:13:589241;15;INFO;main;DECODED: 775233900043 DECODED BY: 18500738 DISTANCE: 1.5165
                06-22-19 22:00:13:589265;156;WARN;main;DECODED: 775233900043 EMPTY DISTANCE: --------";

Now you are ready to parse and print the result:

var grokResult = grok.Parse(logs);
foreach (var item in grokResult)
{
    Console.WriteLine($"{item.Key} : {item.Value}");
}

output:

month : 06
day : 21
year : 19
timestamp : 21:00:13:589241
id : 15
loglevel : INFO
func : main
msg : DECODED: 775233900043 DECODED BY: 18500738 DISTANCE: 1.5165
month : 06
day : 22
year : 19
timestamp : 22:00:13:589265
id : 156
loglevel : WARN
func : main
msg : DECODED: 775233900043 EMPTY DISTANCE: --------

or use ToDictionary() on grokResult to get the result as IReadOnlyDictionary<string, IEnumerable<object>>
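
The shape of that grouped result can be pictured with plain LINQ. The sketch below does not call grok.net at all: the flat items list is a hypothetical stand-in for the parsed key/value pairs, and the grouping shown is only an assumption about what a key-to-values view looks like, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat result: one key/value pair per captured field.
var items = new List<KeyValuePair<string, object>>
{
    new("day", "21"),
    new("loglevel", "INFO"),
    new("day", "22"),
    new("loglevel", "WARN"),
};

// Collect all values that share a key into one list per key.
Dictionary<string, List<object>> grouped = items
    .GroupBy(item => item.Key)
    .ToDictionary(g => g.Key, g => g.Select(item => item.Value).ToList());

foreach (var (key, values) in grouped)
{
    Console.WriteLine($"{key} : {string.Join(", ", values)}");
}
```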

Custom grok patterns

You can also add your own patterns.

using file

Create a file and write the pattern you need as the pattern name, space, and then the regexp for that pattern.

For example, Patterns\grok-custom-patterns:

ZIPCODE [1-9]{1}[0-9]{2}\s{0,1}[0-9]{3}

then load the file and pass the stream to Grok:

FileStream customPatterns = System.IO.File.OpenRead(@"Patterns\grok-custom-patterns");
Grok grok = new Grok("%{ZIPCODE:zipcode}:%{EMAILADDRESS:email}", customPatterns);
var grokResult = grok.Parse($"122001:[email protected]");

using in-memory

Define a collection of patterns

var custom = new Dictionary<string, string>
{
    {"BASE64", "(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$"}
};

and use it as follows

var grok = new Grok("Basic %{BASE64:credentials}", custom);
GrokResult grokResult = grok.Parse("Basic YWRtaW46cGEkJHdvcmQ=");

PowerShell Module

Install and use Grok as a PowerShell module:

grok -i "06-21-19 21:00:13:589241;15;INFO;main;DECODED: 775233900043 DECODED BY: 18500738 DISTANCE: 1.5165" -g "%{MONTHDAY:month}-%{MONTHDAY:day}-%{MONTHDAY:year} %{TIME:timestamp};%{WORD:id};%{LOGLEVEL:loglevel};%{WORD:func};%{GREEDYDATA:msg}"

To get help, use the help grok command.

Build

On Windows:

build.ps1

On Linux/Mac:

build.sh

Contributing

Would you like to help make grok.net even better? We keep a list of issues that are approachable for newcomers under the good-first-issue label.

Also, please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests to us.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Thanks to @martinjt. The project is based on martinjt/grokdotnet.

grok.net's People

Contributors

adityanr, ahmadtantowi, demigodplayz, dinkrr, eddami, euronay, farzammohammadi, giomaz, halilkocaoz, iblackshadow, justin-lloyd, manasvigoyal, marce1994, marusyk, mattfromrva, nik-base, nikhil-1503, ricsilt, sandreas, shikharmittal

grok.net's Issues

Add Custom Grok Patterns Documentation

From the code it looks like it loads the file locally to bring in grok patterns. It'd be great if I could provide a supplemental file with custom patterns to also include. Or if that already exists it'd be great to see how to do so.

Tips, improvements, suggestions

This project is a simple implementation of a grok parser. It would be great to have some suggestions on how it can be improved, and maybe some ideas for new features.

Enable PackageValidation tool

The SDK provides a tool to validate NuGet packages right after creating them. At the moment, it provides the following checks:

  • Validates that there are no breaking changes across versions
  • Validates that the package has the same set of public APIs for all the different runtime-specific implementations
  • Helps developers catch any applicability holes

To enable it, you can add the following property to your project file:

<Project>
  <PropertyGroup>
    <EnablePackageValidation>true</EnablePackageValidation>

    <!-- Optional: Detect breaking changes from a previous stable version -->
    <PackageValidationBaselineVersion>1.0.0</PackageValidationBaselineVersion>
  </PropertyGroup>
</Project>

Starting

Hi. I would like to help out on this project. Is there a way to get started?

Grok Pattern Validation Enhancement

Issue Description:
The current implementation of Grok does not validate if the Grok patterns specified in an expression are defined in the loaded patterns. This can lead to runtime errors when there are typos or usage of undefined patterns.

Proposed Enhancement:
I propose adding a method to validate Grok patterns against the set of loaded patterns. This method will ensure each pattern used in a Grok expression is defined. Here's the proposed implementation:

private void ValidateGrokPattern(string grokPattern)
{
    var grokPatternRegex = new Regex("%\\{(.*?)(?::\\w+)?\\}");
    MatchCollection matches = grokPatternRegex.Matches(grokPattern);

    foreach (Match match in matches.Cast<Match>())
    {
        var patternName = match.Groups[1].Value;
        if (!_patterns.ContainsKey(patternName))
        {
            throw new FormatException($"Invalid Grok pattern: Pattern '{patternName}' not found.");
        }
    }
}

If you agree on this enhancement I can be assigned this issue and will submit a PR. Thank you.

Invalid custom patterns are silently ignored

Hello,

First, I would like to thank you for this awesome project. Very helpful! During my tests I noticed something that might be a little confusing, or that at least leaves room for improvement...

I used the following code with a custom pattern:

var stream = new MemoryStream(Encoding.Default.GetBytes("NOTDIRSEP [^/\\]*"));
var grok = new Grok("input/%{NOTDIRSEP:genre}/", stream);
var result = grok.Parse("input/Fantasy/");

The result was empty, with no errors. After some debugging I noticed that I had accidentally used an invalid pattern (escaping is hard! ;-) ):

# invalid pattern
NOTDIRSEP [^/\\]*

# valid pattern
NOTDIRSEP [^/\\\\]*

So, this produced the expected result:

var stream = new MemoryStream(Encoding.Default.GetBytes("NOTDIRSEP [^/\\\\]*"));
var grok = new Grok("input/%{NOTDIRSEP:genre}/", stream);
var result = grok.Parse("input/Fantasy/");

It turned out that the invalid pattern was silently ignored, with no way to handle the error except by running the same regex check manually beforehand. See

Regex.Match("", strArray[1]);

There is also a little inconsistency:

  • LoadCustomPatterns may throw a new FormatException("Custom pattern was not in a correct form"); when the pattern is not separated by (space) - called in a constructor, which seems a bit of a code smell
  • Invalid regular expressions are silently ignored

I think there should be at least something like public List<(string, string)> InvalidCustomPatterns, while adding a list item (customPattern, errorMessage), if the Regex-Check fails - or, since there already is an exception, just rethrowing it with a more detailed error.

What do you think?
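
One way to picture the suggestion is a pre-check that collects every pattern whose regex fails to compile, together with the error message. This is only a sketch under stated assumptions: FindInvalidPatterns is a hypothetical helper that is not part of grok.net, and it compiles with the built-in .NET Regex class, whereas the library uses PCRE.NET since v2.0.0, so the two engines may disagree on edge cases. References to other grok patterns are not resolved here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (pattern name, error message) for each
// expression that the .NET regex engine refuses to compile.
List<(string Name, string Error)> FindInvalidPatterns(IReadOnlyDictionary<string, string> patterns)
{
    var invalid = new List<(string Name, string Error)>();
    foreach (var (name, expression) in patterns)
    {
        try
        {
            _ = new Regex(expression); // throws on a malformed expression
        }
        catch (ArgumentException ex)
        {
            invalid.Add((name, ex.Message));
        }
    }
    return invalid;
}

var problems = FindInvalidPatterns(new Dictionary<string, string>
{
    ["NOTDIRSEP"] = @"[^/\]*",        // unterminated character class
    ["NOTDIRSEP_FIXED"] = @"[^/\\]*", // escaped backslash, compiles fine
});

foreach (var (name, error) in problems)
{
    Console.WriteLine($"{name}: {error}");
}
```

Returning the failures as a list lets the caller decide whether to log them, skip the patterns, or throw.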

Multiline strings

Hi,

Since we're essentially passing a regex down to Grok, and Regex supports matching across newlines via RegexOptions.Singleline, could this be supported in the package? I have some logs that unfortunately contain \n characters, and I don't want to create a new string via a .Replace() call. As it stands now, when I try to parse such a string I only get items up to the newline :(
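
The behaviour described here can be reproduced with the built-in .NET Regex class alone. The sketch below is not grok.net code; the sample log line and the pattern are assumptions chosen to show how RegexOptions.Singleline changes what '.' matches.

```csharp
using System;
using System.Text.RegularExpressions;

string log = "INFO;first line\nsecond line";
string pattern = @"(?<level>\w+);(?<msg>.*)";

// By default '.' does not match '\n', so the capture stops at the newline.
string defaultMsg = Regex.Match(log, pattern).Groups["msg"].Value;

// With Singleline, '.' matches every character, including '\n'.
string singlelineMsg = Regex.Match(log, pattern, RegexOptions.Singleline).Groups["msg"].Value;

Console.WriteLine(defaultMsg.Length);    // prints 10
Console.WriteLine(singlelineMsg.Length); // prints 22
```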

Add documentation of custom grok patterns

Context

The support of custom grok patterns was added in #15

Just add the directory with the name Patterns and a file (the file name doesn't matter) with your own patterns.

Like Patterns\grok-custom-patterns:

ZIPCODE [1-9]{1}[0-9]{2}\s{0,1}[0-9]{3}

and use:

Grok grok = new Grok("%{ZIPCODE:zipcode}:%{EMAILADDRESS:email}");
var grokResult = grok.Parse($"122001:[email protected]");

DoD

Describe a new feature in README

Base64 content detection

Hi,

I'd like to add the ability to grok base64 strings from text.
I would add a pattern to detect Base64 to grok-patterns and then have a validator to run over any matches to ensure they truly were base64 encoded using something like below on each match and filtering out the ones that return false:

public static bool IsBase64String(string base64)
{
    Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
    return Convert.TryFromBase64String(base64, buffer, out int bytesParsed);
}

As per contributing guidelines, I'm raising an issue for discussion and if approved, I'll put a PR together.
Thanks :)

Include symbols (PDB) for binary

Consumers should be able to debug the code of the package if something doesn't work as expected.
Use <DebugType>embedded</DebugType>, include the PDBs in the package, or use a symbol package (snupkg).
The PDB should use the portable format to be compatible with all platforms. The file is also smaller than one in the Windows PDB format.

GrokResult as a dictionary

Currently, the result is a list. This code:

Grok grok = new Grok("%{MONTHDAY:month}-%{MONTHDAY:day}-%{MONTHDAY:year} %{TIME:timestamp};%{WORD:id};%{LOGLEVEL:loglevel};%{WORD:func};%{GREEDYDATA:msg}");
string logs = @"06-21-19 21:00:13:589241;15;INFO;main;DECODED: 775233900043 DECODED BY: 18500738 DISTANCE: 1.5165
                06-22-19 22:00:13:589265;156;WARN;main;DECODED: 775233900043 EMPTY DISTANCE: --------";
var grokResult = grok.Parse(logs);
foreach (var item in grokResult)
{
  Console.WriteLine($"{item.Key} : {item.Value}");
}

will print :

month : 06
day : 21
year : 19
timestamp : 21:00:13:589241
id : 15
loglevel : INFO
func : main
msg : DECODED: 775233900043 DECODED BY: 18500738 DISTANCE: 1.5165
month : 06
day : 22
year : 19
timestamp : 22:00:13:589265
id : 156
loglevel : WARN
func : main
msg : DECODED: 775233900043 EMPTY DISTANCE: --------

It could be really useful to group the values by key, e.g. grokResult.AsDictionary(); to get:

month:
        06
        06
day:
        21
        22
year:
        19
        19
timestamp:
        21:00:13:589241
        22:00:13:589265
id:
        15
        156
loglevel:
        INFO
        WARN
func:
        main
        main
msg:
        DECODED: 775233900043 DECODED BY: 18500738 DISTANCE: 1.5165
        DECODED: 775233900043 EMPTY DISTANCE: --------

Add validation of custom grok patterns file

Context

The support of custom grok patterns was added in #15

Just add the directory with name Patterns and a file (the file name doesn't matter) with your own patterns.

Like Patterns\grok-custom-patterns:

ZIPCODE [1-9]{1}[0-9]{2}\s{0,1}[0-9]{3}

and use:

Grok grok = new Grok("%{ZIPCODE:zipcode}:%{EMAILADDRESS:email}");
var grokResult = grok.Parse($"122001:[email protected]");

IndexOutOfRangeException

There is an issue when that custom file has the wrong content.

How to reproduce

Add to the Patterns\grok-custom-patterns the following data:

pattern

How to fix

Add some validation on load of custom grok file content. It should be pattern name, space, then the regexp for that pattern
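
The check described above could look roughly like the following. This is a sketch, not the fix that went into the library: ParsePatternLine is a hypothetical helper, and the only thing taken from the project is the expected line format (pattern name, space, regexp).

```csharp
using System;

// Hypothetical helper: splits "NAME REGEX" at the first space and throws
// a FormatException instead of indexing past the end of the array.
(string Name, string Expression) ParsePatternLine(string line)
{
    string[] parts = line.Trim().Split(' ', 2);
    if (parts.Length != 2 || parts[0].Length == 0 || parts[1].Length == 0)
    {
        throw new FormatException($"Custom pattern was not in a correct form: '{line}'");
    }
    return (parts[0], parts[1]);
}

// A well-formed line yields the name and the expression.
var zip = ParsePatternLine(@"ZIPCODE [1-9]{1}[0-9]{2}\s{0,1}[0-9]{3}");
Console.WriteLine($"{zip.Name} -> {zip.Expression}");

// The content from the reproduction steps is rejected with a clear error.
try
{
    ParsePatternLine("pattern");
}
catch (FormatException ex)
{
    Console.WriteLine(ex.Message);
}
```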

Workflow improvements

The workflow in this repository should be both a minimalistic and a working example of publishing your own library to NuGet. What counts as "good practice" here needs some discussion. Here are some topics to discuss before implementing anything:

All with respect to versions/tags:

  • take a version of the library from .csproj?
  • when to tag?
  • when to create a release?
  • when and how to trigger publishing NuGet package?

Use a faster regular expression engine

The Regex class in .NET uses the built-in .NET regular expression engine. It is a good general-purpose engine, but it is not the fastest. If we need the best possible performance, we can switch to a faster engine such as PCRE, which is available for free.

We can use PCRE from C# through PCRE.NET (https://github.com/ltrzesniewski/pcre-net), which provides a .NET wrapper for the PCRE library.
