joni's Introduction

joni

Java port of Oniguruma regexp library

Usage

Imports

    import java.util.Iterator;

    import org.jcodings.specific.UTF8Encoding;
    import org.joni.Matcher;
    import org.joni.NameEntry;
    import org.joni.Option;
    import org.joni.Regex;
    import org.joni.Region;

Matching

    byte[] pattern = "a*".getBytes();
    byte[] str = "aaa".getBytes();

    Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE);
    Matcher matcher = regex.matcher(str);
    // result is the byte index where the match starts, or -1 if nothing matched
    int result = matcher.search(0, str.length, Option.DEFAULT);

Using captures

    byte[] pattern = "(a*)".getBytes();
    byte[] str = "aaa".getBytes();

    Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE);
    Matcher matcher = regex.matcher(str);
    int result = matcher.search(0, str.length, Option.DEFAULT);
    if (result != -1) {
        Region region = matcher.getEagerRegion();
    }

Using named captures

    byte[] pattern = "(?<name>a*)".getBytes();
    byte[] str = "aaa".getBytes();

    Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE);
    Matcher matcher = regex.matcher(str);
    int result = matcher.search(0, str.length, Option.DEFAULT);
    if (result != -1) {
        Region region = matcher.getEagerRegion();
        for (Iterator<NameEntry> entry = regex.namedBackrefIterator(); entry.hasNext();) {
            NameEntry e = entry.next();
            int number = e.getBackRefs()[0]; // can have many refs per name
            // int begin = region.beg[number];
            // int end = region.end[number];
        }
    }

License

Joni is released under the MIT License.

joni's People

Contributors

anba, angelozerr, arthurscchan, bbrowning, chenzhang22, dependabot[bot], edwardbetts, enebo, haozhun, headius, henrich, henry-thompson, jirkamarsik, joelhockey, jordansissel, kares, kishorkunal-raj, lopex, michaelklishin, nezda, nicksieger, qmx, sebthom

joni's Issues

regexp causes hang in jruby but terminates in MRI

In MRI 2.6:

% ruby -v 
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18]
% ruby -e 'puts "foo========:bar baz================================================bingo".scan(/(?:=+=+)+:/)'
========:

With Latest JRuby snapshot:

/tmp/jruby-9.2.8.0-SNAPSHOT % java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)
/tmp/jruby-9.2.8.0-SNAPSHOT % jruby -v
jruby 9.2.8.0-SNAPSHOT (2.5.3) 2019-07-19 b416404 OpenJDK 64-Bit Server VM 11.0.1+13 on 11.0.1+13 +jit [darwin-x86_64]
/tmp/jruby-9.2.8.0-SNAPSHOT % jruby -e 'puts "foo========:bar baz================================================bingo".scan(/(?:=+=+)+:/)'
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jruby.util.SecurityHelper to field java.lang.reflect.Field.modifiers
WARNING: Please consider reporting this to the maintainers of org.jruby.util.SecurityHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

[edit] Also hangs on jruby 1.7.27, 9.2.5.0, 9.2.7.0

Region named capture (-1--1)

Hi,

Can you give me some feedback on this issue?

I'm trying to match named capture groups in multiline byte[] content and am getting a (-1, -1) index range for group 2 with pattern A.

Pattern A is: (?<loss>[0-9.]{1,5}%)|(?<rtt>dev = .*)

However, pattern B works fine for content without newlines (\n).

Pattern B is: (?<loss>[0-9.]{1,5}%).*(?<rtt>dev = .*)

Debug regex on: https://regex101.com/r/y2ER1a/1

Content (with multiline) is:
Content >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=57 time=12.934 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=13.145 ms

--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 12.934/13.040/13.145/0.106 ms
Content <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

My debug output is:

result = 226
D region Region:
0: (226-230) 1: (226-230) 2: (-1--1)
D nameEntry loss 1
loss -> 226, 230
0.0%
D nameEntry rtt 2
java.lang.StringIndexOutOfBoundsException: String index out of range: -1

My code is:

public class RegexOutputPluginHandler {

private byte[] patternBytes;
private Regex regex = null;


public RegexOutputPluginHandler() {
	String pattern = "(?<loss>[0-9\\.]{1,5}%)|(?<rtt>dev = .*)";
	
	if (pattern != null) {
		this.patternBytes = pattern.getBytes();
		this.regex = new Regex(this.patternBytes, 0, this.patternBytes.length, Option.MULTILINE, UTF8Encoding.INSTANCE);
	}
	
}


public Map<String, Object> extract(byte[] content) {
	
	System.err.println("D content " + content);

	if (content == null) {
		return null;
	}
	
	System.err.println("D content len " + content.length);

	Map<String, Object> fields = new HashMap<String, Object>();

	Matcher matcher = regex.matcher(content);
	int result = matcher.search(0, content.length, Option.MULTILINE);

	System.out.println("result = " + result);

	if (result != -1) {
		Region region = matcher.getEagerRegion();
		
		System.out.println("D region " + region.toString());
		
		for (Iterator<NameEntry> entry = regex.namedBackrefIterator(); entry.hasNext();) {
			NameEntry e = entry.next();
			
			System.out.println("D nameEntry " + e.toString());
			
			int number = e.getBackRefs()[0]; // can have many refs per name
			int begin = region.beg[number];
			int end = region.end[number];

			String fieldName = new String(e.name, e.nameP, e.nameEnd - e.nameP);
			String fieldContent = new String(content, begin, end - begin);


			System.out.println(fieldName + " -> " + begin + ", " + end);
			System.out.println(fieldContent);

		}
	} else {
		System.err.println("D matcher none");
	}
	
	return fields;
}

}
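
A note on the output above: pattern A is an alternation, so a single search can only fill one of the two groups, and the group that did not take part is reported as (-1, -1). The sketch below shows the guard using java.util.regex as a stand-in for joni (an assumption made so the example runs without extra dependencies); with joni the equivalent check would be on region.beg[number] before building the String. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupGuard {
    // Same alternation as pattern A: only one branch can match per search.
    static final Pattern PATTERN = Pattern.compile("(?<loss>[0-9.]{1,5}%)|(?<rtt>dev = .*)");
    static final String[] NAMES = {"loss", "rtt"};

    /** Returns only the named groups that took part in the first match. */
    static Map<String, String> extract(String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PATTERN.matcher(content);
        if (m.find()) {
            for (String name : NAMES) {
                int begin = m.start(name);   // -1 when the group did not participate
                if (begin < 0) continue;     // skip instead of slicing with -1
                fields.put(name, content.substring(begin, m.end(name)));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String content = "2 packets transmitted, 2 packets received, 0.0% packet loss\n"
                + "round-trip min/avg/max/stddev = 12.934/13.040/13.145/0.106 ms";
        System.out.println(extract(content));
    }
}
```

To collect both fields from one input, the search would have to be repeated from the end of the previous match, since each search reports a single alternative.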

how to interrupt hanging thread?

Calling thread.interrupt() has no effect. How can I stop the hanging thread?

Charset _charset = Charset.forName("GB18030");
/* text containing irregular binary data will make thread hang */
Thread thread = new Thread(new Runnable() {
	@Override
	public void run() {
		try {
			String key = "a"; // any character
			byte[] pattern = key.getBytes(_charset);

			Regex regex = new Regex(pattern, 0, pattern.length, Option.IGNORECASE, GB18030Encoding.INSTANCE);

			byte[] source = new byte[]{0x2f, 0x2f, (byte) 0xaf}; // text content.
			/* Encoded by GB18030, It reads "//๏ฟฝ" where ๏ฟฝ means that "0xaf" is wrong or unsupported? */
			System.out.println(new String(source, _charset));

			Matcher matcher = regex.matcher(source);
			// search Interruptible ?
			int idx=matcher.searchInterruptible(0, source.length, Option.DEFAULT);
			System.out.println(idx+"");
		} catch (InterruptedException e) {
			System.out.println("InterruptedException");
			e.printStackTrace();
		}
	}
});

thread.start();

new Thread(new Runnable() {
	@Override
	public void run() {
		try {
			Thread.sleep(500);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
		System.out.println("interrupt !!! ");
		thread.interrupt();  // called but not working. 
	}
}).start();

v2.1.30

joni seems to be 1.5 times slower than simple JNI bindings

Steps to reproduce

  1. onig4j-v003-src.zip
  2. Update jni/Makefile with proper JAVA_HOME and then call make
  3. Update lib location in src/onig4j/OnigRegex.java
  4. Run OnigPerformanceTest

We got the following results:
java: 4261ms
joni: 5798ms
onig: 3511ms
tm4e: 18ms

With a straightforward approach joni is about 1.5 times slower than oniguruma bindings.

The large speedup in tm4e seems to come from src/org/eclipse/tm4e/core/internal/oniguruma/OnigRegExp.java:49: if a regexp is applied repeatedly to the same string, it simply returns the latest cached match result.
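
To illustrate why such a cache distorts a benchmark that searches the same string over and over, here is a minimal sketch of a last-result cache. It is not the tm4e code; the class name, fields, and the use of java.util.regex are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastMatchCache {
    private final Pattern pattern;
    private String lastInput;      // input of the most recent search
    private int lastStart = -1;    // start offset of the most recent search
    private int lastResult = -1;   // cached match index, -1 for no match
    int realSearches = 0;          // counts searches that actually ran the engine

    LastMatchCache(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    int search(String input, int start) {
        // Identity comparison on purpose: it is cheap and catches the
        // "same string object searched again" case a benchmark loop produces.
        if (input == lastInput && start == lastStart) {
            return lastResult;
        }
        realSearches++;
        Matcher m = pattern.matcher(input);
        lastResult = m.find(start) ? m.start() : -1;
        lastInput = input;
        lastStart = start;
        return lastResult;
    }
}
```

With a wrapper like this, a loop of N identical searches runs the engine once, so the measured time says little about raw matching speed.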

Failed to parse textmate regex: invalid pattern in look-behind

There is a pattern to match a string from 0 to 71 chars long: (?<=^.{0,71}).
I use it in a lot of cases in different places.
As I discovered, there is no functionality for handling lookbehind/lookahead together with the {min,max} quantifier.
Is there any workaround/planned development for this? Thank you

Temporarily commented out 'character class has duplicated range'

In 2.1.14 we updated some data and this uncovered some issues with joni and JRuby interactions involving warnings. The main visible issue is some regexps are generating the warning:

character class has duplicated range

This warning sometimes comes from internal expansions (like \X). If an expansion produces duplicates internally, we definitely do not want end users to be warned. We fixed one case where we were making a regexp UTF-8 when it shouldn't have been, but we are still seeing other cases we have missed.

Joni's design compounds this issue because some constructor paths use the DEFAULT WarnCallback, which is literally a System.err.println() call. This means we cannot change anything in JRuby specifically to avoid this potentially being used, since not all joni Regex code is from JRuby core. We also have native extension authors who might be calling a constructor using DEFAULT.

That probably was not a super clear description but the solution should be reasonably easy to follow:

  • uncomment warn for 'character class has duplicated range' (in ScanEnvironment)
  • Add ability to register a default WarnCallback handler
  • (on jruby side) use this new register API

Additional things to do:

  • (on jruby side) audit all regex constructors and figure out where our remaining duplicated class warnings are coming from
  • augment the joni warning to include the actual regexp that generates the warning (MRI does print out the failing regexp). A warn(message, regexp) API would be great for debugging issues like this, so we should change joni to work that way.
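
The proposal above (a registerable default handler plus a two-argument warn) could look roughly like the following. This is only a sketch of the idea: WarnRegistry, Callback, and register are hypothetical names and are not part of joni's API.

```java
final class WarnRegistry {
    /** Hypothetical two-argument callback: the message plus the offending regexp source. */
    interface Callback {
        void warn(String message, String regexp);
    }

    // The initial default mirrors the behaviour described above: print to stderr.
    private static volatile Callback defaultCallback =
            (message, regexp) -> System.err.println(message + ": /" + regexp + "/");

    /** Replaces the process-wide default, e.g. so a host runtime can route or filter warnings. */
    static void register(Callback callback) {
        if (callback == null) {
            throw new NullPointerException("callback");
        }
        defaultCallback = callback;
    }

    /** Called by constructor paths that were not given an explicit callback. */
    static void warn(String message, String regexp) {
        defaultCallback.warn(message, regexp);
    }
}
```

A host such as JRuby could then call register once at startup, and every constructor path that falls back to the default would go through the host's handler.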

\g with a non-existent subexpression name leads to java.lang.StringIndexOutOfBoundsException

If the subexpression name doesn't exist (e.g. "\\gA"), a java.lang.StringIndexOutOfBoundsException is thrown:

offset 4, count 7, length 8 (java.lang.StringIndexOutOfBoundsException)
	from java.lang.String.checkBoundsOffCount(String.java:4587)
	from java.lang.String.<init>(String.java:523)
	from java.lang.String.<init>(String.java:1413)
	from org.joni.Lexer.syntaxWarn(Lexer.java:1327)
	from org.joni.Lexer.fetchTokenFor_subexpCall(Lexer.java:916)
	from org.joni.Lexer.fetchToken(Lexer.java:1152)
	from org.joni.Parser.parseRegexp(Parser.java:1383)
	from org.joni.Analyser.compile(Analyser.java:78)
	from org.joni.Regex.<init>(Regex.java:155)
	from org.joni.Regex.<init>(Regex.java:134)
...

Notes

It seems the issue is in the following code:

// src/org/joni/Lexer.java
    protected final void syntaxWarn(String message) {
        if (env.warnings != WarnCallback.NONE) {
            env.warnings.warn(message + ": /" + new String(bytes, getBegin(), getEnd()) + "/");
        }
    }

new String(bytes, getBegin(), getEnd()) should be replaced with new String(bytes, getBegin(), getEnd() - getBegin()), because the String constructor takes offset and length arguments rather than start and end indices.
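
The offset/length distinction can be reproduced with the JDK alone. The sketch below uses made-up data chosen to mirror the "offset 4, count 7, length 8" message in the trace; slice is a hypothetical helper, not joni code.

```java
import java.nio.charset.StandardCharsets;

final class SliceDemo {
    /** Converts a [begin, end) byte range to a String; the constructor wants a length. */
    static String slice(byte[] bytes, int begin, int end) {
        return new String(bytes, begin, end - begin, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "abcdefgh".getBytes(StandardCharsets.UTF_8); // length 8
        System.out.println(slice(bytes, 4, 7));
        try {
            // Passing the end index where a length is expected: 4 + 7 > 8.
            new String(bytes, 4, 7, StandardCharsets.UTF_8);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```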

Graphemes are not matched correctly using \X

Testing the letter à in the form of a grapheme encoded as U+0061 U+0300 with MRI Ruby (2.3.1 here, but the version doesn't matter), \X matches the grapheme:

$ irb
2.3.1 :001 > x = "h\u0061\u0300llo"
 => "hàllo" 
2.3.1 :002 > x =~ /h\Xllo/
 => 0 

The match fails when testing the same thing using JRuby:

$ irb
jruby-9.1.7.0 :001 > x = "h\u0061\u0300llo"
 => "hàllo"
jruby-9.1.7.0 :002 > x =~ /h\Xllo/
 => nil 
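
As a reference point that does not depend on either regex engine, the expected segmentation can be checked with the JDK's java.text.BreakIterator, which treats a base letter plus combining mark as one user-perceived character. This is only a sketch of the expected result; GraphemeSplit is a hypothetical helper and says nothing about how joni implements \X.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class GraphemeSplit {
    /** Splits text at character (grapheme) boundaries as reported by BreakIterator. */
    static List<String> clusters(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // U+0061 followed by U+0300 should come back as a single cluster.
        System.out.println(clusters("h\u0061\u0300llo").size());
    }
}
```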

Enable GitHub Discussions

GitHub Discussions provides a transparent place to discuss things that don't make as much sense directly as issues. For example:

... but Onigmo hasn't been updated for quite a while. Can this version (used by jruby) include a superset of Ruby features (and fixes)?

  • Q & A / how do I X?
  • djl includes a "Show and tell" area which could be cool. (I'm curious which high profile libraries and systems use joni!)

namedBackrefIterator throws NPE when there is no named

I wrote the following tests:

import org.jcodings.specific.UTF8Encoding;
import org.joni.Matcher;
import org.joni.Option;
import org.joni.Regex;
import org.junit.Assert;
import org.junit.Test;


public class TestJoni {

    @Test
    public void testWithName() {
        byte[] pattern = "(?<name>a)a*".getBytes();
        byte[] str = "aaa".getBytes();

        Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE);
        Matcher matcher = regex.matcher(str);
        int result = matcher.search(0, str.length, Option.DEFAULT);
        
        Assert.assertEquals(0, result);
        Assert.assertNotNull(regex.namedBackrefIterator());
        
    }

    @Test
    public void testNoName() {
        byte[] pattern = "(a)a*".getBytes();
        byte[] str = "aaa".getBytes();

        Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE);
        Matcher matcher = regex.matcher(str);
        int result = matcher.search(0, str.length, Option.DEFAULT);
        
        Assert.assertEquals(0, result);
        Assert.assertNotNull(regex.namedBackrefIterator());
        
    }

}

The first test (testWithName) succeeds, so my code is correct.
The second fails with an NPE:

java.lang.NullPointerException
	at org.joni.Regex.namedBackrefIterator(Regex.java:260)
	at TestJoni.testNoName(TestJoni.java:35)

I think namedBackrefIterator doesn't handle patterns that have no named groups. It should return an empty iterator (or null) instead.
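
The "return an empty iterator" option could be sketched as below. NameTableSketch is a hypothetical stand-in that models the name table as a nullable Map; it is not joni's Regex class, whose internals are not shown in this report.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

final class NameTableSketch {
    // Stand-in for the regex's name table; null models "pattern has no named groups".
    private final Map<String, int[]> nameTable;

    NameTableSketch(Map<String, int[]> nameTable) {
        this.nameTable = nameTable;
    }

    Iterator<Map.Entry<String, int[]>> namedBackrefIterator() {
        if (nameTable == null) {
            return Collections.emptyIterator(); // callers can loop without a null check
        }
        return nameTable.entrySet().iterator();
    }
}
```

An empty iterator keeps the for-loop idiom from the README working unchanged for patterns without names, which a null return would not.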

Joni spins forever on invalid input

If you create a Matcher from a byte array containing invalid UTF-8, the match() method will spin forever due to invalid characters not being handled by ByteCodeMachine. For example, in the method opAnyCharStar():

    while (s < range) {
        ...
        int n = enc.length(bytes, s, end);
        if (s + n > range) {opFail(); return;}
        ...
    }

The enc.length() call returns -1 for malformed input, but this value isn't checked, so the loop never exits. I haven't looked at this deeply enough to know the correct solution, but there are many such calls and none of them check the return value.
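
Until the engine handles this itself, one caller-side workaround is to reject malformed input before creating the Matcher. The sketch below uses only the JDK's strict UTF-8 decoder; Utf8Check is a hypothetical helper, and the extra decoding pass is an added cost that may not be acceptable for large inputs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /** Returns true only if the whole array decodes as well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8("aaa".getBytes(StandardCharsets.UTF_8)));
        System.out.println(isValidUtf8(new byte[] {0x2f, 0x2f, (byte) 0xaf})); // lone continuation byte
    }
}
```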

Copyright issue

Hi, I'd like to use this library, but I can't find the copyright notice.
I believe this library is under the MIT license, so there should be a 'Copyright (c) ' line.

Please tell me where I can find the copyright text.

Joni regex matcher hang.

When I run the following code, the program hangs. Is this a bug?

import org.jcodings.specific.UTF8Encoding;
import org.joni.Matcher;
import org.joni.Option;
import org.joni.Regex;

public class Demo {
    public static void main(String[] args) {
        byte[] str = "m1666666654656dsffddfssubscribeaaaaa_3499_g415780803".getBytes();
        byte[] pattern = "^([a-z0-9]+)+$".getBytes();

        Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE);
        Matcher matcher = regex.matcher(str);
        int result = matcher.search(0, str.length, Option.DEFAULT);
        System.out.println("result: " + result);
    }
}
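
The pattern nests one quantifier inside another, so on an input that cannot match (here the '_' characters) a backtracking engine may try a very large number of ways to split the leading run between iterations of the group. A flattened pattern accepts the same strings with only one way to split them. The sketch below demonstrates the equivalence with java.util.regex (an assumption for runnability; the same rewrite should carry over to joni, though capture group 1 is no longer available).

```java
import java.util.regex.Pattern;

final class NestedQuantifierRewrite {
    // Original shape: a quantified group whose body is itself quantified.
    static final Pattern NESTED = Pattern.compile("^([a-z0-9]+)+$");
    // Accepts the same strings, with only one way to consume the input.
    static final Pattern FLAT = Pattern.compile("^[a-z0-9]+$");

    public static void main(String[] args) {
        // Short inputs: both forms agree.
        for (String s : new String[] {"abc123", "a_b", ""}) {
            System.out.println("'" + s + "' -> "
                    + NESTED.matcher(s).find() + " / " + FLAT.matcher(s).find());
        }
        // The long input from the report is deliberately run against the flat form only.
        String reported = "m1666666654656dsffddfssubscribeaaaaa_3499_g415780803";
        System.out.println(FLAT.matcher(reported).find());
    }
}
```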

ArrayIndexOutOfBoundsException for valid input

JONI fails with ArrayIndexOutOfBoundsException for pattern ^show\s*(\b.+\b)\s*vs\s*(\b.+\b)$ and input show c.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
	at io.airlift.jcodings.specific.UTF8Encoding.length(UTF8Encoding.java:35)
	at io.airlift.jcodings.specific.BaseUTF8Encoding.mbcToCode(BaseUTF8Encoding.java:91)
	at io.airlift.jcodings.specific.UTF8Encoding.mbcToCode(UTF8Encoding.java:24)
	at io.airlift.jcodings.Encoding.isMbcWord(Encoding.java:469)
	at io.airlift.joni.ByteCodeMachine.opWordBound(ByteCodeMachine.java:1063)
	at io.airlift.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:239)
	at io.airlift.joni.Matcher.matchCheck(Matcher.java:304)
	at io.airlift.joni.Matcher.searchInterruptible(Matcher.java:457)
	at io.airlift.joni.Matcher.search(Matcher.java:318)

Reproduction:

        byte[] pattern = "^show\\s*(\\b.+\\b)\\s*vs\\s*(\\b.+\\b)$".getBytes(StandardCharsets.UTF_8);
        byte[] str = ("show c").getBytes(StandardCharsets.UTF_8);
        Regex regex = new Regex(pattern, 0, pattern.length, Option.NEGATE_SINGLELINE, UTF8Encoding.INSTANCE, Syntax.Java);
        Matcher matcher = regex.matcher(str);
        int result = matcher.search(0, str.length, Option.DEFAULT);
        System.out.println(result);

joni interprets `[\w-#]` as `[ !"#0-9A-Z_a-z]`

Joni interprets [\w-#] as [ !"#0-9A-Z_a-z] in both default syntax and Java syntax. Java Pattern interprets it as [-0-9A-Z_a-z].

An additional question: in general, is it considered a bug if the interpretation doesn't match Java's Pattern when the syntax is set to Java?
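
Whatever the intended reading, escaping the hyphen removes the ambiguity. The sketch below checks the escaped form against java.util.regex only; that joni parses [\w\-#] the same way is my assumption and has not been verified here. EscapedHyphen is a hypothetical helper.

```java
import java.util.regex.Pattern;

final class EscapedHyphen {
    // Hyphen escaped so it cannot be read as a range operator.
    static final Pattern CLASS = Pattern.compile("[\\w\\-#]");

    /** True if the single character c is a member of the class. */
    static boolean accepts(char c) {
        return CLASS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'-', '#', 'a', '_', ' ', '!', '"'}) {
            System.out.println("'" + c + "' -> " + accepts(c));
        }
    }
}
```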

Support for specifying \G position

In Onigmo, it is possible to specify the position of \G by calling the function onig_search_gpos. I have implemented this functionality in my fork here.

If you think this could be a useful addition for Joni as well, I'm very happy to open a PR, and happy to make any modifications beforehand; please just let me know.

I'm not sure what API you would want, but for now I have implemented it as overloaded implementations of Matcher#search and Matcher#searchInterruptible:

search(int gpos, int start, int range, int option) and searchInterruptible(int gpos, int start, int range, int option).

Thanks!

Enable ability to escape from combinatorial explosion early

Joni's handling of look-ahead/look-behind when evaluating regex matches can end up in large recursive loops, causing things like elastic/elasticsearch#28731.

It would be nice to be able to enable Config.CEC so that heuristic checks for combinatorial explosion can stop certain matches from blowing up.

The ability to interrupt the engine thread is nice, but it would be great if one did not have to spawn a timer on a separate thread just to watch the engine and prevent it from taking up too many resources.

any thoughts?
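
For context, the separate-thread watchdog this request would like to avoid looks roughly like the sketch below. It is generic JDK code: SearchWatchdog and runWithTimeout are hypothetical names, and the task passed in stands for a call such as Matcher#searchInterruptible. Cancellation only helps if the task actually checks its interrupt flag.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SearchWatchdog {
    /** Runs task on a worker thread; returns empty if it does not finish within timeoutMillis. */
    static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "search-watchdog");
            t.setDaemon(true); // a stuck search must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // sets the worker's interrupt flag
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }
}
```

This costs an extra thread and a hand-off per search, which is the overhead an in-engine check would remove.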

Multiline Option with ^ and $ anchors

Hi,

I am struggling to configure the Option passed to the search method correctly with Syntax.ECMAScript. I would expect that, with Option.DEFAULT / Option.NONE, a regex that uses the ^ and $ anchors and contains no explicit newline would fail to match input containing a newline character. For example:

byte[] pattern = "^[a-z]{1,10}$".getBytes();
byte[] str = "a\nb".getBytes();

Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE, Syntax.ECMAScript);
Matcher matcher = regex.matcher(str);
int result = matcher.search(0, str.length, Option.DEFAULT);

should result in -1 but currently results in 0. Even passing Option.SINGLELINE does not change this. What made it work was subtracting Option.MULTILINE:

int result = matcher.search(0, str.length, -Option.MULTILINE)

I have tested this case with multiple online regex tools and the JavaScript regex implementation in my browser, and this example always gives no match (as I expect). Only adding the multiline option gives a result similar to the Joni library's.

Setting the syntax to Java works as expected and gives the same result as this snippet using the built-in Java regex:

String pattern = "^[a-z]{1,10}$";
String str = "a\nb";

Pattern p = Pattern.compile(pattern);
java.util.regex.Matcher m = p.matcher(str);
boolean result = m.find();

Is the MULTILINE option the default for the library's ECMAScript syntax, and should it be? I looked into ECMAScript and it looks like multiline = false is the default (the user has to pass the m flag explicitly).
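
For comparison, the two anchor behaviours can be reproduced side by side with java.util.regex, which is the baseline used in the snippet above. This is a sketch of the expected semantics only and says nothing about joni's defaults; AnchorModes is a hypothetical helper.

```java
import java.util.regex.Pattern;

final class AnchorModes {
    /** Searches input for ^[a-z]{1,10}$ compiled with the given java.util.regex flags. */
    static boolean finds(String input, int flags) {
        return Pattern.compile("^[a-z]{1,10}$", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Default: ^ and $ anchor to the whole input, so "a\nb" is not found.
        System.out.println(finds("a\nb", 0));
        // MULTILINE: ^ and $ also match at line boundaries, so "a" is found.
        System.out.println(finds("a\nb", Pattern.MULTILINE));
    }
}
```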

ArrayIndexOutOfBoundsException for grapheme_clusters

Not sure if I should report this here or to JRuby but the error seems to come from joni:

$ bin/jruby -e 'p [0xA4].pack("C").force_encoding("UTF-8").grapheme_clusters'
Unhandled Java exception: java.lang.ArrayIndexOutOfBoundsException: -1
java.lang.ArrayIndexOutOfBoundsException: -1
                          length at org/jcodings/specific/UTF8Encoding.java:30
                       isMbcHead at org/jcodings/Encoding.java:497
                      opCClassMB at org/joni/ByteCodeMachine.java:793
                         execute at org/joni/ByteCodeMachine.java:203
                         matchAt at org/joni/ByteCodeMachine.java:167
                     matchCommon at org/joni/Matcher.java:115
                           match at org/joni/Matcher.java:92
       enumerateGraphemeClusters at org/jruby/RubyString.java:5859
               grapheme_clusters at org/jruby/RubyString.java:5872
                            call at org/jruby/RubyString$INVOKER$i$0$0$grapheme_clusters.gen:-1
                            call at org/jruby/internal/runtime/methods/JavaMethod.java:309
                    cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:323
                            call at org/jruby/runtime/callsite/CachingCallSite.java:139
  invokeOther5:grapheme_clusters at -e:1
                          <main> at -e:1
             invokeWithArguments at java/lang/invoke/MethodHandle.java:627
                            load at org/jruby/ir/Compiler.java:94
                       runScript at org/jruby/Ruby.java:852
                     runNormally at org/jruby/Ruby.java:771
                     runNormally at org/jruby/Ruby.java:789
                     runFromMain at org/jruby/Ruby.java:601
                   doRunFromMain at org/jruby/Main.java:415
                     internalRun at org/jruby/Main.java:307
                             run at org/jruby/Main.java:234
                            main at org/jruby/Main.java:206

The new spec spec/ruby/core/string/shared/grapheme_clusters.rb added in jruby/jruby#5385 fails due to this.

It's slow (hangs and causes OutOfMemoryError) in certain cases

        <dependency>
            <groupId>org.jruby.joni</groupId>
            <artifactId>joni</artifactId>
            <version>2.1.30</version>
        </dependency>

code:

	/** Half of a 32 KB binary text block encoded in GB18030, over which I want to run a regex search */
	final static byte[] Data =new byte[]{
			(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x41,(byte)0x62,(byte)0x6f,(byte)0x75,(byte)0x74,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0x7f,(byte)0x41,(byte)0x62,(byte)0x72,(byte)0x69,(byte)0x20,(byte)0x48,(byte)0x65,(byte)0x72,(byte)0x62,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0x88,(byte)0x41,(byte)0x63,(byte)0x61,(byte)0x63,(byte)0x69,(byte)0x61,(byte)0x20,(byte)0x63,(byte)0x61,(byte)0x74,(byte)0x65,(byte)0x63,(byte)0x68,(byte)0x75,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0x8f,(byte)0x41,(byte)0x63,(byte)0x61,(byte)0x6e,(byte)0x74,(byte)0x68,(byte)0x6f,(byte)0x70,(byte)0x61,(byte)0x6e,(byte)0x61,(byte)0x63,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x63,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x43,(byte)0x6f,(byte)0x72,(byte)0x74,(byte)0x65,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0x98,(byte)0x41,(byte)0x63,(byte)0x61,(byte)0x6e,(byte)0x74,(byte)0x68,(byte)0x6f,(byte)0x70,(byte)0x61,(byte)0x6e,(byte)0x61,(byte)0x63,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x53,(byte)0x65,(byte)0x6e,(byte)0x74,(byte)0x69,(byte)0x63,(byte)0x6f,(byte)0x73,(byte)0x69,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x20,(byte)0x65,(byte)0x74,(byte)0x20,(byte)0x43,(byte)0x61,(byte)0x75,(byte)0x6c,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xa1,(byte)0x41,(byte)0x63,(byte)0x6f,(byte)0x6e,(byte)0x69,(byte)0x74,(byte)0x69,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xa8,(byte)0x41,(byte)0x63,(byte)0x6f,(byte)0x6e,(byte)0x69,(byte)0x74,(byte)0x69,(byte)
0x20,(byte)0x54,(byte)0x75,(byte)0x62,(byte)0x65,(byte)0x72,(byte)0x20,(byte)0x4c,(byte)0x61,(byte)0x74,(byte)0x65,(byte)0x72,(byte)0x61,(byte)0x6c,(byte)0x65,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xaf,(byte)0x41,(byte)0x63,(byte)0x6f,(byte)0x6e,(byte)0x69,(byte)0x74,(byte)0x75,(byte)0x6d,(byte)0x20,(byte)0x62,(byte)0x72,(byte)0x61,(byte)0x63,(byte)0x68,(byte)0x79,(byte)0x70,(byte)0x6f,(byte)0x64,(byte)0x75,(byte)0x6d,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xbc,(byte)0x41,(byte)0x63,(byte)0x6f,(byte)0x72,(byte)0x69,(byte)0x20,(byte)0x52,(byte)0x68,(byte)0x69,(byte)0x7a,(byte)0x6f,(byte)0x6d,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xc5,(byte)0x41,(byte)0x63,(byte)0x74,(byte)0x69,(byte)0x6e,(byte)0x6f,(byte)0x6c,(byte)0x69,(byte)0x74,(byte)0x75,(byte)0x6d,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xce,(byte)0x41,(byte)0x63,(byte)0x79,(byte)0x72,(byte)0x61,(byte)0x6e,(byte)0x74,(byte)0x68,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x42,(byte)0x69,(byte)0x64,(byte)0x65,(byte)0x6e,(byte)0x74,(byte)0x61,(byte)0x74,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xd5,(byte)0x41,(byte)0x64,(byte)0x65,(byte)0x6e,(byte)0x6f,(byte)0x70,(byte)0x68,(byte)0x6f,(byte)0x72,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xde,(byte)0x41,(byte)0x65,(byte)0x73,(byte)0x63,(byte)0x75,(byte)0x6c,(byte)0x69,(byte)0x20,(byte)0x46,(byte)0x72,(byte)0x75,(byte)0x63,(byte)0x74,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xe7,(byte)0x41,(byt
e)0x67,(byte)0x61,(byte)0x73,(byte)0x74,(byte)0x61,(byte)0x63,(byte)0x68,(byte)0x65,(byte)0x73,(byte)0x20,(byte)0x73,(byte)0x65,(byte)0x75,(byte)0x20,(byte)0x50,(byte)0x6f,(byte)0x67,(byte)0x6f,(byte)0x73,(byte)0x74,(byte)0x65,(byte)0x6d,(byte)0x69,(byte)0x20,(byte)0x48,(byte)0x65,(byte)0x72,(byte)0x62,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xee,(byte)0x41,(byte)0x67,(byte)0x6b,(byte)0x69,(byte)0x73,(byte)0x74,(byte)0x72,(byte)0x6f,(byte)0x64,(byte)0x6f,(byte)0x6e,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xf5,(byte)0x41,(byte)0x67,(byte)0x72,(byte)0x69,(byte)0x6d,(byte)0x6f,(byte)0x6e,(byte)0x69,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x48,(byte)0x65,(byte)0x72,(byte)0x62,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0e,(byte)0xfe,(byte)0x41,(byte)0x67,(byte)0x72,(byte)0x69,(byte)0x6d,(byte)0x6f,(byte)0x6e,(byte)0x69,(byte)0x61,(byte)0x20,(byte)0x70,(byte)0x69,(byte)0x6c,(byte)0x6f,(byte)0x73,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x07,(byte)0x41,(byte)0x69,(byte)0x64,(byte)0x69,(byte)0x63,(byte)0x68,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x10,(byte)0x41,(byte)0x69,(byte)0x6c,(byte)0x61,(byte)0x6e,(byte)0x74,(byte)0x68,(byte)0x69,(byte)0x20,(byte)0x43,(byte)0x6f,(byte)0x72,(byte)0x74,(byte)0x65,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x17,(byte)0x41,(byte)0x69,(byte)0x79,(byte)0x65,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x1e,(byte)0x41,(byte)0x6c,(byte)0x62,(byte)0x69,(byte)0x7a,(byte)0x7a,(byte)0x69,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x43,(byte)0x6f,(byte)0x72,(byte)0x74,(byte)0x65,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(b
yte)0x00,(byte)0x00,(byte)0x0f,(byte)0x27,(byte)0x41,(byte)0x6c,(byte)0x67,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x54,(byte)0x68,(byte)0x61,(byte)0x6c,(byte)0x6c,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x2e,(byte)0x41,(byte)0x6c,(byte)0x69,(byte)0x73,(byte)0x6d,(byte)0x61,(byte)0x74,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x52,(byte)0x68,(byte)0x69,(byte)0x7a,(byte)0x6f,(byte)0x6d,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x35,(byte)0x41,(byte)0x6c,(byte)0x6c,(byte)0x69,(byte)0x69,(byte)0x20,(byte)0x42,(byte)0x75,(byte)0x6c,(byte)0x62,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x3c,(byte)0x41,(byte)0x6c,(byte)0x6c,(byte)0x69,(byte)0x69,(byte)0x20,(byte)0x46,(byte)0x69,(byte)0x73,(byte)0x74,(byte)0x75,(byte)0x6c,(byte)0x6f,(byte)0x73,(byte)0x69,(byte)0x20,(byte)0x42,(byte)0x75,(byte)0x6c,(byte)0x62,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x43,(byte)0x41,(byte)0x6c,(byte)0x6c,(byte)0x69,(byte)0x69,(byte)0x20,(byte)0x53,(byte)0x61,(byte)0x74,(byte)0x69,(byte)0x76,(byte)0x69,(byte)0x20,(byte)0x42,(byte)0x75,(byte)0x6c,(byte)0x62,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x4a,(byte)0x41,(byte)0x6c,(byte)0x6c,(byte)0x69,(byte)0x69,(byte)0x20,(byte)0x54,(byte)0x75,(byte)0x62,(byte)0x65,(byte)0x72,(byte)0x6f,(byte)0x73,(byte)0x69,(byte)0x20,(byte)0x53,(byte)0x65,(byte)0x6d,(byte)0x65,(byte)0x6e,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x53,(byte)0x41,(byte)0x6c,(byte)0x6f,(byte)0x65,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x5a,(byte)0x41,(byte)0x6c,(byte)0x70,(byte)0x69,(byte)0x6e,(byte)0x69,(byte)0x61,(byte)0x65,
(byte)0x20,(byte)0x4b,(byte)0x61,(byte)0x74,(byte)0x73,(byte)0x75,(byte)0x6d,(byte)0x61,(byte)0x64,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x53,(byte)0x65,(byte)0x6d,(byte)0x65,(byte)0x6e,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x63,(byte)0x41,(byte)0x6c,(byte)0x70,(byte)0x69,(byte)0x6e,(byte)0x69,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x4f,(byte)0x66,(byte)0x66,(byte)0x69,(byte)0x63,(byte)0x69,(byte)0x6e,(byte)0x61,(byte)0x72,(byte)0x75,(byte)0x6d,(byte)0x20,(byte)0x52,(byte)0x68,(byte)0x69,(byte)0x7a,(byte)0x6f,(byte)0x6d,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x6c,(byte)0x41,(byte)0x6c,(byte)0x70,(byte)0x69,(byte)0x6e,(byte)0x69,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x4f,(byte)0x78,(byte)0x79,(byte)0x70,(byte)0x68,(byte)0x79,(byte)0x6c,(byte)0x6c,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x46,(byte)0x72,(byte)0x75,(byte)0x63,(byte)0x74,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x75,(byte)0x41,(byte)0x6c,(byte)0x75,(byte)0x6d,(byte)0x65,(byte)0x6e,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x7c,(byte)0x41,(byte)0x6d,(byte)0x6f,(byte)0x6d,(byte)0x69,(byte)0x20,(byte)0x43,(byte)0x61,(byte)0x72,(byte)0x64,(byte)0x61,(byte)0x6d,(byte)0x6f,(byte)0x6d,(byte)0x69,(byte)0x20,(byte)0x46,(byte)0x72,(byte)0x75,(byte)0x63,(byte)0x74,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x83,(byte)0x41,(byte)0x6d,(byte)0x6f,(byte)0x6d,(byte)0x69,(byte)0x20,(byte)0x53,(byte)0x65,(byte)0x6d,(byte)0x65,(byte)0x6e,(byte)0x20,(byte)0x73,(byte)0x65,(byte)0x75,(byte)0x20,(byte)0x46,(byte)0x72,(byte)0x75,(byte)0x63,(byte)0x74,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x8a,(byte)0x41,(byte)0x6d,(byte)0x6f,(byte)0x6
d,(byte)0x69,(byte)0x20,(byte)0x54,(byte)0x73,(byte)0x61,(byte)0x6f,(byte)0x2d,(byte)0x6b,(byte)0x6f,(byte)0x20,(byte)0x46,(byte)0x72,(byte)0x75,(byte)0x63,(byte)0x74,(byte)0x75,(byte)0x73,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x91,(byte)0x41,(byte)0x6d,(byte)0x70,(byte)0x65,(byte)0x6c,(byte)0x6f,(byte)0x70,(byte)0x73,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x98,(byte)0x41,(byte)0x6d,(byte)0x79,(byte)0x64,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x43,(byte)0x61,(byte)0x72,(byte)0x61,(byte)0x70,(byte)0x61,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0x9f,(byte)0x41,(byte)0x6e,(byte)0x64,(byte)0x72,(byte)0x6f,(byte)0x67,(byte)0x72,(byte)0x61,(byte)0x70,(byte)0x68,(byte)0x69,(byte)0x64,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x48,(byte)0x65,(byte)0x72,(byte)0x62,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0xa8,(byte)0x41,(byte)0x6e,(byte)0x65,(byte)0x6d,(byte)0x61,(byte)0x72,(byte)0x72,(byte)0x68,(byte)0x65,(byte)0x6e,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x52,(byte)0x68,(byte)0x69,(byte)0x7a,(byte)0x6f,(byte)0x6d,(byte)0x61,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0xaf,(byte)0x41,(byte)0x6e,(byte)0x67,(byte)0x65,(byte)0x6c,(byte)0x69,(byte)0x63,(byte)0x61,(byte)0x20,(byte)0x44,(byte)0x61,(byte)0x68,(byte)0x75,(byte)0x72,(byte)0x69,(byte)0x63,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0xb6,(byte)0x41,(byte)0x6e,(byte)0x67,(byte)0x65,(byte)0x6c,(byte)0x69,(byte)0x63,(byte)0x61,(byte)0x20,(byte)0x44,(byte)0x75,(byte)0x68,(byte)0x75,(byte)0x6f,(byte)0x20,(byte)0x52,(byte)0
x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0xbd,(byte)0x41,(byte)0x6e,(byte)0x67,(byte)0x65,(byte)0x6c,(byte)0x69,(byte)0x63,(byte)0x61,(byte)0x65,(byte)0x20,(byte)0x53,(byte)0x69,(byte)0x6e,(byte)0x65,(byte)0x6e,(byte)0x73,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x52,(byte)0x61,(byte)0x64,(byte)0x69,(byte)0x78,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0xc4,(byte)0x41,(byte)0x6e,(byte)0x74,(byte)0x65,(byte)0x6c,(byte)0x6f,(byte)0x70,(byte)0x69,(byte)0x73,(byte)0x20,(byte)0x43,(byte)0x6f,(byte)0x72,(byte)0x6e,(byte)0x75,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0xcd,(byte)0x41,(byte)0x70,(byte)0x6f,(byte)0x63,(byte)0x79,(byte)0x6e,(byte)0x75,(byte)0x6d,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x00,(byte)0x0f,(byte)0xd6,(byte)0x41,(byte)0x50,(byte)0x55,(byte)0x44,(byte)0xcf,(byte)0xb5,(byte)0xcd,(byte)0xb3,(byte)0xd6,(byte)0xd7,(byte)0xc1,(byte)0xf6,(byte)0xa3,(byte)0xa8,(byte)0xb2,(byte)0xa1,(byte)0xc0,(byte)0xed,(byte)0xd1,(byte)0xa7,(byte)0xa3,(byte)0xa9,(byte)0x00,
	};

	public static void main(String[] args) throws Exception {
		byte[] pattern = ".*happy".getBytes();
		Regex joniRegex = new Regex(pattern, 0, pattern.length, Option.IGNORECASE, UTF8Encoding.INSTANCE);
		Matcher joniMatcher = joniRegex.matcher(data);
		try {
			// Attempt a match starting at byte offset 1177, bounded by offset 1199.
			System.out.println(joniMatcher.match(1177, 1199, Option.DEFAULT));
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

The matched range is only 1199 - 1177 = 22 bytes, so the match should be very fast.

org.jcodings.exception.CharacterPropertyException: invalid character property name <graphemeclusterbreak=emodifier>

Recently, the tests started to fail in a Debian unstable environment with the message below:

[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.05 s - in org.joni.test.TestU
[INFO] Running org.joni.test.TestU8
Pattern: [/\X/] Str: ["
"] Encoding: [UTF-8] Option: [] Syntax: [TEST]
org.jcodings.exception.CharacterPropertyException: invalid character property name <graphemeclusterbreak=emodifier>
        at org.jruby.jcodings/org.jcodings.unicode.UnicodeEncoding.propertyNameToCType(UnicodeEncoding.java:99)
        at org.jruby.joni/org.joni.Parser$GraphemeNames.nameToCtype(Parser.java:954)
        at org.jruby.joni/org.joni.Parser.parseExtendedGraphemeCluster(Parser.java:1082)
        at org.jruby.joni/org.joni.Parser.parseExp(Parser.java:792)
        at org.jruby.joni/org.joni.Parser.parseBranch(Parser.java:1520)
        at org.jruby.joni/org.joni.Parser.parseSubExp(Parser.java:1546)
        at org.jruby.joni/org.joni.Parser.parseRegexp(Parser.java:1579)
        at org.jruby.joni/org.joni.Analyser.compile(Analyser.java:78)
        at org.jruby.joni/org.joni.Regex.<init>(Regex.java:155)
        at org.jruby.joni/org.joni.Regex.<init>(Regex.java:134)
        at org.jruby.joni/org.joni.test.Test.xx(Test.java:113)
        at org.jruby.joni/org.joni.test.Test.x2s(Test.java:223)
        at org.jruby.joni/org.joni.test.Test.x2s(Test.java:218)
        at org.jruby.joni/org.joni.test.TestU8.test(TestU8.java:112)
        at org.jruby.joni/org.joni.test.Test.testRegexp(Test.java:256)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:52)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
        at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
SEVERE ERROR: invalid character property name <graphemeclusterbreak=emodifier>
(snip)

However, the tests passed in October 2018.
Could you give me some advice, please?

Valid UTF-8 input can cause infinite loop in JONI

In #7, @electrum identified a location that can cause an infinite loop in JONI. It was marked as won't fix because the input can be sanitized beforehand and JONI assumes that the input is always valid.

When the pattern is "\uD8000", it can be pre-sanitized, as you suggested in #7. What if the pattern is "\\uD800"? How can the user sanitize it?

If JONI is willing to add a check, it would be the same fix as for #7: checking whether the return value of enc.length is negative in OptExactInfo.concatStr.
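The difference between the two pattern forms can be illustrated with a short sketch that uses only the JDK. It assumes, as the issue implies, that the escaped form is expanded by the regex parser itself; the class and method names are hypothetical, and nothing here calls JONI. A strict UTF-8 encoder rejects a string holding a lone surrogate, but accepts the escaped form, because the latter is plain ASCII at the point where a caller could sanitize it.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JONI.
class PatternEncodingCheck {

    // Returns true if the string can be encoded as UTF-8 without any
    // malformed or unmappable characters being reported.
    static boolean encodesCleanly(String s) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A string containing a single unpaired high surrogate.
        String loneSurrogate = String.valueOf((char) 0xD800);
        // Six ASCII characters: backslash, 'u', 'D', '8', '0', '0'.
        String escapedForm = "\\uD800";

        System.out.println(encodesCleanly(loneSurrogate));
        System.out.println(encodesCleanly(escapedForm));
    }
}
```

A check like this can reject the first form before it reaches the regex engine, but it has no way to flag the second one, which is the gap the issue describes.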

Support for matches, replaceAll and split

Hi,
I'd like to know whether there are any plans to support the matches and replaceAll operations that java.util.regex provides, as well as split, which is a frequently used operation.

Guava has split, but it is based on java.util.regex.

Thanks.
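As a rough illustration of what a split helper could look like on top of a byte-oriented search, here is a minimal sketch. The Searcher interface and the single-byte stand-in are hypothetical and exist only to keep the example self-contained. With Joni, the hook would presumably wrap Matcher.search and read the match bounds from getBegin() and getEnd(), but that wiring is an assumption and is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of Joni.
class SplitSketch {

    // Hypothetical search hook: returns {begin, end} of the next match
    // at or after 'from', or null when there is no further match.
    interface Searcher {
        int[] find(byte[] input, int from);
    }

    static List<byte[]> split(byte[] input, Searcher searcher) {
        List<byte[]> parts = new ArrayList<>();
        int pos = 0;   // where the next search starts
        int last = 0;  // start of the piece currently being collected
        while (pos <= input.length) {
            int[] m = searcher.find(input, pos);
            if (m == null) {
                break;
            }
            if (m[0] == m[1]) {
                // Skip empty matches so the loop always makes progress.
                // A real UTF-8 version should step by a whole character.
                pos = m[1] + 1;
                continue;
            }
            parts.add(Arrays.copyOfRange(input, last, m[0]));
            last = m[1];
            pos = m[1];
        }
        parts.add(Arrays.copyOfRange(input, last, input.length));
        return parts;
    }

    // Stand-in searcher that matches a single delimiter byte.
    static Searcher literal(byte delimiter) {
        return (in, from) -> {
            for (int i = from; i < in.length; i++) {
                if (in[i] == delimiter) {
                    return new int[] {i, i + 1};
                }
            }
            return null;
        };
    }

    static List<String> splitToStrings(String s, char delimiter) {
        List<String> out = new ArrayList<>();
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        for (byte[] part : split(bytes, literal((byte) delimiter))) {
            out.add(new String(part, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitToStrings("a,b,,c", ','));
    }
}
```

Unlike String.split, this sketch keeps trailing empty pieces; which behaviour is preferable would be a design decision for any real API.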

[Q] Is it possible to improve the parse speed of the Joni regexp library?

Hello, team.

First of all, thank you very much for making great JRuby software.

I am currently developing embulk-parser-joni_regexp.
I wanted to use an Oniguruma-compatible regular expression library,
which is why I'm using Joni.

Currently, my Joni code is over three times slower than the java.util.regex version.

My original code is here.
https://github.com/hiroyuki-sato/embulk-parser-joni_regexp/blob/master/src/main/java/org/embulk/parser/joni_regexp/JoniRegexpParserPlugin.java#L91-L119

I wrote a test program, regexp_test, to compare the speed of the two regex libraries.

The main part is the following.

Is it possible to improve the parse speed of the Joni regex library?

Thank you for your advice.

Kind regards.

format string

        String format = "^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \\[(?<time>[^\\]]*)\\] \"(?<method>\\S+)(?: +(?<path>[^ ]*) +\\S*)?\" (?<code>[^ ]*) (?<size>[^ ]*)(?: \"(?<referer>[^\\\"]*)\" \"(?<agent>[^\\\"]*)\")?$";

Joni

        byte[] pattern = format.getBytes(StandardCharsets.UTF_8);
        Regex regexp = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE);
// ...
            while (true) {
// ...
                byte[] line_bytes = line.getBytes(StandardCharsets.UTF_8);
                Matcher matcher = regexp.matcher(line_bytes);
                int result = matcher.search(0, line_bytes.length, Option.DEFAULT);
// ...
            }

java.util.regex

        Pattern pattern = Pattern.compile(format);

            while (true) {
// ...
                Matcher matcher = pattern.matcher(line);
                if (matcher.matches()) {
// ...
                }
            }
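For reference, the java.util.regex side of the comparison can be reproduced with a small, self-contained timing sketch. The sample log line, the class name, and the iteration counts are made up for illustration, and the loop is a crude measurement rather than a proper benchmark (a harness such as JMH would give more trustworthy numbers). Note that the Joni snippet above also converts each line to a byte array and calls search rather than an anchored match, so a like-for-like comparison may need to account for both.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical timing sketch, not the regexp_test code from the issue.
class RegexTimingSketch {

    // Format string copied from the issue text.
    static final String FORMAT = "^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \\[(?<time>[^\\]]*)\\] \"(?<method>\\S+)(?: +(?<path>[^ ]*) +\\S*)?\" (?<code>[^ ]*) (?<size>[^ ]*)(?: \"(?<referer>[^\\\"]*)\" \"(?<agent>[^\\\"]*)\")?$";

    // Made-up sample line in Apache combined log format.
    static final String SAMPLE = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" \"Mozilla/4.08\"";

    static final Pattern PATTERN = Pattern.compile(FORMAT);

    // Returns the named group for a matching line, or null if the line does not match.
    static String field(String line, String name) {
        Matcher m = PATTERN.matcher(line);
        return m.matches() ? m.group(name) : null;
    }

    // Matches the same line repeatedly and returns the elapsed time in nanoseconds.
    static long timeNanos(String line, int iterations) {
        long start = System.nanoTime();
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (PATTERN.matcher(line).matches()) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits != iterations) {
            throw new IllegalStateException("sample line did not match");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        timeNanos(SAMPLE, 10_000); // warm-up pass so the JIT can compile the hot loop
        int iterations = 100_000;
        long elapsed = timeNanos(SAMPLE, iterations);
        System.out.println("host=" + field(SAMPLE, "host")
                + ", avg ns per line=" + (elapsed / iterations));
    }
}
```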
