Regex RE2 toolkit (issue in administration, CLOSED)

ibmstreams commented on June 14, 2024
Regex RE2 toolkit

from administration.

Comments (11)

ddebrunner commented on June 14, 2024

I would consider giving the toolkit a generic name, like Regex, rather than HighPerf, and then having an re2 namespace within it (e.g. com.ibm.streamsx.regex.re2).

Then any additional regex related functionality can be added to the toolkit in the future.

leongor commented on June 14, 2024

It's a good idea.
For now the toolkit name is com.ibm.streams.regex (which is also the namespace), and only the operator is named HighPerfRegex.
Actually, I can name the operator along the lines of the SPL function, e.g. RegexMatchRE2, and keep the namespace com.ibm.streams.regex.

ddebrunner commented on June 14, 2024

+1 to the toolkit, though the pattern in IBMStreams is to use namespaces starting with 'com.ibm.streamsx' (note the x). The namespace prefix com.ibm.streams is reserved for the product.

leongor commented on June 14, 2024

Ok, com.ibm.streamsx.regex indeed.
Thanks.

ddebrunner commented on June 14, 2024

Yeah, the RE2 name is better than HighPerf, since what happens when someone comes out with a faster one? :-)

rrea commented on June 14, 2024

Since we already have regex capability in the Text Toolkit (aka System T), can you differentiate the two and explain why we need both?

hildrum commented on June 14, 2024

@leongor I think I've seen a presentation on this toolkit; is there any way you can link to it from here? The toolkit you're proposing is different from SystemT, and as I recall, in addition to better performance, it also has some features that make it different from just using regexMatch and regexMatchPerl in the SPL standard toolkit (e.g., reading the regexes from a file?).

leongor commented on June 14, 2024

@rrea This toolkit is performance oriented and based on RE2, one of the fastest regex libraries. Additionally, it's designed to be stateful, so a regex can be compiled once and run multiple times, which boosts performance even more.
@hildrum You're right, we used the toolkit to handle a very long regex read from a blacklist file, but we did it by using FileSource.
Actually, adding an optional parameter to read a regex from a file sounds like a very nice idea!
There is an internal IBM link about our toolkits that I added in the Ngrams toolkit discussion. Once we have all the approvals, we'll publish it on an external site.
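
To illustrate the compile-once, run-many idea described above, here is a minimal sketch in Python. It uses the standard `re` module purely as a stand-in for RE2, and the names `CompiledMatcher` and `pattern_from_blacklist` are hypothetical; none of this is the toolkit's actual API, and it makes no claim about the toolkit's performance.

```python
import re


class CompiledMatcher:
    """Hypothetical helper: holds a regex compiled once and reuses it."""

    def __init__(self, pattern: str) -> None:
        # Compilation happens exactly once, when the matcher is created.
        self._regex = re.compile(pattern)

    def matches(self, text: str) -> bool:
        # Every call reuses the stored compiled object.
        # search() reports a match anywhere in the text (partial match).
        return self._regex.search(text) is not None


def pattern_from_blacklist(terms):
    # Assumption: each blacklist entry is a literal string, so escape it
    # and join all entries into a single alternation.
    return "|".join(re.escape(t) for t in terms)


# Build the matcher once, then apply it to many inputs.
matcher = CompiledMatcher(pattern_from_blacklist(["bad.example", "evil.test"]))
lines = ["visit bad.example now", "all good here", "badXexample"]
results = [matcher.matches(line) for line in lines]
```

The matcher object carries the compiled state, which mirrors the stateful design: the setup cost is paid once and each incoming item only pays for the match itself.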

mikespicer commented on June 14, 2024

+1 for this toolkit

petenicholls commented on June 14, 2024

+1 on streamsx.regex

leongor commented on June 14, 2024

I've uploaded the initial version to the streamsx.regex repository and opened issues there.
