grandmoff100 / regexfactory

Dynamically construct regex patterns with native python representations

Home Page: https://regexfactory.readthedocs.io

License: GNU General Public License v3.0

Python 100.00%
python regex regex-pattern library regex-generator sphinx-documentation regular-expression

regexfactory's Introduction

RegexFactory

Dynamically construct python regex patterns.

Examples

Matching Initials

Say you want a regex pattern to match the initials of someone's name.

import re
from regexfactory import Amount, Range


pattern = Amount(Range("A", "Z"), 2, 3)

matches = pattern.findall(
    "My initials are BDP. Valorie's are VO"
)

print(matches)
['BDP', 'VO']
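
For comparison, here is a hand-written equivalent that uses only the standard re module. It is a sketch that assumes Amount(Range("A", "Z"), 2, 3) renders to [A-Z]{2,3}, by analogy with the printed pattern in the URL example further down; the exact string the library emits for this case is not shown in this README.

```python
import re

# Assumed rendering of Amount(Range("A", "Z"), 2, 3):
# two or three consecutive uppercase letters.
INITIALS = re.compile(r"[A-Z]{2,3}")

text = "My initials are BDP. Valorie's are VO"
matches = INITIALS.findall(text)
print(matches)
```

Single capitals such as the "M" in "My" are skipped because the lower bound is two.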

Matching Hex Strings

Or how about matching both uppercase and lowercase hex strings in a sentence?

import re
from regexfactory import *

pattern = Optional("#") + Or(
    Amount(
        Set(
            Range("0", "9"),
            Range("a", "f")
        ),
        6
    ),
    Amount(
        Set(
            Range("0", "9"),
            Range("A", "F")
        ),
        6
    ),

)

sentence = """
My favorite color is #000000. I also like 5fb8a0. My second favorite color is #FF21FF.
"""

matches = pattern.findall(sentence)
print(matches)
['#000000', '5fb8a0', '#FF21FF']

Matching URLs

Or what if you want to match URLs in HTML content?

from regexfactory import *


protocol = Amount(Range("a", "z"), 1, or_more=True)
host = Amount(Set(WORD, DIGIT, '.'), 1, or_more=True)
port = Optional(IfBehind(":") + Multi(DIGIT))
path = Multi(
    RegexPattern('/') + Multi(
        NotSet('/', '#', '?', '&', WHITESPACE),
        match_zero=True
    ),
    match_zero=True
)
patt = protocol + RegexPattern("://") + host + port + path



sentence = "This is a cool url, https://github.com/GrandMoff100/RegexFactory/ "
print(patt)

print(patt.search(sentence))
[a-z]{1,}://[\w\d.]{1,}(?:\d{1,})?(/([^/#?&\s]{0,})){0,}
<re.Match object; span=(20, 65), match='https://github.com/GrandMoff100/RegexFactory/'>
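
The printed pattern can be checked independently with the standard re module. This sketch copies the pattern string from the output above verbatim and does not use regexfactory, so it only confirms what that string matches, not how the library builds it.

```python
import re

# Pattern string copied verbatim from the printed output above.
URL = re.compile(r"[a-z]{1,}://[\w\d.]{1,}(?:\d{1,})?(/([^/#?&\s]{0,})){0,}")

sentence = "This is a cool url, https://github.com/GrandMoff100/RegexFactory/ "
match = URL.search(sentence)
print(match.group(0))

# With the string as printed, nothing consumes a ":" after the host,
# so a URL with an explicit port stops matching at the colon.
with_port = URL.search("http://localhost:8080/x")
print(with_port.group(0))
```

If explicit ports matter for your input, the second result suggests the port part of the pattern needs adjusting before use.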

The Pitch

This library is really good at letting you intuitively understand how to construct a regex. It helps you identify exactly what your regular expression is, and can help you debug it. This library is also very helpful for generating regexes on the fly, if you find uses for that. You can also extend the library by subclassing RegexPattern and adding your own support for different regex flavors, such as generating regexes with Perl 5 extensions.

There you have it. This library is intuitive, extensible, modular, and dynamic. Why not use it?

regexfactory's People

Contributors

frank113, grandmoff100, itsdrike

regexfactory's Issues

Adding __hash__ operator to support set operation for RegexPattern

Hi @GrandMoff100 ,

I wonder if you could add a __hash__ method so that RegexPattern objects can be used in sets.

The use case is as follows:

from regexfactory.chars import ANY, DIGIT
from regexfactory.pattern import escape
set([ANY, ANY, DIGIT, DIGIT, escape('apple'), escape('apple')]) 
>> {<RegexPattern '.'>, <RegexPattern '\d'>, <RegexPattern 'apple'>}

Without hash, the above situation will raise:

TypeError: unhashable type: 'RegexPattern'.

A simple additional __hash__ method to the RegexPattern class may solve the problem:

    def __hash__(self):
        return hash(self.regex)

I am willing to contribute this, too. Would it be possible to include this feature in an upcoming release (maybe v1.0.1)?
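
A minimal sketch of the idea, using a stand-in class rather than the real RegexPattern. The class name PatternStub and its constructor are hypothetical; only the regex attribute is taken from the snippet above. In Python, a class that defines __eq__ without __hash__ becomes unhashable, so the two methods are usually defined together and based on the same value.

```python
class PatternStub:
    """Hypothetical stand-in for RegexPattern; not the library's class."""

    def __init__(self, regex: str) -> None:
        self.regex = regex

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, PatternStub):
            return NotImplemented
        return self.regex == other.regex

    def __hash__(self) -> int:
        # Hash on the same value that equality compares.
        return hash(self.regex)

    def __repr__(self) -> str:
        return f"<PatternStub {self.regex!r}>"


patterns = [
    PatternStub("."),
    PatternStub("."),
    PatternStub(r"\d"),
    PatternStub(r"\d"),
    PatternStub("apple"),
    PatternStub("apple"),
]
unique = set(patterns)
print(len(unique))
```

Hashing on the rendered regex means two patterns built in different ways but producing the same string collapse to one set entry, which may or may not be the behavior you want.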

Feature Suggestion: Add union / intersect / subset operators to the Regex Classes

Hi @GrandMoff100 ,

I would like to suggest a new feature for this package.
Since a regex is a union of multiple patterns, it can be considered a set. How about adding union (or), intersection (and), and is-subset-of (in) operators to each of the regex classes?

For example:

Range("1", "5") | Range("4", "8")
-> Range("1", "8")

Range("1", "5") & Range("4", "8")
-> Range("4", "5")

Range("3", "4") in Range("1", "7")
-> True

I am trying to build a regex inference engine. This feature would be useful for organizing regexes when the list of regex objects is large.
Would it be possible to add this feature? I am willing to contribute it.
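
One way the proposed operators could behave is sketched below with a hypothetical CharRange class. It is not regexfactory's Range and assumes single-character, inclusive bounds. The union raises when the two ranges neither overlap nor touch, since the result would then not be a single range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CharRange:
    """Hypothetical inclusive character range; not regexfactory's Range."""

    start: str
    stop: str

    def __and__(self, other: "CharRange") -> Optional["CharRange"]:
        # Intersection: the overlapping part, or None if there is none.
        lo = max(self.start, other.start)
        hi = min(self.stop, other.stop)
        return CharRange(lo, hi) if lo <= hi else None

    def __or__(self, other: "CharRange") -> "CharRange":
        # Union: only defined here when the ranges overlap or are adjacent.
        lo = max(self.start, other.start)
        hi = min(self.stop, other.stop)
        if ord(lo) > ord(hi) + 1:
            raise ValueError("ranges are disjoint; union is not a single range")
        return CharRange(min(self.start, other.start), max(self.stop, other.stop))

    def __contains__(self, other: object) -> bool:
        # Subset test: `a in b` asks whether a lies entirely inside b.
        if not isinstance(other, CharRange):
            return False
        return self.start <= other.start and other.stop <= self.stop


print(CharRange("1", "5") | CharRange("4", "8"))
print(CharRange("1", "5") & CharRange("4", "8"))
print(CharRange("3", "4") in CharRange("1", "7"))
```

Making the dataclass frozen also gives it value-based equality and hashing, so such ranges could be stored in sets.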

Adding pytest

Hi @GrandMoff100 ,

Currently, I am trying to start building a regex generation project using your RegexFactory package. I would like to add some unit tests to your package so that I can be more confident in using it.

I wonder if I can fork this repo and contribute some unit tests to it?

Also, is there anything I should be aware of when writing the unit tests?

Add License

Hi, I noticed that your repository doesn't have a license, which is actually a pretty big issue for everyone forking or perhaps even using your repository.

Why are licenses important

Whenever you publish essentially anything, in this case some code or static content created by you, you automatically gain copyright over that content. This copyright is respected in most developed countries and it imposes big legal restrictions on the content in this repository. Essentially, this means that while you decided to make your software source-available, it is not "open-sourced", because it doesn't have an open source license. This means that while you as the copyright owner allowed others to look at your source, you did not give legal permission to re-use this code or to alter it in any way.

This means that everybody who has forked your repository is actually breaking copyright law. You should really consider adding an open-source (ideally FSF-approved) license. Doing this will not affect your copyright over the code; it just gives everyone some additional permissions that wouldn't be there without it. A license like this is the only thing that allows others to make derivative projects (projects based on your code) or to use parts of your code in other places.

License types

Learning about different licenses may be a bit confusing at first, and I would suggest you do your own research. I will describe some basic categories of these licenses and what they mean, but this is NOT legal advice and, again, you should always do your own research. Don't just blindly trust some random guy on the internet.

Copy-Left licenses

These licenses allow others to use your code and build upon it, but they require the use of the same license. This essentially forces the code to always stay open-source, as it explicitly forbids sublicensing under any non-compatible license (i.e. a license with different terms).

Most notably, this includes the GPL licenses.

Permissive licenses

Permissive licenses give your contributors a lot more freedom because they allow sublicensing. They usually only require stating the original source of the code. This means that if you decided to use a permissive license, anyone would be able to make a closed-source project based on your code and even sell it, as long as there is a mention that some parts of this code-base were used in it (if the license requires that).

Most notably, this includes the MIT license and the Apache 2.0 license. Another popular set of licenses are the BSD licenses, most notably the BSD 2-Clause license.

Public domain licenses

There are also so-called public domain licenses. These are actually a subtype of permissive licenses, but when we talk about permissive licenses we generally don't mean the public domain ones, which is why I separated out this category as a bit special.

Public domain licenses essentially strip the copyright away: they give everybody the right to do pretty much anything with the code. They don't require any mention of the original source of that code, and obviously they allow sublicensing. These licenses don't actually "remove" your copyright over that code; they just allow anyone to do anything with the project, essentially disregarding the copyright altogether.

Most notably, this includes the Unlicense license.

Picking your license

While you aren't required to add a license of any kind, it is a bit weird to have an open-source project without any license. At the moment, all of the contributors are actually breaking copyright law, which really isn't ideal. Even though I'm sure you don't intend to sue these people, you technically could, even though they just wanted to contribute to your project. That's a bit unexpected for the contributors, and it could discourage many people from contributing to or even using this project.

Some great websites:

  • There is a page from GitHub that may help you pick the correct license for your program.
  • You can also check out their post about licensing here.
  • Another wonderful site you should check out is https://tldrlegal.com, it can quickly show you the details about what license requires from others when they use your code, and what rights it gives them.

Limitations:
While you can license your code under any license whatsoever, if your code uses copy-left licensed parts (whether through a library-like dependency or directly, by including code written by others under that license), you will need to respect the terms of that license and license your code accordingly. This means that if your project uses a copy-left licensed dependency, you will need to adhere to its terms or remove that dependency. Otherwise you, as the owner of the code, are yourself violating copyright law by doing something that copyright prohibits and that the license of the software you depend on does not explicitly allow.

My personal preference:
In my opinion, to support open source as much as possible, I usually go with GPL v3. It doesn't allow your code to ever become proprietary, since it requires anyone distributing the software to also distribute the source code and the full text of the GPL license along with it. Since it doesn't allow sublicensing, others also can't just use a permissive license and include code from your code-base in those projects. While this may be a bit limiting for someone who would like to re-use your project, they can always license their project under the GPL and use the code that way, thereby supporting the open-source community even more.

However, for projects made as libraries, pure GPL may be too restrictive for some people, since it limits the total number of users to only those willing to comply with the GPL license. For that reason, whenever I write code that people may end up importing, I also often choose LGPL. This license acts the same way as basic GPL, but when the code is used as a library rather than in a derivative work, it becomes permissive and allows sublicensing. This allows any project (even proprietary ones) to use your library, but if someone were making a fork of this project, they would have to keep it open-source and under LGPL. Note: GPL isn't LGPL-compatible, which means that if you do use LGPL, you will not be able to use GPLed code anywhere in your code-base.

But of course, this is your decision and everybody can have a different opinion about this.
