mesejo / trex
Efficient string matching with regular expressions
Home Page: https://trrex.readthedocs.io/en/latest/
License: MIT License
Add missing documentation detailing how to integrate trrex with different libraries. The documentation should use the pydata-sphinx-theme.
See this stackoverflow question
Add experiments comparing the efficiency of trrex vs union-regex vs flashtext
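One way such an experiment could be set up is sketched below. It only times the union-regex baseline with the standard library; the helper names `union_regex` and `time_findall` are hypothetical, and the trrex and flashtext matchers would have to be plugged into the same timing function by whoever runs the comparison.

```python
import re
import timeit


def union_regex(words):
    """Baseline: a plain alternation of escaped words, longest first."""
    ordered = sorted(words, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")


def time_findall(pattern, text, number=20):
    """Total seconds spent running pattern.findall(text) `number` times."""
    return timeit.timeit(lambda: pattern.findall(text), number=number)


# Synthetic data: 1000 keywords, and a text that contains every 7th keyword 10 times.
words = [f"word{i}" for i in range(1000)]
text = " ".join(words[::7] * 10)

baseline = union_regex(words)
match_count = len(baseline.findall(text))
elapsed = time_findall(baseline, text)
print(f"union-regex: {match_count} matches, {elapsed:.4f}s for 20 runs")
```

Any compiled pattern exposing `findall` can be passed to `time_findall`, so the same text and keyword list can be reused for each candidate.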
The parameter names left and right are not very informative; rename them to prefix and suffix.
Hello Developer,
The call below generates an extra ")" in the output.
code:
import trrex as tx
tx.make(['IBS(.|)TECH','IBS(.|)SOL','IBS(.|)TEHK'], prefix=r"\b(", suffix=r")")
output='\b(IBS(.(?:|)SOL|*|)TE(?:HK|CH)))'
Please let me know if any further information is required on the same.
Thanks,
Rangam
There could be a parameter to replace the whitespace character for usage in multi-token keywords.
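A minimal sketch of what such a parameter might do, written against the standard `re` module rather than trrex itself. The function name `keyword_pattern` and its `whitespace` parameter are hypothetical and only illustrate the idea: each keyword is split on whitespace, the tokens are escaped, and they are rejoined with a configurable whitespace pattern.

```python
import re


def keyword_pattern(keywords, whitespace=r"\s+"):
    """Build a pattern where whitespace inside a keyword matches `whitespace`."""
    branches = []
    # Longest keywords first so that a longer keyword is preferred over its prefix.
    for keyword in sorted(keywords, key=len, reverse=True):
        tokens = [re.escape(token) for token in keyword.split()]
        branches.append(whitespace.join(tokens))
    return re.compile(r"\b(?:" + "|".join(branches) + r")\b")


pattern = keyword_pattern(["New York", "New Jersey"])
matches = pattern.findall("New   York and New\tJersey")
print(matches)
```

With the default `\s+`, a keyword written with a single space also matches runs of spaces or tabs between its tokens.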
First, great package. Love using it! Makes my life much easier, and the speed is phenomenal!
I find that it works with a list of words, such as ['Love', 'Hello', ' Book',...]
Does it work with a list of regex patterns?
For example
regex_patterns = [
    r'(?!^\d+)(?=.*)(\b\d+$\b)',  # Remove any number that ends the string, e.g. "SHELL 545436"
    r'^\bSQ\b',  # Remove "SQ" if it starts the string, e.g. "SQ NORDSTROM"
    r'(?!\b(ST|^\w{1,2})\b$)\b\w{1,2}\b$',  # Remove any one- or two-character word at the end of the string, e.g. "Burger King CA", unless it is "ST"
]
tx_maker = tx.make(regex_patterns, prefix=r"", suffix=r"")
# Clean the column
df['clean'] = df.STRING_COLUMN.str.replace(tx_maker, "", regex=True)
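For inputs that are already regular expressions rather than literal words, one possible alternative is to combine them directly with the standard `re` module. The sketch below does not use trrex; the two patterns are simplified stand-ins for the ones in the question, and the helper `clean` is hypothetical.

```python
import re

# Simplified stand-ins for the patterns in the question (assumptions, not the originals).
patterns = [
    r"\b\d+$",  # a number at the end of the string
    r"^SQ\b",   # "SQ" at the start of the string
]

# Wrap each pattern in a non-capturing group so the alternation cannot leak across patterns.
combined = re.compile("|".join(f"(?:{p})" for p in patterns))


def clean(value):
    """Remove every match of the combined pattern and trim leftover whitespace."""
    return combined.sub("", value).strip()


print(clean("SHELL 545436"))
print(clean("SQ NORDSTROM"))
```

A compiled pattern like `combined` can also be passed to pandas `Series.str.replace` with `regex=True`.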
Sort in order of decreasing length to match the largest output first
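The reason ordering matters can be shown with a plain alternation in the standard `re` module. This is an illustration of the general regex behavior, not of trrex internals: branches are tried left to right, so without a trailing boundary a shorter word listed first wins over a longer one.

```python
import re

words = ["foo", "foob", "foobar"]

# Branches in original order: the first branch that matches is used.
naive = re.compile("|".join(re.escape(w) for w in words))

# Same words sorted by decreasing length: the longest candidate is tried first.
longest_first = sorted(words, key=len, reverse=True)
ordered = re.compile("|".join(re.escape(w) for w in longest_first))

naive_match = naive.match("foobar").group()
ordered_match = ordered.match("foobar").group()
print(naive_match, ordered_match)
```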
What is happening?
Currently we escape every character of the input words even if they do not need escaping.
What should happen?
Remove this behavior: it prevents the end user from adding patterns and slows down processing.
Document how to escape regex characters.
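As a starting point for that documentation, the sketch below shows how a caller could escape a literal word with the standard library's `re.escape` before handing it to any pattern builder. It uses only `re` and assumes nothing about how trrex would treat the escaped string.

```python
import re

word = "1+1"

# Unescaped, "+" is a quantifier: the pattern means "one or more 1s, then a 1".
unescaped = re.compile(word)

# re.escape turns the word into a pattern that matches it literally.
escaped_source = re.escape(word)
escaped = re.compile(escaped_source)

print(unescaped.fullmatch("1+1"))  # no literal match
print(unescaped.fullmatch("111"))  # matches a run of 1s instead
print(escaped.fullmatch("1+1"))    # literal match
```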
The compile function is only a vanilla wrapper around Python's re module. It is better to remove it now and add it back later if needed.