Comments (6)
I would welcome a pull request for this, but the cache(s) should not be static.
Consider extracting the code from the CucumberExpression
constructor to a CucumberExpressionFactory
. This factory could then be instantiated in the ExpressionFactory
with a cache instance. This means that once the ExpressionFactory
goes out of scope, and no more expressions are created, the cache will be garbage collected.
Note that the RegularExpression
also create a TreeRegex
. The same optimization may be useful there.
Please also update the documentation for the TreeRegex
and GroupBuilder
to note that they should be thread-safe.
from cucumber-expressions.
Ah. Please disregard my previous comment.
The ExpressionFactory
is instantiated with a ParameterTypeRegistry
which contains scenario specific state. While we could share the cache inside a Runner
, I need to think about this. At some point it becomes easier to remove cucumber-java8
and then refactor cucumber-core
instead.
from cucumber-expressions.
I don't see the issue with ParameterTypeRegistry
: the proposed solution only cache things that are "really constant" (i.e. that are not impacted by cucumber-java8
).
I don't think the static cache will cause a real memory usage issue because on my project with ~150 stepdefs and ~400 test scenarios, the TreeRegex
cache has only 117 elements and the escapedTexts
cache has only 205 elements. However, if you want to replace the static caching, the ExpressionFactory
could create a Runner
which holds the TreeRegex
cache. This Runner
would be passed to RegularExpression
and CucumberExpression
constructors (small API change), with the advantage to have only one TreeRegex
cache instead of two (one in RegularExpression
and CucumberExpression
). The Runner
could hold the escapedTexts
as well.
Caching the TreeRegex
in RegularExpression
makes the constructor 4 times faster (JMH benchmark: uncached=7'000 ops/s, cached=28'000 ops/s). On my project with 150 stepdefs and 400 test scenarios, this makes no impact because the RegularExpression.<init>
was contributing to only 0.34% of the total CPU time. But this may be different with other projects.
Good point on thread-safety. TreeRegex
looks thread-safe. GroupBuilder.add()
method is not thread-safe because it calls ArrayList.add()
, but I don't think this is an issue as GroupBuilder.add()
is only called when the GroupBuilder
is constructed in TreeRegex.createGroupBuilder()
. Maybe the GroupBuilder
could be modified to remove the add()
method with a Builder pattern in order to ensure thread-safety (TreeRegex.createGroupBuilder()
could be moved to this Builder for a better separation of concerns).
from cucumber-expressions.
I don't see the issue with ParameterTypeRegistry : the proposed solution only cache things that are "really constant" (i.e. that are not impacted by cucumber-java8).
I was hoping to keep a non-static cache in the ExpressionFactory
and then retain a reference to the factory in Cucumber-JVM. But this isn't feasible right now. While I'm not too worried about leaking memory this way, not having statics anywhere makes it much easier to reason about (concurrent) code.
And while this proposal does make steps towards cucumber/cucumber-jvm#2035 it would become mostly obsolete the moment we drop cucumber-java8
and build every cucumber expression exactly once. So I would rather spend time working towards improvement that don't become obsolete in the future.
Would it be possible to create an optimization for escapeRegex
that doesn't depend on caching?
from cucumber-expressions.
I'll try to optimize escapeRegex
without caching (I'm thinking about a custom String.replace(String,String)
method, but this would require more effort and the code would be more complex than a simple regexp).
from cucumber-expressions.
I've refactored the escapeRegex
method with two different methods inspired by String.replace(String,String)
:
Method | Description | ops/s |
---|---|---|
escapeRegex0 | The original method CucumberExpression.escapeRegex |
1'200'000 |
escapeRegex1 | Escaping character by character | 8'300'000 |
escapeRegex2 | Escaping characters by blocs | 6'300'000 |
The escapeRegex1
is 7 times faster than the original version and is implemented as follows:
/**
* List of characters to be escaped.
* The last char is '}' with index 125, so we need only 126 characters.
*/
public static final boolean[] CHAR_TO_ESCAPE = new boolean[126];
static {
CHAR_TO_ESCAPE['^'] = true;
CHAR_TO_ESCAPE['$'] = true;
CHAR_TO_ESCAPE['['] = true;
CHAR_TO_ESCAPE[']'] = true;
CHAR_TO_ESCAPE['('] = true;
CHAR_TO_ESCAPE[')'] = true;
CHAR_TO_ESCAPE['{'] = true;
CHAR_TO_ESCAPE['}'] = true;
CHAR_TO_ESCAPE['.'] = true;
CHAR_TO_ESCAPE['|'] = true;
CHAR_TO_ESCAPE['?'] = true;
CHAR_TO_ESCAPE['*'] = true;
CHAR_TO_ESCAPE['+'] = true;
CHAR_TO_ESCAPE['\\'] = true;
}
public static String escapeRegex1(String text) {
int length = text.length();
StringBuilder sb = new StringBuilder(length);
for (int i = 0; i < length; i++) {
char currentChar = text.charAt(i);
if (currentChar < CHAR_TO_ESCAPE.length && CHAR_TO_ESCAPE[currentChar]) {
sb.append('\\');
}
sb.append(currentChar);
}
return sb.toString();
}
The code is pretty simple and elegant.
The escapeRegex1
is implemented as follows:
// CHAR_TO_ESCAPE ommited for brivety (same as above)
public static String escapeRegex2(String text) {
int length = text.length();
StringBuilder sb = new StringBuilder(length);
int blocStart=0;
for (int i = 0; i < length; i++) {
char currentChar = text.charAt(i);
if (currentChar < CHAR_TO_ESCAPE.length && CHAR_TO_ESCAPE[currentChar]) {
if (i > blocStart) {
// flush previous block
sb.append(text, blocStart, i);
}
sb.append('\\');
sb.append(currentChar);
blocStart=i+1;
}
}
if (length > blocStart) {
// flush remaining characters
sb.append(text, blocStart, length);
}
return sb.toString();
}
The code is a bit more complex and slower than escapeRegex1
.
The full CucumberExpression
benchmark is now (the ops/s cannot be compared with the ones from the original post because this is not on the same machine):
Benchmark | cached calls to escapeRegex | cached TreeRegex creation | ops/s |
---|---|---|---|
CucumberExpressionBenchmark.createExpression0 | no | no | 97 |
CucumberExpressionBenchmark.createExpression1 | yes | no | 101 |
CucumberExpressionBenchmark.createExpression2 | no | yes | 113 |
CucumberExpressionBenchmark.createExpression3 | yes | yes | 119 |
CucumberExpressionBenchmark.createExpression4 | no (optimized escapeRegex1 ) |
no | 104 |
So the escapeRegex1
lead to a total 7% improvement over the curent implementation, in the same range as the variant with cached calls to escapeRegex
. This is not significant right now (as we escape the same regexp several times), but may become significant the day we get rid of cucumber-java8
.
Should I do a PR with only this optimization with escapeRegex1
(i.e. without caching the TreeRegex
and without caching calls to escapeRegex
) ?
from cucumber-expressions.
Related Issues (20)
- Remove Python example illustrating how to define parameter types HOT 2
- more powerful optional text HOT 12
- Allow free-form text in optionals HOT 1
- Expression for an argument list HOT 4
- Improve Expression creation performance HOT 2
- Use action-publish-pypi in cucumber-expressions
- Optimize Cucumber expression creation in .Net implementation
- Action Required: Fix Renovate Configuration
- Golang big.Float is not a big decimal HOT 12
- Feature: named capture groups HOT 13
- Missing source for source map
- RegEx "expressions" and defineParameterType feature file examples ?
- Add the ability to ignore a cucumber expression parameter just like a non capturing regex HOT 12
- Nested capture groups are not passed to custom param definition transformer HOT 8
- `end` as a method name / prop can cause some issues HOT 4
- LICENSE file is empty HOT 1
- Typo in error message for use of alternation inside an optional
- Unable to parse negative number in Norwegian HOT 11
- Released Ruby gem is missing LICENSE file. HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cucumber-expressions.