tdunning / log-synth
Generates more or less realistic log data for testing simple aggregation queries.
License: Apache License 2.0
Hello,
I created a folder "~/git/target/" and copied the synth file into it. I also created a schema.synth file with the following information:
[
{"name":"id", "class":"id"},
{"name":"name", "class":"name", "type":"first_last"},
{"name":"gender", "class":"string", "dist":{"MALE":0.5, "FEMALE":0.5, "OTHER":0.02}},
{"name":"address", "class":"address"},
{"name":"first_visit", "class":"date", "format":"MM/dd/yyyy"}
]
All other files of the project are stored in "~/git/" folder.
When I execute the command: ./target/synth -count 500 -schema schema.synth
I get the following error:
java.lang.ClassNotFoundException: com.mapr.synth.Synth
Can you help me solve this issue? I need to stress-test my ELK stack.
Thanks
Hi there! While compiling log-synth, Maven reported test failures that prevented the build from completing. For example, when testing ZipSampler, an error was produced when trying to convert a string like "99,9999" to a double using Java's Double.parseDouble(). I realized it was happening because of locale issues. I am in Brazil, using Xubuntu 18.04, and in my terminals the default locale is "pt_BR.UTF-8", where a comma (",") is used as the decimal separator instead of ".". In ZipSampler, for example, an exception was being raised because of line 262:
return accept(Double.parseDouble(latitude), Double.parseDouble(longitude));
Lines 256 and 257 are as follows:
String longitude = location.get("longitude").asText();
String latitude = location.get("latitude").asText();
Due to my locale, the asText() method produces floating-point strings using "," as the decimal separator. In order to get the code compiled, I executed
export LC_ALL=en_US.utf8
export LANG=en_US.utf8 (just in case)
on a terminal, ran
mvn clean
mvn package
and compilation was successfully finished.
If there is an easier way to solve this issue, please let me know. If not, I'd suggest setting the locale to "en_US.UTF-8" in the test classes.
Also, if anything more than "mvn package" is required for compilation, please let me know. I could not find compilation instructions in the log-synth documentation.
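A minimal demonstration of what pinning the locale in the tests could look like (a sketch only; in a real test this would go in a JUnit @Before method, and the class name here is illustrative):

```java
import java.util.Locale;

public class LocaleDemo {
    public static void main(String[] args) {
        // Pin the JVM's default locale so "." is the decimal separator,
        // regardless of the machine's environment (e.g. pt_BR.UTF-8).
        Locale.setDefault(Locale.US);
        String formatted = String.format("%.4f", 99.9999);
        System.out.println(formatted);                      // "99.9999", not "99,9999"
        System.out.println(Double.parseDouble(formatted));  // parses cleanly
    }
}
```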
Thanks!
Hi,
I was using log-synth with another project that uses Guava and ran into some dependency mismatches. It'd be nice to have a newer Guava.
Here are the changes I made to update log-synth. Happy to submit a PR if there isn't another reason to keep Guava back.
People expect it to take a complex object, such as the one emitted by ZipSampler, and promote its fields to the top level.
The current FlattenSampler should be renamed ArrayFlattener, and an object flattener put in its place.
Sometimes we would like to have transaction histories that represent plausible user actions. The simplest might be {login, do stuff, logout}* where you can only do stuff while logged in.
One simple way to simulate something like this is to have a state machine where transitions from state to state are selected stochastically based on specified transition probabilities.
The suggested syntax for the schema for such a generator would be something like this:
{
"name": "history",
"class": "markov-state-machine",
"transitions": {
"init": {"start":1}
"start": {
"abort": 0.1,
"progress": 0.8,
"finish": 0.1
},
"abort": {
"start": 1
},
"progress": {
"abort": 0.1,
"progress": 0.8,
"finish": 0.1
},
"finish": {
"start": 1
}
}
}
Of course, the transition probabilities may not add up to 1, so they should be normalized.
The sample result would be a list of transactions much like the common-point-of-compromise generator produces.
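To illustrate the normalization and the stochastic transition choice, here is a minimal sketch (not log-synth code; all names are illustrative). Scaling the random draw by the total weight normalizes implicitly:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class MarkovSketch {
    // Choose the next state given (possibly unnormalized) transition weights.
    static String nextState(Map<String, Double> weights, Random rand) {
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        double r = rand.nextDouble() * total;  // scaling by the total normalizes the weights
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            r -= e.getValue();
            if (r <= 0) {
                return e.getKey();
            }
        }
        // Floating-point slop can leave r marginally above zero; fall back to the last key.
        return weights.keySet().stream().reduce((a, b) -> b).orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        Map<String, Double> fromStart = new LinkedHashMap<>();
        fromStart.put("abort", 0.1);
        fromStart.put("progress", 0.8);
        fromStart.put("finish", 0.1);
        System.out.println(nextState(fromStart, new Random()));
    }
}
```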
One suggested enhancement is to add a new class that provides a random distribution over a set of pre-defined strings. For example, UHC has a column "active_flg" in their tables to indicate whether a particular record/patient is active; this field is either "Y" or "N". It's not possible to match the filters in their query with the current string-generation classes (address and name) without modifying the output data.
Rather than taking a formatted date and converting it back to millis or seconds since the epoch, it is sometimes handy to just generate the millis-since-epoch timestamp directly.
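For context, the round trip this request wants to avoid looks roughly like this; a sampler that emits the raw epoch-millis value would skip the format/parse step (sketch only, with a made-up timestamp):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class EpochDemo {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        // Today: generate a formatted date, then parse it back to recover millis.
        long millis = fmt.parse("2014-11-03 18:40:12").getTime();
        // Requested: a sampler that emits this raw value directly.
        System.out.println(millis);
        // Round trip: formatting the millis reproduces the original string.
        System.out.println(fmt.format(new Date(millis)));
    }
}
```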
For JSON, the flatten method seems to work OK.
Here's the schema file:
[
{
"name": "z",
"class": "flatten",
"value": { "class": "zip", "fields": "latitude, longitude"}
}
]
w/ JSON:
{"z-longitude":"-85.96","z-latitude":"39.35"}
{"z-longitude":"-74.63","z-latitude":"44.97"}
When I specify CSV, I get:
null
null
null
I tried verbose=true/false, same thing.
[vagrant@node1 you-suck]$ ~/log-synth/synth -count $((RANDOM % 10)) -schema ~/you-suck/test/preso-ratings.schema -format json
{"name":"Nicole","timestamp":"2014-11-03 18:40:12","slide_title":"slide3","rating":"sweet!"}
{"name":"Paula","timestamp":"2014-11-03 18:43:51","slide_title":"slide1","rating":"meh."}
F 1 0.0 0 0.0 0.000
The last line prevents me from using the synth'd JSON directly when redirecting the output to a file, since the last line of output is not parseable as JSON. I think a simple solution would be to print that final line of logging information to stderr instead of stdout, so I can redirect to a file without filtering.
If that sounds reasonable, I'm happy to try it out in a fork and submit a pull request.
Use case, if the context is helpful: I am playing with Spark Streaming with textFileStream, and I'm using log-synth to generate data periodically in new files that will be batched up and processed by Spark.
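The proposed split is simple in Java terms: data records go to stdout, logging to stderr, so a shell redirect captures only JSON (a sketch, not the project's actual code):

```java
public class OutputSplitDemo {
    public static void main(String[] args) {
        // Data records: stdout, so `synth ... > data.json` captures clean JSON.
        System.out.println("{\"name\":\"Nicole\",\"rating\":\"sweet!\"}");
        // Final statistics line: stderr, so it never lands in the data file.
        System.err.println("F 1 0.0 0 0.0 0.000");
    }
}
```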
Which fields a user selects should not cause a fault. Currently this can happen if the user doesn't preserve both latitude and longitude when using some kind of geographical limitation.
I used the following schema to generate data:
{"name":"br", "class":"browser"}
It generates the data I need, but I have a more complex case, for example:
if br==IE then set os=Win
so my output would look like this:
{"br":"IE","os":"Win"},{"br":"Chrome"}]
How should I express such a condition?
I'm on Windows 10 64-bit version 1607 OS Build 14393.187, using jdk1.8.0_102:
mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T17:41:47+01:00)
Maven home: C:\tools\apache-maven-3.3.9
Java version: 1.8.0_102, vendor: Oracle Corporation
Java home: C:\Program Files\Java\jdk1.8.0_102\jre
Default locale: en_GB, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"
I'm on commit 79f16ac which is the last commit at the time I've made a git clone.
When I run mvn package, the test com.mapr.synth.samplers.VectorSamplerTest fails with the following message:
-------------------------------------------------------------------------------
Test set: com.mapr.synth.samplers.VectorSamplerTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.513 sec <<< FAILURE!
testVector(com.mapr.synth.samplers.VectorSamplerTest) Time elapsed: 0.511 sec <<< FAILURE!
java.lang.AssertionError: expected:<0.0> but was:<-0.3083262151155839>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:443)
at org.junit.Assert.assertEquals(Assert.java:512)
at com.mapr.synth.samplers.VectorSamplerTest.testVector(VectorSamplerTest.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
I'm using the following schema definition with class type "int" to generate random ints:
[
{"name":"id", "class":"id"},
{"name":"size", "class":"int", "min":10, "max":99}
]
I get the following exception using this schema:
[root@rhl-n1 log-synth]# java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Synth -count 1 -schema schema
Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type id 'int' into a subtype of [simple type, class org.apache.drill.synth.FieldSampler]
at [Source: schema; line: 3, column: 23]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:701)
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:155)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:98)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:82)
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:107)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:228)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:203)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:23)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2796)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:1903)
at org.apache.drill.synth.SchemaSampler.(SchemaSampler.java:34)
at org.apache.drill.synth.Synth.main(Synth.java:30)
It'd be nice to be able to import log-synth-generated TSV directly into a metastore table without having to remove the quotes first.
That isn't kosher JSON, of course, even though some tools may accept it.
Hi, there are multiple versions of org.codehaus.woodstox:stax2-api in log-synth. As shown in the following dependency tree, according to Maven's "nearest wins" strategy, only org.codehaus.woodstox:stax2-api:3.1.1 can be loaded; org.codehaus.woodstox:stax2-api:4.2 will be shadowed.
However, several methods defined in the shadowed version org.codehaus.woodstox:stax2-api:4.2 are referenced by client projects via com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.9.10 but are missing from the actually loaded version org.codehaus.woodstox:stax2-api:3.1.1.
For instance, the following missing method (defined in org.codehaus.woodstox:stax2-api:4.2) is actually referenced by log-synth, which will introduce a runtime error (i.e., NoSuchMethodError) into log-synth.
org.codehaus.stax2.ri.SingletonIterator: org.codehaus.stax2.ri.SingletonIterator create(java.lang.Object) is invoked by log-synth via the following path:
Invocation path------
<com.mapr.synth.Synth$ReportingWorker: java.lang.Integer call()> log-synth\target\classes
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeStartElement(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeStartElement(java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void createStartElem(java.lang.String,java.lang.String,java.lang.String,boolean)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeNamespace(java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void outputAttribute(java.lang.String,java.lang.String,java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: java.lang.String findOrCreateAttrPrefix(java.lang.String,java.lang.String,com.ctc.wstx.dom.DOMOutputElement)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.sw.OutputElementBase: java.lang.String getExplicitPrefix(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<org.codehaus.stax2.ri.evt.MergedNsContext: java.lang.String getPrefix(java.lang.String)> Repositories\org\codehaus\woodstox\stax2-api\3.1.1\stax2-api-3.1.1.jar
<com.ctc.wstx.sr.InputElementStack: java.util.Iterator getPrefixes(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.util.DataUtil: java.util.Iterator singletonIterator(java.lang.Object)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<org.codehaus.stax2.ri.SingletonIterator: org.codehaus.stax2.ri.SingletonIterator create(java.lang.Object)>
Solution 1: use <dependencyManagement> in the pom file to unify the version of org.codehaus.woodstox:stax2-api to 4.1.
Solution 2: change the dependency com.fasterxml.jackson.dataformat:jackson-dataformat-xml from 2.9.10 to 2.10.0.pr1. Because the newer version does not invoke the missing methods, this change can solve the problem.
This repair will introduce the following new dependencies:
jakarta.activation:jakarta.activation-api:1.2.1
jakarta.xml.bind:jakarta.xml.bind-api:2.3.2
Please let me know which solution you prefer; I can submit a PR to fix it.
Thank you very much for your attention.
Best regards,
[INFO] log-synth:log-synth:jar:0.1-SNAPSHOT
[INFO] +- me.lemire.integercompression:JavaFastPFOR:jar:0.0.13:compile
[INFO] +- org.apache.mahout:mahout-math:jar:0.9:compile
[INFO] | +- org.apache.commons:commons-math3:jar:3.2:compile
[INFO] | +- (com.google.guava:guava:jar:16.0:compile - omitted for conflict with 27.1-jre)
[INFO] | \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.6.6)
[INFO] +- com.clearspring.analytics:stream:jar:2.5.0:test
[INFO] | \- it.unimi.dsi:fastutil:jar:6.5.7:test
[INFO] +- com.carrotsearch.randomizedtesting:randomizedtesting-runner:jar:2.1.11:compile
[INFO] | \- junit:junit:jar:4.10:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.6.6:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.6.6:runtime
[INFO] | +- (org.slf4j:slf4j-api:jar:1.6.6:runtime - omitted for duplicate)
[INFO] | \- log4j:log4j:jar:1.2.17:runtime
[INFO] +- args4j:args4j:jar:2.0.23:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.0.pr1:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.0.pr1:compile
[INFO] | \- com.fasterxml.jackson.core:jackson-core:jar:2.10.0.pr1:compile
[INFO] +- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.4:compile
[INFO] | +- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] | \- org.codehaus.woodstox:stax2-api:jar:3.1.1:compile
[INFO] | \- (javax.xml.stream:stax-api:jar:1.0-2:compile - omitted for duplicate)
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-xml:jar:2.9.10:compile
[INFO] | +- (com.fasterxml.jackson.core:jackson-core:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] | +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] | +- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.9.10:compile
[INFO] | | +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] | | +- (com.fasterxml.jackson.core:jackson-core:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] | | \- (com.fasterxml.jackson.core:jackson-databind:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] | +- (org.codehaus.woodstox:stax2-api:jar:4.2:compile - omitted for conflict with 3.1.1)
[INFO] | \- com.fasterxml.woodstox:woodstox-core:jar:5.3.0:compile
[INFO] | \- (org.codehaus.woodstox:stax2-api:jar:4.2:compile - omitted for conflict with 3.1.1)
[INFO] +- org.freemarker:freemarker:jar:2.3.21:compile
[INFO] +- com.google.guava:guava:jar:27.1-jre:compile
[INFO] | +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] | +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] | +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] | +- org.checkerframework:checker-qual:jar:2.5.2:compile
[INFO] | +- com.google.errorprone:error_prone_annotations:jar:2.2.0:compile
[INFO] | +- com.google.j2objc:j2objc-annotations:jar:1.1:compile
[INFO] | \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile
[INFO] +- stax:stax-api:jar:1.0.1:compile
[INFO] +- org.processing:core:jar:3.0b6:compile
[INFO] \- com.tdunning:t-digest:jar:3.0:compile
The idea is that you would register a schema with a startSampler call. This call would return an ID. Then you could ask for samples with getSample. An alternative could be getRealTimeSample, which would delay the next sample until enough time has passed that a specific field has a time in the past. You would pass in the ID you got from startSampler. Each sampler should have a buffer so multiple clients can request samples in parallel, and each would wait for the next sample in turn (for real-time samples). In any case, thread safety constraints should be observed ... some samplers might be run in many threads. Eventually, you would call stopStream to free up resources.
An alternative API would have one call, startStream, where you would give a schema and any real-time constraint, and results would be streamed back. This is safer in terms of deallocating resources automatically but slightly more complex.
This can be handy for transforming, restructuring, or filtering data. A common use is to run a model against parameters selected by the main sampling.
Hello, I would like to know if there is any class/schema that can be used to generate nested JSON (sample below).
Example:
{
"name": {
"first": "David",
"last": "Joseph"
},
"student": {
"details": {
"school.id": "1010",
"school.name": "Brooklands"
}
},
"version": "1.0.1"
}
Thanks in advance,
Deepti
When using the name field in this schema (missing the type field in the name class):
[
{"name": "id", "class": "id"},
{"name": "name", "class": "name"}
]
log-synth throws a NullPointerException:
log-synth(master ✗) java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar com.mapr.synth.Synth -schema users-schema-test.json -format CSV
Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at com.mapr.synth.Synth.main(Synth.java:114)
Caused by: java.lang.NullPointerException
at com.mapr.synth.samplers.NameSampler.sample(NameSampler.java:80)
at com.mapr.synth.samplers.NameSampler.sample(NameSampler.java:24)
at com.mapr.synth.samplers.SchemaSampler.sample(SchemaSampler.java:69)
at com.mapr.synth.Synth$ReportingWorker.generateFile(Synth.java:207)
at com.mapr.synth.Synth$ReportingWorker.call(Synth.java:171)
at com.mapr.synth.Synth$ReportingWorker.call(Synth.java:125)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
This should throw an error asking that the type be provided, or better yet, a default type such as FIRST_LAST should be assumed.
Hi, cool project! I see the README file mentioning a web-log.md but I can't seem to locate it in the repo.
I was wondering if it's possible to run the application constantly, rolling the file every X minutes (in conjunction with fluentd), with "random" bursts. My aim is to mimic even more realistic data pipelines.
Hi Ted,
Here's the error I get:
$ ./synth -schema msg/clickstream.json
id,user_id,program_id,timestamp,last_five_actions,action,device,br,la,st,os
Exception in thread "main" java.lang.IllegalArgumentException: Cannot flatten type class com.fasterxml.jackson.databind.node.ObjectNode
at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:33)
at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:12)
at com.mapr.synth.samplers.SchemaSampler.sample(SchemaSampler.java:69)
at com.mapr.synth.Synth.main(Synth.java:34)
Here's the schema, which is pretty much lifted from the example in README.md, except that I'm using the lookup class for the value of the "base" key:
[
{"name":"id", "class":"id"},
{"name":"user_id", "class": "foreign-key", "size": 100000 },
{"name":"program_id", "class": "foreign-key", "size": 125 },
{"name":"timestamp", "class": "date", "format": "yyyy-MM-dd HH:MM:ss.SS", "start": "2014-08-12 00:00:00.00" },
{"name":"last_five_actions", "class": "flatten", "value":
{
"class": "sequence", "length": 5, "base": {
"class": "lookup", "file": "msg/actions.csv"
}
}
},
{"name":"action", "class":"string", "dist":{
"play":21, "stop":19, "pause":16, "ff": 12, "rw": 7, "replay": 2}
},
{"name":"device", "class":"string", "dist":{
"large":25, "phone":45, "tablet":25, "other": 5}
},
{"name":"br", "class":"browser"},
{"name":"la", "class":"language"},
{"name":"st", "class":"state"},
{"name":"os", "class":"os"}
]
I'm proceeding without the lookup stuff, since that produces data that I think is good enough for my purpose.
My attempt to run
java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Main 1M log users
results in two empty files, "1M" and "log".
If I add the -count option, it throws a NullPointerException:
java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Main -count 1M log users
Usage: -count <number>G|M|K [ -users number ] [-format JSON|LOG|CSV ] log-file user-profiles
Exception in thread "main" java.lang.NullPointerException
at org.apache.drill.synth.Main.main(Main.java:36)
Ubuntu 12.04
Java 1.7.0_45
Hi there,
I am trying to generate event data like the example below. There are two events: started and ended.
One condition to consider while generating the data is that the timestamp of the start event should be less than that of the end event. How can I configure that using log-synth?
{
"actor": {
"name": "student"
},
"task": {
"id": "assignment"
},
"status": {
"id": "ended"
},
"context": {
"extensions": {
"user-id": "11111",
"username": "XYZ",
"currentassignmentid": "2",
"institutionname": "ABC"
}
},
"timestamp": "2016-06-20 03:00:00",
"id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"version": "1.0.1"
}
{
"actor": {
"name": "student"
},
"task": {
"id": "assignment"
},
"status": {
"id": "started"
},
"context": {
"extensions": {
"user-id": "11111",
"username": "XYZ",
"currentassignmentid": "2",
"institutionname": "ABC"
}
},
"timestamp": "2016-06-20 01:00:00",
"id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"version": "1.0.1"
}
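One generic way to guarantee the ordering, outside of log-synth's schema language: sample the start timestamp first, then derive the end timestamp by adding a strictly positive duration, so start < end holds by construction (a sketch with made-up constants, not a log-synth feature):

```java
import java.util.Random;

public class EventPairDemo {
    public static void main(String[] args) {
        Random rand = new Random();
        long dayMillis = 24L * 60 * 60 * 1000;
        // Sample the "started" timestamp somewhere within a chosen day
        // (the base epoch value here is arbitrary).
        long start = 1466380800000L + (long) (rand.nextDouble() * dayMillis);
        // Derive "ended" by adding a strictly positive duration (up to 2 hours).
        long end = start + 1 + (long) (rand.nextDouble() * 2 * 60 * 60 * 1000);
        System.out.println(start + " < " + end + " : " + (start < end));
    }
}
```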
I see this test failure when running 'mvn package'. This is from the surefire-reports folder, in the ...ZipSamplerTest.txt file. No other failures.
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.9 sec <<< FAILURE!
testZips(com.mapr.synth.samplers.ZipSamplerTest) Time elapsed: 8.899 sec <<< FAILURE!
java.lang.AssertionError: expected:<-90.88465> but was:<-85.52415105800087>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:441)
at org.junit.Assert.assertEquals(Assert.java:510)
at com.mapr.synth.samplers.ZipSamplerTest.testZips(ZipSamplerTest.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)