tdunning / log-synth

Generates more or less realistic log data for testing simple aggregation queries.

License: Apache License 2.0

Shell 0.03% Java 99.24% R 0.71% 1C Enterprise 0.02%

log-synth's Issues

Error java.lang.ClassNotFoundException: com.mapr.synth.Synth

Hello,
I created a folder "~/git/target/" and copied the synth file into it. I also created a schema.synth file with the following contents:

[
   {"name":"id", "class":"id"},
   {"name":"name", "class":"name", "type":"first_last"},
   {"name":"gender", "class":"string", "dist":{"MALE":0.5, "FEMALE":0.5, "OTHER":0.02}},
   {"name":"address", "class":"address"},
   {"name":"first_visit", "class":"date", "format":"MM/dd/yyyy"}
]

All other files of the project are stored in "~/git/" folder.

When I execute the command ./target/synth -count 500 -schema schema.synth, I get the following error:

java.lang.ClassNotFoundException: com.mapr.synth.Synth

Can you help me solve this issue? I need to stress test my ELK stack.

Thanks

"Locale" issues when compiling log-synth

Hi there! While compiling log-synth, Maven reported test failures, which prevented the build from completing. For example, when testing ZipSampler, an error was produced when trying to convert a string like "99,9999" to a double using Java's Double.parseDouble(). I realized this was happening because of locale issues. I am in Brazil, using Xubuntu 18.04. In my terminals, the default locale is "pt_BR.UTF-8", where a comma (",") is the decimal separator instead of ".". In ZipSampler, for example, an exception was being raised at line 262:

return accept(Double.parseDouble(latitude), Double.parseDouble(longitude));

Lines 256 and 257 are as follows:

String longitude = location.get("longitude").asText();
String latitude = location.get("latitude").asText();

Due to my locale, the asText() method produces floating-point strings using "," as the decimal separator. To get the code to build, I executed

export LC_ALL=en_US.utf8
export LANG=en_US.utf8  # just in case

on a terminal, ran

mvn clean
mvn package

and the build finished successfully.

If there is an easier way to solve this issue, please let me know. If not, I'd suggest setting the locale to "en_US.UTF-8" in the test classes.

Also, if anything more than "mvn package" is required to build, please let me know. I could not find build instructions in the log-synth documentation.

Thanks!
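
A code-level alternative to changing the shell locale is to pin the locale wherever numbers are turned into text. The following is a minimal sketch, assuming the comma is introduced by a locale-sensitive formatting step before the parse; LocaleSafeNumbers is a hypothetical helper, not a class from log-synth.

```java
import java.util.Locale;

// Sketch only: LocaleSafeNumbers is a hypothetical helper, not part of log-synth.
final class LocaleSafeNumbers {
    // Formats with an explicit locale so the decimal separator is always ".".
    static String format(double value) {
        return String.format(Locale.ROOT, "%.4f", value);
    }

    // Formats with a caller-supplied locale; pt-BR yields a decimal comma.
    static String formatWith(Locale locale, double value) {
        return String.format(locale, "%.4f", value);
    }

    // Double.parseDouble ignores the default locale and only accepts "." as separator.
    static double parse(String text) {
        return Double.parseDouble(text);
    }
}
```

With this, format(99.9999) gives "99.9999" regardless of the default locale, while formatWith(Locale.forLanguageTag("pt-BR"), 99.9999) gives "99,9999", which parse() rejects with a NumberFormatException.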

guava dependency is old

Hi,

I was using log-synth with another project that uses guava and ran into some dependency mismatches. It'd be nice to have a newer guava.

Here are the changes I made to update log-synth. Happy to submit a PR if there isn't another reason to hold guava back.

vicenteg@685ff52

Flatten doesn't do what people expect

People think it will take a complex object such as emitted by ZipSampler and promote the fields to top-level.

The current FlattenSampler should be renamed to ArrayFlattener, and an object flattener should be put in its place.
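
The object flattening people expect can be sketched roughly as follows. This is an illustration under assumptions: ObjectFlattenSketch is a hypothetical class, it works on plain Maps rather than the Jackson nodes log-synth samplers return, and the "-" separator is borrowed from the "z-latitude" style output quoted elsewhere on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical class, plain Maps stand in for Jackson nodes.
final class ObjectFlattenSketch {
    // Promotes the fields of every nested map to the top level,
    // joining parent and child names with "-" (for example "z-latitude").
    static Map<String, Object> flatten(Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", record, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, Object> node,
                                    Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "-" + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                // Nested object: recurse and keep building the prefixed name.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(key, child, out);
            } else {
                out.put(key, value);
            }
        }
    }
}
```

A record such as {"id": 1, "z": {"latitude": "39.35", "longitude": "-85.96"}} would come out with the three top-level keys id, z-latitude and z-longitude.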

Would like Markov state machine generator

Sometimes we would like to have transaction histories that represent plausible user actions. The simplest might be {login, do stuff, logout}* where you can only do stuff while logged in.

One simple way to simulate something like this is to have a state machine where transitions from state to state are selected stochastically based on specified transition probabilities.

The suggested syntax for the schema for such a generator would be something like this:

{
    "name": "history",
    "class": "markov-state-machine",
    "transitions": {
        "init": {"start": 1},
        "start": {
            "abort": 0.1,
            "progress": 0.8,
            "finish": 0.1
        },
        "abort": {
            "start": 1
        },
        "progress": {
            "abort": 0.1,
            "progress": 0.8,
            "finish": 0.1
        },
        "finish": {
            "start": 1
        }
    }
}

Of course, the transition probabilities may not add up to 1 so they should be normalized.

The sample result would be a list of transactions much like the common-point-of-compromise generator produces.
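
The stochastic transition step with normalization described above could look something like this. It is a minimal sketch, not the generator's implementation: MarkovSketch and its methods are hypothetical names, and weights are normalized on the fly by scaling the random draw by their sum.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Sketch only: hypothetical class illustrating weighted state transitions.
final class MarkovSketch {
    private final Map<String, Map<String, Double>> transitions = new LinkedHashMap<>();
    private final Random random;

    MarkovSketch(long seed) {
        this.random = new Random(seed);
    }

    // Adds (or overwrites) one weighted transition; weights need not sum to 1.
    MarkovSketch transition(String from, String to, double weight) {
        transitions.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, weight);
        return this;
    }

    // Picks the next state with probability weight / (sum of weights for this state).
    String next(String state) {
        Map<String, Double> row = transitions.get(state);
        if (row == null || row.isEmpty()) {
            throw new IllegalArgumentException("No transitions from " + state);
        }
        double total = 0;
        for (double w : row.values()) {
            total += w;
        }
        double u = random.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : row.entrySet()) {
            last = e.getKey();
            u -= e.getValue();
            if (u < 0) {
                return last;
            }
        }
        return last; // only reached through floating-point rounding
    }
}
```

A history is then produced by starting at "init" and calling next() repeatedly, recording each state visited.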

Add new class for predefined arbitrary string distribution

One suggested enhancement for this is to add a new class that provides a random distribution between a set of pre-defined strings. For example, UHC has one column in their tables "active_flg" to define whether a particular record/patient is active. This field is either "Y" or "N". It's not possible to match the filters in their query with the current string generation classes (address and name) without modifying the output data.

sometimes useful to synth event times as millis since epoch, rather than a formatted date

Rather than take a formatted date and convert it back to millis or seconds since epoch, it is handy sometimes to just generate the millis since epoch timestamp.
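
As an illustration of the idea, a timestamp can be drawn directly as a long without ever going through a formatted string. This is a sketch only; EpochMillisSketch is a hypothetical class and does not reflect how log-synth's date sampler is implemented.

```java
import java.time.Instant;
import java.util.Random;

// Sketch only: hypothetical helper that samples millis since epoch directly.
final class EpochMillisSketch {
    // Returns a roughly uniform timestamp in [start, end) as millis since epoch.
    static long sample(Random random, Instant start, Instant end) {
        long lo = start.toEpochMilli();
        long span = end.toEpochMilli() - lo;
        if (span <= 0) {
            throw new IllegalArgumentException("end must be after start");
        }
        return lo + (long) (random.nextDouble() * span);
    }
}
```

For seconds since epoch, the result can simply be divided by 1000.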

Flatten + zip w/ CSV generates Nulls

For JSON, the flatten method seems to work OK.

Here's the schema file:

[
{
"name": "z",
"class": "flatten",
"value": { "class": "zip", "fields": "latitude, longitude"}
}

]

w/ JSON:

{"z-longitude":"-85.96","z-latitude":"39.35"}
{"z-longitude":"-74.63","z-latitude":"44.97"}

when I specify CSV:

null
null
null

I tried verbose=true/false, same thing.

extra line at the end of synth'd output

[vagrant@node1 you-suck]$ ~/log-synth/synth -count $((RANDOM % 10)) -schema ~/you-suck/test/preso-ratings.schema -format json
{"name":"Nicole","timestamp":"2014-11-03 18:40:12","slide_title":"slide3","rating":"sweet!"}
{"name":"Paula","timestamp":"2014-11-03 18:43:51","slide_title":"slide1","rating":"meh."}
F   1   0.0 0   0.0 0.000

The last line prevents me from using the synth'd JSON directly when redirecting the output to a file, because that last line of output is not parseable as JSON. I think a simple solution would be to print that final line of logging information to stderr instead of stdout, so I can redirect to a file without filtering.

If that sounds reasonable, I'm happy to try it out in a fork and submit a pull request.

Use case, if the context is helpful: I am playing with spark streaming with textFileStream, and I'm using log-synth to generate data periodically in new files that will be batched up and processed by spark.
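
The proposed separation can be sketched as below. This is an assumption-laden illustration rather than the actual Synth code: OutputStreamsSketch and emit are hypothetical names, and the summary line format is made up for the example.

```java
import java.io.PrintStream;
import java.util.List;

// Sketch only: hypothetical class showing data and diagnostics on separate streams.
final class OutputStreamsSketch {
    // Writes one record per line to the data stream and a summary to the diagnostics stream.
    static void emit(List<String> records, PrintStream data, PrintStream diagnostics) {
        for (String record : records) {
            data.println(record);
        }
        // Progress or timing information never touches the data stream.
        diagnostics.printf("wrote %d records%n", records.size());
    }
}
```

Calling emit(records, System.out, System.err) keeps a redirect such as "> out.json" free of anything but records.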

Zip sampler surprises with error when lat/long not preserved and geo selection is used

Which fields a user selects should not cause a fault. Currently this can happen if the user doesn't preserve both latitude and longitude when using some kind of geographical limitation.

how to add condition

I used the following code to generate data:

{"name":"br", "class":"browser"}

It generates the data I need, but in my case I need some more complex cases, such as:

if br==IE then set os=Win, so that my output looks like this:

[{"br":"IE","os":"Win"},{"br":"Chrome"}]

How should I specify such a condition?
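
One way to get this effect outside the schema is to post-process each generated record. The sketch below is purely illustrative: ConditionalFieldSketch and its rule table are hypothetical, and records are modeled as plain Maps rather than log-synth output objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical post-processing step for a dependent field.
final class ConditionalFieldSketch {
    // Hypothetical rule table: browser value -> operating system to attach.
    private static final Map<String, String> OS_FOR_BROWSER = new LinkedHashMap<>();

    static {
        OS_FOR_BROWSER.put("IE", "Win");
    }

    // Copies the record and adds "os" only when a rule matches its "br" value.
    static Map<String, String> withDependentOs(Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>(record);
        String os = OS_FOR_BROWSER.get(record.get("br"));
        if (os != null) {
            out.put("os", os);
        }
        return out;
    }
}
```

A record with "br" set to "IE" gains "os":"Win", while a record with "br" set to "Chrome" is returned unchanged, matching the desired output in the question.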

Test com.mapr.synth.samplers.VectorSamplerTest failed

I'm on Windows 10 64-bit version 1607 OS Build 14393.187, using jdk1.8.0_102:

mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T17:41:47+01:00)
Maven home: C:\tools\apache-maven-3.3.9
Java version: 1.8.0_102, vendor: Oracle Corporation
Java home: C:\Program Files\Java\jdk1.8.0_102\jre
Default locale: en_GB, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"

I'm on commit 79f16ac, which was the latest commit when I ran git clone.

When I run mvn package, the test com.mapr.synth.samplers.VectorSamplerTest fails with the following message:

-------------------------------------------------------------------------------
Test set: com.mapr.synth.samplers.VectorSamplerTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.513 sec <<< FAILURE!
testVector(com.mapr.synth.samplers.VectorSamplerTest)  Time elapsed: 0.511 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0.0> but was:<-0.3083262151155839>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:443)
    at org.junit.Assert.assertEquals(Assert.java:512)
    at com.mapr.synth.samplers.VectorSamplerTest.testVector(VectorSamplerTest.java:103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Unable to use class type "int" with log-synth

I'm using the following schema definition with class type "int" to generate random ints:

[
    {"name":"id", "class":"id"},
    {"name":"size", "class":"int", "min":10, "max":99}
]

I get the following exception using this schema:

[root@rhl-n1 log-synth]# java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Synth -count 1 -schema schema
Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type id 'int' into a subtype of [simple type, class org.apache.drill.synth.FieldSampler]
at [Source: schema; line: 3, column: 23]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:701)
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:155)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:98)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:82)
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:107)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:228)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:203)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:23)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2796)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:1903)
at org.apache.drill.synth.SchemaSampler.<init>(SchemaSampler.java:34)
at org.apache.drill.synth.Synth.main(Synth.java:30)

make quoting fields configurable on the synth command line

It'd be nice to be able to directly import log-synth generated TSV into a metastore table without having to remove the quotes first.

The old log generator produced unquoted keys in JSON output

That isn't kosher JSON, of course, even though some tools may accept it.

“NoSuchMethodErrors” due to multiple versions of org.codehaus.woodstox:stax2-api

Issue description:

Hi, there are multiple versions of org.codehaus.woodstox:stax2-api in log-synth. As shown in the dependency tree below, under Maven's "nearest wins" strategy only org.codehaus.woodstox:stax2-api:3.1.1 is loaded, and org.codehaus.woodstox:stax2-api:4.2 is shadowed.

However, several methods defined in the shadowed version org.codehaus.woodstox:stax2-api:4.2 are referenced by the client project via com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.9.10 but are missing from the version actually loaded, org.codehaus.woodstox:stax2-api:3.1.1.

For instance, the following missing method (defined in org.codehaus.woodstox:stax2-api:4.2) is referenced by log-synth, which will cause a runtime error (i.e., "NoSuchMethodError") in log-synth.

org.codehaus.stax2.ri.SingletonIterator: org.codehaus.stax2.ri.SingletonIterator create(java.lang.Object) is invoked by log-synth via the following path:


Invocation path------
<com.mapr.synth.Synth$ReportingWorker: java.lang.Integer call()> log-synth\target\classes
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeStartElement(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeStartElement(java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void createStartElem(java.lang.String,java.lang.String,java.lang.String,boolean)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeNamespace(java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void outputAttribute(java.lang.String,java.lang.String,java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: java.lang.String findOrCreateAttrPrefix(java.lang.String,java.lang.String,com.ctc.wstx.dom.DOMOutputElement)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.sw.OutputElementBase: java.lang.String getExplicitPrefix(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<org.codehaus.stax2.ri.evt.MergedNsContext: java.lang.String getPrefix(java.lang.String)> Repositories\org\codehaus\woodstox\stax2-api\3.1.1\stax2-api-3.1.1.jar
<com.ctc.wstx.sr.InputElementStack: java.util.Iterator getPrefixes(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.util.DataUtil: java.util.Iterator singletonIterator(java.lang.Object)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<org.codehaus.stax2.ri.SingletonIterator: org.codehaus.stax2.ri.SingletonIterator create(java.lang.Object)>

Dependency tree----


[INFO] log-synth:log-synth:jar:0.1-SNAPSHOT
[INFO] +- me.lemire.integercompression:JavaFastPFOR:jar:0.0.13:compile
[INFO] +- org.apache.mahout:mahout-math:jar:0.9:compile
[INFO] |  +- org.apache.commons:commons-math3:jar:3.2:compile
[INFO] |  +- (com.google.guava:guava:jar:16.0:compile - omitted for conflict with 27.1-jre)
[INFO] |  \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.6.6)
[INFO] +- com.clearspring.analytics:stream:jar:2.5.0:test
[INFO] |  \- it.unimi.dsi:fastutil:jar:6.5.7:test
[INFO] +- com.carrotsearch.randomizedtesting:randomizedtesting-runner:jar:2.1.11:compile
[INFO] |  \- junit:junit:jar:4.10:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.6.6:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.6.6:runtime
[INFO] |  +- (org.slf4j:slf4j-api:jar:1.6.6:runtime - omitted for duplicate)
[INFO] |  \- log4j:log4j:jar:1.2.17:runtime
[INFO] +- args4j:args4j:jar:2.0.23:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.0.pr1:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.0.pr1:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.10.0.pr1:compile
[INFO] +- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.4:compile
[INFO] |  +- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] |  \- org.codehaus.woodstox:stax2-api:jar:3.1.1:compile
[INFO] |     \- (javax.xml.stream:stax-api:jar:1.0-2:compile - omitted for duplicate)
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-xml:jar:2.9.10:compile
[INFO] |  +- (com.fasterxml.jackson.core:jackson-core:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- (com.fasterxml.jackson.core:jackson-databind:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.9.10:compile
[INFO] |  |  +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  |  +- (com.fasterxml.jackson.core:jackson-core:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  |  \- (com.fasterxml.jackson.core:jackson-databind:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- (org.codehaus.woodstox:stax2-api:jar:4.2:compile - omitted for conflict with 3.1.1)
[INFO] |  \- com.fasterxml.woodstox:woodstox-core:jar:5.3.0:compile
[INFO] |     \- (org.codehaus.woodstox:stax2-api:jar:4.2:compile - omitted for conflict with 3.1.1)
[INFO] +- org.freemarker:freemarker:jar:2.3.21:compile
[INFO] +- com.google.guava:guava:jar:27.1-jre:compile
[INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] |  +- org.checkerframework:checker-qual:jar:2.5.2:compile
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.2.0:compile
[INFO] |  +- com.google.j2objc:j2objc-annotations:jar:1.1:compile
[INFO] |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile
[INFO] +- stax:stax-api:jar:1.0.1:compile
[INFO] +- org.processing:core:jar:3.0b6:compile
[INFO] \- com.tdunning:t-digest:jar:3.0:compile

Build a server that gives out records in real-time

The idea is that you would register a schema with a startSampler call. This call would return an ID.

Then you could ask for samples with getSample. An alternative could be getRealTimeSample which would delay the next sample until enough time has passed that a specific field has a time in the past. You would pass in the ID you got from startSampler. Each sampler should have a buffer so multiple clients can request samples in parallel and each would wait for the next sample in turn (for real-time samples). In any case, thread safety constraints should be observed ... some samplers might be run in many threads.

Eventually, you would call stopStream to free up resources.

An alternative API would have one call startStream where you would give a schema and any realtime constraint and results would be streamed back. This is safer in terms of deallocating resources automatically but slightly more complex.
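
The first API shape described above might be sketched in-process like this. It is a rough illustration under assumptions: SamplerRegistrySketch is a hypothetical class, a Supplier<String> stands in for a registered schema sampler, and the real-time delay and per-sampler buffering are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Sketch only: hypothetical in-memory registry; no network layer, no real-time delay.
final class SamplerRegistrySketch {
    private final Map<Long, Supplier<String>> samplers = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Registers a sampler (standing in for a schema) and returns its ID.
    long startSampler(Supplier<String> sampler) {
        long id = nextId.incrementAndGet();
        samplers.put(id, sampler);
        return id;
    }

    // Returns the next sample; access to each sampler is serialized for thread safety.
    String getSample(long id) {
        Supplier<String> sampler = samplers.get(id);
        if (sampler == null) {
            throw new IllegalArgumentException("Unknown sampler id " + id);
        }
        synchronized (sampler) {
            return sampler.get();
        }
    }

    // Frees the sampler; returns false if the ID was not registered.
    boolean stopStream(long id) {
        return samplers.remove(id) != null;
    }
}
```

Synchronizing on the sampler is the simplest way to honor the thread-safety constraint here; a buffered queue per sampler, as the proposal suggests, would let clients wait in turn instead.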

Add scripted post processor

This can be handy to transform or restructure or filter data. A common use is to run a model against parameters selected by the main sampling.

Schema to generate nested JSON

Hello, I would like to know if there is any class or schema that can be used to generate nested JSON (sample below).

Example :

{
    "name": {
        "first": "David",
        "last": "Joseph"
    },
    "student": {
        "details": {
            "school.id": "1010",
            "school.name": "Brooklands"
        }
    },
    "version": "1.0.1"
}

Thanks in advance,
Deepti

When using the "name" class without the required "type" field there should be a clear error

When using the name field in this schema (missing the type field in the name class):

[
  {"name": "id", "class": "id"},
  {"name": "name", "class": "name"}
]

log-synth throws a NullPointerException:

log-synth(master ✗) java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar com.mapr.synth.Synth -schema users-schema-test.json -format CSV
Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at com.mapr.synth.Synth.main(Synth.java:114)
Caused by: java.lang.NullPointerException
    at com.mapr.synth.samplers.NameSampler.sample(NameSampler.java:80)
    at com.mapr.synth.samplers.NameSampler.sample(NameSampler.java:24)
    at com.mapr.synth.samplers.SchemaSampler.sample(SchemaSampler.java:69)
    at com.mapr.synth.Synth$ReportingWorker.generateFile(Synth.java:207)
    at com.mapr.synth.Synth$ReportingWorker.call(Synth.java:171)
    at com.mapr.synth.Synth$ReportingWorker.call(Synth.java:125)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

This should throw an error asking that the type be provided or, better yet, fall back to a default type such as FIRST_LAST.

Add XML formatting

web-log.md is missing

Hi, cool project! I see the README file mentioning a web-log.md but I can't seem to locate it in the repo.

Infinite data stream with volume peaks

I was wondering if it's possible to run the application continuously, rolling the file every X minutes (in conjunction with fluentd), with "random" bursts.

My aim is to mimic even more realistic data pipelines.

Problem flattening data

Hi Ted,

Here's the error I get:

$ ./synth -schema msg/clickstream.json
id,user_id,program_id,timestamp,last_five_actions,action,device,br,la,st,os
Exception in thread "main" java.lang.IllegalArgumentException: Cannot flatten type class com.fasterxml.jackson.databind.node.ObjectNode
at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:33)
at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:12)
at com.mapr.synth.samplers.SchemaSampler.sample(SchemaSampler.java:69)
at com.mapr.synth.Synth.main(Synth.java:34)

Here's the schema, which is pretty much just lifted from the example in README.md, except for the fact I'm using the lookup class for the value of the "base" key:

[
{"name":"id", "class":"id"},
{"name":"user_id", "class": "foreign-key", "size": 100000 },
{"name":"program_id", "class": "foreign-key", "size": 125 },
{"name":"timestamp", "class": "date", "format": "yyyy-MM-dd HH:MM:ss.SS", "start": "2014-08-12 00:00:00.00" },

{"name":"last_five_actions", "class": "flatten", "value":
{
"class": "sequence", "length": 5, "base": {
"class": "lookup", "file": "msg/actions.csv"
}
}
},

{"name":"action", "class":"string", "dist":{
"play":21, "stop":19, "pause":16, "ff": 12, "rw": 7, "replay": 2}
},

{"name":"device", "class":"string", "dist":{
"large":25, "phone":45, "tablet":25, "other": 5}
},

{"name":"br", "class":"browser"},
{"name":"la", "class":"language"},
{"name":"st", "class":"state"},
{"name":"os", "class":"os"}
]

I'm proceeding without the lookup stuff, since that produces data that I think is good enough for my purpose.

empty "log" file created

My attempt to run

    java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Main 1M log users

results in two empty files, "1M" and "log".

If I add the "-count" option, it throws a NullPointerException:

java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Main -count 1M log users
Usage: -count <number>G|M|K [ -users number ] [-format JSON|LOG|CSV ] log-file user-profiles
Exception in thread "main" java.lang.NullPointerException
    at org.apache.drill.synth.Main.main(Main.java:36)

Ubuntu 12.04
Java 1.7.0_45

Generate dependent data

Hi there,
I am trying to generate event data like that below. There are two events: started and ended.
One condition to consider while generating the data is that the timestamp of the start event should be earlier than that of the end event. How can I configure that using log-synth?

{
	"actor": {
		"name": "student"
	},
	"task": {
		"id": "assignment"
	},
	"status": {
		"id": "ended"
	},
	"context": {
		"extensions": {
			"user-id": "11111",
			"username": "XYZ",
			"currentassignmentid": "2",
			"institutionname": "ABC"
		}
	},
	"timestamp": "2016-06-20 03:00:00",
	"id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
	"version": "1.0.1"
}

{
	"actor": {
		"name": "student"
	},
	"task": {
		"id": "assignment"
	},
	"status": {
		"id": "started"
	},
	"context": {
		"extensions": {
			"user-id": "11111",
			"username": "XYZ",
			"currentassignmentid": "2",
			"institutionname": "ABC"
		}
	},
	"timestamp": "2016-06-20 01:00:00",
	"id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
	"version": "1.0.1"
}
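
The ordering constraint itself is easy to guarantee if the end time is derived from the start time rather than sampled independently. The following is a sketch of that idea only; OrderedEventsSketch is a hypothetical class and says nothing about which log-synth schema options could express it.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

// Sketch only: hypothetical helper that derives the end time from the start time.
final class OrderedEventsSketch {
    // Returns {start, end}, where end is at least 1 ms after start.
    static Instant[] sample(Random random, Instant earliest,
                            Duration maxDelay, Duration maxLength) {
        // Start somewhere in [earliest, earliest + maxDelay).
        Instant start = earliest.plusMillis((long) (random.nextDouble() * maxDelay.toMillis()));
        // Add a strictly positive length so the "ended" event is always later.
        long length = 1 + (long) (random.nextDouble() * maxLength.toMillis());
        return new Instant[] {start, start.plusMillis(length)};
    }
}
```

The two instants can then be formatted into the "started" and "ended" records, which share the same user and assignment fields.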

Build test failure in ZipSamplerTest

I see this test failure when running 'mvn package'. This is from the ...ZipSamplerTest.txt file in the surefire-reports folder. No other failures.

Test set: com.mapr.synth.samplers.ZipSamplerTest

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.9 sec <<< FAILURE!
testZips(com.mapr.synth.samplers.ZipSamplerTest) Time elapsed: 8.899 sec <<< FAILURE!
java.lang.AssertionError: expected:<-90.88465> but was:<-85.52415105800087>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:441)
at org.junit.Assert.assertEquals(Assert.java:510)
at com.mapr.synth.samplers.ZipSamplerTest.testZips(ZipSamplerTest.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)