log-synth's People

Contributors

andrew-svds, asubmissions, dependabot[bot], iandow, namato, tdunning, vicenteg

log-synth's Issues

Error java.lang.ClassNotFoundException: com.mapr.synth.Synth

Hello,
I created a folder "~/git/target/" and copied the synth file into it. I also created a schema.synth file with the following contents:

[
   {"name":"id", "class":"id"},
   {"name":"name", "class":"name", "type":"first_last"},
   {"name":"gender", "class":"string", "dist":{"MALE":0.5, "FEMALE":0.5, "OTHER":0.02}},
   {"name":"address", "class":"address"},
   {"name":"first_visit", "class":"date", "format":"MM/dd/yyyy"}
]

All other files of the project are stored in "~/git/" folder.

When I execute the command : ./target/synth -count 500 -schema schema.synth I get the following error:

java.lang.ClassNotFoundException: com.mapr.synth.Synth

Can you help me solve this issue? I need to stress test my ELK stack.

Thanks

"Locale" issues when compiling log-synth

Hi, there! While compiling log-synth, maven reported failures when performing tests, which prevented it from completing the building process. For example, when testing ZipSampler, an error was produced when trying to convert a string like "99,9999" to double by using Java's Double.parseDouble(). I realized it was happening because of "locale issues". I am in Brazil, using Xubuntu 18.04. In my terminals, the default locale is "pt_BR.UTF-8". Here, a comma (",") is used as decimal separator (instead of "."). In ZipSampler, for example, an Exception was being raised because of line 262:

return accept(Double.parseDouble(latitude), Double.parseDouble(longitude));

Lines 256 and 257 are as follows:

String longitude = location.get("longitude").asText();
String latitude = location.get("latitude").asText();

Due to my locale, the asText() method produces floating-point strings that use "," as the decimal separator. To get the code compiled, I executed

export LC_ALL=en_US.utf8
export LANG=en_US.utf8  # just in case

on a terminal, ran

mvn clean
mvn package

and compilation was successfully finished.

If there is an easier way to solve this issue, please let me know. If not, I'd suggest setting the locale to "en_US.UTF-8" in the test classes.

Also, if anything more than "mvn package" is required for compilation, please let me know. I could not find compilation instructions on log-synth documentation.

Thanks!
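The general pitfall behind this report can be sketched in plain Java. This is only an illustration, not log-synth code: the class and method names are made up, and where exactly the comma enters the ZipSampler data is not confirmed here. It shows that locale-sensitive formatting can produce "99,9999" while Double.parseDouble() only accepts "." as the separator, and that passing an explicit locale on both sides avoids the mismatch.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class LocaleParsing {
    // Formats a value with an explicit locale so the decimal separator is predictable.
    static String format(Locale locale, double value) {
        return String.format(locale, "%.4f", value);
    }

    // Parses text that was written with the given locale's decimal separator.
    static double parse(Locale locale, String text) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in locale " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        Locale ptBr = Locale.forLanguageTag("pt-BR");
        String localized = format(ptBr, 99.9999);       // decimal comma
        String neutral = format(Locale.ROOT, 99.9999);  // decimal point
        System.out.println(localized + " vs " + neutral);

        // Double.parseDouble is locale-independent and expects a decimal point.
        System.out.println(Double.parseDouble(neutral));
        try {
            Double.parseDouble(localized);
        } catch (NumberFormatException e) {
            System.out.println("parseDouble rejects " + localized);
        }

        // A locale-aware parser reads the localized text back correctly.
        System.out.println(parse(ptBr, localized));
    }
}
```

Formatting with Locale.ROOT wherever numbers are written as text keeps the output independent of the machine's default locale, which is usually what a test suite wants.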

guava dependency is old

Hi,

Was using log-synth with another project that uses guava and ran into some dependency mismatches. It'd be nice to have a newer guava.

Here are the changes I made to update log-synth. Happy to submit a PR if there isn't another reason to keep guava back.

vicenteg@685ff52

Flatten doesn't do what people expect

People think it will take a complex object such as emitted by ZipSampler and promote the fields to top-level.

The current FlattenSampler needs to be moved to be ArrayFlattener and an object flattener put in its place.
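A minimal sketch of what an object flattener could do, written against plain java.util.Map rather than the Jackson nodes log-synth actually uses. The class name ObjectFlattener and the "-" separator are assumptions for illustration (the separator mirrors field names such as "z-longitude" that appear in another report on this page).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the FlattenSampler from log-synth.
public class ObjectFlattener {
    // Promotes the fields of nested maps to the top level, joining names with "-".
    static Map<String, Object> flatten(Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto(out, "", record);
        return out;
    }

    private static void flattenInto(Map<String, Object> out, String prefix, Map<?, ?> node) {
        for (Map.Entry<?, ?> e : node.entrySet()) {
            String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "-" + e.getKey();
            if (e.getValue() instanceof Map) {
                // Nested object: recurse and carry the parent name as a prefix.
                flattenInto(out, key, (Map<?, ?>) e.getValue());
            } else {
                out.put(key, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> zip = new LinkedHashMap<>();
        zip.put("latitude", "39.35");
        zip.put("longitude", "-85.96");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("z", zip);

        // prints {id=1, z-latitude=39.35, z-longitude=-85.96}
        System.out.println(flatten(record));
    }
}
```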

Would like Markov state machine generator

Sometimes we would like to have transaction histories that represent plausible user actions. The simplest might be {login, do stuff, logout}* where you can only do stuff while logged in.

One simple way to simulate something like this is to have a state machine where transitions from state to state are selected stochastically based on specified transition probabilities.

The suggested syntax for the schema for such a generator would be something like this:

{
    "name": "history",
    "class": "markov-state-machine",
    "transitions": {
        "init": {"start":1},
        "start": {
            "abort": 0.1,
            "progress": 0.8,
            "finish": 0.1
        },
        "abort": {
            "start": 1
        },
        "progress": {
            "abort": 0.1,
            "progress": 0.8,
            "finish": 0.1
        },
        "finish": {
            "start": 1
        }
    }
}

Of course, the transition probabilities may not add up to 1 so they should be normalized.

The sample result would be a list of transactions much like the common-point-of-compromise generator produces.
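The proposal above could be prototyped roughly as follows. This is a sketch only: MarkovWalk and its methods are hypothetical names, not part of log-synth, and the transition table is copied from the suggested schema. Weights are divided by their sum at sampling time, so they do not need to add up to 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a Markov state machine sampler.
public class MarkovWalk {
    // Picks the next state; weights are normalized by their total.
    static String next(Map<String, Double> weights, Random rng) {
        double total = 0;
        for (double w : weights.values()) {
            total += w;
        }
        double u = rng.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            u -= e.getValue();
            if (u < 0) {
                return last;
            }
        }
        return last; // guards against floating-point round-off
    }

    // Returns the start state followed by `steps` sampled transitions.
    static List<String> walk(Map<String, Map<String, Double>> transitions,
                             String start, int steps, Random rng) {
        List<String> history = new ArrayList<>();
        String state = start;
        history.add(state);
        for (int i = 0; i < steps; i++) {
            state = next(transitions.get(state), rng);
            history.add(state);
        }
        return history;
    }

    // Builds one row of the table from alternating name/weight arguments.
    static Map<String, Double> row(Object... pairs) {
        Map<String, Double> m = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            m.put((String) pairs[i], ((Number) pairs[i + 1]).doubleValue());
        }
        return m;
    }

    // The transition table from the suggested schema.
    static Map<String, Map<String, Double>> exampleTransitions() {
        Map<String, Map<String, Double>> t = new LinkedHashMap<>();
        t.put("init", row("start", 1));
        t.put("start", row("abort", 0.1, "progress", 0.8, "finish", 0.1));
        t.put("abort", row("start", 1));
        t.put("progress", row("abort", 0.1, "progress", 0.8, "finish", 0.1));
        t.put("finish", row("start", 1));
        return t;
    }

    public static void main(String[] args) {
        // The sampled history depends on the seed; it always begins init, start.
        System.out.println(walk(exampleTransitions(), "init", 10, new Random(1)));
    }
}
```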

Add new class for predefined arbitrary string distribution

One suggested enhancement for this is to add a new class that provides a random distribution between a set of pre-defined strings. For example, UHC has one column in their tables "active_flg" to define whether a particular record/patient is active. This field is either "Y" or "N". It's not possible to match the filters in their query with the current string generation classes (address and name) without modifying the output data.

Flatten + zip w/ CSV generates Nulls

For JSON the flatten method seems to work OK;

Here's schema file:

[
{
"name": "z",
"class": "flatten",
"value": { "class": "zip", "fields": "latitude, longitude"}
}

]

w/ JSON:

{"z-longitude":"-85.96","z-latitude":"39.35"}
{"z-longitude":"-74.63","z-latitude":"44.97"}

when I specify CSV:

null
null
null

I tried verbose=true/false, same thing.

extra line at the end of synth'd output

[vagrant@node1 you-suck]$ ~/log-synth/synth -count $((RANDOM % 10)) -schema ~/you-suck/test/preso-ratings.schema -format json
{"name":"Nicole","timestamp":"2014-11-03 18:40:12","slide_title":"slide3","rating":"sweet!"}
{"name":"Paula","timestamp":"2014-11-03 18:43:51","slide_title":"slide1","rating":"meh."}
F   1   0.0 0   0.0 0.000

The last line prevents me using the synth'd JSON directly when redirecting the output to a file - the last line of output is not parseable as JSON. I think a simple solution would be to print that final line of logging information to stderr instead of stdout, so I can redirect to a file without filtering.

If that sounds reasonable, I'm happy to try it out in a fork and submit a pull request.

Use case, if the context is helpful: I am playing with spark streaming with textFileStream, and I'm using log-synth to generate data periodically in new files that will be batched up and processed by spark.
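The suggested separation can be sketched as below. SplitOutput and emit are hypothetical names, and the summary text is a placeholder rather than log-synth's actual statistics line; the point is only that records go to one stream and diagnostics to another, so redirecting stdout yields clean JSON lines.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the Synth main class.
public class SplitOutput {
    // Writes one record per line to `data` and the run summary to `log`.
    static void emit(List<String> records, PrintStream data, PrintStream log) {
        for (String record : records) {
            data.println(record);
        }
        log.println("records written: " + records.size());
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "{\"name\":\"Nicole\",\"rating\":\"sweet!\"}",
                "{\"name\":\"Paula\",\"rating\":\"meh.\"}");
        // Records go to stdout, the summary goes to stderr.
        emit(records, System.out, System.err);
    }
}
```

With this split, a shell redirect such as `> out.json` captures only the records, while the summary still shows up on the terminal.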

how to add condition

I used the following schema entry to generate data:

{"name":"br", "class":"browser"}

It generates the data I need, but I also need to handle more complex cases, such as:

if br==IE then set os=Win, so that my output looks like this:

[{"br":"IE","os":"Win"},{"br":"Chrome"}]

How should I specify such a condition?

Test com.mapr.synth.samplers.VectorSamplerTest failed

I'm on Windows 10 64-bit version 1607 OS Build 14393.187, using jdk1.8.0_102:

mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T17:41:47+01:00)
Maven home: C:\tools\apache-maven-3.3.9
Java version: 1.8.0_102, vendor: Oracle Corporation
Java home: C:\Program Files\Java\jdk1.8.0_102\jre
Default locale: en_GB, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"

I'm on commit 79f16ac, which was the latest commit when I cloned the repository.

When I do a mvn package, the test com.mapr.synth.samplers.VectorSamplerTest fails with the following message :

-------------------------------------------------------------------------------
Test set: com.mapr.synth.samplers.VectorSamplerTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.513 sec <<< FAILURE!
testVector(com.mapr.synth.samplers.VectorSamplerTest)  Time elapsed: 0.511 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0.0> but was:<-0.3083262151155839>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:443)
    at org.junit.Assert.assertEquals(Assert.java:512)
    at com.mapr.synth.samplers.VectorSamplerTest.testVector(VectorSamplerTest.java:103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Unable to use class type "int" with log-synth

I'm using the following schema definition with class type "int" to generate random ints:

[
    {"name":"id", "class":"id"},
    {"name":"size", "class":"int", "min":10, "max":99}
]

I get the following exception using this schema:

[root@rhl-n1 log-synth]# java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Synth -count 1 -schema schema
Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type id 'int' into a subtype of [simple type, class org.apache.drill.synth.FieldSampler]
at [Source: schema; line: 3, column: 23]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:701)
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:155)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:98)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:82)
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:107)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:228)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:203)
at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:23)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2796)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:1903)
at org.apache.drill.synth.SchemaSampler.<init>(SchemaSampler.java:34)
at org.apache.drill.synth.Synth.main(Synth.java:30)

“NoSuchMethodErrors” due to multiple versions of org.codehaus.woodstox:stax2-api

Issue description:

Hi, there are multiple versions of org.codehaus.woodstox:stax2-api in log-synth. As the dependency tree below shows, Maven's "nearest wins" strategy means that only org.codehaus.woodstox:stax2-api:3.1.1 is loaded, and org.codehaus.woodstox:stax2-api:4.2 is shadowed.

However, several methods defined in the shadowed version org.codehaus.woodstox:stax2-api:4.2 are referenced by the client project via com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.9.10 but are missing from the version that is actually loaded, org.codehaus.woodstox:stax2-api:3.1.1.

For instance, the following missing method (defined in org.codehaus.woodstox:stax2-api:4.2) is actually referenced by log-synth, which can introduce a runtime error (i.e., "NoSuchMethodError") into log-synth.

org.codehaus.stax2.ri.SingletonIterator: org.codehaus.stax2.ri.SingletonIterator create(java.lang.Object) is invoked by log-synth via the following path:


Invocation path------
<com.mapr.synth.Synth$ReportingWorker: java.lang.Integer call()> log-synth\target\classes
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeStartElement(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeStartElement(java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void createStartElem(java.lang.String,java.lang.String,java.lang.String,boolean)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void writeNamespace(java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: void outputAttribute(java.lang.String,java.lang.String,java.lang.String,java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.dom.WstxDOMWrappingWriter: java.lang.String findOrCreateAttrPrefix(java.lang.String,java.lang.String,com.ctc.wstx.dom.DOMOutputElement)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.sw.OutputElementBase: java.lang.String getExplicitPrefix(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<org.codehaus.stax2.ri.evt.MergedNsContext: java.lang.String getPrefix(java.lang.String)> Repositories\org\codehaus\woodstox\stax2-api\3.1.1\stax2-api-3.1.1.jar
<com.ctc.wstx.sr.InputElementStack: java.util.Iterator getPrefixes(java.lang.String)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<com.ctc.wstx.util.DataUtil: java.util.Iterator singletonIterator(java.lang.Object)> Repositories\com\fasterxml\woodstox\woodstox-core\5.3.0\woodstox-core-5.3.0.jar
<org.codehaus.stax2.ri.SingletonIterator: org.codehaus.stax2.ri.SingletonIterator create(java.lang.Object)>

Suggested fixing solutions:

  1. Use configuration <dependencyManagement> to unify the version of library org.codehaus.woodstox:stax2-api to be 4.1 in the pom file.

  2. Change dependency com.fasterxml.jackson.dataformat:jackson-dataformat-xml from 2.9.10 to 2.10.0.pr1. Because the newer version com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.10.0.pr1 does not invoke the missing methods above, this change can solve the problem.
    This repair will introduce the following new dependencies:
    jakarta.activation:jakarta.activation-api:1.2.1
    jakarta.xml.bind:jakarta.xml.bind-api:2.3.2

Please let me know which solution you prefer. I can submit a PR to fix it.

Thank you very much for your attention.
Best regards,

Dependency tree----


[INFO] log-synth:log-synth:jar:0.1-SNAPSHOT
[INFO] +- me.lemire.integercompression:JavaFastPFOR:jar:0.0.13:compile
[INFO] +- org.apache.mahout:mahout-math:jar:0.9:compile
[INFO] |  +- org.apache.commons:commons-math3:jar:3.2:compile
[INFO] |  +- (com.google.guava:guava:jar:16.0:compile - omitted for conflict with 27.1-jre)
[INFO] |  \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.6.6)
[INFO] +- com.clearspring.analytics:stream:jar:2.5.0:test
[INFO] |  \- it.unimi.dsi:fastutil:jar:6.5.7:test
[INFO] +- com.carrotsearch.randomizedtesting:randomizedtesting-runner:jar:2.1.11:compile
[INFO] |  \- junit:junit:jar:4.10:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.6.6:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.6.6:runtime
[INFO] |  +- (org.slf4j:slf4j-api:jar:1.6.6:runtime - omitted for duplicate)
[INFO] |  \- log4j:log4j:jar:1.2.17:runtime
[INFO] +- args4j:args4j:jar:2.0.23:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.0.pr1:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.0.pr1:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.10.0.pr1:compile
[INFO] +- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.4:compile
[INFO] |  +- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] |  \- org.codehaus.woodstox:stax2-api:jar:3.1.1:compile
[INFO] |     \- (javax.xml.stream:stax-api:jar:1.0-2:compile - omitted for duplicate)
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-xml:jar:2.9.10:compile
[INFO] |  +- (com.fasterxml.jackson.core:jackson-core:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- (com.fasterxml.jackson.core:jackson-databind:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.9.10:compile
[INFO] |  |  +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  |  +- (com.fasterxml.jackson.core:jackson-core:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  |  \- (com.fasterxml.jackson.core:jackson-databind:jar:2.9.10:compile - omitted for conflict with 2.10.0.pr1)
[INFO] |  +- (org.codehaus.woodstox:stax2-api:jar:4.2:compile - omitted for conflict with 3.1.1)
[INFO] |  \- com.fasterxml.woodstox:woodstox-core:jar:5.3.0:compile
[INFO] |     \- (org.codehaus.woodstox:stax2-api:jar:4.2:compile - omitted for conflict with 3.1.1)
[INFO] +- org.freemarker:freemarker:jar:2.3.21:compile
[INFO] +- com.google.guava:guava:jar:27.1-jre:compile
[INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] |  +- org.checkerframework:checker-qual:jar:2.5.2:compile
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.2.0:compile
[INFO] |  +- com.google.j2objc:j2objc-annotations:jar:1.1:compile
[INFO] |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile
[INFO] +- stax:stax-api:jar:1.0.1:compile
[INFO] +- org.processing:core:jar:3.0b6:compile
[INFO] \- com.tdunning:t-digest:jar:3.0:compile

Build a server that gives out records in real-time

The idea is that you would register a schema with a startSampler call. This call would return an ID.

Then you could ask for samples with getSample. An alternative could be getRealTimeSample which would delay the next sample until enough time has passed that a specific field has a time in the past. You would pass in the ID you got from startSampler. Each sampler should have a buffer so multiple clients can request samples in parallel and each would wait for the next sample in turn (for real-time samples). In any case, thread safety constraints should be observed ... some samplers might be run in many threads.

Eventually, you would call stopStream to free up resources.

An alternative API would have one call startStream where you would give a schema and any realtime constraint and results would be streamed back. This is safer in terms of deallocating resources automatically but slightly more complex.
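The first API shape described above might look roughly like this in-process sketch. The method names startSampler, getSample and stopStream come from the proposal; everything else is an assumption, including the class name SamplerRegistry and the use of a Supplier<String> as a stand-in for a sampler built from a schema. Real-time delays and per-sampler buffering are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch of the registry behind such a server.
public class SamplerRegistry {
    private final Map<Long, Supplier<String>> samplers = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Registers a sampler and returns the ID that later calls must pass in.
    public long startSampler(Supplier<String> sampler) {
        long id = nextId.incrementAndGet();
        samplers.put(id, sampler);
        return id;
    }

    // Returns the next sample for the given ID.
    public String getSample(long id) {
        Supplier<String> sampler = samplers.get(id);
        if (sampler == null) {
            throw new IllegalArgumentException("Unknown sampler ID: " + id);
        }
        // Serialize access, since an individual sampler may not be thread-safe.
        synchronized (sampler) {
            return sampler.get();
        }
    }

    // Frees the resources held for the given ID.
    public void stopStream(long id) {
        samplers.remove(id);
    }

    public static void main(String[] args) {
        SamplerRegistry registry = new SamplerRegistry();
        AtomicLong counter = new AtomicLong();
        long id = registry.startSampler(() -> "{\"id\":" + counter.getAndIncrement() + "}");
        System.out.println(registry.getSample(id)); // prints {"id":0}
        System.out.println(registry.getSample(id)); // prints {"id":1}
        registry.stopStream(id);
    }
}
```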

Add scripted post processor

This can be handy to transform or restructure or filter data. A common use is to run a model against parameters selected by the main sampling.

Schema to generate nested JSON

Hello, I would like to know if there is any class/schema which can be used to generate nested JSON (sample as below)

Example :

{
"name": {
"first": "David",
"last": "Joseph"
},
"student": {
"details": {
"school.id": "1010",
"school.name": "Brooklands"

	}
},
"version": "1.0.1"

}

Thanks in advance,
Deepti

When using the "name" class without the required "type" field there should be a clear error

When using the name field in this schema (missing the type field in the name class):

[
  {"name": "id", "class": "id"},
  {"name": "name", "class": "name"}
]

log-synth throws a NullPointerException:

log-synth(master ✗) java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar com.mapr.synth.Synth -schema users-schema-test.json -format CSV
Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at com.mapr.synth.Synth.main(Synth.java:114)
Caused by: java.lang.NullPointerException
    at com.mapr.synth.samplers.NameSampler.sample(NameSampler.java:80)
    at com.mapr.synth.samplers.NameSampler.sample(NameSampler.java:24)
    at com.mapr.synth.samplers.SchemaSampler.sample(SchemaSampler.java:69)
    at com.mapr.synth.Synth$ReportingWorker.generateFile(Synth.java:207)
    at com.mapr.synth.Synth$ReportingWorker.call(Synth.java:171)
    at com.mapr.synth.Synth$ReportingWorker.call(Synth.java:125)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

This should throw an error asking that the type be provided, or better yet, a default type should be provided for you, say FIRST_LAST?
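Both options suggested above (a default, and a clear error for bad values) can be sketched as follows. NameTypeCheck, its Type enum and resolve are hypothetical names for illustration and do not reflect how NameSampler is actually written; the set of enum values is also an assumption, with only FIRST_LAST taken from this report.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch; not the NameSampler from log-synth.
public class NameTypeCheck {
    // Illustrative set of name types; only FIRST_LAST is taken from the report.
    enum Type { FIRST, LAST, FIRST_LAST }

    // Returns the parsed type, or FIRST_LAST when the schema omits "type".
    static Type resolve(String type) {
        if (type == null) {
            return Type.FIRST_LAST;
        }
        try {
            return Type.valueOf(type.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown name type \"" + type + "\"; expected one of "
                            + Arrays.toString(Type.values()), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));         // prints FIRST_LAST
        System.out.println(resolve("first_last")); // prints FIRST_LAST
        try {
            resolve("bogus");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```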

web-log.md is missing

Hi, cool project! I see the README file mentioning a web-log.md but I can't seem to locate it in the repo.

Infinite data stream with volume peaks

I was wondering if it’s possible to run the application continuously, rolling the file every X minutes (in conjunction with fluentd), with “random” bursts.

My aim is to mimic more realistic data pipelines.

Problem flattening data

Hi Ted,

Here's the error I get:

$ ./synth -schema msg/clickstream.json
id,user_id,program_id,timestamp,last_five_actions,action,device,br,la,st,os
Exception in thread "main" java.lang.IllegalArgumentException: Cannot flatten type class com.fasterxml.jackson.databind.node.ObjectNode
at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:33)
at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:12)
at com.mapr.synth.samplers.SchemaSampler.sample(SchemaSampler.java:69)
at com.mapr.synth.Synth.main(Synth.java:34)

Here's the schema, which is pretty much just lifted from the example in README.md, except for the fact I'm using the lookup class for the value of the "base" key:

[
{"name":"id", "class":"id"},
{"name":"user_id", "class": "foreign-key", "size": 100000 },
{"name":"program_id", "class": "foreign-key", "size": 125 },
{"name":"timestamp", "class": "date", "format": "yyyy-MM-dd HH:MM:ss.SS", "start": "2014-08-12 00:00:00.00" },

{"name":"last_five_actions", "class": "flatten", "value":
{
"class": "sequence", "length": 5, "base": {
"class": "lookup", "file": "msg/actions.csv"
}
}
},

{"name":"action", "class":"string", "dist":{
"play":21, "stop":19, "pause":16, "ff": 12, "rw": 7, "replay": 2}
},

{"name":"device", "class":"string", "dist":{
"large":25, "phone":45, "tablet":25, "other": 5}
},

{"name":"br", "class":"browser"},
{"name":"la", "class":"language"},
{"name":"st", "class":"state"},
{"name":"os", "class":"os"}
]

I'm proceeding without the lookup stuff, since that produces data that I think is good enough for my purpose.

empty "log" file created

My attempt to run

    java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Main 1M log users

results in two empty files, "1M" and "log"

If I add the "-count" option, it throws a NullPointerException:

java -cp target/log-synth-0.1-SNAPSHOT-jar-with-dependencies.jar org.apache.drill.synth.Main -count 1M log users
Usage: -count <number>G|M|K [ -users number ] [-format JSON|LOG|CSV ] log-file user-profiles
Exception in thread "main" java.lang.NullPointerException
    at org.apache.drill.synth.Main.main(Main.java:36)

Ubuntu 12.04
Java 1.7.0_45

Generate dependent data

Hi there,
I am trying to generate event data like the examples below. There are two events: started and ended.
One condition to respect while generating the data is that the timestamp of the started event must be earlier than that of the ended event. How can I configure that using log-synth?

{
	"actor": {
		"name": "student"
	},
	"task": {
		"id": "assignment"
	},
	"status": {
		"id": "ended"
	},
	"context": {
		"extensions": {
			"user-id": "11111",
			"username": "XYZ",
			"currentassignmentid": "2",
			"institutionname": "ABC"
		}
	},
	"timestamp": "2016-06-20 03:00:00",
	"id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
	"version": "1.0.1"
}

{
	"actor": {
		"name": "student"
	},
	"task": {
		"id": "assignment"
	},
	"status": {
		"id": "started"
	},
	"context": {
		"extensions": {
			"user-id": "11111",
			"username": "XYZ",
			"currentassignmentid": "2",
			"institutionname": "ABC"
		}
	},
	"timestamp": "2016-06-20 01:00:00",
	"id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
	"version": "1.0.1"
}
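Independently of how log-synth would express it, the ordering constraint itself is easy to guarantee by sampling the start time first and deriving the end time from it. The sketch below is plain Java with hypothetical names (StartEndPair, sample) and is not a log-synth feature: it draws a start timestamp inside a window and adds a strictly positive duration to obtain the end timestamp.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Random;

// Hypothetical sketch of generating a dependent pair of timestamps.
public class StartEndPair {
    // Returns {start, end}, where end is at least one second after start.
    static LocalDateTime[] sample(LocalDateTime earliest, Duration window,
                                  Duration maxLength, Random rng) {
        long startOffset = (long) (rng.nextDouble() * window.getSeconds());
        // Length is between 1 second and maxLength, so end can never precede start.
        long length = 1 + (long) (rng.nextDouble() * maxLength.getSeconds());
        LocalDateTime start = earliest.plusSeconds(startOffset);
        return new LocalDateTime[] {start, start.plusSeconds(length)};
    }

    public static void main(String[] args) {
        Random rng = new Random();
        LocalDateTime earliest = LocalDateTime.of(2016, 6, 20, 0, 0, 0);
        LocalDateTime[] pair = sample(earliest, Duration.ofHours(12), Duration.ofHours(3), rng);
        System.out.println("started: " + pair[0]);
        System.out.println("ended:   " + pair[1]);
    }
}
```

The two timestamps can then be written into the "started" and "ended" records that share the same user and assignment fields.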

Build test failure in ZipSamplerTest

I see this test failure when running 'mvn package'. This is from the ...ZipSamplerTest.txt file in the surefire-reports folder. There are no other failures.


Test set: com.mapr.synth.samplers.ZipSamplerTest

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.9 sec <<< FAILURE!
testZips(com.mapr.synth.samplers.ZipSamplerTest) Time elapsed: 8.899 sec <<< FAILURE!
java.lang.AssertionError: expected:<-90.88465> but was:<-85.52415105800087>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:441)
at org.junit.Assert.assertEquals(Assert.java:510)
at com.mapr.synth.samplers.ZipSamplerTest.testZips(ZipSamplerTest.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
