
Hollow


Hollow is a Java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high-performance read-only access. Read more.

Documentation is available at http://hollow.how.

Getting Started

We recommend jumping into the quick start guide: you'll have a demo up and running in minutes, and a fully production-scalable implementation of Hollow at your fingertips in about an hour. From there, you can plug in your data model and it's off to the races.

Get Hollow

Release binaries are available from Maven Central and jCenter.

GroupID/Org         ArtifactID/Name   Latest Stable Version
com.netflix.hollow  hollow            5.1.3

In a Maven pom.xml file:

    ...
    <dependency>
        <groupId>com.netflix.hollow</groupId>
        <artifactId>hollow</artifactId>
        <version>5.1.3</version>
    </dependency>
    ...

In a Gradle build.gradle file:

    ...
    compile 'com.netflix.hollow:hollow:5.1.3'
    ...

Release candidate binaries, matching the -rc\.* pattern for an artifact's version, are available from the jCenter oss-candidate repository, which may be declared in a build.gradle file:

    ...
    repositories {
        maven {
            url 'https://dl.bintray.com/netflixoss/oss-candidate/'
        }
    }
    ...

Get Support

Hollow is maintained by the Platform Data Technologies team at Netflix. Support can be obtained directly from us or from fellow users through Gitter or by opening an issue in this project.

Generating the Docs

To view the docs locally you can just run make site-serve, which will start the MkDocs server at http://127.0.0.1:8000/. You can also run make site-build to build the site locally and make site-deploy to deploy it to GitHub.

MkDocs runs on Python; the Makefile's venv task should take care of setting up a Python virtualenv for the site tasks. It assumes that virtualenv is available as a command and that we are targeting Python 3. Installing Python 3 is out of scope; check your OS package manager. For example, on macOS you can use Homebrew to install python3 or anaconda3.

LICENSE

Copyright (c) 2016 Netflix, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Hollow's Issues

Pinning Consumers - integration

Hi @toolbear,

I was wondering if there are plans to introduce the "Pinning Consumers" mechanism as a default in Hollow.

I think it would be nice if we could know when a consumer is pinned and expose this via the metrics or the API, something like consumer.isPinned() and/or consumer.getPinnedVersion(). This could help triage issues when you have tens or hundreds of consumers with the same dataset but, because of the nature of eventual consistency, a few of them are stuck.

I don't have a clear idea of how this could be accomplished; my guess is that the AnnouncementWatcher could be modified to support the pinning mechanism, and HollowConsumer could expose the information, similar to:

public long getCurrentVersionId() {
        return updater.getCurrentVersionId();
}

Something like

public long getPinnedVersion() {
        return announcementWatcher.getPinnedVersion();
} 

It would be up to users to store the pinned version in the AnnouncementWatcher or read it from their database/blob storage every time they need it. getPinnedVersion should be another abstract method on AnnouncementWatcher.

From the HollowProducer perspective, I think Hollow could provide a versionPinner in the builder that takes a VersionPinner (a new interface), e.g.:

HollowProducer producer = HollowProducer.withPublisher(new FakeBlobPublisher())
                                                .withAnnouncer(new MyAnnouncer(tmpFolder))
                                                .withVersionPinner(new MyVersionPinner());

Then the HollowProducer could have a pinVersion method that takes the version number as an argument and invokes versionPinner.pin(long pinnedVersion). This way users could programmatically pin versions from the producer, either via background jobs based on their own business rules or through something as simple as exposing pinVersion as an HTTP endpoint.

It could also have an unpin method; a rough sketch of the proposed interface follows.
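A minimal sketch of how the proposed interface might look (VersionPinner and its methods are hypothetical, as proposed in this issue, not existing Hollow API):

    // Hypothetical interface, as proposed above -- not part of Hollow today.
    public interface VersionPinner {
        void pin(long pinnedVersion); // pin consumers to a specific announced version
        void unpin();                 // resume following the latest announcement
        long getPinnedVersion();      // e.g. -1 when nothing is pinned
    }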

Thoughts?

APIGenerator NPE

how.hollow.consumer.api.APIGenerator.java throws an NPE when listFiles returns a null value (because apiCodeFolder doesn't exist):

for(File f : apiCodeFolder.listFiles())
     f.delete();

Maybe add some code like this before the loop:

if (!apiCodeFolder.exists()) {
     apiCodeFolder.mkdirs();
}

Sorry, this issue is not for this project. I posted on the wrong project :-), please invalidate it!

Access previous data in cycle run that have not published yet

I'm facing an issue with using Hollow. I'm working on legacy code, replacing the current in-house caching system with Hollow. The current caching logic is very long and complicated: while caching data, it accesses data from previous steps to calculate data for the next step. I need to access data I have already put into Hollow but have not yet published. Can I do this? I know I can reorganize the code to achieve this, but this source code does not have any unit tests and I don't want to change too much. If anyone knows about this, please help me.

Thanks

Warn or fail when @HollowHashKey is used on unsupported types

The @HollowHashKey annotation is supported on Map and Set types only. However, it's possible to annotate any type, and the client API generator will silently ignore it. Since the examples in the docs make it seem reasonable to, say, apply it to a List, this can result in confusion or churn later than necessary. At least generate a warning when detecting the annotation on a type other than Map or Set, perhaps with a hard failure as an optional config (or fail by default with a mechanism to opt out).

Don't trigger dataUpdated event when a zero-sized delta is applied

I store a state in my application and want to update it only when the data changes. My initial intention was to use HollowUpdateListener and its dataUpdated event, and to update the state only when the event is triggered, but it turns out that the event doesn't depend on the data content itself and gets triggered all the time. I would expect that when Hollow applies a zero-sized delta (one that doesn't change anything), it won't trigger the event. Is there any other way to know that a delta doesn't change the data?
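One workaround sketch for the listener side, assuming the read state engine exposes getAllTypes() and the per-type populated/previous ordinal BitSets (these appear in the core read engine API, though availability may vary by version):

    import java.util.BitSet;
    import com.netflix.hollow.core.read.engine.HollowReadStateEngine;
    import com.netflix.hollow.core.read.engine.HollowTypeReadState;

    public class DeltaChangeDetector {
        // Returns true only if some type's populated ordinals differ from the
        // previous state, i.e. the delta actually added, removed, or modified records.
        public static boolean dataActuallyChanged(HollowReadStateEngine engine) {
            for (String type : engine.getAllTypes()) {
                HollowTypeReadState typeState = engine.getTypeState(type);
                BitSet populated = typeState.getPopulatedOrdinals();
                BitSet previous = typeState.getPreviousOrdinals();
                if (!populated.equals(previous))
                    return true;
            }
            return false;
        }
    }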

Unexpected/incorrect callback order for HollowConsumer.RefreshListener during init

We have a RefreshListener registered with the HollowConsumer. We are implementing snapshotUpdateOccurred() and deltaUpdateOccurred(), whereby we act upon add/modify/delete of objects based on the populatedOrdinals/previousOrdinals.

When starting from a clean state (no objects in hollow), everything works as expected. The server produces a snapshot and announces it, then the client processes it and calls snapshotUpdateOccurred(). The next change on the server produces a delta, which is announced, and the client processes it and calls deltaUpdateOccurred(). So far, so good.

The trouble occurs when we restart the client.

We construct the HollowConsumer and call triggerAsyncRefresh(). In order to restore the state from the locally persisted files, the consumer ends up calling HollowDataHolder.applyInitialTransitions(). That method applies all of the transitions, starting with the snapshot and then proceeding through the deltas, and only after all of that calls snapshotUpdateOccurred() with the final version arrived at (which is actually a delta, not a snapshot, btw).

At that point, we have missed our chance to process the transitions via populatedOrdinals/previousOrdinals, because the previousOrdinals only contains those of the final transition.

What we need is for the RefreshListener to be called for each transition, just as if they were occurring at runtime.

Feedback on experimental Producer API

@toolbear I thought I'd offer some thoughts on the new Producer API under development.

It looks pretty good in general except for one bit which is a show-stopper for my company's use. We persist [snapshots and deltas] in a database, not in files. So, the fact that Blob is a concrete class prevents the use of the new API.

If Blob were an interface, and Blob.withNamespace() were replaced with a call to a user-configurable factory, that would be perfect. Of course, it is perfectly reasonable to default to an implementation that is file-backed.

Maybe the existing HollowProducer constructor could utilize the default implementation, and a new constructor with one additional parameter (HollowProducer.BlobFactory) could be added? Something along the lines of:

public HollowProducer(
            Publisher publisher,
            Validator validator,
            Announcer announcer) {
    this(publisher, validator, announcer, new FileBackedBlobFactory());
}

public HollowProducer(
            Publisher publisher,
            Validator validator,
            Announcer announcer,
            BlobFactory blobFactory) {
    ...
}

I think, having not actually been able to utilize the new API, that this may be the only change required to support our use case.

HollowIncrementalProducer - Retry cycles [proposal]

Hi @toolbear,

We were doing some extra work with the incremental producer. Currently, when a cycle fails, whether in populate, announce, or publish (https://github.com/Netflix/hollow/blob/master/hollow/src/main/java/com/netflix/hollow/api/producer/HollowProducer.java#L423), the incremental producer bubbles up the Exception and the mutations ConcurrentHashMap keeps the objects for the next cycle. In the next cycle, you will likely pick up the new objects plus the objects from the failed cycle.

While this is great, we were thinking of a scenario where your blob storage (S3 or GCS in this particular case) is down or having incidents; we saw incidents lasting up to 3 hours last year. Wouldn't it be nice to have a retry logic option? That way, if a cycle fails with a particular exception type, you could retry publishing your snapshot or delta. The main driver is to prevent the mutations from growing for hours if something is wrong with the blob storage, or just in between failed cycles.

Something along the lines of:

HollowIncrementalProducer.withProducer(producer)
   .withRetryConfig(myRetryConfig)
   .withThreadsPerCPU(1.0d)

where RetryConfig:

class RetryConfig {
   boolean enabled 
   long timeBetweenRetries
   int numberOfRetries
}

Then runCycle (https://github.com/Netflix/hollow/blob/master/hollow/src/main/java/com/netflix/hollow/api/producer/HollowIncrementalProducer.java#L76) could have some logic to read the RetryConfig and set up something as simple as retries with a sleep and 'x' number of retries.

While it would be nice to do this only for particular cases like publishing issues, Hollow bubbles up only a RuntimeException (no subclass). So, to avoid refactoring the runCycle internals or introducing custom exceptions (which I think could be nice in the future), this could be achieved only by bubbling up the RuntimeException and catching it.

While all of this could be achieved outside of Hollow by wrapping incrementalProducer.runCycle() in a retry logic block, as in the sketch below, we think it could be useful to others to provide this in the incremental producer.
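For illustration, a minimal external wrapper along those lines, assuming the RetryConfig shape sketched above, might look like:

    // External retry wrapper around runCycle(), using the proposed RetryConfig fields.
    static long runCycleWithRetries(HollowIncrementalProducer producer, RetryConfig config)
            throws InterruptedException {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= config.numberOfRetries; attempt++) {
            try {
                return producer.runCycle(); // returns the published version on success
            } catch (RuntimeException e) {  // Hollow currently bubbles up a plain RuntimeException
                lastFailure = e;
                Thread.sleep(config.timeBetweenRetries);
            }
        }
        throw lastFailure;
    }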

Please let me know your thoughts

cc @adamkeyser

Compilation of generated files fails if data model object has `Byte`

If you try to generate an API for an object that contains a Byte, the generated files do not compile. Any generated files that try to reference the type (e.g. HashIndex.java) hit a conflict between my.generated.package.Byte and java.lang.Byte, since we import my.generated.*.

A couple ways to fix this:

  • have similar treatment for Byte as we do for String, Integer, etc (include them in com.netflix.hollow.core.type and replace with HByte) #172
  • import the generated Byte classes directly, instead of using a star import (WIP)

IncrementalProducer doesn't delete orphans with new Types in deltas [bug/question?]

HollowIncrementalProducer doesn't clean up orphan objects if the type was not present in the first snapshot (i.e. a new Type introduced in a delta).

It turns out HollowReadStateEngine.typeStates doesn't add a type when you write a delta, only when you write a snapshot.

Here is where type states are added:

protected void addTypeState(HollowTypeReadState typeState) {

and here is where they are populated, from snapshots only:

private void populateTypeStateSnapshot(DataInputStream is, HollowTypeReadState typeState) throws IOException {

So any time the incremental producer tries to delete orphan objects for types that were added in a delta, using:

HollowTypeReadState typeState = readState.getStateEngine().getTypeState(key.getType());

to retrieve the type and determine which ordinals to remove, it gets back a null type.

I was thinking of adding stateEngine.addTypeState(typeState) in

typeState.applyDelta(is, schema, stateEngine.getMemoryRecycler());
but found out that multiple instances of HollowReadStateEngine are created, and HollowIncrementalCyclePopulator uses one that doesn't have the new types.

Any suggestions for fixing this bug? I think I'm a little lost in this portion of HollowReadStateEngine. While an initial snapshot should contain most of the possible Hollow types, there could be a scenario where a new type is added later in a delta chain.

Compiler errors in IDE for Unsafe

In the hollow subproject I get a handful of compiler errors regarding the use of sun.misc.Unsafe. For example:

Access restriction: The type 'Unsafe' is not API
Access restriction: The method 'Unsafe.getLong(Object, long)
…

For me this happens after a pristine import into Eclipse. It may fail in other IDEs as well.

Automatic directory creation when generating the API

If I call the API generator for a sub-package, the directory isn't created.

E.g.

    String directory = "src/main/java/fortyrunner/generated";

    HollowAPIGenerator generator =
        new HollowAPIGenerator(
            "PrimaryTypes",           // a name for the API
            "fortyrunner.generated",  // the package where the API will live
            writeEngine               // our state engine
        );
    generator.generateFiles(directory);

What's more, the generateFiles method throws an exception:

Exception in thread "main" java.io.FileNotFoundException: src/main/java/fortyrunner/generated/PrimaryTypes.java (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
	at java.io.FileWriter.<init>(FileWriter.java:90)
	at com.netflix.hollow.api.codegen.HollowAPIGenerator.generateFile(HollowAPIGenerator.java:130)

I can easily fix this by saying:

    String directory = "src/main/java/fortyrunner/generated";

    HollowAPIGenerator generator =
        new HollowAPIGenerator(
            "PrimaryTypes",           // a name for the API
            "fortyrunner.generated",  // the package where the API will live
            writeEngine               // our state engine
        );

    Files.createDirectories(Paths.get(directory));

    generator.generateFiles(directory);

It's just a bit irritating if I want to write into a sub-package.

I may be missing something of course!

ArrayIndexOutOfBoundsException: -1 when searching by Hash Index

Hey guys! I'd like to share my issue with you.

Description:

When I try to use a hash index and then search by it, I get an ArrayIndexOutOfBoundsException.

It is based on these sources:

public class HollowHashIndexTest extends AbstractStateEngineTest {

index example:

HollowHashIndex index = new HollowHashIndex(readStateEngine, "TypeB", "", "b1.value");

Stack trace:

java.lang.ArrayIndexOutOfBoundsException: -1

	at com.netflix.hollow.core.memory.encoding.FixedLengthElementArray.getElementValue(FixedLengthElementArray.java:94)
	at com.netflix.hollow.core.memory.encoding.FixedLengthElementArray.getElementValue(FixedLengthElementArray.java:85)
	at com.netflix.hollow.core.read.engine.object.HollowObjectTypeReadStateShard.readString(HollowObjectTypeReadStateShard.java:216)
	at com.netflix.hollow.core.read.engine.object.HollowObjectTypeReadState.readString(HollowObjectTypeReadState.java:188)
	at com.netflix.hollow.core.read.HollowReadFieldUtils.fieldValueEquals(HollowReadFieldUtils.java:193)
	at com.netflix.hollow.core.index.HollowHashIndex.matchIsEqual(HollowHashIndex.java:165)
	at com.netflix.hollow.core.index.HollowHashIndex.findMatches(HollowHashIndex.java:103)
	at com.netflix.hollow.core.index.HollowHashIndexTest.testIndexingColonIncludedValuesThrowsArrayOutOfBound(HollowHashIndexTest.java:95)

Unit test:

I've created a unit test against the latest sources to reproduce it:

    @Test
    public void testIndexingColonIncludedValuesThrowsArrayOutOfBound() throws IOException {
        HollowObjectMapper mapper = new HollowObjectMapper(writeStateEngine);

        final String b1ContainsColon = "one:";
        mapper.add(new TypeB(null));

        roundTripSnapshot();

        HollowHashIndex index = new HollowHashIndex(readStateEngine, "TypeB", "", "b1.value");
        index.findMatches(b1ContainsColon);
    }

Observations:

  • if you delete the colon symbol from the b1ContainsColon variable, the test passes;
  • if you fill new TypeB("some-value") with some value, the problem is gone as well.

Let me know if you need any other details.

Transient variables are not ignored

Can support for ignoring transient variables be added? Maybe a new annotation (@HollowTransient) is needed, as sketched below.
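Something like this hypothetical marker annotation (not existing Hollow API):

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical: fields carrying this annotation would be skipped by
    // HollowObjectMapper when deriving the schema.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface HollowTransient {
    }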

We have a Kotlin class that looks something like this:

data class Monitor(
   var instanceId : Int = -1,
   var setId : Int = -1,
   var hostId : Int = -1,
   var monitorName : String = "",
   var monitorType : String = "",
   var json : Map<String, Int> = emptyMap(),
   @Transient
   var isAccessLimited : Boolean = false)

Currently, isAccessLimited is generated into the model:

String {
	string value;
}
Integer {
	int value;
}
MapOfStringToInteger Map<String,Integer> @HashKey(value);
Monitor {
	int instanceId;
	int setId;
	int hostId;
	String monitorName;
	String monitorType;
	MapOfStringToInteger properties;
	boolean isAccessLimited;
}

producer restore (question)

After doing a restore on the producer side, is there any way to get the data from the producer back into my domain objects?

For example, suppose my app allows CRUD operations on a List<Person> that it holds in memory, and my producer writes that out (upon any change to my List<Person>, I give the list to Hollow to write/produce). All is fine.

But now, when I restart my app, I do a producer restore, so the producer is back up to date. But is there any way to load my List<Person> back up? Or do I have to create a consumer (in my producer project) that can consume things back (only on startup), just to get my List<Person> hydrated again?

Because if I don't load my List<Person> back up, the next time a CRUD operation occurs, I'll basically be wiping out my data when I give the List<Person> to the producer to write/produce.

Maybe that is a stupid question and makes no sense, but hopefully not...

Indices: better error when attempting to find matches with null keys

Given a HollowHashIndex or a HollowMap, calling findMatches or findKey/findValue respectively while passing in a null key value ultimately throws a NullPointerException in SetMapKeyHasher. This is deep in the internals, and the NPE doesn't provide much context about which fieldpath was null.

Do better:

  • throw the NPE sooner, as soon as we detect the null and know which fieldpath it corresponds to
  • include the fieldpath in the message of the NPE
  • bonus: detect all nulls and include a list of the fieldpaths in the message of a single NPE thrown

Ultimately we should support finding matches by null keys, but that's a larger change.
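In the meantime, a sketch of the kind of early check the first two bullets describe (the variable names are illustrative, not actual Hollow internals):

    // Hypothetical validation at the top of findMatches(Object... keys):
    for (int i = 0; i < keys.length; i++) {
        if (keys[i] == null)
            throw new NullPointerException(
                    "null key for fieldpath: " + matchFieldPaths[i]); // matchFieldPaths is illustrative
    }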

Delta / Snapshot cleanup [question]

I brought this up last month on Gitter:

Hi, curious about how people are managing the history of snapshots.
We've been running a hollow producer for about 2 months and hit 1.27 TiB of snapshots.

At that time, 2 of the chat members replied with some ideas to clean this up: an S3 lifecycle rule after a few days of use, or a lambda on a timer to manage our S3 bucket with a TTL on last-updated, keeping the newest few snapshots.

While I like the idea of having something external like an S3 lifecycle rule or a Google Cloud Function, there's always the risk of losing files, since such a mechanism might not be aware of how Hollow is using them.

I was wondering if there is a desire to have something built into the hollow producer to remove old deltas and snapshots: allow the producer to remove files from the blob storage every time there's a new snapshot or delta, and let users set the number of files to keep, or let them plug in their own implementation so they can decide whether to keep 'x' number of files or go by dates.

Something along the lines of a BlobStorageCleaner:

MyBlobStorageCleaner blobStorageCleaner = new MyBlobStorageCleaner();
HollowProducer producer = HollowProducer.withBlobStorageCleaner(blobStorageCleaner);

The BlobStorageCleaner could be

    public abstract class BlobStorageCleaner {
        public void clean(Blob.Type blobType) {
            switch(blobType) {
            case SNAPSHOT:
                cleanSnapshots();
                break;
            case DELTA:
                cleanDeltas();
                break;
            case REVERSE_DELTA:
                cleanReverseDeltas();
                break;
            }
        }

        abstract void cleanSnapshots();
        abstract void cleanDeltas();
        abstract void cleanReverseDeltas();
    }

And this could be invoked after fireArtifactPublish is triggered -> https://github.com/Netflix/hollow/blob/master/hollow/src/main/java/com/netflix/hollow/api/producer/HollowProducer.java#L476

The idea is that developers could implement their own BlobStorageCleaner and decide how they want to remove their snapshots, deltas, and reverse deltas. Perhaps you want to clean snapshots but not deltas. Up to them.

Data Model with inheritance

Hello,

  1. If I extend the Movie class with some abstract class which has a primary key column named id, how do I retrieve the data at the consumer using the primary key? Even when I retrieve all movies, I do not see _getID in the MovieHollow object.

  2. Is there a way to convert Hollow objects back to data model objects on the consumer?

Ignore Set/Map reordering in Diff and History UIs

Reordered elements in a Set or Map will register as a difference, but are then presented in the UI as having no actual differences. Don't count Set or Map reordering as a difference at all, to reduce noise in the diff UIs.

Hollow Object to JSON directly

Can anyone help me with this issue? I would like to serialize the object I get from Hollow to JSON directly, instead of converting it to a POJO class and using Jackson to serialize it.
Or is there any way to get the POJO class back after the findMatch() method?

Thank you so much

StackOverflow in GenericHollowIterable

When I tried using the Iterable returned from the objects method in GenericHollowSet, I got a StackOverflowError. The returned GenericHollowIterable, in its next() operation, calls its own method instead of the passed iterator's. I was using the Generic Object API.

HollowIncrementalProducer - Execution Stats [Proposal]

Hi @toolbear

While working with HollowIncrementalProducer, I found that it would be nice if the result of runCycle carried more information than just the version. I believe users could use some stats like recordsAddedOrModified and recordsRemoved.

What are your thoughts on introducing a HollowIncrementalProducerExecution object that would look like this (or HollowIncrementalProducerCycleResult?):

class HollowIncrementalProducerExecution {
   long version
   long recordsAddedOrModified
   long recordsRemoved
   long timestamp
   Status status // SUCCESS, FAIL -> could be a boolean too
}

This would help users set up metrics or alerting based on the behavior. While Validators are a potential candidate for this, there are a few blockers:

  1. The validators are triggered via Validator.validate, which returns void, so if you need metrics you have to bake your metrics or alerting into the validator (not a problem, of course).

  2. A Validator only has access to the ReadState, so the added/modified detail gets lost. While you can determine how many records were added or deleted based on cardinality, the modification detail is not readily available; only the incremental producer can easily know the number of adds/modifies and deletes, because it is responsible for modifying the mutations object.

  3. I don't think a Validator should be implemented just for getting metrics on the dataset.

I don't know if this is something that you see as useful for Hollow users. We expose a lot of metrics; we currently track how many objects we add/modify/delete on each cycle. While the validators are helpful to fail a cycle if we, for example, drop 3% of our data, the metrics help us visualize how our dataset evolves over time.

Thoughts?

Confusion about using HollowIncrementalProducer

Hi @dkoszewnik,
Can you help me explain this case?
I've tried to rewrite the addOrModify method in HollowIncrementalProducer like this:

public void addOrModify(Object obj) {
    RecordPrimaryKey pk = newState.getObjectMapper().extractPrimaryKey(obj);
    int oldOrdinal = getOrdinalRecord(newState.getPriorState().getStateEngine(), pk);

    int newOrdinal = newState.add(obj);
    if (oldOrdinal != -1 && newOrdinal != oldOrdinal) {
        delete(pk);
    }
}

Just running the delete at the final step. I want to avoid storing objects in the mutations map because it costs a lot of memory. But when I did that, the data got messed up, e.g. fields becoming a primary key. Could you help me explain this, or do you have a solution to avoid storing so many objects in the mutations map?

Thanks

HollowIncrementalProducer - include first cycle [proposal]

While HollowIncrementalProducer serves the purpose of delta processing, it requires an existing state in order to work.

I wonder if there is a desire to allow creating a HollowIncrementalProducer that also supports writing the first state if no previous one is available.

What we do on our side is basically run a cycle with another Populator that only adds objects to the state.

This is an example from our side; ParallelExecution is irrelevant for this example.

public class CyclePopulator implements HollowProducer.Populator {

  private final Collection<Object> objList;
  private final ParallelExecution parallelExecution;

  CyclePopulator(Collection<Object> objList) {
    this(objList, 1.0d);
  }

  CyclePopulator(Collection<Object> objList, double threadsPerCpu) {
    this.objList = objList;
    this.parallelExecution = new ParallelExecution(threadsPerCpu);
  }

  @Override
  public void populate(HollowProducer.WriteState newState) throws Exception {
    parallelExecution.execute(objList, (Object obj) -> newState.add(obj));
  }
}

We basically decide whether to use CyclePopulator or HollowIncrementalCyclePopulator based on the existence of a previous state while restoring.

Hollow Metrics

Hi,

Currently we're running background jobs with Quartz or simple future scheduling on Spring and Ratpack to get metrics such as domain object count, heap used, refreshes failed and so on, and expose them via Prometheus. This has helped us build Grafana dashboards (screenshot attached).


I'm opening this issue because I think it would be nice to have metrics out of the box from hollow, to see if the following could fit your design principles, and to contribute it:

  1. Introduce a HollowMetrics object as part of HollowClientUpdater to keep metrics for things such as domain object count by type (readEngine.typeStates), heap usage (typeState.getApproximateHeapFootprintInBytes()), current version, and refreshes succeeded/failed. Every time a refresh fails or completes, the metrics would be updated.

  2. Introduce a HollowMetricsCollector interface: this could be part of the Consumer API, e.g. HollowConsumer.withMetricsCollector(myMetricsCollector). Provide this as part of the HollowClientUpdater and invoke a method, e.g. collect, when an update happens (a refresh fails or succeeds, around refreshListener.refreshFailed(beforeVersion, getCurrentVersionId(), version, th)). This way, users could plug in their own mechanism to use those metrics, such as writing to logs or exposing them via JMX, but that shouldn't be part of Hollow.

All of the above could be part of Hollow core. In addition, I think it would be nice to have a hollow-metrics module (similar to hollow-ui-explorer and others). This module could ship default implementations of HollowMetricsCollector using Micrometer (http://micrometer.io/): think SLF4J, but for application metrics. Micrometer allows exposing metrics via Prometheus, Datadog, Atlas, Influx, Graphite, and others. This could be part of a second iteration.
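A rough sketch of the shapes this proposal describes (both types are hypothetical, as proposed above, not existing Hollow API):

    // Hypothetical types, as proposed above.
    public class HollowMetrics {
        long currentVersion;
        long totalPopulatedOrdinals;  // domain object count across types
        long heapFootprintInBytes;    // e.g. from getApproximateHeapFootprintInBytes()
        long refreshSucceededCount;
        long refreshFailedCount;
    }

    public interface HollowMetricsCollector {
        void collect(HollowMetrics metrics); // invoked after each refresh success or failure
    }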

Your thoughts?

Switch public APIs from File to Path

  • for all methods that accept java.io.File create an overload that accepts java.nio.Path
  • mark the java.io.File methods as @Deprecated and use file.toPath() to delegate to the new methods
  • use idiomatic Path and Paths code for opening files for reading/writing, interacting with directories, etc.
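A sketch of the overload-and-deprecate pattern this describes (the method name is illustrative):

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Old entry point kept for compatibility, delegating to the Path overload.
    @Deprecated
    public void generateFiles(File directory) throws IOException {
        generateFiles(directory.toPath());
    }

    // New idiomatic entry point using java.nio.
    public void generateFiles(Path directory) throws IOException {
        Files.createDirectories(directory); // interact with directories via java.nio
        // ... write the generated sources beneath 'directory'
    }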

Stack Overflow 'followed tag' is too generic.

Hi, the Hollow guidelines mention that Stack Overflow questions tagged 'hollow' will be monitored. Unfortunately, this doesn't really follow the naming scheme for platform-specific products or the guidelines for overly broad tags. I have retagged the only two questions with the hollow tag with the more specific 'netflix-hollow' tag; please update your documentation to reflect this change.

Please also recommend an edit to the tag and tag wiki so that they contain descriptions for those using the tags.

HollowAnnouncementWatcher should implement an interface

Currently, the HollowClient constructor accepts an instance of HollowAnnouncementWatcher. Because HollowAnnouncementWatcher is an abstract class that unconditionally constructs an internal Executors.newFixedThreadPool instance, this does not scale to deployments where there are hundreds or thousands of HollowClient instances in a single VM.

While it would be ideal if HollowAnnouncementWatcher were merely an interface, like HollowBlobRetriever and other types passed to the HollowClient constructor, that would be a breaking change for existing code.

Maybe introduce a new interface type, have HollowAnnouncementWatcher implement that type, and have HollowClient accept that type, as sketched below. That would preserve compatibility for existing clients while allowing alternative implementations.
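A minimal sketch of what such an interface might look like (the name and methods are hypothetical):

    // Hypothetical interface extracted from HollowAnnouncementWatcher.
    // Implementations control their own threading, so thousands of clients
    // in one VM could share a single executor.
    public interface AnnouncementWatcher {
        long getLatestVersion();
        void subscribeToUpdates(HollowClient client);
    }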

EDIT: For context, the use case is multi-tenancy, where each tenant (possibly thousands) has their own data model instances with independent namespaces, snapshots, and delta chains.

HollowIncrementalProducer refactor [proposal]

I'd like to take a shot at refactoring the HollowIncrementalProducer implementation in order to accomplish two things:

  1. remove the mutations field in order to avoid storing objects multiple times in memory
  2. modify the runCycle method so that it matches the usage pattern of the regular HollowProducer.runCycle

Date issue in generated api

I added java.util.Date to the Movie object and started the producer, and I am getting the below error:

java.lang.IllegalArgumentException: Attempting to write unexpected class! Expected class sun.util.calendar.BaseCalendar$Date but object was class java.util.Date
	at com.netflix.hollow.core.write.objectmapper.HollowObjectTypeMapper.write(HollowObjectTypeMapper.java:117)
	at com.netflix.hollow.core.write.objectmapper.HollowObjectTypeMapper$MappedField.copy(HollowObjectTypeMapper.java:275)

Build tools plugin for generating consumer api

UPDATE: Build plugins are being developed as standalone projects.

Hi! When I was generating the consumer api, the first thought I had was that there must be a gradle/maven plugin for doing that. I think that executing or embedding java code to generate part of the source code is not very convenient. So if the team finds this useful too, I would like to participate and create them.

The plugin would be configurable with:

  • apiClassName
  • packageName
  • packagesToScan

I see the plugin's core part looking something like this:

HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);
Collection<Class<?>> datamodelClasses = retrieveClasses();
for (Class<?> clazz : datamodelClasses) {
    mapper.initializeTypeState(clazz);
}
String directory = buildDirectoryPath(packageName);
new HollowAPIGenerator(apiClassName, packageName, writeEngine).generateFiles(directory);

The main design question is how to indicate which classes should be added (how the retrieveClasses method will collect them). I see 3 possible ways to do that:

  • specify packagesToScan in the buildscript, and the task will add all classes from those packages
  • same as the previous, but add only classes annotated with some new annotation - @HollowEntity?
  • annotations only

I don't like the 3rd one at all; the 1st one seems the easiest but may not be very convenient for end users. I like the 2nd one, but it needs a new annotation, which would have to live in the com.netflix.hollow:hollow artifact. I guess there might be some restrictions around this option.

What do you think about it?

Reading deltas: Not an issue just a question

Hi,

I would like to read only deltas. I have created a custom refresh listener class through which I can get delta updates.

Using the blob retriever I can get an input stream on the delta file, but I am unable to parse it. Would you have any idea how to do that?

I created a reader and then applied the delta on a state engine, but that still gives me all the records together. My objective is to get only the deltas. I would like to know if there is any way to do that.
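For what it's worth, here is a sketch of one possible approach using the core HollowBlobReader API, assuming a HollowReadStateEngine already loaded with the prior state; method availability may vary by version, and the "Movie" type name is illustrative:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.BitSet;
    import com.netflix.hollow.core.read.engine.HollowBlobReader;
    import com.netflix.hollow.core.read.engine.HollowReadStateEngine;
    import com.netflix.hollow.core.read.engine.HollowTypeReadState;

    public class DeltaInspector {
        // Apply a delta on top of the current state, then look at only the
        // records the delta touched, per type.
        static void inspectDelta(HollowReadStateEngine stateEngine, InputStream delta)
                throws IOException {
            new HollowBlobReader(stateEngine).applyDelta(delta);

            HollowTypeReadState typeState = stateEngine.getTypeState("Movie");
            BitSet populated = typeState.getPopulatedOrdinals();
            BitSet previous = typeState.getPreviousOrdinals();

            BitSet addedOrModified = (BitSet) populated.clone();
            addedOrModified.andNot(previous); // ordinals introduced by this delta

            BitSet removed = (BitSet) previous.clone();
            removed.andNot(populated);        // ordinals dropped by this delta
        }
    }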

Thanks

It's not possible to start a consumer application when blob store directory is empty

When you try to start an application and create indexes while your blob store dir is empty, the following exceptions are thrown:

  • for primary key index:
java.lang.NullPointerException
	at com.netflix.hollow.api.client.HollowClientUpdater.getStateEngine(HollowClientUpdater.java:166)
	at com.netflix.hollow.api.consumer.HollowConsumer.getStateEngine(HollowConsumer.java:246)
	at [your-api-package].[IndexName]PrimaryKeyIndex.<init>([IndexName]PrimaryKeyIndex.java:15)
  • for hash index:
java.lang.NullPointerException
	at com.netflix.hollow.api.client.HollowClientUpdater.getAPI(HollowClientUpdater.java:170)
	at com.netflix.hollow.api.consumer.HollowConsumer.getAPI(HollowConsumer.java:260)
	at [your-api-package].[ApiName]APIHashIndex.<init>([ApiName]APIHashIndex.java:25)

Both exceptions are related to the HollowClientUpdater#hollowDataHolder field, which is null in that case.

Detecting changes

I know that I can figure out which ordinals were added or removed on the client by intersecting the BitSets, etc. But how do I find out which Hollow objects changed while retaining the same ordinal? I need to act on all changes, and iterating over all of the objects and comparing them to the operational state derived from them does not scale.

NullPointer - Hollow History UI

Hi guys, I'm getting a NullPointerException when I access Hollow History UI.

I'm using hollow-reference-implementation (hollow lib version: 2.6.8).

Complete stacktrace:

java.lang.NullPointerException
	at com.netflix.hollow.tools.history.keyindex.HollowHistoryTypeKeyIndex.queryIndexedFields(HollowHistoryTypeKeyIndex.java:218)
	at com.netflix.hollow.history.ui.pages.HistoryQueryPage.typeQueryKeyMatches(HistoryQueryPage.java:65)
	at com.netflix.hollow.history.ui.pages.HistoryQueryPage.setUpContext(HistoryQueryPage.java:47)
	at com.netflix.hollow.history.ui.pages.HistoryPage.render(HistoryPage.java:54)
	at com.netflix.hollow.history.ui.HollowHistoryUI.handle(HollowHistoryUI.java:151)
	at com.netflix.hollow.history.ui.jetty.HollowHistoryHandler.handle(HollowHistoryHandler.java:38)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:518)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:745)

Everything works fine with Hollow Explorer UI.

Maintain CHANGELOG.md

Can we start updating the CHANGELOG.md at each release?

It would be really nice to know what has changed so users can make an informed decision about upgrading without reading the commit history. I've kept a changelog on a one-man project running for several years; updating it at release is not a barrel of fun, but it typically only takes 5-10 minutes.

HollowIncrementalProducer question

I have a question about the HollowIncrementalProducer. I'll give an example to use as the basis of my question, albeit a somewhat artificial one.

Given these classes...

class Movie {
   int movieId;
   String title;
   Set<Actor> actors;
}

class Actor {
  int actorId;
  String firstName;
  String lastName;
}

We would like to produce deltas based on incremental changes, rather than fully loading everything from the source of truth every time. We are currently implementing this by hand using the low-level API, created before either HollowProducer or HollowIncrementalProducer existed. But we would love to move to the HollowIncrementalProducer due to its simplicity.

The behavior we are looking for is:

  • Deletion of an Actor results in the removal from all Movie.actors sets.
  • Deletion of a Movie removes both the Movie and its actors set, but does not remove the referenced Actor objects if they are referenced by other Movies.
  • Ideally, but not required, is the removal of an Actor if it is no longer referenced by any Movie (i.e. no orphans).

Is this scenario currently supported by the HollowIncrementalProducer?

Problem with BigDecimal serialization

It looks like serialization of BigDecimal fields is corrupted. Since BigDecimal, for non-inflated values, stores its value in a transient field, it is not serialized when added to the objectMapper. Is it possible to somehow add custom serialization for particular types, or should we rather define the field in the class with a different type?
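One workaround sketch along the "different type" line, assuming it's acceptable to model the value explicitly instead of relying on BigDecimal's internals:

    import java.math.BigDecimal;
    import java.math.BigInteger;

    // Store the unscaled value and scale explicitly; no transient state involved.
    public class HollowBigDecimal {
        byte[] unscaledValue; // from BigDecimal.unscaledValue().toByteArray()
        int scale;            // from BigDecimal.scale()

        public HollowBigDecimal(BigDecimal value) {
            this.unscaledValue = value.unscaledValue().toByteArray();
            this.scale = value.scale();
        }

        public BigDecimal toBigDecimal() {
            return new BigDecimal(new BigInteger(unscaledValue), scale);
        }
    }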

Hollow Documentation

Are there plans to open source the docs at http://hollow.how? It would be nice to contribute things like the incremental producer, Google Cloud Storage, and other examples.

NPE for fields with types: Collection, Iterable etc

I'm not sure what the correct behaviour should be in this case, but an NPE is the worst option: you have to debug the application to find out what the reason was. Maybe just throw an IllegalArgumentException stating that only List, Set and Map types are supported. Or maybe you could store all Collections or Iterables the same way a List is serialized.

Delta chain - when announcement fails - consumer gets confused [bug/question]

Hi @toolbear

I'm writing some tests around scenarios that we could face, for example when the announcement fails for a delta state:

  def 'incremental failure on announcer'() {
    setup:
    config.producerConfig.incremental = true
    HollowProducer.Announcer spyAnnouncer = Spy(config.producerConfig.announcer.get().class, constructorArgs: [config.getDefaultRootDirectory()])
    def localProducer = getProducer(new Module() {
      @Override
      void configure(Binder binder) {
        binder.bind(HollowProducer.Announcer.class).toInstance(spyAnnouncer)
      }
    })
    localProducer.hollowProducer
    consumer = getConsumer()
    List<Map> dataSet = (1..2).collect { createRandomAutomobile() }

    when: 'initial snapshot cycle succeeds'
    List<Automobile> automobiles = dataSet.collect { autoJson ->
      parseAutomobile(autoJson)
    }
    localProducer.runCycle(automobiles)
    consumer.triggerRefresh()

    then: 'data should be there'
    consumer.allDomainObjects.size() == 2

    when: 'delta cycle with fails on announce'
    List<Automobile> redAutomobiles = dataSet.collect { autoJson ->
      autoJson.trim = 'red'
      parseAutomobile(autoJson)
    }
    localProducer.runCycle(redAutomobiles)

    then: 'announcer should throw exception'
    1 * spyAnnouncer.announce(_) >> {
      throw new RuntimeException('ouch') }
    thrown(RuntimeException)

    when:
    consumer.triggerRefresh()

    then: 'consumer should reflect old data and trim should stay yellow'
    consumer.allDomainObjects.size() == 2
    consumer.allDomainObjects.collect { it.getObject('trim') }.unique().first().toString() == 'yellow'

    when: 'delta cycle with addition'
    List<Automobile> purpleAutomobiles = dataSet.collect { autoJson ->
      autoJson.trim = 'purple'
      parseAutomobile(autoJson)
    }
    long version = localProducer.runCycle(purpleAutomobiles)
    consumer.triggerRefresh()

    then: 'addition should be there'
    consumer.allDomainObjects.size() == 2
    consumer.allDomainObjects.collect { it.getObject('trim') }.unique().first().toString() == 'purple'
  }

In this case what we observe is that Hollow will generate deltas from version 20180125180054001 to 20180125180054002 and from 20180125180054001 to 20180125180054003. Because 20180125180054002 failed, the 3rd cycle will try to write a delta from "1" to "3".

(screenshot attached)

Then, if we trigger a refresh in the consumer, since the latest announced version is "20180125180054003", it will try to apply that; however, HollowUpdatePlanner determines that the next deltaDestinationVersion should be "20180125180054002":

long deltaDestinationVersion = deltaPlan.destinationVersion(currentVersion);

Looks like destinationVersion in HollowUpdatePlan only knows about 1 transition:

(screenshot attached)

In this case, when triggerRefresh happens, the consumer will go to version 20180125180054002, and if you do another triggerRefresh it will fail because there is no update plan from 20180125180054002 to 20180125180054003.

Our workaround for now is to restart the producer; that way it creates a snapshot once it's restored.

We don't know if this is an issue or if we are doing something wrong. Also, we wonder whether the cycle should fail when the announcement fails; would it make sense to commit the change and then announce? We guess the consumers would pick up the "missing delta" on the next announcement, since the file should be available.

We also wonder if this would be a use case for HollowStateDeltaPatcherTest.

Any thoughts?

`destinationPath` field is not set via `HollowAPIGenerator.Builder()`

The destinationPath field is not set via the builder. Repro:

    val stateEngine  = new HollowWriteStateEngine
    val objectMapper = new HollowObjectMapper(stateEngine)
    objectMapper.initializeTypeState(classOf[YourClass])
    val file = new File(
      "core/src/main/java/" + "com.adform.dsp.pricing.data.customstate.api.generated".replace('.', '/'))

    val generator = new HollowAPIGenerator.Builder()
      .withAPIClassname("API")
      .withPackageName("com.example.api.generated")
      .withDataModel(stateEngine)
      .withDestination(file.toPath) // <-- we set it here
      .build

    generator.generateSourceFiles() // <-- but this throws java.lang.NullPointerException because destinationPath is null

Consumer data TTL?

Hi,
I was wondering if there's a way to add a TTL (time-to-live) for the data cached by the consumer. Meaning: let's say the data should be updated every 24h by the producer. If the producer fails, I'd rather the consumers drop the data after, say, 30h. Is there a way to do that? Can this be easily added?
