
atomeventstore's People

Contributors

jaxwood, josephwoodward, moodmosaic, ploeh, sgryt


atomeventstore's Issues

Erratic Test in FifoEventsTests

I just got this unit test error when running all tests in Visual Studio:

Test 'Grean.AtomEventStore.UnitTests.FifoEventsTests.SutYieldsPagedEvents(dummyResolver: Grean.AtomEventStore.UnitTests.TestEventTypeResolver, dummySerializer: Grean.AtomEventStore.XmlContentSerializer, dummyInjectedIntoSut: Grean.AtomEventStore.AtomEventsInMemory, dummyId: urn:uuid:dad86e31-7e00-48e5-9feb-1f0d9c29c247, writer: Grean.AtomEventStore.AtomEventObserver`1[Grean.AtomEventStore.UnitTests.XmlAttributedTestEventX], sut: Grean.AtomEventStore.FifoEvents`1[Grean.AtomEventStore.UnitTests.XmlAttributedTestEventX], eventGenerator: Ploeh.AutoFixture.Generator`1[Grean.AtomEventStore.UnitTests.XmlAttributedTestEventX])' failed:
System.InvalidOperationException : Sequence contains no elements
   at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)
AtomLink.cs(105,0): at Grean.AtomEventStore.AtomLink.ReadFrom(XmlReader xmlReader)
AtomFeed.cs(207,0): at Grean.AtomEventStore.AtomFeed.b__1(XPathNavigator x)
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Linq.Enumerable.SingleOrDefault[TSource](IEnumerable`1 source, Func`2 predicate)
FifoEvents.cs(62,0): at Grean.AtomEventStore.FifoEvents`1.ReadNext(AtomFeed page)
FifoEvents.cs(36,0): at Grean.AtomEventStore.FifoEvents`1.<>c__DisplayClass2.<GetEnumerator>b__0()
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()

Re-running the tests produced an all green result, so this indicates an Erratic Test (or, a subtle bug somewhere in AtomEventStore).

Possibility of using custom URNs

Hello!

I am currently working on a project where I would like to have an event stream per "subject" (or "topic") which is identified by its unique name. The topic itself doesn't have any additional information except its events.
So, naturally, I would like to have this name as an Id of the event stream.

If forced to use uuid IRIs, I will have to add additional storage for topic-name-to-uuid mappings. That means an additional step when creating a new "topic", as well as an additional lookup (to read the topic's uuid) when reading events.
So I was thinking maybe I could use a custom URN instead, e.g. urn:x-mycustomnid:topicname

Do you think this is a valid use case? Is it a good idea to have AtomEventStore support custom URNs?
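To make the request concrete, usage might hypothetically look like the sketch below. FifoEvents<T> and AtomEventsInMemory are existing AtomEventStore types, but accepting an arbitrary URN as the stream identity is the proposed (not current) behaviour, and TopicEvent and serializer are placeholders:

// Hypothetical usage: the proposal is to accept any valid URN, not
// only urn:uuid:..., as the identity of an event stream.
var id = new Uri("urn:x-mycustomnid:topicname");
var storage = new AtomEventsInMemory();
var events = new FifoEvents<TopicEvent>(id, storage, serializer);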

TypeResolutionTable

Whenever I use DataContractContentSerializer or XmlContentSerializer, I find myself implementing the required ITypeResolver in more or less the same way:

public class UserTypeResolver : ITypeResolver
{
    public Type Resolve(string localName, string xmlNamespace)
    {
        switch (xmlNamespace)
        {
            case "urn:grean:samples:user-sign-up":
                switch (localName)
                {
                    case "user-created":
                        return typeof(UserCreated);
                    case "email-verified":
                        return typeof(EmailVerified);
                    case "email-changed":
                        return typeof(EmailChanged);
                    default:
                        throw new ArgumentException(
                            "Unknown local name.",
                            "localName");
                }
            default:
                throw new ArgumentException(
                    "Unknown XML namespace.",
                    "xmlNamespace");
        }
    }
}

It would be nice to have a reusable class where you can simply supply the values; something like this:

var resolver = new TypeResolutionTable(
    new TypeResolutionEntry("urn:grean:samples:user-sign-up", "user-created", typeof(UserCreated)),
    new TypeResolutionEntry("urn:grean:samples:user-sign-up", "email-verified", typeof(EmailVerified)),
    new TypeResolutionEntry("urn:grean:samples:user-sign-up", "email-changed", typeof(EmailChanged)));

TypeResolutionTable should obviously implement ITypeResolver.
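A minimal sketch of how such a TypeResolutionTable could be implemented follows; the member names and the linear lookup are assumptions for illustration, not the library's actual code:

public class TypeResolutionEntry
{
    public TypeResolutionEntry(
        string xmlNamespace, string localName, Type resolvedType)
    {
        this.XmlNamespace = xmlNamespace;
        this.LocalName = localName;
        this.ResolvedType = resolvedType;
    }

    public string XmlNamespace { get; private set; }
    public string LocalName { get; private set; }
    public Type ResolvedType { get; private set; }
}

public class TypeResolutionTable : ITypeResolver
{
    private readonly TypeResolutionEntry[] entries;

    public TypeResolutionTable(params TypeResolutionEntry[] entries)
    {
        this.entries = entries;
    }

    public Type Resolve(string localName, string xmlNamespace)
    {
        // Find the entry matching both coordinates; fail loudly on
        // unknown input, mirroring the hand-written resolver above.
        var match = this.entries.SingleOrDefault(e =>
            e.LocalName == localName && e.XmlNamespace == xmlNamespace);
        if (match == null)
            throw new ArgumentException(
                "Unknown local name and XML namespace combination.");
        return match.ResolvedType;
    }
}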

Concurrent writers for Azure Blob Storage

First off, let me start by noting that, yes, this suggestion is seemingly at odds with the lock-free design goal, but please bear with me a bit before discarding the idea entirely.

The thing is that, for some use cases, the Single Writer limitation hinders the scalability of the design: The writer process may not be able to keep up with the rate of updates in the system.

One such case is a not-entirely-fictional multi-tenant infrastructure SaaS system operated by Grean. The frequency of temporally concurrent updates is high enough that the writer struggles to keep up, but at the same time, the per-event-stream contention frequency is quite low (though not ignorable).

This has the interesting side-effect of making the Single Writer behave a lot like a global mutex in the system, which leads me back to the opening comment of this issue: For this use case, the system as-is does not really exhibit lock-free behaviour anyway.

Also, after having this system in production for about 6 months, we must conclude that operating a Single Writer process in Azure is not entirely predictable.
Following the same principles as described here, we have found that it is not entirely uncommon for the Azure Scheduler to either skip a ping entirely, or ping too early. That leads to periods of time where the Single Writer process is down, and commands pile up in the queue accordingly - making it even harder for the writer to keep up afterwards.

So, we are currently testing a prototype version of a concurrent-writers implementation with a CQRS-style service. The implementation relies on ETags (for consistent reads during processing of individual commands), on short-lived blob leases (15 seconds) for locking, and on a 'self-healing' command-processing implementation (so commands can be safely retried, even in the event of partially succeeding updates to the underlying storage).
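For illustration only, the lease half of that pattern might look like the sketch below, here in C# with the classic WindowsAzure.Storage client (the actual prototype is F#); the method and its error handling are simplified assumptions:

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class LeasedWrites
{
    public static void WriteUnderLease(CloudBlockBlob blob, string newContent)
    {
        // Take a short-lived (15-second) lease, so a crashed writer
        // only blocks other writers briefly.
        var leaseId = blob.AcquireLease(TimeSpan.FromSeconds(15), null);
        try
        {
            // Condition the write on the lease. Consistent reads during
            // command processing would additionally use the ETag
            // captured when the blob was read.
            blob.UploadText(
                newContent,
                accessCondition: AccessCondition.GenerateLeaseCondition(leaseId));
        }
        finally
        {
            blob.ReleaseLease(AccessCondition.GenerateLeaseCondition(leaseId));
        }
    }
}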

If the real-world tests deem it fit for production, we are considering contributing this feature to AtomEventStore, as a separate add-on along the lines of the AzureBlob add-on. It would, however, be an F# implementation - which hopefully wouldn't make anyone frown upon it.

I'm submitting this issue now in the hope of getting feedback from other users of AtomEventStore on such a potential feature - particularly for critical review of the concept, but also to hear from others who may find it useful.

Scan an assembly in order to create a XmlContentSerializer

Now that XmlContentSerializer has a CreateTypeResolver method, it's possible to create an instance like this:

var serializer = new XmlContentSerializer(
    XmlContentSerializer.CreateTypeResolver(
        typeof(UserCreated).Assembly));

This is better than having to manually create an instance of TypeResolutionTable, but now that the building blocks are in place, it'd be nice to have an even easier shortcut:

XmlContentSerializer serializer =
    XmlContentSerializer.Scan(typeof(UserCreated).Assembly);

In the above example, this method is called Scan, but other alternatives are possible:

  • Scan
  • Create
  • CreateSerializer
  • CreateContentSerializer
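Whichever name is chosen, the implementation would presumably be a thin composition of the existing building blocks; a hedged sketch:

public static XmlContentSerializer Scan(Assembly assemblyToScan)
{
    // Assumption: Scan simply composes the existing
    // CreateTypeResolver factory with the constructor.
    return new XmlContentSerializer(
        XmlContentSerializer.CreateTypeResolver(assemblyToScan));
}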

Portable Class Library

Given the server-less nature of AtomEventStore, and given that it can use a normal file system as a storage mechanism, it seems that perhaps it would also be a good fit on mobile devices, as an alternative to a 'mobile' database implementation.

If that makes sense, perhaps AtomEventStore should be a Portable Class Library (PCL). It may simply be my naïvety that makes me consider this, but I understand that while PCL has been a difficult beast to tame previously, the technology should have matured in the last couple of years.

Writing this, I also don't know if it's possible to migrate AtomEventStore to a PCL, but it may be worth investigating.

If it can be done without introducing breaking changes, it sounds like a Pareto improvement. If it can be done, but only with breaking changes, we need to consider the options. One of those options may be to not make AtomEventStore a PCL at all.

LifoEvents

The FifoEvents<T> class exists to provide First-In, First-Out enumeration of an event stream. Likewise, a LifoEvents<T> class should be added to provide Last-In, First-Out enumeration of an event stream.

Reverse method on event readers

Both FifoEvents<T> and LifoEvents<T> should have a Reverse method.

  • FifoEvents<T>.Reverse should return the equivalent LifoEvents<T> instance.
  • LifoEvents<T>.Reverse should return the equivalent FifoEvents<T> instance.
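A hedged sketch of what these methods could look like, assuming each reader keeps the id, storage, and serializer it was constructed with (those field names are assumptions, not the actual internals):

// On FifoEvents<T>:
public LifoEvents<T> Reverse()
{
    return new LifoEvents<T>(this.id, this.storage, this.serializer);
}

// On LifoEvents<T>, symmetrically:
public FifoEvents<T> Reverse()
{
    return new FifoEvents<T>(this.id, this.storage, this.serializer);
}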

Update of last link should silently fail

As discussed in #48, when a new page is created, the second of two update operations may fail, which means that the index document's last link becomes stale. The system is still 'almost consistent' in the sense that all other links point to correct documents, apart from the last link in the index document. This link points to an existing document, but that document is no longer the most recent document.

However, if the update operation of the index throws an exception, a client may interpret that as a failure to write an event to storage, and thus may retry the operation. This would be an error, because the event has been written, and thus, a duplicate event would be written if the operation is retried.

For that reason, while the first update operation (of the new previous page) must succeed or throw an exception, the second update (of the index page) must silently fail if an error occurs.
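Sketched in code, the intended behaviour could look like this; WriteNewPreviousPage and UpdateIndexLastLink are hypothetical names standing in for the two update operations:

// The first update must succeed or throw, so the client knows
// whether the event was durably written.
this.WriteNewPreviousPage(newPreviousPage);
try
{
    // The second update is best effort only.
    this.UpdateIndexLastLink(newLastPage);
}
catch
{
    // Swallow the error: the event is already written, so throwing
    // here would invite a retry, and thereby a duplicate event. The
    // stale 'last' link only affects an optimisation, and can be
    // repaired later.
}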

Storage as a doubly linked list

As we've discussed off-line, it would be beneficial if we could treat the Atom storage model as a doubly linked list.

Current implementation

Right now, it's only a unidirectional linked list, starting at the most recent entry.

Advantages

This implementation has the advantages that it's optimised for writing, and for reading only the latest entries:

  • It's optimised for writing because the identity of an event sequence is defined by the 'index', which is the ID of the page containing the most recent entries. Thus, when a writer appends a new entry, it can append the new entry directly to the index. This is an O(1) operation.
  • It's optimised for reading the latest entries because a reader can start with the most recent entry, and then move back in time until it encounters an entry it has already seen. If done frequently, this will often be a fast operation too, because a reader doesn't have to move particularly far back.

Disadvantages

The current implementation has the disadvantage that it's expensive to read the entire event sequence. The current iterator starts with the most recent entry and moves back in time. Unless the client breaks out of the loop prematurely, iteration continues until the first/oldest entry is reached.

However, when entries are events, events must be played back in the correct order when interpreted. This means that a client must keep all events in memory, and only when iteration stops can it reverse the sequence and play back the events. This doesn't scale, because for long sequences, the client will run out of memory before it reaches the oldest entry.

If we can move in both directions, it would be more flexible, and enable more scenarios.

A bit of Atom nomenclature

Before we discuss alternatives, let's review the Atom nomenclature, in order to be sure that we're on the same page.

An Atom Feed logically consists of an arbitrary number of Atom Entries.

An Atom Entry contains the data we store; in our case events (or, normally, changesets of events).

Most recent entries appear first in the Feed.

While a Feed is logically only a single, unbroken (ordered) collection of Entries, in practice, a Feed is made up of a number of XML documents (or pages). The documents can link to each other using Atom Links.

An Atom Link is an XML element with some important attributes:

  • href is the address (URI) of the link
  • rel is the relationship type of the link. It indicates what a client can expect to find at that address; e.g. the previous page in a collection of documents.

Atom defines some common rel values:

  • self indicates the resource itself
  • previous indicates the immediately previous page. Depending on your perspective, you can think of 'previous' as indicating something that was created previously in time, or as the page you previously looked at. The latter definition is ambiguous, as it depends on your order of traversal, so in AtomEventStore, we use the definition that previous indicates pages (and entries) that were previously created, and thus represent older data.
  • next is the opposite of previous. In AtomEventStore, we use the definition that next indicates pages (and entries) that are newer than the current page.
  • first, by the same token, indicates the first (and thereby oldest) page ever written as part of the event sequence.
  • last is the opposite of first. In AtomEventStore, it indicates the newest page in the event sequence.

Alternative 1

As we've also discussed off-line, if we let the ID of the sequence be the oldest (first) page, it would be possible to append new pages as entries are added.

Assuming a page size of n, as long as there are fewer than n entries, we can simply update the Feed document. As this is an update operation on a single document, we can protect it against concurrency issues by using optimistic concurrency (e.g. ETags in HTTP APIs).

When we're about to add element number n + 1, we create a new Feed document and write the new Entry in the new document. We also add previous and first links, pointing to the first page. When that write operation completes, we update the first page by adding a next link, pointing to the new document. This update operation is also protected by optimistic concurrency, so if the update fails, we consider the entire operation failed, and the client would have to retry. This would leave the newly created page orphaned, but a background job could clean it up later.

When we're about to add element number n * 2 + 1, almost the same set of operations happen again:

  1. Create a new Feed document and write the new Entry in the new document. Also add a previous link, pointing to the previous page, and a first link, pointing to the first page.
  2. Update the previous page by adding a next link that points to the new document. This update is again protected by optimistic concurrency.

This process can be repeated for every entry number n * m + 1, where m is the number of pages.

Notice that while all pages have a first link, no pages have a last link.
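To make the protocol concrete, here is a minimal, self-contained in-memory model of Alternative 1's append operation. All names are illustrative, previous and first links are omitted for brevity, and the Version field stands in for real optimistic concurrency (e.g. an HTTP ETag):

using System;
using System.Collections.Generic;

public class Page
{
    public int Version;                 // stand-in for an ETag
    public List<string> Entries = new List<string>();
    public string Next;                 // address of the next (newer) page, if any
}

public class FeedWriter
{
    private readonly Dictionary<string, Page> storage;
    private readonly string firstAddress;
    private readonly int pageSize;

    public FeedWriter(string firstAddress, int pageSize)
    {
        this.firstAddress = firstAddress;
        this.pageSize = pageSize;
        this.storage = new Dictionary<string, Page>
        {
            { firstAddress, new Page() }
        };
    }

    public void Append(string entry)
    {
        // A writer must start at 'first' and follow 'next' links to
        // the newest page, because no page carries a 'last' link.
        var address = this.firstAddress;
        while (this.storage[address].Next != null)
            address = this.storage[address].Next;
        var page = this.storage[address];

        if (page.Entries.Count < this.pageSize)
        {
            // Fewer than n entries: update the newest page in place.
            // In real storage this write would be conditioned on the
            // page's ETag, failing the append on a conflict.
            page.Entries.Add(entry);
            page.Version++;
        }
        else
        {
            // 'create': write the new Entry to a fresh document...
            var newAddress = Guid.NewGuid().ToString();
            this.storage[newAddress] = new Page { Entries = { entry } };

            // ...then 'update': link the old page forward to it. If
            // this conditional update failed, the new page would be
            // orphaned, the client would retry, and a background job
            // could delete the orphan later.
            page.Next = newAddress;
            page.Version++;
        }
    }
}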

Advantages

The advantage of this alternative is that a client can replay events in the order they originally happened, because iteration will start at first and follow next links until there are no more entries.

It's also possible for a client to bookmark the latest entry it's seen, and then iterate only through those entries that are newer than the last entry it knew about.

Disadvantages

However, as AtomEventStore doesn't surface the addresses of each Feed document, a bookmarking feature would have to be implemented as a stateful Iterator that remembers the most recent Entry it has served.

In off-line discussions, we've called such a class a ForwardReadOnlyCursor<T>. The first time a client enumerates all items, it starts at first and follows next links until it has enumerated all entries. It then stores the Entry's ID, as well as the containing Feed ID in an internal variable.

The next time a client enumerates the ForwardReadOnlyCursor<T>, it starts where it left off before, and again keeps moving until it has reached the most recent Entry. Again, it stores the most recent Entry's ID, as well as the containing Feed ID.

In order to be thread-safe, it must ensure that concurrent clients don't enumerate the same items, so that adds extra complexity.

In addition to that, it's currently unclear to me what would happen if a client starts to enumerate, but breaks off before the most recent Entry is reached (e.g. by invoking the First() LINQ method).

Basically, this design violates CQS, which indicates to me that something is wrong with the design.

Another disadvantage is that because last is unknown, write operations become inefficient. In order to write to the last page, a client would have to start at first and then follow all next links until it reaches the most recent page.

Again, a writer could be stateful and remember where it last wrote, so that it only has to follow next links until it reaches the most recent page, but that still adds complexity.

Alternative 2

This suggestion builds on Alternative 1, but attempts to also maintain a last link on first.

It could work like this:

Assume, again, that n indicates the page size. For the first n Entries, a writer simply updates the first page, using optimistic concurrency. The first page contains a last link, pointing to itself.

When a writer has to add Entry number n + 1, it creates a new document like in alternative 1. When it updates the first page, not only does it add a next link, but it also updates the last link to point to the new document. This is still a 'create' operation followed by a single 'update' operation, so it's still safe.

When a writer has to add Entry number n * 2 + 1, it creates a new document like in alternative 1. It updates the previous document by adding a next link, but then it also updates the first document by updating its last link.

This is one 'create' operation followed by two 'update' operations. While both updates are individually protected by optimistic concurrency, they are not enlisted in a distributed transaction. Thus, there's a risk that one of them might fail. What happens if this is the case?

The write operations happen in strict sequence: 'create', 'update', 'update'. The next operation will only be attempted if the previous operation succeeded.

  • If the 'create' operation fails, no harm is done. The Entry is not written to persistent storage, and the client would have to retry.
  • If the first 'update' operation fails, the new Feed document is orphaned. While the Entry is written to persistent storage, no links point to it. The client must retry, and a background job can delete the orphaned file.
  • If the second 'update' operation fails, the new Entry is written, and all next and previous links are consistent. However, the last link in the first document may not be pointing to the most recent Feed page - or what? Assuming that the 'update' operation fails because of a concurrency conflict, what does a concurrency conflict indicate? If all clients follow this 'create', 'update', 'update' protocol, the only reason to update the first page is to update the last link. We know that the first part of the protocol ('create', 'update') guarantees that if these two operations succeed, all next and previous links are consistent and valid, so the only problem is to ensure that the last link points to the correct page.

Let's further examine how concurrency conflicts can occur for the second update. As the page size grows, this becomes more and more unlikely, so consider the degenerate case where the page size is 1.

Imagine that we already have 3 pages. Here are some scenarios:

Scenario 1

  1. Writer1 creates a new document, page4
  2. Writer1 updates page3 with a link to page4
  3. Writer2 creates a new document, page5
  4. Writer2 updates page4 with a link to page5
  5. Writer1 updates page1 with a link to page4. This is possible, because page1 hasn't changed yet.
  6. Writer2 attempts to update page1 with a link to page5. However, this fails because page1 was already updated. This is not OK, because it leaves the last link on page1 pointing to page4

Scenario 2

  1. Writer1 creates a new document, page4
  2. Writer1 updates page3 with a link to page4
  3. Writer2 creates a new document, page5
  4. Writer2 updates page4 with a link to page5
  5. Writer2 updates page1 with a link to page5. This is possible, because page1 hasn't changed yet.
  6. Writer1 attempts to update page1 with a link to page4. This fails because page1 was already updated; however, this is OK, because the last link in page1 already points to page5.

Scenario 3

  1. Writer1 creates a new document, page4
  2. Writer1 updates page3 with a link to page4
  3. The client crashes. This is not OK, because it leaves the last link on page1 pointing to page3.

It seems to me that, while it's not possible to guarantee that the last link on the first page always points to the true last page, it'll always point to a real, existing page. Thus, AtomEventStore could adopt the pragmatic approach that in order to reach the most recent page, it can follow the last link, then attempt to follow next links, if any are present.

That does slightly bend the semantics of a last link, but would it work, or is there a flaw in my logic?

Advantages

The advantage of having a doubly linked list with first and (best-effort) last links is that it becomes possible to offer stateless traversal in both directions.

In order to replay all events, a client starts at first (which is also the known 'address' of the feed), and then follows next links until all items have been enumerated.

In order to get only the events since a last known event, a client starts at first, follows the last link, further follows next links if any are present, then starts iterating from the most recent Entry backwards, following previous links.

In order to write a new Entry, a client starts at first, follows the last link, further follows next links if any are present, then writes the new Entry to the most recent page.
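A sketch of the 'events since a bookmark' traversal described above; Page, FollowLink, and HasLink are hypothetical helpers, and a Page is assumed to expose its Entries newest-first, per the Atom convention:

public IEnumerable<AtomEntry> EventsSince(Page firstPage, string lastKnownEntryId)
{
    var page = FollowLink(firstPage, "last");   // best-effort 'last' link
    while (HasLink(page, "next"))
        page = FollowLink(page, "next");        // look past a stale 'last'

    while (true)
    {
        foreach (var entry in page.Entries)     // newest entries first
        {
            if (entry.Id == lastKnownEntryId)
                yield break;                    // caught up
            yield return entry;
        }
        if (!HasLink(page, "previous"))
            yield break;                        // oldest page reached
        page = FollowLink(page, "previous");
    }
}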

Disadvantages

This alternative slightly bends the semantics of what last means. Also, I need a second opinion on this, because I wonder if the system could get into some sort of race where last is never updated... I can't see how that would happen, but I don't feel certain that it can't.

My intuition is that as the page size grows, the probability of a concurrency conflict when updating the last link decreases, but if there's one thing I've learned about concurrency design, it's that you can't rely on probabilities.

Single Writer Pattern

Reading the documentation, I found that writes should be performed using the Single Writer pattern. My understanding is that one of the possible usages of AtomEventStore is persistence for Event Sourcing scenarios.
Assuming my understanding is correct, it is not clear to me how to achieve that with:
a. single process/role writing to a store
b. multiple processes/roles writing to a store

Finally, is it possible to achieve optimistic concurrency on the aggregate version with AtomEventStore?

Convention-based type resolver for XML serializer

If you want to use the XmlContentSerializer, you must supply an appropriate ITypeResolver instance. Current options are:

  • Use TypeResolutionTable
  • Provide a custom implementation of ITypeResolver

It would be nice to have an associated ITypeResolver implementation that can scan an assembly, find all types annotated with [XmlRoot] attributes, and pull the associated local and namespace names out of them.

Perhaps it should be an internal class, with a static factory method defined on XmlContentSerializer, in order to signal that this ITypeResolver only makes sense used together with XmlContentSerializer. Another option is to make it a nested public class under XmlContentSerializer.

In any case, it would need an Assembly object as input. It would then scan that Assembly for all matching classes.
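A hedged sketch of such a convention-based scan, reading ElementName and Namespace from each [XmlRoot] attribute; the class name and its reuse of TypeResolutionTable are illustrative assumptions:

internal class XmlRootConventionTypeResolver : ITypeResolver
{
    private readonly ITypeResolver table;

    public XmlRootConventionTypeResolver(Assembly assembly)
    {
        // Map every exported [XmlRoot]-annotated type to its local
        // name and XML namespace.
        var entries =
            from t in assembly.GetExportedTypes()
            let attr = t
                .GetCustomAttributes(typeof(XmlRootAttribute), inherit: false)
                .Cast<XmlRootAttribute>()
                .SingleOrDefault()
            where attr != null
            select new TypeResolutionEntry(attr.Namespace, attr.ElementName, t);
        this.table = new TypeResolutionTable(entries.ToArray());
    }

    public Type Resolve(string localName, string xmlNamespace)
    {
        return this.table.Resolve(localName, xmlNamespace);
    }
}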

AtomEventObserver<T> should attempt to repair stale last link

As described in #48, the update operation that updates the index document's last link may occasionally fail. When that happens, the last link becomes stale; it still points to an existing document, but no longer the last document (meaning: the newest document).

Subsequent write operations should attempt to detect and remedy this situation.

Specifically, when AtomEventObserver<T>.OnNext or AtomEventObserver<T>.AppendAsync is invoked, it should check that the index document's last link points to a document that has no next link. If the document pointed to has a next link, it means that the last link points to a document that's no longer the last document. In this case, the method should recursively follow all next links until it finds a document that has no next link. This is the current last document, and the index document's last link should be updated with the address of that current last document.
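In code, the repair step might look like the following sketch, run at the start of OnNext/AppendAsync; ReadPage and UpdateIndexLastLink are hypothetical helpers, and the link shapes are assumptions:

var lastPage = ReadPage(index.Links.Single(l => l.Rel == "last").Href);
while (lastPage.Links.Any(l => l.Rel == "next"))
{
    // The 'last' link was stale: keep following 'next' links until
    // the true last document is found.
    lastPage = ReadPage(lastPage.Links.Single(l => l.Rel == "next").Href);
}
// Point the index's 'last' link at the true last document, so
// subsequent operations go straight there.
UpdateIndexLastLink(index, lastPage.Address);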

It may make sense to do this after completing #96. Also, the implementation for finding the true last document may be reused from the implementation of #97.

LifoEvents should look past last address

As discussed in #48, when events are written, the update of the index document's last link may fail on certain occasions.

Currently, LifoEvents<T> reads the index document's last link, and iterates backwards from there. However, before doing that, it should read in the document identified by the last link, and examine if that document has a next link. If it does, it should follow that next link, and recursively keep doing that until it reaches the true head of the event stream. That's the true last page in the event stream, and it should then enumerate the events backwards from that document.

It may be appropriate to wait with the implementation of this until #96 is done.

Scan an assembly in order to create a DataContractContentSerializer

Now that DataContractContentSerializer has a CreateTypeResolver method, it's possible to create an instance like this:

var serializer = new DataContractContentSerializer(
    DataContractContentSerializer.CreateTypeResolver(
        typeof(UserCreated).Assembly));

This is better than having to manually create an instance of TypeResolutionTable, but now that the building blocks are in place, it'd be nice to have an even easier shortcut:

DataContractContentSerializer serializer =
    DataContractContentSerializer.Scan(typeof(UserCreated).Assembly);

In the above example, this method is called Scan, but other alternatives are possible:

  • Scan
  • Create
  • CreateSerializer
  • CreateContentSerializer

Logo

AtomEventStore should have a logo, to be used as icon for its NuGet package, etc.

I already have some vague concept ideas myself, but since I'm not that happy with them, I'd like to solicit other suggestions.

For the moment, then, I'm not going to share my own attempt, since I don't want to influence other people's creative efforts :)

Convention-based type resolver for Data Contract serializer

If you want to use the DataContractContentSerializer, you must supply an appropriate ITypeResolver instance. Current options are:

  • Use TypeResolutionTable
  • Provide a custom implementation of ITypeResolver

It would be nice to have an associated ITypeResolver implementation that can scan an assembly, find all types annotated with [DataContract] attributes, and pull the associated local and namespace names out of them.

Perhaps it should be an internal class, with a static factory method defined on DataContractContentSerializer, in order to signal that this ITypeResolver only makes sense used together with DataContractContentSerializer. Another option is to make it a nested public class under DataContractContentSerializer.

In any case, it would need an Assembly object as input. It would then scan that Assembly for all matching classes.
