greantech / atomeventstore
A server-less .NET Event Store based on the Atom syndication format
License: MIT License
I just got this unit test error when running all tests in Visual Studio:
Test 'Grean.AtomEventStore.UnitTests.FifoEventsTests.SutYieldsPagedEvents(dummyResolver: Grean.AtomEventStore.UnitTests.TestEventTypeResolver, dummySerializer: Grean.AtomEventStore.XmlContentSerializer, dummyInjectedIntoSut: Grean.AtomEventStore.AtomEventsInMemory, dummyId: urn:uuid:dad86e31-7e00-48e5-9feb-1f0d9c29c247, writer: Grean.AtomEventStore.AtomEventObserver`1[Grean.AtomEventStore.UnitTests.XmlAttributedTestEventX], sut: Grean.AtomEventStore.FifoEvents`1[Grean.AtomEventStore.UnitTests.XmlAttributedTestEventX], eventGenerator: Ploeh.AutoFixture.Generator`1[Grean.AtomEventStore.UnitTests.XmlAttributedTestEventX])' failed:
System.InvalidOperationException : Sequence contains no elements
at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)
AtomLink.cs(105,0): at Grean.AtomEventStore.AtomLink.ReadFrom(XmlReader xmlReader)
AtomFeed.cs(207,0): at Grean.AtomEventStore.AtomFeed.b__1(XPathNavigator x)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Linq.Enumerable.SingleOrDefault[TSource](IEnumerable`1 source, Func`2 predicate)
FifoEvents.cs(62,0): at Grean.AtomEventStore.FifoEvents`1.ReadNext(AtomFeed page)
FifoEvents.cs(36,0): at Grean.AtomEventStore.FifoEvents`1.<>c__DisplayClass2.<GetEnumerator>b__0()
at System.Threading.Tasks.Task`1.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
Re-running the tests produced an all-green result, so this indicates an Erratic Test (or a subtle bug somewhere in AtomEventStore).
The ConventionBasedSerializerOfComplexImmutableClasses IContentSerializer has too many implicit assumptions to be a robust part of AtomEventStore itself. It should be pulled out of AtomEventStore and into a separate satellite library.
Hello!
I am currently working on a project where I would like to have an event stream per "subject" (or "topic"), which is identified by its unique name. The topic itself doesn't carry any information other than its events.
So, naturally, I would like to use this name as the ID of the event stream.
If forced to use UUID IRIs, I will have to add additional storage for topic name-to-UUID mappings.
That means an additional step when creating a new "topic", as well as an additional lookup (to read the topic's UUID) when reading events.
So I was thinking maybe I could use custom URNs, e.g. urn:x-mycustomnid:topicname.
Do you think this is a valid use case? Is it a good idea to have AtomEventStore support custom URNs?
Whenever I use DataContractContentSerializer or XmlContentSerializer, I find myself implementing the required ITypeResolver in more or less the same way:
public class UserTypeResolver : ITypeResolver
{
public Type Resolve(string localName, string xmlNamespace)
{
switch (xmlNamespace)
{
case "urn:grean:samples:user-sign-up":
switch (localName)
{
case "user-created":
return typeof(UserCreated);
case "email-verified":
return typeof(EmailVerified);
case "email-changed":
return typeof(EmailChanged);
default:
throw new ArgumentException(
"Unknown local name.",
"localName");
}
default:
throw new ArgumentException(
"Unknown XML namespace.",
"xmlNamespace");
}
}
}
It would be nice to have a reusable class where you can simply supply the values; something like this:
var resolver = new TypeResolutionTable(
new TypeResolutionEntry("urn:grean:samples:user-sign-up", "user-created", typeof(UserCreated)),
new TypeResolutionEntry("urn:grean:samples:user-sign-up", "email-verified", typeof(EmailVerified)),
new TypeResolutionEntry("urn:grean:samples:user-sign-up", "email-changed", typeof(EmailChanged)));
TypeResolutionTable should obviously implement ITypeResolver.
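A minimal sketch of how such a TypeResolutionTable could look, assuming the ITypeResolver interface shown above (the interface is repeated here only to make the sketch self-contained; this is not AtomEventStore's actual implementation):

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of AtomEventStore's resolver interface.
public interface ITypeResolver
{
    Type Resolve(string localName, string xmlNamespace);
}

// Hypothetical entry type: one (namespace, local name) -> Type mapping.
public class TypeResolutionEntry
{
    public TypeResolutionEntry(string xmlNamespace, string localName, Type resolvedType)
    {
        this.XmlNamespace = xmlNamespace;
        this.LocalName = localName;
        this.ResolvedType = resolvedType;
    }

    public string XmlNamespace { get; private set; }
    public string LocalName { get; private set; }
    public Type ResolvedType { get; private set; }
}

// Hypothetical table: looks the entry up, or throws like the
// hand-written UserTypeResolver above.
public class TypeResolutionTable : ITypeResolver
{
    private readonly Dictionary<Tuple<string, string>, Type> entries;

    public TypeResolutionTable(params TypeResolutionEntry[] entries)
    {
        this.entries = new Dictionary<Tuple<string, string>, Type>();
        foreach (var e in entries)
            this.entries.Add(
                Tuple.Create(e.XmlNamespace, e.LocalName), e.ResolvedType);
    }

    public Type Resolve(string localName, string xmlNamespace)
    {
        Type result;
        if (this.entries.TryGetValue(
            Tuple.Create(xmlNamespace, localName), out result))
            return result;
        throw new ArgumentException("Unknown local name or XML namespace.");
    }
}
```

The dictionary lookup keeps Resolve O(1), and construction with params mirrors the proposed usage above.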
First off, let me start by noting that, yes, this suggestion is seemingly at odds with the lock-free design goal, but please bear with me a bit before discarding the idea entirely.
The thing is that, for some use cases, the Single Writer limitation hinders the scalability of the design: The writer process may not be able to keep up with the rate of updates in the system.
One such case is a not-entirely-fictional multi-tenant infrastructure SaaS system operated by Grean. We have a situation where the frequency of temporally concurrent updates is high enough that the writer struggles to keep up, but at the same time, the per-event-stream contention frequency is quite low (though not negligible).
This has the interesting side-effect of making the Single Writer behave a lot like a global mutex in the system, which leads me back to the opening comment of this issue: For this use case, the system as-is does not really exhibit lock-free behaviour anyway.
Also, after having this system in production for about 6 months, we must conclude that operating a Single Writer process in Azure is not entirely predictable.
Following the same principles as described here, we have found that it is not entirely uncommon for the Azure Scheduler to either skip a ping entirely, or ping too early. That leads to periods of time where the Single Writer process is down, and commands pile up in the queue accordingly, making it even harder for the writer to keep up afterwards.
So, we are currently testing a prototype version of a concurrent writers implementation with a CQRS-style service. The implementation relies on ETags (for consistent reads during processing of individual commands), on short-lived blob leases (15 seconds) for locking, and on a 'self-healing' command-processing implementation (so commands can be safely retried, even in the event of partially succeeding updates to the underlying storage).
If the real-world tests deem it fit for production, we are considering contributing this feature to AtomEventStore, as a separate add-on along the lines of the AzureBlob add-on. It would, however, be an F# implementation - which hopefully wouldn't make anyone frown upon it.
I'm submitting this issue now in the hope of getting feedback from other users of AtomEventStore on such a potential feature - particularly for critical review of the concept, but also to hear from others who may find it useful.
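To make the ETag part of the scheme concrete, here is a rough sketch of an ETag-protected read-modify-write loop of the kind the prototype relies on. All the names here (IBlobStore, Blob, PreconditionFailedException, OptimisticWriter) are hypothetical stand-ins, not AtomEventStore's or Azure's actual API:

```csharp
using System;
using System.Threading;

// Hypothetical storage abstraction with optimistic concurrency.
public class Blob
{
    public string Content;
    public string ETag;
}

public class PreconditionFailedException : Exception { }

public interface IBlobStore
{
    Blob Read(string address);
    // Throws PreconditionFailedException if the stored ETag no longer
    // matches expectedETag (i.e. another writer updated the blob).
    void Write(string address, string content, string expectedETag);
}

public static class OptimisticWriter
{
    public static void Append(IBlobStore store, string address, string entry)
    {
        while (true)
        {
            var blob = store.Read(address);        // consistent read
            var updated = blob.Content + entry;    // apply the command
            try
            {
                store.Write(address, updated, blob.ETag); // conditional write
                return;                                   // success
            }
            catch (PreconditionFailedException)
            {
                // Another writer got there first; back off, re-read, retry.
                Thread.Sleep(50);
            }
        }
    }
}
```

The loop is safe to retry because each attempt re-reads the current state before applying the command; the blob-lease part of the prototype would additionally bound how long any single writer can hold the stream.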
Now that XmlContentSerializer has a CreateTypeResolver method, it's possible to create an instance like this:
var serializer = new XmlContentSerializer(
XmlContentSerializer.CreateTypeResolver(
typeof(UserCreated).Assembly));
This is better than having to manually create an instance of TypeResolutionTable, but now that the building blocks are in place, it'd be nice to have an even easier short-cut:
XmlContentSerializer serializer =
XmlContentSerializer.Scan(typeof(UserCreated).Assembly);
In the above example, this method is called Scan, but other names are possible.
Given the server-less nature of AtomEventStore, and given that it can use a normal file system as a storage mechanism, it seems that perhaps it would also be a good fit on mobile devices, as an alternative to a 'mobile' database implementation.
If that makes sense, perhaps AtomEventStore should be a Portable Class Library (PCL). It may simply be my naïvety that makes me consider this, but I understand that while PCL has been a difficult beast to tame previously, the technology should have matured in the last couple of years.
Writing this, I also don't know if it's possible to migrate AtomEventStore to a PCL, but it may be worth investigating.
If it can be done without introducing breaking changes, it sounds like a Pareto improvement. If it can be done, but only with breaking changes, we need to consider the options. One of those options may be to not make AtomEventStore a PCL at all.
The AtomEventsInMemory class isn't thread-safe, but since it may conceivably be accessed by multiple readers and writers, it should be.
The FifoEvents<T> class exists to provide First-In, First-Out enumeration of an event stream. Likewise, a LifoEvents<T> class should be added to provide Last-In, First-Out enumeration of an event stream.
Both FifoEvents<T> and LifoEvents<T> should have a Reverse method:
FifoEvents<T>.Reverse should return the equivalent LifoEvents<T> instance.
LifoEvents<T>.Reverse should return the equivalent FifoEvents<T> instance.
As discussed in #48, when a new page is created, the second of two update operations may fail, which means that the index document's last link becomes stale. The system is still 'almost consistent' in the sense that all other links point to correct documents, apart from the last link in the index document. This link points to an existing document, but that document is no longer the most recent document.
However, if the update operation of the index throws an exception, a client may interpret that as a failure to write an event to storage, and thus may retry the operation. This would be an error, because the event has been written, and thus, a duplicate event would be written if the operation is retried.
For that reason, while the first update operation (of the new previous page) must succeed or throw an exception, the second update (of the index page) must silently fail if an error occurs.
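The intended error-handling asymmetry could be sketched like this (the method names and structure are hypothetical, not AtomEventStore's actual API):

```csharp
using System;

public static class PageWriter
{
    // Hypothetical write step for a freshly created page: the first
    // update must propagate failures, the second must swallow them.
    public static void CompleteNewPage(
        Action updatePreviousPage,
        Action updateIndexPage)
    {
        // 1. Update the new previous page. If this fails, the whole
        //    operation fails, and the client may safely retry.
        updatePreviousPage();

        // 2. Update the index page's 'last' link. The event is already
        //    durably written at this point, so a failure here must NOT
        //    surface to the client; otherwise a retry would write a
        //    duplicate event. The stale link can be repaired later.
        try
        {
            updateIndexPage();
        }
        catch (Exception)
        {
            // Intentionally swallowed: the index 'last' link is best-effort.
        }
    }
}
```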
As we've discussed off-line, it would be beneficial if we could treat the Atom storage model as a doubly linked list.
Right now, it's only a unidirectional linked list, starting at the most recent entry.
This implementation has the advantage that it's optimised for writing, and for reading only the latest entries.
The current implementation has the disadvantage that it's expensive to read the entire event sequence. The current iterator starts with the most recent entry and moves back in time. Unless the client breaks out of the loop prematurely, iteration continues until the first/oldest entry is reached.
However, when entries are events, events must be played back in the correct order when interpreted. This means that a client must keep all events in memory, and only when iteration stops can it reverse the sequence and play back the events. This doesn't scale, because for long sequences, the client will run out of memory before it reaches the oldest entry.
If we can move in both directions, it would be more flexible, and enable more scenarios.
Before we discuss alternatives, let's review the Atom nomenclature, in order to be sure that we're on the same page.
An Atom Feed logically consists of an arbitrary number of Atom Entries.
An Atom Entry contains the data we store; in our case events (or, normally, changesets of events).
Most recent entries appear first in the Feed.
While a Feed is logically only a single, unbroken (ordered) collection of Entries, in practice, a Feed is made up of a number of XML documents (or pages). The documents can link to each other using Atom Links.
An Atom Link is an XML element with some important attributes:
Atom defines some common rel values:
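For illustration, a feed page's links might look like this (the addresses are made up; the rel values self, first, last, previous, and next are the standard Atom and RFC 5005 paging relations relevant here):

```xml
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self"     href="3fe2b264.xml" />
  <link rel="first"    href="7d53bc3a.xml" />
  <link rel="last"     href="9c521ebb.xml" />
  <link rel="previous" href="1a9fbc02.xml" />
  <link rel="next"     href="9c521ebb.xml" />
  <!-- entries elided -->
</feed>
```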
As we've also discussed off-line, if we let the ID of the sequence be the oldest (first) page, it would be possible to append new pages as entries are added.
Assuming a page size of n, as long as there are less than n entries, we can simply update the Feed document. As this is an update operation on a single document, we can protect this operation against concurrency issues by using optimistic concurrency (e.g. ETags in HTTP APIs).
When we're about to add element number n + 1, we create a new Feed document and write the new Entry in the new document. We also add previous and first links, pointing to the first page. When that write operation completes, we update the first page by adding a next link, pointing to the new document. This update operation is also protected with optimistic concurrency, so if the update fails, we consider the entire operation failed, and the client would have to retry. This would leave the newly created page orphaned, but a background job could clean it up later.
When we're about to add element number n * 2 + 1, almost the same set of operations happens again:
This process can be repeated for every entry number n * m + 1, where m is the number of pages.
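The paging arithmetic above can be sketched as a small helper (the names are mine, just for illustration): with page size n, entry number n · m + 1 is the first entry that no longer fits in page m, and therefore triggers the creation of page m + 1.

```csharp
public static class Paging
{
    // Which page (1-based) does a given entry (1-based) land on,
    // for a fixed page size?
    public static int PageOf(int entryNumber, int pageSize)
    {
        return (entryNumber - 1) / pageSize + 1;
    }

    // Does adding this entry trigger the creation of a new page?
    // True exactly for entry numbers of the form n * m + 1 (m >= 1).
    public static bool TriggersNewPage(int entryNumber, int pageSize)
    {
        return entryNumber > 1 && (entryNumber - 1) % pageSize == 0;
    }
}
```

For example, with n = 3: entries 1-3 fit on page 1, entry 4 (= 3 · 1 + 1) triggers page 2, and entry 7 (= 3 · 2 + 1) triggers page 3.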
Notice that while all pages have a first link, no pages have a last link.
The advantage of this alternative is that a client can replay events in the order they originally happened, because iteration starts at first and follows next links until there are no more entries.
It's also possible for a client to bookmark the latest entry it's seen, and then iterate only through those entries that are newer than the last entry it knew about.
However, as AtomEventStore doesn't surface the addresses of each Feed document, a bookmarking feature would have to be implemented as a stateful Iterator that remembers the most recent Entry it served last.
In off-line discussions, we've called such a class a ForwardReadOnlyCursor<T>. The first time a client enumerates all items, it starts at first and follows next links until it has enumerated all entries. It then stores the Entry's ID, as well as the containing Feed ID, in an internal variable.
The next time a client enumerates the ForwardReadOnlyCursor<T>, it starts where it left off before, and again keeps moving until it's reached the most recent Entry. Again, it stores the most recent Entry's ID, as well as the containing Feed ID.
In order to be thread-safe, it must ensure that concurrent clients don't enumerate the same items, so that adds extra complexity.
In addition to that, it's currently unclear to me what would happen if a client starts to enumerate, but breaks off before the most recent Entry is reached (e.g. by invoking the First() LINQ method).
Basically, this design violates CQS, which indicates to me that something is wrong with the design.
Another disadvantage is that because last is unknown, write operations become inefficient. In order to write to the last page, a client would have to start at first and then follow all next links until it reaches the most recent page.
Again, a writer could be stateful and remember where it last wrote, so that it only has to follow next links until it reaches the most recent page, but that still adds complexity.
This suggestion builds on Alternative 1, but attempts to also maintain a last link on first.
It could work like this:
Assume, again, that n indicates the page size. For the first n Entries, a writer simply updates the first page, using optimistic concurrency. The first page contains a last link, pointing to itself.
When a writer has to add Entry number n + 1, it creates a new document like in alternative 1. When it updates the first page, not only does it add a next link, but it also updates the last link to point to the new document. This is still a 'create' operation followed by a single 'update' operation, so it's still safe.
When a writer has to add Entry number n * 2 + 1, it creates a new document like in alternative 1. It first updates the previous document by adding a next link, and then it also updates the first document by updating its last link.
This is one 'create' operation followed by two 'update' operations. While both updates are individually protected by optimistic concurrency, they are not enlisted in a distributed transaction. Thus, there's a risk that one of them might fail. What happens if this is the case?
The write operations happen in strict sequence: 'create', 'update', 'update'. The next operation will only be attempted if the previous operation succeeded.
Let's further examine how concurrency conflicts can occur for the second update. As the page size grows, such conflicts become more and more unlikely, so consider the degenerate case where the page size is 1.
Imagine that we already have 3 pages. Here are some scenarios:
It seems to me that, while it's not possible to guarantee that the last link on the first page always points to the true last page, it'll always point to a real, existing page. Thus, AtomEventStore could adopt the pragmatic approach that in order to reach the most recent page, it can follow the last link, then attempt to follow next links, if any are present.
That does slightly bend the semantics of a last link, but would it work, or is there a flaw in my logic?
The advantages of having doubly-linked lists with first and (best-effort) last links is that it becomes possible to offer stateless traversal in both directions.
In order to replay all events, a client starts at first (which is also the known 'address' of the feed), and then follows next links until all items have been enumerated.
In order to get only the events since a last known event, a client starts at first, follows the last link, further follows next links if any are present, then starts iterating from the most recent Entry backwards, following previous links.
In order to write a new Entry, a client starts at first, follows the last link, further follows next links if any are present, then writes the new Entry to the most recent page.
This alternative slightly bends the semantics of what last means. Also, I need a second opinion on this, because I wonder if the system could get into some sort of race where last is never updated... I can't see how that would happen, but I don't feel certain that it can't.
My intuition is that as the page size grows, the probability of a concurrency conflict when updating the last link decreases, but if there's one thing I've learned about concurrency design, it's that you can't rely on probabilities.
Reading the documentation, I found information that writes should be performed using the Single Writer pattern. My understanding is that one possible usage of AtomEventStore is persistence for Event Sourcing scenarios.
Assuming my understanding is correct, it is not clear to me how to achieve that with:
a. a single process/role writing to a store
b. multiple processes/roles writing to a store
Finally, is it possible to achieve optimistic concurrency on aggregate version with AtomEventStore?
If you want to use the XmlContentSerializer, you must supply an appropriate ITypeResolver instance. Current options are:
TypeResolutionTable
a custom ITypeResolver implementation
It would be nice to have an associated ITypeResolver implementation that can scan an assembly, find all types annotated with [XmlRoot] attributes, and pull the associated local and namespace names out of them.
Perhaps it should be an internal class, with a static factory method defined on XmlContentSerializer, in order to signal that this ITypeResolver only makes sense when used together with XmlContentSerializer. Another option is to make it a nested public class under XmlContentSerializer.
In any case, it would need an Assembly object as input. It would then scan that Assembly for all matching classes.
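A minimal sketch of such a scanner, assuming the ITypeResolver interface shown earlier (the class name is mine, and this is not AtomEventStore's actual implementation; the interface is repeated only to keep the sketch self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Xml.Serialization;

// Assumed shape of AtomEventStore's resolver interface.
public interface ITypeResolver
{
    Type Resolve(string localName, string xmlNamespace);
}

// Hypothetical scanner: builds a lookup table from all [XmlRoot]-annotated
// public types in an assembly, keyed on (namespace, local name).
public class XmlRootScanningTypeResolver : ITypeResolver
{
    private readonly Dictionary<string, Type> types =
        new Dictionary<string, Type>();

    public XmlRootScanningTypeResolver(Assembly assembly)
    {
        foreach (var type in assembly.GetExportedTypes())
        {
            var attribute = (XmlRootAttribute)Attribute.GetCustomAttribute(
                type, typeof(XmlRootAttribute));
            if (attribute != null)
                this.types.Add(
                    attribute.Namespace + "/" + attribute.ElementName,
                    type);
        }
    }

    public Type Resolve(string localName, string xmlNamespace)
    {
        Type result;
        if (this.types.TryGetValue(xmlNamespace + "/" + localName, out result))
            return result;
        throw new ArgumentException("Unknown local name or XML namespace.");
    }
}
```

The scan happens once, in the constructor, so repeated Resolve calls stay cheap; a real implementation would also need a policy for duplicate (namespace, local name) pairs.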
As described in #48, the update operation that updates the index document's last link may occasionally fail. When that happens, the last link becomes stale; it still points to an existing document, but no longer the last document (meaning: the newest document).
Subsequent write operations should attempt to detect and remedy this situation.
Specifically, when AtomEventObserver<T>.OnNext or AtomEventObserver<T>.AppendAsync is invoked, it should check that the index document's last link points to a document that has no next link. If the document pointed to has a next link, it means that the last link points to a document that's no longer the last document. In this case, the method should recursively follow all next links until it finds a document that has no next link. This is the current last document, and the index document's last link should be updated with the address of that current last document.
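The repair step could look something like this sketch (FeedPage is a simplified stand-in, and all names here are hypothetical, not AtomEventStore's actual types):

```csharp
using System;

// Simplified stand-in for a feed page; only the 'next' link matters here.
public class FeedPage
{
    public string Address;
    public string NextLink; // null when this is the true last page
}

public static class LastLinkRepair
{
    // Follow 'next' links from the page the index's (possibly stale)
    // 'last' link points to, until a page without a 'next' link is found.
    // That page is the true last page.
    public static FeedPage FindTrueLast(
        FeedPage candidate,
        Func<string, FeedPage> loadPage)
    {
        while (candidate.NextLink != null)
            candidate = loadPage(candidate.NextLink);
        return candidate;
    }
}
```

In the common case the candidate already has no next link, so the loop body never runs and the check costs a single document read.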
Perhaps it may make sense to do this after completing #96. Also, the implementation for finding the true last document may be reused from the implementation of #97.
As discussed in #48, when events are written, the update of the index document's last link may fail on certain occasions.
Currently, LifoEvents<T> reads the index document's last link and iterates backwards from there. However, before doing that, it should read the document identified by the last link and examine whether that document has a next link. If it does, it should follow that next link, and recursively keep doing so until it reaches the true head of the event stream. That's the true last page in the event stream, and it should then enumerate the events backwards from that document.
It may be appropriate to wait with the implementation of this until #96 is done.
Now that DataContractContentSerializer has a CreateTypeResolver method, it's possible to create an instance like this:
var serializer = new DataContractContentSerializer(
DataContractContentSerializer.CreateTypeResolver(
typeof(UserCreated).Assembly));
This is better than having to manually create an instance of TypeResolutionTable, but now that the building blocks are in place, it'd be nice to have an even easier short-cut:
DataContractContentSerializer serializer =
DataContractContentSerializer.Scan(typeof(UserCreated).Assembly);
In the above example, this method is called Scan, but other names are possible.
AtomEventStore should have a logo, to be used as icon for its NuGet package, etc.
I already have some vague concept ideas myself, but since I'm not that happy with them, I'd like to solicit other suggestions.
For the moment, then, I'm not going to share my own attempt, since I don't want to influence other people's creative efforts :)
If you want to use the DataContractContentSerializer, you must supply an appropriate ITypeResolver instance. Current options are:
TypeResolutionTable
a custom ITypeResolver implementation
It would be nice to have an associated ITypeResolver implementation that can scan an assembly, find all types annotated with [DataContract] attributes, and pull the associated local and namespace names out of them.
Perhaps it should be an internal class, with a static factory method defined on DataContractContentSerializer, in order to signal that this ITypeResolver only makes sense when used together with DataContractContentSerializer. Another option is to make it a nested public class under DataContractContentSerializer.
In any case, it would need an Assembly object as input. It would then scan that Assembly for all matching classes.