rkrzewski / akka-cluster-etcd Goto Github PK
View Code? Open in Web Editor NEWAkka cluster management using etcd
License: Other
Akka cluster management using etcd
License: Other
ActorSystem in EitherJunctionSpec
is not terminated.
To be honest, I don't know whether ActorSystem
s resources are finalised after when it's GCed. A quick Google search did not provide an answer for this question. @rkrzewski do you know the answer? :-)
Akka documentation recommends shutting down the system in tests in afterAll
block.
Resource (think: threads) leakage is a problem when running tests interactively in SBT.
Akka HTTP does not support redirects yet, akka/akka#15990
I've implemented a workaround (pl.caltha.akka.http.HttpRedirects
) because I was under impression that this was necessary to talk to etcd cluster: when connecting to a follower node, requests that require consistency guarantees are redirected to current master node. However, it turned out that etcd since 2.0 handles master communication internally and consistency guarantees are in effect on all operations. HTTP Redirect replies are never returned to the client.
No matter how fun and instructive implementing HttpRedirects flow was, it needs to go.
Idle timeouts are not yet implemented in Akka HTTP akka/akka#17732
Until then, a workaround using Flow.takeWithin
should be impelemented.
Example log from a follower node:
[INFO] [12/06/2015 10:31:51.875] [Main-akka.actor.default-dispatcher-3] [akka.tcp://[email protected]:2552/user/$a] starting election
[INFO] [12/06/2015 10:31:52.031] [Main-akka.actor.default-dispatcher-4] [akka.tcp://[email protected]:2552/user/$a] assuming Follower role
[INFO] [12/06/2015 10:31:52.163] [Main-akka.actor.default-dispatcher-4] [akka.tcp://[email protected]:2552/user/$a] joined the cluster
[INFO] [12/06/2015 10:31:53.102] [Main-akka.actor.default-dispatcher-20] [akka.cluster.Cluster(akka://Main)] Cluster Node [akka.tcp://[email protected]:2552] - Welcome from [akka.tcp://[email protected]:2552]
[INFO] [12/06/2015 10:32:52.346] [Main-akka.actor.default-dispatcher-16] [akka.tcp://[email protected]:2552/user/$a] seed nodes failed to respond in 60000 ms
[INFO] [12/06/2015 10:32:52.346] [Main-akka.actor.default-dispatcher-16] [akka.tcp://[email protected]:2552/user/$a] starting election
[INFO] [12/06/2015 10:32:52.381] [Main-akka.actor.default-dispatcher-19] [akka.tcp://[email protected]:2552/user/$a] assuming Follower role
[INFO] [12/06/2015 10:32:52.404] [Main-akka.actor.default-dispatcher-19] [akka.tcp://[email protected]:2552/user/$a] joined the cluster
[INFO] [12/06/2015 10:32:52.419] [Main-akka.actor.default-dispatcher-20] [akka.cluster.Cluster(akka://Main)] Cluster Node [akka.tcp://[email protected]:2552] - Trying to join seed nodes [akka.tcp://[email protected]:2552, akka.tcp://[email protected]:2552] when already part of a cluster, ignoring
Message joined the cluster
suggests that seedsJoin
timer should have been cancelled at ClusterDiscoveryActor#157 but apparently it still fires 60 seconds later.
The application should boot an empty cluster and keep running until JVM termination.
It should be packaged as a Docker container.
It should connect to etcd at host etcd
that will be mapped to actual server using Docker links or cluster service discovery mechanism.
Ref #15
EtcdClient
failures should be handledSince there are a few people interested in contributing, lets make sure the everyone is on the same page regarding IP right issues.
There's an ASL 2.0 LICENSE
file already, but to save ourselves potential trouble down the road, for example when donating the code as Akka contrib module, I propose do adopt http://developercertificate.org/ ceritified by adding Signed-off-by
header to all commit messages.
I also propose introducing https://www.sociomantic.com/blog/2014/05/git-triangular-workflow/#.Vhwzzx94ukA and peer review policy for non trivial changes, however the latter depends if we can agree on certain level of commitment.
Something to discuss on the upcoming meeting this Thursday. If somebody is interested in joining in person or over Skype, let me know via e-mail.
Currently retryMsg
is implemented using ActorSystem
scheduler. Using FSM timer will most likely be easier and more readable.
Currently SeedListActor
will advertise all cluster members as seeds. This is not practical for large clusters.
ClusterDiscoveryActor
instance on node that lost leader election should subscribe to cluster events and in case it's host node was appointed to be the new leader of Akka cluster should transition to leader role: register itself as leader in etcd and update the seed list according to it's best knowledge. This will ensure that new nodes will be able to use discovery properly after the original leader leaves the cluster.
akka/akka#20332 is slated for release in Akka 2.4.5
EtcdClientImpl
methods returning Future
should also be hardened against throwing exceptions on the caller's thread (as opposed to completing the future with a failure)
frontend_1 | [ERROR] [12/05/2015 23:00:23.491] [Main-akka.actor.default-dispatcher-20] [akka.tcp://[email protected]:2552/user/StreamSupervisor-3/flow-619-25-actorSubscriberSink] Incoming stream terminated with error
frontend_1 | java.util.concurrent.TimeoutException: No elements passed in the last 1 minute.
frontend_1 | at akka.stream.impl.Timers$IdleBidi$$anon$4.onTimer(Timers.scala:145)
frontend_1 | at akka.stream.stage.TimerGraphStageLogic.akka$stream$stage$TimerGraphStageLogic$$onInternalTimer(GraphStage.scala:939)
frontend_1 | at akka.stream.stage.TimerGraphStageLogic$$anonfun$akka$stream$stage$TimerGraphStageLogic$$getTimerAsyncCallback$1.apply(GraphStage.scala:929)
frontend_1 | at akka.stream.stage.TimerGraphStageLogic$$anonfun$akka$stream$stage$TimerGraphStageLogic$$getTimerAsyncCallback$1.apply(GraphStage.scala:929)
frontend_1 | at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:357)
frontend_1 | at akka.actor.Actor$class.aroundReceive(Actor.scala:480)
frontend_1 | at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:292)
frontend_1 | at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
frontend_1 | at akka.actor.ActorCell.invoke(ActorCell.scala:495)
frontend_1 | at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
frontend_1 | at akka.dispatch.Mailbox.run(Mailbox.scala:224)
frontend_1 | at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
frontend_1 | at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
frontend_1 | at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
frontend_1 | at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
frontend_1 | at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
It seems that a stream of keepalive messages need to be merged into websocket handler ouput. There are two things that bother me though:
In order to make the demo applications more fun, we could implement a simple web application that would visualize current state of the cluster:
The application could be deployed on all nodes with frontend
role and accessed through LB mechanism appropriate for the cluster platform.
Leader entry created during election has a finite TTL and needs to be refreshed periodically.
Refresh operation should occur with an suitable margin before entry TTL elapses, to afford a couple retry attempts in case of temporary connectivity error.
An interesting case to consider is a split brain scenario that affects Akka cluster but not etcd cluster. In such situation two or more Akka cluster masters would contend to update the leader entry in etcd, which can be detected by using appropriate compareAndSet
assertion: when a leader node notices that contents of the leader entry changed into a "foreign" address during the refresh cycle, it can tell that something fishy is going on. The "disposessed leader" should continue attempts to reclaim the entry. When the operator eventually kills of the extra clusters, leader of the surviving cluster will be able to restore the disvoery entries to correct state.
Ref #15
The build.sbt miss the organization key. It makes it complex to publish on a maven repo.
Etcd-client was extracted to separate project https://github.com/maciej/etcd-client that is now published to Maven Central: me.maciejb.etcd-client
A few cleanups are in order:
Currently FlowBreaker
uses Flow.takeWhile
which will terminate the stream when (if) another input element becomes available. An alternative implementation is possible, using akka.stream.stage.AsyncStage
that would trigger stream termination immediately after Cancellable.cancel()
were called.
Stages are documented in http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-customize.html
Digging into akka.stream.impl.fusing.MapAsync
implemention would probalby be a good start.
I saw ⇒
is all around in the source code. Although it is ok to use ⇒
instead of =>
, I think it is better to use =>
instead as it is ASCII char. I can submit a PR for the changes if @rkrzewski agree.
Since Travis CI appears to fit the bill, lets give it a try.
closes #18
Ref #15
I'm wondering if the customer headers that etcd sends should be captured somewhere by the client.
X-Etcd-Cluster-Id: 7e27652122e8b2ae
X-Etcd-Index: 2007
X-Raft-Index: 2615
X-Raft-Term: 2
@rkrzewski what do you think?
There are API changes coming up, so it may turn out a non-trivial task.
Scaladoc formatting is different to JavaDoc.
A multiline ScalaDoc comment should look like this:
/**
* Some doc string.
*/
note the indentation after the first line.
Currently if writing the entry fails and subsequently the node in question is removed before etcd operation retry time elapses, write will be retried until successful, resulting in a useless seed entry.
Akka 2.4.1 has two MemberEvent
: MemberJoined
and MemberLeft
that have no indication in the demo application UI. MemberExited
that as already present before had no indication either.
Since the status changes may come in quick succession $mdToast
notifications are probably preferable to adding more icons to node tile.
[warn] [NOT FOUND ] com.typesafe.akka#akka-stream-testkit-experimental_2.11;2.0-M2!akka-stream-testkit-experimental_2.11.jar (0ms)
[warn] ==== jcenter: tried
[warn] https://jcenter.bintray.com/com/typesafe/akka/akka-stream-testkit-experimental_2.11/2.0-M2/akka-stream-testkit-experimental_2.11-2.0-M2.jar
The file is actually missing from bintray but:
Probably related issue reported against sbt: sbt/sbt#2217
Currently a new connection is opened for each etcd request.
Currently a follower node fetches a list of seeds once and attempts to join the cluster using this address list. If the seed list out of date because of leader malfunction or cluster partitioning, this operation may "hang" indefinitely.
Discovery actor should use a timer and if joining the cluster does not succeed within specified time it should cancel the ongoing joining process (invoking Cluster.joinSeedNodes(Seq())
does this) and re-fetch seed list from etcd, hoping that (a new) leader will eventually publish a correct list.
Sometimes the first test in the suite times out. I've seen it a couple of times in local sbt builds, never happened when running tests from ScalaTest Eclipse plugin. Now Travis has seen it too: https://travis-ci.org/rkrzewski/akka-cluster-etcd/builds/85226165
HTTP redirects are going out (see #7) but I'd like to check if this happens because this is the first test executed, while JVM is still warming up and the timeout margins are too tight, or if there's a bigger issue lurking there
Requirements:
The build does not define the organization setting. It makes it publishing to a maven repo quite hard.
Low level errors like rerused connections, malformed HTTP reponses or JSON payloads cause the Futures returned by EtcdClient
to fail with various unrelated exceptions. This complicates error handling on client side, because API-level errors (repsesented as EtcdException
/ EtcdError
) and low level error need separate logic paths.
Errors to consider:
The readme.md only corresponds to the very first version.
The current docker-compose.yml renamed monitor to frontend and only allows a single instance.
Add docker-compose.yml
describing docker container dependencies (currently only a single version of etcd) for running tests.
Currently we'll to read etcd message from empty response which will fail and in turn complete the wait source with an error.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.