rkrzewski / akka-cluster-etcd

Akka cluster management using etcd

License: Other

Scala 100.00%

akka-cluster-etcd's Issues

ActorSystem in `EitherJunctionSpec` is not terminated

ActorSystem in EitherJunctionSpec is not terminated.

To be honest, I don't know whether an ActorSystem's resources are released when it is garbage-collected. A quick Google search did not provide an answer to this question. @rkrzewski do you know the answer? :-)
The Akka documentation recommends shutting down the system in an afterAll block in tests.

Resource (think: threads) leakage is a problem when running tests interactively in SBT.

README.md: add _Getting started_ section

Get rid of HTTP redirect handling

Akka HTTP does not support redirects yet, akka/akka#15990
I've implemented a workaround (pl.caltha.akka.http.HttpRedirects) because I was under the impression that this was necessary to talk to an etcd cluster: when connecting to a follower node, requests that require consistency guarantees are redirected to the current master node. However, it turned out that etcd since 2.0 handles master communication internally, and consistency guarantees are in effect for all operations. HTTP redirect replies are never returned to the client.

No matter how fun and instructive implementing HttpRedirects flow was, it needs to go.

Handle HTTP response timeouts

Idle timeouts are not yet implemented in Akka HTTP akka/akka#17732
Until then, a workaround using Flow.takeWithin should be implemented.

Followers get stuck in election / seeds joining cycle

Example log from a follower node:

[INFO] [12/06/2015 10:31:51.875] [Main-akka.actor.default-dispatcher-3] [akka.tcp://[email protected]:2552/user/$a] starting election
[INFO] [12/06/2015 10:31:52.031] [Main-akka.actor.default-dispatcher-4] [akka.tcp://[email protected]:2552/user/$a] assuming Follower role
[INFO] [12/06/2015 10:31:52.163] [Main-akka.actor.default-dispatcher-4] [akka.tcp://[email protected]:2552/user/$a] joined the cluster
[INFO] [12/06/2015 10:31:53.102] [Main-akka.actor.default-dispatcher-20] [akka.cluster.Cluster(akka://Main)] Cluster Node [akka.tcp://[email protected]:2552] - Welcome from [akka.tcp://[email protected]:2552]
[INFO] [12/06/2015 10:32:52.346] [Main-akka.actor.default-dispatcher-16] [akka.tcp://[email protected]:2552/user/$a] seed nodes failed to respond in 60000 ms
[INFO] [12/06/2015 10:32:52.346] [Main-akka.actor.default-dispatcher-16] [akka.tcp://[email protected]:2552/user/$a] starting election
[INFO] [12/06/2015 10:32:52.381] [Main-akka.actor.default-dispatcher-19] [akka.tcp://[email protected]:2552/user/$a] assuming Follower role
[INFO] [12/06/2015 10:32:52.404] [Main-akka.actor.default-dispatcher-19] [akka.tcp://[email protected]:2552/user/$a] joined the cluster
[INFO] [12/06/2015 10:32:52.419] [Main-akka.actor.default-dispatcher-20] [akka.cluster.Cluster(akka://Main)] Cluster Node [akka.tcp://[email protected]:2552] - Trying to join seed nodes [akka.tcp://[email protected]:2552, akka.tcp://[email protected]:2552] when already part of a cluster, ignoring

The message "joined the cluster" suggests that the seedsJoin timer should have been cancelled at ClusterDiscoveryActor#157, but apparently it still fires 60 seconds later.

Implement minimal demo application

The application should boot an empty cluster and keep running until JVM termination.
It should be packaged as a Docker container.
It should connect to etcd at host etcd that will be mapped to actual server using Docker links or cluster service discovery mechanism.

Ref #15

EtcdOperationActor improvements

EtcdOperationActor should not attempt to handle server reply timeout, this needs to be handled in EtcdClient itself (#6)
unlimited number of retries should be supported (in some cases the program cannot proceed until certain etcd operation is successfully executed)
EtcdClient failures should be handled
tests are missing
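
The "unlimited number of retries" item could be sketched roughly as follows. This is a standalone illustration built on plain Scala futures, not the actual EtcdOperationActor; the names Retry and retry are hypothetical, and a real implementation would presumably wait between attempts instead of retrying immediately.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object Retry {
  /** Re-runs `op` until it succeeds.
    * `maxAttempts = None` means "retry forever"; `Some(n)` allows at most n attempts.
    * Hypothetical helper: no delay between attempts is applied in this sketch.
    */
  def retry[T](maxAttempts: Option[Int])(op: () => Future[T])(
      implicit ec: ExecutionContext): Future[T] = {
    // Guard against `op` throwing synchronously instead of returning a failed Future.
    val attempt = try op() catch { case NonFatal(e) => Future.failed(e) }
    attempt.recoverWith {
      // Retry only while attempts remain; with None the guard is always true.
      case NonFatal(_) if maxAttempts.forall(_ > 1) =>
        retry(maxAttempts.map(_ - 1))(op)
    }
  }
}
```

Calling Retry.retry(None)(...) keeps going until the operation succeeds, which matches the case where the program cannot proceed without the etcd write; Retry.retry(Some(3))(...) gives up after the third failure.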

Decide on and document contribution policy.

Since there are a few people interested in contributing, let's make sure everyone is on the same page regarding IP rights issues.

There's an ASL 2.0 LICENSE file already, but to save ourselves potential trouble down the road, for example when donating the code as an Akka contrib module, I propose to adopt http://developercertificate.org/, certified by adding a Signed-off-by header to all commit messages.

I also propose introducing https://www.sociomantic.com/blog/2014/05/git-triangular-workflow/#.Vhwzzx94ukA and a peer review policy for non-trivial changes; however, the latter depends on whether we can agree on a certain level of commitment.

Something to discuss at the upcoming meeting this Thursday. If anybody is interested in joining in person or over Skype, let me know via e-mail.

Implement Docker Compose demo

Use FSM timers in SeedListActor

Currently retryMsg is implemented using the ActorSystem scheduler. Using an FSM timer will most likely be easier and more readable.

Limit the size of seed list

Currently SeedListActor will advertise all cluster members as seeds. This is not practical for large clusters.
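
One possible way to cap the list, shown as a standalone sketch. The Member case class, the upNumber field, and the "oldest members first" policy are assumptions made for illustration; they are not taken from SeedListActor.

```scala
object SeedSelection {
  /** Hypothetical, simplified view of a cluster member.
    * `upNumber` is assumed to grow with the order in which nodes came up.
    */
  final case class Member(address: String, upNumber: Int)

  /** Picks at most `limit` seed addresses, oldest members first.
    * Ties are broken by address so that every node computes the same list.
    */
  def selectSeeds(members: Iterable[Member], limit: Int): List[String] =
    members.toList
      .sortBy(m => (m.upNumber, m.address))
      .take(limit)
      .map(_.address)
}
```

Preferring the oldest members is only one option; the point is that the selection is deterministic and bounded regardless of cluster size.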

Leader role handover

The ClusterDiscoveryActor instance on a node that lost the leader election should subscribe to cluster events. If its host node is appointed the new leader of the Akka cluster, it should transition to the leader role: register itself as leader in etcd and update the seed list to the best of its knowledge. This will ensure that new nodes are able to use discovery properly after the original leader leaves the cluster.

Bump Akka to 2.4?

Implement Mesos / Marathon demo

Use Akka extension auto loading feature when it's released

akka/akka#20332 is slated for release in Akka 2.4.5

Review `EtcdClient` API with respect to `Future` and `ExecutionContext` usage patterns.

EtcdClientImpl methods returning Future should also be hardened against throwing exceptions on the caller's thread (as opposed to completing the future with a failure)
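
A minimal sketch of the hardening idea, assuming a small wrapper is acceptable. The helper name safely is hypothetical and not part of EtcdClient: any non-fatal exception thrown while building the future is turned into a failed future.

```scala
import scala.concurrent.Future
import scala.util.control.NonFatal

object SafeFuture {
  /** Evaluates `body` and converts a synchronous, non-fatal exception
    * into a failed Future, so callers only ever handle one error channel.
    */
  def safely[T](body: => Future[T]): Future[T] =
    try body
    catch { case NonFatal(e) => Future.failed(e) }
}
```

A method such as `def get(key: String): Future[String] = SafeFuture.safely { ... }` would then never throw on the caller's thread, even if request construction fails before any asynchronous work starts.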

Websocket connection is terminated after 1 minute of no activity

frontend_1 | [ERROR] [12/05/2015 23:00:23.491] [Main-akka.actor.default-dispatcher-20] [akka.tcp://[email protected]:2552/user/StreamSupervisor-3/flow-619-25-actorSubscriberSink] Incoming stream terminated with error
frontend_1 | java.util.concurrent.TimeoutException: No elements passed in the last 1 minute.
frontend_1 |    at akka.stream.impl.Timers$IdleBidi$$anon$4.onTimer(Timers.scala:145)
frontend_1 |    at akka.stream.stage.TimerGraphStageLogic.akka$stream$stage$TimerGraphStageLogic$$onInternalTimer(GraphStage.scala:939)
frontend_1 |    at akka.stream.stage.TimerGraphStageLogic$$anonfun$akka$stream$stage$TimerGraphStageLogic$$getTimerAsyncCallback$1.apply(GraphStage.scala:929)
frontend_1 |    at akka.stream.stage.TimerGraphStageLogic$$anonfun$akka$stream$stage$TimerGraphStageLogic$$getTimerAsyncCallback$1.apply(GraphStage.scala:929)
frontend_1 |    at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:357)
frontend_1 |    at akka.actor.Actor$class.aroundReceive(Actor.scala:480)
frontend_1 |    at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:292)
frontend_1 |    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
frontend_1 |    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
frontend_1 |    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
frontend_1 |    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
frontend_1 |    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

It seems that a stream of keepalive messages needs to be merged into the websocket handler output. There are two things that bother me though:

I can't find the place where inactivity time may be configured for a websocket handler
how could I correlate an exception like this with a particular flow instance if I had hundreds of them in an application?

Update to Akka 2.4.1

Implement kubernetes demo

Implement cluster monitor module

In order to make the demo applications more fun, we could implement a simple web application that would visualize current state of the cluster:

AngularJS + Angular Material frontend
Websocket communication
Akka HTTP backend
visualisation of node roles, current leader
killing arbitrary nodes from UI

The application could be deployed on all nodes with frontend role and accessed through LB mechanism appropriate for the cluster platform.

Implement an actor responsible for maintaining leader entry

Leader entry created during election has a finite TTL and needs to be refreshed periodically.

The refresh operation should occur with a suitable margin before the entry TTL elapses, to afford a couple of retry attempts in case of a temporary connectivity error.
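
The margin arithmetic could look like this. It is a sketch under the assumption that the margin is simply the retry delay multiplied by the number of retries to leave room for; the names are hypothetical.

```scala
import scala.concurrent.duration._

object LeaderRefresh {
  /** Time to wait after a successful write before refreshing the leader entry.
    * Leaves room for `retries` further attempts spaced `retryDelay` apart
    * before the entry's TTL would elapse.
    */
  def refreshInterval(ttl: FiniteDuration,
                      retryDelay: FiniteDuration,
                      retries: Int): FiniteDuration = {
    val margin = retryDelay * retries.toLong
    require(margin < ttl, s"retry margin $margin must be shorter than TTL $ttl")
    ttl - margin
  }
}
```

For example, with a 30 second TTL and room for three retries 5 seconds apart, the refresh would be scheduled 15 seconds after the previous write.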

An interesting case to consider is a split brain scenario that affects the Akka cluster but not the etcd cluster. In such a situation two or more Akka cluster masters would contend to update the leader entry in etcd, which can be detected by using an appropriate compareAndSet assertion: when a leader node notices that the contents of the leader entry changed into a "foreign" address during the refresh cycle, it can tell that something fishy is going on. The "dispossessed leader" should continue its attempts to reclaim the entry. When the operator eventually kills off the extra clusters, the leader of the surviving cluster will be able to restore the discovery entries to the correct state.
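
The detection logic can be modelled without etcd at all. In the sketch below, LeaderKey is a hypothetical in-memory stand-in for the leader entry, and its compareAndSet only imitates the semantics of a conditional write; none of these names come from the project.

```scala
object LeaderEntryRefresh {
  sealed trait RefreshResult
  case object Refreshed extends RefreshResult
  final case class Dispossessed(foreignLeader: String) extends RefreshResult

  /** In-memory stand-in for the etcd leader key (illustration only). */
  final class LeaderKey(initial: String) {
    private var current = initial

    /** Writes `update` only if the stored value equals `expected`;
      * otherwise leaves the value alone and reports what is stored.
      */
    def compareAndSet(expected: String, update: String): Either[String, Unit] =
      synchronized {
        if (current == expected) { current = update; Right(()) }
        else Left(current)
      }
  }

  /** Refresh cycle of a leader: succeeds only while the entry still names `self`. */
  def refresh(key: LeaderKey, self: String): RefreshResult =
    key.compareAndSet(expected = self, update = self) match {
      case Right(_)      => Refreshed
      case Left(foreign) => Dispossessed(foreign)
    }

  /** A dispossessed leader tries to take the entry back from `foreign`. */
  def reclaim(key: LeaderKey, foreign: String, self: String): Boolean =
    key.compareAndSet(expected = foreign, update = self).isRight
}
```

A Dispossessed result is the "something fishy" signal: the node keeps calling reclaim on later cycles and, once it succeeds, republishes the discovery entries.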

#15 cluster monitor application frontend

Ref #15

Add Organization to the build.sbt as it can be published

The build.sbt is missing the organization key, which makes publishing to a Maven repository complicated.
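
For illustration, the missing setting would be a single line in build.sbt. The value below is a placeholder, not the project's actual group ID.

```scala
// build.sbt (fragment)
// "com.example" is a placeholder; the maintainers would choose the real organization.
organization := "com.example"
```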

Migrate to me.maciejb.etcd-client

Etcd-client was extracted to a separate project, https://github.com/maciej/etcd-client, which is now published to Maven Central: me.maciejb.etcd-client
A few cleanups are in order:

add dependency to sbt project
remove etcd-client sources from this repository
update docs
migrate outstanding issues

Improve `FlowBreaker` implementation

Currently FlowBreaker uses Flow.takeWhile, which will terminate the stream only when (if) another input element becomes available. An alternative implementation is possible, using akka.stream.stage.AsyncStage, that would trigger stream termination immediately after Cancellable.cancel() is called.

Stages are documented in http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-customize.html
Digging into the akka.stream.impl.fusing.MapAsync implementation would probably be a good start.

change ⇒ to =>

I see that ⇒ is used all over the source code. Although it is OK to use ⇒ instead of =>, I think it is better to use =>, as it is an ASCII character. I can submit a PR for the changes if @rkrzewski agrees.

Set up a build on Travis CI

Since Travis CI appears to fit the bill, let's give it a try.

closes #18

#15 cluster monitor application backend

node role configuration from environment
starting HTTP server on nodes with "frontend" role
message classes + marshalling
request routing with WebSocket handler
actor reporting cluster state (WS source)
actor handling node kill commands (WS sink)
passing kill commands to indicated cluster nodes
hard kill: terminate VM immediately
soft kill: leave cluster, terminate actor system, terminate VM

Ref #15

etcd-client does not capture etcd custom headers

I'm wondering if the custom headers that etcd sends should be captured somewhere by the client.

X-Etcd-Cluster-Id: 7e27652122e8b2ae
X-Etcd-Index: 2007
X-Raft-Index: 2615
X-Raft-Term: 2

@rkrzewski what do you think?
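
One way the headers listed above could be captured, sketched independently of the actual etcd-client types. EtcdMeta and fromHeaders are hypothetical names, and headers are modelled as plain name/value pairs instead of Akka HTTP header objects.

```scala
import scala.util.Try

object EtcdHeaders {
  /** Hypothetical container for the etcd-specific response headers. */
  final case class EtcdMeta(clusterId: Option[String],
                            etcdIndex: Option[Long],
                            raftIndex: Option[Long],
                            raftTerm: Option[Long])

  /** Extracts the X-Etcd-* and X-Raft-* values; header names are matched
    * case-insensitively, and missing or non-numeric values become None.
    */
  def fromHeaders(headers: Seq[(String, String)]): EtcdMeta = {
    val byName = headers.map { case (k, v) => k.toLowerCase -> v.trim }.toMap
    def long(name: String): Option[Long] =
      byName.get(name).flatMap(v => Try(v.toLong).toOption)
    EtcdMeta(
      clusterId = byName.get("x-etcd-cluster-id"),
      etcdIndex = long("x-etcd-index"),
      raftIndex = long("x-raft-index"),
      raftTerm  = long("x-raft-term")
    )
  }
}
```

Returning such a value alongside each response would let callers use X-Etcd-Index as a starting point for watches without parsing headers themselves.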

Update to Akka streams / HTTP 2.0 when released

There are API changes coming up, so it may turn out a non-trivial task.

Improve ScalaDoc formatting

Scaladoc formatting is different to JavaDoc.

A multiline ScalaDoc comment should look like this:

/**
  * Some doc string.
  */

Note the indentation after the first line.

When a node is removed before SeedListActor registers it in etcd, the add operation must be cancelled

Currently, if writing the entry fails and the node in question is subsequently removed before the etcd operation retry time elapses, the write will be retried until successful, resulting in a useless seed entry.
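
The intended behaviour can be expressed as a tiny state model. This is an assumption-laden sketch rather than SeedListActor code: Pending and its methods are hypothetical, and addresses are plain strings.

```scala
object SeedRegistration {
  /** Addresses whose etcd write failed and is waiting for a retry. */
  final case class Pending(addresses: Set[String] = Set.empty) {
    /** A write failed: remember the address so the retry timer can pick it up. */
    def writeFailed(address: String): Pending = copy(addresses + address)

    /** The node left the cluster: drop any outstanding write for it. */
    def memberRemoved(address: String): Pending = copy(addresses - address)

    /** Checked when the retry timer fires; false means the retry is cancelled. */
    def shouldRetry(address: String): Boolean = addresses.contains(address)
  }
}
```

With this check in place, a retry that fires after the node was removed does nothing, so no stale seed entry is written.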

Indicate node's Joining / Up / Leaving / Exiting states

Akka 2.4.1 has two MemberEvents, MemberJoined and MemberLeft, that have no indication in the demo application UI. MemberExited, which was already present before, has no indication either.

Since the status changes may come in quick succession, $mdToast notifications are probably preferable to adding more icons to the node tile.

Travis builds failing due to dependencies missing from jcenter.bintray.com

Build #43

[warn]  [NOT FOUND  ] com.typesafe.akka#akka-stream-testkit-experimental_2.11;2.0-M2!akka-stream-testkit-experimental_2.11.jar (0ms)
[warn] ==== jcenter: tried
[warn]   https://jcenter.bintray.com/com/typesafe/akka/akka-stream-testkit-experimental_2.11/2.0-M2/akka-stream-testkit-experimental_2.11-2.0-M2.jar

The file is actually missing from bintray but:

it is available in Maven Central where Akka releases are actually published
sbt build configuration for this project does not alter default resolvers which do not include bintray at all

Probably related issue reported against sbt: sbt/sbt#2217

Use Akka HTTP connection pooling

Currently a new connection is opened for each etcd request.

Retry fetching seed list when none of the seeds seem to be responding

Currently a follower node fetches a list of seeds once and attempts to join the cluster using this address list. If the seed list is out of date because of a leader malfunction or cluster partitioning, this operation may "hang" indefinitely.
The discovery actor should use a timer: if joining the cluster does not succeed within a specified time, it should cancel the ongoing joining process (invoking Cluster.joinSeedNodes(Seq()) does this) and re-fetch the seed list from etcd, hoping that (a new) leader will eventually publish a correct list.

HttpRedirectsSpec is flaky

Sometimes the first test in the suite times out. I've seen it a couple of times in local sbt builds, never happened when running tests from ScalaTest Eclipse plugin. Now Travis has seen it too: https://travis-ci.org/rkrzewski/akka-cluster-etcd/builds/85226165

HTTP redirects are going out (see #7) but I'd like to check if this happens because this is the first test executed, while JVM is still warming up and the timeout margins are too tight, or if there's a bigger issue lurking there

Figure out a CI solution

Requirements:

Free for OSS projects
Github checks integration to allow PR validation process
Docker support: we need to spawn two containers sharing a network interface: one with etcd and another with sbt running project tests

Add Organization to the build.sbt as it can be published

The build does not define the organization setting, which makes publishing to a Maven repo quite hard.

Reify transport errors as `EtcdError` values

Low-level errors like refused connections, malformed HTTP responses or JSON payloads cause the Futures returned by EtcdClient to fail with various unrelated exceptions. This complicates error handling on the client side, because API-level errors (represented as EtcdException / EtcdError) and low-level errors need separate logic paths.

Errors to consider:

connection refused
connection reset (fails with ArrayIndexOutOfBoundsException deep in Akka streams code, might be worth asking/reporting upstream)
connection timeout
response timeout (Akka HTTP does not handle response timeout yet, AFAIK)
malformed HTTP response (rare, but someone might mistakenly connect to some random TCP service instead of etcd)
malformed JSON payload
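
A rough sketch of what reifying could mean, using only the standard library. The ClientError hierarchy and the mapping from exception types to error values are assumptions for illustration; the real EtcdError type and the exceptions Akka HTTP actually raises may differ.

```scala
import java.net.ConnectException
import java.util.concurrent.TimeoutException
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object TransportErrors {
  /** Hypothetical error values standing in for transport-level EtcdError cases. */
  sealed trait ClientError
  case object ConnectionRefused extends ClientError
  case object Timeout extends ClientError
  final case class Malformed(detail: String) extends ClientError
  final case class Unexpected(cause: Throwable) extends ClientError

  /** Assumed mapping from low-level exceptions to error values. */
  def classify(t: Throwable): ClientError = t match {
    case _: ConnectException         => ConnectionRefused
    case _: TimeoutException         => Timeout
    case e: IllegalArgumentException => Malformed(String.valueOf(e.getMessage))
    case other                       => Unexpected(other)
  }

  /** Turns a Future that may fail with arbitrary exceptions into one that
    * always succeeds with either an error value or the result.
    */
  def reify[T](f: Future[T])(implicit ec: ExecutionContext): Future[Either[ClientError, T]] =
    f.map(v => Right(v): Either[ClientError, T])
      .recover { case NonFatal(e) => Left(classify(e)) }
}
```

Client code can then pattern match on a single result type instead of maintaining one path for API-level errors and another for failed futures.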
