Coder Social home page Coder Social logo

akka-cluster-etcd's People

Contributors

everpeace avatar maciej avatar mingchuno avatar odd avatar omersadika avatar rkrzewski avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

akka-cluster-etcd's Issues

ActorSystem in `EitherJunctionSpec` is not terminated

ActorSystem in EitherJunctionSpec is not terminated.

To be honest, I don't know whether ActorSystems resources are finalised after when it's GCed. A quick Google search did not provide an answer for this question. @rkrzewski do you know the answer? :-)
Akka documentation recommends shutting down the system in tests in afterAll block.

Resource (think: threads) leakage is a problem when running tests interactively in SBT.

Get rid of HTTP redirect handling

Akka HTTP does not support redirects yet, akka/akka#15990
I've implemented a workaround (pl.caltha.akka.http.HttpRedirects) because I was under impression that this was necessary to talk to etcd cluster: when connecting to a follower node, requests that require consistency guarantees are redirected to current master node. However, it turned out that etcd since 2.0 handles master communication internally and consistency guarantees are in effect on all operations. HTTP Redirect replies are never returned to the client.

No matter how fun and instructive implementing HttpRedirects flow was, it needs to go.

Followers get stuck in election / seeds joining cycle

Example log from a follower node:

[INFO] [12/06/2015 10:31:51.875] [Main-akka.actor.default-dispatcher-3] [akka.tcp://[email protected]:2552/user/$a] starting election
[INFO] [12/06/2015 10:31:52.031] [Main-akka.actor.default-dispatcher-4] [akka.tcp://[email protected]:2552/user/$a] assuming Follower role
[INFO] [12/06/2015 10:31:52.163] [Main-akka.actor.default-dispatcher-4] [akka.tcp://[email protected]:2552/user/$a] joined the cluster
[INFO] [12/06/2015 10:31:53.102] [Main-akka.actor.default-dispatcher-20] [akka.cluster.Cluster(akka://Main)] Cluster Node [akka.tcp://[email protected]:2552] - Welcome from [akka.tcp://[email protected]:2552]
[INFO] [12/06/2015 10:32:52.346] [Main-akka.actor.default-dispatcher-16] [akka.tcp://[email protected]:2552/user/$a] seed nodes failed to respond in 60000 ms
[INFO] [12/06/2015 10:32:52.346] [Main-akka.actor.default-dispatcher-16] [akka.tcp://[email protected]:2552/user/$a] starting election
[INFO] [12/06/2015 10:32:52.381] [Main-akka.actor.default-dispatcher-19] [akka.tcp://[email protected]:2552/user/$a] assuming Follower role
[INFO] [12/06/2015 10:32:52.404] [Main-akka.actor.default-dispatcher-19] [akka.tcp://[email protected]:2552/user/$a] joined the cluster
[INFO] [12/06/2015 10:32:52.419] [Main-akka.actor.default-dispatcher-20] [akka.cluster.Cluster(akka://Main)] Cluster Node [akka.tcp://[email protected]:2552] - Trying to join seed nodes [akka.tcp://[email protected]:2552, akka.tcp://[email protected]:2552] when already part of a cluster, ignoring

Message joined the cluster suggests that seedsJoin timer should have been cancelled at ClusterDiscoveryActor#157 but apparently it still fires 60 seconds later.

Implement minimal demo application

The application should boot an empty cluster and keep running until JVM termination.
It should be packaged as a Docker container.
It should connect to etcd at host etcd that will be mapped to actual server using Docker links or cluster service discovery mechanism.

Ref #15

EtcdOperationActor improvements

  • EtcdOperationActor should not attempt to handle server reply timeout, this needs to be handled in EtcdClient itself (#6)
  • unlimited number of retries should be supported (in some cases the program cannot proceed until certain etcd operation is successfully executed)
  • EtcdClient failures should be handled
  • tests are missing

Decide on and document contribution policy.

Since there are a few people interested in contributing, lets make sure the everyone is on the same page regarding IP right issues.

There's an ASL 2.0 LICENSE file already, but to save ourselves potential trouble down the road, for example when donating the code as Akka contrib module, I propose do adopt http://developercertificate.org/ ceritified by adding Signed-off-by header to all commit messages.

I also propose introducing https://www.sociomantic.com/blog/2014/05/git-triangular-workflow/#.Vhwzzx94ukA and peer review policy for non trivial changes, however the latter depends if we can agree on certain level of commitment.

Something to discuss on the upcoming meeting this Thursday. If somebody is interested in joining in person or over Skype, let me know via e-mail.

Use FSM timers in SeedListActor

Currently retryMsg is implemented using ActorSystem scheduler. Using FSM timer will most likely be easier and more readable.

Limit the size of seed list

Currently SeedListActor will advertise all cluster members as seeds. This is not practical for large clusters.

Leader role handover

ClusterDiscoveryActor instance on node that lost leader election should subscribe to cluster events and in case it's host node was appointed to be the new leader of Akka cluster should transition to leader role: register itself as leader in etcd and update the seed list according to it's best knowledge. This will ensure that new nodes will be able to use discovery properly after the original leader leaves the cluster.

Websocket connection is terminated after 1 minute of no activity

frontend_1 | [ERROR] [12/05/2015 23:00:23.491] [Main-akka.actor.default-dispatcher-20] [akka.tcp://[email protected]:2552/user/StreamSupervisor-3/flow-619-25-actorSubscriberSink] Incoming stream terminated with error
frontend_1 | java.util.concurrent.TimeoutException: No elements passed in the last 1 minute.
frontend_1 |    at akka.stream.impl.Timers$IdleBidi$$anon$4.onTimer(Timers.scala:145)
frontend_1 |    at akka.stream.stage.TimerGraphStageLogic.akka$stream$stage$TimerGraphStageLogic$$onInternalTimer(GraphStage.scala:939)
frontend_1 |    at akka.stream.stage.TimerGraphStageLogic$$anonfun$akka$stream$stage$TimerGraphStageLogic$$getTimerAsyncCallback$1.apply(GraphStage.scala:929)
frontend_1 |    at akka.stream.stage.TimerGraphStageLogic$$anonfun$akka$stream$stage$TimerGraphStageLogic$$getTimerAsyncCallback$1.apply(GraphStage.scala:929)
frontend_1 |    at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:357)
frontend_1 |    at akka.actor.Actor$class.aroundReceive(Actor.scala:480)
frontend_1 |    at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:292)
frontend_1 |    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
frontend_1 |    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
frontend_1 |    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
frontend_1 |    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
frontend_1 |    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
frontend_1 |    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

It seems that a stream of keepalive messages need to be merged into websocket handler ouput. There are two things that bother me though:

  • I can't find the place where inactivity time may be configured for a websocket handler
  • how could I correlate an exception like this with a particular flow instance if I had hundreds of them in an application?

Implement cluster monitor module

In order to make the demo applications more fun, we could implement a simple web application that would visualize current state of the cluster:

  • AngularJS + Angular Material frontend
  • Websocket communication
  • Akka HTTP backend
  • visualisation of node roles, current leader
  • killing arbitrary nodes from UI

The application could be deployed on all nodes with frontend role and accessed through LB mechanism appropriate for the cluster platform.

Implement an actor responsible for maintaining leader entry

Leader entry created during election has a finite TTL and needs to be refreshed periodically.

Refresh operation should occur with an suitable margin before entry TTL elapses, to afford a couple retry attempts in case of temporary connectivity error.

An interesting case to consider is a split brain scenario that affects Akka cluster but not etcd cluster. In such situation two or more Akka cluster masters would contend to update the leader entry in etcd, which can be detected by using appropriate compareAndSet assertion: when a leader node notices that contents of the leader entry changed into a "foreign" address during the refresh cycle, it can tell that something fishy is going on. The "disposessed leader" should continue attempts to reclaim the entry. When the operator eventually kills of the extra clusters, leader of the surviving cluster will be able to restore the disvoery entries to correct state.

#15 cluster monitor application frontend

  • sbt-web plugin configuration
  • webjar dependendencies
  • angularjs application bootstrap code
  • screen layout
  • communication service managing WebSocket connection
  • grid list controller
  • grid tile directive
  • node role indication
  • leader node indication
  • kill node controls
  • indication of currently connected fronted node

Ref #15

Improve `FlowBreaker` implementation

Currently FlowBreaker uses Flow.takeWhile which will terminate the stream when (if) another input element becomes available. An alternative implementation is possible, using akka.stream.stage.AsyncStage that would trigger stream termination immediately after Cancellable.cancel() were called.

Stages are documented in http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-customize.html
Digging into akka.stream.impl.fusing.MapAsync implemention would probalby be a good start.

change ⇒ to =>

I saw is all around in the source code. Although it is ok to use instead of =>, I think it is better to use => instead as it is ASCII char. I can submit a PR for the changes if @rkrzewski agree.

#15 cluster monitor application backend

  • node role configuration from envrionment
  • starting HTTP server on nodes with "frontend" role
  • message classes + marshalling
  • request routing with WebSocket handler
  • actor reporting cluster state (WS source)
  • actor handing node kill commands (WS sink)
  • passing kill commands to indicated cluster nodes
  • hard kill: terminate VM immediately
  • soft kill: leave cluster, terminate actor system, terminate VM

Ref #15

Improve ScalaDoc formatting

Scaladoc formatting is different to JavaDoc.

A multiline ScalaDoc comment should look like this:

/**
  * Some doc string.
  */

note the indentation after the first line.

Indicate node's Joining / Up / Leaving / Exiting states

Akka 2.4.1 has two MemberEvent: MemberJoined and MemberLeft that have no indication in the demo application UI. MemberExited that as already present before had no indication either.

Since the status changes may come in quick succession $mdToast notifications are probably preferable to adding more icons to node tile.

Travis builds failing due to dependencies missing from jcenter.bintray.com

Build #43

[warn]  [NOT FOUND  ] com.typesafe.akka#akka-stream-testkit-experimental_2.11;2.0-M2!akka-stream-testkit-experimental_2.11.jar (0ms)
[warn] ==== jcenter: tried
[warn]   https://jcenter.bintray.com/com/typesafe/akka/akka-stream-testkit-experimental_2.11/2.0-M2/akka-stream-testkit-experimental_2.11-2.0-M2.jar

The file is actually missing from bintray but:

  • it is available in Maven Central where Akka releases are actually published
  • sbt build configuration for this project does not alter default resolvers which do not include bintray at all

Probably related issue reported against sbt: sbt/sbt#2217

Retry fetching seed list when none of the seeds seem to be responding

Currently a follower node fetches a list of seeds once and attempts to join the cluster using this address list. If the seed list out of date because of leader malfunction or cluster partitioning, this operation may "hang" indefinitely.
Discovery actor should use a timer and if joining the cluster does not succeed within specified time it should cancel the ongoing joining process (invoking Cluster.joinSeedNodes(Seq()) does this) and re-fetch seed list from etcd, hoping that (a new) leader will eventually publish a correct list.

HttpRedirectsSpec is flaky

Sometimes the first test in the suite times out. I've seen it a couple of times in local sbt builds, never happened when running tests from ScalaTest Eclipse plugin. Now Travis has seen it too: https://travis-ci.org/rkrzewski/akka-cluster-etcd/builds/85226165

HTTP redirects are going out (see #7) but I'd like to check if this happens because this is the first test executed, while JVM is still warming up and the timeout margins are too tight, or if there's a bigger issue lurking there

Figure out a CI solution

Requirements:

  • Free for OSS projects
  • Github checks integration to allow PR validation process
  • Docker support: we need to spawn two containers sharing a network interface: one with etcd and another with sbt running project tests

Reify transport errors as `EtcdError` values

Low level errors like rerused connections, malformed HTTP reponses or JSON payloads cause the Futures returned by EtcdClient to fail with various unrelated exceptions. This complicates error handling on client side, because API-level errors (repsesented as EtcdException / EtcdError) and low level error need separate logic paths.

Errors to consider:

  • connection refused
  • connection reset (fails with ArrayIndexOutOfBoundsException deep in Akka streams code, might be worth asking/reporting upstream)
  • connection timeout
  • response timeout (Akka HTTP does not handle response timeout yet, AFAIK)
  • malformed HTTP response (rare, but someone might mistakenly connect to some random TCP service instead of etcd)
  • malformed JSON payload

Add docker-compose.yml

Add docker-compose.yml describing docker container dependencies (currently only a single version of etcd) for running tests.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.