galaxy's People

Contributors

dafnap, hsestupin, pron, voidstarstar

galaxy's Issues

set without getx in transaction doesn't work

The scenario I tried is this:

begin transaction

  1. root = getRoot
  2. set(root)
  3. commit

then from another node:

  4. root = getRoot
  5. get(root) ==> TIMEOUT

When I added a step 1.5, getX(root), before step 2, everything worked fine.

In the logs I saw that the getX reached the first node, but its handling was moved to PendingMessages, probably by shouldHoldMessage. I think the reason is that the line was flagged as MODIFIED.

Upgrade to ConcurrentLinkedHashMap v1.4

The latest version shows significant improvements on synthetic benchmarks. Previous versions were optimized around real-world usage only, so real-world performance is likely comparable (e.g. as Cassandra found). This release focused on synthetic workloads, and, depending on the application, may offer a real-world improvement as well.

ZooKeeper may think a node is dead before the node realizes

If ZK decides a node is dead, that information may not reach the whole cluster at once. The server may believe the node has died and start serving its items to other nodes that have noticed the death, while the node itself still believes it is alive and continues serving nodes that also think it's alive.

ConcurrentLinkedHashMap --> Caffeine when Java-8 based

When transitioning to requiring Java 8, please upgrade to Caffeine. You know the details, so I won't bore you with the elevator pitch.

ConcurrentLinkedHashMap changes will continue to be minimal, even more so now, and driven by requests from Java 6 users who cannot upgrade. Caffeine is ideally the upgrade path for Guava cache users too, since the Guava cache cannot be significantly modified due to Android.

Consistency broken when stale reads are allowed after a new PUT

Suppose node A owns items X and Y, and node B shares X but not Y. Then: A modifies X and Y (so an INV for X is sent to B); ownership of Y transfers to node C; B then gets Y (via multicast) from C. B will read Y's new value but still allow stale reads on X.

Solution: a PUT to a line whose owning node was previously unknown must purge all lines in state I.
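A minimal sketch of the proposed purge. The states and maps below are simplified stand-ins for illustration, not Galaxy's actual cache structures:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the fix: when a PUT arrives from a node we have not seen
// before, drop every cached line in state I, so no further stale reads
// can be served from them.
public class StaleReadPurge {
    enum State { I, S, E } // invalid, shared, exclusive (simplified)

    final Map<Long, State> lines = new HashMap<>();
    private final Set<Short> knownNodes = new HashSet<>();

    public void onPut(long lineId, short fromNode) {
        if (knownNodes.add(fromNode)) {
            // First message from this node: it may carry updates we never
            // received INVs for, so purge all I-state lines now.
            lines.values().removeIf(s -> s == State.I);
        }
        lines.put(lineId, State.S); // the PUT line itself becomes shared
    }
}
```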

queue doesn't flush before cluster node is shut down

When I sent 100 messages from 100 parallel actors and then exited, the last messages were not received by the receiver. I know that some of the messages were postponed because the queue was full.
After sleeping for 500 ms before exiting, all the messages were received. I guess that when the node shuts down, the postponed messages are never sent.
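The requested behavior can be sketched as follows; the queue and send path below are stand-ins for Galaxy's comm internals, only meant to show draining postponed messages on shutdown instead of dropping them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: a sender that postpones messages when the channel is busy
// must flush its pending queue before tearing down.
public class FlushingSender {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    final List<String> wire = new ArrayList<>(); // messages actually sent

    public void send(String msg) {
        // In the real comm layer a full channel postpones the message;
        // here we always postpone to keep the sketch small.
        pending.add(msg);
    }

    public void shutdown() {
        // Flush everything that was postponed before closing.
        List<String> rest = new ArrayList<>();
        pending.drainTo(rest);
        wire.addAll(rest);
    }
}
```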

ZooKeeperDistributedTree#create fails sometimes

The method create(String node, boolean ephemeral) is not thread-safe. The pattern

if (exists(path)) {
  return;
}
createNode(path);

doesn't work in a distributed environment. To make this pattern safe, we have to catch and ignore KeeperException.NodeExistsException when creating the node.
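The race-free version of the pattern can be sketched like this. To keep the example self-contained, a ConcurrentHashMap-backed store stands in for ZooKeeper; the real fix would catch org.apache.zookeeper.KeeperException.NodeExistsException:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of create-then-ignore-exists, replacing the racy exists()+create().
public class IdempotentCreate {
    static class NodeExistsException extends Exception {}

    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    // Mimics ZooKeeper's create(): fails if the node already exists.
    void createNode(String path) throws NodeExistsException {
        if (!nodes.add(path))
            throw new NodeExistsException();
    }

    // Thread-safe variant: attempt the create unconditionally and treat
    // "already exists" as success.
    public void ensureCreated(String path) {
        try {
            createNode(path);
        } catch (NodeExistsException ignored) {
            // another thread/process created it first -- that's fine
        }
    }

    public boolean exists(String path) {
        return nodes.contains(path);
    }
}
```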

AssertionError in nodeUpdated

 [peer2] 09:49:31.972 [RemoteCall: RemoteCall{org.gridkit.zerormi.RmiGateway$CounterAgent#remoteCall(Callable)}.4-EventThread]           zookeeper.ClientCnxn [ERROR] {} Error while calling watcher  java.lang.AssertionError
    [peer2]     at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeUpdated(DistributedBranchHelper.java:179)
    [peer2]     at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$200(DistributedBranchHelper.java:37)
    [peer2]     at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildUpdated(DistributedBranchHelper.java:69)
    [peer2]     at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$MyWatcher$1.process(ZooKeeperDistributedTree.java:332)
    [peer2]     at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:61)
    [peer2]     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
    [peer2]     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
    [peer2] 

Server-less configuration not fault-tolerant

The problem is that during item migration (getx-putx), the old owner needs both to transfer ownership to the requesting node (putx) and to notify its slave (inv) that it is no longer the owner of the item. If one of them (slave or peer) does not get the message and the old owner fails, the item may be lost (both peer and slave think they're not the owner) or become conflicted (both think they're the owner).

I suspect that doing this correctly would require a consensus protocol, which might make the whole thing not worthwhile. The only hope is that we could somehow take advantage of the fact that we know the old owner has failed, i.e. that the node-switch event is received by everyone (we have consensus on that, provided by ZooKeeper/JGroups).

exception on ip_port property

19:16:39.270 #11                 co.paralleluniverse.galaxy.core.AbstractCluster [INFO   ] New node added: node-0000000004 
java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1342)
    at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:432)
    at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
    at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
    at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
19:16:39.274 #11    co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree [INFO   ] Adding listener NODE node-0000000004 id: -1 on /co.paralleluniverse.galaxy/nodes/node-0000000004 
19:16:39.768 #11    co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree [SEVERE ] Node /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port getData has failed! 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:254)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:243)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:239)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:231)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:39)
    at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:195)
    at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.nodeChildUpdated(AbstractCluster.java:974)
    at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.<init>(AbstractCluster.java:860)
    at co.paralleluniverse.galaxy.core.AbstractCluster.createNodeInfo(AbstractCluster.java:624)
    at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:433)
    at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
    at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
    at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)

19:16:39.775 #11                 co.paralleluniverse.galaxy.core.AbstractCluster [SEVERE ] Exception while reading control tree value. 
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
    at com.google.common.base.Throwables.propagate(Throwables.java:156)
    at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:201)
    at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.nodeChildUpdated(AbstractCluster.java:974)
    at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.<init>(AbstractCluster.java:860)
    at co.paralleluniverse.galaxy.core.AbstractCluster.createNodeInfo(AbstractCluster.java:624)
    at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:433)
    at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
    at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
    at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
    at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:254)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:243)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:239)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:231)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:39)
    at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:195)
    ... 20 more

19:16:39.780 #11                 co.paralleluniverse.galaxy.core.AbstractCluster [INFO   ] nodes: {node-0000000004=NODE node-0000000004 id: 1 ip_addr: /10.197.82.95 ip_slave_port: 8051, node-0000000000=NODE node-0000000000 id: 0 ip_addr: /10.197.57.247 

ack on send

The ack is sent when the cache receives the message, not after the call to the listener.
If the receiver is not fast enough, there is no way to apply backpressure this way.
The ack should be sent after the listener finishes successfully, and when the listener is asynchronous, we should ack only once the application explicitly acknowledges that message handling has finished.
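The proposed ordering can be sketched as follows; the interface names here are illustrative, not Galaxy's actual API:

```java
// Sketch of ack-after-handling: the ack callback runs only after the
// listener returns successfully, giving the sender real backpressure.
public class AckingReceiver {
    public interface Listener {
        void onMessage(String msg) throws Exception;
    }

    // Invokes the listener first; acks only on success. A failed
    // listener yields no ack, so the sender can retry or back off.
    public static boolean deliver(String msg, Listener l, Runnable ack) {
        try {
            l.onMessage(msg);
            ack.run(); // ack only after successful handling
            return true;
        } catch (Exception e) {
            return false; // no ack sent
        }
    }
}
```

For an asynchronous listener, the ack Runnable would simply be handed to the application to invoke when it finishes, instead of being run inline.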

Question about message size

This is probably a stupid question with a simple answer, but I can't seem to figure out which knob to turn.

How do you send large messages without crashing? I either get a Channel exception: java.io.IOException: Message too long for the example below, or the process just seems to hang for even larger sizes.

Reproducing the bug is simple: using the distributed ping-pong example, add a new field to the Message class with a large enough size:

final byte[] bar = new byte[102400];

I am using JGroups in case that is relevant.

Thanks!

Support node recovery after ZooKeeper SESSION_EXPIRED event

Hey, if one of the registered Galaxy nodes receives a SESSION_EXPIRED event from the ZooKeeper client, all ephemeral info about this node is deleted from the ZooKeeper cluster. Moreover, it breaks the consistency of the Galaxy cluster, because that particular node will think "I am alive" while the other nodes are pretty sure it's dead. This situation can be reproduced quite easily by setting sessionTimeoutMs to a small value like 500 ms. In any case, there should be a valid failure-recovery strategy for ZooKeeper session expiration.

For more on ZooKeeper internals, see https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A3

If you can give any advice on where I should look in order to fix this issue, I may try to do it myself. It seems like recreating all the ephemeral nodes would be enough as the simplest solution.
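The "recreate all ephemerals" idea can be sketched as below. To stay self-contained, a plain map stands in for ZooKeeper; a real implementation would react to KeeperState.Expired, open a new session, and re-issue the creates:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: keep a local copy of every ephemeral node we created, and
// re-register all of them after the session expires.
public class EphemeralRecovery {
    final Map<String, byte[]> registry = new HashMap<>();    // stands in for ZK
    private final Map<String, byte[]> mine = new HashMap<>(); // local copy

    public void createEphemeral(String path, byte[] data) {
        mine.put(path, data);
        registry.put(path, data);
    }

    // Simulates the server-side cleanup on session expiration:
    // all of our ephemeral nodes are deleted.
    public void sessionExpired() {
        registry.keySet().removeAll(mine.keySet());
    }

    // Client-side recovery on observing the Expired event:
    // recreate every ephemeral node under the new session.
    public void recover() {
        registry.putAll(mine);
    }
}
```

Note that real recovery is subtler: between expiration and recovery, other nodes have already seen this node as dead, so Galaxy-level state would also need reconciliation.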

java.lang.IllegalArgumentException when using BerkeleyDB

java.lang.IllegalArgumentException: Data field for DatabaseEntry key cannot be null

com.sleepycat.je.utilint.DatabaseUtil.checkForNullDbt(DatabaseUtil.java:58)
com.sleepycat.je.Cursor.getSearchKeyRange(Cursor.java:1723)
co.paralleluniverse.galaxy.berkeleydb.BerkeleyDB.findAllocation(BerkeleyDB.java:298)
co.paralleluniverse.galaxy.core.MainMemory.handleMessageGet(MainMemory.java:164)
co.paralleluniverse.galaxy.core.MainMemory.receive(MainMemory.java:111)
co.paralleluniverse.galaxy.netty.TcpServerServerComm.receive(TcpServerServerComm.java:114)
co.paralleluniverse.galaxy.netty.AbstractTcpServer$4.messageReceived(AbstractTcpServer.java:137)
co.paralleluniverse.galaxy.netty.ChannelMessageNodeResolver.messageReceived(ChannelMessageNodeResolver.java:36)
co.paralleluniverse.galaxy.netty.OneToOneCodec.handleUpstream(OneToOneCodec.java:63)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

System.exit(0) doesn't exit

Shutdown hooks prevent the exit:

2013-07-17 16:36:17

"Thread-2" - Thread t@11
   java.lang.Thread.State: TIMED_WAITING
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <1b8cfa0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1433)
    at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:103)
    at org.jboss.netty.channel.socket.oio.OioDatagramChannelFactory.releaseExternalResources(OioDatagramChannelFactory.java:107)
    at co.paralleluniverse.galaxy.netty.UDPComm.shutdown(UDPComm.java:367)
    at co.paralleluniverse.common.spring.Component.destroy(Component.java:78)
    at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:211)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:498)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:474)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:442)
    - locked <6e670e34> (a java.util.LinkedHashMap)
    at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1066)
    at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1040)
    at org.springframework.context.support.AbstractApplicationContext$1.run(AbstractApplicationContext.java:958)

   Locked ownable synchronizers:
    - None

"main" - Thread t@1
   java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Native Method)
    - waiting on <65778857> (a org.springframework.context.support.AbstractApplicationContext$1)
    at java.lang.Thread.join(Thread.java:1258)
    at java.lang.Thread.join(Thread.java:1332)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(Shutdown.java:123)
    at java.lang.Shutdown.sequence(Shutdown.java:167)
    at java.lang.Shutdown.exit(Shutdown.java:212)
    - locked <58218a30> (a java.lang.Class)
    at java.lang.Runtime.exit(Runtime.java:107)
    at java.lang.System.exit(System.java:960)
    at co.paralleluniverse.galaxy.example.PeerTKB.run(PeerTKB.java:304)
    at co.paralleluniverse.galaxy.example.Peer2.main(Peer2.java:9)

   Locked ownable synchronizers:
    - None

Add distributed counter service

Simply delegate to the ZooKeeper and JGroups implementations.
Both already have an internal counter used for allocating ref-ids.

See ZooKeeperCluster.java and JGroupsCluster.java
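The service this issue asks for might expose an interface like the following; the interface and in-memory implementation below are only an illustration, while the real implementations would delegate to ZooKeeper (e.g. a Curator atomic counter recipe) or to JGroups' counter service:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a distributed counter service facade. The in-memory
// implementation shows the contract; cluster-backed implementations
// would delegate to ZooKeeper or JGroups.
public class Counters {
    public interface DistributedCounter {
        long increment(); // atomically add 1 and return the new value
        long get();       // read the current value
    }

    public static DistributedCounter inMemory() {
        AtomicLong value = new AtomicLong();
        return new DistributedCounter() {
            public long increment() { return value.incrementAndGet(); }
            public long get()       { return value.get(); }
        };
    }
}
```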

Sent messages are not always received

I have a setup where many (50-1000) messages are sent from one Quasar actor to another.

I receive the following error on the recipient end:

68325 [WARN] UDPComm: Exception caught in channel [id: 0x5c10b3ac, 0.0.0.0/0.0.0.0:7052]:  java.lang.RuntimeException java.io.EOFException
java.lang.RuntimeException: java.io.EOFException
    at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.core.Message.read(Message.java:553) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.netty.MessagePacketCodec.decode(MessagePacketCodec.java:54) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.netty.OneToOneCodec.handleUpstream(OneToOneCodec.java:59) ~[galaxy-1.4.jar:1.4]
    at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) [netty-3.9.2.Final.jar:?]
    at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) [netty-3.9.2.Final.jar:?]
    at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314) [netty-3.9.2.Final.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45-internal]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45-internal]
Caused by: java.io.EOFException
    at co.paralleluniverse.common.io.ByteBufferInputStream.readFully(ByteBufferInputStream.java:94) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.core.Message$MSG.readNoHeader(Message.java:1404) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.core.Message$LineMessage.read1(Message.java:665) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.galaxy.core.Message$1.read(Message.java:471) ~[galaxy-1.4.jar:1.4]
    at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:51) ~[galaxy-1.4.jar:1.4]
    ... 11 more

If I send many messages using the ping pong example, I receive a similar error:

11:37:30.624 �[33m[udpCommReceiveExecutor-3]                  netty.UDPComm [WARN ]�[m {} Exception caught in channel [id: 0x26fa984c, 0.0.0.0/0.0.0.0:7052]:  java.lang.RuntimeException java.io.EOFException java.lang.RuntimeException: java.io.EOFException
    at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
    at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
    at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
    at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
11:37:30.624 �[36m[udpCommReceiveExecutor-3]                  netty.UDPComm [DEBUG]�[m {} Exception caught in channel java.lang.RuntimeException: java.io.EOFException
    at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
    at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
    at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
    at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
11:37:30.625 �[32m[udpCommReceiveExecutor-3]                  netty.UDPComm [INFO ]�[m {} Channel exception: java.lang.RuntimeException java.io.EOFException 
11:37:30.625 �[36m[udpCommReceiveExecutor-3]                  netty.UDPComm [DEBUG]�[m {} Channel exception java.lang.RuntimeException: java.io.EOFException
    at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
    at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
    at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
    at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)

There are also many other logging messages, but I do not know if they are related.

I can try to come up with a short example if necessary.

B+ Tree implementation in Galaxy

Saw the High Scalability blog post on Galaxy and got intrigued. You discuss a B+ tree implementation in the post, and I was wondering whether that implementation actually exists or is just a thought experiment. It would be great to build on top of it and connect it to the Titan graph database.

PS: Sorry for spamming your issue tracker; I could not find a mailing list or forum.

Server slaves

Because we might want more than 2x replication, we shouldn't use Galaxy's backup mechanism, but rely on the underlying database's replication. The slave nodes should do nothing at all, and start serving data once they become master.

Allow more than one peer slave

This will require some consensus algorithm among the slaves, so it might be quite complicated. The problem is that when the master dies, the slaves may not all be in the same state, and once a new master is elected, its slaves may not be in sync with it.

Perhaps a better idea (and who would want 3x hardware anyway?) is a new type of node: nodes that aren't assigned a node id and simply monitor the cluster for failures. Once a master dies and a slave takes over, one of these passive nodes (how is it chosen?) would become a new slave (it would have to replicate all the old data; how much burden would that put on the new master?).
