
Comments (6)

tarlano avatar tarlano commented on May 21, 2024

This SPOF behavior only occurs if I do not enable caching as an argument to my volume mount command.

Thus, the trade-off seems to be: with caching enabled, HA but slower propagation of file changes; without caching, a SPOF but near-synchronous propagation of file changes.

from infinit.

Dimrok avatar Dimrok commented on May 21, 2024

Hi @tarlano !

Sorry it took me so long to reply. I just asked @mefyl about the consensus, he's the expert.

However, I'm concerned about the last error (the handshake). When you say a 3-device cluster, how did you bootstrap each node? Did you use import / export or pull / fetch? An id collision is virtually impossible unless you manually copied your infinit home.


mefyl avatar mefyl commented on May 21, 2024

Hi @tarlano. Here's my suspicion from what I see: your replication factor might be 1, in which case every block has only one copy, and shutting down the node that holds the root block will indeed make the network unusable. Can you please check with infinit network export $NAME, looking for the replication-factor key?

$ infinit network export infinit/company | jq '.consensus."replication-factor"'
3

Regarding the fact that it works with caching, I suppose you're starting the volume, it works, then you shut down the node with the root block, and it keeps working? That would be normal, since the other nodes cached the root block, so the volume will stay up until the cache is invalidated. If you start both other nodes without ever giving them access to the third one with the root block, I predict it would not work.

But those are just guesses, let's first check that replication factor :)
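To make the replication-factor check concrete, here is a minimal Python sketch. It reads the consensus section of an exported network description and estimates how many replica holders of a block can be down while a majority is still reachable. The majority-quorum rule is an assumption based on general Paxos behavior, not something confirmed for infinit in this thread, and the function name tolerated_failures is hypothetical.

```python
import json


def tolerated_failures(network_json: str) -> int:
    """Estimate how many replica holders may be down per block.

    Assumes a plain majority quorum (general Paxos behavior); infinit's
    exact rules are not confirmed in this thread.
    """
    consensus = json.loads(network_json)["consensus"]
    factor = int(consensus["replication-factor"])
    quorum = factor // 2 + 1  # smallest majority of the replicas
    return factor - quorum


# Consensus fields as they appear in an `infinit network export` dump.
exported = '{"consensus": {"replication-factor": 3, "type": "paxos"}}'
print(tolerated_failures(exported))
```

Under that assumption, a replication factor of 1 tolerates no failures at all, which matches the suspicion above: the single node holding the root block becomes a SPOF.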


tarlano avatar tarlano commented on May 21, 2024

Hi @Dimrok and @mefyl ,

Sorry for the late reply. I am not copying over infinit home and I am also not using the hub.

I currently have a 17-node cluster using kouncil and a caching setup.

The current caching setup is working with a replication factor of 3. Since I would like the changes to be more synchronous, I will configure another cluster to use kelips and no caching later today, to get the information you asked for.

Here are the details of my current caching cluster.

I am deploying four exported files via a deployment RPM. The exported files are for the user, silo, network, and volume respectively. Then I import the four files with a post install bash script in the RPM.

All devices are client/server devices. And all devices have the same user that is an admin with read+write to get around any passport issues.

#!/bin/bash

# export INFINIT_HOME and start from a clean slate
#

export INFINIT_HOME="/usr/share/infinit"
# quote the variable so the path is never word-split or glob-expanded
rm -rf "$INFINIT_HOME"
mkdir -p "$INFINIT_HOME"

# configure fuse
#

/bin/cp -f /usr/share/fuse/fuse.conf /etc/fuse.conf

# create the default infinit user
#

infinit user import --input /etc/infinit.d/infinit.user

# create the default 40GB infinit storage silo
#

infinit silo import --input /etc/infinit.d/infinit.silo

# create the default infinit network
#

infinit network import --as infinit-user --input /etc/infinit.d/infinit.network
infinit network link --as infinit-user --name infinit-network --storage infinit-silo

# create the default infinit volume
#

infinit volume import --as infinit-user --input /etc/infinit.d/infinit.volume

Here is the infinit.silo that I import on each device.

cat infinit.silo 
{"capacity":40000000000,"name":"infinit-silo","path":"/usr/share/infinit/.local/share/infinit/filesystem/blocks/infinit-silo","type":"filesystem"}

Here is the infinit.volume that I import on each device.

cat infinit.volume 
{"block_size":1024,"mount_options":{"fuse_options":["allow_other"]},"name":"infinit-user/infinit-volume","network":"infinit-user/infinit-network"}

After the deployment RPM is installed, I start the node with the following config:

environment=INFINIT_HOME="/usr/share/infinit",INFINIT_RDV=""
command=/opt/infinit/bin/infinit volume mount --as infinit-user --name infinit-volume --mountpoint /opt/ecs --allow-root-creation --port 13000 --cache --endpoints-file /etc/infinit.d/templates/endpoints --peer /etc/infinit.d/templates/peeraddresses.conf --port-file /etc/registrar/listenport

Note that the peer file (peeraddresses.conf) passed to the --peer argument is generated using Consul Template and contains the IP addresses of all the other devices/peers in the network cluster. Here is an example with my IPs obfuscated.

cat /etc/infinit.d/templates/peeraddresses.conf

xxx.xxx.118.72:13000
xxx.xxx.118.73:13000
xxx.xxx.118.85:13000
xxx.xxx.118.86:13000
xxx.xxx.118.87:13000
xxx.xxx.117.220:13000
xxx.xxx.117.222:13000
xxx.xxx.117.223:13000
xxx.xxx.68.223:13000
xxx.xxx.68.224:13000
xxx.xxx.68.225:13000
xxx.xxx.68.226:13000
xxx.xxx.68.217:13000
xxx.xxx.68.218:13000
xxx.xxx.68.219:13000
xxx.xxx.68.220:13000
xxx.xxx.68.221:13000

The endpoints file (endpoints) contains just the local device's endpoints.

Here is the output of the network export from one of the running devices. As you can see, the replication-factor is 3. I have also obfuscated the keys.
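Since the peer file and the endpoints file are generated separately, one sanity check is to confirm that no local endpoint ends up in the peer list. The Python sketch below is only an illustration: it assumes both files hold one host:port entry per line (true of the peer file shown above, unconfirmed for the endpoints file), the function name remote_peers is hypothetical, and the addresses are documentation-range placeholders rather than the real ones.

```python
def remote_peers(peer_lines, local_endpoints):
    """Return peer entries that are not also local endpoints.

    Both inputs are iterables of "host:port" strings (assumed format);
    blank lines and surrounding whitespace are ignored.
    """
    local = {entry.strip() for entry in local_endpoints if entry.strip()}
    peers = []
    for line in peer_lines:
        entry = line.strip()
        if entry and entry not in local:
            peers.append(entry)
    return peers


# Placeholder addresses from the 192.0.2.0/24 documentation range.
peer_file = ["192.0.2.10:13000", "", "192.0.2.11:13000  ", "192.0.2.12:13000"]
endpoints_file = ["192.0.2.11:13000"]
print(remote_peers(peer_file, endpoints_file))
```

If the filtered list is shorter than the original, the node was being told to dial itself.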

infinit network export --as infinit-user --name infinit-network | python -m json.tool
{
    "admin_keys": {
        "group_r": [],
        "group_w": [],
        "r": [],
        "w": [
            {
                "rsa": "MII..<SNIP>"
            }
        ]
    },
    "consensus": {
        "eviction-delay": "10min",
        "replication-factor": 3,
        "type": "paxos"
    },
    "encrypt_options": {
        "encrypt_at_rest": true,
        "encrypt_rpc": true,
        "validate_signatures": true
    },
    "name": "infinit-user/infinit-network",
    "overlay": {
        "eviction_delay": "200min",
        "rpc_protocol": "tcp",
        "type": "kouncil"
    },
    "owner": {
        "rsa": "MII....<SNIP>"
    },
    "peers": [],
    "version": "0.8.0"
}

Thanks for your help!
Tony


tarlano avatar tarlano commented on May 21, 2024

@Dimrok and @mefyl ,

I want to update you both.

I didn't have the chance to switch the configuration, but even with the caching config I wrote about previously, I am still seeing the following error message in the error log for the infinit volume mount .... command detailed in my last reply.

[infinit.model.doughnut.Dock ] [dht::Dock::Connection(0x44a4700, 0x5c07897400 -> 0x45ad740200): tcp://172.17.0.1:13000] key exchange failed with 0x45ad74027ea7fdc02749965925f7685585fc152996d5000598e46437f3faca00: Handshake failed: incoming peer has same id as us: 0x5c0789748d5e58bd39aa288ea159a41b925a0bdc55d3d5ec5b129d957ac0b500

The interesting thing about 172.17.0.1 is that it's the bridge IP for the docker bridge interface. I will try to exclude this interface to see if that has any effect.
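To see how often this happens, the error log can be scanned for endpoints on the Docker bridge. The Python sketch below pulls tcp:// endpoints out of log text and keeps those inside 172.17.0.0/16, Docker's default bridge subnet. That subnet is an assumption about this host (a customized Docker network would use a different range), the function name bridge_endpoints is hypothetical, and the sample line is an abbreviated copy of the error above.

```python
import ipaddress
import re

# Docker's default bridge subnet; adjust if the daemon uses a custom range.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")
ENDPOINT = re.compile(r"tcp://(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def bridge_endpoints(log_text):
    """Return (host, port) pairs from the log that sit on the bridge subnet."""
    hits = []
    for host, port in ENDPOINT.findall(log_text):
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            continue  # looks like an IP but is not a valid one
        if address in DOCKER_DEFAULT_BRIDGE:
            hits.append((host, int(port)))
    return hits


sample = (
    "[infinit.model.doughnut.Dock ] tcp://172.17.0.1:13000] key exchange "
    "failed: Handshake failed: incoming peer has same id as us"
)
print(bridge_endpoints(sample))
```

Any hit suggests that a node is reaching a peer, possibly itself, through the bridge interface rather than through its cluster address.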

Have you seen anything along these lines in the past?

Tony


tarlano avatar tarlano commented on May 21, 2024

@Dimrok and @mefyl,

Can you also provide some insight into whether the use of my general infinit-user could be the issue?

As I said, reading and writing files across the filesystem on the network is completely functional, but maybe the issue is in the DHT/network layer?

Tony

