Comments (8)

kradalby commented on August 16, 2024

t2.medium sounds a bit optimistic. It's unclear whether it's too small for Headscale or for the test clients.

The error mentioned would mean one or more of:

  • The node has gone away and it's not taking the update
  • The node is reconnecting and the update is being sent to the "closed" version
  • The node did not accept the message fast enough

The problem here might be either that the Headscale machine does not have enough resources to maintain all of the connections, or that the VMs running hundreds of clients do not have enough resources to run them all.
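To make the three failure modes above concrete, here is a minimal Go sketch of a sender that gives up on an update when the receiving side does not take it in time. This is not Headscale's code: `sendUpdate`, `errUpdateNotSent`, the string payload, and the timeouts are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errUpdateNotSent is a hypothetical error for this sketch; it only mimics
// the wording of the log line discussed in this thread.
var errUpdateNotSent = errors.New("update not sent, context cancelled")

// sendUpdate tries to hand one update to a node's channel and gives up
// once the timeout expires without anyone receiving it.
func sendUpdate(ch chan<- string, update string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	select {
	case ch <- update:
		// The receiver took the update in time.
		return nil
	case <-ctx.Done():
		// The receiver is gone, is reading from a different channel after
		// a reconnect, or is simply too slow.
		return errUpdateNotSent
	}
}

func main() {
	// A node that is actively reading receives the update.
	ready := make(chan string)
	go func() { <-ready }()
	fmt.Println("ready node:", sendUpdate(ready, "peer changed", time.Second))

	// A node that has gone away (nobody reads) makes the send time out.
	gone := make(chan string)
	fmt.Println("gone node:", sendUpdate(gone, "peer changed", 50*time.Millisecond))
}
```

In a sketch like this the sender cannot tell the three causes apart: a vanished node, a stale channel, and an overloaded receiver all look like the same timeout.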

The machine used in #1656 is significantly larger; it's probably a bit overspecced with the new alpha.
Have you tried the same with 0.22.3 (latest stable)? It is a lot less efficient, so it might struggle more on a t2.medium.

from headscale.

jwischka commented on August 16, 2024

Another important question is whether you are running SQLite or Postgres. If SQLite, try enabling WAL or switching to Postgres. It sounds like it could be a concurrency issue.

dustinblackman commented on August 16, 2024

Using Postgres, I'm experiencing the same issue with alpha 12 in a network of ~30 nodes, with a handful of ephemeral nodes coming in and out through the day. I've seen both regular users on laptops and machines in the cloud connect to Headscale but then fail to reach any other node in the network. Headscale outputs the same errors as stated at the beginning of the issue, though after digging through the new map session logic I'm unsure whether the error and the issue are related. If I were to guess, something is hanging in:

case update, ok := <-m.ch:

I had the problem with a laptop connecting to a remote machine, so I ran tailscale down && tailscale up on the remote machine, which fixed the problem. I'm betting there is an issue with connection recovery in the notifier, either to the node or to the database. I'll dig through logs later in the evening.
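For readers unfamiliar with the pattern, a receive such as `case update, ok := <-m.ch:` blocks for as long as nobody sends on or closes the channel, so a loop built around it needs another `select` branch to notice a stale connection. The sketch below is a generic illustration, not Headscale's map session code: `drainUpdates`, the string payload, and the idle timeout are assumptions made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// drainUpdates handles updates until the channel is closed or no update
// arrives within the idle window. It returns the number of updates handled
// and the reason the loop stopped.
func drainUpdates(ch <-chan string, idle time.Duration) (int, string) {
	handled := 0
	for {
		select {
		case update, ok := <-ch:
			if !ok {
				// The sender closed the channel: a clean shutdown.
				return handled, "channel closed"
			}
			fmt.Println("handling update:", update)
			handled++
		case <-time.After(idle):
			// Without this branch the loop would wait forever on a channel
			// that nobody writes to or closes any more.
			return handled, "idle timeout"
		}
	}
}

func main() {
	// Two queued updates, then the channel is closed: the loop exits cleanly.
	ch := make(chan string, 2)
	ch <- "peer added"
	ch <- "peer removed"
	close(ch)
	fmt.Println(drainUpdates(ch, time.Second))

	// Nobody sends and nobody closes: only the idle branch ends the loop.
	stale := make(chan string)
	fmt.Println(drainUpdates(stale, 50*time.Millisecond))
}
```

Whether the real session loop has an equivalent escape hatch is exactly what would need checking in the logs and source; the sketch only shows why a missing one would look like a hang.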

kradalby commented on August 16, 2024

Did you verify that there was a problem with the connections between nodes, or are you saying that you do not expect any errors?

nadongjun commented on August 16, 2024

> Did you verify that there was a problem with the connections between nodes, or are you saying that you do not expect any errors?

I verified that there are two issues in the latest version:

(1) When 600 users join a single Headscale server, the error "ERR update not sent, context cancelled..." occurs in Headscale.

(2) Some of the joined 600 users are in an offline status when checked with headscale node list.

There are no issues with connections between users who are in an online status.

nadongjun commented on August 16, 2024

I am currently using SQLite (without the WAL option). I will rerun the same tests on a higher-performance instance using Postgres.

kradalby commented on August 16, 2024

Please try with WAL first.

kradalby commented on August 16, 2024

WAL on by default for SQLite is coming in #1985.

I will close this issue as it is more of a performance/scaling thing than a bug. We have a couple of hidden tuning options, which together with WAL might be good content for a "performance" or "scaling" guide in the future.
