cloudflare / tableflip
Graceful process restarts in Go
License: BSD 3-Clause "New" or "Revised" License
Would you mind adding an example for a gRPC server upgrade?
I can catch syscall.SIGINT before I call Upgrade(), but I can't catch syscall.SIGINT after Upgrade() has been called. How can I both restart and shut down gracefully?
func main() {
	upg, _ := tableflip.New(tableflip.Options{})
	defer upg.Stop()
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP, os.Interrupt, syscall.SIGTERM)
		for ch := range sig {
			switch ch {
			case syscall.SIGHUP:
				err := upg.Upgrade()
				if err != nil {
					log.Fatal(err)
				}
			default:
				upg.Interrupt()
			}
		}
	}()
	ln, err := upg.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalln("Can't listen:", err)
	}
	defer func(ln net.Listener) {
		_ = ln.Close()
	}(ln)
	count := 0
	router := gin.Default()
	router.GET("/", func(c *gin.Context) {
		time.Sleep(5 * time.Second)
		count++
		c.String(http.StatusOK, strconv.Itoa(count))
	})
	server := http.Server{
		Handler: router,
	}
	go func() {
		if err := server.Serve(ln); err != http.ErrServerClosed {
			log.Fatal("listen: ", err)
		}
	}()
	// Listen must be called before Ready
	if err := upg.Ready(); err != nil {
		log.Fatal(err)
	}
	<-upg.Exit()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := server.Shutdown(ctx); err != nil {
		log.Fatal("Server forced to shutdown:", err)
	}
	println("going to shutdown")
}
We currently have an issue when switching from localhost:9236 to 0.0.0.0:9236: https://gitlab.com/gitlab-org/gitaly/-/issues/2521
The solution is to enable SO_REUSEPORT on Linux so the upgrading process can bind to the same port if need be.
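A minimal sketch of what enabling SO_REUSEPORT could look like with the standard library alone. The helper name listenReusePort is hypothetical, and the constant 0xf is the Linux value on common architectures such as x86 and arm (an assumption; some architectures use a different number). This is not tableflip's own implementation.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// soReusePort is SO_REUSEPORT on Linux for common architectures.
// Assumption: the program runs on Linux x86 or arm.
const soReusePort = 0xf

// listenReusePort opens a listener with SO_REUSEPORT set before bind,
// so a second socket may bind to the same port.
func listenReusePort(network, addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, soReusePort, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.Listen(context.Background(), network, addr)
}

func main() {
	first, err := listenReusePort("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("first listen failed:", err)
		return
	}
	defer first.Close()

	// Bind a second socket to the exact port the kernel picked above.
	second, err := listenReusePort("tcp", first.Addr().String())
	if err != nil {
		fmt.Println("second listen failed:", err)
		return
	}
	defer second.Close()

	fmt.Println("two listeners share", first.Addr())
}
```

Both sockets must set the option for the second bind to succeed, which is why the helper is used for the first listener as well.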
There are situations in which it is useful to detect the first invocation, e.g. you may want to clean up dangling Unix sockets, but not during an upgrade.
We already have WaitForParent and it can be exploited to get this information, but it looks so hacky...
func isUpgrade(u *tableflip.Upgrader) bool {
	ctx, cancel := context.WithCancel(context.Background())
	// We use a canceled context because WaitForParent returns immediately, without an error, only in the parent process.
	// An already-expired context guarantees an immediate failure inside a child process as well.
	cancel()
	return u.WaitForParent(ctx) != nil
}
This can be easily implemented in Upgrader with 1 line of code + tests.
First of all, I'd like to thank you for this amazing repo.
I think it would be good to mention in readme.md that reloading a systemd service will not update the service's environment variables.
Given a systemd unit file like this:
[Unit]
Description=Service using tableflip
[Service]
EnvironmentFile=/path/to/config-file
ExecStart=/path/to/binary -some-flag /path/to/pid-file
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/path/to/pid-file
After updating the content of config-file and running systemctl reload service, the reloaded service will not get the new or updated environment variables. The service has to read its configuration and environment itself.
I've been looking for some way to do this in Go for a while. I also enjoyed reading the blog post and the links to how other companies have dealt with this problem. Keep up the great work!
Hello,
Do you know if the library would work on Windows, i.e. with GOOS=windows?
I think tableflip is a great tool. I am experimenting with it on a TCP server that handles long-lived TCP connections. When an upgrade is started, the old process has to wait until all existing TCP connections are done before it can exit. Draining the long-lived connections can take a long time. As a result, a further upgrade is blocked until the old process exits. I wonder if there is a way to relax this constraint?
After a deep review of this great library, I have run into a problem with inherited connections. Let me explain with an example.
As far as I can tell, in Go, getting the file descriptor of a net.Conn (or the reverse) involves duplicating the fd, and there are reasons for that. This library has 2 scenarios:
The problem seems to be at fds.go:300, where replacing f.used[key] = file with file.Close() appears to help, but PacketConn seems to have the same problem. For a large number of connections, the duplication of unused file descriptors can be a problem too.
The sample code below listens on 8080, writes to the socket every second, and closes it after 30s. If the connection is fresh, it is closed and the client receives a TCP close, but if it is inherited, the connection hangs.
package main

import (
	"flag"
	"fmt"
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/cloudflare/tableflip"
)

var stop = make(chan bool)
var done = make(chan bool)

func handleConn(conn net.Conn, upg *tableflip.Upgrader) {
	ticker := time.NewTicker(time.Second)
	timer := time.NewTimer(30 * time.Second)
	for {
		select {
		case <-stop:
			log.Printf("Updating...")
			ticker.Stop()
			timer.Stop()
			c := conn.(tableflip.Conn)
			upg.Fds.AddConn("tcp", "0", c)
			conn.Close()
			log.Printf("Done...")
			done <- true
			return
		case t := <-ticker.C:
			log.Printf("Tick: %+v", t)
			conn.SetDeadline(time.Now().Add(time.Second))
			conn.Write([]byte(fmt.Sprintf("It is not a mistake to think you can solve any major problems just with potatoes. [%d]\n", os.Getpid())))
		case t := <-timer.C:
			log.Printf("Closing: %+v", t)
			ticker.Stop()
			timer.Stop()
			conn.Close()
			log.Printf("Closed conn")
			return
		}
	}
}

func main() {
	var (
		listenAddr = flag.String("listen", "localhost:8080", "`Address` to listen on")
		pidFile    = flag.String("pid-file", "", "`Path` to pid file")
	)
	flag.Parse()
	log.SetPrefix(fmt.Sprintf("%d ", os.Getpid()))
	upg, err := tableflip.New(tableflip.Options{
		PIDFile: *pidFile,
	})
	if err != nil {
		panic(err)
	}
	defer upg.Stop()
	// Do an upgrade on SIGHUP
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP)
		for range sig {
			stop <- true
			log.Println("stopping service")
			<-done
			err := upg.Upgrade()
			if err != nil {
				log.Println("upgrade failed:", err)
			}
		}
	}()
	conn, err := upg.Fds.Conn("tcp", "0")
	if err != nil {
		log.Fatalln("Can't get conn:", err)
	}
	if conn != nil {
		log.Printf("Inherited conn: %+v", conn.RemoteAddr())
		go handleConn(conn, upg)
	}
	ln, err := upg.Fds.Listen("tcp", *listenAddr)
	if err != nil {
		log.Fatalln("Can't listen:", err)
	}
	go func() {
		defer ln.Close()
		log.Printf("listening on %s", ln.Addr())
		for {
			c, err := ln.Accept()
			if err != nil {
				log.Printf("Error on Accept: %+v", err)
				return
			}
			go handleConn(c, upg)
		}
	}()
	log.Printf("ready")
	if err := upg.Ready(); err != nil {
		panic(err)
	}
	<-upg.Exit()
	log.Printf("exiting, done, :)")
}
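The hang described above is consistent with how duplicated descriptors behave in general: a TCP socket is only shut down once the last descriptor referring to it is closed. The standalone sketch below shows that effect with the standard library only, independent of tableflip; the function name peerSeesClose and the 300ms deadline are choices made for the demonstration.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// peerSeesClose accepts one loopback connection, duplicates its
// descriptor with File(), closes the net.Conn, and reports whether the
// client observes EOF. If closeDup is false the duplicate stays open
// while the client reads.
func peerSeesClose(closeDup bool) (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer client.Close()

	server, err := ln.Accept()
	if err != nil {
		return false, err
	}

	// File returns a duplicate of the socket's descriptor.
	dup, err := server.(*net.TCPConn).File()
	if err != nil {
		server.Close()
		return false, err
	}
	defer dup.Close() // a second Close on the same file is harmless

	server.Close()
	if closeDup {
		dup.Close()
	}

	// If the socket is still open somewhere, the read times out
	// instead of returning io.EOF.
	client.SetReadDeadline(time.Now().Add(300 * time.Millisecond))
	_, err = client.Read(make([]byte, 1))
	return err == io.EOF, nil
}

func main() {
	for _, closeDup := range []bool{false, true} {
		sawEOF, err := peerSeesClose(closeDup)
		if err != nil {
			fmt.Println("setup failed:", err)
			return
		}
		fmt.Printf("duplicate closed: %v, client saw EOF: %v\n", closeDup, sawEOF)
	}
}
```

Whether this is exactly what happens inside the library at the line mentioned in the report is not verified here; the sketch only demonstrates the underlying descriptor semantics.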
I think the key to using tableflip is to replace net.Listen with upg.Fds.Listen, so that on Upgrade the listening socket is inherited by the child.
But I get errors like the one below on Upgrade after the application has been running for quite a long time in a production environment.
{"level":"error","msg":"ListenAndServe err can't create new listener: listen tcp 0.0.0.0:8902: bind: address already in use","time":"2019-05-31T23:05:14+08:00"}
It seems that the parent has opened a listener on port 8902 but the child doesn't inherit that listener.
What could be the reason?
The Upgrade function is documented as:
Upgrade triggers an upgrade.
It also waits for a response and returns an error if the child process fails. That might be worth spelling out in the description, since the verb 'trigger' gives the impression of a function call that completes immediately.
First off: I'm loving tableflip, I use it to livereload a dev server for my homegrown static site generator server https://github.com/jschaf/b2/blob/master/cmd/server/server.go#L121.
This is more of a question than an issue. If this isn't a good spot for it, I'm fine with closing it.
After a successful tableflip.Upgrade, I'd like the new process to be the foreground process in a terminal or shell. The reason is that I want to forward SIGINT via ctrl-c in the terminal. Using systemd is a bit heavyweight for my simple local development use case.
go run ./cmd/server
.systemd --user
in my case). In step 3, the new server should keep running in the foreground and continue writing the stdout and stderr of the new process.
I'm not quite sure how to go about this. Would something like the following work?
tableflip.Upgrade
Alternately, maybe get the new PID from the PID file and do some exec magic?
Several people have asked how to integrate with http://supervisord.org/. It would be nice to have an example for this.
This is a bit of a stretch, but is it possible to somehow refresh the environment in between SIGHUPs? It seems like you'd need to pass a config file in, but I'm wondering if there's some magic to potentially refresh os.Environ() in between reloads.
For context, I'm looking to set up something like Heroku, where heroku config:set key value restarts the app with a new environment, but it doesn't seem like there's any way to do that without some special programming in the binary itself (e.g. load from this file).
I'm running a web process on FreeBSD. When I use curl to connect over HTTP/2 and send SIGHUP to the process, it shuts down and drops any connections without waiting for them to complete.
If I do the same but use the curl flag '--http1.1', the running connections complete before shutdown is called.
Any idea why HTTP/2 connections are not waited for, while HTTP/1.1 connections are?
Thank you,
Hello, is HTTPS not supported?
go func(upg *tableflip.Upgrader) {
	for {
		select {
		case <-upg.Exit():
			fmt.Println("Exit111111111111111111111111111")
			break
		}
	}
}(upg)
In this case, is upg.Exit() not triggered?
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/cloudflare/tableflip"
)

// Version of the current program
const version = "v0.0.1"

func main() {
	upg, err := tableflip.New(tableflip.Options{})
	if err != nil {
		panic(err)
	}
	defer upg.Stop()
	// For demonstration, force a 1s delay at startup and include the process PID in the log prefix
	time.Sleep(time.Second)
	log.SetPrefix(fmt.Sprintf("[PID: %d] ", os.Getpid()))
	// Listen for SIGHUP and use it to trigger a process restart
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP)
		for range sig {
			// The core Upgrade call
			err := upg.Upgrade()
			if err != nil {
				log.Println("Upgrade failed:", err)
			}
		}
	}()
	// Note: the port must be listened on via upg.Listen
	ln, err := upg.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalln("Can't listen:", err)
	}
	// Create a simple HTTP server; /version returns the current program version
	mux := http.NewServeMux()
	mux.HandleFunc("/version", func(rw http.ResponseWriter, r *http.Request) {
		log.Println(version)
		rw.Write([]byte(version + "\n"))
	})
	server := http.Server{
		Handler: mux,
	}
	// Start the HTTP server as usual
	go func() {
		err := server.Serve(ln)
		if err != http.ErrServerClosed {
			log.Println("HTTP server:", err)
		}
	}()
	if err := upg.Ready(); err != nil {
		panic(err)
	}
	go func(upg *tableflip.Upgrader) {
		for {
			select {
			case <-upg.Exit():
				fmt.Println("Exit111111111111111111111111111")
				break
			}
		}
	}(upg)
	time.Sleep(10 * time.Hour)
	//<-upg.Exit()
}
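Independently of whether Exit fires in the program above, one detail of the goroutine is worth checking: in Go, a plain break inside select leaves only the select, not the surrounding for loop. The standalone sketch below illustrates the difference with an ordinary closed channel standing in for the exit channel; the function names and the limit parameter are hypothetical and exist only to keep the demonstration finite.

```go
package main

import "fmt"

// loopsWithPlainBreak counts how often the case body runs when a plain
// break is used. The break leaves only the select, so the for loop
// continues until limit is reached.
func loopsWithPlainBreak(exit <-chan struct{}, limit int) int {
	n := 0
	for n < limit {
		select {
		case <-exit:
			n++
			break // leaves the select, not the for loop
		}
	}
	return n
}

// loopsWithLabeledBreak uses a label so the break leaves the for loop
// after the first receive.
func loopsWithLabeledBreak(exit <-chan struct{}, limit int) int {
	n := 0
loop:
	for n < limit {
		select {
		case <-exit:
			n++
			break loop
		}
	}
	return n
}

func main() {
	exit := make(chan struct{})
	close(exit) // a closed channel is always ready to receive from

	fmt.Println(loopsWithPlainBreak(exit, 5), loopsWithLabeledBreak(exit, 5)) // prints "5 1"
}
```

With a closed channel and no limit, the plain-break version would spin forever, so a labeled break or a return is usually what is intended.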
Hello everyone!
Is there a way to share all the active connections between processes?
I see only one way, which is to use the Fds.Conn() method, but for that I must know the addr in the parent process.
Would it be worth adding methods that return all parent connections and listeners?
It would be amazing.
I have not used Go much, but I found that the tests are failing for Go 1.8 and Go 1.10.x.
go 1.8: https://travis-ci.com/GabLeRoux/tableflip/jobs/153029874
Using Go 1.5 Vendoring, not checking for Godeps
4.57s$ go get -t -v ./...
github.com/pkg/errors (download)
github.com/cloudflare/tableflip (download)
github.com/pkg/errors
github.com/GabLeRoux/tableflip
# github.com/GabLeRoux/tableflip
./fds.go:16: undefined: syscall.Conn
The command "eval go get -t -v ./... " failed. Retrying, 2 of 3.
github.com/GabLeRoux/tableflip
# github.com/GabLeRoux/tableflip
./fds.go:16: undefined: syscall.Conn
The command "eval go get -t -v ./... " failed. Retrying, 3 of 3.
github.com/GabLeRoux/tableflip
# github.com/GabLeRoux/tableflip
./fds.go:16: undefined: syscall.Conn
The command "eval go get -t -v ./... " failed 3 times.
The command "go get -t -v ./..." failed and exited with 2 during .
go 1.10.x: https://travis-ci.com/GabLeRoux/tableflip/jobs/153029876
--- FAIL: TestFdsListen (0.00s)
fds_test.go:22: can't create new listener: listen unixgram : unknown network unixgram
I found this as part of #9 ✌️ I think the README should note that this is only compatible with a few Go versions.
Can you give an example for a Unix socket?
Hi there, I'm wondering if this library can work alongside systemd's socket activation?
What I'm trying to do is basically use systemd's root access to listen on port 80 and pass that file descriptor into my Go application so my Go app doesn't have to use sudo.
Is this possible with tableflip? Does tableflip even make sense in this context? My thinking is that tableflip would still be useful for doing the upgrade attempts.
Any feedback here would be greatly appreciated. Thanks!
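For reference, the receiving side of systemd socket activation can be sketched with the standard library: systemd sets LISTEN_PID and LISTEN_FDS and passes the sockets starting at descriptor 3. The function names below are hypothetical. Whether and how the resulting listeners should be registered with tableflip (for example through its AddListener method) is an untested assumption and is left open here.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// firstActivationFD is the first descriptor systemd passes (SD_LISTEN_FDS_START).
const firstActivationFD = 3

// activationFDs returns how many sockets were passed to the process with
// the given pid, based on LISTEN_PID and LISTEN_FDS. It returns 0 when
// the variables are missing, malformed, or meant for another process.
func activationFDs(pid int, getenv func(string) string) int {
	listenPID, err := strconv.Atoi(getenv("LISTEN_PID"))
	if err != nil || listenPID != pid {
		return 0
	}
	n, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || n < 0 {
		return 0
	}
	return n
}

// activationListeners wraps each passed descriptor in a net.Listener.
func activationListeners() ([]net.Listener, error) {
	n := activationFDs(os.Getpid(), os.Getenv)
	listeners := make([]net.Listener, 0, n)
	for i := 0; i < n; i++ {
		f := os.NewFile(uintptr(firstActivationFD+i), "systemd-socket-"+strconv.Itoa(i))
		ln, err := net.FileListener(f)
		f.Close() // FileListener works on its own copy of the descriptor
		if err != nil {
			for _, l := range listeners {
				l.Close()
			}
			return nil, err
		}
		listeners = append(listeners, ln)
	}
	return listeners, nil
}

func main() {
	listeners, err := activationListeners()
	if err != nil {
		fmt.Println("socket activation failed:", err)
		return
	}
	fmt.Println("sockets passed by systemd:", len(listeners))
}
```

Note that LISTEN_PID names the process systemd started, so a process spawned later by an upgrade would not match it and would have to receive the socket some other way.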
supervisord requires non-daemonized processes. Is there a way to make this work?
Tableflip solves the graceful process restart problem. However, how does it integrate with systemd and supervisor?
Ubuntu Xenial (16.04) is the OS of choice, and the command run is service <name> restart. systemd stops and starts the process on restart.
[Service]
User=www-data
Group=www-data
Type=simple
Restart=on-failure
RestartSec=5s
RuntimeDirectory=MA
ExecStart=/path/to/executable serve --port=4001 --pid-file=/var/run/MA/foo.pid
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/MA/foo.pid
After some trial and error, the only way to do a seamless upgrade is to also add ExecReload=/bin/kill -HUP $MAINPID and then issue service <name> reload.
Hi,
In an application I use cloudflare/tableflip lib.
I've got an issue when I try to build for windows/386:
# github.com/cloudflare/tableflip
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/env.go:13:2: cannot use syscall.CloseOnExec (type func(syscall.Handle)) as type func(int) in field value
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/fds.go:344:36: not enough arguments in call to syscall.Syscall
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/fds.go:344:37: undefined: syscall.SYS_FCNTL
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/fds.go:344:60: undefined: syscall.F_DUPFD_CLOEXEC
Thanks
https://github.com/jpillora/overseer has similar features. I'm wondering if anyone has tried both.
How can I use it for an rpcx restart?
There is an unfortunate interaction between tableflip upgrades and restarting journald, see systemd/systemd#13708
The workaround is to log to /run/systemd/journal/socket using SOCK_DGRAM. We should document this.
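A minimal sketch of that workaround, assuming journald's native socket lives at its usual location /run/systemd/journal/socket and that messages are single-line. The function names are hypothetical. Multi-line values need the protocol's length-prefixed binary form, which this sketch avoids by flattening newlines.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// journalSocket is the usual path of journald's native datagram socket
// (an assumption about the host's systemd setup).
const journalSocket = "/run/systemd/journal/socket"

// journalEntry encodes one message as newline-terminated KEY=VALUE
// fields. Newlines inside the message are replaced with spaces.
func journalEntry(priority int, message string) []byte {
	message = strings.ReplaceAll(message, "\n", " ")
	return []byte(fmt.Sprintf("PRIORITY=%d\nMESSAGE=%s\n", priority, message))
}

// sendToJournal opens a datagram socket for each message, so nothing is
// held across a journald restart.
func sendToJournal(priority int, message string) error {
	conn, err := net.Dial("unixgram", journalSocket)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write(journalEntry(priority, message))
	return err
}

func main() {
	// Priority 6 is "info" in syslog terms.
	if err := sendToJournal(6, "upgrade finished"); err != nil {
		fmt.Println("could not reach journald:", err)
	}
}
```

Dialing per message is the simplest choice for a sketch; a long-running service might instead keep one socket and redial on write errors.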