Comments (8)

johscheuer avatar johscheuer commented on June 26, 2024

I did a bit more debugging. It seems the error code is not returned by the create database method, and instead of a nil/null pointer an empty struct is returned. I logged the value of outdb before and after fdb_create_database was called:

Before fdb_create_database outdb: <nil>
After fdb_create_database outdb: &{{{}}} error 0
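
For readers unfamiliar with the two log lines above: `<nil>` and `&{{{}}}` are how Go's fmt package renders a nil pointer and a non-nil pointer to a struct with no printable content. The sketch below reproduces both strings with stand-in types. The names `opaqueDB`, `wrapper` and `marker` are hypothetical and are not the cgo-generated types. It only illustrates that a non-nil pointer by itself says nothing about whether the handle behind it is usable.

```go
package main

import "fmt"

// Hypothetical stand-ins for an opaque handle type; not the real cgo types.
type marker struct{}
type wrapper struct{ m marker }
type opaqueDB struct{ w wrapper }

// describe formats the pointer the same way a "%v" log statement would.
func describe(db *opaqueDB) string {
	return fmt.Sprintf("%v", db)
}

func main() {
	var before *opaqueDB // nil pointer, as before the create call
	after := &opaqueDB{} // non-nil pointer to an empty struct

	fmt.Println("Before:", describe(before))
	fmt.Println("After: ", describe(after))
}
```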

I have to make some changes to the code next week, as I currently don't have any trace events: the code throws the error before creating the trace file here: https://github.com/apple/foundationdb/blob/release-7.1/fdbclient/NativeAPI.actor.cpp#L2213. It seems the error is swallowed here: https://github.com/apple/foundationdb/blob/release-7.1/fdbclient/ThreadSafeTransaction.cpp#L145-L164

from foundationdb.

gm42 avatar gm42 commented on June 26, 2024

If you observed an issue and used DNS entries in the cluster file you might have hit: apple/foundationdb#11222. Especially in the case when the coordinators are not reachable.

Commenting in reply from FoundationDB/fdb-kubernetes-operator#1949 (comment)
The issue also happened without the DNS names feature, and even when using a single-process in-memory FDB. It is fairly reproducible on my end. I mitigated it by using a wait group to track all the C.fdb_run_network() goroutines created and then, in the Go finalizer for the client connection, waiting on that wait group after calling C.fdb_stop_network(). I do not rule out that better or more elegant solutions can be found.

johscheuer avatar johscheuer commented on June 26, 2024

Would you mind sharing the code? From what I can see, fdb_stop_network is not actively called in the FDB Go bindings.

gm42 avatar gm42 commented on June 26, 2024

Sure! The function fdb_stop_network is indeed not actively called from the Go binding, but perhaps a destructor in the C binding calls it? I cannot explain this. Since the issue was happening only in test suites (which open many FDB client connections), what worked was the following diff, plus calling fdb.StopNetwork() in the TestMain of each test package in order to guarantee that these calls are serialized:

--- a/foundationdb/bindings/go/src/fdb/fdb.go
+++ b/foundationdb/bindings/go/src/fdb/fdb.go
@@ -196,6 +196,7 @@ var networkStarted bool
 var networkMutex sync.Mutex
 
 var openDatabases map[string]Database
+var runningNetwork sync.WaitGroup
 
 func init() {
        openDatabases = make(map[string]Database)
@@ -205,9 +206,10 @@ func startNetwork() error {
        if e := C.fdb_setup_network(); e != 0 {
                return Error{int(e)}
        }
-
+       runningNetwork.Add(1)
        go func() {
                e := C.fdb_run_network()
+               runningNetwork.Done()
                if e != 0 {
                        log.Printf("Unhandled error in FoundationDB network thread: %v (%v)\n", C.GoString(C.fdb_get_error(e)), e)
                }
@@ -232,6 +234,25 @@ func StartNetwork() error {
        return startNetwork()
 }
 
+// StopNetwork stops the FoundationDB client global network thread and waits for the related goroutine to exit.
+// After stopping the network the networkStarted flag is left as true, so that network cannot be started again.
+// StopNetwork may be called more than once.
+// See also: https://github.com/apple/foundationdb/issues/3015
+func StopNetwork() {
+       waitForNetwork := false
+       {
+               networkMutex.Lock()
+               if networkStarted {
+                       C.fdb_stop_network()
+                       waitForNetwork = true
+               }
+               networkMutex.Unlock() // explicit unlock: a defer here would hold the mutex until StopNetwork returns
+       }
+       if waitForNetwork {
+               runningNetwork.Wait()
+       }
+}
+
 // DefaultClusterFile should be passed to fdb.Open to allow the FoundationDB C
 // library to select the platform-appropriate default cluster file on the current machine.
 const DefaultClusterFile string = ""

Related issue: #3015
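
As a rough illustration of the TestMain usage described above, the sketch below runs a suite and then calls a stop hook before the exit code is handed back. The helper name `runThenStop` is hypothetical, and `fdb.StopNetwork` exists only with the patch shown in this comment, not in the upstream binding as far as this thread indicates. The ordering matters because os.Exit does not run deferred functions, so the stop hook has to be called explicitly before exiting.

```go
package main

import "fmt"

// runThenStop runs the suite, then calls stop exactly once, and returns the
// suite's exit code. It is a hypothetical helper, not part of any library.
func runThenStop(run func() int, stop func()) int {
	code := run()
	stop() // must happen before os.Exit, which skips deferred calls
	return code
}

// In a _test.go file this could be wired up roughly as follows (assumes the
// patched binding that exposes fdb.StopNetwork):
//
//	func TestMain(m *testing.M) {
//		os.Exit(runThenStop(m.Run, fdb.StopNetwork))
//	}

func main() {
	stopped := 0
	code := runThenStop(
		func() int { return 0 }, // stand-in for m.Run
		func() { stopped++ },    // stand-in for fdb.StopNetwork
	)
	fmt.Println("exit code:", code, "stop calls:", stopped)
}
```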

gm42 avatar gm42 commented on June 26, 2024

Also related: FoundationDB/fdb-kubernetes-operator#1950

NOTE: the simplification I did here does not solve the issue, but finding a way to serialize fdb_run_network()/fdb_stop_network() will solve it.
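
To make the serialization idea concrete, here is a minimal pure-Go sketch with no cgo. The `network` type and its `Start`/`Stop` methods are hypothetical stand-ins: a channel receive plays the role of the blocking C.fdb_run_network() call and closing the channel plays the role of C.fdb_stop_network(). It only shows one way to order the two calls and wait for the run goroutine to exit. It has not been checked against the FoundationDB client and is not claimed to resolve the issue discussed here.

```go
package main

import (
	"fmt"
	"sync"
)

// network is a pure-Go stand-in for the client's run/stop pair.
type network struct {
	mu      sync.Mutex
	started bool
	stopped bool
	stopCh  chan struct{}
	done    sync.WaitGroup
}

func newNetwork() *network {
	return &network{stopCh: make(chan struct{})}
}

// Start launches the run loop at most once and reports whether it did so.
func (n *network) Start() bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.started {
		return false
	}
	n.started = true
	n.done.Add(1)
	go func() {
		defer n.done.Done()
		<-n.stopCh // stand-in for the blocking run call
	}()
	return true
}

// Stop signals the run loop once and waits for it to exit.
// It is safe to call repeatedly, and a no-op if Start was never called.
func (n *network) Stop() {
	n.mu.Lock()
	wait := n.started
	if n.started && !n.stopped {
		n.stopped = true
		close(n.stopCh) // stand-in for the stop call
	}
	n.mu.Unlock() // released before waiting so other callers are not blocked
	if wait {
		n.done.Wait()
	}
}

func main() {
	n := newNetwork()
	fmt.Println("started:", n.Start())
	n.Stop()
	n.Stop() // second call returns immediately
	fmt.Println("restart allowed:", n.Start())
}
```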
