segfault about asprom [CLOSED]

Xaelias commented on July 24, 2024
segfault

from asprom.

Comments (8)

JustinVenus commented on July 24, 2024

I think this will fix it, @Xaelias. I'm testing it right now.

```
diff --git a/namespaces.go b/namespaces.go
index f210f11..0683768 100644
--- a/namespaces.go
+++ b/namespaces.go
@@ -229,12 +229,14 @@ func (nc nsCollector) collect(conn *as.Connection, ch chan<- prometheus.Metric)
 		log.Print(err)
 		return
 	}
-	for _, ns := range strings.Split(info["namespaces"], ";") {
-		nsinfo, err := as.RequestInfo(conn, "namespace/"+ns)
-		if err != nil {
-			log.Print(err)
-			continue
-		}
-		infoCollect(ch, cmetrics(nc), nsinfo["namespace/"+ns], ns)
+	if info != nil {
+		for _, ns := range strings.Split(info["namespaces"], ";") {
+			nsinfo, err := as.RequestInfo(conn, "namespace/"+ns)
+			if err != nil {
+				log.Print(err)
+				continue
+			}
+			infoCollect(ch, cmetrics(nc), nsinfo["namespace/"+ns], ns)
+		}
 	}
 }
```
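Two Go language facts are useful when reading this patch: indexing a nil map does not panic but returns the zero value (here `""`), and `strings.Split("", ";")` returns a one-element slice `[""]`, not an empty slice. So without the `info != nil` guard the loop would still run once with an empty namespace name, and with the guard that iteration is skipped. The sketch below only demonstrates those two behaviors; `namespacesFrom` is a hypothetical helper, and the sketch does not reproduce asprom's collector or the actual cause of the segfault.

```go
package main

import (
	"fmt"
	"strings"
)

// namespacesFrom (hypothetical) mirrors the loop header in the patch:
// it splits the "namespaces" entry of an info map on ";".
func namespacesFrom(info map[string]string) []string {
	return strings.Split(info["namespaces"], ";")
}

func main() {
	var info map[string]string // nil map, as if no info data came back

	// Indexing a nil map does not panic in Go; it yields the zero value "".
	fmt.Printf("%q\n", info["namespaces"])

	// strings.Split on "" returns a one-element slice containing "",
	// so an unguarded loop would still run once with ns == "".
	ns := namespacesFrom(info)
	fmt.Println(len(ns), fmt.Sprintf("%q", ns))
}
```

In other words, the guard changes which requests are sent, but reading from a nil map by itself would not crash.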

alicebob commented on July 24, 2024

I hope you managed to solve the connection troubles. Asprom should now behave better when a connection error occurs during a scrape. Released as https://github.com/alicebob/asprom/releases/tag/1.3.2

Thanks for the help!

JustinVenus commented on July 24, 2024

Still broken.

alicebob commented on July 24, 2024

For some reason there is a read error from Aerospike:

```
Mar 02 17:51:09 aerospike_exporter[72033]: 2018/03/02 17:51:09 read tcp 127.0.0.1:49048->127.0.0.1:3000: i/o timeout
```

Yet asprom keeps reusing the connection. That seems to be a problem in the Aerospike client library.

I tried a fix for the error handling in the connerr branch: when a connection error occurs halfway through a Prometheus collect, asprom now stops the collect. That might solve the panic in asprom, but the real problem seems to be that the connection drops halfway through the collect.

JustinVenus commented on July 24, 2024

@alicebob I'll give your fixes a try. Thanks for looking.

JustinVenus commented on July 24, 2024

So the connerr branch works around the segfault, but previously latched metrics then disappear until the next successful interval, which may mess up anyone using rate()-like functions.

```
Mar 02 18:47:32 asdb12we1.aws.sig aerospike_exporter[74022]: 2018/03/02 18:47:32 ###### End
Mar 02 18:47:45 asdb12we1.aws.sig systemd[1]: Stopping Prometheus Aerospike Monitor...
Mar 02 18:47:45 asdb12we1.aws.sig systemd[1]: Stopped Prometheus Aerospike Monitor.
Mar 02 20:19:59 asdb12we1.aws.sig systemd[1]: Started Prometheus Aerospike Monitor.
Mar 02 20:19:59 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:19:59 starting asprom. listening on :9145
Mar 02 20:20:05 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:20:05 read tcp 127.0.0.1:50310->127.0.0.1:3000: i/o timeout
Mar 02 20:20:09 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:20:09 read tcp 127.0.0.1:50312->127.0.0.1:3000: i/o timeout
Mar 02 20:20:10 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:20:10 read tcp 127.0.0.1:50316->127.0.0.1:3000: i/o timeout
^C
root@asdb12we1:~# while true; do sleep 10; (curl --silent localhost:9145/metrics | head) ;done
# HELP aerospike_node_scrapes_total Total number of times Aerospike was scraped for metrics.
# TYPE aerospike_node_scrapes_total counter
aerospike_node_scrapes_total 25
# HELP aerospike_node_up Is this node up
# TYPE aerospike_node_up gauge
aerospike_node_up 1
# HELP aerospike_node_scrapes_total Total number of times Aerospike was scraped for metrics.
# TYPE aerospike_node_scrapes_total counter
aerospike_node_scrapes_total 26
# HELP aerospike_node_up Is this node up
# TYPE aerospike_node_up gauge
aerospike_node_up 1
# HELP aerospike_node_scrapes_total Total number of times Aerospike was scraped for metrics.
# TYPE aerospike_node_scrapes_total counter
aerospike_node_scrapes_total 28
# HELP aerospike_node_up Is this node up
# TYPE aerospike_node_up gauge
aerospike_node_up 1
# HELP aerospike_latency_read read latency histogram
# TYPE aerospike_latency_read gauge
aerospike_latency_read{namespace="profiledata",threshold=">1ms"} 0
aerospike_latency_read{namespace="profiledata",threshold=">64ms"} 0
aerospike_latency_read{namespace="profiledata",threshold=">8ms"} 0
# HELP aerospike_latency_write write latency histogram
# TYPE aerospike_latency_write gauge
aerospike_latency_write{namespace="mergelog",threshold=">1ms"} 0
aerospike_latency_write{namespace="mergelog",threshold=">64ms"} 0
aerospike_latency_write{namespace="mergelog",threshold=">8ms"} 0
```

Xaelias commented on July 24, 2024

The timeouts are on our end. Even the nodes seem to be timing out when talking to each other.
However, you should probably still try to handle the situation gracefully.
I'm reaching out to Aerospike support on my end.

alicebob commented on July 24, 2024

OK, good to know; that does explain the problem. I'll see whether I can make it so that either all of the metrics work, or none at all.
