Comments (8)
I think this will fix it @Xaelias ... I'm testing right now
```diff
diff --git a/namespaces.go b/namespaces.go
index f210f11..0683768 100644
--- a/namespaces.go
+++ b/namespaces.go
@@ -229,12 +229,14 @@ func (nc nsCollector) collect(conn *as.Connection, ch chan<- prometheus.Metric)
 		log.Print(err)
 		return
 	}
-	for _, ns := range strings.Split(info["namespaces"], ";") {
-		nsinfo, err := as.RequestInfo(conn, "namespace/"+ns)
-		if err != nil {
-			log.Print(err)
-			continue
-		}
-		infoCollect(ch, cmetrics(nc), nsinfo["namespace/"+ns], ns)
+	if info != nil {
+		for _, ns := range strings.Split(info["namespaces"], ";") {
+			nsinfo, err := as.RequestInfo(conn, "namespace/"+ns)
+			if err != nil {
+				log.Print(err)
+				continue
+			}
+			infoCollect(ch, cmetrics(nc), nsinfo["namespace/"+ns], ns)
+		}
 	}
 }
```
from asprom.
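As background for the guard in the patch above, here is a minimal Go sketch of what happens when the `namespaces` key is missing or the map is nil. Indexing a nil map does not panic; it yields `""`, and `strings.Split("", ";")` still returns one empty element, so an unguarded loop would run once with an empty namespace name. The helper `namespaceNames` is hypothetical and not part of asprom; whether `as.RequestInfo` can actually return a nil map without an error is an assumption here.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceNames is a hypothetical helper (not asprom code): it splits the
// semicolon-separated "namespaces" info value and drops empty entries.
func namespaceNames(info map[string]string) []string {
	var names []string
	// Reading from a nil map is safe and returns the zero value "".
	for _, ns := range strings.Split(info["namespaces"], ";") {
		if ns != "" {
			names = append(names, ns)
		}
	}
	return names
}

func main() {
	var info map[string]string // nil, standing in for an empty response

	// Splitting "" still yields one (empty) element.
	fmt.Println(len(strings.Split(info["namespaces"], ";"))) // prints 1

	// The helper filters that empty element out.
	fmt.Println(len(namespaceNames(info))) // prints 0

	fmt.Println(namespaceNames(map[string]string{"namespaces": "profiledata;mergelog"}))
}
```

Filtering empty names covers both the nil map and an empty `namespaces` value, which a plain `info != nil` check would not.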
I hope you managed to solve the connection troubles. Asprom should now behave better when a connection error occurs during a scrape. Released as https://github.com/alicebob/asprom/releases/tag/1.3.2
Thanks for the help!
from asprom.
Still broken
from asprom.
Somehow there is a read error from aero:
Mar 02 17:51:09 aerospike_exporter[72033]: 2018/03/02 17:51:09 read tcp 127.0.0.1:49048->127.0.0.1:3000: i/o timeout
but asprom keeps reusing the connection. That seems to be a problem in the aero client library.
I tried a fix for the error handling in the `connerr` branch. When there is a connection error halfway through a Prometheus collect, it will now stop the collect. That might solve the panic in asprom, but the real problem seems to be that the connection drops halfway through the collect.
from asprom.
@alicebob I'll give your fixes a try. Thanks for looking.
from asprom.
So the `connerr` branch works around the segfault ... but then previously latched metrics disappear until the next successful interval, which may mess up anyone using `rate()`-like functions.
Mar 02 18:47:32 asdb12we1.aws.sig aerospike_exporter[74022]: 2018/03/02 18:47:32 ###### End
Mar 02 18:47:45 asdb12we1.aws.sig systemd[1]: Stopping Prometheus Aerospike Monitor...
Mar 02 18:47:45 asdb12we1.aws.sig systemd[1]: Stopped Prometheus Aerospike Monitor.
Mar 02 20:19:59 asdb12we1.aws.sig systemd[1]: Started Prometheus Aerospike Monitor.
Mar 02 20:19:59 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:19:59 starting asprom. listening on :9145
Mar 02 20:20:05 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:20:05 read tcp 127.0.0.1:50310->127.0.0.1:3000: i/o timeout
Mar 02 20:20:09 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:20:09 read tcp 127.0.0.1:50312->127.0.0.1:3000: i/o timeout
Mar 02 20:20:10 asdb12we1.aws.sig aerospike_exporter[75826]: 2018/03/02 20:20:10 read tcp 127.0.0.1:50316->127.0.0.1:3000: i/o timeout
^C
root@asdb12we1:~# while true; do sleep 10; (curl --silent localhost:9145/metrics | head) ;done
# HELP aerospike_node_scrapes_total Total number of times Aerospike was scraped for metrics.
# TYPE aerospike_node_scrapes_total counter
aerospike_node_scrapes_total 25
# HELP aerospike_node_up Is this node up
# TYPE aerospike_node_up gauge
aerospike_node_up 1
# HELP aerospike_node_scrapes_total Total number of times Aerospike was scraped for metrics.
# TYPE aerospike_node_scrapes_total counter
aerospike_node_scrapes_total 26
# HELP aerospike_node_up Is this node up
# TYPE aerospike_node_up gauge
aerospike_node_up 1
# HELP aerospike_node_scrapes_total Total number of times Aerospike was scraped for metrics.
# TYPE aerospike_node_scrapes_total counter
aerospike_node_scrapes_total 28
# HELP aerospike_node_up Is this node up
# TYPE aerospike_node_up gauge
aerospike_node_up 1
# HELP aerospike_latency_read read latency histogram
# TYPE aerospike_latency_read gauge
aerospike_latency_read{namespace="profiledata",threshold=">1ms"} 0
aerospike_latency_read{namespace="profiledata",threshold=">64ms"} 0
aerospike_latency_read{namespace="profiledata",threshold=">8ms"} 0
# HELP aerospike_latency_write write latency histogram
# TYPE aerospike_latency_write gauge
aerospike_latency_write{namespace="mergelog",threshold=">1ms"} 0
aerospike_latency_write{namespace="mergelog",threshold=">64ms"} 0
aerospike_latency_write{namespace="mergelog",threshold=">8ms"} 0
from asprom.
The timeouts are on our end. Even the nodes seem to be timing out when talking to each other.
However, you still probably should try and handle the situation gracefully.
I'm reaching out to aerospike support on my end.
from asprom.
OK, good to know; this does explain the problem. I'll see whether I can make either all of the metrics work, or none at all.
from asprom.
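The "all of the metrics, or none at all" idea from the last comment could be sketched roughly as follows. This is not asprom's implementation: `Metric` is a stand-in for `prometheus.Metric` so the example has no dependencies, and `collectFunc`, `collectAll`, `okCollector`, and `failingCollector` are hypothetical names. The sketch buffers every sub-collector's output and only forwards it to the channel if all of them succeeded.

```go
package main

import (
	"errors"
	"fmt"
)

// Metric is a stand-in for prometheus.Metric (assumption, to stay dependency-free).
type Metric struct {
	Name  string
	Value float64
}

// collectFunc is a hypothetical sub-collector: it appends its metrics to buf,
// or returns an error such as an i/o timeout.
type collectFunc func(buf []Metric) ([]Metric, error)

// collectAll runs every sub-collector into a buffer and sends the buffered
// metrics to ch only if none of them failed.
func collectAll(ch chan<- Metric, collectors ...collectFunc) error {
	var buf []Metric
	for _, c := range collectors {
		var err error
		buf, err = c(buf)
		if err != nil {
			return err // nothing has been sent to ch yet
		}
	}
	for _, m := range buf {
		ch <- m
	}
	return nil
}

// okCollector returns a sub-collector that always succeeds with one metric.
func okCollector(name string, v float64) collectFunc {
	return func(buf []Metric) ([]Metric, error) {
		return append(buf, Metric{Name: name, Value: v}), nil
	}
}

// failingCollector returns a sub-collector that always fails.
func failingCollector(msg string) collectFunc {
	return func(buf []Metric) ([]Metric, error) {
		return buf, errors.New(msg)
	}
}

func main() {
	ch := make(chan Metric, 8) // buffered so the demo needs no reader goroutine

	err := collectAll(ch, okCollector("node_up", 1), failingCollector("i/o timeout"))
	fmt.Println(err, len(ch)) // the failed scrape emitted nothing

	err = collectAll(ch, okCollector("node_up", 1), okCollector("latency_read", 0))
	fmt.Println(err, len(ch)) // the successful scrape emitted both metrics
}
```

Note that this only avoids partial output. A failed scrape still produces a gap, so the `rate()` concern raised earlier in the thread would need separate handling.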