Comments (15)
Hi,
The Python-based check_tc4400 is as fast as the Perl-based one; most of the time, the script is waiting for the modem's cmconnectionstatus page. It has a timeout of 60 s, which was sufficiently high in the past. The page may occasionally take longer than a minute; in that case, the script terminates with "something went horribly wrong" and an "UNKNOWN" return code (3).
You seem to have configured your Nagios setup with a service_check_timeout of 60, which kills the check script prematurely. Try increasing that value and/or running the script manually. If the script terminates with the above exception, try increasing the script's timeout in
https://github.com/philfry/check_tc4400/blob/master/check_tc4400.py#L96
from check_tc4400.
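To make the two failure modes in this thread concrete, here is a minimal sketch (not the plugin's actual code) of how a Python check can map a slow or silent modem to the Nagios UNKNOWN return code (3). The helper name `fetch_status` and the 60 s default are assumptions for illustration; the messages are modelled on the ones quoted in this thread. In CPython's urllib, a timeout while connecting surfaces as `urllib.error.URLError`, whereas a timeout while waiting for the response on an established connection is raised as a bare `socket.timeout`.

```python
import socket
import urllib.error
import urllib.request

# Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_status(url: str, timeout: float = 60.0):
    """Fetch the modem's status page and return (nagios_code, message).

    Hypothetical helper for illustration; not taken from check_tc4400.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.URLError as exc:
        # Failures while sending the request, including a connect timeout
        # (exc.reason is then the timeout exception). HTTPError, e.g. a 401,
        # is a URLError subclass and lands here as well.
        return UNKNOWN, f"UNKNOWN: Connecting to webinterface failed with '{exc.reason}'"
    except socket.timeout:
        # The modem accepted the connection but sent no response in time.
        return UNKNOWN, f"UNKNOWN: no response within {timeout}s"
    return OK, f"OK: received {len(body)} bytes"
```

Whatever timeout you pick, Nagios' service_check_timeout has to stay above it, so the plugin gets the chance to report UNKNOWN itself instead of being killed.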
Hi Phil,
I changed service_check_timeout to 120 and will check whether the service check now works without timing out.
I still don't understand why it worked with 60 until I switched to the latest version.
Edit: OK, that didn't last long :-(
UNKNOWN: Connecting to webinterface failed with 'timed out'
kind regards
Michael
Hi Michael,
(unfortunately I don't get notified on comment edits)
Please run curl in a loop for a while to check your modem's response times:
for i in {1..100}; do
time curl -s -o /dev/null -u 'admin:yourpassword' http://yourmodemip/cmconnectionstatus.html
done
Then adapt the script to your needs. Depending on your firmware, you might get better response times on your modem's management IP address (the one the provider assigns) or even on 192.168.0.1. The modem may also randomly ignore requests to 192.168.100.1; because of that, I've set up an HAProxy instance that tries all three IP addresses.
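For anyone who prefers to stay in Python, the curl loop above can be sketched with the standard library as follows. Like `curl -u`, it sends the Basic auth header on the first request, and it collects the duration of every successful fetch. The function names and the 120 s default timeout are assumptions for illustration.

```python
import base64
import time
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic 'Authorization' header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def time_requests(url: str, auth: str, runs: int = 100, timeout: float = 120.0):
    """Fetch `url` `runs` times; return (durations_of_successes, failure_count)."""
    durations, failures = [], 0
    for _ in range(runs):
        req = urllib.request.Request(url, headers={"Authorization": auth})
        start = time.monotonic()
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                resp.read()  # read the whole page, like curl -o /dev/null
        except OSError:
            # URLError, HTTPError and socket timeouts all derive from OSError
            failures += 1
            continue
        durations.append(time.monotonic() - start)
    return durations, failures
```

For example, `time_requests("http://192.168.100.1/cmconnectionstatus.html", basic_auth_header("admin", "yourpassword"))` returns the durations; their maximum gives a rough idea of how high the script's timeout has to be.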
Hi Phil,
My TC4400 is running firmware SR70.12.33-180327.
Again, what I don't understand is: why did it stop working reliably with the new version?
The check had been running for over a year without a problem.
I compared the management IP with 192.168.100.1, and there was no difference.
192.168.0.1 isn't working at all, but this could be because I didn't configure a route for it.
kind regards
Michael
Hi Phil,
I ran your three lines, and most of the time the output is:
real 0m21.840s
user 0m0.004s
sys 0m0.003s
Sometimes the first test took 1 minute and 40 seconds.
So the 120-second timeout should be enough.
Here's my Grafana chart from the last 6 hours:
kind regards
Michael
Hi Phil,
Do you still have the old Perl version?
I would like to roll back to it, as it was working reliably and the Python 3 version isn't.
It has now been stuck in the UNKNOWN state for over 50 minutes.
Current Status: | UNKNOWN (for 0d 0h 51m 0s)
-- | --
Status Information: | UNKNOWN: Something went horribly wrong: <class 'socket.timeout'>
Performance Data: |
Current Attempt: | 3/3 (HARD state)
Last Check Time: | 09-02-2020 09:21:55
Check Type: | ACTIVE
Check Latency / Duration: | 0.000 / 60.140 seconds
Next Scheduled Check: | 09-02-2020 09:31:55
Last State Change: | 09-02-2020 08:40:55
Last Notification: | N/A (notification 0)
Is This Service Flapping? | YES (62.43% state change)
In Scheduled Downtime? | NO
Last Update: | 09-02-2020 09:31:50 ( 0d 0h 0m 5s ago)
kind regards
Michael
Hi Michael,
sure: https://github.com/philfry/check_tc4400/tree/v0.6
To investigate further, would you mind sending me a tcpdump of such a timed-out request?
- Phil
Hi Phil,
Thanks for the old version; I'm not that familiar with GitHub and couldn't find it.
Sure, I can provide a tcpdump if you tell me how :-)
kind regards
Michael
Hi Michael,
try
tcpdump -nn -i yourinterfacetotc4400 -s 0 -w tc4400.pcap host 192.168.100.1 and tcp and port 80
Hi Phil,
I just created a tcpdump with one successful and one failed attempt from Nagios.
Please have a look in your inbox.
kind regards
Michael
Hi Phil,
Thanks for reviewing the tcpdump.
After switching back to the Perl version, check_tc4400 is running without any problems again:
You can see the difference between Python 3 (until 21:00) and Perl (after 21:00).
I'm going to send you another tcpdump of the interface with the Perl script running.
Maybe there's something in it that helps to debug this.
kind regards
Michael
Hi Michael,
The pcaps show that the modem sometimes hangs on the Python client's request. It ACKs the PSH (GET /cmconnectionstatus.html) but then sends nothing. Once the Python client reaches its timeout, it sends a FIN/ACK, which the modem never answers with a FIN/ACK of its own. When the modem is responsive, it usually takes about 20 s to answer.
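To spot such hung requests in a capture without Wireshark, a minimal sketch like the following walks a classic libpcap file (such as the one written by the tcpdump command above) and yields every TCP segment with its flags and payload size. It assumes an Ethernet link type and IPv4; pcapng files and VLAN-tagged frames are not handled, and the function name is hypothetical.

```python
import struct

# TCP flag bits, in the order used for the "PSH/ACK"-style labels below
TCP_FLAGS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"), (0x10, "ACK")]


def read_tcp_segments(path):
    """Yield (timestamp, src, sport, dst, dport, flags, payload_len) for each IPv4/TCP packet.

    Handles only classic (microsecond) pcap files with Ethernet link type.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian host
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet captures (linktype 1) are handled")
    offset = 24
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if frame[12:14] != b"\x08\x00":  # not IPv4 (or a VLAN-tagged frame)
            continue
        ip = frame[14:]
        if len(ip) < 20 or ip[9] != 6:  # protocol 6 = TCP
            continue
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack("!H", ip[2:4])
        tcp = ip[ihl:total_len]  # slicing to total_len drops Ethernet padding
        if len(tcp) < 20:
            continue
        sport, dport = struct.unpack("!HH", tcp[:4])
        header_len = (tcp[12] >> 4) * 4
        flags = "/".join(name for bit, name in TCP_FLAGS if tcp[13] & bit)
        src = ".".join(str(b) for b in ip[12:16])
        dst = ".".join(str(b) for b in ip[16:20])
        yield ts_sec + ts_usec / 1e6, src, sport, dst, dport, flags, len(tcp) - header_len
```

For example, `for seg in read_tcp_segments("tc4400.pcap"): print(*seg)` prints one line per segment; a hung request shows up as the client's PSH/ACK, a zero-length ACK from 192.168.100.1, and later the client's unanswered FIN/ACK.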
The Perl client's request is slightly different from the Python client's request. First of all, the Perl client sends two requests:
An unauthorized one:
GET /cmconnectionstatus.html HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: 192.168.100.1
User-Agent: nagios/check_tc4400
which the modem rejects with
401 Unauthorized
then an authorized one:
GET /cmconnectionstatus.html HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Authorization: Basic base64encodedauthstring
Host: 192.168.100.1
User-Agent: nagios/check_tc4400
Below is the Python client's request:
GET /cmconnectionstatus.html HTTP/1.1
Accept-Encoding: identity
Host: 192.168.100.1
User-Agent: Python-urllib/3.6
Authorization: Basic base64encodedauthstring
Connection: close
The second difference is that the Perl client requests a compressed response (TE: deflate,gzip), which the modem cannot handle, though.
The third difference is the (perfectly valid) Accept-Encoding header in the Python request.
Anyway, I can only make those requests match roughly: Python's urllib doesn't allow modifying the Connection header, and I cannot remove Accept-Encoding.
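One way around that urllib limitation would be to drop down to `http.client`, whose `putrequest()` accepts `skip_accept_encoding=True` and which lets you set the Connection header yourself. The sketch below is an illustration under that assumption, not what check_tc4400 does: it sends roughly the header set of the Perl client's authorized request shown above (http.client still adds Host right after the request line, so the header order differs). The function name is hypothetical. Note that http.client would not decode a gzip transfer coding if a server actually applied one.

```python
import base64
import http.client


def perl_like_get(host: str, path: str, user: str, password: str, timeout: float = 60.0):
    """GET `path` with headers modelled on the Perl client's request; return (status, body)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        # skip_accept_encoding=True suppresses the default "Accept-Encoding: identity";
        # the Host header is still added automatically.
        conn.putrequest("GET", path, skip_accept_encoding=True)
        conn.putheader("TE", "deflate,gzip;q=0.3")  # copied from the Perl request
        conn.putheader("Connection", "TE, close")
        conn.putheader("Authorization", "Basic " + token)
        conn.putheader("User-Agent", "nagios/check_tc4400")
        conn.endheaders()
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```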
My guess is that the prior unauthorized request acts as some kind of "hey modem, wake up, someone might ask for data soon". My modem behaves differently here, but if you like, you can try out https://github.com/philfry/check_tc4400/tree/bredmich_tworeqs, which sends an unauthorized dummy request before requesting the real data.
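The idea behind that branch can be sketched like this: send an unauthenticated request first, ignore the expected 401, then fetch the page with credentials. This is a hypothetical reconstruction of the idea, not the code in the bredmich_tworeqs branch; the function name and the 5 s wake-up timeout are assumptions.

```python
import base64
import urllib.error
import urllib.request


def fetch_with_wakeup(url: str, user: str, password: str,
                      timeout: float = 60.0, wakeup_timeout: float = 5.0) -> bytes:
    """Send an unauthorized dummy request, then the real authorized one."""
    try:
        # Dummy request without credentials; the modem is expected to answer 401.
        urllib.request.urlopen(url, timeout=wakeup_timeout).close()
    except urllib.error.HTTPError:
        pass  # 401 Unauthorized is the expected outcome here
    except OSError:
        pass  # a slow or failed wake-up call should not abort the check
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```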
Hi Phil,
Your test version works fine; I'll check Grafana tomorrow to see whether there are any problems with the 10-minute interval.
kind regards
Michael
Hi Phil,
With the test version, the check works reliably again :-)
kind regards
Michael
Hi Michael,
Nice. I'll add a command-line switch to enable this workaround.