Comments (15)

philfry commented on July 20, 2024

Hi,

the Python-based check_tc4400 is as fast as the Perl-based one; most of the time the script just waits for the modem's cmconnectionstatus page. It has a timeout of 60s, which was sufficiently high in the past. The page may occasionally take longer than a minute; in that case the script terminates with "something went horribly wrong" and an "UNKNOWN" return code (3).

You seem to have configured your Nagios setup with a service_check_timeout of 60, which kills the check script prematurely. Try increasing that value and/or do manual runs. If the script terminates with the above exception, try increasing the script's timeout in
https://github.com/philfry/check_tc4400/blob/master/check_tc4400.py#L96
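As a rough illustration of how such a timeout can end up as an UNKNOWN result, here is a minimal sketch. It is not the code from check_tc4400.py; the function name `fetch_status` and its return shape are made up for this example, and authentication is left out. Note that urllib's `timeout` applies to each blocking socket operation, so Nagios' service_check_timeout should be set comfortably above it.

```python
import urllib.request

UNKNOWN = 3  # Nagios plugin return code for UNKNOWN


def fetch_status(url, timeout=60.0):
    """Hypothetical helper: return (exit_code, text) for one fetch attempt.

    socket.timeout, URLError and HTTPError are all subclasses of OSError,
    so a single except clause covers timeouts and connection failures.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0, resp.read().decode("utf-8", errors="replace")
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: Connecting to webinterface failed with '{exc}'"


# Example call (the address is the one mentioned in this thread):
# code, text = fetch_status("http://192.168.100.1/cmconnectionstatus.html", timeout=120)
```

Raising `timeout` here only helps if the monitoring system does not kill the plugin first.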

from check_tc4400.

bredmich commented on July 20, 2024

Hi Phil,

I changed the service_check_timeout to 120 and will check whether the service check now works without timing out.
I still don't understand why this worked with 60 until I changed to the latest version.

Edit: OK, that didn't last long :-(
UNKNOWN: Connecting to webinterface failed with 'timed out'

kind regards
Michael

philfry commented on July 20, 2024

Hi Michael,
(unfortunately I don't get notified on comment edits)
please try curl'ing for a while to check your modem's response times:

for i in {1..100}; do
  time curl -s -o /dev/null -u 'admin:yourpassword' http://yourmodemip/cmconnectionstatus.html
done

and adapt the script to your needs. Depending on your firmware, you might get better response times on the management IP address of your modem (the one the provider assigns) or even on 192.168.0.1. The modem may also randomly ignore 192.168.100.1. Because of that, I've set up an haproxy that tries all three IP addresses.
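For setups without a proxy in front of the modem, the "try several addresses" idea can be sketched in Python as well. This is only an illustration under assumptions, not the haproxy configuration mentioned above: the function name `first_responding` is made up, authentication is omitted (so a 401 reply counts as a failure here), and environment proxies are bypassed because the modem sits on the local network.

```python
import time
import urllib.request

# Opener that ignores http_proxy/https_proxy settings from the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def first_responding(urls, timeout=30.0):
    """Hypothetical helper: try each URL in order and return
    (url, seconds, body) for the first one that answers."""
    errors = {}
    for url in urls:
        start = time.monotonic()
        try:
            with _opener.open(url, timeout=timeout) as resp:
                body = resp.read()
        except OSError as exc:  # timeouts, refused connections, HTTP errors
            errors[url] = exc
            continue
        return url, time.monotonic() - start, body
    raise OSError(f"no address responded: {errors}")


# Example call with two of the addresses mentioned in this thread:
# first_responding([
#     "http://192.168.100.1/cmconnectionstatus.html",
#     "http://192.168.0.1/cmconnectionstatus.html",
# ])
```

The measured seconds give roughly the same information as the `time curl` loop, one address at a time.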

bredmich commented on July 20, 2024

Hi Phil,
my TC4400 is running SR70.12.33-180327.
Again, what I don't understand is why it stopped working reliably with the new version.
The check was running for over a year without a problem.

I compared the management IP with 192.168.100.1 and there was no difference.
192.168.0.1 isn't working at all, but this could be because I haven't configured a route for it.

kind regards
Michael

bredmich commented on July 20, 2024

Hi Phil,
I used your three lines and most of the time it says:

real    0m21.840s
user    0m0.004s
sys     0m0.003s

Sometimes the first test took 1 minute and 40 seconds.
So the 120 second timeout should be enough.

Here's my Grafana chart from the last 6 hours:
[Screenshot 2020-09-01 at 22:34:02]

kind regards
Michael

bredmich commented on July 20, 2024

Hi Phil,
do you still have the old Perl version?
I would like to roll back to it, as it was working reliably and the Python 3 version isn't.
The check has now been stuck in the UNKNOWN state for over 50 minutes.


Current Status: | UNKNOWN   (for  0d  0h 51m  0s)
-- | --
Status Information: | UNKNOWN: Something went horribly wrong: <class 'socket.timeout'>
Performance Data: |  
Current Attempt: | 3/3  (HARD state)
Last Check Time: | 09-02-2020 09:21:55
Check Type: | ACTIVE
Check Latency / Duration: | 0.000 / 60.140 seconds
Next Scheduled Check: | 09-02-2020 09:31:55
Last State Change: | 09-02-2020 08:40:55
Last Notification: | N/A (notification 0)
Is This Service Flapping? | YES   (62.43% state change)
In Scheduled Downtime? | NO
Last Update: | 09-02-2020 09:31:50  ( 0d  0h  0m  5s ago)

kind regards
Michael

philfry commented on July 20, 2024

Hi Michael,

sure: https://github.com/philfry/check_tc4400/tree/v0.6

To investigate further, would you mind sending me a tcpdump of such a timed-out request?

- Phil

bredmich commented on July 20, 2024

Hi Phil,
thanks for the old version. I'm not that familiar with GitHub and couldn't find it.

Sure, I can provide a tcpdump if you tell me how :-)

kind regards
Michael

philfry commented on July 20, 2024

Hi Michael,
try

tcpdump -nn -i yourinterfacetotc4400 -s 0 -w tc4400.pcap host 192.168.100.1 and tcp and port 80

bredmich commented on July 20, 2024

Hi Phil,
I just created a tcpdump with one successful and one failed attempt from Nagios.
Please have a look in your inbox.

kind regards
Michael

bredmich commented on July 20, 2024

Hi Phil,
thanks for reviewing the tcpdump.
After changing back to the Perl version, check_tc4400 is running without any problems again:
[Screenshot 2020-09-02 at 23:43:45]
You can see the difference between Python 3 (until 21:00) and Perl (after 21:00).

I'm going to send you another tcpdump of the interface with the Perl script running.
Maybe there's something in it that helps you debug this.

kind regards
Michael

philfry commented on July 20, 2024

Hi Michael,

the pcaps show that the modem sometimes hangs on the Python client's request. It ACKs the PSH (GET /cmconnectionstatus.html) but then sends nothing. Once the Python client reaches its timeout, it sends a FIN/ACK, which the modem does not FIN/ACK in return. When the modem is responsive, it usually takes about 20s to answer.

The Perl client's request is slightly different from the Python client's request. First of all, the Perl client sends two requests.
An unauthorized one:

GET /cmconnectionstatus.html HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: 192.168.100.1
User-Agent: nagios/check_tc4400

which the modem rejects with

401 Unauthorized

then an authorized one:

GET /cmconnectionstatus.html HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Authorization: Basic base64encodedauthstring
Host: 192.168.100.1
User-Agent: nagios/check_tc4400

Below is the Python client's single request:

GET /cmconnectionstatus.html HTTP/1.1
Accept-Encoding: identity
Host: 192.168.100.1
User-Agent: Python-urllib/3.6
Authorization: Basic base64encodedauthstring
Connection: close

The second difference is that the Perl client requests a compressed response (TE: deflate,gzip), which the modem cannot handle anyway.
The third difference is the Python client's (perfectly valid) Accept-Encoding header.

Anyway, I can only make those requests match roughly. Python's urllib doesn't allow modifying the Connection header, and I cannot remove Accept-Encoding.

I could imagine the prior unauthorized request is some kind of "hey modem, wake up, someone might ask for data soon". My modem behaves differently here, but if you like you might try out https://github.com/philfry/check_tc4400/tree/bredmich_tworeqs which sends an unauthorized dummy request before requesting the real data.
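To make the "unauthorized dummy request first" idea concrete, here is a minimal sketch with urllib. It is not the code from the bredmich_tworeqs branch; the function name `fetch_with_wakeup` is made up, and the choices to ignore only HTTP errors on the first request and to bypass environment proxies are assumptions for this example.

```python
import base64
import urllib.error
import urllib.request


def fetch_with_wakeup(url, user, password, timeout=60.0):
    """Hypothetical helper: send an unauthenticated request, then the real one."""
    # The modem is on the local network, so environment proxies are bypassed.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    # 1) Dummy request without credentials. A "401 Unauthorized" reply is
    #    expected and ignored; other errors (e.g. timeouts) still propagate.
    try:
        opener.open(url, timeout=timeout).close()
    except urllib.error.HTTPError as exc:
        exc.close()

    # 2) The real request, with the Basic auth header set up front.
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with opener.open(req, timeout=timeout) as resp:
        return resp.read()


# Example call (placeholders as used earlier in this thread):
# fetch_with_wakeup("http://192.168.100.1/cmconnectionstatus.html", "admin", "yourpassword")
```

On the wire this yields the same two-step pattern as the Perl client (401, then the authorized GET), although the remaining headers still differ.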

bredmich commented on July 20, 2024

Hi Phil,

your test version works fine. I'll check Grafana tomorrow to see whether there are any problems with the 10-minute interval.

kind regards
Michael

bredmich commented on July 20, 2024

Hi Phil,
the test version works reliably again :-)
[Screenshot 2020-09-04 at 09:20:29]

kind regards
Michael

philfry commented on July 20, 2024

Hi Michael,

Nice. I'll add a command-line switch to enable this workaround.
