
hm-diag's Issues

Add reporting functionality to dashboard

In order to enrich the data in the dashboard, we need to send in some data that is only available on the device(s). We do however need to do this in a secure fashion. Here's how I am thinking this would work as of right now:

  • We create a new API end-point in the dashboard using DRF
  • We then do a simple POST to this end-point periodically from the device (TBD how we best run this)

While this all sounds simple, the devil is in the details. Credentials will be the trickiest part to solve, but here's how I'm currently envisioning it working:

  • We create a new authentication method for DRF that can take the RPI serial as the username and a token as the password
    • This set of credentials will only be able to update the status for itself (i.e. match on serial number)
  • We then create a Celery task that generates these sets for all devices
  • As part of the above Celery task, we also create a new environment variable for the particular device with the token (e.g. DASHBOARD_TOKEN).
    • If we ever wanted to rotate the credential for a given device, we simply remove the environment variable and the device Celery task will then auto-populate it again.
  • On the device side, we simply use the RPI serial and the new environment variable as the credentials when submitting the POST
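
The credential flow above could look roughly like this. It is a minimal, framework-free sketch rather than the actual DRF authentication class: the serial number is made up, the `issued_tokens` dict stands in for whatever the dashboard stores, and sending the pair as HTTP Basic auth is an assumption.

```python
import base64
import hmac
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a random URL-safe token to use as the device password."""
    return secrets.token_urlsafe(num_bytes)


def verify_credentials(serial: str, token: str, issued_tokens: dict) -> bool:
    """Check a (serial, token) pair against the tokens issued per device.

    The lookup is keyed on the serial, so a device can only ever
    authenticate as itself. The comparison is constant-time.
    """
    expected = issued_tokens.get(serial)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), token.encode())


def basic_auth_header(serial: str, token: str) -> dict:
    """Build the HTTP Basic Authorization header the device would send."""
    raw = f"{serial}:{token}".encode()
    return {"Authorization": "Basic " + base64.b64encode(raw).decode()}


# Hypothetical serial; on the device the token would come from DASHBOARD_TOKEN.
issued = {"00000000abcdef01": generate_token()}
headers = basic_auth_header("00000000abcdef01", issued["00000000abcdef01"])
```

In a real deployment the dashboard would probably store a hash of each token rather than the token itself.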

Pairing shows incorrect firmware version

According to the local diagnostics, the hotspot is running hotspot version 2021.06.10.0, but when pairing in the app it still shows firmware version 2021.06.09.4.

Add miner diagnostics output to diagnostics system

Request the same diagnostics information from the miner that the diagnostics function in the app uses.

This would then include:

  • Whether the hotspot thinks it is connected to the Helium network
  • Whether the hotspot is "dialable" (I'm guessing not relayed)
  • The height of the hotspot
  • The internet connection type

Error logs

Was talking to Kevin and we had an idea that maybe we could add some error log view/download to the diagnostics to help with debugging, specifically the miner error log...

NebraLtd/hm-miner#2 (comment)

Ability to turn off UPnP?

We had a customer who says they were having issues when port forwarding on a Ubiquiti router because port forwarding and UPnP were both active. It might be useful to have a setting in diagnostics to deactivate UPnP on the miner.

Percentage error on relayed device?

This is a device that had been online for some time (many days), but is stuck syncing. It shows a very high sync percentage and reports that the blockchain is only on block 1.

[Screenshot, 2021-06-03 at 23:31:13]

Diagnostics crashing

I deployed master to testnet, and it's now crashing with the following error:

 diagnostics  Traceback (most recent call last):
 diagnostics    File "/opt/utils.py", line 122, in writing_data
 diagnostics      with open(path, 'w') as file:
 diagnostics  FileNotFoundError: [Errno 2] No such file or directory: '/opt/nebraDiagnostics/html/diagnostics.json'
 diagnostics  
 diagnostics  During handling of the above exception, another exception occurred:
 diagnostics  
 diagnostics  Traceback (most recent call last):
 diagnostics    File "/opt/main.py", line 254, in <module>
 diagnostics      main()
 diagnostics    File "/opt/main.py", line 245, in main
 diagnostics      write_info_to_files(prod_diagnostics, diagnostics)
 diagnostics    File "/opt/main.py", line 128, in write_info_to_files
 diagnostics      utils.writing_data(path, data)
 diagnostics    File "/opt/utils.py", line 125, in writing_data
 diagnostics      raise FileNotFoundError(
 diagnostics  FileNotFoundError: Directory does not exist in the path: /opt/nebraDiagnostics/html/diagnostics.json

I have since reverted the update.
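
One possible fix, sketched here under assumptions, is to create the missing parent directory before opening the file. The function name matches the traceback, but the body is a guess at what `writing_data` does (JSON output is assumed) and the example key is made up.

```python
import json
import os
import tempfile


def writing_data(path, data):
    """Write data as JSON to path, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as file:
        json.dump(data, file)


# Usage: the html/ directory does not exist yet, but the write still succeeds.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "nebraDiagnostics", "html", "diagnostics.json")
    writing_data(target, {"example": True})
    with open(target) as file:
        written = json.load(file)
```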

Improve error handling for wlan

It looks like the diagnostics tool is running just fine, but we're getting this error:

 diagnostics  ERROR:root:[Errno 2] No such file or directory: '/sys/class/net/wlan0/address'
 diagnostics  grep: /sys/bus/usb/devices/*/idVendor: No such file or directory
 diagnostics  grep: /sys/bus/usb/devices/*/idVendor: No such file or directory

We need to improve the error handling for this.
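
A rough sketch of what that could look like for the MAC address read. The function name and the `sys_net` parameter are hypothetical, and it only covers the `/sys/class/net/.../address` case, not the `grep` calls.

```python
import logging
import os


def read_mac_address(interface, sys_net="/sys/class/net"):
    """Return the interface's MAC address, or None if it is unavailable."""
    path = os.path.join(sys_net, interface, "address")
    try:
        with open(path) as file:
            return file.read().strip()
    except OSError as err:
        # A missing interface is expected on some hardware, so log a
        # warning and carry on instead of reporting an error.
        logging.warning("Could not read MAC address for %s: %s", interface, err)
        return None
```

The caller can then show something like "not available" for `wlan0` when the result is `None`.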

Related to #20

division by zero error

also saw this one...

03.06.21 02:43:06 (+0100)  diagnostics  Traceback (most recent call last):
03.06.21 02:43:06 (+0100)  diagnostics    File "/opt/nebraDiagnostics/main.py", line 154, in <module>
03.06.21 02:43:06 (+0100)  diagnostics      diagnostics['BSP'] = round(((int(diagnostics['MH'])/int(diagnostics['BCH']))*100),3)
03.06.21 02:43:06 (+0100)  diagnostics  ZeroDivisionError: division by zero
03.06.21 02:43:10 (+0100)  diagnostics  rm: can't remove '/opt/nebraDiagnostics/html/index.html': No such file or directory
03.06.21 02:43:10 (+0100)  diagnostics  rm: can't remove '/opt/nebraDiagnostics/html/initFile.txt': No such file or directory
03.06.21 02:43:12 (+0100)  diagnostics  Diag Loop
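
The failing line divides by `diagnostics['BCH']`, which can evidently be zero. A guarded version might look like the sketch below. The helper name is hypothetical, and returning `None` for the unknown case is an assumption about what the page would want to display.

```python
def sync_percentage(miner_height, blockchain_height):
    """Return miner height as a percentage of blockchain height.

    Returns None when either value is missing or not a number, or when
    the blockchain height is zero, instead of raising ZeroDivisionError.
    """
    try:
        miner = int(miner_height)
        chain = int(blockchain_height)
    except (TypeError, ValueError):
        return None
    if chain <= 0:
        return None
    return round((miner / chain) * 100, 3)
```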

RPi Zero serial number fails

Error log:

21.04.21 21:17:59 (+0100)  diagnostics  Traceback (most recent call last):
21.04.21 21:17:59 (+0100)  diagnostics    File "/opt/nebraDiagnostics/diagnosticsProgram.py", line 55, in <module>
21.04.21 21:17:59 (+0100)  diagnostics      diagnostics["RPI"] = open("/proc/cpuinfo")\
21.04.21 21:17:59 (+0100)  diagnostics  IndexError: list index out of range

I think I got the serial number by reading a specific line of /proc/cpuinfo, which works on the 3, 3+, CM3s and 4s, but as the Zero is only single core the file has fewer lines and the lookup fails.

Need to tweak so it gets it slightly differently.

https://github.com/NebraLtd/hm-diag/blob/master/diagnostics-program/diagnosticsProgram.py#L55
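
One way to get it "slightly differently" is to look for the `Serial` field by name rather than by line position. This is only a sketch: the function names are hypothetical and it assumes the field is labelled `Serial` on every board.

```python
def parse_serial(cpuinfo_text):
    """Return the value of the 'Serial' field from cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Serial":
            return value.strip()
    return None


def read_serial(path="/proc/cpuinfo"):
    """Read the board serial from the given cpuinfo file, or None on failure."""
    try:
        with open(path) as file:
            return parse_serial(file.read())
    except OSError:
        return None
```

Because the lookup no longer depends on how many `processor` blocks come before the field, the core count should not matter.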

Rewrite service to use Flask

Let's rewrite the diagnostics page to use Flask, which gives us a lot of free features out of the box. We should then serve this app using Gunicorn.

To keep things simple, let's ditch the Nginx container for the time being and serve Gunicorn on port 8000 with Docker exposing it on port 80. That way we don't need to run the container as root.

This is a prerequisite for #39.

Relayed status

Is it possible to show the "Miner Relayed" row in amber if the value is true, and also to indicate this in the main status? "All Ok" is not an accurate reflection of this state, as beaconing will not work.

I have found that the diagnostic page is the only place that shows the real time relayed state and that the explorer, Helium App, and dashboard all indicate that there is no relayed state even after many hours/days.

It may also be worth adding a link, shown when the hotspot is relayed, to a page that explains how to fix it.

dbus warning message

Saw this on a unit today...

03.06.21 02:32:24 (+0100)  diagnostics  Diag Loop
03.06.21 02:32:25 (+0100)  diagnostics  ERROR:dbus.proxies:Introspect error on :1.94:/: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NotSupported: org.freedesktop.DBus.Introspectable.Introspect
03.06.21 02:32:26 (+0100)  diagnostics  Frequency: US915
03.06.21 02:32:26 (+0100)  diagnostics  

Possibly change webserver software

To tighten security, it could be worth changing to different web server software.

Currently this is using simplehttpd, which has the benefit of being extremely lightweight.

However, we should first quickly check whether it is already secure enough, as it might be perfectly fine.

It is just an HTTP server; no POST or GET requests are being processed either.

Alternatives include:

  • Nginx
  • Lighttpd
  • Apache

ConnectionError: HTTPSConnectionPool(host='api.helium.io', port=443): Max retries exceeded with url: /v1/blocks/he...

Sentry Issue: HM-DIAG-A

TimeoutError: [Errno 110] Operation timed out
  File "urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0xf65e3058>: Failed to establish a new connection: [Errno 110] Operation timed out
  File "urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "urllib3/connectionpool.py", line 976, in _validate_conn
    conn.connect()
  File "urllib3/connection.py", line 308, in connect
    conn = self._new_conn()
  File "urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(

MaxRetryError: HTTPSConnectionPool(host='api.helium.io', port=443): Max retries exceeded with url: /v1/blocks/height (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xf65e3058>: Failed to establish a new connection: [Errno 110] Operation timed out'))
  File "requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "urllib3/connectionpool.py", line 724, in urlopen
    retries = retries.increment(
  File "urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))

ConnectionError: HTTPSConnectionPool(host='api.helium.io', port=443): Max retries exceeded with url: /v1/blocks/height (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xf65e3058>: Failed to establish a new connection: [Errno 110] Operation timed out'))
(1 additional frame(s) were not displayed)
...
  File "requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
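
A sketch of how the height lookup could fail gracefully instead of raising. Note that it swaps `requests` for the standard library's `urllib` to stay self-contained. The function names are hypothetical, and the `{"data": {"height": ...}}` response shape is an assumption.

```python
import http.client
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch and decode JSON from url; return None on network or parse errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and socket timeouts are OSError subclasses;
        # malformed JSON raises a ValueError subclass.
        return None


def get_blockchain_height(url="https://api.helium.io/v1/blocks/height"):
    """Return the reported blockchain height, or None if it is unavailable."""
    payload = fetch_json(url)
    try:
        return int(payload["data"]["height"])
    except (TypeError, KeyError, ValueError):
        return None
```

With `requests`, the equivalent would be passing `timeout=` to `requests.get` and catching `requests.exceptions.RequestException`.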

Fix master

Some of the recent changes broke master and we are no longer writing the files properly:

/opt/html # ls
bootstrap.min.css
/opt # ls diagnostics.json 
diagnostics.json
/opt # ls initFile.txt 
initFile.txt

Most likely, this was introduced in #68 as this was a large rewrite.

Send to support functionality?

Perhaps a button we can get people to press that downloads a txt file with diagnostics info to send us.

Similar to what's in the app

Add more features / match with dashboard

I think it would be nice if the remote dashboard and the local dashboard converge to have similar features from a device info / device control perspective...

Things we could add:

  • reboot, shutdown, restart
  • relayed status
  • network statistics
  • Eth / WiFi status
  • WiFi network search and update
  • mDNS / NetBIOS search (nebra.local or macaddress.local or similar)
  • local / wan IP displayed for all interfaces
  • explore miner features
    • peer book
    • connect to new peers
    • take snapshot
    • restore snapshot
  • Customisable lora RF settings (rssi offset / antenna gain / tx power)

Probably a bunch more we could add, just wanted to get the conversation started 👌

Add "synced" call to show all ok

Currently we only show the sync percentage:
[Screenshot_20210703-091908_Chrome]

This can be confusing to people new to Helium as it's actually considered synced if it's within 500 blocks and this is how Helium display it on the explorer as shown here:
https://github.com/helium/explorer/blob/6c5539bc121b49f443838fad811d35100c3772a9/components/Hotspots/StatusPill.js#L33

We should add something similar so that we don't have people worrying that they aren't quite synced

Maybe "99.999% (Synced within 500 blocks)" or similar
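
A minimal sketch of how such a label could be built. The function name and the exact wording are placeholders, and the 500-block threshold is taken from the description above.

```python
SYNC_THRESHOLD_BLOCKS = 500  # threshold quoted in this issue; adjust as needed


def sync_label(miner_height, blockchain_height, threshold=SYNC_THRESHOLD_BLOCKS):
    """Return a display string for the sync status."""
    if blockchain_height <= 0:
        return "Unknown"
    percent = (miner_height / blockchain_height) * 100
    label = f"{percent:.3f}%"
    # Treat the miner as synced once it is within the threshold of the tip.
    if blockchain_height - miner_height <= threshold:
        label += f" (Synced within {threshold} blocks)"
    return label
```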

Rework "connected to blockchain" status

Helium are renaming the "connected" metric in p2p_status to "well-connected". In principle a single connection means the miner is connected, but this check is really meant to determine whether it has a good connection.

They will also add a "sessions" count as well.

We need to update the diagnostics to accommodate these changes once they are merged (helium/miner#868).

Redesign diagnostics page

Blocked by #60.

I want to give the diagnostics page a fresh new look.

Here's how it currently looks:
[Screenshot, 2021-06-16 at 9:50:42 AM]

@vladimirpoleshko can you start mocking up a new design that goes in line with the dashboard design? Doesn't have to have the exact same design but at least have a similar feel to it to make it consistent.

Height status higher than blockchain height - Miner still loading

For the last few releases I have noticed that the height status sometimes gets higher than the displayed blockchain height, and the sync percentage then shows "Miner Is Still Loading". This has confused a few customers.

Can we still show a 100% sync status when the height status is higher than the reported blockchain height, at least if it is less than 20 blocks?
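
Something along these lines could implement that rule. It is a sketch only: the function name is hypothetical, the 20-block tolerance is the figure suggested above, and returning `None` for larger leads (so the page can keep its current message) is an assumption.

```python
def clamped_sync_percent(miner_height, blockchain_height, max_lead=20):
    """Return the sync percent, reporting 100 when the miner is slightly ahead.

    Returns None when the miner leads by more than max_lead blocks or when
    the blockchain height is not positive, so the caller can show a
    separate message for those cases.
    """
    if blockchain_height <= 0:
        return None
    lead = miner_height - blockchain_height
    if lead > max_lead:
        return None
    if lead >= 0:
        return 100.0
    return round((miner_height / blockchain_height) * 100, 3)
```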

Password protect?

Like with WiFi routers, I think we should password protect this page. A few people have exposed it to the internet and were surprised to learn that it has sensitive info on it.

We could perhaps password protect it with the last 6 digits of the MAC address or RasPi serial.

And we could allow people to change the password but with a factory reset option to put it back to the default.
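
As a rough illustration of the default-plus-override idea, using the MAC address variant. Both function names are hypothetical, and a factory reset would simply clear `custom_password`.

```python
import hmac


def default_password(mac_address, length=6):
    """Derive the default password from the last characters of the MAC address."""
    digits = mac_address.replace(":", "").replace("-", "").lower()
    return digits[-length:]


def check_password(supplied, mac_address, custom_password=None):
    """Compare the supplied password with the custom one, else the default."""
    expected = custom_password if custom_password else default_password(mac_address)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

One thing to weigh up: the MAC address is visible to anyone on the same network, so the default mainly protects against exposure to the internet rather than local access.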
