
hm-diag's Introduction

hm-diag: Helium Miner Diagnostics Container

hm-diag is a small website that displays diagnostic information about a hotspot. The website is only accessible if you are on the same network as the hotspot. Some people have exposed their devices publicly but this is not generally advised.

Quick start

Find the IP address of the hotspot using Balena's dashboard or a network scanner. The website is served on port 80, so you can simply enter the hotspot's IP address in the browser.

Diagnostics JSON Layout

While it runs, the system produces a JSON file that is used to pass the diagnostic data to other parts of the system.

| Variable | Description |
| --- | --- |
| AN | The Animal Name of the miner |
| APPNAME | The name advertised on BTLE |
| BA | Balena Application Name |
| BCH | Current blockchain height |
| BN | Balena Name, used to identify on balena |
| BSP | Sync percentage |
| BT | If the bluetooth module is detected |
| BUTTON | The GPIO pin of the button on the miner |
| CELLULAR | Whether the device has optional cellular capability |
| E0 | MAC Address of the ETH0 interface |
| ECC | If the ECC Key is detected over I2C |
| ECCOB | If the miner should have an ECC chip on board |
| FR | The hardware frequency |
| FRIENDLY | The Friendly name of the hotspot |
| FW | Firmware running on the unit |
| ID | Balena UUID |
| LOR | If a fault has been found with the LoRa Module |
| LTE | If the LTE Module is detected |
| MAC | Which mac address to print on labels in production |
| MC | If the miner is connected to the Helium Network |
| MD | If the miner is "Dialable" on the network |
| MH | The sync height of the miner |
| MN | NAT Type of the miner |
| MR | Whether the miner is relayed or not |
| MS | If miner is synced within 500 blocks |
| OK | The onboarding key of the miner |
| PF | If overall diagnostics have passed |
| PK | The public key of the miner |
| RE | The detected region plan from the miner (or override) |
| RESET | The reset pin to use for the LoRa Module |
| serial_number | The serial number of the onboard Raspberry Pi or other SBC |
| SPIBUS | The SPI Bus to use for the LoRa Module |
| STATUS | The GPIO Pin of the status LED |
| TYPE | If it is a Full or Light Hotspot |
| VA | ID of the hardware variant |
| W0 | Mac Address of the WLAN0 interface |
| last_updated | When this JSON was last updated (UTC timezone) |
| firmware_short_hash | The related commit short hash of currently running firmware |
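
As a rough illustration of how another component might consume this file, the sketch below parses a small sample payload and builds a one-line summary. The sample values and the value types (booleans for MS, MR and PF, integers for the heights) are assumptions made for the example, and `summarise` is a hypothetical helper, not part of hm-diag.

```python
import json

# Hypothetical sample payload using a few of the documented keys.
# The values and their types are invented for illustration only.
SAMPLE = """
{
  "AN": "example-animal-name",
  "BCH": 1000000,
  "MH": 999800,
  "MS": true,
  "MR": false,
  "PF": true
}
"""


def summarise(diagnostics: dict) -> str:
    """Return a one-line summary built from the documented keys."""
    name = diagnostics.get("AN", "unknown")
    synced = "synced" if diagnostics.get("MS") else "not synced"
    relayed = "relayed" if diagnostics.get("MR") else "not relayed"
    passed = "pass" if diagnostics.get("PF") else "fail"
    return f"{name}: {synced}, {relayed}, diagnostics {passed}"


print(summarise(json.loads(SAMPLE)))
```

Using `dict.get` keeps the summary tolerant of keys that are missing on a particular hardware variant.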

Local development environment

Because the stack is tightly intertwined with Balena, the easiest way to test the code base is on your own Raspberry Pi in your own Balena project.

  • Create a new Balena application (in a personal org):
    • Default device type: Raspberry Pi 3 (using 64 bit OS)
    • Application type: Starter
  • Add a device:
    • Select newest version
    • Development (required for local mode)
    • Click Download Balena OS
  • Use Etcher to flash the downloaded image
  • Insert the flashed drive into the Raspberry Pi and boot it (don't forget to plug in Ethernet if necessary)
  • Set env vars for the application in Balena:
    • FREQ: 868, 915, etc.
    • VARIANT: Choose from here
  • Deploy changes to:
    • All devices in application: balena push BALENA_APPLICATION
    • Single device in local mode: balena push UUID.local (this will build on the device and )

If you are on the same network as the Raspberry Pi, enter the LOCAL IP ADDRESS shown in Balena into the browser.

Testing

poetry install --with dev
poetry run pytest --cov=hw_diag --cov=bigquery --cov-fail-under=80
poetry run ruff check hw_diag

Deprecated deployment

This is no longer the recommended way of doing Balena deployments.

  • Add the remote Balena repo: git remote add balena [email protected]:BALENA_USERNAME/BALENA_PROJECT.git
  • Deploy changes: git push balena YourLocalBranch:master

Access from other networks

Balena will generate a public URL for a device if PUBLIC DEVICE URL is toggled from the Balena device dashboard. This is not generally recommended, except for debugging.

Pre built containers

This repo automatically builds docker containers and uploads them to two repositories for easy access:

The images are tagged using the docker long and short commit SHAs for that release. The current version deployed to miners can be found in the helium-miner-software repo.

Light hotspot notes

The Helium network is transitioning to light hotspot mode. During the transition, a hotspot may go back and forth between light and full blockchain sync mode. We support an environment variable (DISPLAY_MINER_INFO) that disables the display of blockchain mining information, so that administrators can turn its visibility on or off to avoid confusion among their customers.

The environment variable defaults to false. If it is set, and set to anything other than true, the following mining information will be hidden:

  • Sync Percentage
  • Miner Connected To Blockchain
  • Height Status
  • Miner Relayed

This is a temporary capability that will be removed when hotspots move to gateway-rs. Validator information will be shown instead.
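
The toggle described above could be read along these lines. This is a minimal sketch, not the project's code: `should_display_miner_info` is a hypothetical helper, and it assumes that the comparison against true is case-insensitive and that an unset variable takes the stated default of false.

```python
import os
from typing import Mapping, Optional


def should_display_miner_info(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when DISPLAY_MINER_INFO is set to "true".

    Assumption: an unset variable behaves like the documented default
    ("false"), and the comparison ignores case and surrounding spaces.
    """
    env = os.environ if env is None else env
    value = env.get("DISPLAY_MINER_INFO", "false")
    return value.strip().lower() == "true"


if __name__ == "__main__":
    print(should_display_miner_info())
```

Passing the environment in as a mapping makes the behaviour easy to check without touching the real process environment.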

hm-diag's People

Contributors

ccrisan, dependabot[bot], ilyastrodubtsev, kashifpk, kerrryu, kevinwassermann94, louisreed, marvinmarnold, muratursavas, posterzh, pritamghanghas, robputt, ryan-goldstein, ryanteck, shawaj, vpetersson

hm-diag's Issues

Percentage error on relayed device?

This is a device that has been online for some time (many days) but is stuck syncing. It shows a really high percentage and reports that the blockchain is only on block 1.

Screenshot 2021-06-03 at 23 31 13

ConnectionError: HTTPSConnectionPool(host='api.helium.io', port=443): Max retries exceeded with url: /v1/blocks/he...

Sentry Issue: HM-DIAG-A

TimeoutError: [Errno 110] Operation timed out
  File "urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0xf65e3058>: Failed to establish a new connection: [Errno 110] Operation timed out
  File "urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "urllib3/connectionpool.py", line 976, in _validate_conn
    conn.connect()
  File "urllib3/connection.py", line 308, in connect
    conn = self._new_conn()
  File "urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(

MaxRetryError: HTTPSConnectionPool(host='api.helium.io', port=443): Max retries exceeded with url: /v1/blocks/height (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xf65e3058>: Failed to establish a new connection: [Errno 110] Operation timed out'))
  File "requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "urllib3/connectionpool.py", line 724, in urlopen
    retries = retries.increment(
  File "urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))

ConnectionError: HTTPSConnectionPool(host='api.helium.io', port=443): Max retries exceeded with url: /v1/blocks/height (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xf65e3058>: Failed to establish a new connection: [Errno 110] Operation timed out'))
(1 additional frame(s) were not displayed)
...
  File "requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)

Add reporting functionality to dashboard

In order to enrich the data in the dashboard, we need to send in some data that is only available on the device(s). We do however need to do this in a secure fashion. Here's how I am thinking this would work as of right now:

  • We create a new API end-point in the dashboard using DRF
  • We then do a simple POST to this end-point periodically from the device (TBD how we best run this)

While this all sounds simple, the devil is in the details. Credentials will be the trickiest part to solve, but here's how I'm currently envisioning it working:

  • We create a new authentication method for DRF that can take the RPI serial as the username and a token as the password
    • This set of credentials will only be able to update the status for itself (i.e. match on serial number)
  • We then create a Celery task that generates these sets for all devices
  • As part of the above Celery task, we also create a new environment variable for the particular device with the token (e.g. DASHBOARD_TOKEN).
    • If we ever wanted to rotate the credential for a given device, we simply remove the environment variable and the device Celery task will then auto-populate it again.
  • On the device side, we simply use the RPI serial and the new environment variable as the credentials when submitting the POST
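
The device-side step could be sketched as follows. Everything here is an assumption layered on the proposal: HTTP Basic authentication is one plausible way to send the serial as username and the token as password, the URL is a placeholder, and `build_report_request` is a hypothetical helper. The request is only built, not sent.

```python
import base64
import json
import urllib.request


def build_report_request(url: str, serial: str, token: str, payload: dict):
    """Build (but do not send) a POST carrying the diagnostics payload.

    Assumption: the dashboard accepts HTTP Basic auth with the RPi serial
    as the username and the DASHBOARD_TOKEN value as the password.
    """
    credentials = f"{serial}:{token}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and credentials, for illustration only.
request = build_report_request(
    "https://dashboard.example/api/report", "0000abcd", "s3cret", {"MS": True}
)
print(request.get_method(), request.full_url)
```

On a device, the token would come from the DASHBOARD_TOKEN environment variable, and the request would be sent with `urllib.request.urlopen(request, timeout=...)`.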

Pairing shows incorrect firmware version

According to the local diagnostics, the hotspot is running hotspot version 2021.06.10.0, but when pairing in the app it still shows firmware version 2021.06.09.4.

Height status higher than blockchain height - Miner still loading

For the last few releases I have noticed that the height status sometimes gets higher than the displayed blockchain height, and the sync percentage then shows "Miner Is Still Loading". This has confused a few customers.

Can we still show a 100% sync status when the height status is higher than the reported blockchain height, at least if it is less than 20 blocks?
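
One possible shape for that rule is sketched below. `sync_percentage` is a hypothetical helper; the 20-block tolerance comes from the suggestion above, and returning None (meaning "fall back to the existing handling") for larger gaps is an assumption.

```python
from typing import Optional


def sync_percentage(
    miner_height: int, blockchain_height: int, tolerance: int = 20
) -> Optional[float]:
    """Return the sync percentage, capped at 100 when slightly ahead.

    Returns None when no sensible percentage exists, so the caller can
    keep its current "still loading" behaviour (an assumption).
    """
    if blockchain_height <= 0:
        return None
    ahead = miner_height - blockchain_height
    if ahead > 0:
        # Miner reports a higher block than the API: treat a small lead
        # as fully synced, anything larger as unknown.
        return 100.0 if ahead < tolerance else None
    return round(miner_height / blockchain_height * 100, 3)


print(sync_percentage(1005, 1000))
print(sync_percentage(500, 1000))
```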

Improve error handling for wlan

It looks like the diagnostics tool is running just fine, but we're getting this error:

 diagnostics  ERROR:root:[Errno 2] No such file or directory: '/sys/class/net/wlan0/address'
 diagnostics  grep: /sys/bus/usb/devices/*/idVendor: No such file or directory
 diagnostics  grep: /sys/bus/usb/devices/*/idVendor: No such file or directory

We need to improve the error handling for this.
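
A minimal sketch of more forgiving handling, assuming it is acceptable to report a missing interface as None instead of logging an error: `read_mac_address` is a hypothetical helper, and the `sys_net` parameter exists only so the function can be exercised against a temporary directory.

```python
import logging
from pathlib import Path
from typing import Optional


def read_mac_address(
    interface: str, sys_net: Path = Path("/sys/class/net")
) -> Optional[str]:
    """Return the MAC address of an interface, or None if it is absent."""
    address_file = sys_net / interface / "address"
    try:
        return address_file.read_text().strip()
    except OSError as err:
        # FileNotFoundError is a subclass of OSError, so a device without
        # wlan0 ends up here instead of raising.
        logging.warning("No MAC address for %s: %s", interface, err)
        return None


print(read_mac_address("wlan0"))
```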

Related to #20

Ability to turn off UPnP?

We had a customer who said they were having issues with port forwarding on a Ubiquiti router because manual port forwarding and UPnP were both active. It might be useful to have a setting in diagnostics to deactivate UPnP on the miner.

Redesign diagnostics page

Blocked by #60.

I want to give the diagnostics page a fresh new look.

Here's how it currently looks:
Screen Shot 2021-06-16 at 9 50 42 AM

@vladimirpoleshko can you start mocking up a new design that goes in line with the dashboard design? Doesn't have to have the exact same design but at least have a similar feel to it to make it consistent.

Error logs

Was talking to Kevin and we had an idea that maybe we could add some error log view/download to the diagnostics to help with debugging, specifically the miner error log...

NebraLtd/hm-miner#2 (comment)

Password protect?

Like with WiFi routers I think we should password protect this page. Had a few people exposing it to the internet and being surprised to know that it's got sensitive info on it.

We could perhaps password protect it with the last 6 digits of the MAC address or the RasPi serial.

And we could allow people to change the password but with a factory reset option to put it back to the default.
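
Deriving such a default could look like the sketch below. `default_password` is a hypothetical helper, and lower-casing the result and stripping separators are assumptions, not decisions made in this issue.

```python
def default_password(mac_address: str, length: int = 6) -> str:
    """Return the last `length` hex digits of a MAC address, lower-cased."""
    digits = "".join(ch for ch in mac_address if ch.isalnum()).lower()
    if len(digits) < length:
        raise ValueError("MAC address is too short")
    return digits[-length:]


print(default_password("AA:BB:CC:DD:EE:FF"))
```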

division by zero error

also saw this one...

03.06.21 02:43:06 (+0100)  diagnostics  Traceback (most recent call last):
03.06.21 02:43:06 (+0100)  diagnostics    File "/opt/nebraDiagnostics/main.py", line 154, in <module>
03.06.21 02:43:06 (+0100)  diagnostics      diagnostics['BSP'] = round(((int(diagnostics['MH'])/int(diagnostics['BCH']))*100),3)
03.06.21 02:43:06 (+0100)  diagnostics  ZeroDivisionError: division by zero
03.06.21 02:43:10 (+0100)  diagnostics  rm: can't remove '/opt/nebraDiagnostics/html/index.html': No such file or directory
03.06.21 02:43:10 (+0100)  diagnostics  rm: can't remove '/opt/nebraDiagnostics/html/initFile.txt': No such file or directory
03.06.21 02:43:12 (+0100)  diagnostics  Diag Loop
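
The failing line divides by BCH, which is zero when the blockchain height could not be fetched. A guarded version might look like this sketch. `set_sync_percentage` is a hypothetical helper, and using None as the placeholder value is an assumption.

```python
def set_sync_percentage(diagnostics: dict) -> dict:
    """Fill in diagnostics['BSP'] without crashing when BCH is zero."""
    try:
        ratio = int(diagnostics["MH"]) / int(diagnostics["BCH"])
    except ZeroDivisionError:
        # Blockchain height unknown: leave the percentage undefined.
        diagnostics["BSP"] = None
    else:
        diagnostics["BSP"] = round(ratio * 100, 3)
    return diagnostics


print(set_sync_percentage({"MH": "500", "BCH": "1000"}))
print(set_sync_percentage({"MH": "1", "BCH": "0"}))
```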

Relayed status

Is it possible to show the "Miner Relayed" row in amber if the value is true, and also to indicate this in the main status? "All Ok" is not an accurate reflection of this state, as beaconing will not work.

I have found that the diagnostic page is the only place that shows the real time relayed state and that the explorer, Helium App, and dashboard all indicate that there is no relayed state even after many hours/days.

It may be also worth considering adding a link if the hotspot is relayed to a page that shows how to fix it.

Add miner diagnostics output to diagnostics system

Request the same diagnostics information from the miner that the diagnostics function in the app uses.

This would then include:

  • Whether the hotspot thinks it is connected to the Helium network
  • Whether the hotspot is "dialable" (I'm guessing not relayed)
  • The height of the hotspot
  • The internet connection type

Add more features / match with dashboard

I think it would be nice if the remote dashboard and the local dashboard converge to have similar features from a device info / device control perspective...

Things we could add:

  • reboot, shutdown, restart
  • relayed status
  • network statistics
  • Eth / WiFi status
  • WiFi network search and update
  • mDNS / NetBIOS search (nebra.local or macaddress.local or similar)
  • local / wan IP displayed for all interfaces
  • explore miner features
    • peer book
    • connect to new peers
    • take snapshot
    • restore snapshot
  • Customisable lora RF settings (rssi offset / antenna gain / tx power)

Probably a bunch more we could add, just wanted to get the conversation started 👌

Diagnostics crashing

I deployed master to testnet, and it's now crashing with the following error:

 diagnostics  Traceback (most recent call last):
 diagnostics    File "/opt/utils.py", line 122, in writing_data
 diagnostics      with open(path, 'w') as file:
 diagnostics  FileNotFoundError: [Errno 2] No such file or directory: '/opt/nebraDiagnostics/html/diagnostics.json'
 diagnostics  
 diagnostics  During handling of the above exception, another exception occurred:
 diagnostics  
 diagnostics  Traceback (most recent call last):
 diagnostics    File "/opt/main.py", line 254, in <module>
 diagnostics      main()
 diagnostics    File "/opt/main.py", line 245, in main
 diagnostics      write_info_to_files(prod_diagnostics, diagnostics)
 diagnostics    File "/opt/main.py", line 128, in write_info_to_files
 diagnostics      utils.writing_data(path, data)
 diagnostics    File "/opt/utils.py", line 125, in writing_data
 diagnostics      raise FileNotFoundError(
 diagnostics  FileNotFoundError: Directory does not exist in the path: /opt/nebraDiagnostics/html/diagnostics.json

I have since reverted the update.
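
The traceback shows the write failing because the parent directory is missing. One way to avoid that, sketched here with a hypothetical `write_json` helper in place of the project's `utils.writing_data`, is to create the directory before opening the file.

```python
import json
from pathlib import Path


def write_json(path, data) -> None:
    """Write data as JSON, creating missing parent directories first."""
    target = Path(path)
    # exist_ok avoids an error when the directory is already there.
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "w") as file:
        json.dump(data, file)
```

For example, `write_json("/opt/nebraDiagnostics/html/diagnostics.json", diagnostics)` would create `/opt/nebraDiagnostics/html` if needed, provided the process has permission to do so.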

Add "synced" call to show all ok

Currently we only show the sync percentage:
Screenshot_20210703-091908_Chrome

This can be confusing to people new to Helium as it's actually considered synced if it's within 500 blocks and this is how Helium display it on the explorer as shown here:
https://github.com/helium/explorer/blob/6c5539bc121b49f443838fad811d35100c3772a9/components/Hotspots/StatusPill.js#L33

We should add something similar so that we don't have people worrying that they aren't quite synced

Maybe "99.999% (Synced within 500 blocks)" or similar
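
A label along those lines could be produced as in the sketch below. `sync_label` is a hypothetical helper, the exact wording is only the suggestion above, and the function assumes a positive blockchain height.

```python
def sync_label(miner_height: int, blockchain_height: int, threshold: int = 500) -> str:
    """Format the sync percentage, noting when it counts as synced."""
    percentage = round(miner_height / blockchain_height * 100, 3)
    if blockchain_height - miner_height <= threshold:
        return f"{percentage}% (Synced within {threshold} blocks)"
    return f"{percentage}%"


print(sync_label(999_990, 1_000_000))
print(sync_label(500_000, 1_000_000))
```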

dbus warning message

Saw this on a unit today...

03.06.21 02:32:24 (+0100)  diagnostics  Diag Loop
03.06.21 02:32:25 (+0100)  diagnostics  ERROR:dbus.proxies:Introspect error on :1.94:/: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NotSupported: org.freedesktop.DBus.Introspectable.Introspect
03.06.21 02:32:26 (+0100)  diagnostics  Frequency: US915
03.06.21 02:32:26 (+0100)  diagnostics  

Rework "connected to blockchain" status

Helium are updating the "connected" metric in p2p_status to change "connected" to "well-connected" to reflect the fact that in principle 1 connection means it is connected but this check is really meant to determine if it has a good connection.

They will also add a "sessions" count as well.

We need to update the diagnostics to accommodate these changes once they are merged (helium/miner#868).

RPi Zero serial number fails

Error log:

21.04.21 21:17:59 (+0100)  diagnostics  Traceback (most recent call last):
21.04.21 21:17:59 (+0100)  diagnostics    File "/opt/nebraDiagnostics/diagnosticsProgram.py", line 55, in <module>
21.04.21 21:17:59 (+0100)  diagnostics      diagnostics["RPI"] = open("/proc/cpuinfo")\
21.04.21 21:17:59 (+0100)  diagnostics  IndexError: list index out of range

I think I got the serial number by going to a specific line which works on the 3, 3+, CM3s and 4s but as the Zero is only single core it fails.

Need to tweak so it gets it slightly differently.

https://github.com/NebraLtd/hm-diag/blob/master/diagnostics-program/diagnosticsProgram.py#L55
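
Looking the value up by its key, not by line position, should work regardless of core count. The sketch below is one way to do that, not the project's fix: `parse_serial` is a hypothetical helper that takes the text of /proc/cpuinfo and returns the value of the Serial field.

```python
from typing import Optional


def parse_serial(cpuinfo_text: str) -> Optional[str]:
    """Return the value of the 'Serial' field, or None if it is missing."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Serial":
            return value.strip()
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            print(parse_serial(cpuinfo.read()))
    except OSError:
        print("cpuinfo not available on this system")
```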

Fix master

Some of the recent changes broke master, and we are no longer writing the files properly:

/opt/html # ls
bootstrap.min.css
/opt # ls diagnostics.json 
diagnostics.json
/opt # ls initFile.txt 
initFile.txt

Most likely, this was introduced in #68 as this was a large rewrite.

Possibly change webserver software

To possibly tighten down security it could be worth changing to a better webserver software.

Currently this is using simplehttpd which has the benefit of being extremely lightweight.

However, we need to quickly check whether it is secure enough, as it might be perfectly fine.

It is just a HTTP Server, no post or get requests are being processed either.

Alternatives include:
Nginx
Lighttpd
Apache.

Rewrite service to use Flask

Let's rewrite the diagnostics page to use Flask, which gives us a lot of free features out of the box. We should then serve this app using Gunicorn.

To keep things simple, let's ditch the Nginx container for the time being and serve Gunicorn on port 8000 with Docker exposing it on port 80. That way we don't need to run the container as root.

This is a prerequisite for #39.

Send to support functionality?

Perhaps a button we can get people to press that downloads a txt file with diagnostics info to send us.

Similar to what's in the app
