trusat / trusat-backend
License: Apache License 2.0
Currently, some of these large queries take ~1s. To get timings, run `pytest -s --durations 20`.
Query times could doubtless be improved (rough target: take 90% less time) with some schema and query optimisation.
Currently, we have a few database queries that format timestamps into human-readable dates, like 'observation_time': 'September 18, 2016'.
I get that we have to serialise to some format to send over JSON, but compared with something like:
'observation_time': '2016-09-18T10:22:44.411Z'
the current format seems like a weird choice for several reasons:
Surely it's better to pass the date in a format that can be automatically formatted back into a JavaScript date, so that the front end can format as it sees fit without back-end changes. (E.g. there's a fashion for using "moments", like "5 hours ago" or "6 months ago".)
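For illustration, a minimal sketch of serialising timestamps as ISO 8601 (assuming a Python backend; `to_iso_utc` is a hypothetical helper, not existing code):

```python
from datetime import datetime, timezone

def to_iso_utc(dt: datetime) -> str:
    """Serialise a datetime as an ISO 8601 UTC string with millisecond
    precision, e.g. '2016-09-18T10:22:44.411Z'. Hypothetical helper."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

# JavaScript's `new Date(...)` parses this directly, so the front end can
# render "September 18, 2016", "5 hours ago", etc. without back-end changes.
print(to_iso_utc(datetime(2016, 9, 18, 10, 22, 44, 411000)))
# → 2016-09-18T10:22:44.411Z
```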
If there is more than one TLE with the joint-max epoch for a given satellite, then the epoch queries will return duplicates.
(On my test DB, there are two such objects.)
We should probably return only one TLE for each object. Not clear how to choose...
```sql
with max_epochs as (
    select satellite_number as satnum, max(epoch) as max_epoch
    from TLE
    group by satellite_number)
select epoch, satellite_number, count(*) as c
from TLE inner join max_epochs on (epoch = max_epochs.max_epoch and satellite_number = max_epochs.satnum)
group by epoch, satellite_number
order by c desc;
```
To improve security, the RSA private key should be stored as an encrypted file. The key used to decrypt this file should be supplied via an environment variable.
For security (https://www.explainxkcd.com/wiki/index.php/327:_Exploits_of_a_Mom): otherwise malicious users can probably delete all our data.
Perhaps also for performance: in an Oracle database, using bind variables allows query execution plans to be shared between queries that are identical apart from the bind variables, which can significantly speed up execution. Not sure about MariaDB.
See https://github.com/consensys-space/sathunt-database/pull/23/commits/9c861dc562dbac15629b54bb0ba64f59feb8c7ac for an example.
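A minimal illustration of the parameterised-query pattern, using Python's stdlib `sqlite3` rather than the real MariaDB driver (where the placeholder is `%s` instead of `?`); the table and data are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Observer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO Observer VALUES (1, 'Robert')")

def find_observer(conn, name):
    # The driver binds `name` as data, never as SQL text, so input like
    # "Robert'); DROP TABLE Observer;--" cannot alter the statement.
    cur = conn.execute("SELECT id FROM Observer WHERE name = ?", (name,))
    return cur.fetchall()

print(find_observer(conn, "Robert"))                            # [(1,)]
print(find_observer(conn, "Robert'); DROP TABLE Observer;--"))  # []
```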
There are many places where the API code catches exceptions, then prints details of an error and returns "200 OK" regardless.
I'm not sure how this works with the front-end but my suspicion is that errors will often go undetected. In some cases we probably just show no data rather than an error message. In other cases it may appear that actions were successful when they really failed.
Related to this, do we (i.e. TruSat as an organisation, rather than our users) have a way of detecting and investigating errors that happen during API calls?
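One possible pattern, as a sketch (not the existing server code; `NotFoundError` and `run_api_call` are hypothetical names), for translating exceptions into status codes instead of printing and returning 200:

```python
import json

class NotFoundError(Exception):
    """Hypothetical domain error for missing rows."""

def run_api_call(handler):
    """Run a request handler and translate exceptions into HTTP status
    codes rather than swallowing them and returning 200. Sketch only."""
    try:
        return 200, json.dumps(handler())
    except NotFoundError:
        return 404, json.dumps({"error": "not found"})
    except Exception as err:
        # Log server-side so TruSat can detect and investigate the error;
        # don't leak internal details to the client.
        print(f"internal error: {err!r}")
        return 500, json.dumps({"error": "internal server error"})

status, body = run_api_call(lambda: {"ok": True})
print(status)  # 200
```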
The user count, user name and last seen date all appear to be wrong. (E.g. see this Slack thread).
The query just seems not quite right. GROUP BY groups by user but doesn't pick the max observation date for each user, so the row chosen per user ends up being arbitrary. And the "user count" seems to reflect the number of observations made by this arbitrarily selected user, rather than the number of distinct users.
I haven't checked the other data.
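An illustrative `sqlite3` sketch of the intended behaviour, with made-up table and column names: take MAX(obs_time) per user and count distinct users, rather than letting the database choose an arbitrary row per group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParsedIOD (user_string TEXT, obs_time TEXT);
INSERT INTO ParsedIOD VALUES
  ('alice', '2019-10-01'), ('alice', '2019-11-05'),
  ('bob',   '2019-09-12');
""")

# Last-seen date per user: GROUP BY the user and take MAX(obs_time).
last_seen = conn.execute(
    "SELECT user_string, MAX(obs_time) FROM ParsedIOD "
    "GROUP BY user_string ORDER BY user_string"
).fetchall()

# Distinct user count, not the per-user observation count.
(user_count,) = conn.execute(
    "SELECT COUNT(DISTINCT user_string) FROM ParsedIOD"
).fetchone()

print(last_seen)   # [('alice', '2019-11-05'), ('bob', '2019-09-12')]
print(user_count)  # 2
```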
The following reference should be taken from the Observer.name field (the query needs to be updated).
We should create a lookup function to facilitate "origin" info when UCS does not have the data but Celestrak SATCAT does.
The fields in the celestrak_SATCAT.source field are based on the fields here: https://www.celestrak.com/satcat/sources.php
These fields don't exactly map to the UCS "obj_country_owner" field, but are closer to "owner / operator"
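A hedged sketch of what such a lookup might look like (the field names `obj_country_owner` and `source` come from the text above; the function name and record shapes are assumptions):

```python
def lookup_origin(ucs_record, satcat_record):
    """Return origin info, preferring UCS's obj_country_owner and falling
    back to Celestrak SATCAT's source field. Sketch with assumed shapes."""
    origin = (ucs_record or {}).get("obj_country_owner")
    if origin:
        return origin
    # SATCAT "source" codes are closer to "owner / operator" than to a
    # country of origin, so flag the fallback for downstream consumers.
    source = (satcat_record or {}).get("source")
    return f"{source} (per Celestrak SATCAT)" if source else None

print(lookup_origin({}, {"source": "US"}))  # US (per Celestrak SATCAT)
```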
Following unit test updates, both SeeSat archive import scripts appear to become "paused" mid-import, as if Ctrl-S had been pressed in the terminal. Ctrl-C in the terminal allows them to continue, but it is unclear what the underlying cause is, or whether the Ctrl-C causes data loss.
Debugger sessions have yet to pinpoint the location of the problem.
From terminal feedback with the -V flag:
https://github.com/consensys-space/trusat-backend/blob/257bf4606147dbd927f3c3b9acf453b475c22656/database_tools/read_seesat_mbox.py#L346
...the script appears to consistently "pause" after the following lines:
```
Found 67 IOD obs in msg: 2015-03-19 23:23:00+01:00 LB Obs 2015 Mar 19
Found 162 IOD obs in msg: 2018-04-21 10:05:40+02:00 LB Obs 2018 Apr 20-21 night
Found 2 IOD obs in msg: 2019-03-27 22:56:50+01:00 Obs 2019 Mar 27 pm
Found 3 IOD obs in msg: 2019-09-20 07:28:25-04:00 slow moving unid seen on Sept 19
```
Currently the frontend does not use HTTP response status codes for error handling. This needs to be fixed at the same time.
Reference Celestrak_SATCAT database to prevent objects with "ops_status_code=D" from being returned in query results and TLE files.
From a code read, it looks like any failure to send emails to new users will happen silently: send_message prints an error message, but then the API will return 200 and the front-end will think everything is OK.
The email we send says "This email is the only time we can send you this code" so if that's true (is it? what makes it true?) then failed emails could leave users without any access to their accounts.
Some catalog queries are missing the GROUP BY OBJECT_NUMBER
altogether. See https://github.com/consensys-space/sathunt-database/issues/20.
The catalog queries that do have a GROUP BY clause should be checked to make sure that they select the correct observation details (obs time, user, eth address and obs quality). I cannot see any attempt to choose the most recent observation so I suspect they do not.
Move the following functionality into database.py; it shares addParsedIOD()'s vulnerability with executemany() inserts. https://github.com/consensys-space/trusat-tle/blob/48bac766bbcc8db23c043393d3b7034c24a807de/tle_util.py#L968-L973
Add indexes for the following columns:
Currently, sensitive information is stored in files, which is less secure than moving it to environment variables.
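A minimal sketch of the environment-variable approach (the variable name `TRUSAT_DB_PASSWORD` is an assumption, not an existing setting):

```python
import os

def get_db_password() -> str:
    """Read the DB password from the environment instead of a file on
    disk. The variable name here is an assumption."""
    try:
        return os.environ["TRUSAT_DB_PASSWORD"]
    except KeyError:
        raise RuntimeError("TRUSAT_DB_PASSWORD is not set") from None

# In deployment the variable would be set by the environment, not in code.
os.environ["TRUSAT_DB_PASSWORD"] = "example-only"
print(get_db_password())  # example-only
```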
I'm using this issue to note some misc questions/concerns about some coding patterns in this repo, such as:

- Broad `except` blocks, where most should only catch known, anticipated errors to avoid swallowing something truly unexpected that needs our attention
- `except` blocks that silently swallow and hide errors
- `server.py` relying on `login.txt`, where environment variables might be more appropriate
- `self.send_header('Access-Control-Allow-Origin', '*')` is repeated in many places, and all will need to change in the same way if/when we update our CORS policy
- `database.py` using `except` where it may be preferable to check for a non-exceptional edge case (e.g. row count = 0) in advance (e.g. d6ab307)

A lot of this stuff can be fixed as and when the code is updated or refactored for other reasons.
This will prevent re-distribution of other TLEs which may be in the catalog for test purposes.
Python's built-in HTTP server only implements basic security checks, as stated in its documentation.
Similar to #10, the user queries `selectUserObservationHistory_JSON`, `selectObjectUserSightings_JSON`, and `selectUserObjectsObserved_JSON` seem like they probably limit results to a single station (untested).
Some of the queries also seem to return duplicate results containing inconsistent info.
This bug may be most visible to our most important users, some of whom seem to use 10 or 11 different stations.
Put a timeout (rate limit) on the Signup and Claim endpoints to prevent users from sending excessive emails to a target address.
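One way this could look, as a sketch (the class name and the 10-minute window are assumptions, not existing code): a per-address cooldown consulted before sending signup/claim emails:

```python
import time

class EmailCooldown:
    """Reject repeat signup/claim emails to the same address within a
    cooldown window. Sketch; the window length is an assumption."""

    def __init__(self, window_seconds=600, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.last_sent = {}

    def allow(self, address: str) -> bool:
        """Return True (and record the send) if this address is outside
        its cooldown window, else False."""
        now = self.clock()
        last = self.last_sent.get(address)
        if last is not None and now - last < self.window:
            return False
        self.last_sent[address] = now
        return True

limiter = EmailCooldown()
print(limiter.allow("user@example.com"))  # True
print(limiter.allow("user@example.com"))  # False (within the window)
```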
The check-box used to confirm "no password reset" renders too small on #mobile. @johngribbin
When a user submits an observation, the results are not visible until they refresh the page.
Example:
https://www.trusat.org/object/22970
The Observations tab shows "old" results, whereas the History and "My Observations" tabs show newer, fitted results - and newer TLEs exist than the observations shown under "Observations."
Also, it seems to favor observations by SolarLyra on 31 October.
Some `station_num`s have multiple entries in the Station table (see below). This is unsatisfactory, as it theoretically leaves us unable to link an observation to a user, and when linking observations to users the resulting need for de-duplication makes queries more complex and slower (or, more commonly, where there is currently no de-duplication, it just makes them wrong).
Two possibilities:
- Make `station_num` a unique key on the `Station` table. When we know there is only one row per station in the `Station` table, queries become easier (or, in some cases, the existing query syntax becomes correct rather than bugged :).

Note: some queries attempt a `LIMIT 1` operation to choose a single user, but do so in a bugged way: #10.
station_num | user | initial | latitude | longitude | elevation_m | name | MPC | details | preferred_format | source_url | notes
---|---|---|---|---|---|---|---|---|---|---|---
8835 | 31 | | 35.1477 | 90.0135 | 75 | IDSat | | | | IDSat COSPAR.TXT |
8835 | 31 | JN | 35.1477 | -90.0135 | 75 | Jim Nix | | | IOD | http://www.satobs.org/seesat/Jan-2002/0276.html |
8935 | 31 | JN | 35.3166 | -89.886 | 94 | Jim Nix | | | IOD | http://www.satobs.org/seesat/Jun-1999/0356.html | Added
8935 | 31 | | 35.3166 | 89.886 | 94 | IDSat | | | | IDSat COSPAR.TXT |
8936 | 31 | JN | 35.1231 | -89.9354 | 90 | Jim Nix | | | IOD | http://www.satobs.org/seesat/Sep-2000/0311.html | Added
8936 | 31 | | 35.2131 | 89.9354 | 90 | IDSat | | | | IDSat COSPAR.TXT |
9127 | 39 | | -27.591 | -153.05 | 19 | IDSat | | | | IDSat COSPAR.TXT |
9127 | 39 | JM | -27.591 | 153.05 | 19 | Jim McManus | | | IOD | http://www.satobs.org/seesat/Aug-2001/0200.html |
The following query produces an on-demand relation of object info from three sources - Celestrak Categories, Celestrak SATCAT info, and UCS DB.
It would be more efficient to perform this query just ONCE, whenever one of those data sources is updated, and store the result in a table or static form.
As suggested in #9:
opensatcat -> trusat_production
opensatcat_dev -> trusat_dev
Example:
https://www.trusat.org/profile/0x9489ebC9a5f5Cfb17247709F9dF29B2064d1FE1e
Shows 3 Observations, when only two are present in the database. The most likely scenario is that it is counting the number of IODs that have been used in TLEs.
User information is present in the snapshots and tests; it needs to be removed.
Objects like https://devvymcdevface.trusat.org/object/96041 will fail because they have no recorded observations.
google_email.py has repeated and deprecated code. Error handling should also be improved.
It looks like the Celestrak website format has been updated (perhaps when the new visualization option was added). As a result, the categorize.py script no longer finds the links on the page in the following code section:
Only the `/catalog/debris` and `/catalog/undisclosed` queries group results by object ID before returning.
Other `/catalog/...` queries will return many results per object, perhaps more than one per IOD, because observations from stations with more than one user will list an observation per user rather than per station.
The intention seems to be to return one result per object?
If we introduce a `GROUP BY` clause to the other catalog queries, we must take care to select the correct observation details (obs time, user, eth address and obs quality). (Similarly to the "multiple users per station" problem, we must take care not to return many results per object if an object has multiple IODs with the same "latest" timestamp.)
In order to prevent re-processing of data for Stations (users) who have opted out of TruSat, a number of enhancements should be implemented to prevent this data from being processed in TruSat.org and potentially other future decentralized implementations.
At present, this would appear to apply to:
If the data is manually removed from the database, this should prevent re-introduction of the data in future import sessions.
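A sketch of one such enhancement, assuming an opt-out list keyed by station number (the field and function names here are made up): skip opted-out stations at import time so removed data is not re-introduced:

```python
def filter_opted_out(observations, opted_out_stations):
    """Drop observations from stations whose owners opted out, so manually
    removed data is not re-introduced by later import sessions.
    The 'station_number' field name is an assumption."""
    return [obs for obs in observations
            if obs["station_number"] not in opted_out_stations]

obs = [{"station_number": 8835}, {"station_number": 9127}]
print(filter_opted_out(obs, {9127}))  # [{'station_number': 8835}]
```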
observer_id INTEGER,
I think the API should return headers that encourage browsers to cache API results, even if it's only for 5mins*. This is pretty trivial to implement, and could greatly improve UX. (It should also reduce database/API load significantly, which isn't an issue right now but could be in the future).
For bonus points, some data (e.g. historical observations) could be safely cached for much longer periods.
* I chose 5mins because I saw user stories like "changes must be seen within 5mins".
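A minimal sketch of such headers (the helper name is assumed); in the existing `BaseHTTPRequestHandler`-style server code they could be emitted via `self.send_header`:

```python
def cache_headers(max_age_seconds=300):
    """Headers encouraging browsers to cache API responses. The 300s
    default matches the 'changes must be seen within 5mins' user story."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}

# In a BaseHTTPRequestHandler subclass this might be used as:
#   for name, value in cache_headers().items():
#       self.send_header(name, value)
print(cache_headers())  # {'Cache-Control': 'public, max-age=300'}
```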
Similar to #10, the `selectObjectInfluence_JSON` query seems like it probably limits results to a single station (untested).
Make acceptance observations contingent on opt-in of station owner.
As written, this only allows:
- Common Data Block
- Date of Observation
- First record

...where there can be multiple observation dates, and multiple records, per submission.
Recommended solution is to update iod.py IOD/RDE/UK parsing routines to also return an array of lines indicating which lines succeeded, and which lines failed to import.
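A sketch of the recommended shape (the function names are stand-ins, not iod.py's actual API): parse every submitted line and report per-line success:

```python
def parse_iod_lines(lines, parse_one):
    """Parse each submitted line, returning (records, successes) where
    successes[i] says whether lines[i] parsed, instead of stopping after
    the first record. `parse_one` is a stand-in for iod.py's parser."""
    records, successes = [], []
    for line in lines:
        try:
            records.append(parse_one(line))
            successes.append(True)
        except ValueError:
            successes.append(False)
    return records, successes

def toy_parser(line):
    """Toy stand-in parser: 'parse' lines that start with a digit."""
    if not line[:1].isdigit():
        raise ValueError("unrecognised line")
    return line.split()[0]

records, ok = parse_iod_lines(["12345 0001 ...", "garbage"], toy_parser)
print(records, ok)  # ['12345'] [True, False]
```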
Also suggest renaming the AWS / RDS database name instances at the same time:
opensatcat -> trusat_production
opensatcat_dev -> trusat_dev
This was doing the wrong thing so I had to take a best guess at the requirements when rewriting it.
As of my fix:
- Notably, `observation_quality` refers to the observation made by the selected user.
- Notably, `username_last_tracked`, `time_last_tracked`, and `address_last_tracked` all refer to the most recent observation of the same object by any user.
- `object_secondary_purpose` was returning a placeholder string mentioning team members, so I took it out and we now return the empty string until we decide what to do with this field.
- `observation_quality` was returning a code like "E", which I updated to a short description like "excellent".

Not 100% clear that I've got those requirements right?
If I have, is it weird that we don't tell the selected user the date/time at which they last observed each object?
Attempted to follow the sign-up flow and received the email with my secret. But as the endpoint returned a CORS error, the front end never showed the "success" message telling the user to check their email for further details on how to log in.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://api.consensys.space:8080/signup. (Reason: CORS request did not succeed).
https://github.com/consensys-space/trusat-tle/issues/4
Unit tests from @nmclrn would suggest that this is not entirely resolved.
The catalog queries all make use of the following subquery:
```sql
JOIN (SELECT
    Station.station_num as station_num,
    Station.user as station_user,
    Observer.id as obs_id,
    Observer.eth_addr as eth_addr,
    Observer.name as user_name
    FROM Station,Observer
    WHERE Station.user = Observer.id
    LIMIT 1) Obs ON ParsedIOD.station_number = Obs.station_num
```
I think the `LIMIT 1` here is a bug. It has the effect of limiting the whole catalog result set to a single station number (on my DB, `ParsedIOD.station_number = 2420`), which I don't think is what we want. I believe I've inadvertently fixed some cases as part of query optimization, but other cases need attention.
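An `sqlite3` toy reproduction of the effect, with the schema trimmed to the relevant columns (table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Station (station_num INTEGER, user INTEGER);
CREATE TABLE ParsedIOD (station_number INTEGER, obj INTEGER);
INSERT INTO Station VALUES (2420, 1), (8835, 2);
INSERT INTO ParsedIOD VALUES (2420, 100), (8835, 200);
""")

# With LIMIT 1 the subquery keeps a single arbitrary station, so the join
# silently drops every other station's observations.
buggy = conn.execute("""
    SELECT ParsedIOD.obj FROM ParsedIOD
    JOIN (SELECT station_num FROM Station LIMIT 1) Obs
      ON ParsedIOD.station_number = Obs.station_num
""").fetchall()

# Without it, every station's observations survive the join.
fixed = conn.execute("""
    SELECT ParsedIOD.obj FROM ParsedIOD
    JOIN (SELECT station_num FROM Station) Obs
      ON ParsedIOD.station_number = Obs.station_num
""").fetchall()

print(len(buggy), len(fixed))  # 1 2
```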
For any given object, they pick a TLE arbitrarily.
Several probable issues with this:
More generally, it's worth being acutely aware of how difficult security is in JS. Much of the following article does not apply to us, and some of it is out of date, but some of it is relevant and it's a fun read. :)
https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2011/august/javascript-cryptography-considered-harmful/