trusat / trusat-backend
License: Apache License 2.0
Currently, some of these large queries take ~1s. To get timings, run `pytest -s --durations 20`.
Query times could doubtless be improved (rough target: take 90% less time) with some schema and query optimisation.
Currently, we have a few database queries that format timestamps into human-readable dates, like 'observation_time': 'September 18, 2016'.
I get that we have to serialise to some format to send over JSON, but compared with something like:
'observation_time': '2016-09-18T10:22:44.411Z'
the current format seems like a weird choice for several reasons:
Surely it's better to pass the date in a format that can be automatically formatted back into a JavaScript date, so that the front end can format as it sees fit without back-end changes. (E.g. there's a fashion for using "moments", like "5 hours ago" or "6 months ago".)
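For illustration, a minimal sketch of serialising timestamps as ISO 8601 (assuming a Python backend; `to_iso_utc` is a hypothetical helper, not existing code):

```python
from datetime import datetime, timezone

def to_iso_utc(dt: datetime) -> str:
    """Serialise a datetime as an ISO 8601 UTC string with millisecond
    precision, e.g. '2016-09-18T10:22:44.411Z'. Hypothetical helper."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

# JavaScript's `new Date(...)` parses this directly, so the front end can
# render "September 18, 2016", "5 hours ago", etc. without back-end changes.
print(to_iso_utc(datetime(2016, 9, 18, 10, 22, 44, 411000)))
# → 2016-09-18T10:22:44.411Z
```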
If there is more than one TLE with the joint-max epoch for a given satellite, then the epoch queries will return duplicates.
(On my test DB, there are two such objects.)
We should probably return only one TLE for each object. Not clear how to choose...
```sql
with max_epochs as (
    select satellite_number as satnum, max(epoch) as max_epoch
    from TLE
    group by satellite_number)
select epoch, satellite_number, count(*) as c
from TLE inner join max_epochs on (epoch = max_epochs.max_epoch and satellite_number = max_epochs.satnum)
group by epoch, satellite_number
order by c desc;
```
To improve security, the RSA private key should be stored as an encrypted file. The key used to decrypt this file should be supplied via an environment variable.
For security (https://www.explainxkcd.com/wiki/index.php/327:_Exploits_of_a_Mom): otherwise malicious users can probably delete all our data.
Perhaps also for performance: in an Oracle database, using bind variables allows query execution plans to be shared between queries that are identical apart from the bind variables, which can significantly speed up execution. Not sure about MariaDB.
See https://github.com/consensys-space/sathunt-database/pull/23/commits/9c861dc562dbac15629b54bb0ba64f59feb8c7ac for an example.
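A minimal illustration of the parameterised-query pattern, using Python's stdlib `sqlite3` rather than the real MariaDB driver (where the placeholder is `%s` instead of `?`); the table and data are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Observer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO Observer VALUES (1, 'Robert')")

def find_observer(conn, name):
    # The driver binds `name` as data, never as SQL text, so input like
    # "Robert'); DROP TABLE Observer;--" cannot alter the statement.
    cur = conn.execute("SELECT id FROM Observer WHERE name = ?", (name,))
    return cur.fetchall()

print(find_observer(conn, "Robert"))                            # [(1,)]
print(find_observer(conn, "Robert'); DROP TABLE Observer;--"))  # []
```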
There are many places where the API code catches exceptions, then prints details of an error and returns "200 OK" regardless.
I'm not sure how this works with the front-end but my suspicion is that errors will often go undetected. In some cases we probably just show no data rather than an error message. In other cases it may appear that actions were successful when they really failed.
Related to this, do we (i.e. TruSat as an organisation, rather than our users) have a way of detecting and investigating errors that happen during API calls?
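One possible pattern, as a sketch (not the existing server code; `NotFoundError` and `run_api_call` are hypothetical names), for translating exceptions into status codes instead of printing and returning 200:

```python
import json

class NotFoundError(Exception):
    """Hypothetical domain error for missing rows."""

def run_api_call(handler):
    """Run a request handler and translate exceptions into HTTP status
    codes rather than swallowing them and returning 200. Sketch only."""
    try:
        return 200, json.dumps(handler())
    except NotFoundError:
        return 404, json.dumps({"error": "not found"})
    except Exception as err:
        # Log server-side so TruSat can detect and investigate the error;
        # don't leak internal details to the client.
        print(f"internal error: {err!r}")
        return 500, json.dumps({"error": "internal server error"})

status, body = run_api_call(lambda: {"ok": True})
print(status)  # 200
```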
The user count, user name and last seen date all appear to be wrong. (E.g. see this Slack thread).
The query just seems not quite right. GROUP BY groups by user but doesn't pick the max observation date for each user, so the row chosen per user ends up being arbitrary. And the "user count" seems to reflect the number of observations made by this arbitrarily selected user, rather than the number of distinct users.
I haven't checked the other data.
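An illustrative `sqlite3` sketch of the intended behaviour, with made-up table and column names: take MAX(obs_time) per user and count distinct users, rather than letting the database choose an arbitrary row per group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParsedIOD (user_string TEXT, obs_time TEXT);
INSERT INTO ParsedIOD VALUES
  ('alice', '2019-10-01'), ('alice', '2019-11-05'),
  ('bob',   '2019-09-12');
""")

# Last-seen date per user: GROUP BY the user and take MAX(obs_time).
last_seen = conn.execute(
    "SELECT user_string, MAX(obs_time) FROM ParsedIOD "
    "GROUP BY user_string ORDER BY user_string"
).fetchall()

# Distinct user count, not the per-user observation count.
(user_count,) = conn.execute(
    "SELECT COUNT(DISTINCT user_string) FROM ParsedIOD"
).fetchone()

print(last_seen)   # [('alice', '2019-11-05'), ('bob', '2019-09-12')]
print(user_count)  # 2
```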
The following reference should be taken from the Observer.name field (the query needs to be updated).
We should create a lookup function to facilitate "origin" info when UCS does not have the data but Celestrak SATCAT does.
The fields in the celestrak_SATCAT.source field are based on the fields here: https://www.celestrak.com/satcat/sources.php
These fields don't exactly map to the UCS "obj_country_owner" field, but are closer to "owner / operator"
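A hedged sketch of what such a lookup might look like (the field names `obj_country_owner` and `source` come from the text above; the function name and record shapes are assumptions):

```python
def lookup_origin(ucs_record, satcat_record):
    """Return origin info, preferring UCS's obj_country_owner and falling
    back to Celestrak SATCAT's source field. Sketch with assumed shapes."""
    origin = (ucs_record or {}).get("obj_country_owner")
    if origin:
        return origin
    # SATCAT "source" codes are closer to "owner / operator" than to a
    # country of origin, so flag the fallback for downstream consumers.
    source = (satcat_record or {}).get("source")
    return f"{source} (per Celestrak SATCAT)" if source else None

print(lookup_origin({}, {"source": "US"}))  # US (per Celestrak SATCAT)
```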
Following unit test updates, both SeeSat archive import scripts appear to become "paused" mid-import, as if Ctrl-S had been pressed in the terminal. Ctrl-C in the terminal allows them to continue, but it is unclear what the underlying cause is, or whether the Ctrl-C causes data loss.
Debugger sessions have yet to pinpoint the location of the problem.
From terminal feedback with the -V flag:
https://github.com/consensys-space/trusat-backend/blob/257bf4606147dbd927f3c3b9acf453b475c22656/database_tools/read_seesat_mbox.py#L346
...the script appears to consistently "pause" after the following lines:
```
Found 67 IOD obs in msg: 2015-03-19 23:23:00+01:00 LB Obs 2015 Mar 19
Found 162 IOD obs in msg: 2018-04-21 10:05:40+02:00 LB Obs 2018 Apr 20-21 night
Found 2 IOD obs in msg: 2019-03-27 22:56:50+01:00 Obs 2019 Mar 27 pm
Found 3 IOD obs in msg: 2019-09-20 07:28:25-04:00 slow moving unid seen on Sept 19
```
Currently the frontend does not use HTTP response status codes for error handling. This needs to be fixed at the same time.
Reference Celestrak_SATCAT database to prevent objects with "ops_status_code=D" from being returned in query results and TLE files.
From a code read, it looks like any failure to send emails to new users will happen silently: send_message prints an error message, but then the API will return 200 and the front-end will think everything is OK.
The email we send says "This email is the only time we can send you this code" so if that's true (is it? what makes it true?) then failed emails could leave users without any access to their accounts.
Some catalog queries are missing the GROUP BY OBJECT_NUMBER
altogether. See https://github.com/consensys-space/sathunt-database/issues/20.
The catalog queries that do have a GROUP BY clause should be checked to make sure that they select the correct observation details (obs time, user, eth address and obs quality). I cannot see any attempt to choose the most recent observation so I suspect they do not.
Move the following functionality into database.py; it shares addParsedIOD()'s vulnerability with executemany() inserts. https://github.com/consensys-space/trusat-tle/blob/48bac766bbcc8db23c043393d3b7034c24a807de/tle_util.py#L968-L973
Add indexes for the following columns:
Currently, sensitive information is stored in files, which is less secure than moving it to environment variables.
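A minimal sketch of the environment-variable approach (the variable name `TRUSAT_DB_PASSWORD` is an assumption, not an existing setting):

```python
import os

def get_db_password() -> str:
    """Read the DB password from the environment instead of a file on
    disk. The variable name here is an assumption."""
    try:
        return os.environ["TRUSAT_DB_PASSWORD"]
    except KeyError:
        raise RuntimeError("TRUSAT_DB_PASSWORD is not set") from None

# In deployment the variable would be set by the environment, not in code.
os.environ["TRUSAT_DB_PASSWORD"] = "example-only"
print(get_db_password())  # example-only
```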
I'm using this issue to note some misc questions/concerns about some coding patterns in this repo, such as:

- Broad `except` blocks, where most should only catch known, anticipated errors to avoid swallowing something truly unexpected that needs our attention
- `except` blocks that silently swallow and hide errors
- `server.py` relying on `login.txt`, where environment variables might be more appropriate
- `self.send_header('Access-Control-Allow-Origin', '*')` is repeated in many places, and all will need to change in the same way if/when we update our CORS policy
- `database.py` using `except` where it may be preferable to check for a non-exceptional edge case (e.g. row count = 0) in advance (e.g. d6ab307)

A lot of this stuff can be fixed as and when the code is updated or refactored for other reasons.
This will prevent re-distribution of other TLEs which may be in the catalog for test purposes.
Python's built-in HTTP server only implements basic security checks, as stated in its documentation.
Similar to #10, the user queries `selectUserObservationHistory_JSON`, `selectObjectUserSightings_JSON`, and `selectUserObjectsObserved_JSON` seem like they probably limit results to a single station (untested).
Some of the queries also seem to return duplicate results containing inconsistent info.
This bug may be most visible to our most important users, some of whom seem to use 10 or 11 different stations.
Put a timeout (rate limit) on the Signup and Claim endpoints to prevent users from sending excessive emails to a target address.
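One way this could look, as a sketch (the class name and the 10-minute window are assumptions, not existing code): a per-address cooldown consulted before sending signup/claim emails:

```python
import time

class EmailCooldown:
    """Reject repeat signup/claim emails to the same address within a
    cooldown window. Sketch; the window length is an assumption."""

    def __init__(self, window_seconds=600, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.last_sent = {}

    def allow(self, address: str) -> bool:
        """Return True (and record the send) if this address is outside
        its cooldown window, else False."""
        now = self.clock()
        last = self.last_sent.get(address)
        if last is not None and now - last < self.window:
            return False
        self.last_sent[address] = now
        return True

limiter = EmailCooldown()
print(limiter.allow("user@example.com"))  # True
print(limiter.allow("user@example.com"))  # False (within the window)
```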
The check-box used to confirm "no password reset" renders too small on #mobile. @johngribbin
When a user submits an observation, the results are not visible until they refresh the page.
Example:
https://www.trusat.org/object/22970
The Observations tab shows "old" results, whereas the History and "My Observations" tabs show newer, fitted results - and newer TLEs exist than the observations shown under "Observations."
Also, it seems to favor observations by SolarLyra on 31 October.
Some `station_num`s have multiple entries in the Station table (see below). This is unsatisfactory, as it theoretically leaves us unable to link an observation to a user, and when linking observations to users the resulting need for de-duplication makes queries more complex and slower (or, more commonly, where there is currently no de-duplication, it just makes them wrong).
Two possibilities:
- Make `station_num` a unique key on the `Station` table. When we know there is only one row per station in the `Station` table, queries become easier (or, in some cases, the existing query syntax becomes correct rather than bugged :).

Note: some queries attempt a `LIMIT 1` operation to choose a single user, but do so in a bugged way: #10.
station_num | user | initial | latitude | longitude | elevation_m | name | MPC | details | preferred_format | source_url | notes
---|---|---|---|---|---|---|---|---|---|---|---
8835 | 31 | | 35.1477 | 90.0135 | 75 | IDSat | | | | IDSat COSPAR.TXT |
8835 | 31 | JN | 35.1477 | -90.0135 | 75 | Jim Nix | | | IOD | http://www.satobs.org/seesat/Jan-2002/0276.html |
8935 | 31 | JN | 35.3166 | -89.886 | 94 | Jim Nix | | | IOD | http://www.satobs.org/seesat/Jun-1999/0356.html | Added
8935 | 31 | | 35.3166 | 89.886 | 94 | IDSat | | | | IDSat COSPAR.TXT |
8936 | 31 | JN | 35.1231 | -89.9354 | 90 | Jim Nix | | | IOD | http://www.satobs.org/seesat/Sep-2000/0311.html | Added
8936 | 31 | | 35.2131 | 89.9354 | 90 | IDSat | | | | IDSat COSPAR.TXT |
9127 | 39 | | -27.591 | -153.05 | 19 | IDSat | | | | IDSat COSPAR.TXT |
9127 | 39 | JM | -27.591 | 153.05 | 19 | Jim McManus | | | IOD | http://www.satobs.org/seesat/Aug-2001/0200.html |
The following query produces an on-demand relation of object info from three sources - Celestrak Categories, Celestrak SATCAT info, and UCS DB.
It would be more efficient to perform this query just ONCE, whenever one of those data sources is updated, and store the result in a table or static form.
As suggested in #9:
opensatcat -> trusat_production
opensatcat_dev -> trusat_dev
Example:
https://www.trusat.org/profile/0x9489ebC9a5f5Cfb17247709F9dF29B2064d1FE1e
Shows 3 Observations, when only two are present in the database. The most likely scenario is that it is counting the number of IODs that have been used in TLEs.
User information is present in the snapshots and tests; it needs to be removed.
Objects like https://devvymcdevface.trusat.org/object/96041 will fail because they have no recorded observations.
google_email.py has repeated and deprecated code. Error handling should also be improved.
It looks like the Celestrak website format has been updated (perhaps when the new visualization option was added). As a result, the categorize.py script no longer finds the links on the page in the following code section:
Only the `/catalog/debris` and `/catalog/undisclosed` queries group results by object ID before returning.
Other `/catalog/...` queries will return many results per object, perhaps more than one per IOD, because observations from stations with more than one user will list an observation per user rather than per station.
The intention seems to be to return one result per object?
If we introduce a `GROUP BY` clause to the other catalog queries, we must take care to select the correct observation details (obs time, user, eth address and obs quality). (Similarly to the "multiple users per station" problem, we must take care not to return many results per object if an object has multiple IODs with the same "latest" timestamp.)
In order to prevent re-processing of data for Stations (users) who have opted out of TruSat, a number of enhancements should be implemented to prevent this data from being processed in TruSat.org and potentially other future decentralized implementations.
At present, this would appear to apply to:
If the data is manually removed from the database, this should prevent re-introduction of the data in future import sessions.
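A sketch of one such enhancement, assuming an opt-out list keyed by station number (the field and function names here are made up): skip opted-out stations at import time so removed data is not re-introduced:

```python
def filter_opted_out(observations, opted_out_stations):
    """Drop observations from stations whose owners opted out, so manually
    removed data is not re-introduced by later import sessions.
    The 'station_number' field name is an assumption."""
    return [obs for obs in observations
            if obs["station_number"] not in opted_out_stations]

obs = [{"station_number": 8835}, {"station_number": 9127}]
print(filter_opted_out(obs, {9127}))  # [{'station_number': 8835}]
```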
observer_id INTEGER,
I think the API should return headers that encourage browsers to cache API results, even if it's only for 5mins*. This is pretty trivial to implement, and could greatly improve UX. (It should also reduce database/API load significantly, which isn't an issue right now but could be in the future).
For bonus points, some data (e.g. historical observations) could be safely cached for much longer periods.
* I chose 5mins because I saw user stories like "changes must be seen within 5mins".
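A minimal sketch of such headers (the helper name is assumed); in the existing `BaseHTTPRequestHandler`-style server code they could be emitted via `self.send_header`:

```python
def cache_headers(max_age_seconds=300):
    """Headers encouraging browsers to cache API responses. The 300s
    default matches the 'changes must be seen within 5mins' user story."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}

# In a BaseHTTPRequestHandler subclass this might be used as:
#   for name, value in cache_headers().items():
#       self.send_header(name, value)
print(cache_headers())  # {'Cache-Control': 'public, max-age=300'}
```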
Similar to #10, the `selectObjectInfluence_JSON` query seems like it probably limits results to a single station (untested).
Make acceptance observations contingent on opt-in of station owner.
As written, this only allows:
- Common Data Block
- Date of Observation
- First record

...where there can be multiple observation dates, and multiple records, per submission.
Recommended solution is to update iod.py IOD/RDE/UK parsing routines to also return an array of lines indicating which lines succeeded, and which lines failed to import.
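A sketch of the recommended shape (the function names are stand-ins, not iod.py's actual API): parse every submitted line and report per-line success:

```python
def parse_iod_lines(lines, parse_one):
    """Parse each submitted line, returning (records, successes) where
    successes[i] says whether lines[i] parsed, instead of stopping after
    the first record. `parse_one` is a stand-in for iod.py's parser."""
    records, successes = [], []
    for line in lines:
        try:
            records.append(parse_one(line))
            successes.append(True)
        except ValueError:
            successes.append(False)
    return records, successes

def toy_parser(line):
    """Toy stand-in parser: 'parse' lines that start with a digit."""
    if not line[:1].isdigit():
        raise ValueError("unrecognised line")
    return line.split()[0]

records, ok = parse_iod_lines(["12345 0001 ...", "garbage"], toy_parser)
print(records, ok)  # ['12345'] [True, False]
```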
Also suggest renaming the AWS / RDS database name instances at the same time:
opensatcat -> trusat_production
opensatcat_dev -> trusat_dev
This was doing the wrong thing so I had to take a best guess at the requirements when rewriting it.
As of my fix:
- Notably, `observation_quality` refers to the observation made by the selected user.
- Notably, `username_last_tracked`, `time_last_tracked`, and `address_last_tracked` all refer to the most recent observation of the same object by any user.
- `object_secondary_purpose` was returning a placeholder string mentioning team members, so I took it out and we now return the empty string until we decide what to do with this field.
- `observation_quality` was returning a code like "E", which I updated to a short description like "excellent".

Not 100% clear that I've got those requirements right?
If I have, is it weird that we don't tell the selected user the date/time at which they last observed each object?
Attempted to follow the sign-up flow and received the email with my secret. But as the endpoint returned a CORS error, the front end never showed the "success" message telling the user to check their email for further details on how to log in.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://api.consensys.space:8080/signup. (Reason: CORS request did not succeed).
https://github.com/consensys-space/trusat-tle/issues/4
Unit tests from @nmclrn would suggest that this is not entirely resolved.
The catalog queries all make use of the following subquery:
```sql
JOIN (SELECT
    Station.station_num as station_num,
    Station.user as station_user,
    Observer.id as obs_id,
    Observer.eth_addr as eth_addr,
    Observer.name as user_name
    FROM Station,Observer
    WHERE Station.user = Observer.id
    LIMIT 1) Obs ON ParsedIOD.station_number = Obs.station_num
```
I think the `LIMIT 1` here is a bug. It has the effect of limiting the whole catalog result set to a single station number (on my DB, `ParsedIOD.station_number = 2420`), which I don't think is what we want. I believe I've inadvertently fixed some cases as part of query optimization, but other cases need attention.
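An `sqlite3` toy reproduction of the effect, with the schema trimmed to the relevant columns (table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Station (station_num INTEGER, user INTEGER);
CREATE TABLE ParsedIOD (station_number INTEGER, obj INTEGER);
INSERT INTO Station VALUES (2420, 1), (8835, 2);
INSERT INTO ParsedIOD VALUES (2420, 100), (8835, 200);
""")

# With LIMIT 1 the subquery keeps a single arbitrary station, so the join
# silently drops every other station's observations.
buggy = conn.execute("""
    SELECT ParsedIOD.obj FROM ParsedIOD
    JOIN (SELECT station_num FROM Station LIMIT 1) Obs
      ON ParsedIOD.station_number = Obs.station_num
""").fetchall()

# Without it, every station's observations survive the join.
fixed = conn.execute("""
    SELECT ParsedIOD.obj FROM ParsedIOD
    JOIN (SELECT station_num FROM Station) Obs
      ON ParsedIOD.station_number = Obs.station_num
""").fetchall()

print(len(buggy), len(fixed))  # 1 2
```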
For any given object, they pick a TLE arbitrarily.
Several probable issues with this:
More generally, it's worth being acutely aware of how difficult security is in JS. Much of the following article does not apply to us, and some of it is out of date, but some of it is relevant and it's a fun read. :)
https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2011/august/javascript-cryptography-considered-harmful/