agrc / auditor
You... are AWESOME
License: MIT License
Add another org-wide checker to identify duplicate groupnames. Long-term solution to #45.
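A minimal sketch of the duplicate detection itself, assuming the group names have already been collected into a plain list (how they are gathered from AGOL is not shown). The function name `find_duplicate_names` and the case-insensitive comparison are assumptions for illustration, not auditor's existing code.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, compared case-insensitively."""
    # Normalize whitespace and case so 'Utah SGID Boundaries' matches 'utah sgid boundaries '
    counts = Counter(name.strip().casefold() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


group_names = ['Utah SGID Boundaries', 'utah sgid boundaries', 'Utah SGID Cadastre']
duplicates = find_duplicate_names(group_names)
```

The result is reported in normalized (lowercase) form; a real checker would probably want to map it back to the offending group ids.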
Once we get our authoritative datasets figured out, add a check/fix to set the authoritative flag. May require extending the metatables to add an authoritative flag.
Add method to remove duplicate tags.
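One possible shape for such a method, sketched under the assumption that tags are a list of strings and that duplicates should be matched case-insensitively while keeping the first spelling seen. The name `remove_duplicate_tags` is hypothetical.

```python
def remove_duplicate_tags(tags):
    """Return tags with duplicates removed, keeping the first occurrence and original order."""
    seen = set()
    unique = []
    for tag in tags:
        cleaned = tag.strip()
        key = cleaned.casefold()
        # Skip empty tags and anything already seen (ignoring case)
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

Whether 'SGID' and 'sgid' should count as the same tag is a policy decision; drop the `casefold()` call for exact matching.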
Either add a new cli argument or create some other logic for capturing summary log info to pass to supervisor's messengers.
Sharing an item with SGID Open Data is currently a by-product of sharing it with an SGID group in AGOL. Auditor currently changes the group based on the fully-qualified table name in the meta table. Thus, getting Auditor to remove a dataset from Open Data would require either not processing that row (by changing the itemid field in the metatable) or shelving the dataset. Auditor would also overwrite any manual attempt to unshare the item from the SGID group.
There are two potential solutions to decouple this, and a combination of both may be desirable:
SGID Cadastre and SGID Open Data Cadastre?
Add a check and fix for default visibility:
https://gist.github.com/stdavis/a6fd2716db37e6a62887ffde28572035
Either create the log directory or handle a FileNotFoundError.
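A sketch of the first option, creating the directory before the handler opens the file. The helper name `get_file_handler` is an assumption; auditor's actual logging setup may be structured differently.

```python
import logging
from pathlib import Path


def get_file_handler(log_path):
    """Create the log file's parent directory if needed, then return a FileHandler for it."""
    log_path = Path(log_path)
    # exist_ok=True makes this safe to call on every run
    log_path.parent.mkdir(parents=True, exist_ok=True)
    return logging.FileHandler(log_path)
```

Creating the directory up front avoids the FileNotFoundError entirely, so no separate exception handling is needed for the missing-directory case.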
Somehow summarize report_dict for forklift logging. Don't want to pollute the main forklift log with all the report entries, but still need to keep them somewhere in case we need to know exactly what it changed.
Use the data from the stewardship doc to add a 'Last updated on ' line to the end or beginning of the AGOL description.
Read the stewardship doc in and add its info to the metatable_dict.
Can we check other org's data to see if they've got downloads enabled?
This would also entail somehow adding a different way to specify which items to check instead of just checking everything in the org (because we're looking at something outside the org).
Look for an AGRC tag in any location and replace it with UGRC. Putting this here should catch the ones in AGOL; I have also put it in Sweeper to catch the ones in metadata.
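A rough sketch of the tag swap, assuming tags are a list of strings and that only whole-tag matches should be replaced (a tag that merely contains "AGRC" is left alone). The function name `replace_tag` and its defaults are hypothetical.

```python
def replace_tag(tags, old='AGRC', new='UGRC'):
    """Swap any tag equal to `old` (ignoring case and padding) for `new` without adding a duplicate."""
    result = []
    for tag in tags:
        candidate = new if tag.strip().casefold() == old.casefold() else tag
        # Avoid ending up with two copies of the new tag
        if candidate not in result:
            result.append(candidate)
    return result
```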
Via supervisor, auditor includes credentials.REPORT_BASE_PATH as an attachment in the email. However, this file won't be updated if there is an error that causes the checks/fixes to bomb out before they're finished. Thus, an error email will have an out-of-date report that causes confusion when troubleshooting.
One solution is to atomize the report writing so that auditor writes to the report for every item instead of after everything is finished (see TODO before def log_report()).
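A sketch of what per-item report writing could look like, assuming the report is a CSV file and each item's results fit in a flat dict. The function name `append_report_row` and the column names in the usage note are assumptions; the real report_dict structure may differ.

```python
import csv
from pathlib import Path


def append_report_row(report_path, row, fieldnames):
    """Append one item's results to the report, writing the header only for a new or empty file."""
    report_path = Path(report_path)
    write_header = not report_path.exists() or report_path.stat().st_size == 0
    # Append mode: rows written before a crash stay on disk
    with report_path.open('a', newline='', encoding='utf-8') as report_file:
        writer = csv.DictWriter(report_file, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Called once per item, for example `append_report_row(path, {'itemid': item_id, 'result': 'fixed'}, ['itemid', 'result'])`, the file on disk reflects everything processed up to the point of failure.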
List of questions for the larger group about tags:
Association of Governments, Shelved, Bioscience, etc) (code is currently written to enforce this)
SGID (will also be tagged with Shelved, Utah, and AGRC)?
(More can be added later...)
Search AGOL for items with duplicate titles (either search for each one in the metatable, or do a site-wide search).
Most fixes use either item.update() or manager.update_definition() calls to perform the actual fix. Create a helper method that allows the object to build a list of fixes and then gets called after all the _fix methods have been run to perform the actual fixes.
This allows us to reduce the number of REST calls per item, hopefully reducing the time to run.
The creation process should include some informal time profiling to verify time savings.
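A minimal sketch of such a helper, assuming each _fix method registers its change instead of calling AGOL directly. The class name `PendingFixes` and its methods are hypothetical; only `item.update()` and `manager.update_definition()` come from the text above.

```python
class PendingFixes:
    """Collect changes from the individual _fix methods and send them in as few calls as possible."""

    def __init__(self):
        self.item_properties = {}
        self.definition_changes = {}

    def add_item_property(self, key, value):
        self.item_properties[key] = value

    def add_definition_change(self, key, value):
        self.definition_changes[key] = value

    def apply(self, item, manager=None):
        """Perform at most one item.update() and one manager.update_definition(); return the call count."""
        calls = 0
        if self.item_properties:
            item.update(item_properties=self.item_properties)
            calls += 1
        if self.definition_changes and manager is not None:
            manager.update_definition(self.definition_changes)
            calls += 1
        return calls
```

With this shape, a tag fix and a title fix on the same item collapse into a single update call, which is where any time savings would come from. The returned call count gives a simple number to compare during profiling.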
Pass it an AGOL item id and have it audit that item.
Would need to restructure the pre check/fix logic to conditionally make the list of items to check from folders.
auditor needs a version 3 refactor and proper engineering.
Things to fix:
check_items and fix_items
Set the max time before clients check for new data to 23 hours for forklifted items (i.e., those in the metatables), as forklift updates at most once every day.
manager = arcgis.features.FeatureLayerCollection.fromitem(item).manager
manager.update_definition({'cacheMaxAge': 82800}) #: 23 hours of seconds
Modify logging so that it automatically deletes old logs.
logging's rotate features.
Porter template was updated with a few things from the post (agrc/porter#154). Auditor could double check a lot of them.
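For the log cleanup item above, the standard library's rotating handlers can delete old logs on their own. This is a sketch only: the logger name, the `days_to_keep` default, and the helper name `build_rotating_logger` are assumptions, not auditor's current configuration.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def build_rotating_logger(log_path, days_to_keep=30):
    """Return a logger that starts a new file at midnight and keeps only the newest backups."""
    handler = TimedRotatingFileHandler(
        log_path,
        when='midnight',           # roll over once per day
        backupCount=days_to_keep,  # older rotated files are deleted automatically
        encoding='utf-8',
    )
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger = logging.getLogger('auditor_rotation_example')  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

`RotatingFileHandler` is the size-based alternative if logs should roll over by file size rather than by day.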
Implement more robust retry logic (while True / try / except / sleep) for connection errors.
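One way to structure this is a bounded retry with exponential backoff rather than an open-ended loop. The sketch below is generic: the helper name `retry`, its defaults, and the choice of `ConnectionError` are assumptions, and the exception types actually raised during AGOL calls would need to be confirmed.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, exceptions=(ConnectionError,), sleep=time.sleep):
    """Call operation() until it succeeds, sleeping longer after each failure; re-raise on the last one."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the original error
            # Waits base_delay, then 2x, 4x, ... between attempts
            sleep(base_delay * 2 ** (attempt - 1))
```

Bounding the attempts keeps a persistent outage from hanging the run forever, and passing `sleep` in makes the helper easy to test without real delays.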
Will the utah code that you have be ok with Utah Utah County Parcels?
I thought this might be a goofy edge case or something that you might want to be aware of.
Current process:
metadata.tags property
metadata.tags (see SGID.Water.UtahMajorLakes' <SearchKeys> XML tags), this then overwrites whatever tag fixes were done in steps 1 & 3.
Need to fix the order so that the metadata overwrite doesn't stomp earlier changes.
You can't Ctrl-C to stop the fixes. Probably due to an overly aggressive try/finally statement.
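The actual cause in auditor has not been confirmed, but a common pattern behind this symptom is a bare `except:` (or a `finally` that returns or continues) swallowing KeyboardInterrupt. Since KeyboardInterrupt derives from BaseException rather than Exception, catching only `Exception` lets Ctrl-C through. A sketch with a hypothetical `run_fixes` loop:

```python
def run_fixes(items, fix):
    """Apply fix() to each item, recording ordinary failures but letting Ctrl-C stop the run."""
    results = {}
    for item in items:
        try:
            results[item] = fix(item)
        except Exception as error:
            # KeyboardInterrupt is a BaseException, so it is not caught here and propagates
            results[item] = f'failed: {error}'
    return results
```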
The ItemID field may have non-GUID text to indicate layers that are hosted in others' AGOL orgs. Modify the read metatable function to handle these rows.
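A sketch of how the metatable reader might separate these rows, assuming rows are dicts, the field is named `itemid`, and a valid AGOL item id is a 32-character hexadecimal string (like the id in the item URL elsewhere in this list). The function name and field name are assumptions.

```python
import re

# AGOL item ids look like 32 hex characters, e.g. 110d43fbfacc411b8e74f4d12b63d881
ITEM_ID_PATTERN = re.compile(r'[0-9a-fA-F]{32}')


def split_metatable_rows(rows, id_field='itemid'):
    """Return (rows with a valid item id, rows whose id field holds other text or nothing)."""
    valid, skipped = [], []
    for row in rows:
        value = (row.get(id_field) or '').strip()
        if ITEM_ID_PATTERN.fullmatch(value):
            valid.append(row)
        else:
            skipped.append(row)
    return valid, skipped
```

Returning the skipped rows separately, instead of silently dropping them, leaves room to log which layers are hosted in other orgs.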
If you don't install via pip install -e ., apparently the metadata export to xml doesn't work.
It throws an exception about No such file or directory c:\path\to\arcpy\scratch\folder\auditor\[agol_id].xml
Check the namespaces and how the different modules access each other.
Merge the two meta tables to remove confusion and complexity.
Update following projects/scripts after schema is solidified/updated:
Does it make more sense to save logs to a cloud storage bucket so that a person doesn't need access to the full GCVE machine to check the logs?
Would we want to keep the local log in place?
Could we do incremental logging to the same file (ie, append mode) to GCS for future auditor updates that write out incrementally?
Is there a better logging mechanism/pattern in GCP than just dumping files into buckets?
The title, group, and folder checks rely on the information in the metatable. Right now the script iterates over every item in the organization's folders. Items that don't have a corresponding itemid in the metatable are still checked against tags, downloads, and delete protection.
Should these three checks still be done on non-metatable items, or should it only validate items found in the metatable?
It would be nice, at a glance, to be able to tell the status of a dataset without having to look at the tags. Using our brand thumbnail with static or shelved text on it would be a nice enhancement.
e.g. https://utah.maps.arcgis.com/home/item.html?id=110d43fbfacc411b8e74f4d12b63d881
Auditor is failing to successfully share the new Utah Wilderness layer to the 'Utah SGID Boundaries' group. It will properly share if done manually either through python or the web ui.
Need to debug the code while running the update on that one item to see if all the parameters are correct.
On line 32 in cli.py, from .auditor import Auditor, credentials, auditor.py then imports arcpy. If arcpy can't get a license, it raises a RuntimeError or ValueError (don't remember which).
Because these imports happen before our supervisor object is set up, it appears to bomb out silently.
Either create the supervisor before the rest of the imports, or move the imports that result in the arcpy import after the supervisor has been created.
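A sketch of the second option: defer the import until a notifier exists, and report the failure before re-raising. `notify` is a hypothetical stand-in for whatever supervisor exposes for sending messages, and the helper name `load_module_or_notify` is also an assumption.

```python
import importlib


def load_module_or_notify(module_name, notify):
    """Import a module at call time so that failures can be reported instead of dying silently."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, RuntimeError, ValueError) as error:
        # RuntimeError/ValueError cover the licensing failure described above (exact type unconfirmed)
        notify(f'Could not import {module_name}: {error!r}')
        raise
```

The call would go after the supervisor object has been created, so the licensing error ends up in the notification rather than being lost.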
It would be beneficial to have all shelved and static data following the same naming convention.
Program is not catching HTTPErrors properly, possibly because try/except is in validate.py instead of within the individual instance methods of checks.py?
Checking Utah USGS 3DEP 1K Grid...
Traceback (most recent call last):
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\_impl\connection.py", line 852, in get
resp = opener.open(url)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 498: 498
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\gis\git\agol-validator\__main__.py", line 47, in <module>
main()
File "c:\gis\git\agol-validator\__main__.py", line 40, in main
org_validator.check_items(report_dir)
File "c:\gis\git\agol-validator\validate.py", line 186, in check_items
checker.metadata_check()
File "c:\gis\git\agol-validator\checks.py", line 363, in metadata_check
if self.arcpy_metadata and self.arcpy_metadata.xml != self.item.metadata:
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\gis\__init__.py", line 6993, in __getattribute__
return super(Item, self).__getattribute__(name)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\gis\__init__.py", line 7462, in metadata
return self._portal.con.get(metadataurlpath, try_json=False)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\_impl\connection.py", line 927, in get
return self.get(newpath, ssl, try_json, is_retry=True)
File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\_impl\connection.py", line 816, in get
params['f'] = 'json'
TypeError: 'bool' object does not support item assignment
AGOL appears to be very picky about the xml you feed it as part of item.metadata = some_metadata_xml (or item.update(metadata=path_to_xml)). The xml needs to be in the internal ArcGIS Metadata format (https://doc.arcgis.com/en/arcgis-online/manage-data/metadata.htm#ESRI_SECTION1_CE02409EE61D4A51A2BB943A2D8D982F, https://desktop.arcgis.com/en/arcmap/latest/manage-data/metadata/the-arcgis-metadata-format.htm). One possible way to make sure the xml is in this format is to export it using ArcGIS Desktop's exact copy of.xslt template (found in <install dir>\Metadata\Stylesheets\gpTools) and then update the AGOL metadata from this file.
Need to adjust the metadata fixer method to use this procedure. May also need to handle converting the SDE's metadata to the ArcGIS Metadata format (either in-place or by making a copy).