auditor's People

Contributors

dependabot[bot], jacobdadams, steveoh

auditor's Issues

Authoritative check/fix

Once we get our authoritative datasets figured out, add a check/fix to set the authoritative flag. May require extending the metatables to add an authoritative flag.
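A minimal sketch of the check/fix pair. It assumes a hypothetical boolean authoritative column in the metatable and an item object with a settable content_status attribute. The ArcGIS API for Python's Item does expose a content_status property, but the exact status strings it accepts should be confirmed against the installed version. A stand-in class is used here so the logic runs without AGOL.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed status string; confirm against the arcgis Item.content_status docs.
AUTHORITATIVE = "authoritative"


@dataclass
class StandInItem:
    """Hypothetical stand-in for an AGOL item; only content_status is modeled."""
    itemid: str
    content_status: Optional[str] = None


def check_authoritative(item, should_be_authoritative: bool) -> bool:
    """Return True when the item's status disagrees with the metatable flag."""
    return (item.content_status == AUTHORITATIVE) != should_be_authoritative


def fix_authoritative(item, should_be_authoritative: bool) -> None:
    """Set the flag, or clear it only if it is currently authoritative.

    Leaving other statuses (e.g. deprecated) alone avoids wiping manual settings.
    """
    if should_be_authoritative:
        item.content_status = AUTHORITATIVE
    elif item.content_status == AUTHORITATIVE:
        item.content_status = None
```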

Include an explicit indicator for SGID group/Open Data sharing in the meta table(s)

Sharing an item with SGID Open Data is currently a by-product of sharing it with an SGID group in AGOL. Auditor currently changes the group based on the fully-qualified table name in the meta table. Thus, getting Auditor to remove a dataset from Open Data would require either not processing that row (by changing the itemid field in the metatable) or shelving the dataset. Auditor would also overwrite any manual attempt to unshare the item from the SGID group.

There are two potential solutions to decouple this, and a combination of both may be desirable:

  1. Explicitly define the AGOL group in the metatable. This involves a duplication of information (group is encoded in both the table name and the new field) and could lead to out-of-sync issues between the database group and the AGOL group, but it gives us more fine-grained control over the AGOL Group.
  2. Create a single "Open Data" group to control Open Data sharing and remove the "Share with Open Data" from the SGID AGOL groups. This may require changing how our Open Data landing page icons work (I think they're currently based on groups), but I need to dig into this more.
    • Also, this could change how external orgs share their data with our open data groups. Maybe we need two groups per category, an SGID group and an SGID Open Data Group? So SGID Cadastre and SGID Open Data Cadastre?
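A sketch of how the two options could be combined. It assumes a hypothetical nullable group column and a hypothetical share_open_data flag in the metatable, plus a hypothetical single "SGID Open Data" group. The "Utah SGID <Category>" pattern is inferred from group names like 'Utah SGID Boundaries' and may not match every group.

```python
from typing import List, Optional

# Hypothetical single group for option 2; no such group exists yet.
OPEN_DATA_GROUP = "SGID Open Data"


def group_from_table_name(table_name: str) -> str:
    """Current behavior: derive the group from 'SGID.<Category>.<Table>'.

    Assumes groups are titled 'Utah SGID <Category>', as in 'Utah SGID Boundaries'.
    """
    parts = table_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected database.category.table, got {table_name!r}")
    return f"Utah SGID {parts[1]}"


def resolve_groups(table_name: str, explicit_group: Optional[str] = None,
                   share_open_data: bool = True) -> List[str]:
    """Option 1: an explicit metatable group wins over the table-name default.
    Option 2: Open Data sharing is a separate, independently controlled group."""
    groups = [explicit_group or group_from_table_name(table_name)]
    if share_open_data:
        groups.append(OPEN_DATA_GROUP)
    return groups
```

With share_open_data set to False for a row, Auditor would keep the item in its SGID group while leaving it out of Open Data, and it would no longer fight a deliberate unshare.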

Empty tag

I'm not sure what is causing the empty tag, but I assume the validator created it. If not, the validator should be able to fix it.
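A check/fix for blank tags could be as small as the following. It assumes tags arrive as a list of strings, as item.tags does in the ArcGIS API for Python; where the blank tag originally came from is still unknown.

```python
from typing import Iterable, List, Optional


def has_empty_tags(tags: Iterable[Optional[str]]) -> bool:
    """Check: True if any tag is None, empty, or only whitespace."""
    return any(not (tag and tag.strip()) for tag in tags)


def remove_empty_tags(tags: Iterable[Optional[str]]) -> List[str]:
    """Fix: drop blank tags and trim whitespace from the rest, keeping order."""
    return [tag.strip() for tag in tags if tag and tag.strip()]
```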

Add latest update date from stewardship doc

Use the data from the stewardship doc to add a 'Last updated on ' line to the end or beginning of the AGOL description.

Read the stewardship doc in and add its info to the metatable_dict.
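A sketch of both steps, assuming (hypothetically) that the stewardship doc has already been parsed into a mapping of AGOL item id to last-update date and that metatable_dict rows are dicts keyed by item id. The line is appended to the end of the HTML description and replaced on later runs so it doesn't pile up.

```python
import re
from datetime import date
from typing import Dict, Optional

# A trailing "Last updated on ..." paragraph left by a previous run.
_LAST_UPDATED = re.compile(r"\s*<p>Last updated on [^<]*</p>\s*$")


def add_last_updated(description: Optional[str], updated: date) -> str:
    """Append, or replace, a 'Last updated on' line at the end of an AGOL description."""
    body = _LAST_UPDATED.sub("", description or "")
    return f"{body}<p>Last updated on {updated:%B} {updated.day}, {updated.year}</p>"


def merge_stewardship(metatable_dict: Dict[str, dict], stewardship: Dict[str, date]) -> None:
    """Copy each item's update date from the stewardship doc into its metatable row."""
    for item_id, row in metatable_dict.items():
        if item_id in stewardship:
            row["last_updated"] = stewardship[item_id]
```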

Check third-party services on Open Data for enabled downloads

Can we check other orgs' data to see if they have downloads enabled?

This would also entail somehow adding a different way to specify which items to check instead of just checking everything in the org (because we're looking at something outside the org).
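One way to decouple which items are checked from the org's own folders is to read item ids from a separate list. The sketch assumes a hypothetical plain-text id list and assumes the services are hosted feature services, where "Allow others to export to different formats" shows up as the Extract capability; services hosted elsewhere may expose downloads differently. How the capabilities strings are fetched is left out.

```python
from typing import Dict, Iterable, List


def read_item_ids(lines: Iterable[str]) -> List[str]:
    """Parse a hypothetical id list: one AGOL item id per line, '#' starts a comment."""
    ids = []
    for line in lines:
        item_id = line.split("#", 1)[0].strip()
        if item_id:
            ids.append(item_id)
    return ids


def downloads_enabled(capabilities: str) -> bool:
    """True when a hosted feature service's capabilities string includes Extract."""
    return "extract" in (part.strip().lower() for part in capabilities.split(","))


def items_without_downloads(capabilities_by_id: Dict[str, str], item_ids: List[str]) -> List[str]:
    """List the ids whose services don't allow export; unknown ids count as disabled."""
    return [i for i in item_ids if not downloads_enabled(capabilities_by_id.get(i, ""))]
```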

Replace AGRC tags anywhere in a system

Look for an AGRC tag in any location and replace it with UGRC. I'm filing this here so it catches the tags in AGOL; I've also filed it in Sweeper to catch the ones in metadata.
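For AGOL items, the replacement itself is a simple list transformation, assuming tags are a list of strings. This sketch matches AGRC case-insensitively and drops any resulting case-insensitive duplicate, for example when an item is already tagged UGRC.

```python
from typing import List


def replace_agrc(tags: List[str]) -> List[str]:
    """Swap any 'AGRC' tag for 'UGRC', preserving order and skipping duplicates."""
    result = []
    seen = set()
    for tag in tags:
        new_tag = "UGRC" if tag.strip().upper() == "AGRC" else tag
        if new_tag.lower() not in seen:
            seen.add(new_tag.lower())
            result.append(new_tag)
    return result
```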

Auditor always includes the last copy of the report, even if it doesn't correspond to the latest run

Via supervisor, auditor includes credentials.REPORT_BASE_PATH as an attachment in the email. However, this file won't be updated if there is an error that causes the checks/fixes to bomb out before they're finished. Thus, an error email will have an out-of-date report that causes confusion when troubleshooting.

One solution is to atomize the report writing so that auditor writes to the report for every item instead of after everything is finished (see TODO before def log_report()).
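A minimal sketch of per-item report writing, assuming the report is a CSV file; the class and column names are hypothetical. Truncating the file at the start of each run also means an error email attaches a partial report for the failing run rather than a stale one.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable


class IncrementalReport:
    """Write one report row per item as soon as it is processed.

    The file is truncated when the report is created, so a run that crashes
    leaves a partial report for that run instead of the previous run's report.
    """

    def __init__(self, path, fieldnames: Iterable[str]) -> None:
        self.path = Path(path)
        self.fieldnames = list(fieldnames)
        with self.path.open("w", newline="") as report:
            csv.DictWriter(report, fieldnames=self.fieldnames).writeheader()

    def write_item(self, row: Dict[str, str]) -> None:
        # Reopening in append mode per row means each row is flushed to the file once this returns.
        with self.path.open("a", newline="") as report:
            csv.DictWriter(report, fieldnames=self.fieldnames).writerow(row)
```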

Tag Questions

List of questions for the larger group about tags:

  • Should all tags be proper-cased (e.g., Association of Governments, Shelved, Bioscience)? (The code is currently written to enforce this.)
  • Should shelved data be tagged with SGID (will also be tagged with Shelved, Utah, and AGRC)?

(More can be added later...)
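The existing proper-casing code isn't shown here. The following is not that code, only a sketch of one rule set that reproduces the examples above, assuming hypothetical lists of acronyms to keep upper-case and minor words to keep lower-case.

```python
# Hypothetical exception lists; the real rules are the open question above.
KEEP_UPPER = {"SGID", "AGRC", "UGRC", "GIS"}
KEEP_LOWER = {"of", "and", "the", "in", "for"}


def proper_case(tag: str) -> str:
    """Capitalize each word, keeping listed acronyms upper-case and minor words lower-case."""
    words = tag.split()
    cased = []
    for i, word in enumerate(words):
        if word.upper() in KEEP_UPPER:
            cased.append(word.upper())
        elif i > 0 and word.lower() in KEEP_LOWER:
            cased.append(word.lower())
        else:
            cased.append(word[:1].upper() + word[1:].lower())
    return " ".join(cased)
```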

Find duplicate AGOL titles

Search AGOL for items with duplicate titles (either search for each one in the metatable, or do a site-wide search).
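For the site-wide option, once (item id, title) pairs are collected, for example from the results of gis.content.search() in the ArcGIS API for Python, grouping by a normalized title finds the duplicates. Treating titles as case- and whitespace-insensitive is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_titles(items: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each title shared by more than one item to the ids that use it.

    Titles are compared case-insensitively after trimming whitespace.
    """
    by_title = defaultdict(list)
    for item_id, title in items:
        by_title[title.strip().lower()].append(item_id)
    return {title: ids for title, ids in by_title.items() if len(ids) > 1}
```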

Group all .update() and .update_definition() calls in fixes

Most fixes use either item.update() or manager.update_definition() calls to perform the actual fix. Create a helper method that lets the object build up a list of pending fixes and is called after all the _fix methods have run to apply them together.

This allows us to reduce the number of REST calls per item, hopefully reducing the time to run.

The creation process should include some informal time profiling to verify time savings.
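A minimal sketch of such a helper; the class and method names are hypothetical. Each _fix method would queue its change, and apply() sends at most one item.update(item_properties=...) and one update_definition() call, returning the call count and elapsed time for informal profiling. Fixes that use other calls, such as delete protection or sharing, would stay separate.

```python
import time
from typing import Any, Dict, Tuple


class PendingUpdates:
    """Hypothetical helper: _fix methods queue changes here instead of calling AGOL directly."""

    def __init__(self) -> None:
        self.item_properties: Dict[str, Any] = {}
        self.definition: Dict[str, Any] = {}

    def add_item_property(self, key: str, value: Any) -> None:
        self.item_properties[key] = value

    def add_definition(self, key: str, value: Any) -> None:
        self.definition[key] = value

    def apply(self, item, manager=None) -> Tuple[int, float]:
        """Send at most one item.update() and one update_definition() call.

        Returns the number of REST calls made and the elapsed seconds.
        """
        if self.definition and manager is None:
            raise ValueError("definition changes queued but no manager supplied")
        start = time.perf_counter()
        calls = 0
        if self.item_properties:
            item.update(item_properties=self.item_properties)
            calls += 1
        if self.definition:
            manager.update_definition(self.definition)
            calls += 1
        return calls, time.perf_counter() - start
```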

Run for single item

Pass it an AGOL item id and have it audit that item.

Would need to restructure the pre check/fix logic to conditionally make the list of items to check from folders.
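A sketch of the restructured pre-check logic, assuming a hypothetical --item-id flag and a mapping of folder name to item ids. argparse is used here for illustration; auditor's actual CLI handling may differ.

```python
import argparse
from typing import Dict, List, Optional


def build_item_list(items_by_folder: Dict[str, List[str]], item_id: Optional[str] = None) -> List[str]:
    """Return only the requested item id, or every item id across the org's folders."""
    if item_id:
        return [item_id]
    return [i for folder_items in items_by_folder.values() for i in folder_items]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Audit AGOL items")
    # Hypothetical flag; omit it to audit the whole org as before.
    parser.add_argument("--item-id", help="AGOL item id to audit on its own")
    return parser.parse_args(argv)
```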

Refactor code

auditor needs a version 3 refactor and proper engineering.

Things to fix:

  • Creation of the summary report for notification: should this be done via logging? Is there a better way?
  • General OOP design: all the properties!
  • verbose: wrap into cli_handler? Some other logging?
  • Creation of the audit report: a single point, rather than calling it at the end of both check_items and fix_items
  • Org-wide checks: where do they best live?
  • Proper unit tests

Check and fix cacheMaxAge

Set the maximum time before clients check for new data (cacheMaxAge) to 23 hours for forklifted items (i.e., items in the metatables), since forklift updates them at most once a day.

from arcgis.features import FeatureLayerCollection

manager = FeatureLayerCollection.fromitem(item).manager  #: item is the forklifted AGOL feature layer item
manager.update_definition({'cacheMaxAge': 82800})  #: 23 hours in seconds
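The snippet above covers the fix. A matching check could compare the current value first so the definition is only updated when needed. This assumes the current value is readable from the service's properties under the same cacheMaxAge key; the manager is duck-typed so the sketch runs without AGOL.

```python
CACHE_MAX_AGE = 23 * 60 * 60  # 82800 seconds: forklift updates at most daily


def check_cache_max_age(service_properties: dict) -> bool:
    """Return True when cacheMaxAge is missing or differs from the 23-hour target."""
    return service_properties.get("cacheMaxAge") != CACHE_MAX_AGE


def fix_cache_max_age(manager) -> None:
    """Same update_definition payload as the manual fix."""
    manager.update_definition({"cacheMaxAge": CACHE_MAX_AGE})
```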

Rotate check/fixes logging

Modify logging so that it automatically deletes old logs.

  • Every 2 weeks?
  • Investigate logging's rotate features.
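The standard library's logging.handlers.TimedRotatingFileHandler covers this: with when="midnight" and backupCount=14 it rolls the file over daily and deletes anything beyond the 14 most recent backups, i.e. roughly two weeks. The logger name and format are placeholders. Rollover is evaluated when records are written and the first rollover time is based on the existing file's modification time, so a once-a-day script should still rotate on its next run. If each run instead writes a separately named file, the handler won't prune those, and deleting files older than 14 days would be needed.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def make_rotating_logger(log_path, days_to_keep=14, name="auditor"):
    """Log to log_path, rolling over at midnight and keeping days_to_keep old files."""
    handler = TimedRotatingFileHandler(log_path, when="midnight", backupCount=days_to_keep)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger, handler
```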

Improve tags

I don't think the name should be in the tags, but "paths" should be. I searched for "trails and paths agrc" and the results included something from WFRC.

Any updates to AGOL tags (and maybe other fields) resulting from updating the metadata aren't checked

Current process:

  1. Check existing tags, save to dictionary
    • AGOL tags
    • Any tags exposed by arcpy's metadata.tags property
  2. Check if metadata needs to be overwritten
  3. Fix tags based on saved tags from first step
  4. Overwrite AGOL metadata with XML from SGID
    • If there are any tags hiding in the XML that don't get exposed via metadata.tags (see SGID.Water.UtahMajorLakes' <SearchKeys> XML tags), this then overwrites whatever tag fixes were done in steps 1 & 3.

Need to fix the order so that the metadata overwrite doesn't stomp earlier changes.
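A sketch of the reordered flow, with the AGOL-specific steps passed in as hypothetical callables so only the ordering is shown: the metadata overwrite happens first, and tags are read afterward, so any <SearchKeys> tags pulled in from the XML get fixed instead of undoing earlier fixes.

```python
from typing import Callable, List


def run_metadata_and_tag_fixes(
    metadata_needs_update: bool,
    overwrite_metadata: Callable[[], None],
    read_tags: Callable[[], List[str]],
    fix_tags: Callable[[List[str]], None],
) -> List[str]:
    """Reordered flow: overwrite metadata first, then read and fix tags.

    Reading tags after the overwrite means tags injected from the XML
    are seen and fixed. Returns the tags that were read before fixing.
    """
    if metadata_needs_update:
        overwrite_metadata()
    current = read_tags()
    fix_tags(current)
    return current
```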

Merge AGOLItems_shelved into AGOLItems

Merge the two meta tables to remove confusion and complexity.

  • Extend AGOLItems schema to include all AGOLItems_shelved info
  • Add Disclaimer/Use Limitations column (for signifying which datasets should use the standard disclaimer)

Update the following projects/scripts after the schema is solidified/updated:

feat: Save logs to cloud storage?

Does it make more sense to save logs to a cloud storage bucket so that a person doesn't need access to the full GCVE machine to check the logs?

Would we want to keep the local log in place?

Could we log incrementally to the same file in GCS (i.e., append mode) once future auditor updates write their output incrementally?

Is there a better logging mechanism/pattern in GCP than just dumping files into buckets?
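A sketch of the copy-to-bucket option using the google-cloud-storage client (Client().bucket(name).blob(path).upload_from_filename(...)). The bucket name and prefix are placeholders, and the client can be injected for testing; the local log stays in place. GCS objects are immutable, so there is no true append mode: each upload replaces the whole object, and incremental writes would mean re-uploading or composing objects. Cloud Logging, GCP's managed log service, may be the better fit for the last question.

```python
from datetime import datetime, timezone
from pathlib import Path


def upload_log(local_path, bucket_name, prefix="auditor-logs", client=None):
    """Copy a finished log file to a GCS bucket and return the object name."""
    if client is None:
        from google.cloud import storage  # requires the google-cloud-storage package
        client = storage.Client()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    blob_name = f"{prefix}/{stamp}_{Path(local_path).name}"
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(str(local_path))
    return blob_name
```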

Validate items not in metatable?

The title, group, and folder checks rely on the information in the metatable. Right now the script iterates over every item in the organization's folders. Items that don't have a corresponding itemid in the metatable are still run through the tags, downloads, and delete protection checks.

Should these three checks still be done on non-metatable items, or should it only validate items found in the metatable?
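If the answer is to make this configurable, the dispatch could look like the following. The check names mirror the six checks named above; the validate_unlisted switch is hypothetical.

```python
from typing import Container, List

METATABLE_CHECKS = ("title", "group", "folder")  # need a metatable row
GENERAL_CHECKS = ("tags", "downloads", "delete_protection")  # work on any item


def checks_for_item(item_id: str, metatable_ids: Container[str],
                    validate_unlisted: bool = True) -> List[str]:
    """Return the checks to run for one item."""
    if item_id in metatable_ids:
        return list(METATABLE_CHECKS + GENERAL_CHECKS)
    return list(GENERAL_CHECKS) if validate_unlisted else []
```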

Group Sharing may not be working

Auditor is failing to share the new Utah Wilderness layer to the 'Utah SGID Boundaries' group. Sharing works when done manually, either through Python or the web UI.

Need to debug the code while running the update on that one item to see if all the parameters are correct.
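A debugging aid that logs exactly what is sent and turns a silent refusal into a False return. It assumes the 1.x-era ArcGIS API for Python signature Item.share(groups=[...]) and assumes the result is a dict carrying a notSharedWith list, as the REST API's Share Item response does; both should be confirmed against the installed version.

```python
import logging

log = logging.getLogger(__name__)


def share_and_verify(item, group_id) -> bool:
    """Share one item with one group, log the request, and report whether AGOL refused it."""
    log.debug("sharing %s with group %s", item.id, group_id)
    result = item.share(groups=[group_id])
    not_shared = result.get("notSharedWith", []) if isinstance(result, dict) else []
    if not_shared:
        log.warning("AGOL did not share %s with %s (full result: %r)", item.id, not_shared, result)
        return False
    return True
```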

Catch arcpy import failing to get a license

Line 32 of cli.py (from .auditor import Auditor, credentials) imports auditor.py, which in turn imports arcpy. If arcpy can't get a license, it raises a RuntimeError or ValueError (I don't remember which).

Because these imports happen before our supervisor object is set up, it appears to bomb out silently.

Either create the supervisor before the rest of the imports, or move the imports that result in the arcpy import after the supervisor has been created.
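A sketch of the second option: once the supervisor exists, import the arcpy-dependent module through a wrapper that reports the failure instead of letting the process die silently. The notify callable stands in for whatever the supervisor uses to send messages and is hypothetical. ImportError is caught as well, in case arcpy is missing entirely.

```python
import importlib
from typing import Callable, Optional


def import_after_supervisor(module_name: str, notify: Callable[[str], None],
                            package: Optional[str] = None):
    """Import a module that pulls in arcpy, reporting failure instead of exiting silently.

    Call this only after the supervisor (or any notifier) exists.
    """
    try:
        return importlib.import_module(module_name, package)
    except (ImportError, RuntimeError, ValueError) as error:
        notify(f"Failed to import {module_name}: {error!r}")
        return None
```

In cli.py this might be called as import_after_supervisor(".auditor", notify, package="auditor"), with the caller exiting non-zero when it returns None.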

Not catching HTTPErrors properly

The program is not catching HTTPErrors properly, possibly because the try/except is in validate.py instead of within the individual instance methods of checks.py?

traceback
Checking Utah USGS 3DEP 1K Grid...
Traceback (most recent call last):
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\_impl\connection.py", line 852, in get
    resp = opener.open(url)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 498: 498

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\gis\git\agol-validator\__main__.py", line 47, in <module>
    main()
  File "c:\gis\git\agol-validator\__main__.py", line 40, in main
    org_validator.check_items(report_dir)
  File "c:\gis\git\agol-validator\validate.py", line 186, in check_items
    checker.metadata_check()
  File "c:\gis\git\agol-validator\checks.py", line 363, in metadata_check
    if self.arcpy_metadata and self.arcpy_metadata.xml != self.item.metadata:
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\gis\__init__.py", line 6993, in __getattribute__
    return super(Item, self).__getattribute__(name)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\gis\__init__.py", line 7462, in metadata
    return self._portal.con.get(metadataurlpath, try_json=False)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\_impl\connection.py", line 927, in get
    return self.get(newpath, ssl, try_json, is_retry=True)
  File "C:\Users\jdadams\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\validate\lib\site-packages\arcgis\_impl\connection.py", line 816, in get
    params['f'] = 'json'
TypeError: 'bool' object does not support item assignment
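Reading the traceback, the exception that finally escapes is a TypeError raised inside the arcgis connection's retry call (after the server answered 498, the code ArcGIS uses for an invalid or expired token), not the HTTPError itself, so an except HTTPError clause would not catch it wherever it lives. A minimal sketch of per-check handling, assuming checks can be wrapped as zero-argument callables (the dict-of-checks structure and check names are hypothetical):

```python
import logging
from typing import Callable, Dict
from urllib.error import HTTPError

log = logging.getLogger(__name__)


def run_checks(item_title: str, checks: Dict[str, Callable[[], None]]) -> Dict[str, Exception]:
    """Run each check on its own so one failure doesn't abort the item or the whole run.

    TypeError is caught because that is what the arcgis retry path raised after the 498;
    catching it broadly can also hide genuine bugs, so every failure is logged.
    """
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except (HTTPError, TypeError) as error:
            log.error("%s: %s failed: %r", item_title, name, error)
            failures[name] = error
    return failures
```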

New Metadata Upload Logic

AGOL appears to be very picky about the XML you feed it via item.metadata = some_metadata_xml (or item.update(metadata=path_to_xml)). The XML needs to be in the internal ArcGIS Metadata format (https://doc.arcgis.com/en/arcgis-online/manage-data/metadata.htm#ESRI_SECTION1_CE02409EE61D4A51A2BB943A2D8D982F, https://desktop.arcgis.com/en/arcmap/latest/manage-data/metadata/the-arcgis-metadata-format.htm). One possible way to make sure the XML is in this format is to export it using ArcGIS Desktop's "exact copy of.xslt" template (found in <install dir>\Metadata\Stylesheets\gpTools) and then update the AGOL metadata from this file.

Need to adjust the metadata fixer method to use this procedure. May also need to handle converting the SDE's metadata to the ArcGIS Metadata format (either in place or by making a copy).
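A sketch of that procedure, assuming arcpy's XSLT Transformation tool (arcpy.conversion.XSLTransform, taking source, stylesheet, and output paths) and the item.update(metadata=...) call mentioned above. The stylesheet path is a parameter because the install directory varies, and the transform and item are injectable so the flow can be exercised without ArcGIS. Writing to a temporary copy leaves the SDE metadata untouched.

```python
import os
import tempfile


def upload_arcgis_format_metadata(source_xml, xslt_path, item, transform=None):
    """Convert metadata with the 'exact copy of.xslt' stylesheet, then push the result to AGOL."""
    if transform is None:
        import arcpy  # only available in an ArcGIS install
        transform = arcpy.conversion.XSLTransform
    out_xml = os.path.join(tempfile.mkdtemp(), "metadata.xml")
    transform(source_xml, xslt_path, out_xml)  # writes the converted copy
    item.update(metadata=out_xml)
    return out_xml
```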
